Module:sa-Java-translit
- The following documentation is located at Module:sa-Java-translit/documentation. [edit] Categories were auto-generated by Module:module categorization. [edit]
- Useful links: subpage list • links • transclusions • testcases • sandbox
Interfacing
[edit]This module works on text in the Javanese script.
This module will transliterate Sanskrit language text per WT:SA TR.
The module should preferably not be called directly from templates or other modules.
To use it from a template, use {{xlit}}
.
Within a module, use Module:languages#Language:transliterate.
For testcases, see Module:sa-Java-translit/testcases.
Functions
tr(text, lang, sc)
- Transliterates a given piece of
text
written in the script specified by the codesc
, and language specified by the codelang
. - When the transliteration fails, returns
nil
.
It transliterates Sanskrit text in accordance with the IAST convention.
Method
[edit]The core of the transliteration is the conversion of CV? sequences where V is a vowel or a mark of its absence. The Javanese script is more complicated than the Devanagari script, so the process is a bit more complicated.
The characters of the script that may be transliterated consist of consonants, both base and subscript consonants, dependent vowels, and others. The base consonants are listed in the variable C and the subscript consonants are listed in the variable S. Their transliterations are stored in the table consonants. The transliterations of the dependent vowels are stored in the table diacritics. Other transliterations are stored in the table tt. These include independent vowels and anusvara.
The first step is to partially transliterate the sequences 'CS', for there is no implicit vowel between the two parts. The 'C' part is transliterated, and the 'S' part is left for further consideration. This step is repeated, so as to handle any potential sequences CSSS, though there should not be any.
The next step is to transliterate CV? combinations. Some vowels are encoded as three characters (virama, liquid vowel letter, and length mark). (TODO: Trap undefined sequences.) The structure of vowels is simple enough to be captured inline in the coding of the substitution. Note that if there were any CSSS sequences, the first letters of the transliterations of the subscript consonants would have to be treated as vowels.
The final step is to transliterate the other symbols. Some symbols (certain of the independent vowels) have a second character, which is always TARUNG. These are transliterated first, and then the symbols consisting of a single character are transliterated.
local export = {}
local gsub = mw.ustring.gsub
local consonants = {
['ꦏ']='k', ['ꦑ']='kh', ['ꦒ']='g', ['ꦓ']='gh', ['ꦔ']='ṅ',
['ꦕ']='c', ['ꦖ']='ch', ['ꦗ']='j', ['ꦙ']='jh', ['ꦚ']='ñ',
['ꦛ']='ṭ', ['ꦜ']='ṭh', ['ꦝ']='ḍ', ['ꦞ']='ḍh', ['ꦟ']='ṇ',
['ꦠ']='t', ['ꦡ']='th', ['ꦢ']='d', ['ꦣ']='dh', ['ꦤ']='n',
['ꦥ']='p', ['ꦦ']='ph', ['ꦧ']='b', ['ꦨ']='bh', ['ꦩ']='m',
['ꦪ']='y', ['ꦫ']='r', ['ꦭ']='l', ['ꦮ']='v', -- ['ળ']='ḷ',
['ꦯ']='ś', ['ꦰ']='ṣ', ['ꦱ']='s', ['ꦲ']='h',
-- Include subscript ('medial') consonants for translation only.
['ꦿ']='r', ['ꦾ']='y',
}
local diacritics = {
['ꦴ']='ā', ['ꦶ']='i', ['ꦷ']='ī', ['ꦸ']='u', ['ꦹ']='ū', ['ꦽ']='ṛ', ['ꦽꦴ']='ṝ',
['꧀ꦊ']='ḷ', ['꧀ꦋ']='ḹ', ['ꦺ']='e', ['ꦻ']='ai', ['ꦺꦴ']='o', ['ꦵ']='o', ['ꦻꦴ']='au', ['꧀']='',
-- In general, include results of second level diacritics. I think not needed for Javanese.
-- ['y']='y', ['r']='r',
}
local tt = {
-- vowels
['ꦄ']='a', ['ꦄꦴ']='ā', ['ꦆ']='i', ['ꦇ']='ī', ['ꦈ']='u', ['ꦈꦴ']='ū', ['ꦉ']='ṛ', ['ꦉꦴ']='ṝ',
['ꦊ']='ḷ', ['ꦋ']='ḹ', ['ꦌ']='e', ['ꦍ']='ai', ['ꦎ']='o', ['ꦎꦴ']='au',
-- chandrabindu
['ꦀ']='m̐', --until a better method is found
-- anusvara
['ꦁ']='ṃ', --until a better method is found
-- visarga
['ꦃ']='ḥ',
-- avagraha
-- ['ઽ']='’',
-- others
['ꦂ']='r',
--numerals
['꧐']='0', ['꧑']='1', ['꧒']='2', ['꧓']='3', ['꧔']='4', ['꧕']='5', ['꧖']='6', ['꧗']='7', ['꧘']='8', ['꧙']='9', ['꧇']='',
--punctuation
['꧉']='.', --double danda
['꧈']='.', --danda
--Vedic extensions
-- ['ᳵ']='x', ['ᳶ']='f',
--Om
['ꦎꦴꦀ']='oṃ',
--reconstructed
['*'] = '',
}
-- List the consonants
local S = 'ꦾꦿ' -- Subscript y and r.
local C = 'ꦏꦑꦒꦓꦔꦕꦖꦗꦙꦚꦛꦜꦝꦞꦟꦠꦡꦢꦣꦤꦥꦦꦧꦨꦩꦪꦫꦭꦮꦯꦰꦱꦲ'..S
function export.tr(text, lang, sc)
-- Handle subscript consonants
local fn = function(c, d) return consonants[c]..d end
local search = '(['..C..'])(['..S..'])'
text = gsub(text, search, fn);
text = gsub(text, search, fn); -- and again
text = gsub(
text,
'(['..C..S..'])'..
'(꧀?[ꦴꦶꦷꦸꦹꦽꦊꦋꦺꦻꦵ꧀]?ꦴ?)',
function(c, d)
if d == "" then
return consonants[c] .. 'a'
else
return consonants[c] .. diacritics[d]
end
end)
text = mw.ustring.gsub(text, '.ꦴ', tt) -- Two part independent vowels.
text = mw.ustring.gsub(text, '.', tt)
return text
end
return export