Module:sa-Java-translit

From Wiktionary, the free dictionary

Interfacing

This module transliterates Sanskrit text written in the Javanese script, per WT:SA TR. It should preferably not be called directly from templates or other modules. To use it from a template, use {{xlit}}. Within a module, use Module:languages#Language:transliterate.

For testcases, see Module:sa-Java-translit/testcases.

Functions

tr(text, lang, sc)
Transliterates a given piece of text written in the script specified by the code sc, and language specified by the code lang.
When the transliteration fails, returns nil.

It transliterates Sanskrit text in accordance with the IAST convention.

Method

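The core of the transliteration is the conversion of CV? sequences, where C is a consonant and V is a vowel sign or a mark of the vowel's absence; a consonant followed by neither takes the inherent vowel a. The Javanese script is more complicated than the Devanagari script (it has subscript consonants and vowels encoded as several characters), so the process has more steps.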

The characters of the script that may be transliterated are consonants (both base and subscript), dependent vowels, and others. The subscript consonants are listed in the variable S; the variable C lists the base consonants followed by the contents of S. Their transliterations are stored in the table consonants. The transliterations of the dependent vowels are stored in the table diacritics. Other transliterations are stored in the table tt. These include independent vowels, anusvara, visarga, numerals and punctuation.

The first step is to partially transliterate the sequences 'CS', for there is no implicit vowel between the two parts. The 'C' part is transliterated, and the 'S' part is left for further consideration. This step is repeated, so as to handle any potential sequences CSSS, though there should not be any.
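The idea of this first step can be sketched in plain Lua 5.3 or later, outside MediaWiki. This is only an illustration under stated assumptions: the tables are cut down to a few letters, the names chars and resolve_subscripts are hypothetical, and it scans the characters once while looking at the following character, rather than running mw.ustring.gsub twice as the module does.

```lua
-- Cut-down, illustrative tables (assumption: only a few letters are listed).
local consonants = { ['ꦏ']='k', ['ꦱ']='s', ['ꦠ']='t', ['ꦿ']='r', ['ꦾ']='y' }
local subscripts = { ['ꦿ']=true, ['ꦾ']=true }

-- Split a UTF-8 string into a list of characters (needs the utf8 library).
local function chars(s)
	local out = {}
	for ch in s:gmatch(utf8.charpattern) do
		out[#out + 1] = ch
	end
	return out
end

-- Transliterate every consonant that is directly followed by a subscript
-- consonant; everything else is left untouched for the later steps.
local function resolve_subscripts(text)
	local cs = chars(text)
	local out = {}
	for i, ch in ipairs(cs) do
		if consonants[ch] and subscripts[cs[i + 1]] then
			out[#out + 1] = consonants[ch] -- no implicit vowel here
		else
			out[#out + 1] = ch
		end
	end
	return table.concat(out)
end

print(resolve_subscripts('ꦏꦿ')) -- Latin k followed by the untouched subscript
```

Because the scan tests the original following character, a chain of subscripts is resolved in one pass; the last subscript of the chain is still left for the CV? step.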

The next step is to transliterate CV? combinations. Some vowels are encoded as three characters (virama, liquid vowel letter, and length mark). (TODO: Trap undefined sequences.) The structure of vowels is simple enough to be captured inline in the coding of the substitution. Note that if there were any CSSS sequences, the first letters of the transliterations of the subscript consonants would have to be treated as vowels.
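A minimal sketch of the CV? conversion, again in plain Lua 5.3 or later and not the module's own code. It assumes cut-down tables, uses the hypothetical helper names chars and syllables, and replaces the module's single gsub pattern with a longest-match lookup (two characters first, then one). The three-character vowels are left out for brevity.

```lua
-- Cut-down, illustrative tables (assumption: only a few entries are listed).
local consonants = { ['ꦏ']='k', ['ꦠ']='t', ['ꦩ']='m' }
local diacritics = { ['ꦴ']='ā', ['ꦶ']='i', ['ꦺ']='e', ['ꦺꦴ']='o', ['꧀']='' }

-- Split a UTF-8 string into a list of characters (needs the utf8 library).
local function chars(s)
	local out = {}
	for ch in s:gmatch(utf8.charpattern) do
		out[#out + 1] = ch
	end
	return out
end

-- Convert each consonant together with its optional vowel sign or virama.
local function syllables(text)
	local cs = chars(text)
	local out, i = {}, 1
	while i <= #cs do
		local c = consonants[cs[i]]
		if not c then
			out[#out + 1] = cs[i] -- not a consonant: leave for later
			i = i + 1
		else
			-- Prefer a two-character vowel over a one-character one.
			local two = cs[i + 1] and cs[i + 2]
				and diacritics[cs[i + 1] .. cs[i + 2]]
			local one = cs[i + 1] and diacritics[cs[i + 1]]
			if two then
				out[#out + 1] = c .. two
				i = i + 3
			elseif one then -- '' (virama) is truthy in Lua, so it lands here
				out[#out + 1] = c .. one
				i = i + 2
			else
				out[#out + 1] = c .. 'a' -- inherent vowel
				i = i + 1
			end
		end
	end
	return table.concat(out)
end

print(syllables('ꦩꦴꦠ'))
```

In this sketch a sequence that is not in the table falls back to the inherent vowel and leaves the unknown sign in place; that is a choice made for the example, not the module's behaviour.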

The final step is to transliterate the other symbols. Some symbols (certain of the independent vowels) have a second character, which is always TARUNG. These are transliterated first, and then the symbols consisting of a single character are transliterated.
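The ordering in this final step matters: if single characters were looked up first, an independent vowel would be transliterated without its TARUNG. The following plain Lua 5.3 sketch illustrates this with a cut-down table; the names chars and other_symbols are hypothetical, and the scan-based lookup stands in for the module's two gsub calls.

```lua
-- Cut-down, illustrative table (assumption: only a few entries are listed).
local tt = { ['ꦄ']='a', ['ꦄꦴ']='ā', ['ꦆ']='i', ['ꦁ']='ṃ', ['ꦃ']='ḥ' }
local TARUNG = 'ꦴ'

-- Split a UTF-8 string into a list of characters (needs the utf8 library).
local function chars(s)
	local out = {}
	for ch in s:gmatch(utf8.charpattern) do
		out[#out + 1] = ch
	end
	return out
end

-- Look up symbol + TARUNG pairs before single symbols.
local function other_symbols(text)
	local cs = chars(text)
	local out, i = {}, 1
	while i <= #cs do
		local pair = cs[i + 1] == TARUNG and tt[cs[i] .. TARUNG]
		if pair then
			out[#out + 1] = pair
			i = i + 2
		else
			-- Unknown characters pass through unchanged.
			out[#out + 1] = tt[cs[i]] or cs[i]
			i = i + 1
		end
	end
	return table.concat(out)
end

print(other_symbols('ꦄꦴ'))
```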


local export = {}
local gsub = mw.ustring.gsub

local consonants = {
	['ꦏ']='k', ['ꦑ']='kh', ['ꦒ']='g', ['ꦓ']='gh', ['ꦔ']='ṅ',
	['ꦕ']='c', ['ꦖ']='ch', ['ꦗ']='j', ['ꦙ']='jh', ['ꦚ']='ñ', 
	['ꦛ']='ṭ', ['ꦜ']='ṭh', ['ꦝ']='ḍ', ['ꦞ']='ḍh', ['ꦟ']='ṇ', 
	['ꦠ']='t', ['ꦡ']='th', ['ꦢ']='d', ['ꦣ']='dh', ['ꦤ']='n', 
	['ꦥ']='p', ['ꦦ']='ph', ['ꦧ']='b', ['ꦨ']='bh', ['ꦩ']='m',
	['ꦪ']='y', ['ꦫ']='r', ['ꦭ']='l', ['ꦮ']='v', -- ['ળ']='ḷ',
	['ꦯ']='ś', ['ꦰ']='ṣ', ['ꦱ']='s', ['ꦲ']='h',
-- Include subscript ('medial') consonants for transliteration only.
	['ꦿ']='r', ['ꦾ']='y',
}

local diacritics = {
	['ꦴ']='ā', ['ꦶ']='i', ['ꦷ']='ī', ['ꦸ']='u', ['ꦹ']='ū', ['ꦽ']='ṛ', ['ꦽꦴ']='ṝ', 
	['꧀ꦊ']='ḷ', ['꧀ꦋ']='ḹ', ['ꦺ']='e', ['ꦻ']='ai', ['ꦺꦴ']='o', ['ꦵ']='o', ['ꦻꦴ']='au', ['꧀']='',
-- In general, include results of second level diacritics.  I think not needed for Javanese.
--	['y']='y', ['r']='r',
}

local tt = {
	-- vowels
	['ꦄ']='a', ['ꦄꦴ']='ā', ['ꦆ']='i', ['ꦇ']='ī', ['ꦈ']='u', ['ꦈꦴ']='ū', ['ꦉ']='ṛ', ['ꦉꦴ']='ṝ',
	['ꦊ']='ḷ', ['ꦋ']='ḹ', ['ꦌ']='e', ['ꦍ']='ai', ['ꦎ']='o', ['ꦎꦴ']='au', 
	-- chandrabindu    
	['ꦀ']='m̐', --until a better method is found
	-- anusvara    
	['ꦁ']='ṃ', --until a better method is found
	-- visarga    
	['ꦃ']='ḥ',
	-- avagraha
	-- ['ઽ']='’',
	-- others
	['ꦂ']='r',
	--numerals
	['꧐']='0', ['꧑']='1', ['꧒']='2', ['꧓']='3', ['꧔']='4', ['꧕']='5', ['꧖']='6', ['꧗']='7', ['꧘']='8', ['꧙']='9', ['꧇']='',
	--punctuation        
    ['꧉']='.', --double danda
	['꧈']='.', --danda
    --Vedic extensions
    -- ['ᳵ']='x', ['ᳶ']='f',
    --Om (never matched: export.tr looks up at most two characters at a time)
    ['ꦎꦴꦀ']='oṃ',
    --reconstructed
    ['*'] = '',
}
-- List the consonants
local S = 'ꦾꦿ' -- Subscript y and r.
local C = 'ꦏꦑꦒꦓꦔꦕꦖꦗꦙꦚꦛꦜꦝꦞꦟꦠꦡꦢꦣꦤꦥꦦꦧꦨꦩꦪꦫꦭꦮꦯꦰꦱꦲ'..S

function export.tr(text, lang, sc)
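	-- Step 1: handle subscript consonants. No implicit vowel separates a
	-- consonant from a following subscript, so transliterate the consonant
	-- and leave the subscript for the next step.
	local fn = function(c, d) return consonants[c]..d end
	local search = '(['..C..'])(['..S..'])'
	text = gsub(text, search, fn)
	text = gsub(text, search, fn) -- and again, for stacked subscripts
	-- Step 2: transliterate consonant + optional vowel sign or virama (CV?).
	text = gsub(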
		text,
		'(['..C..S..'])'..
		'(꧀?[ꦴꦶꦷꦸꦹꦽꦊꦋꦺꦻꦵ꧀]?ꦴ?)',
		function(c, d)
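			if d == "" then
				-- No vowel sign and no virama: supply the inherent vowel.
				return consonants[c] .. 'a'
			else
				-- diacritics[d] is nil for an undefined sequence, in which
				-- case this concatenation raises an error.
				return consonants[c] .. diacritics[d]
			end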
		end)

	-- Step 3: transliterate the remaining symbols.
	text = gsub(text, '.ꦴ', tt) -- Two part independent vowels.
	text = gsub(text, '.', tt) -- Single-character symbols.
	
	return text
end
 
return export