Module:sa-convert/documentation

Documentation for Module:sa-convert.
This page contains usage information, categories, interwiki links and other content describing the module.

This module converts Sanskrit text in Devanagari to other scripts. It is used principally in Template:sa-alt, and its function tr is exposed through Template:sa-convert.

Example

ॐ त्र्यम्बकं यजामहे सुगन्धिं पुष्टिवर्धनम् । उर्वारुकमिव बन्धनान् मृत्योर् मुक्षीय माऽमृतात् ॥ कः खगौघाङचिच्छौजा झाञ्ज्ञोऽटौठीडडण्ढणः। तथोदधीन् पफर्बाभीर्मयोऽरिल्वाशिषां सहः॥

Unresolved Issues

  • Burmese:
    • Round AA also needs to be replaced with tall AA in some situations. Done
    • Some conjuncts, such as -y-, -r- and -v-, need to be cleaned up when they occur together.
    • NGA floating င္ → င်္ Done
    • RA repha ရ္ → ရ်္ (This never happens in Pali.) Done
    • NYA + virama + NYA → great NYA Done
    • SA + virama + SA → great SA Done
    • Final virama → asat Done
  • Lao:
    • Lao does not have characters for ऋ ॠ ऌ ॡ so it uses equivalent ຣິ ຣີ ລິ ລີ instead. Done
      • Evidence? I've read that it uses ຣຶ ຣື ລຶ ລື, which would eliminate the ambiguity.
      • The "Lanexang Mon4" font already includes invented characters ຤(=ฤ) ຦(=ฦ) at unassigned code points, but their use is not attested anywhere.
  • Khmer:
    • RA repha រ្ → robat over next consonant ៌ (This never happens in Pali.) Done
    • Final virama → viriam Done
  • Javanese: ꦨꦹꦂꦨꦸꦮꦃꦱ꧀ꦮꦃꦠꦠ꧀ꦱꦮꦶꦠꦸꦂꦮꦫꦺꦟꦾꦁ꧉꧇꧑꧇꧉
    • the script does not use spaces (those in the text passed to the module need to be removed); this also causes the following two issues
    • ꦾ and ꦿ for word medial conjuncts, but ꦪ and ꦫ for conjuncts that cross word boundaries, e.g.
    • ꦂ for aksaras that end with r, but aren't aksara initial, e.g.
    • numbers need to be enclosed in ꧇ (꧇꧑꧙꧇ = 19). Test: त्र्य०६म्बकं -> ꦠꦿꦾ꧇꧐꧖꧇ꦩ꧀ꦧꦏꦁ
    • ꦘ should be used for the conjunct ज्ञ, not ꦗ꧀ꦚ. Test: ज्ञ ->
  • Balinese:
    • also no spaces, and causes the following issue
    • ◌ᬃ for syllables that begin with r
    • numbers need to be enclosed in ᭞ (᭞᭑᭞ = 1). Test: त्र्य०६म्बकं -> ᬢ᭄ᬭ᭄ᬬ᭞᭐᭖᭞ᬫ᭄ᬩᬓᬂ
  • Bengali:
  • Assamese:
  • Sinhala:
    • For Sanskrit, conjuncts are formed not by simply using the virama (U+0DCA) but either by abutting the consonants, encoded by the sequence <U+200D, U+0DCA>, or by forming a ligature, encoded by <U+0DCA, U+200D>. (The extra character is ZWJ.) Which is used depends on the consonants, but as a general rule (ra) forms a ligature with a consonant on either side (much like Devanagari repha and rakar), while (ya) formally ligates with a preceding consonant, although in fact the glyph simply changes shape. There is some evidence that geminate (ya) is ය‍්ය in Sanskrit rather than ය්‍ය as in Pali. Finally, at least one pair forms a separately encoded ligature: (ja) plus (ña) becomes (gna). My best estimate so far for the combinations has been encoded in Module:sa-utilities/translit/SLP1-to-Sinh, and ultimately I believe this module and that module should share common code for fixing up the naive transliteration that uses only U+0DCA. Done
    • Additionally, in the Pali and Sanskrit texts I can find, /e/ and /o/ do not have their length marked but use the same symbols that the Sinhalese language uses for the short vowels. Done
    • I have just (18/19 December 2023) added some evidence-based test cases to Module:sa-convert/testcases. Research continues to plod along.
  • Tamil:
    • Final nasals.
    • Final visarga: the Grantha visarga is used. Done
    • Encoding of superscript digits and vowels.
    • Syllabic consonants
    • Rules for /n/ - (na) v. (ṉa).
    • Alternative forms, e.g. subscript digits.
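
As a rough illustration of the digit rule listed under Javanese and Balinese above, the sketch below wraps each run of digits in the script's enclosing mark (꧇ for Javanese, ᭞ for Balinese). It is written in Python for illustration rather than in the module's own language and is not taken from Module:sa-convert. The function name enclose_digit_runs and the constants are hypothetical, and the sketch assumes the digits have already been transliterated into the target script.

```python
import re

# Hypothetical names; code points are from the Unicode Javanese and Balinese blocks.
JAVANESE_DIGITS = "\uA9D0-\uA9D9"   # JAVANESE DIGIT ZERO..NINE
JAVANESE_PADA_PANGKAT = "\uA9C7"    # enclosing mark used around Javanese numbers
BALINESE_DIGITS = "\u1B50-\u1B59"   # BALINESE DIGIT ZERO..NINE
BALINESE_CARIK_SIKI = "\u1B5E"      # enclosing mark used around Balinese numbers


def enclose_digit_runs(text: str, digit_range: str, mark: str) -> str:
    """Wrap every maximal run of digits in `mark` on both sides.

    Assumption: `text` has not been processed before; running the function
    twice on the same string would add a second pair of marks.
    """
    pattern = re.compile(f"[{digit_range}]+")
    return pattern.sub(lambda m: mark + m.group(0) + mark, text)


if __name__ == "__main__":
    # Digits "19" in Javanese, matching the example in the list above.
    print(enclose_digit_runs("\uA9D1\uA9D9", JAVANESE_DIGITS, JAVANESE_PADA_PANGKAT))
    # Digit "1" in Balinese, matching the example in the list above.
    print(enclose_digit_runs("\u1B51", BALINESE_DIGITS, BALINESE_CARIK_SIKI))
```

Matching whole runs rather than single digits keeps a multi-digit number such as 19 inside one pair of marks. Whether the marks should also be suppressed in particular contexts is left open here.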