Module talk:sa-Java-translit

From Wiktionary, the free dictionary
Latest comment: 3 years ago by RichardW57m

@Wyang Hello! As in ꦄꦡꦮꦴ (athavā), it seems that diacritic-less aksharas are not transliterated. Do you know how to fix this? DerekWinters (talk) 20:27, 22 June 2018 (UTC)

@DerekWinters Fixed. I hope the algorithm is correct. More testcases would be great. :) Wyang (talk) 00:14, 23 June 2018 (UTC)
@Wyang: Will do! DerekWinters (talk) 15:44, 27 June 2018 (UTC)
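
The fix for diacritic-less aksharas amounts to giving a consonant that has no following vowel sign the inherent vowel a. A minimal sketch of that idea in Python (not the module's own Lua code); the tiny tables, the name translit and the single-regex approach are illustrative assumptions only:

```python
import re

# Hypothetical miniature tables for illustration; the real module's tables
# are larger and may be organised differently.
CONSONANTS = {
    "\uA98F": "k",   # JAVANESE LETTER KA
    "\uA9A1": "th",  # JAVANESE LETTER TA MURDA (as in athavā)
    "\uA9AE": "v",   # JAVANESE LETTER WA
}
VOWEL_SIGNS = {
    "\uA9B4": "ā",   # TARUNG
    "\uA9B6": "i",   # WULU
    "\uA9B8": "u",   # SUKU
    "\uA9C0": "",    # PANGKON: suppresses the inherent vowel
}
INDEPENDENT_VOWELS = {"\uA984": "a"}  # JAVANESE LETTER A

C = "[" + "".join(CONSONANTS) + "]"
V = "[" + "".join(VOWEL_SIGNS) + "]"


def translit(text):
    def cv(m):
        cons, sign = m.group(1), m.group(2)
        # No vowel sign after the consonant: use the inherent vowel "a".
        vowel = VOWEL_SIGNS[sign] if sign else "a"
        return CONSONANTS[cons] + vowel

    # The vowel-sign group is optional, so a bare akshara still matches.
    text = re.sub(f"({C})({V}?)", cv, text)
    for char, latin in INDEPENDENT_VOWELS.items():
        text = text.replace(char, latin)
    return text


print(translit("\uA984\uA9A1\uA9AE\uA9B4"))  # athavā
```

Treating PANGKON as a "vowel sign" that maps to the empty string is just one convenient design for the sketch, not necessarily how the module does it.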

For the record (lest I fall under a bus), there are two failures revealed by the test of transliteration from Devanagari:

  1. Multi-character independent vowels ('.' matches only a single character, so it won't match them).
  2. Subscript consonants (jackbootedly labelled as medial consonants).

They are both fixed for the Burmese script in module:pi-translit, so one solution is just to use the logic there. That is what I intend to do. RichardW57m (talk) 14:06, 26 April 2021 (UTC)

I fixed these for Javanese last night. I feel I should explain the changes.
The multicharacter independent vowels all end in TARUNG, so they could be mopped up in a simple substitution.
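
One way to do such a substitution, sketched in Python rather than the module's Lua. It assumes that ā and ū are written as JAVANESE LETTER A or LETTER U followed by TARUNG (U+A9B4); only these two are included, and the names SINGLE, LONG and both functions are hypothetical. The naive version shows the failure from point 1 above:

```python
import re

# Assumed, partial data: long independent vowels written as the short
# vowel letter followed by TARUNG. Names here are hypothetical.
TARUNG = "\uA9B4"
SINGLE = {"\uA984": "a", "\uA988": "u"}  # JAVANESE LETTER A, LETTER U
LONG = {"a": "ā", "u": "ū"}
BASE = "[" + "".join(SINGLE) + "]"


def independent_vowels_naive(text):
    # '.' matches exactly one code point, so TARUNG is left stranded.
    return re.sub(".", lambda m: SINGLE.get(m.group(0), m.group(0)), text)


def independent_vowels(text):
    # Mop up "vowel letter + TARUNG" first, then the lone vowel letters.
    text = re.sub(f"({BASE}){TARUNG}", lambda m: LONG[SINGLE[m.group(1)]], text)
    return re.sub(BASE, lambda m: SINGLE[m.group(0)], text)


print(independent_vowels_naive("\uA984\uA9B4") == "a\uA9B4")  # True
print(independent_vowels("\uA984\uA9B4"))                     # ā
```

Because every multi-character form ends in the same sign, one pattern with a fixed TARUNG suffix covers them all; the order matters, since the lone-letter substitution must not run first.
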
Letting S denote a one-character subscript consonant, we have to handle sequences such as CSV, which caused problems when picking out C[V] sequences. The simple trick is a prior pass that handles CS by transliterating the C part and reserving the S part for the CV handling; for this, S must be included in C. The changes then go CSV > cSV > csv, where lowercase denotes the transliteration.

We also have to consider longer sequences such as CSSV. Because each match consumes its S, one pass only gets as far as cSSV, so the prior pass is run twice: CSSV > cSSV > csSV > cssv. CSSSV could be awkward: the first pass gives cSsSV, and a second pass then matches nothing, since neither remaining S is followed by an untransliterated S. In general, we would have to include 'Ss' in our CV processing. But, so far as I am aware, Sanskrit in the Javanese script can't have three one-character subscript consonants in a row, so I've put the extra code in the diacritic table but commented it out as redundant. 11:10, 27 April 2021 (UTC)
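
The two-pass trick can be sketched in Python as follows. This is an illustration, not the module's Lua code: it assumes the one-character subscript consonants are CAKRA (U+A9BF, r) and PENGKAL (U+A9BE, y), and the tables and the names cs_pass, cv_pass and translit are hypothetical.

```python
import re

# Hypothetical miniature tables; the real module's tables differ.
CONSONANTS = {"\uA98F": "k", "\uA9A0": "t", "\uA9B1": "s"}  # KA, TA, SA
SUBSCRIPTS = {"\uA9BF": "r", "\uA9BE": "y"}                # CAKRA, PENGKAL
VOWEL_SIGNS = {"\uA9B4": "ā", "\uA9B6": "i", "\uA9B8": "u", "\uA9C0": ""}

BASE = {**CONSONANTS, **SUBSCRIPTS}  # "include S in C"
C = "[" + "".join(BASE) + "]"
S = "[" + "".join(SUBSCRIPTS) + "]"
V = "[" + "".join(VOWEL_SIGNS) + "]"


def cs_pass(text):
    # C followed by S: emit C without a vowel and keep S for later.
    # The match consumes S, so in "CSS" only the first pair is handled.
    return re.sub(f"({C})({S})", lambda m: BASE[m.group(1)] + m.group(2), text)


def cv_pass(text):
    def cv(m):
        sign = m.group(2)
        return BASE[m.group(1)] + (VOWEL_SIGNS[sign] if sign else "a")
    return re.sub(f"({C})({V}?)", cv, text)


def translit(text):
    text = cs_pass(text)  # CSV > cSV, CSSV > cSSV
    text = cs_pass(text)  # cSSV > csSV
    return cv_pass(text)  # cSV > csv, csSV > cssv


print(translit("\uA9B1\uA9BF\uA9B4"))        # srā   (CSV)
print(translit("\uA9A0\uA9BF\uA9BE"))        # trya  (CSS, inherent a)
print(translit("\uA9A0\uA9BF\uA9BE\uA9BF"))  # trayra, not tryra (CSSS)
```

The last call reproduces the CSSSV problem: after two passes the CV pass sees an S followed by an already transliterated y and gives it the inherent vowel. In Python, a lookahead such as `({C})(?={S})` would not consume S and would handle runs of any length in one pass, but Lua patterns (used by Scribunto's mw.ustring functions) have no lookahead.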