Module talk:my-translit

From Wiktionary, the free dictionary

Draft module

@Angr Could you please have a look at this draft transliteration module for Burmese and see if anything needs to be improved? It's just MLC at present since I'm most familiar with it and most Burmese transliteration information on Wikipedia is about MLC. I want to do the other three systems as well, especially the BGN-PCGN system which is the de facto standard here. BGN-PCGN and Okell require additional information - we could probably write a function that analyses symbols used to denote pronunciation not inferrable from the script.

E.g. (l/my: linking template; my-pron: pronunciation and romanisation template):

{{l/my|(ခ)လုတ်}} elsewhere, and {{my-pron|(ခ)လုတ်}} or {{my-pron|minor=1}} at ခလုတ်
{{l/my|ဆီ^ပုံး}} elsewhere, and {{my-pron|ဆီ^ပုံး}} or {{my-pron|voiced=2}} at ဆီပုံး
{{l/my|စတော်ဘယ်*ရီ}} elsewhere, and {{my-pron|စတော်ဘယ်*ရီ}} or {{my-pron|r=4}} at စတော်ဘယ်ရီ
{{l/my|ဘီး|pron=ဘိန်း}} elsewhere, and {{my-pron|ဘိန်း}} or {{my-pron|pron=ဘိန်း}} at ဘီး
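
A minimal sketch of the kind of function proposed above, written in Python rather than the module's Lua. The marker meanings are assumptions read off the examples: parentheses mark a minor syllable, `^` marks voicing of what follows, and `*` marks a pronounced r. The names `parse_annotated` and `POINT_MARKERS` are hypothetical. Offsets are code-point indices into the plain spelling; converting them to syllable numbers (as in `minor=1` or `voiced=2`) would need a syllable segmenter, which is left out here.

```python
from typing import List, Tuple

# Hypothetical marker meanings, inferred from the examples in this thread.
POINT_MARKERS = {"^": "voiced", "*": "r"}

Mark = Tuple[str, int, int]  # (kind, start offset, end offset)


def parse_annotated(text: str) -> Tuple[str, List[Mark]]:
    """Strip annotation symbols and report where they applied.

    Returns the plain spelling and a list of (kind, start, end) tuples.
    Offsets index into the plain spelling; point markers have start == end.
    """
    plain: List[str] = []
    marks: List[Mark] = []
    minor_start = None  # offset at which an open "(" began, if any
    for ch in text:
        pos = len(plain)
        if ch == "(":
            if minor_start is not None:
                raise ValueError("nested '(' is not supported")
            minor_start = pos
        elif ch == ")":
            if minor_start is None:
                raise ValueError("')' without matching '('")
            marks.append(("minor", minor_start, pos))
            minor_start = None
        elif ch in POINT_MARKERS:
            # A point marker applies to whatever follows it.
            marks.append((POINT_MARKERS[ch], pos, pos))
        else:
            plain.append(ch)
    if minor_start is not None:
        raise ValueError("unclosed '('")
    return "".join(plain), marks


if __name__ == "__main__":
    for word in ["(ခ)လုတ်", "ဆီ^ပုံး", "စတော်ဘယ်*ရီ"]:
        print(parse_annotated(word))
```

With something like this, the linking template could display only the plain spelling, while the pronunciation template reads the marks.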

There isn't a lot of information about the other systems though - I'm not sure how to transliterate the stacked consonants (see Burmese alphabet#Stacked consonants) in the other systems. The eventual goal would be to Lua-ify {{my-roman}} and IPA, much like what Module:ko-pron is doing for {{ko-IPA}}.

All testcases pass currently, but there was one failure, which I removed: ယောက်ျား. I don't understand the structure of that Burmese word - why does the medial 'y' not follow a base consonant letter?

Thanks! Wyang (talk) 14:34, 22 May 2014 (UTC)

Should the letter ရ (ra.) be "r" or "j"? E.g. is "ci:pwa:re:" correct in စီးပွားရေး (ci:pwa:re:)? Sorry if this question is silly. --Anatoli (обсудить/вклад) 05:53, 23 May 2014 (UTC)
It is 'r' in the MLC system, regardless of its pronunciation. Wyang (talk) 06:13, 23 May 2014 (UTC)

ကိုယ့်မင်းကိုယ့်ချင်း (kuiy.mang:kuiy.hkyang:) fails badly in testcases and the whole module fails. --Anatoli (обсудить/вклад) 02:39, 26 May 2014 (UTC)

How should the final in the first syllable ကိုယ့် (kuiy.) be transliterated? uiy., uii.? Wyang (talk) 23:39, 26 May 2014 (UTC)

I don't know, sorry :( I was just letting you know that some words are not transliterated. --Anatoli (обсудить/вклад) 00:18, 27 May 2014 (UTC)

Belated comments

For some reason I didn't get pinged when Wyang pinged me above, so I didn't know this module even existed until right now! I'm actually happy keeping it MLC and am hoping to make it the new de facto standard for Burmese here at Wiktionary. BGN-PCGN would be hard to automate because it's pronunciation-based rather than spelling-based. ယောက်ျား is a weird spelling; it's basically a contraction of ယောက်ကျား and I guess it should be transliterated as if it were spelled that way. I'd transliterate ကိုယ့် (kuiy.) as "kuiy.". —Aɴɢʀ (talk) 17:28, 4 June 2014 (UTC)

Thanks, I have changed it to 'uiy'. Now we need to fill the following table of finals...
Columns: Burmese spelling, IPA, MLCTS, ALA-LC, BGN-PCGN, Okell (the last three are not yet filled in)
က [ka̰], [kə] ka.
ဂါ [ɡà] ga
ဂါး [ɡá] ga:
ကာ [kà] ka
ကား [ká] ka:
ကက် [kɛʔ] kak
ကင် [kɪ̀ɴ] kang
ကင့် [kɪ̰ɴ] kang.
ကင်း [kɪ́ɴ] kang:
ကစ် [kɪʔ] kac
ကည် [kì], [kè], [kɛ̀] kany
ကဉ် [kɪ̀ɴ]
ကည့် [kḭ], [kḛ], [kɛ̰] kany.
ကဉ့် [kɪ̰ɴ]
ကည်း [kí], [ké], [kɛ́] kany:
ကဉ်း [kɪ́ɴ]
ကတ် [kaʔ] kat
ကန် [kàɴ] kan
ကန့် [ka̰ɴ] kan.
ကန်း [káɴ] kan:
ကပ် [kaʔ] kap
ကမ် [kàɴ] kam
ကမ့် [ka̰ɴ] kam.
ကမ်း [káɴ] kam:
ကယ် [kɛ̀] kai
ကံ [kàɴ] kam
ကံ့ [ka̰ɴ] kam.
ကံး [káɴ] kam:
ကိ [kḭ] ki.
ကိတ် [keɪʔ] kit
ကိန် [kèɪɴ] kin
ကိန့် [kḛɪɴ] kin.
ကိန်း [kéɪɴ] kin:
ကိပ် [keɪʔ] kip
ကိမ် [kèɪɴ] kim
ကိမ့် [kḛɪɴ] kim.
ကိမ်း [kéɪɴ] kim:
ကိံ [kèɪɴ] kim
ကိံ့ [kḛɪɴ] kim.
ကိံး [kéɪɴ] kim:
ကီ [kì] ki
ကီး [kí] ki:
ကု [kṵ] ku.
ကုတ် [koʊʔ] kut
ကုန် [kòʊɴ] kun
ကုန့် [ko̰ʊɴ] kun.
ကုန်း [kóʊɴ] kun:
ကုပ် [koʊʔ] kup
ကုမ် [kòʊɴ] kum
ကုမ့် [ko̰ʊɴ] kum.
ကုမ်း [kóʊɴ] kum:
ကုံ [kòʊɴ] kum
ကုံ့ [ko̰ʊɴ] kum.
ကုံး [kóʊɴ] kum:
ကူ [kù] ku
ကူး [kú] ku:
ကေ [kè] ke
ကေ့ [kḛ] ke.
ကေး [ké] ke:
ကဲ [kɛ́] kai:
ကဲ့ [kɛ̰] kai.
ဂေါ [ɡɔ́] gau:
ဂေါက် [ɡaʊʔ] gauk
ဂေါင် [ɡàʊɴ] gaung
ဂေါင့် [ɡa̰ʊɴ] gaung.
ဂေါင်း [ɡáʊɴ] gaung:
ဂေါ့ [ɡɔ̰] gau.
ဂေါ် [ɡɔ̀] gau
ကော [kɔ́] kau:
ကောက် [kaʊʔ] kauk
ကောင် [kàʊɴ] kaung
ကောင့် [ka̰ʊɴ] kaung.
ကောင်း [káʊɴ] kaung:
ကော့ [kɔ̰] kau.
ကော် [kɔ̀] kau
ကို [kò] kui
ကိုက် [kaɪʔ] kuik
ကိုင် [kàɪɴ] kuing
ကိုင့် [ka̰ɪɴ] kuing.
ကိုင်း [káɪɴ] kuing:
ကို့ [ko̰] kui.
ကိုး [kó] kui:
ကွတ် [kʊʔ] kwat
ကွန် [kʊ̀ɴ] kwan
ကွန့် [kʊ̰ɴ] kwan.
ကွန်း [kʊ́ɴ] kwan:
ကွပ် [kʊʔ] kwap
ကွမ် [kʊ̀ɴ] kwam
ကွမ့် [kʊ̰ɴ] kwam.
ကွမ်း [kʊ́ɴ] kwam:

Wyang (talk) 07:03, 5 June 2014 (UTC)
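
One way the MLCTS column could be held as data is a lookup from the rhyme (everything after the initial letter) to its romanisation. The sketch below is in Python, not the module's Lua, and covers only a handful of rows copied from the table; `RHYMES` and `transliterate_k_syllable` are hypothetical names, and the sketch handles only syllables that begin with က. Keys are given as code-point escapes so that the order of the combining marks is explicit.

```python
K = "\u1000"  # က, the initial letter used throughout the table

# Rhyme (code points after the initial) -> MLCTS, copied from a few table rows.
RHYMES = {
    "": "a.",                 # က    ka.
    "\u102c": "a",            # ကာ   ka
    "\u102c\u1038": "a:",     # ကား  ka:
    "\u1000\u103a": "ak",     # ကက်  kak
    "\u100a\u103a": "any",    # ကည်  kany
    "\u1014\u103a": "an",     # ကန်  kan
    "\u102d": "i.",           # ကိ   ki.
    "\u102e": "i",            # ကီ   ki
    "\u102f": "u.",           # ကု   ku.
    "\u1030": "u",            # ကူ   ku
    "\u1031": "e",            # ကေ   ke
    "\u102d\u102f": "ui",     # ကို   kui
}


def transliterate_k_syllable(syllable: str) -> str:
    """Romanise a single syllable that starts with က, using the table above."""
    if not syllable.startswith(K):
        raise ValueError("this sketch only handles syllables starting with \u1000")
    rhyme = syllable[len(K):]
    if rhyme not in RHYMES:
        raise KeyError(f"rhyme not in table: {rhyme!r}")
    return "k" + RHYMES[rhyme]


if __name__ == "__main__":
    for s in ["\u1000", "\u1000\u102c", "\u1000\u1000\u103a", "\u1000\u102d\u102f"]:
        print(s, transliterate_k_syllable(s))
```

Each filled table row then doubles as a testcase: the spelling is the input and the MLCTS cell is the expected output.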

MLC doesn't distinguish between ဉ and ည, so ကဉ် and ကည် are both kany. I can add the ALA-LC translits to the table above if you like (they're all at Appendix:Burmese transliteration anyway), but I really don't think there's any point in automating BGN-PCGN and Okell since they're pronunciation-based, not orthography-based. —Aɴɢʀ (talk) 15:28, 5 June 2014 (UTC)
Those two are pronunciation-based, but largely predictable, no? Minor syllables, voicing, etc. can, I think, be taken into account using special symbols. Ideally, {{my-roman}} could be fully automated, using nothing or an annotated phonetic respelling as input. Wyang (talk) 05:43, 6 June 2014 (UTC)
I reckon they're predictable from the spelling about 75% of the time. An open, creaky-voiced syllable followed by another syllable is usually a minor syllable (in other words, တကာ is far more likely to be /təkà/ than /ta̰kà/). As long as the templates used outside Burmese entries ({{t}}, {{l}}, {{m}}, etc.) use only MLC, then I don't mind automating {{my-roman}}. If the phonetic transcriptions are predictable from the orthography, then we don't have to add anything; if not, we can add a single phonetic transcription as a parameter to {{my-roman}} and have both BGN-PCGN and Okell be predicted from that. I'd recommend using BGN-PCGN as that single transcription, but substituting "@" for "ă" and "E" for "è" so that everything can be typed from an unmodified keyboard. —Aɴɢʀ (talk) 18:50, 6 June 2014 (UTC)
Thanks, I see what you mean. Do you think it would be better to use BGN-PCGN as the transcription input, or an annotated Burmese-script word (like the examples in my first post)? It would only be used in {{my-roman}} if the romanisation used in linking templates is MLC. Wyang (talk) 23:53, 11 June 2014 (UTC)
Well, it would certainly make my life easier if the transcription input consisted only of characters I can get on an unmodified ASCII keyboard. —Aɴɢʀ (talk) 14:31, 12 June 2014 (UTC)
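
The ASCII substitution suggested in this thread ("@" for "ă", "E" for "è") could be expanded with a one-to-one character mapping. This is a Python sketch only; `ASCII_TO_BGN` and `expand_ascii` are hypothetical names, and the input strings in the usage lines are made up to show the mechanics, not real transcriptions.

```python
# Hypothetical helper: expand ASCII stand-ins into BGN-PCGN letters.
ASCII_TO_BGN = {
    "@": "\u0103",  # ă (a with breve)
    "E": "\u00e8",  # è (e with grave)
}


def expand_ascii(transcription: str) -> str:
    """Replace each ASCII stand-in with the letter it represents."""
    return "".join(ASCII_TO_BGN.get(ch, ch) for ch in transcription)


if __name__ == "__main__":
    # Made-up inputs, used only to show the substitution.
    print(expand_ascii("t@ka"))
    print(expand_ascii("kE"))
```

Because "E" is claimed as a stand-in, the input would presumably need to be all lower case otherwise; a capitalised word beginning with E would be misread under this scheme.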

Abbreviations

The following abbreviations should be added to the module please: ['၌']='hnai.', ['၍']='rwe'. Thanks! —Aɴɢʀ (talk) 20:58, 7 August 2014 (UTC)

Added now. Wyang (talk) 23:36, 7 August 2014 (UTC)
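
For illustration, the two requested entries amount to a fixed lookup that is consulted before any rule-based transliteration. The Python sketch below mirrors the Lua-style pairs quoted above; `ABBREVIATIONS` and `transliterate_abbreviation` are hypothetical names, and how the module actually stores or applies these entries is not shown on this page.

```python
# The two abbreviation symbols requested above, keyed by code point.
ABBREVIATIONS = {
    "\u104c": "hnai.",  # ၌
    "\u104d": "rwe",    # ၍
}


def transliterate_abbreviation(text: str):
    """Return the fixed transliteration of a known abbreviation, else None."""
    return ABBREVIATIONS.get(text)


if __name__ == "__main__":
    for symbol in ABBREVIATIONS:
        print(symbol, transliterate_abbreviation(symbol))
```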

Adding a hyphen

@Wyang: could you please add a function to insert a hyphen before a syllable-initial vowel when it's preceded by a consonant letter? For example, ဝက်အူ is currently transliterated as waku (ဝက်အူ (wak-u)), but it should really be wak-u. (The transliteration waku would be appropriate only for "ဝကူ".) Thanks! —Aɴɢʀ (talk) 21:31, 3 December 2016 (UTC)

Hi Angr. No problem - I added the hyphen for these cases. Wyang (talk) 22:49, 3 December 2016 (UTC)
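
The rule requested here can be sketched as follows, in Python rather than the module's Lua. The sketch assumes the word has already been romanised syllable by syllable, which may not be how the module does it (it could equally test the Burmese script directly). `join_syllables` is a hypothetical name.

```python
from typing import Iterable

VOWELS = set("aeiou")


def join_syllables(syllables: Iterable[str]) -> str:
    """Join romanised syllables, hyphenating consonant + vowel boundaries.

    A hyphen is inserted only when the text so far ends in a consonant
    letter and the next syllable starts with a vowel letter.
    """
    out = ""
    for syl in syllables:
        if not syl:
            continue
        ends_in_consonant = bool(out) and out[-1].isalpha() and out[-1] not in VOWELS
        if ends_in_consonant and syl[0] in VOWELS:
            out += "-"
        out += syl
    return out


if __name__ == "__main__":
    print(join_syllables(["wak", "u"]))  # the case raised above
    print(join_syllables(["wa", "ku"]))  # no hyphen needed
```

A syllable ending in a tone mark such as "." or ":" does not end in a consonant letter, so no hyphen is added after it in this sketch.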