Module talk:mn-IPA
Long vowels
Thanks, @Octahedron80. I think you need to use "gsub" for long vowels: text = mw.ustring.gsub(text, 'аа', 'aː'), etc. --Anatoli T. (обсудить/вклад) 04:49, 29 April 2019 (UTC)
- I already solved that. I had a problem with an imperfect regex here. --Octahedron80 (talk) 04:50, 29 April 2019 (UTC)
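The ordering point can be illustrated with a minimal Python sketch (the module itself is Lua and would use mw.ustring.gsub). The vowel table is a hypothetical subset, not the module's actual data. Doubled vowel letters are replaced before single ones, so 'аа' becomes 'aː' and not 'aa'.

```python
# Hypothetical subset of a Cyrillic-to-IPA vowel table (illustration only).
LONG_VOWELS = {"аа": "aː", "оо": "ɔː", "уу": "ʊː", "ээ": "eː", "ии": "iː"}
SHORT_VOWELS = {"а": "a", "о": "ɔ", "у": "ʊ", "э": "e", "и": "i"}


def vowels_to_ipa(text: str) -> str:
    """Replace doubled vowel letters first, then single ones."""
    for cyr, ipa in LONG_VOWELS.items():
        text = text.replace(cyr, ipa)
    # The IPA output uses Latin letters, so this second pass cannot re-match it.
    for cyr, ipa in SHORT_VOWELS.items():
        text = text.replace(cyr, ipa)
    return text
```

Consonant letters are left untouched here; the sketch only shows why the long-vowel replacements have to run first.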
Some rules
@Crom daba: User:Octahedron80 has started this module, which also works with respellings, e.g. {{mn-IPA|баярлаа}} (e.g. in баярлалаа (bajarlalaa)). It's a good start!
Could you define the rules for dropping final vowels, if there are any, or should every entry have a respelling? For example, are the pronunciations of энэ (ene) and байна (bajna) regular, with the final vowel just signalling that the last н is pronounced /n/?
What are the schwa rules (e.g. when "а" or "и" is pronounced as "ə"), if there are any? Should such syllables/vowels be marked as reduced somehow? --Anatoli T. (обсудить/вклад) 06:08, 29 April 2019 (UTC)
- One thing to do is to add a stress mark, e.g. {{mn-IPA|авто́бус}} in автобус (avtobus) should produce /aw̜ˈtʰɔpʊs/. --Anatoli T. (обсудить/вклад)
- See my exchange with AWESOME meeos ! on my talk page, and also the preliminary document. Crom daba (talk) 10:16, 29 April 2019 (UTC)
- @Crom daba: OK, thanks. It's a good document. I see the rules are rather complicated. Do you think the phonemic module is good as is? Does it render the phonemes accurately? It can be tweaked over time to become more phonetic, e.g. there's no reason why devoicing can't be rendered by the module. Perhaps some of these things can be automated, and we could add special characters to show schwa, or respell words when vowels are dropped, inserted or otherwise pronounced differently. I've got Bayarmandakh & Gaunt now + the audio. I'm reading and listening when I have time.
- @Octahedron80 Thanks for the great work so far. Could you please add a pronunciation test module, so we can add some test cases based on feasible rules, if it's OK? The test cases should ideally have the respelling parameter, e.g. баярлалаа (bajarlalaa) -> respelled "баярлаа". --Anatoli T. (обсудить/вклад) 10:48, 29 April 2019 (UTC)
- @Atitarev: I created a testcases module. — Eru·tuon 02:25, 30 April 2019 (UTC)
- @Erutuo: Hi, thanks. How do I use respellings, e.g. монгол хэл (mongol xel) respelled as "монгәл хэл" but linked to монгол хэл? --Anatoli T. (обсудить/вклад) 05:37, 30 April 2019 (UTC)
- @Atitarev: You add them in a third parameter, like this. — Eru·tuon 05:51, 30 April 2019 (UTC)
- (edit conflict)
- Basically, all "short vowels" outside the first syllable are reduced. The distribution of reduced vowels is only marginally phonemic, they are inserted to break up consonants in a way that satisfies Mongolian phonotactics:
- Reduced vowels can't stand at an absolute final position.
- Voiced consonants (/b w ɢ g r n m ɮ ŋ/) cannot be surrounded by consonants on both sides and cannot stand between a consonant and the end of the word.
- Multiple unvoiced consonants cannot form a coda by themselves, except for /stʰ xtʰ st͡ɕʰ xt͡ɕʰ/.
- Reduced vowels shouldn't be inserted when not phonotactically necessary, except in infinitives, which always end in /əx/ (after consonants).
- Cyrillic Mongolian also follows these rules, except that it breaks the first rule in order to distinguish /n ɢ/ from /ŋ g/ (баг (bag) /pag/ vs бага (baga) /paɢ/) and to uphold the second rule against its exceptions.
- So you often have cases such as арга (arga) /arəɢ/, which cannot be spelled *арага (*araga) because of the last (economy) principle; this means the module has to perform the syllabification itself. Crom daba (talk) 11:04, 29 April 2019 (UTC)
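A partial Python sketch of how the first two rules could drive reduction (the module itself is Lua). It assumes the word has already been converted to a list of IPA segments, with orthographically doubled vowels marked by ː; the segment inventory is a simplified assumption. Non-first single vowels become ə, non-first doubled vowels keep full quality but lose length (following the Svantesson et al analysis used on this page), a word-final ə is dropped, and ə is inserted before a voiced consonant that would otherwise sit between a consonant and the end of the word. The third rule, the infinitive exception and orthographically silent vowels are not handled.

```python
# Simplified segment inventory (an assumption for this sketch).
FULL_VOWELS = {"a", "ɔ", "ʊ", "e", "o", "u", "i"}
VOICED = {"b", "w", "ɢ", "g", "r", "n", "m", "ɮ", "ŋ"}


def is_vowel(seg: str) -> bool:
    return seg.rstrip("ː") in FULL_VOWELS or seg == "ə"


def reduce_word(segments: list[str]) -> str:
    out: list[str] = []
    seen_vowel = False
    for seg in segments:
        if not is_vowel(seg):
            out.append(seg)
        elif not seen_vowel:
            out.append(seg)  # first syllable keeps its quality and length
            seen_vowel = True
        elif seg.endswith("ː"):
            out.append(seg[:-1])  # doubled vowel later on: full quality, not long
        else:
            out.append("ə")  # single vowel letter later on: reduced
    # Rule 1: a reduced vowel cannot stand in absolute final position.
    if out and out[-1] == "ə":
        out.pop()
    # Rule 2 (second half only): a voiced consonant cannot stand between a
    # consonant and the end of the word, so an epenthetic ə is inserted.
    if len(out) >= 2 and out[-1] in VOICED and not is_vowel(out[-2]):
        out.insert(len(out) - 1, "ə")
    return "".join(out)
```

For example, reduce_word(["a", "r", "ɢ", "a"]) gives "arəɢ", matching арга above: the final vowel is dropped first, and rule 2 then reinserts ə in the right place.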
- @Crom daba Side note: have a look at the shoddy draft I made on Wikipedia as well — oi yeah nah mate amazingJUSSO ... [ɡəˈdæɪ̯]! 08:23, 1 May 2019 (UTC)
- Also pinging @Octahedron80 — oi yeah nah mate amazingJUSSO ... [ɡəˈdæɪ̯]! 08:25, 1 May 2019 (UTC)
Some problems
Here are some inconsistencies with the way I entered pronunciations thus far:
- Non-initial "short vowels" are written as full vowels instead of reduced. хавар /xaw̜ar/ vs /xawər/.
- ө is written as /ɵ/, which is more precise phonetically, but hides the phonemic connection with өө /oː/. төгрөг /tʰɵɡrɵɡ/ vs /tʰoɡrəɡ/.
- Using /t͡ʃ/ instead of /t͡ɕ/, either way is fine, since both are supported by the literature, but I've used /t͡ɕ/ so far.
- Russian loanwords generally do not follow the same rules as older vocabulary; see Appendix:Russian_loanwords_in_Mongolian. Respelling might help here, but I think that linking to the appendix is vital, since the pronunciation varies too much.
- Vowel + й combinations are not /Vj/ as in Russian, but represent (phonemic) diphthongs /Vi/ [Ve], there is also coda /j/ that is written as vowel + я е ё /Vj/ [Vi].
- I've generally avoided adding pronunciations for multi-word terms, since there seem to be some liaison effects, especially with enclitics/phonemic suffixes, and they are poorly understood (by me and possibly in general). One known effect is that final /ŋ/ assimilates in place of articulation to the following word, so чихрийн манжин should probably be something like /t͡ɕʰixrim mant͡ʃəŋ/.
Crom daba (talk) 12:12, 29 April 2019 (UTC)
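Only the labial case of this assimilation is easy to state as a string rule. A minimal Python sketch, assuming a phrase-level IPA string with words separated by single spaces: a word-final /n/ or /ŋ/ becomes /m/ when the next word starts with /p/, /b/ or /m/. Assimilation to other places of articulation is not covered.

```python
import re

# Word-final /n/ or /ŋ/ followed by a space and a labial onset.
LABIAL_ASSIMILATION = re.compile(r"[nŋ](?= [pbm])")


def assimilate_labial(ipa: str) -> str:
    """Turn a word-final nasal into /m/ before a word starting with /p/, /b/ or /m/."""
    return LABIAL_ASSIMILATION.sub("m", ipa)
```

For example, assimilate_labial("saiŋ pain ʊː") returns "saim pain ʊː"; the /n/ of pain is left alone because the next word starts with a vowel.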
- Before further answers, please see also w:Mongolian_language & w:Help:IPA/Mongolian.
- Schwa is unpredictable and cannot be put into logic, so manual respelling with Cyrillic schwa (ә, U+04D9) is needed. (Or do you want another letter?)
- I think the connection is not important since /o/ is not a phoneme of the language. (It's an allophone).
- /ʃ/ and other affricates are used as stated in Wikipedia.
- Respelling is actually needed for unusual pronunciation.
- /Vi/ (diphthong) or /Vj/ (semivowel): they are really interchangeable, and no one can distinguish one from the other. Anyway, I will use /Vi/ then (even though it is harder to manipulate).
- That will need respelling if it applies.
The module is very helpful, but it is not able to solve every case. Users must take some action. --Octahedron80 (talk) 02:17, 30 April 2019 (UTC)
- @Octahedron80: First, thank you for putting in work to create this module.
- Schwa is completely or almost completely predictable from the Cyrillic spelling, what the module marks as short vowels in non-first syllables, should be reduced vowels, so будаг (budag) needs to be /pʊtəg/ not */pʊtag/ (this would be будааг). Due to orthographical silent vowels and other such complications, however, this is not a simple function to implement in a script so I understand your frustration.
- Yes, we could mark this phoneme with either allophone, but I've been using /o/ following the authors of The Phonology of Mongolian.
- We might as well use /t͡ʃ/ (Svantesson et al use /č/ which isn't proper IPA and call it the alveopalatal affricate), but I've been using /t͡ɕ/ so far.
- Acoustically, I would agree that /Vi/ and /Vj/ can be indistinguishable, but phonemically, Mongolian does make a difference between /Vi/ and /Vj/, so that ай (aj, “category”) /ai/ [æe] is not a homophone of ая (aja, “sound”) /aj/ [ai].
- I am not sure how useful a pronunciation module would be if it requires a knowledgeable editor to respell the word. It is much simpler for me to just spell it out in IPA than in meta-Cyrillic. Making this module do what is required is a big undertaking, and I'm sorry I didn't warn you earlier; I tried dissuading Awesomemeeos before for the same reasons. Crom daba (talk) 15:30, 30 April 2019 (UTC)
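The ай/ая contrast can fall out of two simpler rules: й after a vowel is the second element of a diphthong, while я spells /j/ plus a vowel, and that vowel, being a final short vowel, is silent. A Python sketch under those assumptions, with a hypothetical mini letter table, for words whose last letter is not their only vowel:

```python
# Hypothetical mini-table covering only the letters needed here.
LETTERS = {"а": "a", "о": "ɔ", "у": "ʊ", "й": "i", "я": "ja", "ё": "jɔ"}
SINGLE_VOWEL_LETTERS = set("аоуяё")


def glide_ipa(word: str) -> str:
    ipa = "".join(LETTERS[ch] for ch in word)
    last = word[-1]
    # A final single (not doubled) vowel letter is silent, so я leaves only /j/.
    # Every vowel in this table maps to one final IPA character, so [:-1] removes it.
    if last in SINGLE_VOWEL_LETTERS and word[-2:-1] != last:
        ipa = ipa[:-1]
    return ipa
```

With this, glide_ipa("ай") gives "ai" and glide_ipa("ая") gives "aj", the two phonemic forms given above.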
- I think the module could be made to work for most of the phonemic IPA but the phonetics could be added manually. It’s a lot of work but we’re not in a hurry. Frequent respellings are fine if we cover at least the core vocabulary. Some pronunciation modules are quite complex, consider the Russian module or some Asian languages and dialects. Some significant progress is already made by Octahedron80 and we can seek help too. Wyang has left, unfortunately, at least for now but we have Benwing2 and others with experience in this. —Anatoli T. (обсудить/вклад) 21:51, 30 April 2019 (UTC)
- The problem with frequent respellings is that they defeat the purpose of having a pronunciation module, I'd rather write IPA than try to piece together an un-Mongolian looking Cyrillic spelling that the module will process, and people who don't understand Mongolian phonology will either avoid using it, or use it unknowingly displaying incorrect information.
- I support the development of this module since it could eventually be very useful, and it could be a learning experience and a documentation of conventions in the meanwhile, but at the present it isn't very useful and it could be potentially harmful.
- I suggest curtailing the deployment of the module in the mainspace for the time being, at least until it is able to handle all or most of the native vocabulary automatically. I can add some test cases to help the process along. Crom daba (talk) 00:40, 1 May 2019 (UTC)
- Yes, it's a good idea to reduce or stop deployment. Let's sort out the main problems first. I might add some uses when I'm certain. Certain things can be parameterised, such as which syllable is reduced or stressed (stress mark). Dual pronunciations can also be marked; it's common with other languages. Mongolian is more phonetic than I originally thought. I am sure it's doable. --Anatoli T. (обсудить/вклад) 00:56, 1 May 2019 (UTC)
@Crom daba, Octahedron80: Should it be /ɡ/ or /ɢ/ here in шалгалт (šalgalt)? --Anatoli T. (обсудить/вклад) 23:52, 29 April 2019 (UTC)
- @Atitarev: /ɢ/ is always found in back-harmonic words before vowels, except in the inflection of words ending in /g/, or supposedly when preceded by two consonants (Kullman claims this, but I'm not so sure about it). Crom daba (talk) 14:18, 30 April 2019 (UTC)
Final /n/
@Octahedron80: Hi. We need to respell final /n/ with silent vowel letters without generating a /ŋ/. How can I do it? E.g. in сайн байна уу (sajn bajna uu) /saiŋ pain ʊː/, the first "н" is "ŋ" because it's final, the second "н" is "n", and the final "-а" in "байна" is silent. --Anatoli T. (обсудить/вклад) 23:58, 29 April 2019 (UTC)
- These rules overlap each other; I cannot simply replace either na>n then n>ŋ, or n>ŋ then na>n. So I had to use an extra symbol to help. I think it works now. Please check. --Octahedron80 (talk) 02:43, 30 April 2019 (UTC)
- @Octahedron80: Thanks, I don't quite understand what you did but I think any final short vowel is always silent after "н" in the pattern "vowel+n+short vowel", e.g. энэ (ene) is pronounced /en/. @Crom daba: Is that right? I don't understand why "э" is often /i/, it's currently /in/ in энэ (ene). Is it regular? --Anatoli T. (обсудить/вклад) 05:46, 30 April 2019 (UTC)
- @Atitarev:, сайн байна уу should perhaps better be written as /saim pain ʊː/, I've previously put this assimilation as a phonetic rather than a phonemic feature, but I now believe phonemic would be better.
- /e/ merges completely with /i/ in Ulaanbaatar, but is separate elsewhere in Mongolia (and Inner Mongolia especially). Ulaanbaatar variety has the most thorough descriptions and is what our users are most likely to encounter, but keeping the /e/ unmerged on a phonemic (supra-dialectal) level seems advisable.
- Final short vowels are always silent. Crom daba (talk) 14:28, 30 April 2019 (UTC)
- I think these can be added as new legitimate test cases. We should probably also make parameterised calls to {{mn-IPA}}, defaulting to Ulaanbaatar or somewhere else, plus Inner Mongolia. We probably won't have much data outside the capital of Mongolia. —Anatoli T. (обсудить/вклад) 21:38, 30 April 2019 (UTC)
- Ulaanbaatar accent is presumably the closest to literary Cyrillic Mongolian among actually spoken varieties, but I'd prefer our phonemic transcriptions not to be tied to a locality. When it comes to narrow transcription, we can only speak of Ulaanbaatar with sufficient certainty.
- I have some materials on Khalkha dialects and dialects of Inner Mongolia as well, and what I lack is understanding of the data rather than the data itself. They seem to be somewhat predictable from the (Central Khalkha based) Cyrillic spelling (except for the famously conservative Ordos Mongolian), but there also seems to be preservation of old final short vowels in some words, or sometimes unetymological addition of them, which makes me doubt the quality of the materials. Crom daba (talk) 02:09, 1 May 2019 (UTC)
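One way around the ordering problem, sketched in Python (the module itself is Lua): handle н before a silent final short vowel first, and emit a symbol the word-final rule cannot match. Here the Latin/IPA output letters n and ŋ play the role of the extra helper symbol; all other letters stay in Cyrillic. The vowel-letter set is a hypothetical assumption, and the rule follows the statement above that final short vowels are always silent.

```python
import re

SHORT_VOWEL_LETTERS = "аоуэөүиы"  # hypothetical list of single vowel letters

# н directly before a word-final vowel letter: the vowel is silent, н stays /n/.
# A doubled final vowel (-наа) does not match, since н is not right before the last letter.
N_BEFORE_SILENT_VOWEL = re.compile(f"н[{SHORT_VOWEL_LETTERS}]$")
# н that is the last letter of the word: /ŋ/.
WORD_FINAL_N = re.compile("н$")


def final_n(word: str) -> str:
    # The first substitution outputs Latin "n", which the second pattern
    # cannot match, so the two rules no longer interfere with each other.
    word = N_BEFORE_SILENT_VOWEL.sub("n", word)
    return WORD_FINAL_N.sub("ŋ", word)


def final_n_phrase(phrase: str) -> str:
    return " ".join(final_n(w) for w in phrase.split())
```

Applied to сайн байна уу, this gives сайŋ байn уу: ŋ for the truly final н, n where a silent vowel follows.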
@Octahedron80 Re: diff I don't think the last syllable is long: /tʰakʰsiː/ but there is a distinct stress on the last syllable /tʰakʰˈsi/, imitating the Russian pronunciation, not like regular Mongolian words. "и" is probably a semi-long vowel by nature in Mongolian - it's similar to Russian /i/. Would be great to have accent marks e.g. "такси́" in re-spellings in the near future. --Anatoli T. (обсудить/вклад) 07:27, 30 April 2019 (UTC)
- I must put 'ии' because of the first rule, that a final short vowel 'и' is dropped; it affects this word and turns it into /tʰakʰs/. Будда is in the same situation. --Octahedron80 (talk) 07:38, 30 April 2019 (UTC)
- @Octahedron80 Будда (Budda) is definitely short in Bolor toli but the stress is on the last syllable: [pʊtˈta]. --Anatoli T. (обсудить/вклад) 07:42, 30 April 2019 (UTC)
- I have a solution: put ~ at the end of the respelling to prevent the vowel from being cut off. The stress mark works as well. --Octahedron80 (talk) 08:21, 30 April 2019 (UTC)
- There are no long vowels after the first syllable, at least when using the analysis from Svantesson et al, which I've been doing so far. "Short vowels" (orthographical single vowels) in non-first syllables are reduced vowels /ə/ (when not merely orthographic as in cases listed above) and "long vowels" (orthographic doubled vowels) are full vowels.
- Будда is a Russian loan (the native term is Бурхан), the stress isn't phonemic, but a consequence of the second syllable being non-reduced (it is however non-reduced because it is stressed in Russian). One might also wonder how the дд is pronounced by most Mongols, in native vocabulary geminates occur only across morpheme boundaries and either have an audible release between them, or are reduced to single stops, while in the Bolor Toli recording, we hear an actual long consonant. Crom daba (talk) 15:03, 30 April 2019 (UTC)
- In Russian, Бу́дда (Búdda) is stressed on the first syllable, the Mongolian stress is their own invention. —Anatoli T. (обсудить/вклад) 21:32, 30 April 2019 (UTC)
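A Python sketch of the "~" convention as described in this thread: the marker is stripped, and if it was present the final vowel is kept; otherwise a final single vowel letter is dropped. The vowel-letter set is a hypothetical assumption. The sketch only looks at the last character, so a final vowel followed by a combining acute (as in такси́) is also left alone; whether the module behaves the same way is not stated here.

```python
SINGLE_VOWEL_LETTERS = set("аоуэөүиы")  # hypothetical set of vowel letters
KEEP_VOWEL_MARK = "~"


def apply_final_vowel_rule(respelling: str) -> str:
    """Drop a final single vowel letter unless the respelling ends in '~'."""
    if respelling.endswith(KEEP_VOWEL_MARK):
        return respelling[:-1]  # the marker itself is not pronounced
    last, before = respelling[-1:], respelling[-2:-1]
    if last in SINGLE_VOWEL_LETTERS and before != last:
        return respelling[:-1]  # final short vowel is silent
    return respelling
```

So "такси" becomes "такс", while "такси~" and "Будда~" keep their final vowels; doubled final vowels such as in "баярлаа" are never dropped.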
New test cases
@Octahedron80, Crom daba: Please check and review my latest relatively simple new or modified test cases. I have put my comments for redefined or new cases. I think we need to target just one standard (Ulaanbaatar) of pronunciation and worry about other regional variations later.
Changes:
- predictable stress in монгол (mongol), if the stress is unpredictable, use a stress mark as in автобус (avtobus), respelled "афто́бус".
- implement Ulaanbaatar realisation of "э" - /i/, so хэл (xel) should be /xiɮ/, shall we just change the table?
- Assimilation of /n/ before /b/ or /p/ = /m.../
- энэ (ene): { 'энэ', 'in', 'инэ' }, -- forced Ulaanbaatar realisation of "э" as /i/, final short vowel dropped
- энэ (ene): { 'энэ', 'in' }, -- automatic Ulaanbaatar realisation of "э" as /i/, final short vowel dropped
- монгол хэл (mongol xel): { 'монгол хэл', 'ˈmɔŋɡəɮ xiɮ', 'мо́нгәл хил' }, -- with respelling, stress mark and schwa, forced Ulaanbaatar realisation of "э" as /i/
- монгол хэл (mongol xel): { 'монгол хэл', 'ˈmɔŋɡəɮ xiɮ'}, -- regular and predictable stress, vowel reduction (ə), automatic Ulaanbaatar realisation of "э" as /i/
- сайн байна уу (sajn bajna uu): { 'сайн байна уу', 'saim pain ʊː' }, -- predictable assimilation /n/ + /b/ or /p/ = /m.../
No pressure to implement, just say if you agree with the cases first. --Anatoli T. (обсудить/вклад) 12:43, 1 May 2019 (UTC)
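The stress convention in these cases (initial stress marked on polysyllables, no mark on monosyllables, an explicit mark where stress is unpredictable) could be applied after the IPA is built. A Python sketch under stated assumptions: a simplified vowel inventory, a run of adjacent vowel symbols such as ai counted as one syllable, and words that already contain ˈ left untouched. Because the first syllable begins the word, the mark simply goes at the start of the word.

```python
IPA_VOWELS = set("aɔʊeouiə")  # simplified vowel inventory (assumption)


def count_syllables(word: str) -> int:
    """Count runs of adjacent vowel symbols; a diphthong like 'ai' counts once."""
    runs, in_vowel = 0, False
    for ch in word:
        is_vowel = ch in IPA_VOWELS
        if is_vowel and not in_vowel:
            runs += 1
        in_vowel = is_vowel
    return runs


def default_stress(phrase: str) -> str:
    words = []
    for word in phrase.split(" "):
        if "ˈ" not in word and count_syllables(word) > 1:
            word = "ˈ" + word  # initial stress: the mark goes before the first syllable
        words.append(word)
    return " ".join(words)
```

This reproduces 'ˈmɔŋɡəɮ xiɮ' from the unstressed 'mɔŋɡəɮ xiɮ' and leaves 'saim pain ʊː' unmarked, since all three words are monosyllables.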
- Is /f/ in автобус (avtobus) predictable? If not, let's leave it with the respelling "афто́бус" where stress is also unpredictable. --Anatoli T. (обсудить/вклад) 12:50, 1 May 2019 (UTC)
- I found that the YouTube video "1 MON. Vowels - Mongolian Alphabet" (code "XznEUUwQI-I") is a perfect introduction to the standard Mongolian pronunciation (Ulaanbaatar). I think we should implement the diphthongs as well; an easy win! --Anatoli T. (обсудить/вклад) 13:25, 1 May 2019 (UTC)
About 4, I added a condition: if there is no stress mark, assume the stress is on the first syllable. But this leads to more errors, because every word then has a stress mark in front. Are you sure you want it this way? --Octahedron80 (talk) 22:55, 1 May 2019 (UTC)
- @Octahedron80: Thank you again for the efforts. I think it's up to us (we can decide ourselves). Normally, monosyllabic words don't require a stress mark, but it doesn't hurt if we have them. If we decide so, we can just update the test cases accordingly. Whichever is easier. @Crom daba: What do you think? We may need to define simple syllabification rules and stress rules. --Anatoli T. (обсудить/вклад) 00:03, 2 May 2019 (UTC)
- /w/ is (like other segments) devoiced before aspirated consonants (it is unclear whether /s/ is an aspirated consonant) and often surfaces as [f].
- Монгол is /mɔŋɢəɮ/.
- Stress is predictable from the shape of the word, but it might be useful for Russian loans; here are some examples of adaptation to Mongolian phonology from The Phonology of Mongolian. Crom daba (talk) 19:44, 2 May 2019 (UTC)
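Devoicing of /w/ before aspirated consonants could be a narrow (phonetic) post-processing step. In this Python sketch, the set of aspirates is an assumption, /s/ is deliberately left out given the uncertainty noted above, an optional stress mark between the two segments is allowed, and the input uses plain w without the diacritic shown in /aw̜ˈtʰɔpʊs/ earlier on this page.

```python
import re

# Aspirated consonants assumed to trigger devoicing; /s/ is left out on purpose.
ASPIRATES = ["t͡sʰ", "t͡ɕʰ", "tʰ", "kʰ", "pʰ"]
W_BEFORE_ASPIRATE = re.compile(
    "w(?=ˈ?(?:" + "|".join(map(re.escape, ASPIRATES)) + "))"
)


def devoice_w(ipa: str) -> str:
    """Narrow transcription: /w/ surfaces as [f] before an aspirated consonant."""
    return W_BEFORE_ASPIRATE.sub("f", ipa)
```

For автобус this turns "awˈtʰɔpʊs" into "afˈtʰɔpʊs", which would make the respelling "афто́бус" unnecessary for the /f/ if the rule holds.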
@Octahedron80, Crom daba I want to define test cases for letter "г" but I want to clarify.
Should we mostly use /ɡ/, and use /ɢ/ only when the letter "г" is followed by a masculine vowel (−ATR or "back vowel"), even if it's silent, e.g. in бага (baga): /ˈpaɢ/?
Masculine vowels are: а, о, у, ы (IPA a, ɔ, ʊ, i), so цагаан (cagaan) should be /ˈt͡saɢaːŋ/, монгол (mongol) should be /ˈmɔŋɢəɮ/ and тогос (togos) should be /ˈtʰɔɢəs/, etc.? The letter sounds differently in монгол (mongol), though. It sounds like a normal English /ɡ/ in recordings, not /ɢ/. --Anatoli T. (обсудить/вклад) 13:36, 2 May 2019 (UTC)
- цагаан (cagaan) is /t͡sʰaɢaŋ/, doubled vowels in non-first syllables are not long since they are generally up to twice shorter than long vowels of the first syllable (although they are longer than first-syllable short vowels) and өө is realized as [ɵ] in non-first syllables, like first-syllable short ө and unlike long өө [o:].
- монгол (mongol) sounds uvular (or at least post-velar) on the Bolor Toli recording to me.
- тогос (togos) is /tʰɔɢəs/, but note that /ɢ/ can form clusters, so багатай (bagataj) is /paɢtʰai/. Crom daba (talk) 20:17, 2 May 2019 (UTC)
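The proposed rule for "г" can be sketched in Python as a lookup on the next letter: /ɢ/ when a masculine (back) vowel letter follows, even if that vowel is later silent or reduced, and /g/ otherwise, including word-finally as in баг. This is an illustration of the proposal only. The inflection exception for words ending in /g/ and Kullman's two-consonant claim mentioned earlier are not handled, and the output letters are plain ASCII g and ɢ.

```python
BACK_VOWEL_LETTERS = set("аоуы")  # "masculine" vowel letters from the proposal above


def g_value(word: str, i: int) -> str:
    """Phoneme for the letter г at index i: /ɢ/ before a back vowel letter, else /g/."""
    following = word[i + 1] if i + 1 < len(word) else ""
    return "ɢ" if following in BACK_VOWEL_LETTERS else "g"


def mark_g(word: str) -> str:
    """Replace every г with ɢ or g; other letters stay in Cyrillic."""
    return "".join(g_value(word, i) if ch == "г" else ch for i, ch in enumerate(word))
```

This gives /g/ in баг and /ɢ/ in бага, монгол, тогос and багатай, where the following а is reduced or dropped later.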