Module talk:mr-IPA

From Wiktionary, the free dictionary
Latest comment: 4 years ago by Kutchkutch in topic Deployment (again)

@Aryamanarora Wow, I wasn't expecting the implementation to happen so quickly! Thanks!

I was wondering what you think about the following:

Done At what point do you think this module could be moved to Mod:mr-IPA for actual use on entries? Could that be done soon, or would the:

Done ċ, j̈, j̈h
¯\_(ツ)_/¯ syllable boundary
¯\_(ツ)_/¯ declined/inflected-form
issues have to be resolved first?

I thought the IPA values for most of the letters would be very similar to Hindi other than ċ, j̈, j̈h + ळ. According to the Wikipedia page Marathi phonology:

Done आ is /a/ instead of /aː/ or /ɑː/: /a/ is sometimes interchangeable with /ə/, so perhaps /a/ is fine. I've avoided making some articles because both versions are found: हलणे = हालणे (to move), हरणे (harṇe) = हारणे, हंबरणे (hambarṇe) = हांबरणे, etc.

Done इ is /i/ instead of /ɪ/, उ is /u/ instead of /ʊ/: I agree with this because it shows that they are closer to the long high vowels ई, ऊ.

Done ए is /e/ instead of /eː/: /eː/ makes sense for Hindi since it has /ɛː/ for ऐ or for South Indian languages that contrast short and long /e/, but I see no need for ː for Marathi.

Done न is /n̪/ instead of /n/: This looks like an unnecessary hypercorrection that makes it look like न is dental like in Sanskrit or Malayalam.

Done श is /ɕ/ instead of /ʃ/, च (c pronunciation) is /t͡ɕ/ instead of /t͡ʃ/, ज (j pronunciation) is /d͡ʑ/ instead of /d͡ʒ/: This makes this look more like an East Asian language. Is this plausible?

Kutchkutch (talk) 04:26, 8 November 2017 (UTC)Reply

@Kutchkutch: Looks accurate to me; so that means Classical Sanskrit श, च, ज (śa, ca, ja) were actually identical in pronunciation to Marathi! That's totally plausible. —Aryaman (मुझसे बात करो) 15:53, 8 November 2017 (UTC)Reply
@Aryamanarora: Thanks for that insight! There was a time when Marathi phonology said ज (ja) (j pronunciation) was ɟʝə, but that's been fixed there by User:Kwamikagami, as the user noted at w:Talk:Marathi_phonology. That looked really strange.
¯\_(ツ)_/¯ Dhongde & Wali say "[ʂ] does not occur in modern Marathi, its symbol is retained only in writing." This suggests ष should be whatever श is for ordinary use.
¯\_(ツ)_/¯ The etymology-based pattern of problematic issues first seen at Module talk:mr-translit/testcases might apply elsewhere too. Native, Sanskrit, Perso-Arabic, and English appear to be separate paradigms that may overlap or assimilate into the native paradigm. The Perso-Arabic borrowings are usually older and there are few modern borrowings so those words are the most likely to assimilate into the native paradigm. Sanskrit borrowings, of course, are for prestige and neologisms. English borrowings may also have prestige, and now almost any English word could be borrowed.

Kutchkutch (talk) 23:36, 8 November 2017 (UTC)Reply

Done @Aryamanarora: Until now the expected result for the शहर (śahar) testcase under was a guess. The actual rule is in "2.6.11.3 Word-medial [h] is optionally deleted" in Dhongde & Wali and there's a related rule "2.6.3 Aspiration" for the murmured consonants. So if more accuracy is needed I might add a few testcases based on those rules. Kutchkutch (talk) 02:14, 9 November 2017 (UTC)Reply
@Kutchkutch: Great, I'll update the module as you add them. Wow, I used to be so bad with Lua but I'm kind of getting the hang of it. —Aryaman (मुझसे बात करो) 02:15, 9 November 2017 (UTC)Reply
@Aryamanarora: As always, Thanks for your willingness to help out with your abilities!
Since I was intrigued by the शहर (śahar) testcase, I added the additional testcases in Dhongde & Wali and explained the rules as I understood them. However, I realised at the end of the exercise that like Gujarati (Gujarati_phonology#Murmur) there is a formal pronunciation without ह deletion that mirrors the spelling and a casual pronunciation with ह deletion.
The module currently shows the correct formal transcription
Fixed (except चेहरा since eʱ is a murmured vowel, and Marathi has no murmured vowels).
If /m/, /n/, /l/, /ʋ/, /ɾ/ or /d͡ʑ/, /d̪/, /b/ does not occur as C in CV₁ɦV₂, parentheses could be used to show both the formal and casual pronunciations at the same time.
If /m/, /n/, /l/, /ʋ/, /ɾ/ or /d͡ʑ/, /d̪/, /b/ does occur as C in CV₁ɦV₂ and V₂ ≠ /o/, then CV₁ɦV₂ → C(ʱ)V₁V₂ (V₁ = /ə/ or /a/; if V₂ ≠ /i/ or /iː/, then /ə/ → /∅/) is the rule for obtaining the casual pronunciation. Conceptually,
Done Is there any way the module can show the formal transcription and the casual transcription simultaneously?
Done As for implementation, if it's too much work or too difficult, it's fine, and the formal transcription can be used. If you know how to implement the rule and only one pronunciation can be shown, perhaps only the casual transcription should be shown since the transliteration is the formal/spelling pronunciation. If you know how the module can handle formal transcription and casual transcription, that would really be innovative! Kutchkutch (talk) 07:27, 9 November 2017 (UTC)Reply
@Kutchkutch: I think it definitely can be done. Why not keep the casual pronunciation in phonetic ([] instead of //) transcription? Let me try to implement something. —Aryaman (मुझसे बात करो) 23:32, 9 November 2017 (UTC)Reply
@Aryamanarora: Again, Wow! I'm really impressed with the results. Thanks! I thought the murmur rule might be too complicated or confusing to achieve.
Fixed I noticed that the actual results don't have any parentheses, and perhaps that's better for readability. So should all the parentheses be removed from the expected outcomes? Their original purpose was to show that ह-deletion is optional, but that may not be necessary.
¯\_(ツ)_/¯ If बाहेर (bāher) were syllabified it would probably have a syllable boundary between "a" and "e" since "ae" is not a diphthong and "æ" is only in English words, and I thought हाँग काँग (hŏṅga kŏṅga) was working for a while. However, these are really minor things compared to the implementation of that murmur rule.
Done I was actually wondering if an IPA module is for [------] or /-----/, and I think what you're saying is:
If a word's casual pronunciation and spelling pronunciation vary, then use [------] instead of /-----/ when this module is used.
Done MOD:hi-IPA and T:hi-IPA don't appear to be adding // around the IPA transcription, but they appear in entries using them. So I haven't figured out if // is being added automatically or manually when {{hi-IPA}} is used in the Pronunciation section. Kutchkutch (talk) 03:57, 10 November 2017 (UTC)Reply
Done @Aryamanarora: I think I've found a way to resolve the issue shown by पलंग (palaṅga). Dhongde & Wali show सांग as 'saŋg' and say 'g' is optionally deleted so perhaps 'g' should remain even word-finally. I was assuming 'g' would be deleted based on English such as in /rɪŋ/ 'ring', but perhaps Marathi retains the 'g' and the 'ə' at the end might be an illusion caused by the word-final 'g'. By extension perhaps the optional ɦ in word-medial and word-final positions should remain as well. Kutchkutch (talk) 21:08, 10 November 2017 (UTC)Reply
@Kutchkutch: Hindi maintains final /g/ as well, so I was confused by that testcase. I can't say anything about /ɦ/, since I am unclear on what happens to them in Hindi too. —Aryaman (मुझसे बात करो) 21:27, 10 November 2017 (UTC)Reply

Deployment


@Kutchkutch I'd like to get this deployed. Could you tell me the informal pronunciation variants of शहाणा (śahāṇā) and तहान (tahān) specifically? I'm a bit confused about the rules. —AryamanA (मुझसे बात करेंयोगदान) 23:10, 18 February 2018 (UTC)Reply

@AryamanA : Currently, the murmur+aspiration rule is:

In CV₁ɦV₂ (V₁ = /ə/ or /a/ and V₂ ≠ /o/)
If C is /m/, /n/, /l/, /ʋ/, /ɾ/ -- murmur class
or C is /d͡ʑ/, /d̪/, /b/ -- aspiration class
CV₁ɦV₂ → CʱV₁V₂ (If V₂ ≠ /i/ or /iː/ and V₁ = /ə/ then V₁ → ∅)
The subsequent ह-deletion rule that I forgot to add appears to be:
If C is not in either of those two classes (murmur or aspiration)
CV₁ɦV₂ → CV₁V₂
For V₁ = /ə/ and V₂ = /a/, V₁ → ∅
Done तहान /t̪ə.ɦan/ → [t̪an]
Done शहाणा /ɕə.ɦa.ɳa/ → [ɕa.ɳa]
Done पाह, सहन, शहर, चेहरा, पेहलवान have V₂ → /∅/ instead. Kutchkutch (talk) 12:37, 19 February 2018 (UTC)Reply
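The murmur/aspiration rule and the ह-deletion rule above can be sketched in a few lines. This is only an illustration in Python, not the module's Lua code: the function name `casual_form`, the segment-list input (no syllable dots, no vowel length), and the merged `BREATHY_CLASS` set are all assumptions made for the example.

```python
# Hypothetical sketch of the casual-pronunciation rules stated above.
VOWELS = {"ə", "a", "i", "u", "e", "o"}
# Murmur class and aspiration class, treated here as one set.
BREATHY_CLASS = {"m", "n", "l", "ʋ", "ɾ", "d͡ʑ", "d̪", "b"}


def casual_form(segments):
    """Rewrite every C V1 ɦ V2 window in a list of phoneme strings."""
    out = []
    i = 0
    while i < len(segments):
        window = segments[i:i + 4]
        if (
            len(window) == 4
            and window[0] not in VOWELS
            and window[1] in ("ə", "a")
            and window[2] == "ɦ"
            and window[3] in VOWELS
            and window[3] != "o"
        ):
            c, v1, _, v2 = window
            if c in BREATHY_CLASS:
                # C V1 ɦ V2 -> Cʱ V1 V2; the schwa drops unless V2 is /i/.
                keep_v1 = not (v1 == "ə" and v2 != "i")
                out.append(c + "ʱ")
            else:
                # Plain ɦ-deletion; the schwa drops only before /a/.
                keep_v1 = not (v1 == "ə" and v2 == "a")
                out.append(c)
            if keep_v1:
                out.append(v1)
            out.append(v2)
            i += 4
        else:
            out.append(segments[i])
            i += 1
    return out


print("".join(casual_form(["t̪", "ə", "ɦ", "a", "n"])))  # तहान -> t̪an
```

Working on a list of segments rather than a raw string avoids having to split combining marks such as the dental bridge or the affricate tie bar.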
@Kutchkutch: Thanks! I'll get on it. I feel that murmur and aspiration classes are the same to be honest (both use /Cʱ/, which is murmuring, while aspiration is /Cʰ/). —AryamanA (मुझसे बात करेंयोगदान) 23:35, 19 February 2018 (UTC)Reply
@AryamanA: Yes, the two classes could be merged since voiced aspiration is really breathy/murmured voice. The reason for the separation was that /d͡ʑʱ/, /d̪ʱ/, /bʱ/ could be phonetically represented with a single letter: झ, ध, भ
What was meant by the last line is:
Done चेहरा /t͡ɕeɦ.ɾa/ → [t͡ɕe.ɾa]
Done पाह /paɦ/ → [pa]
Done शाह /ɕaɦ/ → [ɕa]
for C not in the two classes
Fixed “There are no word-final consonant-clusters except in words borrowed from English”
So perhaps the ह-deletion rule could be used to avoid the coda [ɦɾ], [ɦn] caused by the schwa-dropping in:
Done शहर /ɕə.ɦəɾ/ → [ɕəɾ]
Done सहन /sə.ɦən/ → [sən]
or these testcases could just have /ɕə.ɦəɾ/ and /sə.ɦən/ without ह-deletion or schwa-dropping if [ɕəɾ] and [sən] are too reduced.
Since there are no consonant clusters in codas (except perhaps homorganic nasals like सांग), रक्त would be [ɾək.t̪ə], but it would probably need to be transliterated as rakta first.
Now that the module handles phonemic and phonetic IPA it could show the
Fixed /t̪s/ → [t͡sʰ] rule in उत्सव /ut̪.səʋ/ → [u.t͡sʰəʋ]
and /əʋ/ → [əu] in लवकर /ləʋ.kəɾ/ → [ləu.kəɾ], अवकाश /əʋ.kaɕ/ → [əu.kaɕ].
¯\_(ツ)_/¯ /ʂ/ → [ɕ] Kutchkutch (talk) 05:33, 20 February 2018 (UTC)Reply

Deployment (again)


@Kutchkutch Okay, so I've moved this module to the module namespace. Following the precedent of MOD:hi-IPA, we won't show any of the aspiration/murmur assimilations in the broad transcription //, so for that purpose the module is ready. I'll soon be implementing the aspiration rules for narrow transcriptions, as well as everything else that was previously discussed. I want to note some useful things I have added:

  • The nuqta can be used in respelling to indicate j̈/j̈h/ċ/ċh
  • Like hi-IPA, the asterisk * can be used to force schwa insertion.

This way, mr-IPA can use the Devanagari script entirely. —AryamanA (मुझसे बात करेंयोगदान) 01:55, 23 September 2020 (UTC)Reply

Also, I have removed vowel length indication in the broad transcription, since vowel length is not contrastive in Marathi. It will be present in the narrow one. —AryamanA (मुझसे बात करेंयोगदान) 02:08, 23 September 2020 (UTC)Reply
@AryamanA: Thanks for your renewed interest! The development of this infrastructure does depend on your interest and how much time you have available, since you have the ability to understand both the language and coding aspects. Since {{R:mr:Dhongde-Wali 2009}} isn't thorough enough on declension, I started User:Kutchkutch/mr-decl but it's nowhere near completion yet. A slow manual deployment, as you've been doing so far, is probably better for now than mass deployment by bot, since mass deployment would reveal too many weaknesses at once. The nuqta for j̈/j̈h/ċ/ċh and * to force schwa insertion are certainly useful. See here for a paper on phonology. The study refers to Pandharipande (1997) and {{R:mr:Dhongde-Wali 2009}}. Unfortunately, I don't have access to Pandharipande (1997).
Fixed च़ and the * seem to work fine, but ज़ and झ़ appear as /d͡ʑ̈/ and /d͡ʑ̈ɦ/, which should be /d͡z/ and /d͡zʱ/
मत्सर (matsar) would be expected to be: /mət̪.səɾ/, [mə.t͡sʰəɾ] according to {{R:mr:Dhongde-Wali 2009|15}}. /mət̪.səɾ/ is the default output, but {{mr-IPA|मछ़र}} results in /mə.t͡sɦəɾ/.
Fixed CC codas are not allowed according to {{R:mr:Dhongde-Wali 2009|18}} so according to that restriction मार्ग (mārga) would be /maɾ.ɡə/ instead of /maɾɡ/. Kutchkutch (talk) 08:36, 23 September 2020 (UTC)Reply
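The nuqta convention can be illustrated with a small lookup. This is a hedged Python sketch and not the module's Lua implementation: the function name `affricate_ipa` is hypothetical, and the values for छ and छ़ are assumed by analogy with the other three letters. One practical detail is that ज़ may arrive either as the single code point U+095B or as ज followed by the nuqta U+093C, so the sketch normalises to NFD first.

```python
import unicodedata

NUQTA = "\u093c"  # Devanagari sign nukta

# Plain letters: alveolo-palatal affricates (values for छ assumed by analogy).
PALATAL = {
    "\u091a": "t͡ɕ",   # च
    "\u091b": "t͡ɕʰ",  # छ
    "\u091c": "d͡ʑ",   # ज
    "\u091d": "d͡ʑʱ",  # झ
}
# Letters with nuqta: alveolar affricates (ċ, ċh, j̈, j̈h).
ALVEOLAR = {
    "\u091a": "t͡s",   # च़
    "\u091b": "t͡sʰ",  # छ़
    "\u091c": "d͡z",   # ज़
    "\u091d": "d͡zʱ",  # झ़
}


def affricate_ipa(letter):
    """Return the IPA value of च/छ/ज/झ, with or without a nuqta."""
    # NFD splits the precomposed ज़ (U+095B) into ज + nuqta.
    decomposed = unicodedata.normalize("NFD", letter)
    base = decomposed[0]
    table = ALVEOLAR if NUQTA in decomposed[1:] else PALATAL
    return table[base]


print(affricate_ipa("\u091c\u093c"))  # ज़ -> d͡z
```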
@AryamanA: After some manual deployment, the module seems to work in most cases for the broad transcription. मत्सर (matsar) and any other narrow transcription issues can be addressed later. Many of the broad transcription issues can be fixed with the manual intervention tools (such as *, -, .).
Fixed The asterisk * is a good way to address the issue with मार्ग (mārga), पलंग (palaṅga), etc. especially if it's not predictable.
Perhaps some of the broad transcription issues that require manual intervention are due to MOD:mr-translit (see Category talk:Konkani language).
Fixed -च (-ca) and आयते (āyte) are cases in which /t͡s/ and /aj.t̪e/ would be better compared to /t͡sə/ and /a.jə.t̪e/. /a.jə.t̪e/ can be manually fixed with {{mr-IPA|आय.ते}}.
Fixed डिवचणे (ḍivacṇe) and खेचणे (khecṇe) can be fixed with {{mr-IPA|डिवच़.णे}} and {{mr-IPA|खेच़.णे}}.
Fixed Many instances of word-medial ज़ and झ़ show /d͡ʑ/. This can be seen with अंदाज: /ən.d̪ad͡z/ and अंदाजे: /ən.d̪ad͡ʑ.̈e/. अंदाज (andāj) works but अंदाजे (andāje) doesn't work without manual syllabification. So, {{mr-IPA|माझ़े}} → {{mr-IPA|मा.झ़े}} fixes the issue. Here are examples:
खाजवणे (khājavṇe), गाझियाबाद (gājhiyābād), गुजरात (gujrāt), आजकाल (ājkāl), उजवे (ujve), आजोबा (ājobā), मोजणे (mojṇe), तिजोरी (tijorī), ताजे (tāje), गुजराती (gujrātī), गाजर (gājar), अंदाजे (andāje), आजारी (ājārī)
Fixed The transliteration should have an 'a' in the second position in many words beginning with 'd'. For दरवाजा (darvājā), the absence of the 'a' in the second position of the transliteration leads to /d̪ɾ.ʋad͡ʑ.̈a/ in the automated IPA. There doesn't appear to be a fix for the IPA using the Devanagari script. Here are examples:
दगड (dagaḍ), दहशत (dahśat), दर्या (daryā), दहा (dahā), दही (dahī), दचकणे (dacakṇe) Kutchkutch (talk) 11:41, 24 September 2020 (UTC)Reply
@AryamanA: Thanks for all the work and addressing many of the issues. Perhaps the best characterisation of codas is in that article.
Fixed Consonant clusters are not allowed in coda position. Word-final consonant clusters are therefore not allowed, except in borrowings from English such as ‘silk’ or ‘test’.
So, English borrowings such as ऑगस्ट (ŏgasṭa, August), पोर्ट (porṭa) in पोर्ट ब्लेयर (porṭa bleyar) and फ्रान्स (phrānsa) would be the exceptions to the rule ‘CC codas are not allowed’. Since modules have no way to know the etymology, perhaps there could be a hack in the Devanagari respelling to indicate English borrowings can have CC codas like the nuqta for च़, ज़ and झ़.
Fixed Although words with homorganic nasals are transcribed as /saŋɡ-/ in Dhongde and Wali for सांग- (sāṅga-), perhaps it would make more sense to have a word-final schwa at the end in these cases. In fact, गंज़ (gañj̈a, rust) is transcribed as /ɡən.zə/ in the Grezause paper. The existence of schwas following homorganic nasals may feel like an illusion but they're certainly there. So, if this weakness is to be emphasised, it could be transcribed with a superscript schwa /ᵊ/. In that case:
क वर्ग
अंक (aṅka) would be /əŋ.kᵊ/
शंख (śaṅkha) would be /ɕəŋ.kʰᵊ/
सांग- (sāṅga-) would be /saŋ.ɡᵊ-/
पलंग (palaṅga) would be /pə.ləŋ.ɡᵊ/
संघ (saṅgha) would be /səŋ.ɡʱᵊ/
च/च़ वर्ग
पंच (pañca) would be /pən.t͡ɕᵊ/
मुंज (muñja) would be /mun.d͡ʑᵊ/
उंच़ (uñċa) would be /un.t͡sᵊ/
गंज़ (gañj̈a) would be /ɡən.d͡zᵊ/
ट वर्ग
वाळवंट (vāḷvaṇṭa) would be /ʋaɭ̆.ʋəɳ.ʈᵊ/
कंठ (kaṇṭha) would be /kəɳ.ʈʰᵊ/
लंड (laṇḍa) would be /ləɳ.ɖᵊ/
त वर्ग
पसंत (pasanta) would be /pə.sən.t̪ᵊ/
मंद (manda) would be /mən.d̪ᵊ/
ग्रंथ (grantha) would be /ɡɾən.t̪ʰᵊ/
संबंध (sambandha) would be /səm.bən.d̪ʱᵊ/
प वर्ग
भूकंप (bhūkampa) would be /bʱu.kəm.pᵊ/
बंब (bamba) would be /bəm.bᵊ/
प्रारंभ (prārambha) would be /pɾa.ɾəm.bʱᵊ/
Compare तोंड (toṇḍa, neut sg): /t̪oɳ.ɖᵊ/ with तोंडं (toṇḍa, neut pl): /t̪oɳ.ɖə/ and Konkani तोण (toṇ).
Perhaps the following cases could retain the full schwa:
Fixed Geminates:
खुद्द (khudda): /kʰud̪.d̪ə/
शुद्ध (śuddha): /ɕud̪.d̪ʱə/
घट्ट (ghaṭṭa): /ɡʱəʈ.ʈə/
भिन्न (bhinna): /bʱin.nə/
Fixed Consonant clusters:
मार्ग (mārga): /maɾ.ɡə/
कर्म (karma): /kəɾ.mə/
शब्द (śabda): /ɕəb.d̪ə/
पत्र (patra): /pət̪.ɾə/
वृक्ष (vŕkṣa): /ʋɾuk.ʂə/
महाराष्ट्र (mahārāṣṭra): /mə.ɦa.ɾaʂ.ʈɾə/
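The proposal in the lists above (weak schwa after a homorganic nasal plus stop, full schwa after geminates and other clusters) could be expressed roughly as follows. This is a Python sketch for discussion only; `add_final_schwa` and the `HOMORGANIC` table are hypothetical names, and the table contains only the nasal and obstruent pairs that occur in the examples given.

```python
VOWELS = {"ə", "a", "i", "u", "e", "o"}

# Nasal -> obstruents treated as homorganic in the examples above.
HOMORGANIC = {
    "ŋ": {"k", "kʰ", "ɡ", "ɡʱ"},
    "ɳ": {"ʈ", "ʈʰ", "ɖ"},
    "n": {"t͡ɕ", "d͡ʑ", "t͡s", "d͡z", "t̪", "t̪ʰ", "d̪", "d̪ʱ"},
    "m": {"p", "b", "bʱ"},
}


def add_final_schwa(segments):
    """Append ᵊ or ə when a word ends in two consonants, else change nothing."""
    result = list(segments)
    if len(result) < 2 or result[-1] in VOWELS or result[-2] in VOWELS:
        return result  # no word-final cluster
    nasal, last = result[-2], result[-1]
    weak = last in HOMORGANIC.get(nasal, set())
    result.append("ᵊ" if weak else "ə")
    return result


print("".join(add_final_schwa(["p", "ə", "l", "ə", "ŋ", "ɡ"])))  # पलंग -> pələŋɡᵊ
```

Geminates such as भिन्न fall through to the full schwa because /n/ is not listed as a homorganic obstruent for itself.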
Done In word medial position, the first consonant of a cluster is assigned to the coda position of the preceding syllable and the rest of the cluster is assigned to the onset of the next syllable
¯\_(ツ)_/¯ There is a related process for verbs with a stem-final homorganic nasal such as नोंद-णे (nonda-ṇe) (not हंबर-णे (hambar-ṇe)).
verbal stem नोंद- /non.d̪ᵊ-/ + verbal suffix -णे /-ɳe/ → नोंदणे /non.d̪ᵊ.ɳe/.
{{R:mr:Dhongde-Wali 2009|27}} goes into further detail on how the consonant following the nasal and before -णे is deleted in the narrow transcription:
The voiced non-aspirated syllable-final stops /b/, /d̪/, /ɖ/ and /ɡ/ preceded by a homorganic nasal and followed by a nasal are deleted.
सांगणे (sāṅgṇe): /saŋ.ɡᵊ.ɳe/ → [saŋ.ɳe]
सांडणे (sāṇḍṇe): /saɳ.ɖᵊ.ɳe/ → [saɳ.ɳe]
नोंदणे (nondṇe): /non.d̪ᵊ.ɳe/ → [non.ɳe]
बांधणे (bāndhṇe): /ban.d̪ʱᵊ.ɳe/ → [ban.ɳe] (compare Old Marathi बानणे (bānaṇe))
कोंबणे (kombṇe): /kom.bᵊ.ɳe/ → [kom.ɳe]
The process appears to have gone even further in Konkani from what User:Bhagadatta said at Category talk:Konkani language:
[For nasal + voiced stop] you get the corresponding nasal: for and , for and and so on. Following this, the voiced stop is then dropped (see the pronunciation of उंदिर (undir), भांगर (bhāṅgar)). Kutchkutch (talk) 12:15, 25 September 2020 (UTC)Reply
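The stop-deletion rule quoted from Dhongde & Wali can be sketched as below. This is a Python illustration under stated assumptions, not the module's code: `delete_stop_between_nasals` is a hypothetical name, the input is a flat list of segments without syllable dots, and an optional weak schwa ᵊ between the stop and the following nasal is skipped together with the stop. Only the unaspirated stops named in the quoted rule are handled, so बांधणे (with /d̪ʱ/) is left unchanged by this sketch.

```python
# Homorganic nasal -> voiced unaspirated stop, as named in the quoted rule.
HOMORGANIC_STOP = {"m": "b", "n": "d̪", "ɳ": "ɖ", "ŋ": "ɡ"}
NASALS = {"m", "n", "ɳ", "ŋ"}
WEAK_SCHWA = "ᵊ"


def delete_stop_between_nasals(segments):
    """Drop a stop (and any weak schwa) between its homorganic nasal and a nasal."""
    out = []
    i = 0
    while i < len(segments):
        seg = segments[i]
        out.append(seg)
        stop = HOMORGANIC_STOP.get(seg)
        if stop is not None and i + 1 < len(segments) and segments[i + 1] == stop:
            j = i + 2
            if j < len(segments) and segments[j] == WEAK_SCHWA:
                j += 1
            if j < len(segments) and segments[j] in NASALS:
                i = j  # skip the stop and the weak schwa
                continue
        i += 1
    return out


print("".join(delete_stop_between_nasals(["s", "a", "ŋ", "ɡ", "ᵊ", "ɳ", "e"])))  # saŋɳe
```

Because the check looks only at segments, it would also fire in non-verbs such as च़ांदणे, which is the detection question raised in the reply that follows.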
@Kutchkutch: Thank you so much for all this analysis, I have fixed all the bad errors: ज़/झ़/च़/छ़ are now handled correctly; syllabification should be okay now; final schwas now occur after clusters/geminates; y is treated as any other consonant (so we get āyte). English borrowings with final clusters will require a phonetic respelling with the virāma unfortunately, there's no easy way to automate it. I'll be starting the phonetic IPA implementation now. I wanted to ask, are there any words that are not verbs which end in -णे? If no, then I can easily implement the phonetic rules you gave at the end. If yes, verb info will have to be passed to the template. —AryamanA (मुझसे बात करेंयोगदान) 04:55, 26 September 2020 (UTC)Reply
@AryamanA: Thanks again for all the work.
-णे
¯\_(ツ)_/¯ Although most words that end in -णे are verbs, there are a considerable number of words that end in -णे that are not verbs (or derived directly from verbs). A few examples are below. Although Dhongde and Wali don't explicitly restrict the rule to just verbs, only च़ांदणे (ċāndṇe, moonlight) and ठेंगणे (ṭheṅgṇe, short, low) in the following list would qualify. च़ांदणे (ċāndṇe, moonlight) is a poetic word that is usually pronounced very carefully. Therefore, this analysis indicates that some info would have to be passed to the template if there's no way to automatically detect N[/b/, /d̪/, /ɖ/, /ɡ/]N.
Nouns:
उपरणे (uparṇe) (Hindi उपरना (uparnā))
गाणे (gāṇe, song) (Hindi गाना (gānā))
घराणे (gharāṇe) (Hindi घराना (gharānā))
च़ांदणे (ċāndṇe, moonlight)
ठाणे (ṭhāṇe) (> Thane, Hindi थाना (thānā))
नाणे (nāṇe, coin)
Other:
काणा (kāṇā, adj masc) lemmatised to काणे (kāṇe) (Hindi काना (kānā))
उणा (uṇā, lacking, less, adj masc) lemmatised to उणे (uṇe, minus, adverb) (Hindi ऊना (ūnā))
ठेंगणा (ṭheṅgṇā, short, low, adj masc) lemmatised to ठेंगणे (ṭheṅgṇe)
दिवाणा (divāṇā, adj masc) lemmatised to दिवाणे (divāṇe) (Hindi दीवाना (dīvānā))
-पणे (-paṇe, forms adverbs, suffix)
प्रमाणे (pramāṇe, according to, like, as, postposition)
-वाणे (-vāṇe, forms adjectives, suffix)
शहाणा (śahāṇā, adj masc) lemmatised to शहाणे (śahāṇe) (> Hindi शाणा (śāṇā))
¯\_(ツ)_/¯ ळ is incorrectly syllabified when it is in word-medial codas (e.g. काळजी (kāḷjī): /ka.ɭ̆d͡ʑi/):
उकळणे (ukaḷṇe), ओवाळणे (ovāḷṇe), कळणे (kaḷṇe), काळजी (kāḷjī), कोळसून (koḷsūn), जळणे (jaḷṇe), तळणे (taḷṇe), पिळणे (piḷṇe), मिळणे (miḷṇe), वळणे (vaḷṇe), वाळवंट (vāḷvaṇṭa), रावळपिंडी (rāvaḷpiṇḍī)
The verbs could be fixed by using -णे (for कळणे (kaḷṇe): /kə.ɭ̆ɳe/, ओवाळणे (ovāḷṇe): /o.ʋa.ɭ̆ɳe/, etc.)
वाळवंट (vāḷvaṇṭa) and रावळपिंडी (rāvaḷpiṇḍī) are possibly compounds involving वाळू (vāḷū, sand) and ਪਿੰਡ (piṇḍ) so it might be okay to use respelling with -.
¯\_(ツ)_/¯ {{word-final anusvara form of}} {{R:mr:Dhongde-Wali 2009|9}}
e-stem neuter nouns (e.g. सोने (sone)), declined adjectives (e.g. थोडे (thoḍe)) and all verbs are usually pronounced with a schwa at the end in their lemma forms despite the word-final character being े (e). This e is only used instead of the schwa when being pedantic. The colloquial schwa is indicated in writing with {{word-final anusvara form of}}. It would be helpful if this word-final schwa could be indicated along with the pedantic e. Only the verbs have no exceptions. Deciding whether the other parts of speech are applicable may require manual judgement. The format at मध्ये (madhye) could possibly be used.
Done The length of high vowels
The paper's conclusion indicates that high vowel length can be shown in the narrow transcription.
Average vowel durations for /i/ versus /iː/ are 145ms for short vowels and 238ms for long vowels with an average long to short vowel ratio of 1:1.69. Results for these two minimal pairs indicate that high short vowels are indeed shorter than long vowels.
The following might be considered lower priority:
Done Orthographic CʰCʰ → Phonological CCʰ in अख्खे (akhkhe), चिठ्ठी (ciṭhṭhī), पिझ्झा (pijhjhā), पुठ्ठा (puṭhṭhā) and मठ्ठ (maṭhṭha) {{R:mr:Dhongde-Wali 2009|35}}
¯\_(ツ)_/¯ Word-initial ज्ञ in ज्ञात (jñāt), ज्ञान (jñān), etc. is currently showing as /d̪n-/ in the broad transcription, which would need to be changed manually if entries for those words are created. As the word Dnyaneshwar shows, the transliteration and the broad transcription word-initially would be dny- /dnj-/. In a narrow transcription, word-initial ज्ञ could be represented as [nj-]. Kutchkutch (talk) 13:29, 26 September 2020 (UTC)Reply
@AryamanA: The h-deletion and aspiration rules work in most cases, but here are some cases to consider:
¯\_(ツ)_/¯ Diphthongs (with the first vowel being /ə/) would be better in the following:
[əi]:
पहिले (pahile) /pə.ɦi.le/, [pi.le]
सही (sahī) /sə.ɦi/, [siː]
[əu]:
गहू (gahū) /ɡə.ɦu/, [ɡuː]
बहुतेक (bahutek) /bə.ɦu.t̪ek/, [bʱu.t̪ek]
Here are other words in which the h-deletion and aspiration rules apply:
V₁ = /ə/
[əC]: दहशत (dahśat), सहमत (sahmat)
[əə]: महत्त्व (mahattva), शहर (śahar)
[əa]: चहा (cahā), तहान (tahān), दहा (dahā), पहाट (pahāṭ), महान (mahān), महाराष्ट्र (mahārāṣṭra), महाविद्यालय (mahāvidyālay), लहान (lahān), सहा (sahā)
[əi]: दही (dahī), बहीण (bahīṇ), बहिरा (bahirā), वही (vahī), वहिणी (vahiṇī)
V₁ = /a/
[aC]: उदाहरण (udāhraṇ), राजहंस (rājhausa), राहणे (rāhṇe)
[ai]: काही (kāhī), जाहिरात (jāhirāt), नाही (nāhī), माहीत (māhīt)
[au]: पाहुणा (pāhuṇā), बाहुली (bāhulī)
[ae]: बाहेर (bāher), साहेब (sāheb)
V₁ = /u/
[uə]: कुतूहल (kutūhal)
V₁ = /e/
[eC]: मेहनत (mehnat)
[eə]: पेहलवान (pehalvān)
[eu]: मेहुणी (mehuṇī)
V₁ = /o/
[oi]: मोहीम (mohīm)
¯\_(ツ)_/¯ लिहीन (lihīn, will write) is shown as li•in/lin on page {{R:mr:Dhongde-Wali 2009|26}}:
लिहिणे (lihiṇe) /li.ɦi.ɳe/, [lʱii.ɳe]
मेहेरबानी (meherbānī) /me.ɦeɾ.ba.ni/, [mʱeeɾ.ba.niː]
¯\_(ツ)_/¯ Of the remaining narrow transcription rules, the diphthongisation rules on page {{R:mr:Dhongde-Wali 2009|24}} might be helpful since many lemmas qualify. Although it says ʻespecially in fast speechʼ, it might actually vary from ʻwhen the pronunciation is not carefulʼ to ʻall the timeʼ. English borrowings probably apply as well.
/ai/ → [əi]
आई (āī): /a.i/ → [əiː]
नाईलाज़ (nāīlāj̈): /na.i.lad͡z/ → [nəi.lad͡z]
पंचाईत (pañcāīt): /pən.t͡ɕa.it̪/ → [pən.t͡ɕəiːt̪]
हलवाई (halvāī): /ɦəl.ʋai/ → [ɦəl.ʋəiː]
/ei/ → [əi]
The only lemmas that have /ei/ appear to be Perso-Arabic words with बे- (be-) + word-initial इ (i), such as बेइज्जत (beijjat) and बेइमान (beimān).
/əʋ/ → [əu]
अवकाश (avkāś): /əʋ.kaɕ/ → [əu.kaɕ]
अवघड (avghaḍ): /əʋ.ɡʱəɖ/ → [əu.ɡʱəɖ]
आठवडा (āṭhavḍā): /a.ʈʰəʋ.ɖa/ → [a.ʈʰəu.ɖa]
लवकर (lavkar): /ləʋ.kəɾ/ → [ləu.kəɾ]
All verbs with -व- (≈ Hindi -आ- (-ā-) in उठाना (uṭhānā))
उठवणे (uṭhavṇe) from उठणे (uṭhṇe): /u.ʈʰəʋ.ɳe/ → [u.ʈʰəu.ɳe]
पाठवणे (pāṭhavṇe): /pa.ʈʰəʋ.ɳe/ → [pa.ʈʰəu.ɳe]
रविवार (ravivār) has an idiosyncratic pronunciation [ɾəi.ʋaɾ]
/au/ → [əu]
आगाऊ (āgāū): /a.ɡau/ → [a.ɡəuː]
पाऊल (pāūl): /pa.ul/ → [pəuːl]
भाऊ (bhāū): /bʱa.u/ → [bʱəuː]
ठाऊक (ṭhāūk): /ʈʰa.uk/ → [ʈʰəuːk]
In terms of bleeding order and feeding order, the diphthongisation rules are probably after h-deletion since अवहेलना (avhelnā): /ə.ʋʱel.na/ has /ə.ʋʱ/ so it wouldn't qualify for /əʋ/ → [əu].
¯\_(ツ)_/¯ Although there are some guidelines for stress on page {{R:mr:Dhongde-Wali 2009|19}} and in the paper, not even MOD:hi-IPA has stress, so those could be considered low priority. Kutchkutch (talk) 14:08, 27 September 2020 (UTC)Reply
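As a rough illustration of the diphthongisation rules listed above, here is a Python sketch. It is not the module's implementation: the name `diphthongise` is hypothetical, vowel length is ignored, and /əʋ/ is rewritten only when a consonant follows, which is an assumption based on the examples (word-final /əʋ/ as in उत्सव is left alone). Because a murmured /ʋʱ/ is a separate segment from /ʋ/, a form like अवहेलना is untouched when this step runs after the murmur rule.

```python
VOWELS = {"ə", "a", "i", "u", "e", "o"}


def diphthongise(segments):
    """Apply /ai/, /ei/ -> [əi], /au/ -> [əu] and pre-consonantal /əʋ/ -> [əu]."""
    out = list(segments)
    for i in range(len(out) - 1):
        cur, nxt = out[i], out[i + 1]
        if cur in ("a", "e") and nxt == "i":
            out[i] = "ə"
        elif cur == "a" and nxt == "u":
            out[i] = "ə"
        elif cur == "ə" and nxt == "ʋ":
            after = out[i + 2] if i + 2 < len(out) else None
            # Only a coda ʋ (one followed by a consonant) vocalises.
            if after is not None and after not in VOWELS:
                out[i + 1] = "u"
    return out


print("".join(diphthongise(["l", "ə", "ʋ", "k", "ə", "ɾ"])))  # लवकर -> ləukəɾ
```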
¯\_(ツ)_/¯ @AryamanA: {{mr-word-final-schwa}} was created to see what showing both the e and the schwa pronunciations might look like. If there's a better way, then feel free to make the necessary changes (including deleting the template).
Fixed This is how {{R:mr:Dhongde-Wali 2009|19}} transcribe words with /t͡sʰ/:
उत्सव (utsav): [utchəw]
वत्स (vatsa): [wətchə]
मत्सर (matsar): [mətchər]
/t͡sʰ/ is often divided across syllable boundaries as /t͡s.ɦ/
उत्सव (utsav) /ut͡s.ɦəʋ/
उत्साह (utsāh) /ut͡s.ɦaɦ/, [ut͡s.ɦa]
उत्सुक (utsuk) /ut͡s.ɦuk/, [ut͡s.ɦuːk]
कुत्सित (kutsit) /kut͡s.ɦit̪/, [kut͡s.ɦiːt̪]
चिकित्सा (cikitsā) /t͡ɕi.kit͡s.ɦa/
निरुत्साह (nirutsāh) /ni.ɾut͡s.ɦaɦ/, [ni.ɾut͡s.ɦa]
निर्भर्त्सना (nirbhartsanā) /niɾ.bʱət͡s.ɦna/
प्रोत्साहन (protsāhan) /pɾot͡s.ɦa.ɦən/, [pɾot͡s.ɦan]
मत्सर (matsar) /mət̪səɾ/, [mət͡sʰəɾ]
महोत्सव (mahotsav) /mə.ɦot͡s.ɦəʋ/, [mʱot͡s.ɦəʋ]
मुत्सद्दी (mutsaddī) /mut͡s.ɦəd̪.d̪i/, [mut͡s.ɦəd̪.d̪iː]
वत्स (vatsa) /ʋət͡sʰ/
वत्सल (vatsal) /ʋət͡s.ɦəl/
वात्सल्य (vātsalya) /ʋat͡s.ɦəl.jə/ Kutchkutch (talk) 12:56, 28 September 2020 (UTC)Reply
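For comparison with the /t͡s.ɦ/ output listed above, the /t̪s/ → [t͡sʰ] rule mentioned earlier in this thread for उत्सव and मत्सर can be sketched as a single pass over segments. This is a hypothetical Python illustration (`affricate_ts` is not a function in the module), and syllable boundaries are left out.

```python
def affricate_ts(segments):
    """Merge a /t̪/ + /s/ sequence into the aspirated affricate [t͡sʰ]."""
    out = []
    i = 0
    while i < len(segments):
        if segments[i] == "t̪" and i + 1 < len(segments) and segments[i + 1] == "s":
            out.append("t͡sʰ")
            i += 2
        else:
            out.append(segments[i])
            i += 1
    return out


print("".join(affricate_ts(["m", "ə", "t̪", "s", "ə", "ɾ"])))  # मत्सर -> mət͡sʰəɾ
```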

────────────────────────────────────────────────────────────────────────────────────────────────────

h-Deletion and Murmur Rules

| C | V₁ | V₂ / C₂ | Environment | Rule | Example |
|---|---|---|---|---|---|
| /m/, /n/, /l/, /ʋ/, /ɾ/, /d͡ʑ/, /d̪/, /b/ | /ə/ | /i/ | Cə_i | CV₁ɦV₂ → CʱV₁V₂ | दही (dahī): /d̪ə.ɦi/ → [d̪ʱəiː] |
| /m/, /n/, /l/, /ʋ/, /ɾ/, /d͡ʑ/, /d̪/, /b/ | /ə/ | /u/ | Cə_u | CV₁ɦV₂ → CʱV₁V₂ | बहुतेक (bahutek): /bə.ɦu.t̪ek/ → [bʱəu.t̪ek] |
| /m/, /n/, /l/, /ʋ/, /ɾ/, /d͡ʑ/, /d̪/, /b/ | /a/ | /i/ | Ca_i | CV₁ɦV₂ → CʱV₁V₂ | जाहिरात (jāhirāt): /d͡ʑa.ɦi.ɾat̪/ → [d͡ʑʱai.ɾat̪] |
| /m/, /n/, /l/, /ʋ/, /ɾ/, /d͡ʑ/, /d̪/, /b/ | /a/ | /u/ | Ca_u | CV₁ɦV₂ → CʱV₁V₂ | बाहुली (bāhulī): /ba.ɦu.li/ → [bʱau.liː] |
| /m/, /n/, /l/, /ʋ/, /ɾ/, /d͡ʑ/, /d̪/, /b/ | /ə/ | /i/, /u/, /o/ | Cə_ | CV₁ɦV₂ → CʱV₂ | दहा (dahā): /d̪ə.ɦa/ → [d̪ʱa] |
| /m/, /n/, /l/, /ʋ/, /ɾ/, /d͡ʑ/, /d̪/, /b/ | /a/ | /i/, /u/, /o/ | Ca_ | CV₁ɦV₂ → CʱV₂ | उदाहरण (udāhraṇ): /u.d̪aɦ.ɾəɳ/ → [u.d̪ʱa.ɾəɳ] |
| Other | /ə/ | /i/ | Cə_i | ɦ → ∅ / CV₁_V₂ | सही (sahī): /sə.ɦi/ → [səiː] |
| Other | /ə/ | /u/ | Cə_u | ɦ → ∅ / CV₁_V₂ | गहू (gahū): /ɡə.ɦu/ → [ɡəuː] |
| Other | /a/ | /i/ | Ca_i | ɦ → ∅ / CV₁_V₂ | काही (kāhī): /ka.ɦi/ → [kaiː] |
| Other | /a/ | /u/ | Ca_u | ɦ → ∅ / CV₁_V₂ | पाहुणा (pāhuṇā): /pa.ɦu.ɳa/ → [pau.ɳa] |
| Other | /ə/ | /i/, /u/, /o/ | Cə_ | CV₁ɦV₂ → CV₂ | तहान (tahān): /t̪ə.ɦan/ → [t̪an] |
| Other | /a/ | /i/, /u/, /o/ | Ca_ | ɦ → ∅ / CV₁_V₂ | साहेब (sāheb): /sa.ɦeb/ → [sa.eb] |
| Other | /ə/, /a/ | Any | CV_ | ɦ → ∅ / CV_ | सहमत (sahmat): /səɦ.mət̪/ → [sə.mət̪] |