Module talk:ar-IPA
روسيا, etc
Hi,
I've just added a test case روسيا but it's not a valid/real test case. I will add real ones later. A lot of changes will be required. --Anatoli T. (обсудить/вклад) 06:14, 26 May 2017 (UTC)
Vowels ɑ and ɑː vs æ, æː vs a and aː
@Wyang, Benwing2, Erutuon, Wikitiki89, Mahmudmasri, Kolmiel, Stephen G. Brown, Backinstadiums
Hi,
Can we add a new rule to use [æ] and [æː] instead of [a] and [aː] in most cases and [ɑ] and [ɑː] around all emphatic consonants, [r] and [q]?
We will need more work with [r], perhaps it's only [ɑ] and [ɑː] when [r] is AFTER the vowel, not before.
For example, أَنْتَ (ʔanta) should actually be [ʔæn.tæ], not [ʔan.ta] (plain consonants) and طَبَّ (ṭabba) should be [tˤɑbːæ], not [tˤab.ba] (emphatic first consonant). Not sure about the dot (.) but I used [ː] for geminated consonants.
By "around" I mean when either side of the vowel has one of these consonants. Emphatic consonants are all consonants with ˤ, e.g. [tˤ].
Please confirm the rule before I add any test cases. If you're not working on the module but you got pinged, just stay tuned. You can add to the discussion or test cases. --Anatoli T. (обсудить/вклад) 07:40, 26 May 2017 (UTC)
- I guess we would like this module to output both phonemic (current) and phonetic IPAs? The latter would be very useful (if it can be generated automatically). Wyang (talk) 07:44, 26 May 2017 (UTC)
- Good point, thanks. I think the phonemic will be possible if we agree on what variety of MSA we represent as priority No. 1, otherwise we won't be getting anywhere. Phonetic respellings and additional tricks with some symbols can always be considered, as you guys have been using in other languages. --Anatoli T. (обсудить/вклад) 07:46, 26 May 2017 (UTC)
- Emphatic spreading is a complex issue which extends even across word boundaries. In this respect, the triad of vowels of Classical Arabic is a blessing which I would keep, not adding phonetic variation. --Backinstadiums (talk) 07:56, 26 May 2017 (UTC)
- (edit conflict) I think only /a aː/ are appropriate in a phonemic transcription (between slashes). I intended the module to produce a phonemic transcription; phonetic transcription would be too variable and complex. [æ æː ɑ ɑː] are phonetic (written between square brackets) and determined by dialect-specific rules on emphasis spreading. (For an example, see Egyptian Arabic phonology § Emphasis spreading on Wikipedia.) And some dialects (Varieties of Arabic says Hejazi) just have generic [a aː] in their pronunciation, no front [æ æː] or back [ɑ ɑː]. So, to show front and back a-vowels, we'd have to pick a dialect, or add several dialectal transcriptions each with its own variety of emphasis-spreading. (That raises the question of how much a dialect's version of emphasis spreading influences MSA.) Anyway, I could be wrong on some or all of this; most of it came from Wikipedia, not from actual papers or books on the subject. But I intended the module to just produce the phonemic transcription, a very basic IPA representation of the transliteration system, and hence it uses only /a aː/. — Eru·tuon 08:13, 26 May 2017 (UTC)
- Agree. The "front" vowel is definitely a central [ä] in eastern Peninsula dialects. The "back" version might still be [ɑ], but this distinction is not phonemic. Such a pronunciation is also sometimes heard in careful speech in the Levant (where the dialects have a very strong imala, which goes so far as [eː]). There is also some variation concerning the effects of /ħ/, /ʕ/, /r/, I think. And generally it's a dialectal phenomenon that is not part of the original standard Arabic vowel system, and hence isn't reflected in any transcription system. So the split should be reflected in the regional narrow IPA transcriptions, but not in the broad one. Kolmiel (talk) 11:35, 26 May 2017 (UTC)
- Yes, this is a complex issue. In Egyptian Arabic, emphasis usually spreads to both ends of the word, but the effect of /r/ is tricky; some instances of /r/ trigger emphasis and some don't, and it varies even within a set of words derived from the same root. (The original distribution was that /r/ followed by a back vowel, or preceded by a back vowel when no vowel followed, triggered emphasis, but this is now only approximate.) Furthermore there are instances of "autonomous [ɑ ɑː]" when there is no possible emphatic consonant in a word, esp. in borrowings, such as [kɑʃ] "cash". The distribution of emphatic consonants also doesn't agree well with written MSA. Meanwhile, in Moroccan Arabic, emphasis tends to spread only as far as the first full vowel (i.e. [a i u], which are diachronically derived from long vowels), but on the other hand all vowels are affected by emphasis, not just [ǝ a] (the reflexes of /a aː/). In Moroccan Arabic, original /r/ has split into two phonemes /r ṛ/, where /ṛ/ triggers emphasis and /r/ doesn't. The original distribution was the same as the original distribution in Egyptian (depending on whether there was a back vowel nearby), but it has since become entirely lexically determined and tends to be the same across all words derived from a given root, except in a few cases. Note that for these and other reasons I believe it's a terrible idea to represent "dialectal" Arabic words with Arabic script rather than a Latin transcription, as is always used in scientific and linguistic works. Now, as for the regional pronunciation of MSA in Egypt and Morocco, I don't really know but I suspect it's strongly influenced by the "dialectal" pronunciation. Benwing2 (talk) 14:11, 26 May 2017 (UTC)
- Scientific and linguistic works usually treat everything in transliteration. I think having dialectal words in Arabic script with transliterations is perfectly fine. --WikiTiki89 15:35, 26 May 2017 (UTC)
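For reference, the rule proposed at the top of this section can be sketched roughly as follows. This is only an illustration in Python, not the module's Lua code: the names `a_allophones`, `is_backing` and `IGNORED` are made up for the example, the input is assumed to be a phonemic IPA string, and the backing is applied on both sides of [r], which the proposal itself leaves open. As the replies point out, real emphasis spreading is dialect-specific and reaches well beyond adjacent segments, so this covers only the simple "adjacent consonant" version.

```python
import re

# Split a phonemic string into segments: long /aː/, an emphatic consonant
# (any character followed by ˤ), or any other single character.
SEGMENT = re.compile(r"aː|.ˤ|.")

# Marks that are skipped when looking for the neighbouring segment
# (syllable dot, length mark, stress mark). Assumption for this sketch.
IGNORED = {".", "ː", "ˈ"}


def is_backing(seg):
    """True for the consonants named in the proposal: emphatics, r and q."""
    return seg.endswith("ˤ") or seg in ("r", "q")


def neighbor(segs, i, step):
    """Return the nearest segment in the given direction, skipping IGNORED marks."""
    j = i + step
    while 0 <= j < len(segs) and segs[j] in IGNORED:
        j += step
    return segs[j] if 0 <= j < len(segs) else ""


def a_allophones(phonemic):
    """Rewrite /a aː/ as [ɑ ɑː] next to a backing consonant, else as [æ æː]."""
    segs = SEGMENT.findall(phonemic)
    out = []
    for i, seg in enumerate(segs):
        if seg in ("a", "aː"):
            backed = is_backing(neighbor(segs, i, -1)) or is_backing(neighbor(segs, i, 1))
            out.append(("ɑ" if backed else "æ") + seg[1:])  # seg[1:] keeps the length mark
        else:
            out.append(seg)
    return "".join(out)


print(a_allophones("ʔan.ta"))   # ʔæn.tæ
print(a_allophones("tˤabːa"))   # tˤɑbːæ
```

Restricting the effect of [r] to one side would only need a change in `is_backing` or in which neighbour is checked; a dialect-specific spreading rule would need a different design altogether.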
Feature request
@Erutuon, Wikitiki89, Wyang or anyone interested.
- Could a handling for the final ة without diacritics be added with a flag? E.g. silent for the first occurrence and second occurrence in الْمَمْلَكَة الْعَرَبِيَّة السُّعُودِيَّة (al-mamlaka l-ʿarabiyya s-suʿūdiyya)
{{ar-IPA|الْمَمْلَكَة الْعَرَبِيَّة السُّعُودِيَّة||tam1=silent|tam2=silent}}
(or similar).
- Can we have marginal sounds added, e.g. /ɡ/ as in إِنْجْلِيزِيّ (ʾinglīziyy) or إِنْكْلِيزِيّ (ʾinglīziyy)? --Anatoli T. (обсудить/вклад) 06:02, 6 August 2017 (UTC)
- The marginal sounds can easily be added, though of course they'd have to be input using the transliteration parameter. The ة feature should also be possible, because the transliteration module outputs (t) and that could be iterated over. (I need to make a function that's like a combination of gmatch() and ipairs(), that returns both the index of the match and the match.) — Eru·tuon 20:04, 6 August 2017 (UTC)
- @Erutuon Thanks. Let me know if you need anything. /ɡ/ is just an example; there are a few common consonants and vowels (including o, ō, e, ē) missing in classical Arabic, which need to be added. I'm mostly following H. Wehr for transliterations. --Anatoli T. (обсудить/вклад) 12:41, 7 August 2017 (UTC)
- @Atitarev: Well, could you add some examples in the testcases? – though for the non-Classical phonemes, I have to write a function that works from the transliteration. — Eru·tuon 18:51, 7 August 2017 (UTC)
- I suggest having it be silent without diacritics and as /t/ when a sukuun is added. That way, we can just add a sukuun, rather than extra parameters, and that should be much easier to implement as well. --WikiTiki89 15:50, 7 August 2017 (UTC)
- @Wikitiki89: That would be easier. The transliteration is correct: عَرَبِيَّةْ (ʕarabiyyat). In fact, all we have to do is delete the (t) that is generated for unmarked TM. — Eru·tuon 17:56, 7 August 2017 (UTC)
- Exactly. And also delete any following elidable vowel. --WikiTiki89 17:59, 7 August 2017 (UTC)
- @Erutuon Hi. Pls fix the new test case with "ʾinglīziyy". There's a module error but I'll leave for a little while. --Anatoli T. (обсудить/вклад) 02:57, 8 August 2017 (UTC)
- @Erutuon Actually, what are the parameters to make it work with the transliteration? I have moved "ʾinglīziyy" to test1_transliteration. --Anatoli T. (обсудить/вклад) 03:02, 8 August 2017 (UTC)
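To make the two ideas in this thread concrete, here is a rough sketch of an "indexed gmatch" and of resolving each (t) by its position. It is written in Python purely for illustration and is not the module's Lua code. The names `indexed_matches` and `resolve_ta_marbuta`, the flag dictionary standing in for `tam1=silent`, the default of pronouncing an unflagged (t) as t, and the sample transliteration string are all assumptions made for the example. The follow-up point about also deleting a following elidable vowel is not handled, because the thread does not say how that vowel is represented.

```python
import re


def indexed_matches(pattern, text):
    """Yield (index, matched text, start offset) for each match, counting from 1."""
    for i, m in enumerate(re.finditer(pattern, text), start=1):
        yield i, m.group(0), m.start()


def resolve_ta_marbuta(translit, flags):
    """Replace each '(t)' according to per-occurrence flags, e.g. {1: 'silent'}.

    Occurrences flagged 'silent' are dropped; all others become 't'
    (that default is an assumption of this sketch).
    """
    pieces = []
    last = 0
    for i, text, start in indexed_matches(r"\(t\)", translit):
        pieces.append(translit[last:start])   # text before this occurrence
        if flags.get(i) != "silent":
            pieces.append("t")
        last = start + len(text)
    pieces.append(translit[last:])            # text after the final occurrence
    return "".join(pieces)


# Hypothetical transliteration input, not actual module output.
sample = "mamlaka(t) ʿarabiyya(t) suʿūdiyya(t)"
print(resolve_ta_marbuta(sample, {1: "silent", 2: "silent"}))
# mamlaka ʿarabiyya suʿūdiyyat
```

Under the sukuun-based suggestion no flags would be needed at all: every unmarked (t) would simply be deleted, which corresponds to calling the function with all occurrences flagged silent.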
Tanween before wasla
@Atitarev: Can you explain why you removed this test case? --WikiTiki89 15:04, 28 August 2017 (UTC)
- @Wikitiki89: Sorry, it seemed that this case would invariably fail. Can you explain what rule, from the module's perspective, should be used here? Pls restore if you feel it's doable. --Anatoli T. (обсудить/вклад) 21:53, 28 August 2017 (UTC)
- Well I don't know the full rule, but one might exist. I think it's worth having some long-term goals on the test page even if we can't see them passing in the near future. And even if it's not possible for the module to handle this case completely automatically, it would serve as a reminder to come up with a convenient way to mark it manually without providing a full transliteration. Or perhaps it's not so important. It's just that you removed it without much comment. --WikiTiki89 17:15, 29 August 2017 (UTC)
Velarized L and final H
Velarized L is problematic, in words like abdalla, since the first geminate isn't velarized.
Final H is normally not pronounced in words like allah. --Mahmudmasri (talk) 00:55, 27 March 2018 (UTC)
- @Mahmudmasri: What do you think about putting final /h/ in brackets? Guldrelokk (talk) 16:19, 16 April 2018 (UTC)
- Omitting /h/ entirely is better. --Mahmudmasri (talk) 17:02, 16 April 2018 (UTC)
- @Mahmudmasri: The problem is that in Classical and higher literary Arabic /aɫɫaː(h)/ is only a pre-pausal form of /aɫɫaːhu/, /aɫɫaːhi/ or /aɫɫaːha/, so it is phonemically present, and the module conveys phonemics right now. And are you entirely sure it is never pronounced? Guldrelokk (talk) 18:32, 16 April 2018 (UTC)
- Never in pausal. It is transliterated because this is how it is spelled. It's similar to final ה in Hebrew, where it is not pronounced in such cases. --Mahmudmasri (talk) 18:40, 16 April 2018 (UTC)
- @Mahmudmasri: It is mostly present on forvo, for example. I understand that this is an emphatic and unnatural pronunciation, but a complete lack of /h/ in the phonemic transcription would leave one guessing where it comes from when words are cited in other forms, as well as throughout the inflection paradigm. Guldrelokk (talk) 18:49, 16 April 2018 (UTC)
I also found out that final /h/ is kept even in some regular local speech. Qafisheh (2000) states so for Sanaʿan. Guldrelokk (talk) 23:18, 16 April 2018 (UTC)
- @Mahmudmasri: Would you state this as a general rule (final h isn't pronounced), or are there words in which it is pronounced, in the varieties that you are thinking of? — Eru·tuon 05:35, 28 April 2018 (UTC)
- Normally the final /h/ is silent but there is some inconsistency. In https: youtube/diij7UMNbDE?t=19 (Youtube links are not allowed to post, just paste "diij7UMNbDE" in the search and it's around 0:22, I posted the full Arabic text and the translation in the comments) the reader pronounces "كسره" as kasarah "(he) broke it". In the same video, the final h is not pronounced. --Anatoli T. (обсудить/вклад) 05:51, 28 April 2018 (UTC)
- More insight on Atitarev's comment: In Literary Arabic:
- Final ة is not pronounced as /h/, but in construct form as /t/.
- Final ه is sometimes pronounced, if it is a possessive /fiːh, kasarah(u)/ (spoken dialects: /fiː, kasaru/) or in a final stressed syllable /ħaˈmaːh(u)/ (spoken: /ħaˈmaː/). The video he posted has 2 of these examples.
- There is a practice (becoming less popular) for spelling final ة when not pronounced as /t/ as ه, but that doesn't change the pronunciation rules.
- What a horrid story in the story! --Mahmudmasri (talk) 08:57, 28 April 2018 (UTC)
- @Mahmudmasri: Thanks and sorry for the late reply. /h/ is also definitely pronounced in the interjections like آه (ʔāh, “ah”). --Anatoli T. (обсудить/вклад) 04:40, 13 May 2018 (UTC)
- Optionally, like in Egyptian Arabic (meaning yeah/ow), it could either be [ʔɑː] (typical) or [ʔɑːh]. --Mahmudmasri (talk) 14:18, 15 May 2018 (UTC)