Module talk:aii-translit
Optimisation
Hi there @ColumbaBush and @Fenakhay, I'm currently working to optimise this transliteration module, as there are a few inconsistencies. First of all, thank you so much for being the top contributors on this module; it’s really helped me optimise it so far. Since you two know the most about how it works, I need your help.
Here is a list of what we need:
- Doubling of consonants when they follow a pthaha, a zlama_horizontal or the short waw+hbasa, as well as when preceded by another vowel. These are short vowels and are always followed by a double consonant. For example, “ܚܲܒܘܼܫܵܐ” is supposed to be “ḥabbūšā”, as the beth is preceded by a pthaha; however, it is currently “ḥabūšā”.
- A long hbasa “ܝܼ” should always be “ī” but is often showing up as only “i”. For example, the decade numbers should be -- “ܥܸܣܪܝܼܢ” as “ˁisrīn” -- “ܬܠܵܬ݂ܝܼܢ” as “tlāṯīn” -- “ܐܲܪܒܥܝܼܢ” as “arbˁīn” etc.
- The separation of long and short o, u, ō and ū. “ܘܼ” should always be transliterated as a long vowel “ū” unless specified otherwise. Examples where it should be short include:
-- “ܩܘܼܪܒܵܐ” as “qurbā” -- “ܡܘܼܠܦܵܐ” as “mulpā” -- “ܗܘܼܡܙܸܡܵܐ” as “humzimmā” -- “ܚܘܼܒܵܐ” as “ḥubbā” -- “ܝܘܼܠܦܵܢܵܐ” as “yulpānā” -- “ܚܘܼܡܵܠܵܐ” as “ḥummālā” -- “ܦܘܼܢܵܝܵܐ” as “punnāyā”
- The automatic transliteration seems not to work when the bolded text in the example sentence is the last word. For example, if the word is “ܚܲܒܘܼܫܵܐ” and it is the final word of the sentence, the entire transliteration won’t go through if it precedes a punctuation mark, especially question marks, exclamation marks or full stops. I understand, however, that this may not be entirely possible. Antonklroberts (talk) 00:37, 30 October 2023 (UTC)
- I have also edited the testcases accordingly Antonklroberts (talk) 00:49, 30 October 2023 (UTC)
- For aii-translit and other aii modules, let's workshop intermediate changes in the sandbox as a first pass before getting them in. That way we have a tidier edit history which makes the module more approachable by others (myself included) wanting to improve it. If you want, DM me and I can explain how to run the lua script for aii-translit and its testcases on a macbook for rapid iteration. This is what I do to validate my changes before creating an edit.
- Another thing we should strive to do is couple changes to aii-translit with updates to test cases. For example, none of the special cases relating to "houses" were taking effect. If we had test cases for these, it would make it clear that the transliterator is not doing what we intend. I know it was probably clear to you, but I only realized it by coincidence. The extent to which we can succeed in "optimization" is limited by how easy it is to "see" if the module is behaving unexpectedly. Btw, the reason these substitutions weren't happening is that they were preceded by a substitution which removes the siyameh/combining_diaeresis, but I've just made an edit to resolve that issue.
- If we're trying to be consistent w/ how related, widely-used modules like ar-translit are used, then we should limit special cases only to the most common words for which the transliterator is way off. For example we could remove the replacements under "classical because" (not commonly used) and "houses" (minimal difference between bāttay and bātay) but leave ones like "all", "each", "every" because they are very common and unless there is a special case, produce a consequentially different transliteration. Every special case we add negatively impacts the runtime performance of the transliterator for all sentences we're transliterating with this module.
- I don't know about some of the suggestions regarding doubling consonants or yudh+kwasa. For example it sounds like ashur bet sar-gis to me more than ashur bet sar-geese. That said, I did my best to work towards 100% passing tests given the criteria you mentioned (we're almost there.)
- For 'ū' vs 'u' - is there a predictable rule i can follow to implement that programmatically? According to https://r12a.github.io/scripts/syrc/block.html#char0718 for waw+rvasa, "Before a consonant cluster this vowel is pronounced shorter" but I wanted to confirm.
- Can you give an example of the transliterator not working when the last word of a sentence is bold text?
- (Mentioning @Shuraya so he has visibility as a core contributor)
- ColumbaBush (talk) 21:54, 9 November 2023 (UTC)
- Hi, I can confirm that yes a Waw rvīṣā (waw with rvasa) is pronounced short if there are 2 consonants in a row after it. Examples: burkā, muqrā, dunyē, lumdā.
- There is another instance where a waw with rvasa is pronounced short, and that is with all verbal nouns of the D (pa’’el) stem. When this happens, the consonant after it is also doubled. Examples: buššālā, zubbānā, dubbārā, shunnāyā, etc. Shuraya (talk) 22:12, 9 November 2023 (UTC)
- Hi, thank you all for replying to my message, and thanks for messaging @Shuraya as he is also a huge help on this project.
- - Shuraya is correct when it comes to the short waw+rvasa: it applies when 2 consonants in a row come after it, or in the D stem verbal noun (where the following consonant is also doubled).
- - Just one more suggestion: cleaning up the special cases. Obviously they are special cases, so the list should be short and sweet. One way to clean this up, which I haven’t quite got to work yet, is to fix up the "houses" or the "classical because" cases by having one substitution using the masc_genitive_endings rather than a separate substitution for each different genitive suffix. Antonklroberts (talk) 02:19, 10 November 2023 (UTC)
- ofc, the more we collaborate with each other, the more useful assyrian wiktionary becomes
- yeah i def noticed that but i didn't have time to make the change yesterday bc lua is very limited so implementing the logic is tricky - anyhoo i just made an edit to accommodate it.
- btw do you have an example of the transliterator not working when the last word of a sentence is bold text? ColumbaBush (talk) 03:18, 11 November 2023 (UTC)
- thank you kind sir, ill try to get to this soon...
- it should be straightforward to determine if there's 2 consonants in a row, is there a way i can programmatically determine if something is a verbal noun of the d stem? ColumbaBush (talk) 03:15, 11 November 2023 (UTC)
- The way to program the verbal noun of the d stem is to determine if it follows the morphological pattern 1u22ā3ā with the numbers being the radicals. Antonklroberts (talk) 02:51, 12 November 2023 (UTC)
- Hi there, I added some testcases with examples of when it should be a short u. Do you know any way we can make this possible? Antonklroberts (talk) 03:12, 18 December 2023 (UTC)
- addressed the feedback, there's still some failing test cases (i think some might be able to be removed) ColumbaBush (talk) 19:52, 2 February 2024 (UTC)
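The two short-u rules discussed in this thread (short before 2 consonants in a row, and short with a doubled following consonant in the 1u22ā3ā verbal-noun pattern) can be sketched roughly as below. This is only an illustration in Python, not the module's Lua code: it works on the Latin output rather than the Syriac input, the names `shorten_u`, `CLUSTER` and `DSTEM_NOUN` are made up for the example, and it assumes any non-vowel letter counts as a radical. The shape test will also catch any other word that happens to look like CūCāCā, so it would need checking against the testcases.

```python
import re
import unicodedata

# Vowel letters of the Latin transliteration: a e i o u ā ē ī ō ū
VOWELS = "aeiou\u0101\u0113\u012b\u014d\u016b"
# Assumption: a "consonant" is any letter that is not one of the vowels above.
CONS = rf"[^\W\d_{VOWELS}]"

# Rule 1: ū directly followed by two consonants is pronounced short.
CLUSTER = re.compile(rf"\u016b(?={CONS}{{2}})")
# Rule 2: a whole word of the shape C1 ū C2 ā C3 ā is read as 1u22ā3ā.
DSTEM_NOUN = re.compile(rf"({CONS})\u016b({CONS})\u0101({CONS})\u0101")


def shorten_u(word: str) -> str:
    """Apply both short-u rules to a single transliterated word."""
    word = unicodedata.normalize("NFC", word)
    match = DSTEM_NOUN.fullmatch(word)
    if match:
        c1, c2, c3 = match.groups()
        # Shorten the vowel and double the second radical.
        return f"{c1}u{c2}{c2}\u0101{c3}\u0101"
    return CLUSTER.sub("u", word)


if __name__ == "__main__":
    for w in ["qūrbā", "būrkā", "ḥūmālā", "pūnāyā", "ḥabūšā"]:
        print(w, "->", shorten_u(w))
```

Words with neither shape, such as ḥabūšā or tlāṯīn, pass through unchanged, so the default of a long “ū” is kept.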
Long and short u
Hi there @ColumbaBush and @Fenakhay, hope you are well. We still haven’t fully optimised the difference between the short and long u, especially when it comes to the passive participles of the D stems. I have added some more testcases in the hope of making clearer which examples need to be ironed out. For example:
- the verb “ܡܫܲܠܸܚ”, its passive participles are “ܡܫܘܼܠܚܵܐ” and “ܡܫܘܼܠܲܚܬܵܐ”, masculine and feminine respectively. The feminine is coming up as mšūlaḥtā when in fact it should be mšullaḥtā. And in the past tense, “ܡܫܘܼܠܸܚ ܠܹܗ” is coming up as mšūliḥ lēh when in fact it should be mšulliḥ lēh.
- the verb “ܡܫܲܪܹܐ”, its passive participles are “ܡܫܘܼܪܝܵܐ” and “ܡܫܘܼܪܝܼܬ݂ܵܐ”, masculine and feminine respectively. The feminine is coming up as mšūrīṯā when in fact it should be mšurrīṯā. And in the past tense, “ܡܫܘܼܪܹܐ ܠܹܗ” is coming up as mšūrē lēh when in fact it should be mšurrē lēh.
- the verb “ܡܛܲܘܸܠ”, its passive participles are “ܡܛܘܼܘܠܵܐ” and “ܡܛܘܼܘܲܠܬܵܐ”, masculine and feminine respectively. The masculine is coming up as mṭūwlā when in fact it should be mṭuwlā; its feminine is coming up as mṭūwaltā when in fact it should be mṭuwwaltā. And in the past tense, “ܡܛܘܼܘܸܠ ܠܹܗ” is coming up as mṭūwil lēh when in fact it should be mṭuwwil lēh.
Antonklroberts (talk) 04:33, 13 June 2024 (UTC)
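One possible reading of the examples above, sketched in Python rather than the module's Lua: a word that starts with m plus one radical, then ū, then a single consonant followed by a vowel gets a short u and a doubled consonant, while ū before two consonants is just shortened. The names `fix_word`, `fix_phrase`, `PARTICIPLE` and `CLUSTER` are hypothetical, the rule works on the Latin output, and it is an assumption that this shape is specific enough; it would also hit any unrelated word beginning mCūCV.

```python
import re
import unicodedata

# Vowel letters of the Latin transliteration: a e i o u ā ē ī ō ū
VOWELS = "aeiou\u0101\u0113\u012b\u014d\u016b"
# Assumption: a "consonant" is any letter that is not one of the vowels above.
CONS = rf"[^\W\d_{VOWELS}]"

# m + C + ū + C before a vowel: shorten the u and double the consonant.
PARTICIPLE = re.compile(rf"^(m{CONS})\u016b({CONS})(?=[{VOWELS}])")
# ū before two consonants: shorten only (covers the masculine forms).
CLUSTER = re.compile(rf"\u016b(?={CONS}{{2}})")


def fix_word(word: str) -> str:
    """Apply the participle rule, then the cluster rule, to one word."""
    word = unicodedata.normalize("NFC", word)
    word = PARTICIPLE.sub(r"\1u\2\2", word)
    return CLUSTER.sub("u", word)


def fix_phrase(text: str) -> str:
    """Apply fix_word to each space-separated word of a phrase."""
    return " ".join(fix_word(w) for w in text.split(" "))


if __name__ == "__main__":
    for t in ["mšūlaḥtā", "mšūliḥ lēh", "mšūrīṯā", "mṭūwlā", "mṭūwaltā"]:
        print(t, "->", fix_phrase(t))
```

Running the rule over the existing testcases would show quickly whether the shape is too broad.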
- i was tinkering w/ a test case you added and struggled, i think it's bc the first utf8 char of the assyrian was a diacritic - can you double check the correctness of the ones you just added?
- thanks for the suggestion, honestly this seems like overkill but i can definitely give it a shot if we're able to make some progress towards https://en.wiktionary.org/wiki/Category_talk:Assyrian_Neo-Aramaic_language#Prioritizing_the_creation_of_new_pages ColumbaBush (talk) 21:48, 14 June 2024 (UTC)
(Notifying ColumbaBush, ܐܢܐ, Antonklroberts, Shuraya): Why aren't you using the normal sized ʔ and ʕ? — Fenakhay (حيطي · مساهماتي) 01:59, 10 September 2024 (UTC)
- i think anton set these a while ago from the little half rings to what they currently are
- what it's currently set to is more latin'y compared to ʔ and ʕ which are more ipa'y so my guess is that it was seen as more fitting since the transliterator is going from aii to latin
- but yes you're right - now that they're constants, they can be changed quite easily ColumbaBush (talk) 02:54, 10 September 2024 (UTC)