Module talk:Ethi-translit
This module is meant to work for Amharic, Tigrinya, Ge'ez, Tigre, Harari and other languages written in the Ethiopic script.
Test: ʾämarña. Getting "Module error". --Anatoli (обсудить/вклад) 04:30, 21 January 2014 (UTC)
- Longer test: ʾämarña
ʾämarña yäʾityop̣ya mädäbäña ḳʷanḳʷa näw. käsemawi ḳʷanḳʷawoč ʾəndä ʿəbraysəṭ wäym ʿaräbña ʾändu näw. bäʾäfrika wəsṭ dägmo kämʿərab ʾäfrikaw ḥäwsana kämśəraḳ ʾäfrikaw səwahili ḳäṭlo 3ñawn bota yäyazä näw. ʾəndiyawm 62 miliyon yahl tänagariwoč ʾəyalut, ʾämarña käʿaräbña ḳäṭlo təlḳu semawi ḳʷanḳʷa näw. yämiṣafäwm bäʾämarña fidäl näw. ʾämarña käʿaräbñana käʿbəraysəṭ yaläw mäsärätawi ləyunät ʾəndä latin kägra wädä ḳäñ mäṣafu näw.
yäḥämara * gəzat täblo yämitawäḳäw bota bäʾähunu mäkakäläñana däbub wälo yəgäñ ʾəndänäbär bätarik yəṭäḳäsal. käkrəstos lədät bäfit kä200-130ʿa.ʿa. yänäbäräw ʾägatarkäs səlä ḳäy bahr ʾəna ʾäkababiw siṣf, tərogodolayt yalačäw həzboč τής Kαμάρ λέξιςα ( yäkamara Camàra ḳʷanḳʷa) wäyänməKαμάρα λέξιςα ( kamara Camàra ḳʷanḳʷa) yənagäru ʾəndänäbär zägbʷal[3]. käzih tänästäw yätäläyayu tarik ʾäṭñəwäč yäʾägatarkäs kamara ḳʷanḳʷa yäʾähunu ʾämarña wälaǧ ʾəndähonä yasrädalu.
təkkəläñaw ʾämarña ʾändande «yänguś ḳʷanḳʷa» wäym dägmo «ləsanä nəguś» bämäsäyäm tawäḳʷal. ʾämarña ləsanä nəguś yähonäw bä1272 ʿa.mə. käzagʷe śərwä mängəśt bäḫʷala ʾäṣe yəkuno ʾämlak sälomonawiwn śərwä mängəśt mälso siyaḳʷaḳum näbär. ʾämarña ləsanä ṣəḥuf mähon yäǧämäräw bä14ñaw kəflä zämän lay sihon yəhnənm yadärägäw hulunm yägʿəz fädälatn bämäwsädna 6 ʾädadis yälanḳa fidälatn (malätm šä , čä , ñä , žä , ǧä , č̣ä) ʾəna xän bämäč̣ämär näbär. nägär gən bäṣḥuf yəbälṭ mäsfafat yäǧämäräw käʾäṣe tewodros ǧämro sihon läzihm bätäläy ʾästäwaṣʾo yadärägäw ṣäḥäfiyačäw däbtära zänäb näw. ʾämarña bätäläy yätäsfafaw yädagmawi ʾäṣe mənilikn yägzat masfafat zämäča täkätlona ʾəndihum zämänawi təmhərt ʾityop̣ya wəsṭ kätäǧämärä bäḫʷala näbär.
- --Anatoli (обсудить/вклад) 05:39, 21 January 2014 (UTC)
- It does something weird around the Greek part. And it seems some of the punctuation (like ።) is not transliterated. --WikiTiki89 05:40, 21 January 2014 (UTC)
- Yes, it does. I don't know what it is. I've taken out the hyphens. It's probably OK for the native script. I've added some punctuation marks but not sure about ፠, ፦ and ፨. --Anatoli (обсудить/вклад) 05:56, 21 January 2014 (UTC)
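The punctuation handling discussed above can be illustrated with a minimal Python sketch. The mapping table and the function name `transliterate_punctuation` are assumptions made for illustration, not the module's actual table; ፠, ፦ and ፨ are deliberately passed through unchanged, since the thread leaves their treatment open.

```python
# Hypothetical mapping for illustration; the real module's table may differ.
PUNCTUATION = {
    "\u1361": " ",  # ፡ ETHIOPIC WORDSPACE
    "\u1362": ".",  # ። ETHIOPIC FULL STOP
    "\u1363": ",",  # ፣ ETHIOPIC COMMA
    "\u1364": ";",  # ፤ ETHIOPIC SEMICOLON
    "\u1365": ":",  # ፥ ETHIOPIC COLON
    "\u1367": "?",  # ፧ ETHIOPIC QUESTION MARK
}

# Marks the discussion was unsure about; this sketch leaves them untouched.
UNDECIDED = {"\u1360", "\u1366", "\u1368"}  # ፠ ፦ ፨


def transliterate_punctuation(text):
    """Replace mapped Ethiopic punctuation; pass everything else through."""
    return "".join(PUNCTUATION.get(ch, ch) for ch in text)
```

Any character without an entry, including the three undecided marks, comes out exactly as it went in, so gaps in the table stay visible in the output instead of being silently dropped.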
End schwa
For a few months now, in my editing experience since about the time of @Erutuon's March 2021 edits, we have had to manually remove a stray ə in an annoying number of cases, namely at the end of words, where it should never appear (though theoretically it was present in older Geʿez, as the regular outcome of the nominative /u/ and genitive /i/ noun endings). Maybe you missed it because you only looked at the test cases. Fay Freak (talk) 20:00, 28 July 2021 (UTC)
- @Fay Freak: Yep. I can only rely on the testcases because I know very little about Ethiopic script. — Eru·tuon 20:05, 28 July 2021 (UTC)
- I've removed the final schwa in the testcase you added. I believe this means that the final schwa will always be removed. — Eru·tuon 20:11, 28 July 2021 (UTC)
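As a rough illustration of the rule agreed on here (a word-final ə is always dropped), the following Python sketch post-processes an already transliterated string. The function name `strip_final_schwa` and the choice to work on the Latin output rather than on the Ethiopic input are assumptions; the module itself may implement this differently.

```python
import re
import unicodedata


def strip_final_schwa(translit):
    """Drop a word-final ə from an already transliterated string.

    Assumption: the input is Latin-script output in which every
    sixth-order letter has been rendered as consonant + ə.
    """
    text = unicodedata.normalize("NFC", translit)
    # Match ə only when some non-space character precedes it (so a lone ə
    # survives) and no letter or digit follows it (so it ends the word).
    return re.sub(r"(?<=\S)\u0259(?!\w)", "", text)
```

A word-internal ə is left alone because the lookahead requires that no word character follows.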
Gemination
Macron
Some dictionaries use the macron above a sign to indicate gemination, e.g. Littmann, Enno, Höfner, Maria (1962) “all such words”, in Wörterbuch der Tigrē-Sprache. Tigrē—Deutsch—Englisch (Veröffentlichungen der Orientalischen Kommission der Akademie der Wissenschaften und der Literatur; XI)[1], Wiesbaden: Franz Steiner Verlag GmbH, and before that already Francesco da Bassano’s 1918 Vocabolario tigray-italiano. Hence we could use the macron for automatic transliteration of gemination, stripping it in links. Fay Freak (talk) 20:00, 28 July 2021 (UTC)
I refer to the general reasoning outlining why a script-specific combining diacritic character has or has not been encoded, as opposed to deferring to a general-purpose combining diacritic from the Combining Diacritical Marks block. Apart from the obvious criteria of glyph shape and behaviour, that reasoning cites a supposed “own history of diacritic development” and a “specific function fundamentally unrelated to the generic diacritical mark.” This means that the macron mentioned above will never get a separate encoding, because it has the same function as the macron over Latin text, indicating length, and was presumably also transferred from there by Western scholars. That is to say, using the macron to indicate gemination in Ethiopic script is correct on the encoding side, and it agrees with existing usage in Ethiopian Semitic linguistics at least.
Bird
We read in Lipiński, Edward (2001) Semitic Languages: Outline of a Comparative Grammar (Orientalia Lovaniensia Analecta; 80), 2nd edition, Leuven: Peeters, →ISBN, page 94: “Two additional symbols indicating gemination and non-gemination are often used in traditional grammars written in Amharic. The gemination is marked by a small ṭə, an abbreviation of ṭəbq, “tight”, placed above the letter, while the non-gemination is marked by a la, an abbreviation of yälalla, “that is loose”, placed also above the letter.” I found in the wild an inverted version of ጠ (ṭä) used for this purpose in Enno Littmann, editor (1913), Publications of the Princeton Expedition to Abyssinia. Volume III: Lieder der Tigrē-Stämme: Tigrē Text.[2] (in Tigre), Leiden: E. J. Brill, pages 118–119, mentioned vol. 1 page XV.
Diaeresis
But Unicode instead has the character ◌፟ [U+135F ETHIOPIC COMBINING GEMINATION MARK], and the similar ◌̎ [U+030E COMBINING DOUBLE VERTICAL LINE ABOVE] could also be used for this purpose, while some novelist instead used a single dot, as I read in documentation about this script that seems just right for @Erutuon. It turns out these two dots were also only encoded on the strength of one author, who, granted, was probably not the first to use them, but who represents French practice rather than a native one:
I have found the script specimens used as evidence for encoding ◌፟ [U+135F ETHIOPIC COMBINING GEMINATION MARK]: it was included in the original proposal to encode the Ethiopic script (page 20), supported only by a scan of Marcel Cohen’s Traité de langue amharique, under a codepoint that was soon afterwards moved, and it was then encoded in a later version than the script itself. Half a decade later ◌፞ U+135E ETHIOPIC COMBINING VOWEL LENGTH MARK and ◌፝ U+135D ETHIOPIC COMBINING GEMINATION AND VOWEL LENGTH MARK were added, even more spuriously, on the basis of a scan of an unpublished Basketo text from a “trial project” (pictured in figure 12).
Unsurprisingly, given his French upbringing, Leslau uses the diaeresis in his Amharic textbook as of 1967.
- This seems like a very good idea! Thadh (talk) 20:15, 22 January 2022 (UTC)
- I repeat my proposal, since @Iwsfutcmd is making Gəʿəz declension tables (and @Theknightwho is fixing them). You wrote at Template:User:Iwsfutcmd/gez-decl/documentation: “The Ethiopic script underspecifies the language, not marking gemination.” By the aforementioned means it does mark it; in the few times I have stumbled upon Ethiopic since 2021 I have noted a few more instances. So the generic macron and the Ethiopic diaeresis should be handled; there is nothing unreasonable here. Fay Freak (talk) 14:50, 12 April 2024 (UTC)
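The proposal in this section (treat a generic macron or U+135F after a letter as a gemination marker, double the consonant in the transliteration, and strip the mark in link targets) could look roughly like the Python sketch below. The two-entry syllable table and the function names are hypothetical and cover only the demonstration word አለ; they are not taken from the module.

```python
import re

# U+0304 COMBINING MACRON and U+135F ETHIOPIC COMBINING GEMINATION MARK
GEMINATION_MARKS = "\u0304\u135F"

# Toy (consonant, vowel) table, hypothetical and limited to the demo word.
SYLLABLES = {
    "\u12A0": ("\u02BE", "\u00E4"),  # አ -> ʾä
    "\u1208": ("l", "\u00E4"),       # ለ -> lä
}


def strip_gemination_marks(text):
    """Remove gemination marks, e.g. before building a link target."""
    return re.sub(f"[{GEMINATION_MARKS}]", "", text)


def transliterate(text):
    """Transliterate, doubling a consonant whose letter carries a mark."""
    out = []
    for i, ch in enumerate(text):
        if ch in GEMINATION_MARKS:
            continue  # consumed together with the preceding letter
        if ch not in SYLLABLES:
            out.append(ch)  # unknown characters pass through unchanged
            continue
        consonant, vowel = SYLLABLES[ch]
        geminated = i + 1 < len(text) and text[i + 1] in GEMINATION_MARKS
        out.append(consonant * (2 if geminated else 1) + vowel)
    return "".join(out)
```

With this table, the unmarked spelling yields ʾälä and the marked one ʾällä, whichever of the two marks is used, while `strip_gemination_marks` returns the plain spelling for linking.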
Gemination recognition shall affect the output of ⟨ə⟩
If the module does something with gemination, it will of course influence the transcription of those letters which stand for a consonant either without a vowel or with schwa: in the direct neighbourhood of a geminated consonant, such a letter will not be a bare consonant but only consonant plus ə (which generally stands for /ɨ/, by the way). This holds in Semitic languages at least, which make up the bulk of the userbase of this script, since one retreats to Latin script when writing Cushitic, Omotic, Nilo-Saharan etc., in particular on Wiktionary.
Input
I have not wholly forgotten the minor point that I have not looked into how heavy users enter Ethiopic script; I suspect that the characters presented are generally not included in keyboard layouts or input methods. But there will be room to add them, and maintainers will add them without undue embarrassment, provided one supplies proof. Fay Freak (talk) 04:07, 29 July 2021 (UTC)
Initial schwa indication
@Fay Freak, Metaknowledge: I propose we replace the transcription of the initial ⟨ʾä⟩/⟨ʿä⟩ with ⟨ʾa⟩/⟨ʿa⟩. This seems consistent with both Kane's and Leslau's transliterations (although Kane omits the alephs and ayins), and it is more accurate in terms of pronunciation. However, considering the pronunciation is already deducible from the rule that initial /(ʔ)ə/ is almost nonexistent, I can see why it may not be worth the confusion between the actual written characters. What are your thoughts on this? Thadh (talk) 23:40, 2 October 2021 (UTC)
- Whoops, I forgot to mention that I propose this for Amharic, and I don't know the situation with other Ethiopic languages, whether that distinction is present there. So it's probable that this change will not actually concern the module, unless it is a script-wide issue. Thadh (talk) 00:14, 3 October 2021 (UTC)
- If we do this, we would have to split Amharic off into its own module. As I see it, the only real argument against doing this is consistency between Ethiopian Semitic languages (the same reason we use ä/a for Ge'ez instead of a/ā). —Μετάknowledgediscuss/deeds 00:27, 3 October 2021 (UTC)
- In Tigrinya, it is complicated. The vowels transcribed ä and a are very close in the first place, supposedly /ɐ/ and /a/.
- Length is not there in Tigrinya either, unlike in Tigre, where the realization seems to vary depending on which regiolect is described, and also on whether the utterance is in a sentence or isolated; see Elias, David L. (2014 May 22) The Tigre Language of Gindaʿ, Eritrea: Short Grammar and Texts (Studies in Semitic Languages and Linguistics; 75), Leiden: Brill, →ISBN, page 22. And no, schwa is not the basis but a feature of Amharic. To the confused reader: where I talked of schwas two sections above, I meant the sign used to transcribe /ɨ/, so Thadh’s section title “Initial schwa indication” is the most ironic he could come up with. In sum, the truth is that the vowels are somewhere in the lower left corner of the vowel chart, between the idealized positions represented by IPA letters …
- I have found many spelling variants in Tigrinya: People write ኣድሪ (ʾadri) apparently phonetically, while old sources only give አድሪ (ʾädri)—and so on.
- The argument could go on for any signs. Should ሥጋ (śəga) be transcribed like ስጋ (səga)? ራኅ (raḫ) like ራሕ (raḥ)? The variants of ኣዳጉራ (ʾadagura), which have hardly been pronounced differently either? Before you notice, you are confused about what gets transcribed alike where, so yes, consistency between the Ethiopian Semitic languages is more than a real argument.
- And there is this, not yet written on this page: your precious attention should not be occupied with manually providing transcriptions or transliterations either. So the suggestion is to implement the macron, so that you are not tempted to ponder mergers when you transcribe manually because of gemination. Of course, it is desirable that you give the spelling and are good to go, and conversely the transcription or transliteration should show what you have spelt (many have been saved from adding nonsense in foreign scripts by having been warned by clear transcriptions or transliterations). Fay Freak (talk) 13:14, 3 October 2021 (UTC)
- I do not think we should merge the letters for /h/ or the aleph and ayin, but using the schwa ä for the phoneme /a/ only in initial position may be confusing to readers. We could also just choose to do this manually, but as Fay Freak points out, we ought to strive towards automatic transliteration... I'm personally leaning towards splitting the Amharic transliteration off of the common Ethiopic one, which also has the upside that we could predict consonant clusters better (like, for instance, kr- is a valid cluster). I don't have the coding skills for that, though. What do you prefer? Thadh (talk) 16:46, 3 October 2021 (UTC)
- Just a matter of execution. If there are different syllabification rules between the Ethiosemitic languages, then that can be shown. One still has to work out from the materials how they are in detail for Gəʿəz, Amharic, Təgreñña and Təgre. (It scares me: if I take on an Ethiosemitic grammar now, it would draw me in for months before I am really sure.)
- It is an interesting view that it would be confusing to write ä for “the phoneme /a/ in initial position”. This is of course if you start from the phonocentrist (this time not pejorative) viewpoint that there is this phoneme that just happens to be written in that way. I of course think otherwise, that there are certain patterns, morphological reasons why there is one spelling and not the other, coming from a comparativist-Semitist view (like there is the pattern KaLM rather than KāLM), and if the view is more diachronic then there are just deviating realizations where one ideally has ä (the reflex of common Semitic short a). That there would be, instead of this view I have espoused, a phoneme a, is of course also ideal, since the type-level is ideal. Fay Freak (talk) 17:27, 3 October 2021 (UTC)
- Surely the split would be helpful for syllabification. But as FF says, the rules are complex and generally inadequately described in the literature, so it would take a great deal of work — it would be worth it, but somebody with the technical aptitude needs to take it on, and then we can start collating the rules. —Μετάknowledgediscuss/deeds 19:16, 3 October 2021 (UTC)
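To make the proposal in this section concrete, here is a small Python sketch of a language-specific post-processing step that rewrites word-initial ⟨ʾä⟩/⟨ʿä⟩ as ⟨ʾa⟩/⟨ʿa⟩ for Amharic only and leaves other languages on the shared scheme. The function name, the language-code set and the idea of doing this as a post-processing pass are all assumptions for illustration; the thread itself leans towards a separate Amharic module instead.

```python
import re
import unicodedata

# Languages that get the initial-vowel adjustment (assumption: Amharic only).
INITIAL_A_LANGS = {"am"}


def adjust_initial_vowel(translit, lang):
    """Rewrite word-initial ʾä/ʿä as ʾa/ʿa for the configured languages."""
    if lang not in INITIAL_A_LANGS:
        return translit
    text = unicodedata.normalize("NFC", translit)
    # Word-initial here means at the start of the string or after whitespace;
    # a word preceded directly by punctuation is not covered by this sketch.
    return re.sub(r"(?<!\S)([\u02BE\u02BF])\u00E4", r"\1a", text)
```

Keeping the language check in one place means the shared transliteration stays identical across the Ethiopian Semitic languages, which is the consistency argument raised above, while Amharic output can still diverge where the discussion wants it to.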