Module talk:kk-translit
@Vtgnoq7238rmqco, Metaknowledge, Rua: We may want to update our romanisation to match the latest standard (2018), which is an improvement on the 2017 one. I see that Metaknowledge has reverted the change in diff. Why? I think it may be appropriate. The digraphs shouldn't cause many issues. --Anatoli T. (обсудить/вклад) 00:43, 26 June 2019 (UTC)
- To be honest, I think the new transliteration is awful, but that's my personal opinion so I'm going to otherwise abstain. —Rua (mew) 08:09, 26 June 2019 (UTC)
- @Rua: Thanks, it's ugly but not as ugly as the 2017 version. Anatoli T. (обсудить/вклад) 20:51, 26 June 2019 (UTC)
It seems unwise to make haste to update Kazakh romanisation because the standard may change fairly frequently. We never know if new changes will have taken place before 2025. Vtgnoq7238rmqco (talk) 16:28, 26 June 2019 (UTC)
- @Vtgnoq7238rmqco: You’re not assuming that I’m going to replace, are you? It’s about the automated transliteration. Anatoli T. (обсудить/вклад) 20:51, 26 June 2019 (UTC)
- @Atitarev: I do not recommend the change because some letters merge after romanisation (һ/х and capital Й/И), while the transliteration of a few letters (ъ, ь, э, ц, etc.) does not seem to be specified in the new standard. Vtgnoq7238rmqco (talk) 23:26, 26 June 2019 (UTC)
- @Vtgnoq7238rmqco: OK, no problem, please keep up the good work. BTW, you have misused the meaning of alphabet, a very common mistake in various parts of Asia where "letter" and "alphabet" may be the same word. They are called letters, symbols or characters in English. The alphabet is the whole set of letters and the writing system of a language, not individual letters. --Anatoli T. (обсудить/вклад) 23:31, 26 June 2019 (UTC)
- @Vtgnoq7238rmqco: I noticed the mistakes and they were rectified. Actually "letter" and "alphabet" are different words in Chinese. Vtgnoq7238rmqco (talk) 23:34, 26 June 2019 (UTC)
- @Vtgnoq7238rmqco: You're right, well almost, words 文字 (wénzì) and 字母 (zìmǔ) can mean or be used as both. It is the same word in Thai, Lao, Burmese, etc. and somehow spread in other areas. --Anatoli T. (обсудить/вклад) 23:43, 26 June 2019 (UTC)
- @Atitarev: The word "alphabet" means "a set of letters", which is translated as 字母表 (zìmǔbiǎo) in Chinese. Vtgnoq7238rmqco (talk) 23:55, 26 June 2019 (UTC)
- @Vtgnoq7238rmqco: Yes, the distinction is there, as it should be. I have heard the confusion between letter and alphabet much less often from Chinese people, but it's more common, even among language teachers, in India, Thailand, etc. --Anatoli T. (обсудить/вклад) 23:59, 26 June 2019 (UTC)
New schema
@Vtgnoq7238rmqco, Benwing2: Guys. I've gone ahead and applied the known changes per current transliteration schema. Are there any issues? Let's monitor this. Anatoli T. (обсудить/вклад) 08:19, 12 July 2022 (UTC)
- @Vtgnoq7238rmqco, Benwing2: I need your help, please, to make the digraph "нг" become "ñ", as in тренинг (treniñ). --Anatoli T. (обсудить/вклад) 08:22, 12 July 2022 (UTC)
- @Atitarev: I forgot to mention that the letter ‘я’ has two transliterations. One is the digraph ‘ia’, based on vowel harmony (e.g. саясат/saiasat); the other is the letter ‘ä’ (mostly in Russian loanwords). Vtgnoq7238rmqco (talk) 11:54, 12 July 2022 (UTC)
- @Vtgnoq7238rmqco, Benwing2: This will make it harder or impossible to fully automate the transliteration. What's the default or most common transliteration of "я"? Is it "ia" or "ä"? Can it be determined by a position, e.g. "ä" only in endings like "-ия" (+ inflected forms?)? Anatoli T. (обсудить/вклад) 23:26, 12 July 2022 (UTC)
- @Vtgnoq7238rmqco, Atitarev: Failing that, is there a limited set of words with one or the other? If so, we could hardcode those words, as we do for handling g vs. v in -ого/-его endings in Russian. Benwing2 (talk) 02:21, 13 July 2022 (UTC)
- @Atitarev Do you mean for тренинг to end up transliterated as treniñ? That looks very strange to me, but if this is correct I will implement it. Benwing2 (talk) 03:18, 13 July 2022 (UTC)
- @Benwing2: Yes, please take a look at https://sozdik.kz/ru/dictionary/translate/ru/kk-Latn/тренинг/. However, when a vowel is added, they change to "ng"! @Vtgnoq7238rmqco: Do you understand the rule here? Or is it a mistake? E.g. маркетинг-> marketiñ > bank marketingı (Cyrillic is банк маркетингі???). --Anatoli T. (обсудить/вклад) 04:00, 13 July 2022 (UTC)
- @Benwing2, Vtgnoq7238rmqco: I think we may need Module:kk-translit/testcases to start looking at multiple issues and questions. Just checked that "кенгуру" is just "kenguru", confirming that нг=ñ is only true in the final or pre-consonantal positions. --Anatoli T. (обсудить/вклад) 04:17, 13 July 2022 (UTC)
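The positional rule described here (нг becomes ñ word-finally or before a consonant, and stays ng before a vowel) can be sketched in Python as a single substitution. This is only an illustration, not the module's Lua code: the vowel set `KK_VOWELS` and the function name `romanize_ng` are assumptions made for this example, only lowercase input is handled, and the output is an intermediate string in which the remaining Cyrillic letters are still untransliterated.

```python
import re

# Lowercase Kazakh Cyrillic vowel letters (an assumption for this sketch;
# у and и are treated as vowels here).
KK_VOWELS = "аәеёиоөуұүыіэюя"


def romanize_ng(text: str) -> str:
    """Replace the digraph нг with ñ unless a vowel letter follows it.

    The negative lookahead covers the end of the string, whitespace,
    punctuation and consonants, i.e. the word-final and pre-consonantal
    positions. Before a vowel the digraph is left alone so that a later
    step can render it as "ng".
    """
    return re.sub(rf"нг(?![{KK_VOWELS}])", "ñ", text)


if __name__ == "__main__":
    for word in ["тренинг", "маркетинг", "банк маркетингі", "кенгуру"]:
        print(word, "->", romanize_ng(word))
```

Running this step before the one-to-one letter table would mean the table never sees a word-final нг, while маркетингі and кенгуру pass through unchanged.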
- @Benwing2: It may be difficult for User:Vtgnoq7238rmqco to answer this. If you are able to make a list of all current Kazakh words with "я" (probably other contentious letters), we can start assessing.
- ядро/ädro ("nucleus", from Russian ядро́ (jadró))
- яғни/iağni ("that is", from Arabic يَعْنِي (yaʕnī)) Anatoli T. (обсудить/вклад) 03:21, 13 July 2022 (UTC)
- @Atitarev: I found that ‘-ия’ should be transliterated into ‘-ia’ when I checked the word автономия/avtonomia. Vtgnoq7238rmqco (talk) 16:05, 23 July 2022 (UTC)
- @Vtgnoq7238rmqco, Benwing2: I've just added a new failed test case. Please note that -ия is retained in inflected forms (e.g. автономиясы/avtonomiasy).
- Unfortunately, we are not getting closer to any solution. We need to define when to use "ä" or "ia" for "я".
- The module won't be smart enough to know when to transliterate "я" as "ä" and when as "ia". We must describe the rules or consider exception lists, if that's possible. User:Benwing2 is advanced in Lua. He might be able to make it work if the rules are clarified. Please ping him in responses as well.
- The situation with "нг" seems more straightforward, though.
- Please also advise if you prefer to fully or partially revert to the previous transliteration schema. Anatoli T. (обсудить/вклад) 02:42, 24 July 2022 (UTC)
- @Atitarev, Vtgnoq7238rmqco Thank you. I made a list here of pages with я: User:Benwing2/kazakh-pages-ya Out of 8,309 lemmas, 604 have я in them. On the same page farther down is a list that excludes pages ending in -ия. That reduces the list to 308. If I additionally exclude -иялы, -иялық and -ияшы, which appear to be common suffixes, the list is reduced to 205. Of these, the majority appear to be Russian loanwords, but a lot of them still appear to be Kazakh native words. This suggests to me that it will be very difficult to handle this correctly using the new transliteration (which is really a transcription, trying to match the pronunciation), and that we're probably better off using the old scheme. Benwing2 (talk) 03:01, 24 July 2022 (UTC)
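The narrowing-down described above (keep lemmas containing я, then drop those ending in -ия, -иялы, -иялық or -ияшы) can be sketched as a simple filter. This is a hypothetical illustration, not the script actually used to build User:Benwing2/kazakh-pages-ya: the function name `needs_manual_review` and the tiny sample list, made up of words quoted in this discussion, are assumptions.

```python
# Suffixes excluded in the counts above; words with these endings are
# set aside because they follow the predictable -ия pattern.
EXCLUDED_SUFFIXES = ("ия", "иялы", "иялық", "ияшы")


def needs_manual_review(lemmas):
    """Return lemmas that contain я but do not end in an excluded suffix."""
    return [
        word
        for word in lemmas
        if "я" in word and not word.endswith(EXCLUDED_SUFFIXES)
    ]


if __name__ == "__main__":
    sample = ["автономия", "ядро", "яғни", "астрономиялық", "саясат", "тренинг"]
    print(needs_manual_review(sample))
```

On the sample list only ядро, яғни and саясат remain, which is the kind of residue that would still need a rule or an exception entry.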
@Sameerhameedy, @Vtgnoq7238rmqco, @Benwing2:
Hi. @Sameerhameedy, I saw your post, which you later deleted, so I'm posting here. The Kazakh transliteration (Cyrillic to Roman) is currently flawed and doesn't cater, for example, for the multiple ways to render the letter "я" or the digraph "нг".
Pls see Module_talk:kk-translit#New_schema above. Please don't rely on automatic transliterations in all cases.
It works OK in ілгерішіл (ılgerışıl), which you used in your original post. Anatoli T. (обсудить/вклад) 02:17, 25 September 2023 (UTC)
- @Atitarev so, if this module isn't reliable should template:kk-alt have a separate module to generate the Kazakh latin spellings then? سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 02:29, 25 September 2023 (UTC)
- @Sameerhameedy:
- Kazakh Cyrillic to Latin transliteration requires work, handling exceptions, special cases, using test cases, etc.
- The older transliteration schema was straightforward (1 to 1) but the new, based on the government's new initiative, is less so.
- I don't object if it's totally reverted to the old schema; at least it was reliable, but it won't match the new schema and the dictionary https://sozdik.kz/ru/ (if you choose русский-qazaqşa or qazaqşa-русский). Anatoli T. (обсудить/вклад) 02:29, 25 September 2023 (UTC)
- @Atitarev My instinct is to use the old schema. It seems the government's plans change frequently (cf. the various names of the capital) so who knows how long this current scheme will last. Cf. this comment:
- On October, 26 2017, the president of Kazakhstan, Nursultan Nazarbayev, has passed a decree to make effective the switchover of Khazakh from the Cyrillic script to the Latin alphabet by 2025. However, the proposed system is not unambiguous: some Latin characters can be represented by several Cyrillic characters (for instance, и and й are both transliterated by i’), and other Cyrillic characters have no equivalent in Latin alphabet, like ц or щ.
- Benwing2 (talk) 02:50, 25 September 2023 (UTC)
- @Benwing2: Thank you. I actually see that failing cases are somewhat easy to fix, at least logically:
- нг - "ng" vs "ñ". "ñ" only in final or pre-consonantal position. Otherwise "ng". маркетинг "marketiñ" but банк маркетингі "bank marketingı"
- я - "ä" vs "ia". "ia" after vowels, including word-initial. Otherwise "ä". "ия" always "ia". ядро "ädro", аялдама "aialdama", астрономиялық "astronomialyq".
- яғни as "iağni" is an exception.
- Russian ц, щ, й, ь, ъ and ё work as expected: "s", "ş", "i", nothing, nothing and "e". They have merged with equivalent letters in Latin spellings or are not used. EDIT: щ should be "ş", not "şş"
- I've also checked User:Benwing2/kazakh-pages-ya, thank you.
- @Sameerhameedy, @Vtgnoq7238rmqco, @Theknightwho: FYI. Anatoli T. (обсудить/вклад) 03:45, 25 September 2023 (UTC)
- Module:kk-translit/testcases is quite representative. яғни/iağni is an exception I know of. Anatoli T. (обсудить/вклад) 03:50, 25 September 2023 (UTC)
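The rules for я proposed above can be sketched in Python as an ordered set of substitutions plus a small exception table. This is a hedged illustration rather than the module's code: it follows the worked examples given (ядро "ädro", аялдама "aialdama", астрономиялық "astronomialyq"), so word-initial я is treated as "ä" and яғни is handled as a listed exception. The names `KK_VOWELS`, `YA_EXCEPTIONS`, `romanize_ya_word` and `romanize_ya` are assumptions, only lowercase input is handled, and the output is an intermediate string whose other letters are still Cyrillic.

```python
import re

# Lowercase Kazakh Cyrillic vowel letters (an assumption for this sketch).
KK_VOWELS = "аәеёиоөуұүыіэюя"

# Whole-word exceptions; яғни is the only one named in the discussion.
YA_EXCEPTIONS = {"яғни": "iaғни"}


def romanize_ya_word(word: str) -> str:
    """Rewrite я inside a single lowercase word."""
    if word in YA_EXCEPTIONS:
        return YA_EXCEPTIONS[word]
    # 1. ия is always "ia": the и is absorbed into the digraph.
    word = word.replace("ия", "ia")
    # 2. я directly after a vowel letter becomes "ia".
    word = re.sub(rf"(?<=[{KK_VOWELS}])я", "ia", word)
    # 3. Any я left over (word-initial or after a consonant) becomes "ä".
    return word.replace("я", "ä")


def romanize_ya(text: str) -> str:
    """Apply romanize_ya_word to every word in a text."""
    return re.sub(r"\w+", lambda m: romanize_ya_word(m.group(0)), text)


if __name__ == "__main__":
    for word in ["ядро", "аялдама", "астрономиялық", "автономиясы", "яғни"]:
        print(word, "->", romanize_ya(word))
```

The order matters: ия is handled first so that the и is not also emitted as a separate "i", and the exception lookup runs before any rule so that listed words bypass the positional logic entirely.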
- ё had better be transliterated as "io" instead of "e". The letter is pronounced [ʲɵ] or [ʲo] depending on vowel harmony.
- Its diaeresis is usually omitted, but there are exceptions when transliterating Korean proper names e.g. 명 (myeong) --> Мёң.
- Vtgnoq7238rmqco (talk) 04:01, 25 September 2023 (UTC)
- @Vtgnoq7238rmqco: Probably OK for "ё" as "io", since all loanwords are spelled with "ё" to match the pronunciation.
- Anatoli T. (обсудить/вклад) 04:09, 25 September 2023 (UTC)
- Actually, щ (ş) should be "ş", not "şş". щетка should be "şetka". --Anatoli T. (обсудить/вклад) 04:13, 25 September 2023 (UTC)
- @Theknightwho: Hi. Do you know why щетка (şetka) shows "şşetka" (with two "ş") but it shows one "ş" in Module:kk-translit/testcases?
- @Vtgnoq7238rmqco: Hi. Do you know many cases when "ё" should be "io", not "e"? My understanding is that all Russian borrowings use "e" and that it matches the pronunciation; it may only be important for rendering loanwords like the Мёң you mentioned.
- @Benwing2, @Sameerhameedy, @Theknightwho: Guys, you have the skills to address the failing cases in Module:kk-translit/testcases and any others that may arise. It's even simpler than the handling of Cyrillic е, ё and ч in Russian (based on positions or exceptions). We can always revert to the original schema but it may be useless now. Anatoli T. (обсудить/вклад) 23:47, 26 September 2023 (UTC)
- @Atitarev It’s using the scheme in the languages module - this module’s unused I think. Theknightwho (talk) 00:22, 27 September 2023 (UTC)
- @Theknightwho: Thanks but sorry I don't understand. Which module should be modified for Kazakh translit? Anatoli T. (обсудить/вклад) 00:42, 27 September 2023 (UTC)
- @Atitarev I think the transliteration issue with the word маркетинг (marketiñ) in the test cases could be fixed by adding:
- text = mw.ustring.gsub(text, "нг([^" .. vowels .. "])", "ñ%1") and
text = mw.ustring.gsub(text, "нг$", "ñ") (though obviously after adding a local vowels group)
However, I can't add it myself or even preview the test cases to see if that would work, because the page is protected.
The faulty transliterations of я can probably be fixed in a similar way. ё seems complicated (why is it usually the same as е?) and I'm not sure what the issue is with ч. As for the issue with Module:kk-translit/testcases that you're asking about, you should ask Theknightwho and Ben because I don't see any issues. سَمِیر | Sameer (مشارکتها • کتی من گپ بزن) 00:28, 27 September 2023 (UTC)
- @Sameerhameedy: Thanks, I don't know why it was protected that way. I reduced the protection but see what @Theknightwho says, it may not be even the right module.
- Cyrillic "ё" in Kazakh was mostly used in borrowings from Russian. In Russian, adult native speakers usually don't use this letter unless they need to disambiguate, so "е" is written instead but pronounced as "ё". Kazakh follows this but to the extent that it also affects the pronunciation, at least as far as I can tell. The new romanisation always uses "e" for loanwords from Russian and it's a phonetic spelling in Kazakh.
- I used "ч" as an example for handling groups of words and exceptions in Russian, not Kazakh. You may try using it as a reference, if it helps. In Russian, ч (č) is read as ш (š), and г (g) is read as в (v), in certain combinations and groups of words. Anatoli T. (обсудить/вклад) 00:51, 27 September 2023 (UTC)
- @Atitarev User:Theknightwho is correct in that this module is unused currently. The transliteration of Kazakh is currently specified inline in the languages module, starting here: Module:languages/data/2#L-1083. I think we should not adopt the new transliteration because of its problems and the fact that it's not yet put into practice and may well change (it seems to have been created by the former Kazakh dictator Nursultan Nazarbayev, who has since fallen out of favor). In fact, see w:Kazakh alphabets#Latest developments, which documents lots of recent changes and the fact that the newest revision won't be put into practice until 2030. Benwing2 (talk) 01:33, 27 September 2023 (UTC)