User talk:Conrad.Irwin/Transliterator.php
- Many more details can be found at http://mediawiki.org/wiki/Extension:Transliterator.
This is a PHP extension designed to allow automatic transliteration where this is possible. I am told that it is possible for many languages: certainly for Armenian, Korean, and Greek, and Serbian/Serbo-Croatian should also be possible. For languages where it is not automatically possible, tough cookies.
The approach is very simplistic: it is supplied with a list of rules (those for Armenian are attached), and it transliterates by matching rules from longest to shortest against the string.
To use the extension, you simply call "{{transliterate:amn|խաչակրաց արշավանք}}" and it will give you "xačakrac’ aršavank’". The extension has been designed to be used in generic templates, so it can be used without requiring any form of {{#ifexist:}} check (internally, it checks which rules exist much more efficiently).
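The longest-to-shortest matching described above can be sketched roughly as follows. This is an illustration in Python rather than the extension's own PHP; the function name `transliterate` and the choice to copy unmatched characters through unchanged are assumptions, not taken from the extension.

```python
def transliterate(text, rules):
    """Greedy longest-match transliteration (illustrative sketch).

    `rules` maps source strings to replacements. Characters that match
    no rule are copied through unchanged, which is an assumption here.
    """
    max_len = max((len(src) for src in rules), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in rules:
                out.append(rules[chunk])
                i += length
                break
        else:
            # No rule matched at this position.
            out.append(text[i])
            i += 1
    return "".join(out)


# A tiny excerpt of the Armenian rules listed on this page.
ARMENIAN_SAMPLE = {"ո": "o", "ւ": "w", "ու": "u", "զ": "z", "բ": "b"}

print(transliterate("ուզբ", ARMENIAN_SAMPLE))
```

Because the two-letter rule "ու" is tried before the single letters "ո" and "ւ", the digraph comes out as "u" rather than "ow".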
For example, a template like term might use the form [[{{{word}}}]]{{transliteration:{{{lang}}}|{{{word}}}| ($1)|{{{tr|}}}}}. This would give the following:
- {{term|fr|Ouzbékistan}} => Ouzbékistan
- {{term|amn|Ուզբեկստան}} => Ուզբեկստան (Uzbekstan)
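The behaviour of the two examples above can be modelled with a short Python sketch. Everything named here (`RULES_BY_LANG`, `apply_rules`, `transliteration`, `term`) is hypothetical, and reading the fourth argument `{{{tr|}}}` as a manual override is an assumption. The sketch only models the displayed text, not the wiki link.

```python
# Hypothetical rule store: only "amn" has rules, mirroring the example above.
RULES_BY_LANG = {
    "amn": {
        "Ու": "U", "զ": "z", "բ": "b", "ե": "e", "կ": "k",
        "ս": "s", "տ": "t", "ա": "a", "ն": "n",
    },
}


def apply_rules(text, rules):
    """Greedy longest-match replacement; unmatched characters pass through."""
    longest = max(len(src) for src in rules)
    out, i = [], 0
    while i < len(text):
        for length in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + length] in rules:
                out.append(rules[text[i:i + length]])
                i += length
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def transliteration(lang, word, fmt="$1", override=""):
    """Return the formatted transliteration, or "" if the language has no rules."""
    if override:
        # Assumed meaning of the fourth argument: a manual transliteration wins.
        return fmt.replace("$1", override)
    rules = RULES_BY_LANG.get(lang)
    if not rules:
        return ""
    return fmt.replace("$1", apply_rules(word, rules))


def term(lang, word, tr=""):
    """Model of the example template: the word, then an optional " (translit)"."""
    return word + transliteration(lang, word, " ($1)", tr)


print(term("fr", "Ouzbékistan"))
print(term("amn", "Ուզբեկստան"))
```

The point of the design is that the template never has to ask whether rules exist: a language without rules simply contributes an empty string.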
The syntax for defining rules is fairly simple, and rules can be specified in terms of either letters or NFD code points. Most languages should use the letters form, so that "á" and "a" are different, unrelated characters; for some languages, however, such as Korean, it is useful to be able to analyze the decomposed form.
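To make the difference between letters and NFD code points concrete, here is a small Python illustration using the standard `unicodedata` module. The helper name `code_points` is only for this example.

```python
import unicodedata


def code_points(s):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in s]


composed = "\u00e1"  # "á" as a single letter
decomposed = unicodedata.normalize("NFD", composed)

print(code_points(composed))    # one code point: the letter itself
print(code_points(decomposed))  # two code points: "a" plus a combining acute

# A precomposed Hangul syllable splits into its jamo under NFD.
syllable = "\ud55c"  # the syllable "han"
print(code_points(unicodedata.normalize("NFD", syllable)))
```

In the letters form, a rule for "a" never fires on "á". In the decomposed form, the Korean syllable becomes three separate jamo (initial, vowel, final), each of which can have its own rule.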
Rules for transliteration of Armenian (Hübschmann-Meillet) (hy)
# lowercase ա => a բ => b գ => g դ => d ե => e զ => z է => ē ը => ə թ => tʿ ժ => ž ի => i լ => l խ => x ծ => c կ => k հ => h ձ => j ղ => ł ճ => č մ => m յ => y ն => n շ => š ո => o չ => čʿ պ => p ջ => ǰ ռ => ṙ ս => s վ => v տ => t ր => r ց => cʿ ւ => w ու => u փ => pʿ ք => kʿ և => ew օ => ō ֆ => f
# uppercase Ա => A Բ => B Գ => G Դ => D Ե => E Զ => Z Է => Ē Ը => Ə Թ => Tʿ Ժ => Ž Ի => I Լ => L Խ => X Ծ => C Կ => K Հ => H Ձ => J Ղ => Ł Ճ => Č Մ => M Յ => Y Ն => N Շ => Š Ո => O Չ => Čʿ Պ => P Ջ => J̌ Ռ => Ṙ Ս => S Վ => V Տ => T Ր => R Ց => Cʿ Ւ => W Ու => U ՈՒ => U Փ => Pʿ Ք => Kʿ Օ => Ō Ֆ => F
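A rule list of this shape could be read into a lookup table along the following lines. This is a sketch under stated assumptions: it supposes one rule per line (the lists on this page appear run together), treats lines starting with `#` as comments, and uses the hypothetical name `parse_rules`; the extension's real parser may differ.

```python
def parse_rules(source):
    """Parse "source => replacement" lines into a dict (illustrative sketch)."""
    rules = {}
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        src, sep, dst = line.partition("=>")
        if not sep:
            continue  # not a rule line
        # Whitespace around "=>" is ignored, so "ու=> u" parses like "ու => u".
        rules[src.strip()] = dst.strip()
    return rules


SAMPLE = """\
# lowercase
ա => a
բ => b
ու=> u
"""

print(parse_rules(SAMPLE))
```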
Rules for transliteration of Belarusian (Scientific transliteration) (be)
# lowercase а => a б => b в => v г => h ґ => g д => d е => e ё => ë ж => ž з => z і => i й => j к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ў => ŭ ф => f х => x ц => c ч => č ш => š ы => y ь => ’ э => è ю => ju я => ja ѣ => ě
# uppercase А => A Б => B В => V Г => H Ґ => G Д => D Е => E Ё => Ë Ж => Ž З => Z І => I Й => J К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ў => Ŭ Ф => F Х => X Ц => C Ч => Č Ш => Š Ы => Y Ь => ’ Э => È Ю => Ju Я => Ja Ѣ => Ě
Rules for transliteration of Bulgarian (Scientific transliteration) (bg)
# lowercase а => a б => b в => v г => g д => d е => e ж => ž з => z и => i й => j к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ф => f х => h ц => c ч => č ш => š щ => št ъ => ǎ ь => j ю => ju я => ja ѫ => ǫ ѣ => ě ѧ => ę
# uppercase А => A Б => B В => V Г => G Д => D Е => E Ж => Ž З => Z И => I Й => J К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ф => F Х => H Ц => C Ч => Č Ш => Š Щ => Št Ъ => Ǎ Ь => J Ю => Ju Я => Ja Ѫ => Ǫ Ѣ => Ě Ѧ => Ę
Rules for transliteration of Georgian (ISO 9984) (ka)
# lowercase ა => a ბ => b გ => g დ => d ე => e ვ => v ზ => z თ => t’ ი => i კ => k ლ => l მ => m ნ => n ო => o პ => p ჟ => ž რ => r ს => s ტ => t უ => u ფ => p’ ქ => k’ ღ => ḡ ყ => q შ => š ჩ => č’ ც => c’ ძ => j წ => c ჭ => č ხ => x ჯ => ǰ ჰ => h
Rules for transliteration of Gothic (got)
𐌰 => a 𐌱 => b 𐌲 => g 𐌳 => d 𐌴 => e 𐌵 => q 𐌶 => z 𐌷 => h 𐌸 => þ 𐌹̈ => ï 𐌹 => i 𐌺 => k 𐌻 => l 𐌼 => m 𐌽 => n 𐌾 => j 𐌿 => u 𐍀 => p 𐍁 => 90 𐍂 => r 𐍃 => s 𐍄 => t 𐍅 => w 𐍆 => f 𐍇 => x 𐍈 => ƕ 𐍉 => o 𐍊 => 900
Rules for transliteration of Greek (el)
α => a ά => á αι => ai άι => ai αϊ => ai αυ => av αυθ => afth αυκ => afk αυξ => afx αυπ => afp αυσ => afs αυς => afs αυτ => aft αυφ => aff αυχ => afch αυψ => afps αυ$ => af αύ => áv αύθ => áfth αύκ => áfk αύξ => áfx αύπ => áfp αύσ => áfs αύς => áfs αύτ => áft αύφ => áff αύχ => áfch αύψ => áfps αύ$ => áf άυ => áy αϋ => aÿ β => v γ => g γγ => ng γξ => nx γκ => gk γχ => nch δ => d ε => e έ => é ει => ei έι => ei εϊ => ei ευ => ev ευθ => efth ευκ => efk ευξ => efx ευπ => efp ευσ => efs ευς => efs ευτ => eft ευφ => eff ευχ => efch ευψ => efps ευ$ => ef εύ => év εύθ => éfth εύκ => éfk εύξ => éfx εύπ => éfp εύσ => éfs εύς => éfs εύτ => éft εύφ => éff εύχ => éfch εύψ => éfps εύ$ => éf έυ => éy εϋ => eÿ ζ => z η => i ή => í ηυ => iv ηυθ => ifth ηυκ => ifk ηυξ => ifx ηυπ => ifp ηυσ => ifs ηυς => ifs ηυτ => ift ηυφ => iff ηυχ => ifch ηυψ => ifps ηυ$ => if ηύ => ív ηύθ => ífth ηύκ => ífk ηύξ => ífx ηύπ => ífp ηύσ => ífs ηύς => ífs ηύτ => íft ηύφ => íff ηύχ => ífch ηύ$ => íf ήυ => íy ηϋ => iÿ θ => th ι => i ί => í ϊ => ï ΐ => í κ => k λ => l μ => m ^μπ => b μπ => mp ν => n ντ => nt ξ => x ο => o ό => ó οι => oi όι => oi οϊ => oi ου => ou όυ => óy οϋ => oÿ π => p ρ => r σ => s ς => s τ => t υ => y ύ => ý ϋ => ÿ ΰ => ý υι => yi φ => f χ => ch ψ => ps ω => o ώ => ó
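Several Greek rules carry `^` or `$` (for example `^μπ => b` and `αυ$ => af`). The page does not say what these mean; the sketch below assumes `^` anchors a rule to the start of a word and `$` to the end, by analogy with regular expressions. The function name and the preference for an anchored rule over a plain rule of the same length are likewise assumptions.

```python
def transliterate_anchored(word, rules):
    """Longest-match transliteration of one word, honouring ^ and $ (sketch)."""
    max_len = max(len(key.strip("^$")) for key in rules)
    out, i = [], 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            # Anchored forms are only candidates at the word boundaries.
            candidates = []
            if i == 0:
                candidates.append("^" + chunk)
            if i + length == len(word):
                candidates.append(chunk + "$")
            candidates.append(chunk)
            hit = next((c for c in candidates if c in rules), None)
            if hit is not None:
                out.append(rules[hit])
                i += length
                break
        else:
            out.append(word[i])  # no rule matched; copy the character
            i += 1
    return "".join(out)


# A small excerpt of the Greek rules above.
GREEK_SAMPLE = {
    "^μπ": "b", "μπ": "mp", "α": "a", "υ": "y",
    "αυ": "av", "αυ$": "af", "τ": "t", "λ": "l",
}

for word in ["μπα", "αμπα", "ταυ", "αυλα"]:
    print(word, "=>", transliterate_anchored(word, GREEK_SAMPLE))
```

Under these assumptions, "μπ" becomes "b" only at the start of a word and "mp" elsewhere, and "αυ" becomes "af" only at the end of a word.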
Rules for transliteration of Kazakh (QazAqparat) (kk)
# lowercase а => a ә => ä б => b в => v г => g ғ => ğ д => d е => e ё => yo ж => j з => z и => ï й => y к => k қ => q л => l м => m н => n ң => ñ о => o ө => ö п => p р => r с => s т => t у => w ұ => u ү => ü ф => f х => x һ => h ц => c ч => ç ш => ş щ => şş ъ => ” ы => ı і => i ь => ’ э => é ю => yu я => ya
# uppercase А => A Ә => Ä Б => B В => V Г => G Ғ => Ğ Д => D Е => E Ё => Yo Ж => J З => Z И => Ï Й => Y К => K Қ => Q Л => L М => M Н => N Ң => Ñ О => O Ө => Ö П => P Р => R С => S Т => T У => W Ұ => U Ү => Ü Ф => F Х => X Һ => H Ц => C Ч => Ç Ш => Ş Щ => Şş Ъ => ” Ы => I І => I Ь => ’ Э => É Ю => Yu Я => Ya
Beginnings of rules for Korean (revised romanization published in 2000) (ko)
# Single letters taken from http://cpansearch.perl.org/src/KAWASAKI/Lingua-KO-Romanize-Hangul-0.20/lib/Lingua/KO/Romanize/Hangul.pm
# It needs some special cases for certain adjacent characters, but I cannot decipher the documentation, and the Perl code above
# seems to replace characters only in circumstances in which they can't appear.
<decompose>
# initial ᄀ => g ᄁ => kk ᄂ => n ᄃ => d ᄄ => tt ᄅ => r ᄆ => m ᄇ => b ᄈ => pp ᄉ => s ᄊ => ss ᄋ => ᄌ => j ᄍ => jj ᄎ => ch ᄏ => k ᄐ => t ᄑ => p ᄒ => h
# Vowel ᅡ => a ᅢ => ae ᅣ => ya ᅤ => yae ᅥ => eo ᅦ => e ᅧ => yeo ᅨ => ye ᅩ => o ᅪ => wa ᅫ => wae ᅬ => oe ᅭ => yo ᅮ => u ᅯ => wo ᅰ => we ᅱ => wi ᅲ => yu ᅳ => eu ᅴ => ui ᅵ => i
# Final
# This first character seems to indicate "no tail" rather than exist as a character.
ᆧ => ᆨ => g ᆩ => kk ᆪ => ks ᆫ => n ᆬ => nj ᆭ => nh ᆮ => d ᆯ => r ᆰ => rg ᆱ => rm ᆲ => rb ᆳ => rs ᆴ => rt ᆵ => rp ᆶ => rh ᆷ => m ᆸ => b ᆹ => bs ᆺ => s ᆻ => ss ᆼ => ng ᆽ => j ᆾ => c ᆿ => k ᇀ => t ᇁ => p ᇂ => h
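As a rough illustration of how decomposed rules like these might be applied, the Python sketch below normalizes text to NFD and maps each jamo through a small subset of the table above. The names `JAMO_RULES` and `romanize` are hypothetical, and the sketch makes no attempt at the special cases for adjacent characters mentioned in the comments.

```python
import unicodedata

# A small subset of the jamo rules above, keyed by NFD code point.
JAMO_RULES = {
    "\u1100": "g",   # initial KIYEOK
    "\u1112": "h",   # initial HIEUH
    "\u1161": "a",   # vowel A
    "\u1173": "eu",  # vowel EU
    "\u11ab": "n",   # final NIEUN
    "\u11af": "r",   # final RIEUL
}


def romanize(text):
    """Decompose to NFD, then replace each jamo that has a rule."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(JAMO_RULES.get(ch, ch) for ch in decomposed)


print(romanize("\ud55c\uae00"))  # the two syllables of "hangeul"
```

With only the per-jamo rules, this prints "hangeur", whereas the Revised Romanization writes the word as "hangeul". That gap is an example of why context-dependent special cases are still needed.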
Rules for transliteration of Macedonian (ISO/R 9:1968) (mk)
# lowercase а => a б => b в => v г => g д => d ѓ => ǵ е => e ж => ž з => z ѕ => dz и => i ј => j к => k л => l љ => lj м => m н => n њ => nj о => o п => p р => r с => s т => t ќ => ḱ у => u ф => f х => h ц => c ч => č џ => dž ш => š
# uppercase А => A Б => B В => V Г => G Д => D Ѓ => Ǵ Е => E Ж => Ž З => Z Ѕ => Dz И => I Ј => J К => K Л => L Љ => Lj М => M Н => N Њ => Nj О => O П => P Р => R С => S Т => T Ќ => Ḱ У => U Ф => F Х => H Ц => C Ч => Č Џ => Dž Ш => Š
Rules for transliteration of Old Armenian (Hübschmann-Meillet) (xcl)
# lowercase ա => a բ => b գ => g դ => d ե => e զ => z է => ē ը => ə թ => tʿ ժ => ž ի => i լ => l խ => x ծ => c կ => k հ => h ձ => j ղ => ł ճ => č մ => m յ => y ն => n շ => š ո => o չ => čʿ պ => p ջ => ǰ ռ => ṙ ս => s վ => v տ => t ր => r ց => cʿ ւ => w ու => u փ => pʿ ք => kʿ և => ew օ => ō ֆ => f
# uppercase Ա => A Բ => B Գ => G Դ => D Ե => E Զ => Z Է => Ē Ը => Ə Թ => Tʿ Ժ => Ž Ի => I Լ => L Խ => X Ծ => C Կ => K Հ => H Ձ => J Ղ => Ł Ճ => Č Մ => M Յ => Y Ն => N Շ => Š Ո => O Չ => Čʿ Պ => P Ջ => J̌ Ռ => Ṙ Ս => S Վ => V Տ => T Ր => R Ց => Cʿ Ւ => W Ու => U ՈՒ => U Փ => Pʿ Ք => Kʿ Օ => Ō Ֆ => F
Rules for transliteration of Old Church Slavonic, Cyrillic and Glagolitic (cu)
# Cyrillic а => a А => A б => b Б => B в => v В => V г => g Г => G д => d Д => D є => e Є => E ж => ž Ж => Ž ѕ => dz Ѕ => Dz ꙃ => dz Ꙃ => Dz з => z З => Z ꙁ => z Ꙁ => Z и => i И => I і => i І => I ї => i ћ => ǵ Ћ => Ǵ к => k К => K л => l Л => L м => m М => M н => n Н => N о => o О => O п => p П => P р => r Р => R с => s С => S т => t Т => T оу => u Оу => U ѹ => u Ѹ => U ф => f Ф => F х => x Х => X ѡ => ō Ѡ => Ō ц => c Ц => C ч => č Ч => Č ш => š Ш => Š щ => št Щ => Št ъ => ŭ Ъ => Ŭ ꙑ => y Ꙑ => Y ъи => y ЪИ => Y ъі => y ЪІ => Y ь => ĭ Ь => Ĭ ѣ => ě Ѣ => Ě ю => ju Ю => Ju я => ja Я => Ja ꙗ => ja Ꙗ => Ja ѥ => je Ѥ => Je ѧ => ę Ѧ => Ę ѩ => ję Ѩ => Ję ѫ => ǫ Ѫ => Ǫ ѭ => jǫ Ѭ => Jǫ ѯ => ks Ѯ => Ks ѱ => ps Ѱ => Ps ѳ => θ Ѳ => Θ ѵ => ü Ѵ => Ü Ѽ => O! ѿ => ot Ѿ => Ot
# Glagolitic Ⰰ => a ⰰ => a Ⰱ => b ⰱ => b Ⰲ => v ⰲ => v Ⰳ => g ⰳ => g Ⰴ => d ⰴ => d Ⰵ => e ⰵ => e Ⰶ => ž ⰶ => ž Ⰷ => dz ⰷ => dz Ⰸ => z ⰸ => z Ⰹ => i ⰹ => i Ⰺ => i ⰺ => i Ⰻ => i ⰻ => i Ⰼ => ǵ ⰼ => ǵ Ⰽ => k ⰽ => k Ⰾ => l ⰾ => l Ⰿ => m ⰿ => m Ⱀ => n ⱀ => n Ⱁ => o ⱁ => o Ⱂ => p ⱂ => p Ⱃ => r ⱃ => r Ⱄ => s ⱄ => s Ⱅ => t ⱅ => t Ⱆ => u ⱆ => u Ⱇ => f ⱇ => f Ⱈ => x ⱈ => x Ⱉ => ot ⱉ => ot Ⱊ => p ⱊ => p Ⱋ => št ⱋ => št Ⱌ => c ⱌ => c Ⱍ => č ⱍ => č Ⱎ => š ⱎ => š Ⱏ => ŭ ⱏ => ŭ Ⱐ => ĭ ⱐ => ĭ Ⱑ => ě ⱑ => ě Ⱓ => ju ⱓ => ju Ⱔ => ę ⱔ => ę Ⱕ => ę ⱕ => ę Ⱖ => jo ⱖ => jo Ⱗ => ję ⱗ => ję Ⱘ => ǫ ⱘ => ǫ Ⱙ => jǫ ⱙ => jǫ Ⱚ => θ ⱚ => θ Ⱛ => ü ⱛ => ü Ⱝ => a ⱝ => a Ⱞ => m ⱞ => m
Rules for transliteration of Phoenician (phn)
𐤀 => ʾ 𐤁 => b 𐤂 => g 𐤃 => d 𐤄 => h 𐤅 => w 𐤆 => z 𐤇 => ḥ 𐤈 => ṭ 𐤉 => y 𐤊 => k 𐤋 => l 𐤌 => m 𐤍 => n 𐤎 => s 𐤏 => ʿ 𐤐 => p 𐤑 => ṣ 𐤒 => q 𐤓 => r 𐤔 => š 𐤕 => t
Rules for transliteration of Russian (Scientific transliteration) (ru)
# lowercase а => a б => b в => v г => g д => d е => e ё => ë ж => ž з => z и => i й => j к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ф => f х => x ц => c ч => č ш => š щ => šč ъ => ” ы => y ь => ’ э => è ю => ju я => ja і => i ѣ => ě ѳ => f ѵ => i
# uppercase А => A Б => B В => V Г => G Д => D Е => E Ё => Ë Ж => Ž З => Z И => I Й => J К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ф => F Х => X Ц => C Ч => Č Ш => Š Щ => Šč Ъ => ” Ы => Y Ь => ’ Э => È Ю => Ju Я => Ja І => I Ѣ => Ě Ѳ => F Ѵ => I
Rules for transliteration of Tajik (developed specifically for Wiktionary) (tg)
# lowercase а => a б => b в => v г => g ғ => ġ д => d е => e ё => yo ж => ž з => z и => i ӣ => ī й => y к => k қ => q л => l м => m н => n о => o п => p р => r с => s т => t у => u ӯ => ū ф => f х => x ҳ => h ч => č ҷ => j ш => š ъ => ʾ э => è ю => ju я => ja
# uppercase А => A Б => B В => V Г => G Ғ => Ġ Д => D Е => E Ё => Yo Ж => Ž З => Z И => I Ӣ => Ī Й => Y К => K Қ => Q Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ӯ => Ū Ф => F Х => X Ҳ => H Ч => Č Ҷ => J Ш => Š Ъ => ʾ Э => È Ю => Ju Я => Ja
Rules for transliteration of Ugaritic (uga)
𐎀 => ả 𐎁 => b 𐎂 => g 𐎃 => ḫ 𐎄 => d 𐎅 => h 𐎆 => w 𐎇 => z 𐎈 => ḥ 𐎉 => ṭ 𐎊 => y 𐎋 => k 𐎌 => š 𐎍 => l 𐎎 => m 𐎏 => ḏ 𐎐 => n 𐎑 => ẓ 𐎒 => s 𐎓 => ʿ 𐎔 => p 𐎕 => ṣ 𐎖 => q 𐎗 => r 𐎘 => ṯ 𐎙 => ġ 𐎚 => t 𐎛 => ỉ 𐎜 => ủ 𐎝 => ś
Rules for transliteration of Ukrainian (Scientific transliteration) (uk)
# lowercase а => a б => b в => v г => h ґ => g д => d е => e є => je ж => ž з => z и => y і => i й => j ї => ji к => k л => l м => m н => n о => o п => p р => r с => s т => t у => u ф => f х => x ц => c ч => č ш => š щ => šč ь => ’ ю => ju я => ja ѣ => ě ё => ë э => è ы => y ѳ => f ѵ => i ѧ => ę
# uppercase А => A Б => B В => V Г => H Ґ => G Д => D Е => E Є => Je Ж => Ž З => Z И => Y І => I Й => J Ї => Ji К => K Л => L М => M Н => N О => O П => P Р => R С => S Т => T У => U Ф => F Х => X Ц => C Ч => Č Ш => Š Щ => Šč Ь => ’ Ю => Ju Я => Ja Ѣ => Ě Ё => Ë Э => È Ы => Y Ѳ => F Ѵ => I Ѧ => Ę