Transliteration questions

Howdy! I have been on a bit of a transliteration binge recently, and I have a few questions I thought you might be able to answer (in part because you have worked on this before).

I was looking at Module:uga-translit and wondering:

  1. Do we have a decision about the correct transliteration of 𐎀 (ʾa/ả), 𐎃 (ẖ/ḫ), 𐎛 (ʾi/ỉ/i), 𐎜 (ʾu/ủ), 𐎝 (s₂/ś)?
  2. Why was this never fully implemented?

In the same vein of deciding on specific transliterations, I wrote Module:Ital-translit based on Appendix:Old Italic script and was wondering:

  1. Do I need a vote to decide on particular encoding/transliteration principles for certain languages? For instance, the South Picene lemma mefiín (which I want to move to Ital) could be lemmatized as:
    1. 𐌌𐌄⁚𐌉𐌑𐌍 (me iín) with ⁚ and 𐌑 (which looks like the form used in South Picene)
    2. 𐌌𐌄:𐌉𐌑𐌍 (me:iín) with a colon
    3. 𐌌𐌄⁚𐌉𐌝𐌍 (me iín) with 𐌝 (the Unicode character encoded for í, but that doesn't look like the form in South Picene)
    4. 𐌌𐌄:𐌉𐌝𐌍 (me:iín)
  2. What do I need to change to get both ⁚ & : to be transliterated as f?
  3. Ital-translit currently has a standard behavior for all Ital characters, plus exceptions by language. This means that if a character that is not in a particular language's sub-alphabet is added, it will still be transliterated using the standard correspondence. Should I disallow this behavior and only permit transliteration of characters within a language's sub-alphabet?

Sorry for all the questions, but I thought you might have useful answers/opinions.

-- JohnC5 16:11, 8 July 2015
Edited by author.
Last edit: 16:28, 8 July 2015

I can't help you at all on the first part, sorry.

For the Italic alphabets, the common set was chosen so that it could apply for all languages. If it doesn't apply to all languages equally, then it shouldn't be in the common set. Alternatively, you could transliterate the language-specific features first, and let the common set handle whatever remains after that.
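The "language-specific first, common set second" order could be sketched as below. This is only an illustration in Python, not the code of Module:Ital-translit (which is a Lua module); the table contents, the `spx` key for South Picene, and the function name are all assumptions made for the example.

```python
# Hypothetical tables for illustration; the real module data may differ.
COMMON = {"𐌌": "m", "𐌄": "e", "𐌉": "i", "𐌍": "n"}
BY_LANGUAGE = {
    "spx": {"𐌑": "í"},  # assumed South Picene exception
}

def transliterate(text, lang):
    """Apply the language-specific mapping first, then the common set."""
    specific = BY_LANGUAGE.get(lang, {})
    out = []
    for ch in text:
        if ch in specific:
            out.append(specific[ch])
        else:
            # Characters in neither table pass through unchanged.
            out.append(COMMON.get(ch, ch))
    return "".join(out)
```

With this layout, a character missing from a language's own table still falls back to the common set, which is the behavior asked about in the third question above. Restricting output to a language's sub-alphabet would mean dropping that fallback.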

Something you need to be careful with is using gsub with '.' to replace multi-character combinations. That is not going to work, because '.' matches only one character at a time. Sadly, extending it to '..' will not work either, in case you were thinking of that. The way I handle these situations is a bit more elaborate, but it works much better.

  • "rest" contains characters yet to be processed, "parts" is a table containing characters or sequences that were recognised.
  • Look at the "rest" string for the longest match with each one of the character search sequences.
  • Once the longest match is determined, insert that into the list of parts. If no match was found at all, just insert the first character.
  • Remove the processed characters from "rest".
  • Repeat until "rest" is empty.
-- CodeCat 16:20, 8 July 2015
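The steps listed above could be sketched roughly as follows. This is a Python illustration of the longest-match idea, not CodeCat's actual Lua code; the function names and the sample table in the usage note are hypothetical.

```python
def split_longest_match(text, sequences):
    """Split text into parts, preferring the longest known sequence each time."""
    # Try longer sequences first so a digraph wins over its first character.
    # Empty strings are skipped because they would never consume any input.
    ordered = sorted((s for s in sequences if s), key=len, reverse=True)
    rest = text   # characters yet to be processed
    parts = []    # characters or sequences recognised so far
    while rest:
        match = next((s for s in ordered if rest.startswith(s)), None)
        if match is None:
            match = rest[0]  # no match at all: take just the first character
        parts.append(match)
        rest = rest[len(match):]  # remove the processed characters
    return parts

def transliterate(text, table):
    """Replace each recognised part via the table; leave the rest unchanged."""
    parts = split_longest_match(text, table.keys())
    return "".join(table.get(part, part) for part in parts)
```

For example, with an assumed table `{"sh": "š", "s": "s"}`, the input `"shas"` is split into `sh`, `a`, `s`, so the digraph is replaced as a unit instead of being broken up one character at a time.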

I currently have it transliterating the language-specific features first and the common set second.

Any idea about getting ⁚ & : to both transliterate to f?

And do you think I need to have a vote or something about these correspondences, or should I just enact them de facto?

-- JohnC5 16:26, 8 July 2015

Would you not just add them to the table?

-- CodeCat 16:29, 8 July 2015
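Adding both marks to the table just means two keys sharing one value. A minimal Python sketch, assuming a hypothetical South Picene table (the real module is written in Lua and its table layout may differ):

```python
# Hypothetical excerpt of a language-specific table: both marks map to "f".
SOUTH_PICENE = {
    "⁚": "f",  # U+205A TWO DOT PUNCTUATION
    ":": "f",  # U+003A COLON
}

# str.maketrans accepts a dict whose keys are single characters.
table = str.maketrans(SOUTH_PICENE)

two_dots = "me⁚iín".translate(table)
colon = "me:iín".translate(table)
```

Both `two_dots` and `colon` come out as `mefiín`, so either encoding of the lemma would transliterate the same way.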