Module talk:typing-aids

From Wiktionary, the free dictionary

Hittite cuneiform


@Erutuon Could I entreat you to add the content of w:Hittite cuneiform? —JohnC5 02:40, 29 January 2017 (UTC)Reply

Well, you should be able to manage it. Just add the replacements to Module:typing-aids/data within a table replacements["hit"] = {}. The special characters used in transliteration could be rendered as the plain letter plus apostrophe. I can write a function in Module:typing-aids that parses the hyphens between each symbol's transliteration. — Eru·tuon 03:22, 29 January 2017 (UTC)Reply
Sorry, I'll get on it now. I was fixing a bug in mod:grc-pronunciation that was breaking things. —JohnC5 03:49, 29 January 2017 (UTC)Reply

@Erutuon: What characters should be used for the nonsyllabic, grave, ś, and ? —JohnC5 05:19, 2 February 2017 (UTC)Reply

@JohnC5: Huh. Are the nonsyllabic diacritic and s with acute used in Hittite? I would use apostrophe for acute, and maybe _^ for nonsyllabic diacritic, as in X-SAMPA. Not sure about grave; you could use \, which would be easier to type than `. As for , you're already using h, aren't you? — Eru·tuon 05:29, 2 February 2017 (UTC)Reply
@Erutuon: Whoops, I should clarify that these are for the actual use of these characters. , ś, and the acute and grave sequences are used in transcribing Hittite, for which this module would also be helpful. The combining nonsyllabic mark is mostly for transcribing Pokorny and the LIV's PIE notation. and would also be great. I mostly use these for the |head= parameter in the {{R:ine:IEW}} and {{R:ine:LIV}} templates. —JohnC5 05:43, 2 February 2017 (UTC)
@JohnC5: Ohh! Now I get it. I added a table of Hittite transliteration shortcuts, under the code hit-tr. In the PIE replacements, I added ^ as shortcut for inverted breve above and below. k^ will become k̑ and i^ i̯. I didn't add them to the "all" group because they conflict with the circumflex used for Proto-Germanic. What do you think? — Eru·tuon 08:12, 2 February 2017 (UTC)Reply
@Erutuon: Looks great! —JohnC5 13:38, 2 February 2017 (UTC)Reply
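For later readers, here is a minimal sketch of the ^ shortcut described above, written in Python rather than the module's Lua. The function name and the assumption that only i and u take the mark below are illustrative, not taken from the module.

```python
import re

INVERTED_BREVE_ABOVE = "\u0311"  # combining mark used in k̑
INVERTED_BREVE_BELOW = "\u032f"  # combining mark used in i̯
VOWELS = "iu"  # assumption: only i and u take the nonsyllabic mark below


def expand_caret(text: str) -> str:
    """Replace letter + ^ with the letter plus an inverted breve (above or below)."""
    def repl(match: re.Match) -> str:
        letter = match.group(1)
        mark = INVERTED_BREVE_BELOW if letter in VOWELS else INVERTED_BREVE_ABOVE
        return letter + mark

    return re.sub(r"([a-z])\^", repl, text)
```

Because the caret already means a circumflex in the Proto-Germanic shortcuts, a rule like this only makes sense in a PIE-specific table, which matches the decision above to keep it out of the "all" group.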

@Erutuon I have never gotten this to work for me. What am I doing wrong? {{l|hit|𒋗𒌒𒉺𒊑|tr=šu-up-pa-ri}} --Victar (talk) 17:33, 1 December 2017 (UTC)Reply

@Victar: It was a problem in the module. Fixed now. — Eru·tuon 21:11, 1 December 2017 (UTC)Reply
Ah, thanks @Erutuon! --Victar (talk) 04:22, 2 December 2017 (UTC)Reply
@Erutuon, could we make it so that the input is copied to a |tr= param and/or allow for |tr=, e.g. {{subst:chars|desc|hit|šu-up-pa-ri-ya-zi|tr=šu-up-pa-ri-ya-zi /supriézi/}}? --Victar (talk) 04:45, 2 December 2017 (UTC)
@Victar: Hmm, I'll try. That will require the diacriticless equivalents to be replaced with diacriticked ones. It is already done for Sanskrit, but Benwing2 was the one to do it. — Eru·tuon 19:23, 2 December 2017 (UTC)Reply
Done. It was actually quite simple. — Eru·tuon 19:45, 2 December 2017 (UTC)Reply
Thanks again, @Erutuon. --Victar (talk) 04:37, 3 December 2017 (UTC)Reply
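The hyphen-separated lookup discussed in this section can be sketched roughly as follows. This is Python, not the module's Lua, and the names are hypothetical. The four signs come from the šu-up-pa-ri example above; the plain-letter fallbacks (s for š, h for ḫ) are an assumption about how the diacriticless input is handled.

```python
# Hypothetical four-sign table; the pairs are taken from the šu-up-pa-ri example.
HIT_SIGNS = {"šu": "𒋗", "up": "𒌒", "pa": "𒉺", "ri": "𒊑"}

# Assumed ASCII fallbacks: plain s stands for š, plain h for ḫ.
ASCII_FALLBACK = str.maketrans({"s": "š", "h": "ḫ"})


def to_cuneiform(text: str) -> str:
    """Convert hyphen-separated sign shortcuts to cuneiform signs."""
    signs = []
    for shortcut in text.split("-"):
        shortcut = shortcut.translate(ASCII_FALLBACK)
        # Shortcuts missing from the table are kept (after the fallback step),
        # so gaps in the data stay visible in the output.
        signs.append(HIT_SIGNS.get(shortcut, shortcut))
    return "".join(signs)
```

Splitting on the hyphen first is what lets a shortcut contain any character except the hyphen itself.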

Macrons

moved from Template talk:chars

@Erutuon: can we use something else besides the hyphen to translate into the macron? Proto-language entries very often need to contain actual hyphens, including after vowels (e.g. Reconstruction:Proto-Indo-European/ǵʰuto-, although that entry could be moved to Reconstruction:Proto-Indo-European/ǵʰutós). Also, macrons may occasionally be used over consonant letters, e.g. in the scholarly transliteration of Hebrew. Maybe we could use the underscore for the macron, as is already being done for Ancient Greek. —Aɴɢʀ (talk) 14:54, 31 January 2017 (UTC)Reply

@Angr: All right, I'll make these changes. — Eru·tuon 18:32, 31 January 2017 (UTC)Reply
@Angr, Erutuon: Yes, the underscore is what's used in Perseus's and others' beta code for Greek, so I'm glad this change was made. — I.S.M.E.T.A. 07:41, 27 February 2017 (UTC)Reply
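A small sketch of the underscore-for-macron convention agreed on above, in Python rather than the module's Lua; the function name is hypothetical. It attaches a combining macron to whatever letter precedes the underscore, so consonants work too, and it leaves real hyphens alone.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron


def expand_macrons(text: str) -> str:
    """Turn letter + underscore into letter + macron; hyphens are untouched."""
    # [^\W_] matches any word character except the underscore itself.
    with_marks = re.sub(r"([^\W_])_", lambda m: m.group(1) + MACRON, text)
    # Compose to a single code point where one exists (a + macron -> ā).
    return unicodedata.normalize("NFC", with_marks)
```

Note that the final normalization also composes any other decomposed sequences in the input, which may or may not be wanted in a real implementation.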

Sanskrit, Avestan


@Erutuon It would be great if you could support Sanskrit and Avestan entered using the standard transliterations (i.e. with diacritics present in the Latin text as necessary), as well as supporting {{cog}} and params 3 and 4 so I can enter a gloss (these would be params 4 and 5 in {{chars}}). For Sanskrit, it should ignore accents when converting to Devanagari, but if an accent is present, keep the transliteration as a tr= param. E.g. if I enter {{subst:chars|cog|sa|tāpáyati||to heat, to torment}} then it should convert to {{cog|sa|तापयति||to heat, to torment|tr=tāpáyati}}, but if I enter {{subst:chars|cog|sa|tāpayati||to heat, to torment}} (no accent) then it should convert to {{cog|sa|तापयति||to heat, to torment}}, and if I enter {{subst:chars|cog|ae|tāpaiieiti||to heat}} then it should convert to {{cog|ae|𐬙𐬁𐬞𐬀𐬌𐬌𐬈𐬌𐬙𐬌||to heat}}. Thanks! Benwing2 (talk) 11:26, 24 February 2017 (UTC)Reply

In fact, it would be great if we could also type transliterations using the ASCII-only characters that {{chars}} already converts, e.g. {{subst:chars|cog|sa|ta_pa'yati}} would have the same behavior as {{subst:chars|cog|sa|tāpáyati}}. —Aɴɢʀ (talk) 12:02, 24 February 2017 (UTC)Reply
Indeed. If we support ASCII-only entry of Sanskrit we'd need to have ways of inputting various lower-dotted letters, e.g. ṇ, ṣ, ṭ, ḍ, ḥ, ṛ, ḷ (perhaps n. s. t. d. etc.), as well as ṅ, ñ, ś (perhaps n*, n~, s'). See International Alphabet of Sanskrit Transliteration. The only tricky issue is with avagraha, which per Wikipedia's IAST page is encoded using an apostrophe. There are a few possibilities here: (1) Use a different character for acute accent (e.g. /, as in Greek); (2) Use a different character for the avagraha (e.g. typewriter apostrophe ’, which is used for avagraha on Wikipedia's avagraha page); (3) Since avagraha mostly only occurs word-initially in Sanskrit, recognize the combination space + apostrophe as avagraha (deleting the space, which seems to appear in translit but not in Devanagari). Benwing2 (talk) 15:28, 24 February 2017 (UTC)
Better yet, for Sanskrit we could use Harvard-Kyoto, which is ASCII-friendly and already established. I like the idea of using the slash instead of the apostrophe for the acute, thus leaving the apostrophe open for avagraha. —Aɴɢʀ (talk) 20:23, 24 February 2017 (UTC)Reply
Yeah, Cologne Digital Sanskrit Dictionaries allows Harvard-Kyoto, and both this site and this encoding are my favorites. I also request that we avoid Velthuis altogether; it's the worst. —JohnC5 20:30, 24 February 2017 (UTC)
@Aɴɢʀ: I've added replacements for generating IAST from Harvard-Kyoto. They are accessed by using {{subst:chars|sa-tr}}: for instance, {{subst:chars|sa-tr|saMskRtaM}} → saṃskṛtaṃ. I'm still thinking about how to handle the Devanagari consonant letters, though. — Eru·tuon 22:07, 24 February 2017 (UTC)Reply
If one or both of you would add a bunch of testcases to Module:typing-aids/testcases, that would help me a great deal, as I'm not very familiar with Devanagari. I've added directions in the test module. The examples should include every Devanagari grapheme, if possible. — Eru·tuon 22:17, 24 February 2017 (UTC)Reply
@Benwing2: I've added {{cog}} and {{noncog}} to the list of linking templates, so they will work, though right now you have to enter the translation in the |t= parameter – {{subst:chars|cog|ae|tāpaiieiti|t=to heat}} rather than {{subst:chars|cog|ae|tāpaiieiti||to heat}} – and letters with diacritics aren't recognized. Module:typing-aids/data does have a table of letters used in transliteration, so I can fix the second problem easily. — Eru·tuon 21:01, 24 February 2017 (UTC)Reply
Okay, now the template recognizes characters with diacritics, but {{subst:chars|cog|ae|tāpaiieiti|t=to heat}} transforms to the incorrect Avestan 𐬙𐬁𐬞𐬀𐬍𐬈𐬌𐬙𐬌 (tāpaīeiti, to heat), because ii is used as a shortcut for ī. To get the correct output, you have to add a hyphen between the two i's: {{subst:chars|cog|ae|tāpai-ieiti|t=to heat}} → Avestan 𐬙𐬁𐬞𐬀𐬌𐬌𐬈𐬌𐬙𐬌 (tāpaiieiti, to heat). — Eru·tuon 21:31, 24 February 2017 (UTC)
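To make the ii clash concrete, here is a rough Python sketch (the module itself is Lua). It works on the Latin transliteration only, and the one-entry table and function name are illustrative assumptions.

```python
# The shortcut at issue; a real table would hold many more entries.
SHORTCUTS = {"ii": "ī"}


def apply_shortcuts(text: str) -> str:
    """Apply shortcuts segment by segment, using the hyphen as a separator."""
    converted = []
    # A shortcut can only match inside one hyphen-delimited segment,
    # so "i-i" is never read as the shortcut "ii".
    for segment in text.split("-"):
        for shortcut, replacement in SHORTCUTS.items():
            segment = segment.replace(shortcut, replacement)
        converted.append(segment)
    return "".join(converted)
```

With this table, `tāpaiieiti` comes out as `tāpaīeiti`, while `tāpai-ieiti` keeps both letters. The sketch also drops every hyphen, which is exactly why a hyphen separator is awkward when a real hyphen is wanted.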

Yay! I added the virAma, and suddenly the Sanskrit testcases passed! Now to make manual transliteration be added if you have added an apostrophe to the shortcut in the template. — Eru·tuon 23:40, 24 February 2017 (UTC)Reply

Now, if you enter {{subst:chars|m|sa|tApa/yeti}}, you will get तापयेति (tāpáyeti) with manual transliteration. — Eru·tuon 00:22, 25 February 2017 (UTC)Reply
Thanks! Can we change the shortcut used for ī to something other than ii? It should always be possible to enter raw translit and get back the expected result. Also, hyphen is problematic as a separator because we might want a real hyphen. I recommend maybe i: for a long ī (similar to IPA) and similarly for other long vowels. Benwing2 (talk) 00:46, 25 February 2017 (UTC)Reply
@Angr: You can assign whatever shortcuts you would like in the "ae-tr" table in Module:typing-aids/data. I am not sure what the shortcuts for the other symbols should be. — Eru·tuon 01:12, 25 February 2017 (UTC)Reply
Note that ii in Avestan is very common and represents /j/. Benwing2 (talk) 00:46, 25 February 2017 (UTC)Reply
@Benwing2: Ahh, I didn't know that. I was puzzled because it would be strange to have a sequence of two identical short vowels. — Eru·tuon 01:12, 25 February 2017 (UTC)Reply
The same also goes for uu representing w. —JohnC5 02:49, 25 February 2017 (UTC)Reply
@Erutuon: Avestan is beyond me. I know nothing about it, and I don't even have a font installed that will display it. —Aɴɢʀ (talk) 13:23, 25 February 2017 (UTC)Reply
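For reference, the Harvard-Kyoto to IAST step used by sa-tr can be approximated like this. It is a Python sketch, not the module's Lua code; the table follows the standard Harvard-Kyoto scheme, and treating the slash as an acute accent follows the suggestion made earlier in this thread. The names are hypothetical.

```python
import re
import unicodedata

HK_TO_IAST = {
    "RR": "ṝ", "lR": "ḷ",
    "A": "ā", "I": "ī", "U": "ū", "R": "ṛ",
    "M": "ṃ", "H": "ḥ", "G": "ṅ", "J": "ñ",
    "T": "ṭ", "D": "ḍ", "N": "ṇ", "z": "ś", "S": "ṣ",
    "/": "\u0301",  # slash after a vowel becomes a combining acute
}

# Longest shortcuts first, so that "RR" wins over "R".
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, HK_TO_IAST), key=len, reverse=True))
)


def hk_to_iast(text: str) -> str:
    """Convert Harvard-Kyoto input to IAST in a single left-to-right pass."""
    converted = _PATTERN.sub(lambda m: HK_TO_IAST[m.group(0)], text)
    return unicodedata.normalize("NFC", converted)
```

A single pass with longest-match-first avoids one replacement feeding into another, which is the kind of clash the ii shortcut ran into.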

Great work! By the way, I don't like hyphens in Sanskrit's manual transliterations. It's used to separate compound words but it's meaningless in Sanskrit. Same with stress marks. --Anatoli T. (обсудить/вклад) 01:17, 25 February 2017 (UTC)Reply

@Atitarev What's wrong with stress marks in translit? These are important for marking the position of the Vedic accent. Benwing2 (talk) 01:50, 25 February 2017 (UTC)Reply
@Benwing2: Because word stresses are not marked in Devanagari and are unpredictable, so they can be shown in the pronunciation sections, unless an invisible symbol is used. In any case, I don't think acute accents or hyphens are part of I.A.S.T. --Anatoli T. (обсудить/вклад) 04:58, 25 February 2017 (UTC)
I concur with Benwing2. —JohnC5 02:51, 25 February 2017 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── @Erutuon Thanks for all your work on this module! I notice however that it doesn't seem to accept raw IAST currently, e.g. I tried {{subst:chars|cog|sa|tāpáyati|t=to heat, to torment}} and I get Sanskrit तāपáयति (taāpaáyati, to heat, to torment). I think you could fix this fairly easily by rewriting data["sa"] to work with IAST translit and then pass the translit through your Harvard-Kyoto-to-IAST converter beforehand. Benwing2 (talk) 02:57, 25 February 2017 (UTC)Reply

@Benwing2: Oh yes, it makes more sense to use standard translit and have the ASCII version in a separate table. I've done that for Sanskrit, and Avestan as well (though there might be some errors, I don't know). — Eru·tuon 03:32, 25 February 2017 (UTC)Reply
@Erutuon There were some problems preventing {{cog|sa|तापयति|t=to heat, to torment|tr=tāpáyati}} from working. They should be fixed. The translit-preserving functionality still doesn't work right with compound templates (e.g. {{affix}}), though. I had to undo your change to decomposeAcute(), which broke things by unilaterally removing acute accents instead of preserving them in decomposed form. Benwing2 (talk) 08:06, 3 April 2017 (UTC)Reply
@Erutuon Also, can you add test cases for things like {{chars|cog|sa|tāpáyati|t=to heat, to torment}}, to test that the expansion is what's expected (e.g. it preserves the translit)? Benwing2 (talk) 08:09, 3 April 2017 (UTC)Reply
@Benwing2: That certainly needs to be tested, but I'm not sure how to do it using the current testcases module. I haven't made testcases for templates yet. — Eru·tuon 18:51, 3 April 2017 (UTC)Reply

Avestan pantā


@Benwing2, Erutuon, Aryamanarora, Octahedron80, ZxxZxxZ: find#Etymology mentions an Avestan word transliterated pantā. When I tried to convert it using {{subst:chars|cog|ae|panta_}}, it tried to convert the "an" sequence to "ą", but did not then convert that "ą" into Avestan characters, so what it wound up with was "𐬞ą𐬙𐬁". Now, I know I could correct that by writing {{subst:chars|cog|ae|pa`ta_}}, but is that even correct? Is pantā a mistaken transliteration for pątā, or are they distinct? —Aɴɢʀ (talk) 14:40, 3 April 2017 (UTC)Reply

Similarly at Appendix:List of Proto-Indo-European roots/d, it's trying to turn daoš and draonah into dåš and drånah. —Aɴɢʀ (talk) 14:48, 3 April 2017 (UTC)Reply

@Angr You should always be able to enter Avestan in the standard transliteration and get the expected results. I've changed the ASCII sequences for an, ae, ao, etc. to avoid possible clashes. If there are any more such cases, please let me know. Thanks! Benwing2 (talk) 16:46, 3 April 2017 (UTC)Reply

Module fixes


@Angr, JohnC5, CodeCat, Erutuon, Victar I've made a bunch of module fixes. It should be more reliable and easier to use ... just add subst:chars| before a template that's missing native script and it will be added. See User:Benwing2/test-typing-aids for various examples: Benwing2 (talk) 21:03, 3 April 2017 (UTC)Reply

  • {{chars|cog|sa|tāpáyati||to heat, to torment}} --> {{cog|sa|तापयति|tr=tāpáyati||to heat, to torment}}
  • {{chars|cog|sa|tāpáyati|t=to heat, to torment}} --> {{cog|sa|तापयति|tr=tāpáyati|t=to heat, to torment}}
  • {{chars|bor|ru|sa|tāpáyati|t=to heat, to torment}} --> {{bor|ru|sa|तापयति|tr=tāpáyati|t=to heat, to torment}}
  • {{chars|bor|ru|sa|tāpáyati||to heat, to torment}} --> {{bor|ru|sa|तापयति|tr=tāpáyati||to heat, to torment}}
  • {{chars|affix|sa|tā|pá|yáti|t1=ta1|t2=pa2|t3=to heat, to torment}} --> {{affix|sa|ता|t1=ta1|प|tr2=pá|t2=pa2|यति|tr3=yáti|t3=to heat, to torment}}
  • {{chars|cog|ae|daoš}} --> {{cog|ae|𐬛𐬀𐬊𐬱}}
  • {{chars|cog|ae|||foo|tr=daoš}} --> {{cog|ae|𐬛𐬀𐬊𐬱||foo}}
  • {{chars|cog|ae|tr=daoš}} --> {{cog|ae|𐬛𐬀𐬊𐬱}}
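The first Sanskrit example in the list above can be reproduced with a sketch along these lines. It is Python, not the module's Lua, and it does not implement the real transliteration-to-Devanagari conversion: `TO_SCRIPT` is a one-entry stand-in, and the function names are hypothetical.

```python
import unicodedata

ACUTE = "\u0301"

# Stand-in for the real conversion to Devanagari (not implemented here).
TO_SCRIPT = {"t\u0101payati": "तापयति"}  # key is tāpayati


def strip_acute(text: str) -> str:
    """Remove acute accents while keeping macrons and other marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def expand_chars(template: str, lang: str, term: str, *rest: str) -> str:
    """Build the expanded template call, keeping tr= only if an accent was present."""
    plain = strip_acute(term)
    parts = [template, lang, TO_SCRIPT[plain]]
    if plain != unicodedata.normalize("NFC", term):
        # The accent cannot be recovered from the script, so preserve the input.
        parts.append("tr=" + term)
    parts.extend(rest)
    return "{{" + "|".join(parts) + "}}"
```

Calling `expand_chars("cog", "sa", "tāpáyati", "", "to heat, to torment")` gives the same string as the first bullet; the unaccented input produces no tr= parameter.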

Classical Armenian


@Benwing2: Would it be possible to add Classical Armenian to the list of languages whose scripts can be generated from their transliteration? —Aɴɢʀ (talk) 15:10, 5 April 2017 (UTC)Reply

A comment: I don't know much about Armenian script, but it appears from Appendix:Armenian script that there are a few pairs of characters with the same transliteration. That may make it difficult to convert from transliteration to Armenian script.— Eru·tuon 00:14, 6 April 2017 (UTC)Reply
@Benwing2, Erutuon: What about Gothic? —Aɴɢʀ (talk) 11:08, 13 April 2017 (UTC)Reply
@Angr: Hmm, Gothic should be fairly easy to add. I'll start work on it. — Eru·tuon 23:03, 13 April 2017 (UTC)Reply
There was a problem, but now Gothic works! — Eru·tuon 21:29, 14 April 2017 (UTC)Reply
@Erutuon Which pairs of characters have the same translit? The only possibility I see is e+w vs. ew, and it says that w exists only in the Traditional orthography. Benwing2 (talk) 22:33, 16 April 2017 (UTC)Reply
@Benwing2: Huh, now that I look at it again, I don't see any such pairs after all. So never mind. — Eru·tuon 06:19, 17 April 2017 (UTC)Reply
@Angr I added Classical and modern Armenian. There may be mistakes; it would be worth adding more test cases. @Vahagn Petrosyan, perhaps you could help. (The testcases are in Module:typing-aids/testcases.) Benwing2 (talk) 05:19, 24 April 2017 (UTC)Reply

Templates supported as parameter 1


Could we add {{desc}}, {{t}}, {{t+}}, and {{t-check}} to the list of templates supported as parameter 1 of {{chars}}? —Aɴɢʀ (talk) 16:25, 8 April 2017 (UTC)Reply

I'll do this in a few days (I'm on vacation now). Benwing2 (talk) 11:41, 12 April 2017 (UTC)Reply
@Angr Done. Benwing2 (talk) 22:54, 22 April 2017 (UTC)Reply
@Benwing2: Thanks! But what's the point of this? When do macrons ever "survive canonization" in Gothic? Our custom is always to transcribe 𐌴 and 𐍉 as ē and ō but for the equivalent page names to use plain e and o. —Aɴɢʀ (talk) 04:25, 23 April 2017 (UTC)Reply
@Angr Macrons can occur over Gothic a and especially u, and they aren't redundant. "Survive canonicalization" refers to the canonicalization performed by got-tr (which converts ē and ō to e and o, but leaves alone ā and ū), not the unilateral stripping of macrons that occurs in page names. An example is {{chars|cog|got|rūna}}, which maps to {{cog|got|𐍂𐌿𐌽𐌰|tr=rūna}}. If it is acceptable to include macrons directly on Gothic text, I'll remove this addition. Benwing2 (talk) 05:10, 23 April 2017 (UTC)Reply
Oh right, I forgot about ā and ū. It all looks fine now, thanks. —Aɴɢʀ (talk) 05:32, 23 April 2017 (UTC)Reply
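The "survive canonicalization" rule for Gothic described above could look roughly like this. This is a Python sketch under stated assumptions, not the got-tr code: the function names are hypothetical, and returning the original input as the tr= value is inferred from the rūna example.

```python
import unicodedata

# ē and ō are redundant in Gothic transliteration and are reduced to e and o.
CANONICAL = str.maketrans({"\u0113": "e", "\u014d": "o"})
MACRON_LETTERS = "\u0101\u0113\u012b\u014d\u016b"  # ā ē ī ō ū


def canonicalize(translit: str) -> str:
    """Drop the redundant macrons (ē, ō) and keep the meaningful ones (ā, ū)."""
    return unicodedata.normalize("NFC", translit).translate(CANONICAL)


def manual_tr(translit: str):
    """Return the input as a tr= value if a macron survives, else None."""
    canonical = canonicalize(translit)
    if any(ch in MACRON_LETTERS for ch in canonical):
        return translit
    return None
```

So `rūna` keeps a manual transliteration, while an input whose only macrons are on ē or ō does not.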

Mycenaean Greek and Old Persian


w:Linear B and w:Old Persian cuneiform. Both of these work like Hittite, with dashes between the syllables. —Aryamanarora (मुझसे बात करो) 19:03, 17 April 2017 (UTC)Reply

@Erutuon If it's not too much work... —Aryamanarora (मुझसे बात करो) 22:51, 6 May 2017 (UTC)Reply
@Aryamanarora: Thanks for the ping. I created Module:typing-aids/data/peo. If you could add a bunch of testcases to Module:typing-aids/testcases, that would really help. — Eru·tuon 00:13, 7 May 2017 (UTC)Reply
@Erutuon: Working great. Wiktionary's transliterations reflect the actual pronunciation, not the glyphs themselves btw. —Aryamanarora (मुझसे बात करो) 00:21, 7 May 2017 (UTC)Reply
Ahh, so to generate the correct cuneiform, you have to modify the Wiktionary transliteration. — Eru·tuon 00:30, 7 May 2017 (UTC)Reply

Unexpected result with two words


@Erutuon, Benwing2 Hi. The expected إِلَى is not produced in this combination: E.g. فَأَسْرَعَ جُحَا ئِلَى. It gives ئِلَى instead. --Anatoli T. (обсудить/вклад) 12:20, 6 June 2017 (UTC)Reply

@Atitarev: Fixed. There was only a replacement rule for hamza kasra at the very beginning of the code passed to the template, but not at the beginning of a word. There may be other problems like this with word-initial sequences. — Eru·tuon 15:39, 6 June 2017 (UTC)Reply
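The difference between a start-of-input rule and a start-of-word rule can be shown with a short sketch. It is Python with regular expressions, which is an assumption for illustration; the module's actual rule is written in Lua and may be structured differently.

```python
import re

YEH_HAMZA = "\u0626"         # ئ
ALEF_HAMZA_BELOW = "\u0625"  # إ
KASRA = "\u0650"


def fix_initial_only(text: str) -> str:
    """Old behaviour: only the very start of the input is corrected."""
    return re.sub("^" + YEH_HAMZA + KASRA, ALEF_HAMZA_BELOW + KASRA, text)


def fix_word_initial(text: str) -> str:
    """Fixed behaviour: the start of every whitespace-separated word is corrected."""
    return re.sub(
        r"(^|\s)" + YEH_HAMZA + KASRA,
        r"\1" + ALEF_HAMZA_BELOW + KASRA,
        text,
    )
```

Any other rule anchored only at the start of the input would need the same `(^|\s)` treatment, which is the wider problem hinted at above.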

Early Cyrillic for OCS


Could a functionality be added to generate Cyrillic for Old Church Slavonic out of ASCII as well as traditional transliteration? I'm thinking of a conversion function like this:

Cyrillic Latin
а a
б b
в v
г g
д d
е e
ж zh, ž
ѕ d^z, ʒ
з z
и i
і i\, ì
к k
л l
м m
н n
о o
п p
р r
с s
т t
оу u
ф f
х x
ѡ o_, ō
ц c
ч ch, č
ш sh, š
щ sht, št
ъ uh, ŭ
y
ь ih, ĭ
ѣ eh, ě
ja
ѥ je
ю ju
ѫ o~, ǫ
ѭ jo~
ѧ e~, ę
ѩ je~
ѯ k^s, ξ
ѱ p^s, ψ
ѳ th, θ
ѵ y\, ü
ҁ q

What do others think? Is this feasible and a good idea? —Aɴɢʀ (talk) 19:51, 13 July 2017 (UTC)Reply

Some series of replacements for OCS would be useful. I'll have to try to implement yours to see if it works. — Eru·tuon 01:20, 29 July 2017 (UTC)Reply
@Angr: I've created Module:typing-aids/data/Cyrs and enabled it for both Old Church Slavonic and Old East Slavic. If there are any problems or you want to add some more characters, let me know. — Eru·tuon 19:04, 23 September 2017 (UTC)Reply
@Erutuon: Cool, thanks! —Aɴɢʀ (talk) 20:35, 23 September 2017 (UTC)Reply
@Erutuon: Could you start a "testcases" page for it? I just tried applying it at meek#Etymology, and it wouldn't convert "mŭčati" directly; I had to alter it to "muhchati" to get it to work. —Aɴɢʀ (talk) 20:42, 23 September 2017 (UTC)Reply
@Angr: Just add that example to Module:typing-aids/testcases inside the table in the test_Old_Church_Slavonic function using the syntax { "<input>", "<expected output>" },. That'll give me an incentive to fix it. — Eru·tuon 21:09, 23 September 2017 (UTC)Reply
Okay, that example works now. I also fixed some other potential problems. — Eru·tuon 21:43, 23 September 2017 (UTC)Reply
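A sketch of how both spellings of the example can reach the same Cyrillic output, with testcases in the { input, expected output } shape mentioned above. This is Python rather than the module's Lua; the table is a small excerpt of the proposal in this section, and the names are hypothetical.

```python
import re
import unicodedata

# Excerpt of the proposed table: ASCII and traditional forms map to one letter.
CYRS = {
    "uh": "ъ", "\u016d": "ъ",  # uh, ŭ
    "ch": "ч", "\u010d": "ч",  # ch, č
    "m": "м", "a": "а", "t": "т", "i": "и",
}

# Longest shortcuts first, so "uh" and "ch" are matched before single letters.
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, CYRS), key=len, reverse=True))
)


def to_cyrillic(text: str) -> str:
    """Convert ASCII or traditional transliteration to Cyrillic in one pass."""
    text = unicodedata.normalize("NFC", text)
    return _PATTERN.sub(lambda m: CYRS[m.group(0)], text)


# Same shape as the module's testcases: (input, expected output).
TESTCASES = [
    ("m\u016d\u010dati", "мъчати"),  # mŭčati
    ("muhchati", "мъчати"),
]
```

Normalizing the input first means a decomposed ŭ (u plus a combining breve) is recognized as well, which may be one way a traditional transliteration fails to match.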

Farsi


Any chance in getting this to work for the Persian alphabet? --Victar (talk) 20:35, 27 November 2017 (UTC)Reply

@Victar: That might be possible. Want to come up with shortcuts for the letters that would make sense? — Eru·tuon 20:42, 27 November 2017 (UTC)Reply
@Erutuon: Not speaking Farsi or Arabic, I wouldn't be qualified, but I imagine it would just be a modified Arabic. --Victar (talk) 20:58, 27 November 2017 (UTC)Reply
@Victar: I haven't studied Persian, though I've studied Arabic. I know that on Wiktionary it usually doesn't use diacritics, and that four letters are not found in Arabic: پ, چ, ژ, گ. Not having diacritics means the code for a word may have to be vowelless and look quite different from the transliteration, and from the shortcuts used for an equivalent Arabic spelling. For instance, maybe bradr or brAdr for برادر (barâdar). My main hangup is what to do about the extra letters. — Eru·tuon 21:52, 27 November 2017 (UTC)Reply
@Erutuon: I have some interest in this. I will try to give you a table with my ideas in a while but quite busy lately. Yes, no diacritics (almost) but I will explain later. --Anatoli T. (обсудить/вклад) 04:57, 28 November 2017 (UTC)Reply
@Erutuon: Will this work? I have listed multiple options, though:
  1. ["aa, A, â, ā"] = "ا ‏",
  2. ["'aa, 'â, 'a"] = "آ‏",
  3. ["b"] = "ب ‏",
  4. ["p"] = "پ ‏",
  5. ["t"] = "ت ‏",
  6. ["c, ṯ"] = "ث ‏",
  7. ["j, ǧ"] = "ج ‏",
  8. ["č, C"] = "چ ‏",
  9. ["H, ḥ"] = "ح ‏",
  10. ["x, ḫ, ḵ"] = "خ ‏",
  11. ["d"] = "د ‏",
  12. ["z', ḏ, ẕ"] = "ذ ‏",
  13. ["r"] = "ر ‏",
  14. ["z"] = "ز ‏",
  15. ["ž, z'"] = "ژ ‏",
  16. ["s"] = "س ‏",
  17. ["x, š, s'"] = "ش ‏",
  18. ["S, 9, ṣ"] = "ص ‏",
  19. ["D, ḍ"] = "ض ‏",
  20. ["T, 6, ṭ"] = "ط ‏",
  21. ["Z, ẓ"] = "ظ ‏",
  22. ["ʿ, 3, E, ʕ"] = "ع ‏",
  23. ["ğ, G, ḡ, ɣ"] = "غ ‏",
  24. ["f"] = "ف ‏",
  25. ["q"] = "ق ‏",
  26. ["k"] = "ک ‏",
  27. ["g"] = "گ ‏",
  28. ["l"] = "ل ‏",
  29. ["m"] = "م ‏",
  30. ["n"] = "ن ‏",
  31. ["v, uu, U, w, ū"] = "و ‏",
  32. ["h"] = "ه ‏",
  33. ["y, ii, ī"] = "ی ‏",
  34. ["aN"] = "اً",
  35. [","] = "،",
  36. [";"] = "؛",
  37. ["?"] = "؟",
  38. ["ʔ, '"] = "ء",
  39. ["'ye"] = "ﮥ",
  40. ["'u"] = "ؤ",
  41. ["'ii"] = "ئ",
--Anatoli T. (обсудить/вклад) 11:34, 29 November 2017 (UTC)Reply
@Anatoli: I've created Module:typing-aids/data/fa with this table. There was just one conflict, z' being used for both ژ and ذ. I've just left it for now. The replacements can now be used in the template, and I've started a set of testcases. — Eru·tuon 03:38, 1 December 2017 (UTC)Reply
@Erutuon: Thank you so much! @ZxxZxxZ: Could you suggest a unique symbol or an unambiguous combination to render letter "ذ", please? --Anatoli T. (обсудить/вклад) 04:32, 2 December 2017 (UTC)Reply
ż (since ض is pronounced as [z] in Persian, it is also usually transliterated with a variant of z. If I remember correctly, Encyclopedia Iranica actually uses the letter I mentioned for dhad in its strict transliteration system). --Z 13:41, 2 December 2017 (UTC)
@ZxxZxxZ: Thank you but you must have misunderstood. This is a typing aid only (type in Roman letters - get Persian writing) and since some letters have identical sounds, we have to choose something different for each Persian letter. I "borrowed" Arabic just to make them different and still to make sense and make the convention intuitive and easy to remember. Alternatives are also helpful. "dh" would not work for "ذ", since the combination "ده" is possible. --Anatoli T. (обсудить/вклад) 02:07, 3 December 2017 (UTC)
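A conflict like the z' one can be caught mechanically before a table goes live. The following is a minimal Python sketch (the data module is Lua); it assumes the multi-option keys are comma-separated as in the proposal above, and it uses only a three-row excerpt.

```python
from collections import defaultdict

# Three rows from the proposed table; letters are written as escapes.
PROPOSED = {
    "z', ḏ, ẕ": "\u0630",  # ذ
    "z": "\u0632",         # ز
    "ž, z'": "\u0698",     # ژ
}


def find_conflicts(table: dict) -> dict:
    """Return every shortcut that is assigned to more than one letter."""
    targets = defaultdict(list)
    for shortcuts, letter in table.items():
        for shortcut in shortcuts.split(", "):
            targets[shortcut].append(letter)
    return {s: letters for s, letters in targets.items() if len(letters) > 1}
```

For this excerpt the only conflict reported is z', mapped to both ذ and ژ.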

Urdu alphabet


I could use an Urdu converter. Also, it would be great if it could work for Baluchi and Yidgha as well. --Victar (talk) 17:11, 7 January 2018 (UTC)Reply

Further cuneiform--Hittite and Sumerian


Can someone add Sumerian (sux)? (@Tom 144: for visibility). —Justin (koavf)TCM 18:21, 15 January 2018 (UTC)Reply

@Erutuon, could we also get the CVC signs for Hittite, and the determinatives? The module recognizes some CVC characters but not all of them. --Tom 144 (𒄩𒇻𒅗𒀸) 04:49, 16 January 2018 (UTC)
@Tom 144: You can edit Module:typing-aids/data/hit to add them. The basic syntax is ["shortcut"] = "resulting text",. The shortcut can contain any character except a hyphen-minus, -, which separates the shortcuts in the template input.
@Koavf: I can start a new module at Module:typing-aids/data/sux and link it up to the main module, but someone else will need to fill in most of the shortcuts, as I am not familiar with Sumerian. — Eru·tuon 20:12, 16 January 2018 (UTC)Reply
@Erutuon: I am equally too ignorant. It seems like it's fairly low overhead and low risk to just make placeholders for a variety of scripts and then fill them in as someone who is knowledgeable comes along (i.e. someone may know the proper shortcuts for how to type Sumerian cuneiform but is confused by all this Lua module business and vice versa). —Justin (koavf)TCM 20:20, 16 January 2018 (UTC)

Manichaean


Hey, @JohnC5, I ripped your Mani-translit module and created a data table from it. Did you want to go over it? --Victar (talk) 03:14, 1 March 2018 (UTC)Reply

Hmm, {{subst:chars|l|xmn|δsʾ}} yields 𐫔𐫣 (δś). --Victar (talk) 05:27, 1 March 2018 (UTC)Reply
@Victar: I can't immediately tell what's going on with this error. I'll defer to @Erutuon. —*i̯óh₁n̥C[5] 09:13, 1 March 2018 (UTC)Reply

@Erutuon, do you think it would be possible to have |sc= also trigger this module? For instance, I'd like to do {{subst:chars|l|sog|ʾβtʾ|sc=Mani}}. --Victar (talk) 03:14, 1 March 2018 (UTC)

@Victar: I'll look into it. — Eru·tuon 18:48, 1 March 2018 (UTC)Reply
It seems to be working now: {{subst:chars|l|sog|ʾβtʾ|sc=Mani}} → 𐫀𐫂𐫤𐫀 (ʾβtʾ). — Eru·tuon 19:09, 1 March 2018 (UTC)
Thanks, @Erutuon! --Victar (talk) 19:20, 1 March 2018 (UTC)Reply
@Erutuon, how can I have Sogdian function with two different scripts, Mani and Sogd? Also, if it could default to Sogd, the better. Thanks. --Victar (talk) 14:20, 8 March 2018 (UTC)Reply
@Victar: No language has its own function, only a data module of replacements that is assigned to it by Module:typing-aids/data. If you want Sogdian to use two different modules, then use the format in this edit: just add another script_name = "module name". I suppose the main module could be made to recognize a default that was given in the format default = "module name". — Eru·tuon 19:21, 8 March 2018 (UTC)Reply
Thanks, @Erutuon, am I doing this right, ["sog"] = { default = "Sogd", Mani = "Mani" }? --Victar (talk) 21:41, 8 March 2018 (UTC)Reply
@Victar: Yes. Huh, is Sogdian supported in Unicode yet? I couldn't find block names for most of the codepoints in the Sogdian data module. — Eru·tuon 00:05, 9 March 2018 (UTC)Reply
From what I understand, their block was approved and you can find the future names here. I'm developing a font for it now. --Victar (talk) 00:25, 9 March 2018 (UTC)Reply
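The lookup described above (one data module per language, or one per script with a default) might be sketched as follows. This is Python, not the module's Lua; the "sog" entry mirrors the table quoted above, while the "hit" entry, the function name, and the fallback to the default for an unlisted script are assumptions.

```python
# A language maps either to one data module or to a per-script table.
DATA_MODULES = {
    "hit": "hit",  # assumed entry, for contrast with the per-script form
    "sog": {"default": "Sogd", "Mani": "Mani"},
}


def data_module_for(lang: str, sc: str = None) -> str:
    """Pick the data module for a language, honouring |sc= when it is listed."""
    entry = DATA_MODULES[lang]
    if isinstance(entry, str):
        return entry
    if sc is not None and sc in entry:
        return entry[sc]
    # No script given (or an unlisted one): fall back to the default.
    return entry["default"]
```

With this table, Sogdian without |sc= resolves to Sogd, and |sc=Mani resolves to Mani.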

Mycenaean


@Erutuon Hello. My attempts to add Linear B/Mycenaean have failed miserably... The data is at Module:typing-aids/data/gmy (a simple reversion of Module:Linb-translit). Do you think you could get it to work? --Per utramque cavernam (talk) 12:37, 9 March 2018 (UTC)Reply

@Per utramque cavernam: Now it works, but I need to make the hyphens be removed. I removed the reference to "gmy-tr" replacements; I think those won't be needed because the transliteration system only uses ASCII. — Eru·tuon 19:47, 9 March 2018 (UTC)Reply

psu to Brah


@Erutuon I'd like to move the psu data module to its script name, Brah, but psu seems to be hard-coded into the main module. Could you assist with this move? --Victar (talk) 20:11, 7 April 2018 (UTC)

@Victar: I'll do my best to make it work, if you would redo the move of Module:typing-aids/data/psu and the modifications you made. — Eru·tuon 21:36, 7 April 2018 (UTC)Reply

Module override


@Erutuon, could you add a module override param in {{chars}} like in {{xlit}}? I need to be able to reverse transliterate scripts to different systems in specific cases. Thanks. --Victar (talk) 15:51, 4 September 2018 (UTC)Reply

@Victar: Could you give some examples of what you want to do? There are a number of places where data modules are used and I'm not sure precisely where I would need to make changes. — Eru·tuon 00:03, 5 September 2018 (UTC)Reply
@Erutuon: For instance, I use {{chars|otk|JBz|sc=Orkh}} in {{R:otk:Bitig}}, but it's a non-standard transliteration system, so I want to be able to do {{chars|otk|JBz|module=Orkh-Bitig|sc=Orkh}}. --Victar (talk) 04:31, 5 September 2018 (UTC)Reply
@Victar: Okay, now you can do silly things like {{chars|en|ai)ei/|module=grc}} → αἰεί. Depending on the structure of the module you want to use, though, I might have to make some changes. — Eru·tuon 10:12, 6 September 2018 (UTC)Reply
@Erutuon: Thanks! --Victar (talk) 14:33, 6 September 2018 (UTC)Reply

Hittite transliteration


@Erutuon, Hello, is there a way that the transliteration of superscripted logograms can be directly transformed to cuneiform? It would be good if by typing:

<sup>DINGIR</sup>IŠKUR

the module provided the same output DINGIR-IŠKUR would have given. This is because some logograms (determinatives), are conventionally superscripted in transliteration. --Tom 144 (𒄩𒇻𒅗𒀸) 14:42, 13 April 2019 (UTC)Reply

This shouldn't be too hard. Probably I can just have the module replace <sup>...</sup>... with ...-... and then run the replacements normally. — Eru·tuon 18:46, 13 April 2019 (UTC)Reply
Done. Could you also look at the testcases in the Hittite section that are currently failing and correct them or let me know if it's a bug in the module? — Eru·tuon 20:21, 13 April 2019 (UTC)Reply
Thanks. When I use the module for those testcases it actually gives the right output, so I guess there's no problem. –– Tom 144 (𒄩𒇻𒅗𒀸) 16:26, 14 April 2019 (UTC)Reply
I'm not sure if I'm understanding correctly, but I changed the two failing testcases based on the assumption that the output was correct and the expected output given in the module was wrong. — Eru·tuon 19:16, 14 April 2019 (UTC)Reply
@Victar: Idk, I guess there is a limited set of Unicode superscripted signs. There wouldn't be any signs for "ḫ" or "š", and it would probably be a lot more tedious to adapt the module to them, in comparison to using <sup>. –– Tom 144 (𒄩𒇻𒅗𒀸) 16:23, 14 April 2019 (UTC)
That's fair. It wouldn't be any tedium for the module though -- if anything it would make it easier. --{{victar|talk}} 22:51, 14 April 2019 (UTC)Reply
Huh. I think it'd be more complicated: involving making sure the superscript letters are separated by a hyphen, and either adding a bunch of superscript things to the data module or making the main module convert superscript to regular letter. — Eru·tuon 23:02, 14 April 2019 (UTC)Reply
I'd argue with that, but it's all moot without the needed character set. --{{victar|talk}} 23:25, 14 April 2019 (UTC)Reply
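The <sup> preprocessing described at the start of this section amounts to something like the sketch below. It is Python, not the module's Lua, and the function name is hypothetical. It only handles a superscript placed before the word, as in the example given.

```python
import re


def sup_to_hyphen(text: str) -> str:
    """Rewrite <sup>X</sup>Y as X-Y so the normal sign lookup can run."""
    return re.sub(r"<sup>(.*?)</sup>", r"\1-", text)
```

For instance, `sup_to_hyphen("<sup>DINGIR</sup>IŠKUR")` gives `DINGIR-IŠKUR`. A superscript at the end of a word would leave a trailing hyphen here, so a fuller version would need to handle that case.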

Akkadian cuneiform


@Erutuon Would it be possible to add Akkadian to the module. The script can be similar to Hittite's, except that it wouldn't have to add automatically the caron to "s", and would require a different set of correspondences for the syllabary. I can add the signs to the data subpage. It would also be useful if it had a way to add the dot to "ṭ" and "ṣ". –Tom 144 (𒄩𒇻𒅗𒀸) 19:05, 21 April 2019 (UTC)Reply

@Tom 144: I started Module:typing-aids/data/akk with a copy of Module:typing-aids/data/hit; you can make changes that are appropriate for Akkadian. It would be helpful if you added some testcases (Module:typing-aids/testcases); I started the testcases framework. For the dotted letters, just choose something that is easy to type and not likely to be used in any other way. — Eru·tuon 20:51, 21 April 2019 (UTC)
Thanks, I think I will use an asterisk, since periods are often used in transliterations. –Tom 144 (𒄩𒇻𒅗𒀸) 21:46, 21 April 2019 (UTC)
@Erutuon: I began to edit the data, but the module doesn't seem to run it. --Tom 144 (𒄩𒇻𒅗𒀸) 23:19, 22 April 2019 (UTC)Reply
@Tom 144: Whoops. I forgot that some stuff has to be done in Module:typing-aids and Module:typing-aids/data. — Eru·tuon 23:32, 22 April 2019 (UTC)Reply
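The asterisk convention chosen above for the dotted letters could be sketched like this, in Python rather than the module's Lua. The function name is hypothetical, and applying the rule to capitals as well is an assumption. Unlike the Hittite table, plain s is left alone.

```python
import re
import unicodedata

DOT_BELOW = "\u0323"  # combining dot below


def expand_asterisk(text: str) -> str:
    """Turn t* and s* into ṭ and ṣ; a bare s gets no caron."""
    with_dots = re.sub(r"([tsTS])\*", lambda m: m.group(1) + DOT_BELOW, text)
    return unicodedata.normalize("NFC", with_dots)
```

Since the asterisk is not otherwise used in the transliteration, this step can run before the input is split on hyphens.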

This module does replacements twice


@Erutuon User:SodhakSH wants Sanskrit aï to convert to अइ when by itself, but e.g. paï to convert to पइ. I added what I thought would fix this in Module:typing-aids/data/sa, but it doesn't work; instead you get पि. See User:Benwing2/test-typing-aids for examples. What's happening is that step 6 in Module:typing-aids/data/sa, which converts independent to dependent vowels, is getting run twice, so it first removes अ after प and then converts पइ to पि, even though the second step shouldn't be happening. I think the culprit is lines 178-182 in Module:typing-aids, but I haven't worked on this module in 4 years and you've heavily modified it since then. Could you take a look? Benwing2 (talk) 04:13, 30 April 2021 (UTC)

@Benwing2, SodhakSH: Actually I think it's because the vowel replacements (data["sa"][6]) were iterated over in an unspecified order (because they're in a hash map) and replaced in the same order, and if अ happened to be encountered before इ, you got paï → पअइ → पइ (अ is replaced with nothing) → पि (इ is replaced with its diacritical form). Apparently by chance this is what always happened. So I put अ in a table that is always used after the table that contains इ so that this can't ever happen. — Eru·tuon 07:10, 30 April 2021 (UTC)
@Erutuon Thanks. This makes me think we should ditch the use of lists of tables and just have lists directly so the order is always clear. Benwing2 (talk) 07:12, 30 April 2021 (UTC)Reply
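The ordering problem is easy to reproduce with two rules and an explicit list. This is a simplified Python sketch, not the module's Lua code: the single-consonant set, the regular expressions, and the names are assumptions made for illustration.

```python
import re

PA = "\u092a"           # प
A = "\u0905"            # अ (independent a)
I = "\u0907"            # इ (independent i)
I_DEPENDENT = "\u093f"  # ि (dependent i)

CONSONANTS = PA  # just enough for the example

# Each rule is (pattern, replacement) and applies only right after a consonant.
TO_DEPENDENT_I = (rf"(?<=[{CONSONANTS}]){I}", I_DEPENDENT)
DROP_INHERENT_A = (rf"(?<=[{CONSONANTS}]){A}", "")


def apply_rules(text: str, rules: list) -> str:
    """Apply the rules strictly in list order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text
```

Applied to the intermediate string for paï (प अ इ), the order `[TO_DEPENDENT_I, DROP_INHERENT_A]` yields पइ, while the reverse order yields पि. An ordered list makes the intended sequence explicit, whereas iteration over a hash table does not guarantee it.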
@Benwing2: Yeah, that'd be a lot more intuitive. We can probably gradually switch over: with replacements_new_format = { {"a", "अ"}, {"अ", ""} }, the new format will have type(replacements[1]) == "table" and type(replacements[1][1]) == "string" and the old format won't. — Eru·tuon 07:16, 30 April 2021 (UTC)
Thanks a lot, Erutuon! 🔥शब्दशोधक🔥 07:16, 30 April 2021 (UTC)Reply
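The format check proposed above for a gradual switch translates to something like the following Python sketch. The names are hypothetical, and Python dicts stand in for the old-format Lua tables (unlike Lua tables, Python dicts keep insertion order, so the unordered behaviour itself is not reproduced here).

```python
def is_new_format(replacements: list) -> bool:
    """New format: a list of (shortcut, replacement) pairs.

    Old format: a list of tables, modelled here as dicts.
    """
    first = replacements[0]
    return isinstance(first, (list, tuple)) and isinstance(first[0], str)


def as_ordered_pairs(replacements: list) -> list:
    """Return the replacements as one flat list of pairs, whichever format is given."""
    if is_new_format(replacements):
        return [tuple(pair) for pair in replacements]
    pairs = []
    for table in replacements:
        # In Lua the order inside each table would not be guaranteed.
        pairs.extend(table.items())
    return pairs
```

A check like this lets the main module accept both formats while the data modules are converted one at a time.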