User talk:Benwing2/test-ko-xlit

@Tibidibi I have this partly working. With suffixes beginning with a consonant, however, the hyphen is in the wrong place, after the consonant instead of before. To fix this I will probably have to change the format of Module:ko-pron/data and split up the transliterations between final and next-syllable-initial consonant. Benwing2 (talk) 01:52, 12 July 2021 (UTC)

@Benwing2 Thank you so much.

If it would be easier for you, would it be possible to set it up so that the format at least works correctly for vowel-consonant sequences? The consonant-consonant cases are relative edge cases (they appear in the verbal paradigm, but verbal suffixes will not be hyphenated in usage examples), so if the consonant-vowel sequences and the vowel-consonant sequences are implemented correctly, the new module can probably be implemented as is even if it fails for consonant-consonant sequences.

I've put in two examples of vowel-consonant sequences in User:Benwing2/test-ko-xlit, if that helps.--Tibidibi (talk) 02:22, 12 July 2021 (UTC)

@Tibidibi To get the vowel-consonant sequences working isn't much easier than getting both vowel+consonant and consonant+consonant working, I think. I went ahead and tried to fix my local copy of Module:ko-pron/data at Module:User:Benwing2/ko-pron/data by adding semicolons between the final and initial consonants of system 2 boundary strings. I don't really know how to read Hangul so you should check it to make sure it's correct. In the meantime I'll see about getting the code to handle the semicolons correctly. Benwing2 (talk) 03:05, 12 July 2021 (UTC)

@Benwing2 Does ["-ᄀ"], ["-ᄁ"], ["-ᄂ"], etc., in the code represent those consonants in word-final position, or something else?--Tibidibi (talk) 03:12, 12 July 2021 (UTC)

@Benwing2 And do the semicolons signify places where hyphens will appear? E.g. will "t;s" in the code show up as "t-s"? Thanks again.--Tibidibi (talk) 03:15, 12 July 2021 (UTC)

@Tibidibi The strings ["-ᄀ"], ["-ᄁ"], ["-ᄂ"], etc. appear to represent consonants in word-initial position, whereas the corresponding things like ["Ø-ᄀ"], ["Ø-ᄁ"], etc. represent consonants in syllable-initial position following a syllable ending in a vowel. Semicolons indeed signify places where hyphens will appear. Benwing2 (talk) 03:17, 12 July 2021 (UTC)
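
A minimal sketch of the semicolon convention described above, written in Python rather than the module's Lua. The function name `render_boundary` and the sample string are hypothetical and not taken from Module:ko-pron/data; dropping the semicolon when the source text has no hyphen is an assumption about the intended behavior.

```python
def render_boundary(boundary: str, hyphenated: bool) -> str:
    """Resolve a boundary romanization such as "t;s".

    The semicolon separates the syllable-final consonant from the
    next-syllable-initial consonant. It becomes a hyphen when the source
    text had a hyphen at that boundary; otherwise (assumed) it is dropped.
    """
    return boundary.replace(";", "-" if hyphenated else "")


print(render_boundary("t;s", hyphenated=True))   # t-s
print(render_boundary("t;s", hyphenated=False))  # ts
```

Storing the split point in the data keeps the hyphen between the two consonants instead of after both, which is the misplacement reported at the top of the thread.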

@Benwing2 Thank you. I went through the list and corrected the few errors there were. I must say your hangul-reading skills are really quite good!--Tibidibi (talk) 03:31, 12 July 2021 (UTC)

@Tibidibi It should be working now. I checked that suffixes alone work correctly but you might want to check that prefixes alone work correctly and/or check things like vowel-final stem + vowel-initial suffix. Once you are satisfied everything is working, I'll push this code into production. Benwing2 (talk) 03:54, 12 July 2021 (UTC)

@Tibidibi BTW the hyphens are only preserved in system 2; to get them working for other systems we'd need to similarly add semicolons to the other boundary strings, along with some further code hacking to handle certain edge cases. Benwing2 (talk) 03:55, 12 July 2021 (UTC)

@Benwing2 Thank you again, everything seems to be working swimmingly. Just as a final request, could you strip the hyphens away from the display of the actual Korean text (leaving them only in the links and the transliteration)?--Tibidibi (talk) 04:07, 12 July 2021 (UTC)

@Tibidibi Hmmm, when you say "from the display of the actual Korean text" do you mean everywhere, or just in certain places (links and/or usexes and/or etymology templates and/or Korean-specific templates and/or ...)? Implementing this everywhere will require modifying code in a bunch of different places as there isn't currently a central display handler. Benwing2 (talk) 04:17, 12 July 2021 (UTC)

@Benwing2 Hyphens are never used in Korean text, so it should be everywhere.--Tibidibi (talk) 04:22, 12 July 2021 (UTC)

@Benwing2 They could stay in etymology templates, since they are used to distinguish bound morphemes from unbound ones in linguistic contexts. But that's essentially their only use in non-numerical contexts.--Tibidibi (talk) 04:23, 12 July 2021 (UTC)

@Tibidibi OK, it looks like modifying tag_text() in Module:script utilities will handle a lot of cases, e.g. it will handle links, usexes, {{lang}} and probably etymology templates as well since they use Module:links underlyingly. I'll look into this tomorrow evening my time (UTC-5). Benwing2 (talk) 04:40, 12 July 2021 (UTC)

@Tibidibi Haven't forgotten about this. Will try to look into it tomorrow (Wed evening). Benwing2 (talk) 06:25, 14 July 2021 (UTC)

@Benwing2 Sorry to ping you again, but would it be possible to get the script display working soon? @Omgtw15 and I have already been inserting hyphens in usexes in preparation for the move, so several entries currently have broken displays.--Tibidibi (talk) 05:11, 17 July 2021 (UTC)

@Tibidibi Yes, by this weekend. I have been hesitating because it involves modifying core functionality and if I get it wrong it will have a lot of effects, but I will take the time to do it right. Benwing2 (talk) 05:19, 17 July 2021 (UTC)

suppressing hyphens in links and such

@Tibidibi This is implemented in User:Benwing2/mention, which is a sandbox version of {{m}}. In its current implementation it just suppresses hyphens in display everywhere, including in bare prefixes and suffixes. Let me know if you want something else. Please experiment a bit with this and let me know if it's what you're looking for, before I push it to production. BTW it may not suppress all hyphens everywhere; it only works for code paths that ultimately go through tag_text in Module:script utilities. This includes anything that generates a link in the standard fashion ({{l}}, {{m}}, {{alter}}, etymology templates, etc.) as well as {{lang}}, but if some module just displays the raw text directly, the hyphens will still appear, and I'll have to modify that code as well. Benwing2 (talk) 18:26, 24 July 2021 (UTC)

@Benwing2 Thank you for this. Having run some tests at User:Tibidibi, it seems that hyphens are stripped within wikilink targets, e.g. {{User:Benwing2/mention|ko|-를}} shows up as Lua error in Module:User:Benwing2/links/templates at line 68: attempt to call field 'getFull' (a nil value)Template:redlink category and not 를 (reul). Is there no way to preserve the hyphenation in the wikilink targets, so the new hyphenated entries can actually be linked to? Hence {{User:Benwing2/mention|ko|-를}} should effectively function as {{m|ko|-를|를|tr=-reul}}, stripping the hyphen only in the display but preserving it in the link target (and also in the transliteration).--Tibidibi (talk) 01:10, 25 July 2021 (UTC)

@Tibidibi Try it now. Benwing2 (talk) 03:03, 25 July 2021 (UTC)

@Benwing2 Thank you. It seems to work for short sentences, but

{{User:Benwing2/mention|ko|내 벗-이 몇-이냐 하니 수석-과 송죽-이라, 동산-에 달 오르니 긔 더욱 반갑고야, 두어라 이 다섯 밖에 또 더하야 무엇하리}}

yields the (buggy?) Lua error in Module:User:Benwing2/links/templates at line 68: attempt to call field 'getFull' (a nil value)Template:redlink category.--Tibidibi (talk) 03:10, 25 July 2021 (UTC)

@Tibidibi Yeah, I used some existing hacky code that was there to handle Mongolian. Not surprised this code is buggy. Let me look into what's going on here. Benwing2 (talk) 03:19, 25 July 2021 (UTC)

@Tibidibi I think I know how to fix this properly, and I'm going to try to fix this today. Benwing2 (talk) 21:36, 31 July 2021 (UTC)

@Tibidibi Try it now. Benwing2 (talk) 06:00, 1 August 2021 (UTC)

Thank you so much! I tried it out on my userpage (User:Tibidibi) and it works perfectly.--Tibidibi (talk) 09:18, 1 August 2021 (UTC)

@Benwing2 There's another issue I just discovered; the allophony in the new romanization only functions if there are no wikilinks. {{User:Benwing2/ko-xlit|조국-을}} gives the desired form Lua error in Module:User:Benwing2/ko-pron at line 449: attempt to call field 'pattern_escape' (a nil value), but {{User:Benwing2/ko-xlit|조국-을}} gives the incorrect Lua error in Module:User:Benwing2/ko-pron at line 449: attempt to call field 'pattern_escape' (a nil value), in addition to creating undesirable wikilinks in the romanized text itself. Could you have a look at this please?--Tibidibi (talk) 09:26, 1 August 2021 (UTC)

@Tibidibi This sort of thing happens with all translit modules. Normally this isn't an issue because whenever translit modules are called internally, wikilinks are removed from the text prior to it being passed to the translit module. I can add something to {{ko-xlit}} to manually remove wikilinks, but the question is, are you ever actually invoking {{ko-xlit}} yourself with wikilinks in it? Benwing2 (talk) 17:05, 1 August 2021 (UTC)

@Tibidibi OK, so I realize that there is no {{ko-xlit}} in fact; {{User:Benwing2/ko-xlit}} is something I created for testing, and the actual interface to the translit module, which is {{xlit|ko}}, does remove links prior to passing text to the translit module. Benwing2 (talk) 17:08, 1 August 2021 (UTC)
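
The link-stripping step mentioned above can be sketched as follows. This is an illustration in Python, not the actual code behind {{xlit|ko}}; `remove_links` and its regular expression are hypothetical, and they handle only simple [[target]] and [[target|display]] links.

```python
import re

# Matches [[target]] or [[target|display]]; group 1 is the visible text
# (the display part if there is a pipe, otherwise the target itself).
_LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")


def remove_links(text: str) -> str:
    """Replace each wikilink with its visible text before transliteration."""
    return _LINK.sub(r"\1", text)


print(remove_links("[[조국]]-을"))  # 조국-을
```

With the brackets gone, the transliteration step sees the stem and the suffix as one continuous string, so rules that depend on the neighboring syllable can apply across the boundary.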

@Benwing2 Oh, in that case I think the two modules should be ready for launching. Thank you again.--Tibidibi (talk) 01:01, 2 August 2021 (UTC)

@Tibidibi I pushed the code live. We need to watch out for any more pages running out of memory (currently there are 16 pages in CAT:E), as I had to slightly increase the size of Module:script utilities, which is used in a lot of places, and add a check for hyphens in Korean text in that module. Benwing2 (talk) 01:39, 2 August 2021 (UTC)

@Benwing2 Thank you! 길 (gil) looks amazing now. Also pinging @Omgtw15 for good measure.--Tibidibi (talk) 01:46, 2 August 2021 (UTC)

@Benwing2 One minor thing, the transliteration of a bolded word in the usex is no longer being bolded. See 길 (gil).--Tibidibi (talk) 01:53, 2 August 2021 (UTC)

@Benwing2 Could you also check the etymology templates? {{af|ko|일|-꾼}} should produce 일 (il) + 꾼 (-kkun), but it produces 일 (il) +‎ 꾼 (-kkun), with the link to the suffix being unhyphenated.--Tibidibi (talk) 01:58, 2 August 2021 (UTC)Reply

@Tibidibi I took out the script 'Kore' from the list of scripts where display hyphens are suppressed, which should hopefully fix the issue with {{af}} and similar templates handled by Module:compound. I'm not sure what happened with the bolded words, somehow the quotes aren't being passed through the transliteration. I'll look into this. Benwing2 (talk) 02:08, 2 August 2021 (UTC)

@Tibidibi I made a change to Module:ko-pron that should hopefully fix the boldface issue. Benwing2 (talk) 02:32, 2 August 2021 (UTC)

@Benwing2, Tibidibi D-#Korean:

{{usex|ko|오늘은 여행 D-2야.|It is 2 days before the trip today.}}

<div class="h-usage-example"><i class="Kore mention e-example" lang="ko">[[오늘#Korean|오늘]]은 [[여행#Korean|여행]] '''D'''2야.</i><dl><dd><i lang="ko-Latn" class="e-transliteration tr Latn">Oneureun yeohaeng '''D-2ya.</i></dd><dd><span class="e-translation">It is 2 days before the trip today.</span></dd></dl></div>

{{usex|ko|오늘은 여행 D-2야.|It is 2 days before the trip today.|subst=D\-//디마이너스,2//이}}

<div class="h-usage-example"><i class="Kore mention e-example" lang="ko">[[오늘#Korean|오늘]]은 [[여행#Korean|여행]] '''D'''2야.</i><dl><dd><i lang="ko-Latn" class="e-transliteration tr Latn">Oneureun yeohaeng '''dimaineoseu-iya.</i></dd><dd><span class="e-translation">It is 2 days before the trip today.</span></dd></dl></div>

—Suzukaze-c (talk) 03:44, 3 August 2021 (UTC)

@Suzukaze-c Apologies, I'm not quite sure what the issue is. I see that a hyphen is being used here for something other than a suffix or morpheme boundary but I'm not sure what the correct behavior should be. Benwing2 (talk) 03:47, 3 August 2021 (UTC)

[edited] @Benwing2 Besides the hyphen in the text, the bold tag(?) is unclosed. —Suzukaze-c (talk) 03:50, 3 August 2021 (UTC)

@Benwing2 Is ignore_cap in Module:links something that could be merged into the hyphen code? It uses the caret for capitalization: {{l|ko|^전라도}}→전라도 (Jeollado). —Suzukaze-c (talk) 02:18, 2 August 2021 (UTC)

@Suzukaze-c I think that stuff has to remain, because the caret needs to be removed from wikilinks as well as from the display, whereas the hyphen is removed only from the display and not from wikilinks (which is what makes the code tricky). Benwing2 (talk) 02:34, 2 August 2021 (UTC)
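
The asymmetry described above, sketched in Python under stated assumptions: `link_for` is a hypothetical helper, not the code in Module:links or Module:script utilities, and it ignores transliteration and language anchors entirely.

```python
def link_for(term: str) -> str:
    """Build a wikilink from a Korean term that may carry markup characters.

    The caret (capitalization marker) is removed from both the link target
    and the display text. The hyphen (morpheme boundary) is removed from the
    display text only, so hyphenated entries such as suffixes stay linkable.
    """
    target = term.replace("^", "")      # caret never reaches the page title
    display = target.replace("-", "")   # hyphen is hidden only on screen
    if display == target:
        return f"[[{target}]]"
    return f"[[{target}|{display}]]"


print(link_for("^전라도"))  # [[전라도]]
print(link_for("-를"))      # [[-를|를]]
```

Because the two characters are stripped at different stages, a single shared "remove markup" pass would either break links to hyphenated entries or leave carets in page titles.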