Module talk:Hani-sortkey

Unsupported characters

@Suzukaze-c, Wyang: Is there a way to get sortkey data for unsupported characters such as ⿰亻革? If so, I imagine the sortkey module could handle them. — Eru·tuon 21:32, 31 July 2017 (UTC)

I think that maybe there should be another data module or something, and these characters can be added on an ad-hoc basis. —suzukaze (t・c) 21:38, 31 July 2017 (UTC)

Have a look at Module:zh-sortkey/data/unsupported for one way of organizing them. If there are a huge number, then maybe a different structure would be better. — Eru·tuon 21:44, 31 July 2017 (UTC)

I'm not sure that structure will work out. Ideographic description characters can be nested. Keeping it simple, with flat entries like ["⿰亻革"] = "人09", might be best/easiest. —suzukaze (t・c) 21:49, 31 July 2017 (UTC)
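For illustration, the flat structure suggested above can be sketched as a plain lookup table keyed by the whole ideographic description sequence (IDS). This is only a sketch in Python rather than the module's Lua; the names `UNSUPPORTED` and `sortkey_for_unsupported` are hypothetical, and the only entry is the one quoted in the comment.

```python
from typing import Optional

# Hypothetical flat table: each key is a complete IDS, each value is its sort key.
# The single entry is the example from the discussion; any others would be added ad hoc.
UNSUPPORTED = {
    "⿰亻革": "人09",
}


def sortkey_for_unsupported(ids: str) -> Optional[str]:
    """Return the stored sort key for a whole IDS, or None if it is not listed."""
    return UNSUPPORTED.get(ids)
```

Because the key is the full string, a nested IDS needs no special structure: it is just a longer key.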

Wow, that makes it a lot more complicated. Ultimately what will be needed is a function that can iterate over the characters to determine which ones belong to the character description. I guess you're right; I'll change it. — Eru·tuon 22:18, 31 July 2017 (UTC)

Would it be possible to try using gsub with every item in unsupported if an IDS character is in the pagename? —suzukaze (t・c) 23:13, 31 July 2017 (UTC)
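A rough sketch of this substitution idea, in Python rather than Lua (so `str.replace` stands in for gsub). It assumes the ideographic description characters are U+2FF0 to U+2FFB; `UNSUPPORTED` and `substitute_unsupported` are hypothetical names, and trying longer keys first is an added assumption so that a nested IDS is not partly consumed by a shorter entry.

```python
import re

# Hypothetical table of unsupported characters, keyed by whole IDS.
UNSUPPORTED = {
    "⿰亻革": "人09",
}

# Assumption: ideographic description characters occupy U+2FF0..U+2FFB.
IDC_PATTERN = re.compile("[\u2FF0-\u2FFB]")


def substitute_unsupported(pagename: str, table=UNSUPPORTED) -> str:
    """Replace every listed IDS in pagename with its sort key."""
    # Skip all the work unless the page name contains an IDS character.
    if not IDC_PATTERN.search(pagename):
        return pagename
    # Try longer keys first so a nested IDS is matched as a whole.
    for ids in sorted(table, key=len, reverse=True):
        pagename = pagename.replace(ids, table[ids])
    return pagename
```

The cost grows with the number of table entries for every page name that contains an IDS character, which is the speed and memory trade-off at issue here.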

Yes, that would be possible, but I already made a function that seems to be able to identify IDSes in Module:zh-sortkey/sandbox. It can, at least, recognize the ridiculously long one that you showed me. I dunno which possibility would be faster or use less memory. — Eru·tuon 23:28, 31 July 2017 (UTC)

Cool. That works too. I think the IDS checking function could be generally useful enough to be moved to Module:zh-han, and could be used to check the IDS data in existing entries. —suzukaze (t・c) 23:30, 31 July 2017 (UTC)

Oops, I spoke too soon. It's failing with ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心麵／⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心面. It thinks the whole thing is one IDS. — Eru·tuon 00:45, 1 August 2017 (UTC)

Fixed, and added the function to the main module. — Eru·tuon 21:34, 1 August 2017 (UTC)
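One way to find where an IDS ends, shown here only as a Python sketch and not as the function that was actually added to the module: keep a count of components still owed. Each ideographic description character takes a fixed number of operands (⿲ and ⿳ take three, the others in U+2FF0 to U+2FFB take two), so the sequence is complete when the count reaches zero. The names `idc_arity` and `split_ids` are hypothetical.

```python
def idc_arity(ch: str) -> int:
    """Number of operands an ideographic description character takes (0 if not one)."""
    cp = ord(ch)
    if cp in (0x2FF2, 0x2FF3):      # ⿲ and ⿳ are ternary
        return 3
    if 0x2FF0 <= cp <= 0x2FFB:      # the rest of the range is binary
        return 2
    return 0


def split_ids(text: str) -> list:
    """Split text into units: each unit is one whole IDS or one ordinary character."""
    units = []
    i = 0
    while i < len(text):
        if idc_arity(text[i]) == 0:
            units.append(text[i])   # ordinary character, a unit by itself
            i += 1
            continue
        start = i
        pending = 1                 # components still owed; the IDS itself counts as one
        while i < len(text) and pending > 0:
            # Each character fills one slot; an operator also opens `arity` new slots.
            pending += idc_arity(text[i]) - 1
            i += 1
        units.append(text[start:i]) # a truncated IDS simply runs to the end of the text
    return units
```

With this counting, the traditional half of the failing title splits into two identical 17-character IDSes followed by 麵, instead of being read as a single sequence.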

It looks like the module does not support CJK Extension F/G yet. Please update it, @Erutuon. --Octahedron80 (talk) 00:50, 5 September 2020 (UTC)