Module talk:Hani-sortkey
Unsupported characters
@Suzukaze-c, Wyang: Is there a way to get sortkey data for unsupported characters such as ⿰亻革? If so, I imagine the sortkey module could handle them. — Eru·tuon 21:32, 31 July 2017 (UTC)
- I think that maybe there should be another data module or something, and these characters can be added on an ad-hoc basis. —suzukaze (t・c) 21:38, 31 July 2017 (UTC)
- Have a look at Module:zh-sortkey/data/unsupported for one way of organizing them. If there are a huge number, then maybe a different structure would be better. — Eru·tuon 21:44, 31 July 2017 (UTC)
- I'm not sure that structure will work out. Ideographic description characters can be nested. Keeping it simple, like
["⿰亻革"] = "人09"
might be best/easiest. —suzukaze (t・c) 21:49, 31 July 2017 (UTC)
- Wow, that makes it a lot more complicated. Ultimately what will be needed is a function that can iterate over the characters to determine which belong to the character description. I guess you're right; I'll change it. — Eru·tuon 22:18, 31 July 2017 (UTC)
- Would it be possible to try using gsub with every item in unsupported if an IDS character is in the pagename? —suzukaze (t・c) 23:13, 31 July 2017 (UTC)
- Yes, that would be possible, but I already made a function that seems to be able to identify IDSes in Module:zh-sortkey/sandbox. It can, at least, recognize the ridiculously long one that you showed me. I dunno which possibility would be faster or use less memory. — Eru·tuon 23:28, 31 July 2017 (UTC)
- Cool. That works too. I think the IDS checking function could be generally useful enough to be moved to Module:zh-han, and could be used to check the IDS data in existing entries. —suzukaze (t・c) 23:30, 31 July 2017 (UTC)
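A minimal Lua sketch of the flat-table idea combined with the gsub suggestion: each unsupported IDS is a key that maps straight to a sortkey, and if a page name contains an ideographic description character, every known key is replaced by its sortkey. The table `unsupported` and the helpers `escape_pattern` and `substitute_unsupported` are hypothetical names, not the actual contents of Module:zh-sortkey/data/unsupported, and the sketch uses only the standard string library (not Scribunto's mw.ustring) so that it runs in plain Lua.

```lua
-- Hypothetical data table: whole IDS string -> sortkey
-- (radical plus two-digit residual stroke count, as in the "人09" example).
local unsupported = {
	["⿰亻革"] = "人09",
}

-- Escape Lua pattern magic characters so a string is matched literally.
local function escape_pattern(s)
	return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- Hypothetical helper: if the page name contains an ideographic description
-- character (U+2FF0..U+2FFB, encoded as bytes E2 BF B0..BB in UTF-8),
-- replace every known IDS with its sortkey.
local function substitute_unsupported(pagename)
	if not pagename:find("\226\191[\176-\187]") then
		return pagename
	end
	for ids, sortkey in pairs(unsupported) do
		-- "%" is special in gsub replacement strings, so double it.
		local replacement = sortkey:gsub("%%", "%%%%")
		pagename = pagename:gsub(escape_pattern(ids), replacement)
	end
	return pagename
end
```

Because pairs() visits keys in an unspecified order, the result could depend on that order if one key were a substring of another; the cost is also one gsub call per table entry for every page name that contains an IDS, which is the speed/memory question raised in the thread.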
Oops, I spoke too soon. It's failing with ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心麵/⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心⿺辶⿳穴⿲月⿱⿲幺言幺⿲长马长刂心面. It thinks the whole thing is one IDS. — Eru·tuon 00:45, 1 August 2017 (UTC)
Fixed, and added the function to the main module. — Eru·tuon 21:34, 1 August 2017 (UTC)
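The failure comes from not knowing where one IDS ends. Each ideographic description character acts as a prefix operator with a fixed number of operands (three for ⿲ and ⿳, two for the other characters in U+2FF0..U+2FFB), so a recursive scan can tell where one IDS stops and the next begins. The following is a hypothetical sketch of that approach, not the function that was added to the module; `split_ids` and its helpers are illustrative names, and the table only covers the twelve original description characters (later Unicode versions added more).

```lua
-- Number of operands for each ideographic description character.
local ARITY = {
	["⿰"] = 2, ["⿱"] = 2, ["⿲"] = 3, ["⿳"] = 3,
	["⿴"] = 2, ["⿵"] = 2, ["⿶"] = 2, ["⿷"] = 2,
	["⿸"] = 2, ["⿹"] = 2, ["⿺"] = 2, ["⿻"] = 2,
}

-- Split a UTF-8 string into a list of characters (assumes valid UTF-8, no NUL).
local function utf8_chars(s)
	local chars = {}
	for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
		chars[#chars + 1] = ch
	end
	return chars
end

-- Starting at index i, consume one complete IDS (or one ordinary character)
-- and return the index just past it, or nil if the IDS is truncated.
local function consume(chars, i)
	local ch = chars[i]
	if ch == nil then
		return nil
	end
	local n = ARITY[ch]
	i = i + 1
	if n then
		for _ = 1, n do
			i = consume(chars, i)
			if i == nil then
				return nil
			end
		end
	end
	return i
end

-- Hypothetical helper: split text into units, each a complete IDS or one character.
local function split_ids(text)
	local chars = utf8_chars(text)
	local units = {}
	local i = 1
	while i <= #chars do
		-- A truncated IDS swallows the rest of the string as one unit.
		local j = consume(chars, i) or (#chars + 1)
		units[#units + 1] = table.concat(chars, "", i, j - 1)
		i = j
	end
	return units
end
```

For the traditional half of the example (the part before the slash), split_ids returns three units: ⿺辶⿳穴⿲月⿱⿲幺言幺⿲長馬長刂心 twice, then 麵.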
It looks like the module does not support the CJK Unified Ideographs Extension F and G blocks yet. @Erutuon, please update it. --Octahedron80 (talk) 00:50, 5 September 2020 (UTC)
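One prerequisite for handling new blocks is recognizing their code points. A minimal sketch, assuming the block boundaries listed in Unicode's Blocks.txt (CJK Unified Ideographs Extension F: U+2CEB0..U+2EBEF; Extension G: U+30000..U+3134F); `codepoint` and `block_of` are hypothetical helpers, and the radical/stroke data needed to build actual sortkeys for these characters is a separate problem.

```lua
-- Block boundaries as given in Unicode's Blocks.txt.
local BLOCKS = {
	{ name = "CJK Unified Ideographs Extension F", first = 0x2CEB0, last = 0x2EBEF },
	{ name = "CJK Unified Ideographs Extension G", first = 0x30000, last = 0x3134F },
}

-- Decode the first UTF-8 character of s into a code point (assumes valid UTF-8).
local function codepoint(s)
	local b1, b2, b3, b4 = s:byte(1, 4)
	if b1 < 0x80 then
		return b1
	elseif b1 < 0xE0 then
		return (b1 - 0xC0) * 0x40 + (b2 - 0x80)
	elseif b1 < 0xF0 then
		return ((b1 - 0xE0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)
	else
		return (((b1 - 0xF0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)) * 0x40 + (b4 - 0x80)
	end
end

-- Hypothetical helper: return the block name for a character, or nil if it is
-- outside the blocks listed above.
local function block_of(ch)
	local cp = codepoint(ch)
	for _, block in ipairs(BLOCKS) do
		if cp >= block.first and cp <= block.last then
			return block.name
		end
	end
	return nil
end
```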