
Template talk:zh-forms

From Wiktionary, the free dictionary

rfm


The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


I suggest moving this template to Template:Hani-forms, and keeping the old name indefinitely as a redirect.

The code "zh" is ambiguous and unwanted per consensus for a number of reasons. In particular, this template begins with "zh" (which means "Chinese", or "Mandarin", depending on how you look at it), but it is also used in other languages written in Han script, whose code is "Hani". This template serves the purpose of showing varieties of Han script, so a name beginning with that code seems to be a very natural choice.

FWIW, another high-use template whose name begins with a script code is Template:Latn-def. --Daniel 08:59, 8 June 2011 (UTC)Reply

I think {{zh}} refers to the Chinese languages as a whole as opposed to the script. No real strong feelings, you could move it to {{zhx-forms}}, for example. --Mglovesfun (talk) 12:04, 8 June 2011 (UTC)Reply
If there are many good template names to be chosen, then you can consider my proposal of "Hani-forms" as completely arbitrary, but a proposal nonetheless, that I believe to be better than the current system.
However, I do think that "Hani-forms" is even better than "zhx-forms". The template is used with Translingual entries, that are neither Sinitic nor of any other family, but are written with Han characters nonetheless. --Daniel 12:28, 8 June 2011 (UTC)Reply

Done. Nobody objected. --Daniel 01:44, 22 June 2011 (UTC)Reply

技巧


@kc_kennylau Please see 技巧, thanks. Wyang (talk) 00:39, 31 January 2016 (UTC)Reply

@Wyang: What is the problem? --kc_kennylau (talk) 02:03, 31 January 2016 (UTC)Reply
@kc_kennylau Word linking in trad. Wyang (talk) 02:24, 31 January 2016 (UTC)Reply
@Wyang: Sorry, my bad. --kc_kennylau (talk) 02:25, 31 January 2016 (UTC)Reply

Size of text in alt=


For me, it's barely legible (´∀`;) Perhaps it's my font choices though. —suzukaze (tc) 03:41, 5 June 2016 (UTC)Reply

Fair enough. Taking the average of 70%. Wyang (talk) 03:48, 5 June 2016 (UTC)Reply

Reduplication


@Wyang It's putting 密西西比 (Mìxīxībǐ) and the like into Chinese reduplications. — justin(r)leung (t...) | c=› } 01:29, 26 October 2016 (UTC)Reply

I originally put "This category includes any Chinese word containing two consecutive identical characters." in the description of Category:Chinese reduplications to show that this is only a category for all words with reduplicated characters. I tightened the criteria a bit to exclude sole transcriptions, and reduplications crossing component boundaries, but it may be hard to achieve the linguistic sense of reduplication automatically. Wyang (talk) 01:57, 26 October 2016 (UTC)Reply
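
The original category criterion quoted above ("two consecutive identical characters") can be sketched in a few lines. This is only an illustration in Python, not the module's actual Lua code, and the function name `has_adjacent_duplicate` is hypothetical. It does not implement the tightened criteria (excluding transcriptions or matches that cross component boundaries).

```python
def has_adjacent_duplicate(word):
    """Return True if any two neighbouring characters in `word` are identical.

    This models only the naive criterion: it knows nothing about
    transcriptions or about where the components of a word begin and end.
    """
    return any(a == b for a, b in zip(word, word[1:]))


# 密西西比 contains 西西, so the naive check flags it even though it is a
# transcription rather than a reduplication in the linguistic sense.
print(has_adjacent_duplicate("密西西比"))
print(has_adjacent_duplicate("技巧"))
```

A purely character-based test like this is why transcriptions end up in the category: telling them apart needs extra information, such as a manual flag or a tracking list.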

草蜢撩雞公——自尋死路; maybe a list or tracking category should be generated and the category should be applied manually. —suzukaze (tc) 09:41, 26 October 2016 (UTC)Reply

Fixed just then in this edit. Wyang (talk) 09:42, 26 October 2016 (UTC)Reply

{{zh-forms|s=㙌|t=⿰土肅|type=3}} on


Attention needed. Wyang (talk) 22:24, 29 June 2018 (UTC)Reply

Discussion from Talk:溍


@Wyang, @Zcreator alt, @Justinrleung, @Suzukaze-c Hi. After the most recent edit in Module:zh-forms [1], the zh-forms box isn't displaying the proper traditional and simplified forms for and (both encoded under the same code point) based on the language tag. Also, I think it would be preferable to add in such characters manually rather than letting it do so automatically. If you look at revision 49664286 [2], some characters added to the unified_char list such as (huì), (zhōu) show no significant difference between traditional and simplified forms in the Unicode charts. I don't think it is necessary to split between traditional and simplified forms for characters that show only minor cosmetic differences (mostly in the stroke direction) such as /, /, /, /, /, /, /, /, /. Instead, these differences should be noted in the translingual section (either as alternative forms or in their respective ids). KevinUp (talk) 13:08, 8 June 2018 (UTC)Reply

I think it would be much better to add something such as "zh-forms|s=t" to characters such as / (U+73CA), / (U+7424), / (U+733A), / (U+761F), / (U+8392) as these characters are special exceptions that have been unified when compared with derived characters of (U+518A)/ (U+518C), (U+722D)/ (U+4E89), (U+4343), 𥁕 (U+25055)/ (U+6637), (U+5442)/ (U+5415) such as  / (U+59CD)/ (U+59D7) and  / (zhēng) (U+775C)/ (zhēng) (U+7741) and  / (yáo) (U+6416)/ (yáo) (U+6447) and  / (wēn) (U+6EAB)/ (wēn) (U+6E29) and  / (gōng) (U+5BAE)/ (gōng) (U+5BAB) that have been disunified. It should be noted that Han unification is slightly inconsistent, with frequently used characters split into separate code points whereas rarely used characters are unified. Hence, I would suggest adding "zh-forms|s=t" to such anomalies when encountered instead of having a unified_char list that is prone to errors when not properly checked. KevinUp (talk) 02:12, 6 July 2018 (UTC)Reply

@KevinUp: looks fine on my computer. What does it look like on your system? About unified_char, I do agree that the list needs improvement, but I like the idea that this is done automatically. We can always update the list when needed. There are still some problems to consider: (1) not all systems have the right fonts; (2) some simplified glyph shapes are acceptable (or even standard in Hong Kong) in traditional Chinese; (3) how different is different?—to me, and are different enough. — justin(r)leung (t...) | c=› } 02:30, 6 July 2018 (UTC)Reply
I'm not sure if the problem still persists on your (KevinUp) computer, but I can see a trad-simp form difference on 溍, same as Justin above. Wyang (talk) 03:07, 6 July 2018 (UTC)Reply
@Wyang: No, it's still not working for me. However, if I were to copy the code from your previous edit at 49663403 [3] and apply it to the page for , I would be able to distinguish between the two forms. Otherwise I'm only seeing the simplified forms in both boxes. KevinUp (talk) 04:40, 6 July 2018 (UTC)Reply
@Wyang:, @Justinrleung: I managed to get the characters to display correctly via this edit [4]. Can you all check to see if the fonts are applied correctly on the devices that you are using? Thanks. KevinUp (talk) 07:56, 6 July 2018 (UTC)Reply
@KevinUp: Yes, thanks, it's still displayed correctly for me. Wyang (talk) 13:24, 6 July 2018 (UTC)Reply
@Justinrleung: (1) On my system I am able to distinguish between and . Before this, in edit 49666022 [5], I was still able to see the difference. But since edit 49666028 [6], only the simplified form is shown. (2) Can you list a few more examples where the glyph shape in Hong Kong is different from the one used in Taiwan besides () (standard in Hong Kong/mainland China) vs  / () (standard in Taiwan) and (standard in Hong Kong/mainland China) vs 𥁕 (wēn) (standard in Taiwan)? So far I'm only aware of these two, as well as 𤏁/𤏁 (U+243C1) and 𤇍/𤇍 (U+241CD) which have different compositions in Hong Kong compared to Taiwan based on HKSCS 2016. In this case, adding usage notes for the respective characters would be more helpful. (3) I agree that and are different enough because there is an additional horizontal stroke for the form used in mainland China. Most of the characters that look different in mainland China due to Xin Zixing (新字形) and are encoded under the same code point in Unicode should not be considered as "simplified forms" as this would cause some confusion. Simplified characters should be defined as those that are found in 1956 《漢字簡化方案》, 1964 《簡化字總表》, 1988 《現代漢語通用字表》, 2013 《通用規范漢字表》 and 1956 《第一批异体字整理表》 (Revised 1986, 1988, 1993). Besides this, I am of the opinion that characters which have separate code points in Unicode such as  / (bié) (U+5225) and (U+522B),  / (U+5167) and (U+5185) or preferred forms that are encoded separately such as (jué) (preferred in Taiwan) and (jué) (preferred in mainland China) can be listed as being traditional/simplified in the zh-forms box. However, I don't think it is a good idea to consider Xin Zixing characters as being simplified. Some traditional characters in China are composed of simplified elements due to Xin Zixing such as (mainland China) vs (Taiwan) and (mainland China) vs (Taiwan). 
In this case both mainland China and Taiwan character forms are encoded under the same code point but are composed of different forms and have different stroke counts. I think having the unified_char list is great but it needs to be properly checked and compared with the Unicode charts to ensure that the characters are actually different. To me, characters that were unified inconsistently (such as the anomalies given in the second top level of this discussion) should be added to the list while those that are unified consistently across its set of derived characters such as / should not be added to the list. Consider /, listed as being traditional/simplified) due to the difference in composition of /. By analogy the derived characters of such as / should be added as well. But if someone were to add in derived characters of en masse, some anomalies are bound to occur such as in , which is both a traditional character found in Shuowen Jiezi and the simplified form of  / (píng). Hence I don't think it is a good idea to define Xin Zixing characters that are unified consistently as being simplified. By the way, I'm using Source Han Sans fonts. It covers the differences in glyph shapes between different regions and supports all characters found in HKSCS 2016. KevinUp (talk) 04:40, 6 July 2018 (UTC)Reply
@Tooironic, @Dokurrat: Do you think Xin Zixing (新字形) characters that have different glyph shapes but are encoded under the same code point such as /, /, /, /, /, /, / should be considered as simplified and separated from traditional forms in the zh-forms box? KevinUp (talk) 04:40, 6 July 2018 (UTC)Reply
We should determine a reasonable limit to this, or otherwise we might as well show zh_CN-Hans, zh_CN-Hant, zh_HK-Hant, and zh_TW-Hant at all times. —Suzukaze-c 05:42, 6 July 2018 (UTC)Reply
@Suzukaze-c: I think that one way to overcome this issue is to upload SVG files of Open-source Unicode typefaces such as Source Han Sans, Source Han Serif and Google Noto Sans/Serif CJK to Wikimedia Commons so that the different character forms can be displayed independently of the fonts used by the user's computer system. Another possibility is to put a special note to specify that the character may appear differently due to Xin Zixing character forms rather than splitting the box into traditional and simplified forms. Note that some 新字形 and 舊字形 (jiùzìxíng) images have already been uploaded to Wikimedia Commons, and these can be found on the 新字形 page on Chinese Wikipedia. KevinUp (talk) 07:56, 6 July 2018 (UTC)Reply
(I just remembered, zh.wiktionary actually does show all 4 at once 🤔 —Suzukaze-c 06:13, 15 July 2018 (UTC))Reply
@KevinUp, @Justinrleung, @Suzukaze-c, @Wyang The problem with the output is that this template is outputting lang="zh", which only contains a language code. To get correct display between simplified and traditional Chinese, you need to use ISO 15924 script codes in the lang attribute (i.e., lang="zh-Hans" and lang="zh-Hant") because this information is what Web browsers use for correct glyph selection. For a proof of concept, see Template:CJKV-forms, which had the problem this template currently has; I just fixed it.
If it's necessary to display distinct glyphs for Hong Kong and Taiwan traditional Chinese, you'll have to get even more specific and use the language codes for Cantonese (yue) and Mandarin (cmn) (i.e., lang="cmn-Hant" and lang="yue-Hant"). Or so I assume; I've never tried to display distinct glyphs in this case.
You can also use region codes: lang="zh-Hant-HK" and lang="zh-Hant-TW". However, I dislike this approach because it ties a language to a political designation.
For the first, simpler case, it looks like there are two places in the code where lang="zh" attributes are output and need to be fixed. In each, lang="zh-Hant" needs to be output when the script arguments are 'trad', lang="zh-Hans" when they are 'simp', and lang="zh-Hani" otherwise.
I can attempt to fix this template myself, but I would prefer that someone else try since I don't feel particularly comfortable modifying live code in a programming language I don't know. (I have strong abilities in several programming languages, but Lua isn't one of them.) If no one tries, I'll probably make an attempt anyway.
Patrick Dark (talk) 15:49, 5 August 2018 (UTC)Reply
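
The mapping proposed above (script argument to `lang` value) could look roughly like the following. It is a sketch in Python rather than the module's Lua, and the function names `lang_tag` and `tagged_span` are made up for illustration. Only the three tag values and the 'trad'/'simp' arguments come from the comment above.

```python
import html


def lang_tag(script):
    """Map a script argument to a language tag with an ISO 15924 script code."""
    if script == "trad":
        return "zh-Hant"
    if script == "simp":
        return "zh-Hans"
    # Neither traditional nor simplified was specified.
    return "zh-Hani"


def tagged_span(text, script):
    """Wrap `text` in a span whose lang attribute reflects the script."""
    return '<span lang="{}">{}</span>'.format(lang_tag(script), html.escape(text))


print(tagged_span("溍", "trad"))
print(tagged_span("溍", "simp"))
```

Both calls wrap the same code point; only the `lang` value differs, which is the information a browser can use to choose between glyph shapes.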
@JustSomeGuy: If we want to display proper glyphs, we should not be using regional language codes, which allow the user's browser to pick fonts; instead, we should try to use the classes listed in MediaWiki:Common.css. — justin(r)leung (t...) | c=› } 16:04, 5 August 2018 (UTC)
@Justinrleung: I also feel that region codes are a bad idea (as previously stated), but ISO 15924 script codes should be used. Users' browsers are already picking fonts since Wiktionary doesn't serve its own fonts. It's using a stylesheet to make educated guesses about what fonts are available on a user's system, but those fonts can't be predicted reliably and the guesses are more likely to be wrong for users on minority operating systems (e.g., Ubuntu (Linux)) such as myself. It therefore should be assumed that browsers will need this information until Wiktionary serves its own fonts.
As for that stylesheet, CSS has a :lang() selector specifically for dealing with this subject, but it doesn't work properly if script codes aren't specified. This is evidenced in said stylesheet, which is using classes in an attempt to work around a lack of script codes. For example, code like .Latn[lang=zh] is brittle and breaks as soon as someone adds a script or region code; it should be :lang(zh-Latn).
Patrick Dark (talk) 16:55, 5 August 2018 (UTC)Reply
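
The difference between the two selector styles mentioned above can be modelled in a few lines of Python. This is a simplified model, not a CSS engine: it ignores inheritance of the language from ancestor elements, and the function names are hypothetical.

```python
def attribute_equals(lang_attr, value):
    """Model of the attribute selector [lang=value]: exact match only."""
    return lang_attr == value


def lang_pseudo_class(lang_attr, language_range):
    """Model of :lang(range): the tag equals the range, or starts with the
    range followed by a hyphen. The comparison is case-insensitive."""
    tag = lang_attr.lower()
    rng = language_range.lower()
    return tag == rng or tag.startswith(rng + "-")


# Adding a script code changes the attribute value, so the exact-match
# selector stops applying while the :lang() model still matches.
print(attribute_equals("zh-Latn", "zh"))
print(lang_pseudo_class("zh-Latn", "zh"))
print(lang_pseudo_class("zh-Hant-TW", "zh-Hant"))
```

Under this model, a rule written for `[lang=zh]` silently stops matching once the markup becomes `lang="zh-Latn"`, which is the brittleness described above.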

unified_char list


AFAIK, allographic variant characters like /, /, / or / are regional differences rather than differences between traditional and simplified characters, not least because these characters aren't part of the Complete List of Simplified Characters. Therefore it's probably better to abandon that list. By the way, this list is far from complete. --SelfishSeahorse (talk) 21:31, 18 February 2020 (UTC)

Edit: Some examples of characters that look the same in mainland China and Hong Kong, but different in Taiwan:

Mainland China Hong Kong Taiwan

And an example of a character that looks different in mainland China, Hong Kong and Taiwan:

Mainland China Hong Kong Taiwan

These examples show such variant characters are related to the region and not to simplified characters. --SelfishSeahorse (talk) 18:18, 20 February 2020 (UTC)Reply

Honestly we should probably display a "Taiwan" section at all times, like zh.wiktionary, instead of maintaining these huge lists. —Fish bowl (talk) 01:02, 7 April 2022 (UTC)Reply
and also 臺標 is ugly and does not deserve to be presented as 繁體字. —Fish bowl (talk) 06:40, 10 April 2022 (UTC)Reply

2022


@Justinrleung, RcAlex36, 沈澄心, Theknightwho, Kernel-chan What do you think of removing chars_unified and instead adding separate rows for different country standards at all times?

Example:

  • "Traditional" (codepoint-wise same as Taiwan[1], but not in 臺標): 說夢話
  • PRC traditional: 説夢話
  • Taiwan traditional: 說夢話
  • HK traditional: 說夢話
  • PRC simplified: 说梦话
  1. ^ is that what we're currently doing? i'm not sure why we're using 為 instead of 爲

Fish bowl (talk) 07:59, 1 May 2022 (UTC)Reply

@Fish bowl: I think it would be nice, though would it look too clunky on the side? Another issue is that sometimes Taiwan or HK (and in the rare occasion, simplified) may have more than one acceptable/accepted variant (officially or otherwise); it’s hard to say for HK sometimes because there is much less standardization at the 詞 level afaik, given that there aren’t big official dictionaries for HK afaik. A third issue is that places like Macau don’t have a clear standard afaik; do we assume it’s traditional or following Hong Kong? BTW, the HK standard (according to 常用字字形表) should be 説夢話. — justin(r)leung (t...) | c=› } 13:26, 1 May 2022 (UTC)Reply
Do you mean something like the Chinese Wiktionary template that shows variants and relatives?
Also what about adding a remote character composer renderer?
Like ⿱𥫗旦 becoming 笪 automatically? It could pull in an SVG renderer, or maybe combining with a tag would render over those characters? Kernel-chan (talk) 02:12, 2 May 2022 (UTC)
The problem is old, and a clean solution has existed since Unicode 4. The attempt to support variants in browsers using language tagging (either with the deprecated Unicode language tag characters, or with rich-text tagging in HTML, XML, or even CSS) has been deprecated for years. The real solution (which works in modern browsers and renderers, even in plain text) is to use variation sequences, i.e. to follow the unified ideograph with a variation selector. These sequences are standardized in the Ideographic Variation Database (IVD), an integral, standard part of the Unicode Character Database (UCD). However, this template (and the associated module) does not use any such IVD sequence.
Note that the module would need to specify which variation selector to use for each form of each ideograph: the same variation selector used after different characters is not guaranteed to select the same form, and Han ideographs may have more than just two forms ("simplified" and "traditional"). These variant forms may be encoded and added to the Unicode standard (in the IVD) at any time, long after the encoding of the isolated ("unified" or "compatibility") ideographs and the isolated variation selectors, so you need to use the normative data from the IVD. There you will find multiple variants for traditional forms and multiple variants for the simplified form, depending on the language (Chinese, Japanese, Vietnamese, Korean) or the relevant national standards.
It is the standardisation of the IVD that allowed Unicode and the ISO TC to affirm that there would be no further additions to "compatibility ideographs" and that any such request for standardization will now be rejected. The two existing compatibility blocks in the BMP have been "frozen", except to fix a few missing characters that were forgotten in the relevant standards that were accepted and normatively referenced in past Unicode/ISO standard versions, due to past defects in the Han unification. All of that seems to be fixed now, and Unicode and the IRG use more quality assurance tools to make sure that all variants are referenced in the IVD: all past compatibility ideographs are present in the IVD with their defined variation selector, along with the variation selector for the unified ideograph, so that canonical equivalence now works perfectly with Han characters. All "compatibility ideographs" are now deprecated. (This does not concern the 12 characters from the "IBM 32" subset that are present in one compatibility block but are unified ideographs, not "compatibility ideographs".) Since this IVD standardization, all new additions to Han ideographs have occurred only in new blocks allocated exclusively for "unified ideographs", all of them mappable at any time in the IVD to assign their needed variants.
I therefore strongly suggest that you include support for the IVD (part of the standard UCD and integrated in the Unihan Database), and then generate variation sequences in this template instead of relying on language tagging. Language tagging was experimental and has been removed from all modern browsers, whose text renderers are already capable of correctly displaying variation selectors, given quality fonts that have mappings for them. Legacy font mappings on compatibility ideographs are also starting to disappear: modern fonts are now removing these old mappings in favor of mappings of variation sequences.
Verdy p (talk) 22:20, 3 May 2024 (UTC)Reply
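
For readers unfamiliar with variation sequences, the mechanics described above can be sketched as follows. This is an illustration in Python, not code from the module. The helper names are hypothetical, and the pairing of 溍 with VS17 is a placeholder: which selector picks which glyph for a given ideograph has to come from the registered IVD data, and a font must support the sequence for it to have any visible effect.

```python
# Variation Selectors Supplement: VS17 (U+E0100) through VS256 (U+E01EF).
VS17 = 0xE0100
VS256 = 0xE01EF


def is_ideographic_selector(ch):
    """True if `ch` is one of the selectors VS17..VS256."""
    return VS17 <= ord(ch) <= VS256


def make_variation_sequence(base, selector_number):
    """Append VS<selector_number> (17..256) to a single base character."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("selector_number must be between 17 and 256")
    return base + chr(VS17 + (selector_number - 17))


def strip_selectors(text):
    """Remove VS17..VS256, leaving the plain unified ideographs."""
    return "".join(ch for ch in text if not is_ideographic_selector(ch))


# Placeholder pairing, for illustration only (not taken from the IVD).
seq = make_variation_sequence("溍", 17)
print([hex(ord(ch)) for ch in seq])
print(strip_selectors(seq) == "溍")
```

Because the selector is an ordinary character that follows the base, the sequence survives in plain text, and stripping the selectors recovers the unified ideograph for searching or sorting.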

Template:zh-forms not displaying a definition


In this entry (八面玲瓏), Template:zh-forms does not properly display definitions in the box. Maybe it's because all four definitions in the entry are prefaced with template:lb. --Prisencolin (talk) 07:33, 29 November 2018 (UTC)

@Prisencolin: I don't think that's the problem. The problem is probably that it only takes the first definition. The first definition is wrapped in {{n-g}}, which it automatically ignores. — justin(r)leung (t...) | c=› } 23:07, 5 December 2018 (UTC)Reply

|alt= parameter


Is the |alt= parameter for written or spoken variants? 圖窮匕見 / 图穷匕见 (túqióngbǐxiàn) is unfortunately mixing the two.

I think it's better to move spoken variants to the "Alternative forms" section and format them with {{zh-l}} so that their pronunciation is visible. That's what I'm proposing for Japanese too.

(Notifying Kc kennylau, Atitarev, Tooironic, Jamesjiao, Meihouwang, Suzukaze-c, Justinrleung, Hongthay, Mar vin kaiser, Dokurrat, Zcreator alt, Geographyinitiative): --Dine2016 (talk) 15:10, 25 November 2019 (UTC)Reply

idk, but i'd really like to distinguish the two tbh. —Suzukaze-c 16:01, 25 November 2019 (UTC)Reply
I think of 'alt=' as written variants which are pronounced in the same way as the word being defined (like on the 停車 / 停车 (tíngchē) or 高雄 (Gāoxióng) pages). But here's another page where that rule is not being followed: 柴米油鹽醬醋茶 / 柴米油盐酱醋茶 (chái mǐ yóu yán jiàng cù chá) --Geographyinitiative (talk) 22:06, 25 November 2019 (UTC)Reply
@Dine2016: I agree. I prefer to have |alt= reserved for variations in orthography only, i.e. they are all pronounced the same way as the main entry. Any other type of "alternative form" should either be treated as a synonym (if it's very different) or as an alternative form under the "alternative form" header. — justin(r)leung (t...) | c=› } 22:43, 25 November 2019 (UTC)Reply
Maybe? [7][8] --Geographyinitiative (talk) 22:55, 25 November 2019 (UTC)Reply

IDS


@Erutuon Could the IDS functions from Module:zh-sortkey be called so that these work properly? (and perhaps moved to a more general module? Module:Hani?)


earth; dust; rural
earth; dust; rural; uncouth; uncultured; plebeian
 
Gansu; respectful
trad. (⿰土肅)
simp. () [[#Chinese|]] [[#Chinese|]]
trad. 𪈿
simp.


Suzukaze-c (talk) 08:23, 5 July 2020 (UTC)Reply

@Suzukaze-c: Good idea. See the draft in Module:zh-forms/sandbox. (I haven't split the IDS code into a separate module yet.) It handles these two cases at least, though it might need more testing. — Eru·tuon 03:56, 6 July 2020 (UTC)Reply
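
As a rough picture of what "handling IDS" means here, the sketch below splits a string into units so that an ideographic description sequence such as ⿰土肅 counts as one character. It is written in Python for illustration and is not the code in Module:zh-sortkey or the sandbox. The function names are hypothetical, and only the description characters U+2FF0 to U+2FFB are covered.

```python
# Ideographic description characters U+2FF0..U+2FFB.
# ⿲ (U+2FF2) and ⿳ (U+2FF3) take three operands; the rest take two.
TERNARY = {"\u2ff2", "\u2ff3"}


def is_idc(ch):
    """True if `ch` is an ideographic description character."""
    return "\u2ff0" <= ch <= "\u2ffb"


def read_unit(text, i):
    """Return the index just past the unit that starts at text[i].

    A unit is either one ordinary character or a description character
    followed by its operands, each of which is itself a unit.
    """
    ch = text[i]
    if not is_idc(ch):
        return i + 1
    arity = 3 if ch in TERNARY else 2
    j = i + 1
    for _ in range(arity):
        if j >= len(text):
            raise ValueError("incomplete ideographic description sequence")
        j = read_unit(text, j)
    return j


def split_units(text):
    """Split `text` into characters, keeping each IDS together as one unit."""
    units = []
    i = 0
    while i < len(text):
        j = read_unit(text, i)
        units.append(text[i:j])
        i = j
    return units


print(split_units("⿰土肅"))
print(split_units("技巧"))
```

With a splitter like this, per-character steps such as looking up a gloss or building a link can treat ⿰土肅 as a single character instead of three.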

Broken 星期日


The template call outputs unterminated wikicode. Can someone look at it? --Derbeth talk 14:18, 20 December 2020 (UTC)Reply

共和國


Also broken, outputs unterminated wikicode. --Derbeth talk 16:01, 2 May 2021 (UTC)Reply

Justify left and right seems random


Using this template, sometimes the box sits on the left, unchanged. Sometimes it is justified to the right. I can't see anything in the module code that would do this, and it's very annoying. Is it meant to be this way? Levi OP (talk) 16:49, 10 January 2022 (UTC)

@Levi OP: https://en.wiktionary.org/w/index.php?title=Module:zh-forms&oldid=66370000#L-80. Honestly I don't get it either. —Fish bowl (talk) 06:38, 10 April 2022 (UTC)Reply
@Fish bowl Nice find. I might just be bold and change this. If anyone has an issue with it it can be reverted but I can't see any reason that it would be like this. Levi OP (talk) 01:08, 15 April 2022 (UTC)Reply

I think the reasoning for this is that above a certain length it takes up the whole row anyway, so it may as well be aligned left. I don’t really mind it, but it seems to be set too short, and it does make it inconsistent. Theknightwho (talk) 12:32, 1 May 2022 (UTC)Reply

|ss= and 二簡 1977 vs. 1981


@Theknightwho, Kernel-chan You guys should probably figure out a way to adapt |ss= for whatever this difference is.

  1. https://en.wiktionary.org/w/index.php?title=罐&action=history (?)

Fish bowl (talk) 03:12, 18 May 2022 (UTC)Reply

Issue with "Template:vern" transclusion in "橡果#Chinese"


I think Template:zh-forms has an issue with transcluding Template:vern in 橡果#Chinese. One of the column headers displays {{vern|jolcham oak}} literally instead of transcluding it. -- F1yingpig (talk) 00:27, 17 October 2022 (UTC)Reply

The problem is in Module:zh/data/glosses; the template syntax is not rendered. @DCDuring, how important are these templates in this context? Should they be processed? —Fish bowl (talk) 00:15, 7 May 2023 (UTC)Reply
It is not at all critical. The problem may have to do with there being two templates on that line in the module. There are many instances of {{vern}} and {{taxlink}} in that module, but I don't recall ever seeing the two together.
The purpose of those templates is to count links to determine which organism names are most worth adding. I would like to count all uses of the name, from definitions, image captions, etymology sections, and the forms boxes. But, as I have to use the XML dumps to count the templates, I can't count links in the forms boxes. Arguably, they are of lesser importance than the links from the other items, but it may lead to failure to add organism-name entries that are of fundamental cultural importance in China and elsewhere in Asia. As far as I can tell, my template count finds only three links to Quercus serrata, whereas a search finds 18 uses. DCDuring (talk) 02:10, 7 May 2023 (UTC)Reply

{{{ss}}} needs simplified form suppression


"2nd round simp. 泞/𰛑" @Theknightwho —Fish bowl (talk) 07:52, 5 March 2024 (UTC)

@Fish bowl Done. I've also suppressed it for nonstandard simplified forms. Theknightwho (talk) 14:05, 5 March 2024 (UTC)

{{taxfmt}}


FYI this doesn't seem to be handling {{taxfmt}} properly: see 上海白菜. Weylaway (talk) 22:12, 21 April 2024 (UTC)Reply