Template talk:zh-dial

From Wiktionary, the free dictionary
Latest comment: 3 years ago by Justinrleung in topic Colloquial Putonghua

Dialect maps

@Wyang, Suzukaze-c, Tooironic, Atitarev, Hongthay, Mar vin kaiser: I've attempted to cobble up some preliminary code to make maps displaying dialectal variation based on our dialectal data (as suggested by Wyang). The code is here, and a rough example with 馬鈴薯 is here. There's obviously some formatting to be done, and some of the dots are repeated because there aren't enough dots of the same style at Commons. I'm thinking of including the map with zh-dial (hidden under the table with the actual list of synonyms), but I don't know if that's a good idea. I'm open to any suggestions. — justin(r)leung (t...) | c=› } 23:08, 24 February 2017 (UTC)Reply

Great work!! I love this idea and support adding it to the table. I did some changes (Module talk:User:Justinrleung/dial-map-2) - splitting words for each location, allowed multiple words to show up for one location, and increased the size for dots. I think the appearance of the dots can still be improved; perhaps making them solid dots would be more aesthetic. Wyang (talk) 23:30, 24 February 2017 (UTC)Reply
Great work indeed! --Anatoli T. (обсудить/вклад) 01:11, 25 February 2017 (UTC)Reply
@Wyang, Atitarev Thanks for your support and feedback! Frank, I love the changes you're making! We do still have the problem with having too few dots, though; some colours overlap. — justin(r)leung (t...) | c=› } 04:09, 25 February 2017 (UTC)Reply
Thanks! I converted the image-based colour dots to html-based, and used some online colour brewers (such as this) to generate a large number of distinct colours per [1]. It is looking okay but still should be improved. Also added sorting for the individual words, so that the most common ones occur first in the list. I think we should add in a preliminary screening function for all the distinct lemmas, so that we know which ones occur most often and assign the most distinctive colours to these high-frequency terms. Wyang (talk) 05:42, 25 February 2017 (UTC)Reply
@Wyang: Great improvements! I think assigning high-frequency terms particular colours is a good idea. One thing about multiple synonyms for one location: how about we have something like a multi-coloured circle instead of having them overlap? (It looks a bit hard, though.) — justin(r)leung (t...) | c=› } 05:53, 25 February 2017 (UTC)
That's a good suggestion. I had a read of the tutorial page, but the $wheel code is making my head spin. Perhaps CSS guru Suzukaze could help here, haha. :) Wyang (talk) 06:15, 25 February 2017 (UTC)Reply
Yeah, they're using Sass, which has a bit of non-CSS stuff. I think we could implement it with CSS and Lua though (with Suzukaze's help, of course). — justin(r)leung (t...) | c=› } 06:22, 25 February 2017 (UTC)Reply
Another suggestion would be to improve the annotation on the dots - such as including more on the variety (Mandarin, Cantonese, etc.), Chinese name of location (done), and linking to the respective entries. Wyang (talk) 12:30, 25 February 2017 (UTC)Reply
I moved it to Module:zh-dial-map and created the supporting templates ({{zh-dial-map}} and its subpages). The link to the map is next to the gloss on the title for the moment. More suggestions are welcome. :) Wyang (talk) 09:59, 26 February 2017 (UTC)Reply
Just a thought. These dialectal synonyms charts and maps would be ideal for Arabic. --Mar vin kaiser (talk) 11:05, 26 February 2017 (UTC)Reply
@Mar vin kaiser: Definitely! It'd be great to see this implemented for other languages! — justin(r)leung (t...) | c=› } 03:24, 27 February 2017 (UTC)Reply
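
The colour-assignment idea raised earlier in this thread (screen all distinct terms first, then give the most distinctive colours to the most frequent ones) could be sketched roughly as follows. This is only an illustration in Python, not the code in Module:zh-dial-map: the palette, the function name `assign_colours` and the placeholder location and term names are all assumptions.

```python
from collections import Counter

# Hypothetical palette, ordered from most to least distinctive.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00"]


def assign_colours(data, palette=PALETTE):
    """Map each term to a colour, giving earlier palette entries to more frequent terms."""
    # Count in how many (location, term) pairs each term occurs.
    counts = Counter(term for terms in data.values() for term in terms)
    # Sort by descending frequency, then alphabetically so the order is stable.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    # Reuse colours from the start of the palette if terms outnumber colours.
    return {term: palette[i % len(palette)] for i, term in enumerate(ranked)}


# Placeholder data: location -> terms recorded there (not real dial-syn data).
sample = {
    "loc_a": ["term1", "term2"],
    "loc_b": ["term1"],
    "loc_c": ["term1", "term3"],
    "loc_d": ["term2"],
}
colours = assign_colours(sample)
```

With more terms than palette entries the colours simply repeat, which is the overlap problem described above; a larger generated palette would be needed to avoid it.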

I think this is really cool. Other thoughts:

  • What about using one color for all one-off words and saving colors for words that appear in multiple regions?
  • Multiple words occupying the same region seems to be a problem. What if the map was an off-wiki Tool Labs utility with zooming and other possible JavaScript enhancements?
  • About the multi-colored circle:
    • I have a decent grip on CSS, but stuff like SASS and drawing fancy shapes is a level of expertise I have not reached yet :/
    • Also, I think the tooltips would become impossible.
  • What if the map served as "documentation" for the Module: data? (i.e. following a link to Module:zh/data/dial-syn/水果 would present the map followed by the export.list stuff)

suzukaze (tc) 00:46, 27 February 2017 (UTC)Reply

Great to hear back from you, @Suzukaze-c! I was thinking of something similar to the "one-off words" idea, but then that might be problematic for words like 鼻子, where everything is one off from 鼻. As for the multi-coloured circle, I tried implementing it in my sandbox based on this but I'm not totally understanding the math yet. I think it would look ok on the map (and the tooltip should work), but the code is just too repetitive. — justin(r)leung (t...) | c=› } 03:20, 27 February 2017 (UTC)Reply
I think Suzukaze's "one-off" referred to terms only found in one location. It's potentially a good idea, if we use an inconspicuous colour (gray?) for these one-off terms. In certain cases, using several central key terms and combining similar terms into one colour to reduce the #colours (while using different shapes in the same colour to represent terms of similar appearance) is another possibility. Some key characters may need to be specified a priori (e.g. 薯, 芋), and terms are grouped based on them. Alternatively, terms are grouped based on their similarity with one another, perhaps by calculating Levenshtein distance if both are > 2 characters, for example compute_distance in Module:ko-utilities. Wyang (talk) 03:39, 27 February 2017 (UTC)Reply
Oops... I see what you mean. — justin(r)leung (t...) | c=› } 03:58, 27 February 2017 (UTC)Reply
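
The grouping-by-similarity idea above (Levenshtein distance between terms) might look something like the sketch below. It is written in Python purely for illustration and is not the `compute_distance` function from Module:ko-utilities; the greedy grouping strategy, the `max_distance` threshold and the function names are assumptions, and the length condition suggested above is left out for brevity. The sample strings are ordinary words used as test input, not dial-syn data.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def group_terms(terms, max_distance=1):
    """Greedily add each term to the first group whose first member is close enough."""
    groups = []
    for term in terms:
        for group in groups:
            if levenshtein(term, group[0]) <= max_distance:
                group.append(term)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append([term])
    return groups


groups = group_terms(["番薯", "紅薯", "芋頭"])
```

Each resulting group could then share one colour, with shapes distinguishing the members. A greedy pass like this depends on the order of the input, so sorting terms by frequency first would probably give more sensible group heads.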

Thanks

@Justinrleung Thanks for the great work you put into this! I've copied over a lot (okay, maybe all) of your code to MOD:inc-ash/dial. —AryamanA (मुझसे बात करेंयोगदान) 01:25, 7 February 2018 (UTC)Reply

@AryamanA: Umm, I think you should be thanking @Wyang if you're talking about MOD:zh-dial-syn. — justin(r)leung (t...) | c=› } 03:06, 7 February 2018 (UTC)Reply
@Justinrleung: Oh, my bad. (You made the map tho right?) —AryamanA (मुझसे बात करेंयोगदान) 03:15, 7 February 2018 (UTC)Reply
@AryamanA, Justinrleung Yeah, Justin designed the map, and also contributed a lot to other dialectal modules, including most of the dialectal data modules. Anyway, glad to see it being used elsewhere. Looks great! Wyang (talk) 03:34, 7 February 2018 (UTC)Reply
I wonder if we can figure out a way to generalize this code instead of making forks every time someone wants to add dialect information for a new language... —suzukaze (tc) 03:35, 7 February 2018 (UTC)Reply
@Suzukaze-c: There are some minor customizations such as automatic transliteration and script detection (Ashokan Prakrit used two scripts), but that could be generalized too. I'd imagine this would be useful for Arabic at some point. —AryamanA (मुझसे बात करेंयोगदान) 04:37, 7 February 2018 (UTC)Reply

Dungan?

🤔 —suzukaze (tc) 06:47, 8 March 2018 (UTC)Reply

@Suzukaze-c: Yeah, that's what I've been thinking about ever since we started including Dungan in Chinese entries. I've put it off for two reasons: (1) I don't know what location should be used to represent the different dialects (like Gansu and Shaanxi), and (2) I'll have to reupload a map that includes Central Asian countries like Kyrgyzstan. I can't understand Russian, so the only resources I'm working with are English and Chinese. @Wyang, Atitarev, any ideas? — justin(r)leung (t...) | c=› } 21:54, 8 March 2018 (UTC)
@Justinrleung Sorry Justin, I'm not really familiar with Dungan and not up-to-date with our Dungan stuff. Can we list all the resources we have amassed for Dungan so far somewhere? (Sorry for asking if this has already been done)
Also Justin, do you mind if we merge the resources list in your userspace into those here? I would like to know how you usually create the dial-syn lists ― do you go through the 42 volumes of 現代漢語方言大詞典 one by one, and how do you know which terms to look up? (驚恐) I would really appreciate if you could write a simple guide on the template page describing the collation of dialectal data in more detail (and a guide on how to find/assign the Min Nan pronunciations). Often I feel 愛莫能助... Wyang (talk) 23:17, 8 March 2018 (UTC)Reply
@Wyang: About Dungan, we should probably have an about page for Dungan and put the resources there. I have a page for Dungan phonology, which has things we need to fix or implement, especially tone sandhi patterns.
About the resources lists, it's largely still in progress, so I'm not too sure about moving them here yet. I really want more coverage for Mandarin, which is severely underrepresented in the dialectal tables compared to the southern dialects. It's really difficult to find substantial data for those though, especially for dialects that aren't that different from Putonghua in terms of vocabulary. I know of 普通话基础方言基本词汇集, but I can't find it anywhere.
To create the dial-syn lists, I usually check if 漢語方言詞彙 has a column for it. If so, I start with that as the basis. Then I look through each volume of 現代漢語方言大詞典. All of them have an index sorted by category (plants, animals, weather, clothes, body parts, ...). Then I look through other sources for the locations not covered by the two sources. It usually takes an hour or two to compile one list decently. Is this enough for a "guide"? — justin(r)leung (t...) | c=› } 03:40, 9 March 2018 (UTC)
Yeah it suffices. Thanks! Wyang (talk) 04:08, 9 March 2018 (UTC)Reply
@Justinrleung, Suzukaze-c, Wyang: Sorry for the late reply. Please let me know if you need anything translated from Russian. I have been using the two Dungan dictionary files, Dict.pdf (Russian) and dunganDictionary.html (English and Russian), which were linked earlier. If it were possible to load data here, then I could gradually translate the Russian bits where necessary. Please note that the PDF file uses replacements for the Cyrillic characters that are missing in Russian, and so do the extract files. For example, вәму (vəmu, we) is displayed as "ВӘМУ" (correctly) but stored in the file as "ВHМУ". The Russian dictionary is also a bit harder to use because it doesn't use Chinese characters. I can often guess, but it's only guessing.
The English-Russian-Dungan dictionary uses more complicated tones. Not sure if this is implemented here. I used just the I-II-III tones in Chinese entries.
I think for the distribution of Dungan dialect of Mandarin using Kyrgyzstan would be sufficient. --Anatoli T. (обсудить/вклад) 06:49, 9 March 2018 (UTC)Reply

Bug report: Spurious span being generated for coloned entries...

Page: 肉棒

Invocation:

{{zh-dial|陰莖}}

Relevant portion:

...
!rowspan=21 style="background:#FAF5F0"| Mandarin
|style="background:#FAF5F0"| [[w:Beijing dialect|Beijing]]
|style="background:#FAF5F0"| <span class="Hani" lang="zh">[[雞巴]]、[[卵子]]、[[雀子]]</span> <span style="font-size:60%"><i>†</i></span><span class="Hani" lang="zh">、[[小雞兒]]</span> <span style="font-size:60%"><i>†</i></span></span>
|-
...

There should ideally only be one closing span tag? ShakespeareFan00 (talk) 00:54, 27 December 2018 (UTC)Reply
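
A quick way to confirm the imbalance in the cell quoted above is to count opening and closing span tags. The snippet below is a rough Python check, not part of the module; `span_balance` is a hypothetical helper, and the regular expressions assume simple, well-formed tags with no `>` inside attribute values.

```python
import re


def span_balance(html):
    """Return (number of opening <span ...> tags, number of closing </span> tags)."""
    opens = len(re.findall(r"<span\b[^>]*>", html))
    closes = len(re.findall(r"</span\s*>", html))
    return opens, closes


# The Beijing cell from the generated table, copied from the report above.
cell = ('<span class="Hani" lang="zh">[[雞巴]]、[[卵子]]、[[雀子]]</span> '
        '<span style="font-size:60%"><i>†</i></span>'
        '<span class="Hani" lang="zh">、[[小雞兒]]</span> '
        '<span style="font-size:60%"><i>†</i></span></span>')
opens, closes = span_balance(cell)
```

For this cell the counts differ by one (four opening tags, five closing tags), which matches the stray closing tag at the end of the line.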

Map columns

@suzukaze-c any idea why the legend for the dialectal maps has lost its columns? (User:Justinrleung @ Discord #chinese)

MediaWiki:Gadget-zhDialMap.css has a typo: -column-count instead of column-count. Perhaps support for -moz- and -webkit- was dropped recently. —Suzukaze-c 06:34, 23 March 2020 (UTC)Reply

Order of Hainanese dialects

@Justinrleung I noticed that now Haikou appears before Wenchang in the dialectal tables. I understand that Haikou is the capital and largest city in Hainan, but Wenchang is the prestige dialect, so shouldn't Wenchang appear first? The dog2 (talk) 02:50, 4 August 2020 (UTC)Reply

@The dog2: I didn't make the change, but @RcAlex36 did. I think he mentioned something about ordering from north to south, but I don't see how this is north to south. — justin(r)leung (t...) | c=› } 03:20, 4 August 2020 (UTC)Reply
@Justinrleung, The dog2: I was going by decreasing latitude. It makes sense for Leizhou to go in front of Hainan, but we can reorder the Hainanese dialects. RcAlex36 (talk) 03:24, 4 August 2020 (UTC)Reply
@RcAlex36: OK, I see what you tried to do. It's just that for the other dialects, the prestige dialect always appears first in the list, which is why Guangzhou appears first for Cantonese, Meixian appears first for Hakka, Xiamen appears first for Hokkien, Chaozhou appears first for Teochew and Beijing appears first for Mandarin. So I think Wenchang should appear first for Hainanese to fit this pattern. If you want, I think it's fine to put Leizhou before the Hainanese dialects, since Leizhou is closer to the mainland Minnan dialects. The dog2 (talk) 03:30, 4 August 2020 (UTC)
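
The two orderings discussed here (decreasing latitude versus prestige dialect first) can be combined in one sort key. The following Python sketch is only an illustration of that idea and not how the dialectal modules order their rows; the function name, the `prestige` parameter and the rounded latitude values are assumptions.

```python
# Approximate latitudes in degrees north, rounded for illustration only;
# these are not values taken from the dialectal data modules.
LATITUDE = {"Leizhou": 20.9, "Haikou": 20.0, "Wenchang": 19.6}


def order_locations(locations, prestige=None):
    """Sort north to south, but put the prestige location (if any) first."""
    # False sorts before True, so the prestige location leads the list;
    # the negated latitude puts more northerly locations earlier.
    return sorted(locations, key=lambda loc: (loc != prestige, -LATITUDE[loc]))


by_latitude = order_locations(["Wenchang", "Haikou"])
prestige_first = order_locations(["Haikou", "Wenchang"], prestige="Wenchang")
```

Sorting purely by latitude puts Haikou before Wenchang, while passing Wenchang as the prestige location moves it to the front without disturbing the north-to-south order of the rest.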

Error

@沈澄心 Hey, your recent edit results in an error for all dialectal modules. --Mar vin kaiser (talk) 14:55, 21 December 2020 (UTC)

Colloquial Putonghua

@Justinrleung, Mar vin Kaiser, RcAlex36, Suzukaze-c, 沈澄心, Geographyinitiative Come to think of it, maybe it's a good idea to separate colloquial Putonghua from formal written Chinese. What do you guys think? The dog2 (talk) 19:43, 18 June 2021 (UTC)Reply

I will note that there's no Wikipedia version for those two forms separately. The Putonghua-written Chinese combo is called the Chinese Wikipedia, but they use Mandarin ruby text, never Cantonese pronunciation. Cantonese, Wuu, Gan have their own Wikipedias. --Geographyinitiative (talk) 19:51, 18 June 2021 (UTC)Reply
@Geographyinitiative: Yes, but there are some expressions used in formal written documents that would be considered stilted if used colloquially. For instance, when telling the time, the hour is often indicated with 時 in writing, but this form is rarely used in speech, where 點 is used instead. In addition, modern China also has many internet slang terms that are used in colloquial Mandarin all over China (some of which I have trouble understanding since they are not used in Singapore), but would most certainly be inappropriate for formal writing. An example of such a term would be 牛屄, so the question is: how can we properly list such China-wide colloquial terms in our dialectal tables? The dog2 (talk) 20:07, 18 June 2021 (UTC)
@Geographyinitiative: re: “The Putonghua-written Chinese combo is called the Chinese Wikipedia, but they use Mandarin ruby text, never Cantonese pronunciation.” This is actually not right. If you look at the Hong Kong/Macau version of the Chinese Wikipedia (which you can switch to using the 繁簡轉換 function on the top left corner), there will be jyutping.
@The dog2: It’s hard to determine what goes into this colloquial Putonghua. There are differences between the north and the south, and the north often has terms that are just the same as the dialectal forms. It’s certainly something we’re missing, but I’m not sure what the best way to deal with this is. — justin(r)leung (t...) | c=› } 21:23, 18 June 2021 (UTC)