Wiktionary:Language treatment requests

From Wiktionary, the free dictionary
Latest comment: 15 days ago by Sebirkhan in topic Ancestor of Azerbaijani


Wiktionary Request pages (edit) see also: discussions
Requests for cleanup

Cleanup requests, questions and discussions.

Requests for verification

Requests for verification in the form of durably-archived attestations conveying the meaning of the term in question.

Requests for deletion

Requests for deletion of pages in the main and Reconstruction namespace due to policy violations; also for undeletion requests.

Requests for deletion/Others

Requests for deletion and undeletion of pages in other (not the main) namespaces, such as categories, appendices and templates.

Requests for moves, mergers and splits

Moves, mergers and splits; requests listings, questions and discussions.

Language treatment requests

Requests for changes to Wiktionary's language treatment practices, including renames, merges and splits.

This is the page for proposing changes to Wiktionary's language treatment practices, including language renaming, merging and splitting.

Use this page if you want to propose a non-trivial change to:

For issues pertaining to a single language, such as orthography, start a conversation on the discussion page of the language considerations page (the so-called "About LANG" page), or the beer parlour if no such page exists.

Archiving: Language treatment requests, once closed and (if applicable) acted upon, are archived on Wikipedia-style archive subpages. These can be found at Wiktionary:Language treatment requests/Archives and in the list below:

Language treatment requests: Archive index

2016

Nkore-Kiga

As can be seen at w:Nkore-Kiga language, Kiga [cgg] should definitely be merged into Nyankore [nyn]. Unfortunately, this might require a rename to something that is both hyphenated and considerably less common than just plain "Nyankore" (though that is, strictly speaking, merely the name of the main dialect). —Μετάknowledge (discuss/deeds) 05:21, 18 September 2016 (UTC)

I'm not sure. WP suggests the merger was politically motivated, but many reference works do follow it. Ethnologue says there is "Lexical similarity [of] 78%–96% between Nyankore, Nyoro [nyo], and their dialects; 84%–94% with Chiga [cgg], [...and] 81% with Zinza [zin]" (Kiga, meanwhile, is said to be "77% [similar] with Nyoro [nyo]"), as if to suggest nyn is about as similar to cgg as to nyo, and indeed many early references treat Nkore-Nyoro like one language, where later references instead prefer to group Nkore with Kiga. Ethnologue mentions that some authorities merge all three into a "Standardized form of the western varieties (Nyankore-Chiga and Nyoro-Tooro) [...] called Runyakitara [...] taught at the University and used in internet browsing, but [it] is a hybrid language." (For comparison, Ethnologue says English has 60% lexical similarity to German.) - -sche (discuss) 00:16, 2 June 2017 (UTC)
Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Itneg lects

See w:Itneg language. All the dialects have different codes, but we really should give them a single code and unify them. I came across this problem with the entry balaua, which means "spirit house" (but I can't tell in which specific dialect). It's also known as Tinggian (with various different spellings), and this may be a better name for it than Itneg. —Μετάknowledge (discuss/deeds) 02:09, 23 September 2016 (UTC)

Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

Paraguayan Guaraní [gug]

I just noticed that we have this for some reason. Guaraní is a dialect continuum that is quite extensive, both in inter-dialect differences and in geography, and certain varieties have been heavily influenced by Spanish or Portuguese. That said, our Guaraní [gn] content is, as far as I can tell, pretty much entirely on Paraguayan Guaraní, which for some reason has a different code, [gug]. My attention was brought to this by User:Guillermo2149 changing L2 headers (I have not reverted his edits, but they do cause header-code mismatch). We could try splitting up the Guaraní dialects, but it would be hard to choose cutoffs and would definitely confuse potential editors, of which we have had more since Duolingo released a Guaraní course. I think the best choice is to merge [gug] into [gn] and mark words extensively for which dialects or countries they are used in. @-sche —Μετάknowledge (discuss/deeds) 01:29, 1 November 2016 (UTC)

Support merging gn and gug. - -sche (discuss) 14:33, 1 November 2016 (UTC)
Don't forget there's also [gui] and apparently also [tpj]. - -sche (discuss) 04:28, 16 May 2017 (UTC)

2017

Merger into Scandoromani

I propose that the Para-Romani lects Traveller Norwegian, Traveller Danish and Tavringer Swedish (rmg, rmd and rmu) be merged into Scandoromani. TN, TD and TS are almost identical, mostly differing in spelling (e.g. tjuro (Sweden) vs. kjuro (Norway) meaning 'knife', gräj vs. grei 'horse' etc.). WP treats them as variants of Scandoromani. My langcode proposal could be rom-sca, or maybe we could just use rmg, which already has a category. --176.23.1.95 20:19, 25 January 2017 (UTC)

I support it. Traveller Norwegian is sometimes referred to as Tavring, and, to be honest, I've never heard anybody use the term Traveller Norwegian for a language. People rather call it Taterspråk or Fantemål, even though books describe these as derogatory terms. The other problem is that we've in fact got two different Norwegian Traveller languages (the Romani-based and the Månsing-based). So it looks like a total mess right now. Tollef Salemann (talk) 07:55, 2 April 2023 (UTC)
I don't think this makes sense if the orthographies are consistently different, which seems to be the case. Otherwise, we could use the same logic to merge quite a few of the Slavic languages, which obviously doesn't make sense. Theknightwho (talk) 13:43, 2 April 2023 (UTC)
OK, but Traveller Norwegian is not quite the right term, because the Romani-based TN has two or more branches, which are quite different from each other, while the main one is almost the same as the Swedish and often had the same name(s). Meanwhile, there is also a Germanic TN version, unrelated to the Romani-based TN variations. I mean, we need at least two more L2s in this case, even if we are going to merge TN and Swedish Tavring.
PS: there is also Swedish stuff like Knoparmoj and Loffarspråk and more, which still have remnants in some rare Swedish/Norwegian sociolects. Maybe they also need their own L2? Or can we treat them as sociolects? Tollef Salemann (talk) 13:59, 2 April 2023 (UTC)

Yenish

The Yenish "language" (which we call Yeniche) was given the ISO code yec, despite being clearly not a separate language from German. Instead, it is a jargon which Wikipedia compares to Cockney (which has never had a code) and Polari (which had a code that we deleted in a mostly off-topic discussion). The case of Gayle, which is similar, is still under deliberation at RFM as of now. Most tellingly, German Wiktionary considers this to be German, and once we delete the code, we should make a dialect label for it and add the contents of de:Kategorie:Jenisch to English Wiktionary. @-sche —Μετάknowledge (discuss/deeds) 00:49, 7 April 2017 (UTC)

I don't see how that's most tellingly; I don't know about the German Wiktionary, but major language works frequently treat things as dialects of their language that outsiders consider separate languages.--Prosfilaes (talk) 03:01, 10 April 2017 (UTC)Reply
The (linked) English Wikipedia article even says "It is a jargon rather than an actual language; meaning, it consists of a significant number of unique specialized words, but does not have its own grammar or its own basic vocabulary." Despite the citation needed that follows, that sentence is about accurate, as such this should be deleted. -- Pedrianaplant (talk) 10:53, 30 April 2017 (UTC)Reply
(If kept, it should be renamed.)
There are those who argue that Yenish should have recognition (which it indeed gets, in Switzerland) as a separate language. And it can be quite divergent from Standard German, with forms that are as different as those of some of the regiolects we consider distinct. Many examples from Alemannic or Bavarian-speaking areas are better considered Alemannic or Bavarian than Standard German. But then, that's a sign that it is, as some put it, a cant overlaid onto the local grammar, rather than a language per se. Ehh... - -sche (discuss) 03:22, 9 July 2017 (UTC)Reply

2018

Category:Nahuatl language

Nahuatl is sometimes treated as a language, and sometimes as a family of languages. Right now, Wiktionary is treating it as both simultaneously, which doesn't make sense. "Nahuatl" should be removed as a language. --Lvovmauro (talk) 11:55, 30 August 2018 (UTC)

I agree the current arrangement doesn't make sense; it is a relic of very early days on Wiktionary, and has persisted mostly because it's not entirely clear how intelligible the varieties are and hence whether it's better to lump them all into nah, or retire nah and separate everything. But enough varieties are not intelligible that I agree with retiring nah (or perhaps finally converting it to a family code). - -sche (discuss) 20:34, 31 August 2018 (UTC)Reply
I think a family code for Nahuan languages is really needed since there are many cases where we don't know specifically which variety a word was borrowed from. --Lvovmauro (talk) 09:55, 9 September 2018 (UTC)Reply
@Lvovmauro: OK, thanks to you and a few other editors, all words with ==Nahuatl== sections have been given more specific headers. However, as many as a thousand translations remain to be dealt with before the code can be made a family code and Category:Nahuatl language moved on over to Category:Nahuan languages. - -sche (discuss) 06:48, 19 September 2018 (UTC)Reply
A disturbingly large number of these translations are neologisms with no actual usage. Some of them don't even obey the rules of Nahuatl word formation. --Lvovmauro (talk) 11:03, 19 September 2018 (UTC)Reply
@Lvovmauro: Feel free to remove obvious errors / unattested neologisms. If a high proportion of the translations are bad, it might even be reasonable to start presuming they're bad and just removing them, since they already suffer from the problem of using an overbroad code. - -sche (discuss) 00:28, 21 October 2018 (UTC)Reply
Someone with more time on their hands than me at the moment will need to delete all the subcategories of Category:Nahuatl language, and then the category itself, in preparation for moving 'nah' from the language-code module to the family-code module so the categories won't be recreated by careless misuse of 'nah' in the labels etc of 'nci' entries. - -sche (discuss) 00:24, 21 October 2018 (UTC)Reply
Five years on, I've reviewed the situation here. There are no Nahuatl entries anymore, which is good progress. However, two pressing issues are stopping us from fully retiring this language code:
  • There are still about 450 "Nahuatl" (nah) translations in English entries. I suppose these need manual review. This should not be too difficult if one can find word lists for some of the best-attested Nahuatls.
  • Many languages have at least one word said to be derived from Nahuatl (presumably this is the word for "chocolate" in most cases). This could be solved by making Nahuatl an etymology-only language, or by changing these etymologies to refer generically to "a Nahuan language".
This, that and the other (talk) 09:25, 1 November 2023 (UTC)


Language request: Old Cahita

Mayo and Yaqui are mutually intelligible and sometimes considered to be a single language called Cahita. But their speakers apparently consider them to be distinct languages, and they have distinct ISO codes (mfy and yaq) and are currently treated distinctly by Wiktionary.

I'm not requesting that they be merged, but separating them is a problem because an important early source, the Arte de la lengua cahita conforme à las reglas de muchos peritos en ella (published 1737 but written earlier) treats them as a single language, and also includes an extinct dialect called Tehueco. I'd like to add words from the Arte but I can't list them specifically as either Mayo or Yaqui.

One solution would be to treat the language of the Arte as a distinct historical language, "Old Cahita", which would then be the ancestor of Mayo and Yaqui. The downside is that there only seems to be one linguist currently using this name. --Lvovmauro (talk) 11:32, 4 November 2018 (UTC)

On linguistic grounds, it seems like we should merge Yaqui and Mayo. Jacqueline Lindenfeld's 1974 Yaqui Syntax says "Yaqui and Mayo are sufficiently similar to be mutually intelligible", the Handbook of Middle American Indians says "the modern known representatives of Cahitan—Yaqui and Mayo—are mutually intelligible", and various more general references say "Yaqui and Mayo are mutually intelligible dialects of the Cahitan language", "The Yaqui and Mayo speak mutually intelligible dialects of Cahita". (There are political considerations behind the split, which a merger might upset, so adding Old Cahita would also work, but we have tended to be lumpers...) - -sche (discuss) 23:03, 18 November 2018 (UTC)Reply
I wouldn't object to merging them. --Lvovmauro (talk) 08:58, 19 November 2018 (UTC)

Merging Classical Mongolian into Mongolian

"Classical Mongolian" refers to the literary language of Mongolia used from the 17th to the 19th century, created through a language reform associated with increased Buddhist cultural production (this started in the 16th century, but language standardization took place later). In the 20th century, (Outer) Mongolia became independent from China and later adopted a Cyrillic orthography based on the spoken language, while Inner Mongolia kept the Uyghur script.

The literary language of Inner Mongolia continues Classical Mongolian in terms of its orthography as well as most of its grammar (to an extent that Janhunen (?) calls the situation bilingual). Modern varieties, in both Outer and Inner Mongolia, have greatly expanded their lexicons through borrowing of modern terms, but they also both consider all of Classical Mongolian lexicon to be a part of their language, and will put it in their dictionaries, even transcribed into Cyrillic.

The actual problem I have with this division is that when it comes to borrowings from (Classical) Mongolian, we sometimes cannot ascertain whether they precede the 20th century or not, or more common still, we know they precede the 19th century (and post-date the 16th), but they obviously come from a spoken variety and not "Classical Mongolian" as a literary language. Crom daba (talk) 17:14, 15 November 2018 (UTC)Reply

Yes. I also find it strange that Wiktionary distinguishes Ottoman Turkish from Turkish; it’s like distinguishing pre-1918 Russian from “Russian”, or like reading about “Ottoman Turks” instead of “Turks”. Also, Kazakh and the other Turkic languages do not get extra codes for Arabic spelling; this situation is even more comparable, innit. Kazakhs in China write in Arabic script, Mongols in China in Mongolian script, but the languages are two and not four. It is also similar to the case of Pali. Am I correct to assume that Classical Mongolian texts get reedited in Cyrillic script? Then you could base everything on Cyrillic and make Mongolian-script entries soft redirects, because even words that died out before the introduction of Cyrillic can be found in Cyrillic. Fay Freak (talk) 15:23, 17 November 2018 (UTC)
@Fay Freak, the situation is similar to Turkish, but it creates fewer problems there since the Arabic-script Turkish is obsolete and most relevant loans are pre-Republican.
In principle it could be possible to collapse all of Mongolian into Cyrillic, but this would be extremely politically incorrect.
Collapsing everything (potentially even Buryat, Daur and Middle Mongolian) into Uyghur script, like we do with Chinese, would perhaps make more sense, but 1) it's a pain to enter 2) Cyrillic is generally more accessible and useful to our users and (Outer) Mongolians 3) most of my materials are in Cyrillic 4) it corresponds poorly to the spoken forms 5) its Unicode encoding corresponds poorly to its actual form 6) the encoding doesn't correspond that well to the spoken form either. Crom daba (talk) 16:50, 18 November 2018 (UTC)Reply
This is tricky, because as far as language headers and having entries for terms in the language, it seems like we could often resolve which language a word is in(?) by knowing the date of the texts it's attested in. It is, as you say, etymologies where it's hardest to ascertain dates. (Still, if we merged the lects, we could retain an "etymology only" code for borrowings that were clearly from Classical Mongolian, like is done for Classical Persian, etc.) I'm having a hard time finding any references on the mutual intelligibility of the two stages; most references are concerned with the intelligibility or non-intelligibility of modern Khalkha, Kalmyk, etc. If we kept the stages separate, etymologies could always say something like "from Mongolian foo, or a Classical Mongolian forerunner". - -sche (discuss) 22:50, 18 November 2018 (UTC)Reply
@-sche, yes, the Persian model would be desirable.
It doesn't make much sense to speak of intelligibility between Classical and Modern Mongolian, Classical Mongolian is exclusively a written language, its spelling reflects the phonology of 13th-century Mongolian (early Middle Mongolian). The same spelling is used in Modern Mongolian as written in Uyghur script.
The biggest problem with Classical Mongolian is how redundant it is. For any word that is shared between modern and classical periods, and that is probably most of the lexicon, we would need to make two identical entries in Uyghur script for modern and classical Mongolian. Crom daba (talk) 11:18, 19 November 2018 (UTC)Reply
That seems not unlike how we handle Serbo-Croatian and Hindi-Urdu. — [ זכריה קהת ] Zack. 14:25, 30 November 2018 (UTC)
Indeed. The way we handle them sucks. Crom daba (talk) 12:52, 1 December 2018 (UTC)
I agree. All this duplication is a huge waste of resources. Per utramque cavernam 13:22, 1 December 2018 (UTC)
Not exactly; Serbo-Croatian and Hindi-Urdu have redundant entries in different scripts on different pages, while I understand Crom daba's point to be that we would need to have redundant ==Mongolian== and ==Classical Mongolian== entries on the same pages for most Mongolian/Uyghur script words, which would be more like having duplicate Bosnian and Croatian entries on the same pages, not our current system. And Serbo-Croats are testier about their language(s) being lumped than speakers of Classical Mongolian... ;) - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply
OK, does anyone object to the merge? If not, I can try to do it with AutoWikiBrowser later, or Crom or others could start reheadering our small number of Classical Mongolian entries, fixing any wayward translations, etc. For etymologies of terms that are known to derive from Classical Mongolian, we should be able to just move cmg over to Module:etymology languages/data. - -sche (discuss) 17:29, 3 December 2018 (UTC)Reply
@Crom daba, Fay Freak I made the few ==Classical Mongolian== entries we had into ==Mongolian== entries (labelled "Classical Mongolian" unless there was already a modern Mongolian section on the same page), but many of the categories still need to be deleted, and one needs to check whether anything else is left that would break before "cmg" is moved from being a language code to being an etymology-only code. - -sche (discuss) 02:46, 27 September 2020 (UTC)
There's no full correspondence between the different Mongolian scripts, and none of the scripts is totally phonetic. It's not just the spelling; the phonologies are different, but sometimes one script represents the true or historical pronunciation, and it's not necessarily Cyrillic, which is strange. There are words that only exist in one or the other, which is quite understandable, cf. modern ᠱᠠᠹᠠ (šafa, sofa) in Inner Mongolia (from 沙發/沙发 (shāfā)) and софа (sofa, sofa) in Outer Mongolia (from софа́ (sofá)). I support the merge, though I am curious whether Classical Mongolian terms are equally representable in Cyrillic and Arabic scripts. In other words, are there terms in Classical Mongolian that differ from the modern ones and have no Cyrillic form? I think I have seen some.
Duplication of entries is a waste. You may think I am biased but I think Mongolian should be presented/lemmatised in Cyrillic (Uyghurjin should also be available in all entries where it can be found) - for which resources are much more accessible. (Serbo-Croatian should be lemmatised on the Roman alphabet, on the other hand, let's finish the senseless duplications of entries)
Also supporting the Ottoman Turkish/Turkish merge. --Anatoli T. (обсудить/вклад) 03:25, 27 September 2020 (UTC)
@Atitarev In Mongol khelnii ikh tailbar toli we see the term уйгуржин бичиг described as ‘монгол бичгийн дундад эртний үеийн хэлбэр’ (‘early form of the Mongolian/Khudam script’). Middle Mongolian in uigurjin, with its own rules, should not be equated with the later ‘Classical’-Modern script and orthography. I maintain that uigurjin (with its specific glyph forms and spelling rules) should be treated as a term only for Middle Mongolian.
Similarly, I also object to treating Northern Yuan – Qing (‘Classical’) Mongolian and Modern Mongolian-script Mongolian as one literary language standard. In fact, orthographic standardisations and modifications make written Modern Mongolian quite different from Classical. Personally I’d like to display the historical features of this language collectively under ‘Classical Mongolian’, as only this term directly links to an Inner Asian historical and linguistic tradition. LibCae (talk) 16:40, 7 May 2021 (UTC)


Renaming agu

We currently call this "Aguacateca", but "Aguacateco" is much more common. (Wikipedia opts for "Awakatek", which is rapidly becoming more common but is probably not there yet — not that we can't be crystal-ballsy if we want to when it comes to names rather than entries.) —Μετάknowledgediscuss/deeds 05:42, 19 December 2018 (UTC)Reply

You're right that several modern (and a few older) sources seem to use Awakatek. In turn, historically Aguacatec has been used in the titles of many reference works on it, and seems like it may be the most common name (ngrams), although it's also the name of the people-group. (Others: Awakateko, Awaketec, Qa'yol, Kayol, and various spellings of Chalchitec, sometimes considered a distinct lect.) - -sche (discuss) 04:31, 19 August 2020 (UTC)
Indeed, the most common name by a longshot is Aguacatec, followed by Awakatek (but these are also names of the people-group), followed by Awakateko, then Aguacateco, and in dead last, our current name of Aguacateca. Can we rename to Aguacatec? - -sche (discuss) 07:02, 28 December 2023 (UTC)Reply
  • Support renaming to Aguacatec. Also being the name of the "people-group" is hardly an argument against it; the same is true of a huge number of languages including French, Welsh, Manx and the vast majority of language names ending in -ish. —Mahāgaja · talk 07:22, 28 December 2023 (UTC)Reply
    Oh, to clarify, I didn't intend that as an argument against using that name, but as a qualification on the data; comparing which term is more common can't easily determine which is the most common name of the language if one term is also used for something else (the name of the people). But Aguacatec seems to be the most common name in e.g. the books about it in Glottolog's bibliography, too. Who has a bot that does renames? This one involves few enough entries that it could be done by hand, but it seems like the tasks that would need to be done are the same for many (all?) language renames, so it should be bottable... - -sche (discuss) 07:51, 28 December 2023 (UTC)Reply

2021

Canonical name of "mep"

Currently, the canonical name of the language in WT is spelled Miriwung, even though every primary/secondary source I could find recommended the spelling Miriwoong, as that is consistent with the language's own orthography, while the spellings "Miriwung" and "Miriuwung" are considered nonstandard. Can someone fix it? --Numberguy6 (talk) 14:47, 8 May 2021 (UTC)Reply

It's not exactly hard to find sources spelling it as Miriwung, but I'm sure you're right. @-sche? —Μετάknowledge (discuss/deeds) 22:52, 21 July 2021 (UTC)

Names of sah, alt, xgn-kha and request for Soyot

The Constitution of the Republic of Sakha (Yakutia) (https://iltumen.ru/constitution) officially used язык саха referring to the language sah. A government decree («О Правилах орфографии и пунктуации языка саха») which approved the language’s current orthography, used язык саха instead of якутский язык from its annexe. However, this usage is not mandatorily popularised. I suggest Sakha to be adopted instead of Yakut due to the Constitution reference.

Since atv ‘Northern Altai’ is not a single language/dialect but a group of several (Kumandy, Chelkan & Tubalar), atv should be split into subcodes. Furthermore, since Southern Altai is only a classifying term, Altai, as the official term, should be adopted for alt.

As for Khamnigan xgn-kha, a transitional dialect (with conservative phonology) between Buryat and Mongolian, its simple name may not create ambiguity.

In addition I also request a code for Soyot. It will help contrasting Sayan Turkic languages. LibCae (talk) 06:36, 2 September 2021 (UTC)Reply

The Constitution of the Republic of Sakha is not our guide to using English names. In the case of [sah], most scholarly descriptions use "Yakut" (e.g. The Turkic Languages), there are far more raw Google hits for "Yakut language" than "Sakha language", and Google Ngrams show a preference for "Yakut" that has not waned over time (but we don't know past 2008, after which the data are incomplete).
I can't comment on the other code requests, but it would be more convincing if there were some evidence in favour of the need for these codes and their distinctiveness from their closest relatives. —Μετάknowledgediscuss/deeds 16:11, 2 September 2021 (UTC)Reply
I don’t see the argument for how more information would come to light if we split Northern Altai. Surely Northern Altai and Southern Altai are also the most usual names, in either English or Russian. Given the number of speakers Northern Altai has, how could there be a benefit? The major factor for editors is what sources they use, whether they indicate the sources and whether those are clear about the place of origin. I have had many books about “the Aramaic dialect of [village X]” where I don’t know which damn language code of Wiktionary it is supposed to belong to, Wiktionary making codes centered around city A and B but not village X; in the end I gave up on adding anything. Fay Freak (talk) 17:00, 2 September 2021 (UTC)
Oppose renaming Yakut
Support splitting atv
Support renaming alt to Altai
Abstain regarding xgn-kha
Support creating a code for Soyot, quite strongly so. Allahverdi Verdizade (talk) 17:13, 2 September 2021 (UTC)

Renaming [nlo]

Wikipedia uses the phrase "Ngul (including Ngwi)" to describe this language, which we currently call "Ngul", but this paper indicates that these are just two of several synonyms, and uses "Ngwi" as the primary name. We should follow suit. —Μετάknowledge (discuss/deeds) 00:19, 21 December 2021 (UTC)

Renaming [amf]

We currently call this language "Hamer-Banna", after two of its dialects; WP uses "Hamer". This hyphenated name is found in the literature, though it excludes the third dialect, Bashaɗɗa. Modern publications, following the lead of Petrollino's grammar, use the spelling "Hamar" for that dialect. As I see it, if we stick with the hyphenated name, we should change it to "Hamar-Banna", but we could also consider elevating the name of the primary dialect to cover the language as a whole, as WP does, though in that case we should use "Hamar" instead. —Μετάknowledgediscuss/deeds 07:56, 22 December 2021 (UTC)Reply

Indus Valley Language

We currently have this language, which Wikipedia refers to as the Harappan language, as [xiv]. I suggest that we retire the code, because the language is undeciphered and its script has not been encoded, so there is nothing to add to Wiktionary in the foreseeable future. I also suggest that we retire the script code [Inds], which is only used for this language. @AryamanA —Μετάknowledge (discuss/deeds) 07:14, 28 December 2021 (UTC)

Merging Yoruba dialects

Currently, we have codes for [mkl] "Mokole" (see Mokole language (Benin)), [cbj] "Ede Cabe", [ica] "Ede Ica", [idd] "Ede Idaca", [ijj] "Ede Ije", [nqg] "Ede Nago", [nqk] "Kura Ede Nago", [xkb] "Manigri-Kambolé Ede Nago", and [ife] "Ifè" (all of which are lumped into Ede language). These lects are all very close to Yoruba proper (which they use for formal and liturgical purposes), and spoken by people who are considered ethnic Yorubas; moreover, they are included in the Global Yoruba Lexical Database. I have added them as dialects of [yo] "Yoruba" in MOD:labels/data/subvarieties, but treating Yoruba as a macrolanguage means we must remove these codes. (Note: the family code [alv-ede] would have to be removed as well.) @AG202, Oniwe, Oníhùmọ̀ —Μετάknowledge (discuss/deeds) 07:29, 28 December 2021 (UTC)

Merge, obviously again Ethnologue’s fabrications, which were then copied over from Wikipedia and some other “encyclopedias” with their impractical credulity towards this reference. Fay Freak (talk) 07:54, 28 December 2021 (UTC)Reply
If anything I would keep the Ede family code and change the lects to be etymology-only languages (edit: excluding probably Ifè since it is much more documented), but putting them all under Yoruba I unfortunately oppose for now. The Western Ede languages as seen here have a higher degree of separation from Nuclear Yoruba, and it checks out more when comparing, at the very least, the words and phrases of Ifè to nuclear Yoruba: Ifè-French Dictionary, Peace Corps - IFÈ O.P.L. WORKBOOK, J'apprends l'ife: Langue Benue-Congo du Togo. While there are obviously words that are shared due to them being related languages, it doesn't feel like a dialect of Yoruba (to me at least), so I feel uncomfortable grouping it under Yoruba. Though I do admit that I haven't really looked into the other Ede languages nearly as much. Edit: This paper may be helpful and at least shows some of the differences between Ifè & Yoruba and some aspects of the dialect continuum. Obviously some Ede varieties are much closer to Yoruba, but then I wonder what to do about the other ones. AG202 (talk) 15:09, 28 December 2021 (UTC)Reply
@AG202: Thanks for the sources. The question of whether to lump a code is in part based on how much extra work is entailed; would you be willing to work through a subsample to see how much we would just be duplicating Yoruba entries, and how much would be distinct? I'm not sure what you're actually advocating, because making them etymology-only languages (which you say you support) would require merging them (which you say you oppose). —Μετάknowledgediscuss/deeds 07:18, 29 December 2021 (UTC)Reply
@Metaknowledge Yea, sorry for that being unclear. I oppose the merger under solely Yoruba. Regarding the etymology-only part, I would support having all the Ede lects (excluding Ifè) under the header "Ede" and then differentiating on the definition line which Ede lect it is, mainly because they have much less coverage than Ifè, and it's harder to tell their mutual intelligibility. (Though as mentioned I'm not as well-versed with the other lects, so I might be entirely wrong about their continuum.) In terms of working through a subsample, I am up to do so, though I am swamped at the moment so it'd definitely take a while, but from what I've seen so far, I'd be worried about putting possible Ifè terms like ɖíɖì (belt) or àntã̀ (chair) under a Yoruba header and keeping nice clear entries for readers. AG202 (talk) 07:52, 29 December 2021 (UTC)Reply
Looks reasonable. To clarify, my main note relates to observation that the language names currently in the data are too unnatural to find use and are not even meeting our CFI, which again means there is no entrotopy for those who know the languages to assign material to the designations with little doubt, as there is little to confirm the meanings of the language names, which should be a consideration if you devise new namings, in so far as you would like to not have private language but more or less obvious to new editors what the language codes are for. So I was not to mean that there cannot be a split in a different manner, or a smaller merge, but the current ones should be recognized as off the wall, and then there will have to be something that interrelates the remaining codes if one stumbles upon one, else it will be a reoccurring problem that an editor did not see the distinction of the available language codes. Fay Freak (talk) 01:36, 30 December 2021 (UTC)Reply

2022

Category:Gansu Chinese → Category:Gansu Mandarin? Category:Gansu Dungan?

Members:

@Justinrleung, RcAlex36, 沈澄心 Fish bowl (talk) 05:55, 6 February 2022 (UTC)Reply

@Fish bowl: Gansu means actual Gansu in China, but Gansu Dungan should be its own label perhaps. I'm not sure why those entries are labelled specifically as Gansu Dungan, though, because do we know if it's not used in other varieties of Dungan? Pinging @Mar vin kaiser to know why he chose to label it as Gansu Dungan specifically. — justin(r)leung (t...) | c=› } 06:03, 6 February 2022 (UTC)Reply
@Justinrleung: There's this website, I can't find the link now, that was like a mini Dungan dictionary, and for some of its words, it has a dialectal label. I think I got it from there. --Mar vin kaiser (talk) 08:39, 6 February 2022 (UTC)Reply
@Mar vin kaiser: This? I know these words are marked as Gansu here, but I wonder if we need to specify it as Gansu specifically when we don't know if other Dungan varieties use it. — justin(r)leung (t...) | c=› } 09:02, 6 February 2022 (UTC)Reply
@Justinrleung: Oh, I added the label Gansu with the assumption that it's specifying that it's only used in Gansu. Aren't there just two dialects, Gansu and Shaanxi? --Mar vin kaiser (talk) 14:03, 6 February 2022 (UTC)Reply

Merge Category:Hokkien, Category:Hokkien Chinese; and perhaps move Category:Hainanese depending on the result of the previous

Category:Hokkien is an etymology language, while Category:Hokkien Chinese belongs to the {{dialectboiler}} system.

Category:Hainanese is presently both.

Fish bowl (talk) 11:10, 7 February 2022 (UTC)Reply

@Fish bowl @Justinrleung @RcAlex36 @沈澄心 @AG202 IMO we should delete Category:Hokkien Chinese and recategorize the lemmas under it to Category:Hokkien. This is consistent with the treatment of other etymology languages, particularly since Hokkien is considered a dialect of the Min Nan language and not a dialect of "Chinese" (which is not a language). If you don't mind, I will go ahead and do this. (While we're at it, we should rename the Amoy etymology language to Xiamen Hokkien, which is currently a dialect category but not an etymology language, and give it a standardly-formed etymology code. Its current code is nan-xm, which is badly formatted; etymology codes should consist of sections of three letters, hence nan-xia. Same goes for nan-ph -> nan-phi, nan-qz -> nan-qua, nan-zz -> nan-zha, nan-jj -> nan-jin.) Benwing2 (talk) 05:29, 16 September 2023 (UTC)Reply
I also think we should upgrade Hokkien to a full language, esp. seeing as Min Nan is itself not a language but a macrolanguage. Benwing2 (talk) 05:30, 16 September 2023 (UTC)Reply
Agree that we should treat Hokkien as a full language - this feels long overdue. I think in general each lect listed in {{zh-pron}} should be treated as a full language in its own right, which means Sichuanese (currently with etymology code [cmn-sic]) and Leizhou (currently lacking a code; I would suggest [nan-lei] or [nan-lz]) would be upgraded. We might also want to add more etymology codes, but that might warrant a separate discussion.
I however oppose changing the 3-2 letter codes, which are much easier to memorise (since this is just taken from the first letter of each syllable) and also are consistent with the location codes used in {{zh-pron}}. Changing them means that we would need to deal with two separate, inconsistent systems.
Regarding the category name issue, for some reason we also have categories like Cat:Mandarin Chinese, Cat:Cantonese Chinese, Cat:Hakka Chinese, Cat:Min Nan Chinese, etc. alongside the regular lemma categories. I don't really care about their treatment as long as the approach is consistent. – wpi (talk) 17:18, 16 September 2023 (UTC)Reply
@Wpi It is a pain to have nonstandard etym codes like this, as it requires adding code to various places to handle them. I don't see why the 3-2 codes are easier to memorize; the proposed 3-3 codes consistently use the first three letters of the lect in question, which is standard practice at Wiktionary, whereas the 3-2 codes aren't consistent (nan-ph is not the first two syllables of "Philippine"). In terms of the location codes in {{zh-pron}}, we should rename the latter to match the 3-3 codes. However, as a first step if you don't object, I will promote Hokkien to a full language, and we can continue the discussion on etym codes; in this case we should maybe eliminate Category:Hokkien in favor of Category:Hokkien Chinese for consistency with the other such categories, although in general we need to rethink the naming of these categories. Benwing2 (talk) 19:21, 17 September 2023 (UTC)Reply
I think one reason that 3-2 codes are easier to memorize is that {{zh-pron}} uses 2-letter codes for dialects of Hokkien. However, if it makes more sense for codes to be 3-3 to be consistent with other languages, I wouldn't mind it. I agree that whatever we do, we should make it consistent with CAT:Mandarin Chinese, CAT:Gan Chinese, CAT:Xiang Chinese, etc., (which means the easiest thing to do is to have CAT:Hokkien Chinese). — justin(r)leung (t...) | c=› } 18:09, 19 September 2023 (UTC)Reply
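
As a rough illustration of the renaming discussed in this thread, the sketch below checks whether a code consists only of three-letter sections and applies the 3-2 to 3-3 renames listed above. The regular expression and the helper names (`is_standard_code`, `rename`) are assumptions made for illustration; they are not taken from Wiktionary's actual modules, and the check is only meant for etymology codes, not for two-letter codes of full languages.

```python
import re

# Renames proposed in the thread (current 3-2 code -> proposed 3-3 code).
PROPOSED_RENAMES = {
    "nan-xm": "nan-xia",
    "nan-ph": "nan-phi",
    "nan-qz": "nan-qua",
    "nan-zz": "nan-zha",
    "nan-jj": "nan-jin",
}

# Hypothetical check: hyphen-separated sections of exactly three lowercase letters.
THREE_LETTER_SECTIONS = re.compile(r"[a-z]{3}(?:-[a-z]{3})*")


def is_standard_code(code: str) -> bool:
    """Return True if every section of the code has exactly three letters."""
    return THREE_LETTER_SECTIONS.fullmatch(code) is not None


def rename(code: str) -> str:
    """Return the proposed 3-3 code, or the code unchanged if no rename is listed."""
    return PROPOSED_RENAMES.get(code, code)
```

Under this check the current 3-2 codes fail and their proposed replacements pass, which is the consistency argument made above; the counterargument (matching the two-letter location codes in {{zh-pron}}) is not something a format check can capture.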


Slavic phylogeny

Old Slovak ?

How about adding a code for Old Slovak (zlw-osk) as well? In the same {{R:sla:ESSJa}} (ЭССЯ), especially in recent editions, Old Slovak is constantly listed separately. In this case, an etymology-only code is sufficient. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply

@ZomBear @Thadh @Sławobóg @Vininn126 What is the current state of this? I notice that Middle Russian is an etym-only language of Russian and has two codes zle-mru and zle-oru, which looks very suspect. I also think Middle Polish has in fact been made an etym language of Polish. Benwing2 (talk) 06:24, 19 September 2023 (UTC)Reply
I still believe that at least there should be an etym-code for the Old Slovak language. It is also necessary to combine Czech & Slovak into the “Czech–Slovak family” in Slavic languages tree, as was done with Lechitic (zlw-lch) F. ZomBear (talk) 06:54, 19 September 2023 (UTC)Reply
I know that Sławobóg also wanted to split Old Slovak. As to grouping them and giving them a family lang code, I'm not sure. Perhaps Moravian should also be split and placed in this family. @Zhnka Vininn126 (talk) 07:51, 19 September 2023 (UTC)Reply
I'm pretty certain that this question isn't as straightforward as you make it out to be, and I read on multiple occasions that the similarities between Standard Czech and Standard Slovak arose due to Czech's influence on Slovak and that dialectal evidence shows no evidence of genetic relationship closer than on the West Slavic level. So I would like a more detailed discussion on this. Thadh (talk) 08:10, 19 September 2023 (UTC)Reply
@Benwing2 @Thadh @ZomBear @Sławobóg I was reading up on w:sk:Dejiny slovenčiny#11. až 18. storočie, and it seems there were huge phonological and grammatical changes, in my opinion enough to split Old Slovak into an L2. There also appears to be a dictionary, Historický slovník slovenského jazyka, that could be used as a source. So I propose that we split Old Slovak. Vininn126 (talk) 10:15, 1 October 2023 (UTC)Reply
Also @Zhnka for the tactical ping. Vininn126 (talk) 10:49, 1 October 2023 (UTC)Reply
Support. @Vininn126 I just created a template for Historical Dictionary of the Slovak Language {{R:sk:HSSJ}}. It contains more than 70,000 words from the pre-literary period (before the 18th century) of the Slovak language. This is a really good source for Old Slovak. ZomBear (talk) 11:29, 1 October 2023 (UTC)Reply
@ZomBear We should be careful, however: Old Slovak is best described as 9th-14th centuries. Vininn126 (talk) 11:33, 1 October 2023 (UTC)Reply
@Vininn126 What’s great about this dictionary is that each quotation indicates the year or century in which the word was recorded. For example, for voda (“water”) it can be seen that the oldest evidence for this word is from 1473, 1585 and 1376. ZomBear (talk) 11:50, 1 October 2023 (UTC)Reply
Support. Sławobóg (talk) 13:14, 1 October 2023 (UTC)Reply
I have split Old Slovak and given it the code zlw-osk. Vininn126 (talk) 19:26, 3 October 2023 (UTC)Reply

East Slavic codes

Following up a long discussion on the Old East Slavic About: page, I'd like to propose the following splits:

  • Split off Old Ruthenian (zle-ort)
  • Set Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) as etymology-only descendants and labels of Old Ruthenian
  • Set Ukrainian (uk), Belarusian (be) and Rusyn (rue) as descendants of Old Ruthenian
  • Change Old Russian (zle-oru) to Middle Russian (zle-mru) and set this as a label of Russian (ru)

On the final point there was quite some discussion, and I personally support making Middle Russian a full-fledged code, but since we couldn't reach consensus, I propose making that a separate discussion if need be.

The proposed historical borders of the languages are as follows:

  • Old East Slavic (until the 14th century)
  • Middle Russian (=Moscow Literary language; 14th century-18th century) [Peter the Great's reforms]
  • Old Ruthenian (='West Russian' Literary language; 14th century-19th century) [Kotliarevsky's Eneïd]

Pinging @Atitarev, ZomBear, Useigor, Ентусиастъ, Benwing2, Rua, Ogrezem. I apologise if I forgot anyone. Thadh (talk) 12:43, 2 March 2022 (UTC)Reply
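
For readers who want to see the proposed descent relations at a glance, here is a minimal sketch of the first list above as a parent table with a helper that walks up the chain. It is only an illustration: the code `orv` for Old East Slavic is an assumption (the proposal does not state it), and `PROPOSED_PARENT` and `ancestors` are hypothetical names, not Wiktionary's data modules. Middle Russian is left out because the proposal treats it as a label of Russian rather than a separate node.

```python
# Proposed parent of each lect, following the first bullet list of the proposal.
# "orv" (Old East Slavic) is an assumed code; it is not given in the proposal.
PROPOSED_PARENT = {
    "zle-ort": "orv",      # Old Ruthenian, split off from Old East Slavic
    "uk": "zle-ort",       # Ukrainian
    "be": "zle-ort",       # Belarusian
    "rue": "zle-ort",      # Rusyn
    "ru": "orv",           # Russian (Middle Russian would be a label of Russian)
}


def ancestors(code):
    """Return the chain of proposed ancestors, nearest first."""
    chain = []
    while code in PROPOSED_PARENT:
        code = PROPOSED_PARENT[code]
        chain.append(code)
    return chain
```

With this table, `ancestors("uk")` passes through Old Ruthenian before reaching Old East Slavic, while `ancestors("ru")` goes straight to Old East Slavic, which is the asymmetry the final bullet point is about.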

I still support only the introduction of Old Ruthenian, which is missing but as before, I don’t claim to be an expert on the matter. The Russian corpus in the other discussion was helpful. When I filtered on “Middle Russian”, I think I was able to find a couple of words, which are now considered obsolete. The rest were words, which just need to be respelled to find quotes in (early) Modern Russian. I found a few different ways to abbreviate and also numerous misspellings. Overall I sort of feel why these additional splits are not so popular - little strong evidence to work with. Middle Russian may be allowed to be added, let’s just look for good cases.
To make decisions easier, why don’t we add a couple of specific examples for each new language code proposed - something to work with. (They can be vocab, grammar or pronunciation cases). The proponents should have examples in mind to make the case(s) stronger. We can work together on confirming or disputing those cases. --Anatoli T. (обсудить/вклад) 22:57, 2 March 2022 (UTC)Reply
I'll see if I can make a list of features that distinguish Middle Russian from (Modern) Russian. In any case, for the time being, treating Middle Russian like Old East Slavic makes little sense to me, especially if we're splitting off Ruthenian (otherwise we get some kind of Dutch-Afrikaans situation), so we could go ahead with that now and in the meantime continue discussing MR's position as a separate code. Thadh (talk) 23:30, 2 March 2022 (UTC)Reply
(edit conflict) You can use any of the examples already in discussions used as evidence, e.g. онтарь/оньтарь, агистъ, etc. BTW, I see that "Old Russian" was used incorrectly by ZomBear when actually talking about Middle Russian. "Old Russian" = "Old East Slavic". The Russian term for Middle Russian is старору́сский (starorússkij) but Old East Slavic (Old Russian) is древнеру́сский (drevnerússkij). --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)Reply
Quick update, I've found a relevant discussion from three years ago, Wiktionary talk:About Russian#Middle Russian?. Also, The Russian Language before 1700 (Matthews 1953) argues your and Fay Freak's point (that Middle Russian is too similar to modern Russian to warrant a linguistic distinction). Fun point, it also provides съмьрть's accentuation :0. I'll still look for differences in the corpora, but if the languages are too similar I guess I don't mind keeping the two together - as long as the descendants sections don't get too cluttered, I'm fine. Thadh (talk) 00:02, 3 March 2022 (UTC)Reply
BTW, I didn’t get back to you on the concern I have in regards to introduction of word stresses in Old East Slavic. My reason being there are many cases where assumptions can go wrong based on descendants. We should only use referenced data. Well, we don’t have native speakers to prove us wrong, do we? —Anatoli T. (обсудить/вклад) 23:03, 2 March 2022 (UTC)Reply
Sure, but of course we can still use sound laws for words without referencing the specific word's reconstruction. A word like съмь́рть will have the stress on the second syllable, because otherwise the Russian term would be something like **со́мерть rather than сме́рть. However, I wouldn't know where to look for any reference on this specific word, and googling "съмь́рть" returns no results. Thadh (talk) 23:30, 2 March 2022 (UTC)Reply
Of course, there could be strong (?) assumptions on vowels, which became silent (i.e. they are unstressed) but I wouldn't be so sure even on e.g. вода́ (vodá) (if it weren't referenced), since the word is stressed on the first syllable in some Ukrainian dialects, if you know what I mean. --Anatoli T. (обсудить/вклад) 00:21, 3 March 2022 (UTC)Reply
@Thadh: I support your suggestions. Ентусиастъ (talk) 16:19, 3 March 2022 (UTC)Reply
I have already spoken before. I'm for it too.--ZomBear (talk) 00:57, 4 March 2022 (UTC)Reply
@Thadh: Unfortunately, I see that the discussion has stopped again. It's been almost a month since anyone has written anything. Every day I look forward to a resolution of this issue with the Old Ruthenian language. --ZomBear (talk) 07:32, 21 March 2022 (UTC)Reply
Done. What we need now is to split all pages into either Old East Slavic, Russian (with the Middle Russian label) or Old Ruthenian (with or without the Old Belarusian/Old Ukrainian label). Thadh (talk) 18:43, 21 March 2022 (UTC)Reply
I also removed Old Novgorodian as the child of Old East Slavic. Vininn126 (talk) 08:52, 4 October 2023 (UTC)Reply

@Thadh how about adding more etymology only language codes? Modern dictionaries use more than just Old Belarusian/Ukrainian. I saw Middle Bulgarian, Old Slovak, Old Slovene, Old Serbian, Old Croatian, Old Serbo-Croatian, Old Bulgarian, Old Upper Sorbian, Old Lower Sorbian. Possibly Middle Czech and Middle Polish also would be useful sometimes. Old Sorbian was also used by Boryś (Old Sorbian peleš as cognate for Polish pielesze), however we can't just link to both Lower and Upper Sorbian at once, so that would require full support for this language (?). Scientific publications mention Old Polabian as language of Polabian Slavs in Middle Ages, it is used usually for proper nouns like given names, theonyms, toponyms, sometimes ordinary words mentioned in Latin texts and it is always reconstructed language, I would like to have it tho. Sławobóg (talk) 14:32, 28 May 2022 (UTC)Reply

@Sławobóg What I'll need from you in order to determine whether the splits are worth it is:
- The exact boundaries of the languages' stages.
- How much literature there is in the earlier stages of each language.
- How much the languages differ from their modern stages.
Once you do that, we can continue the conversation about splitting them. It seems pointless to split a language off just because there are two inscriptions in some dusty old book. Thadh (talk) 15:15, 28 May 2022 (UTC)Reply
@Thadh: IMO Middle Polish would benefit greatly from the split.
  • Boundaries: As it is with extinct languages, there aren't really any exact boundaries, but it's usually defined as between the 16th and the 18th century; Polish Wiktionary has settled on years 1500 to 1750 to account for Doroszewski's dictionary.
  • Literature: There are two major corpora, accessible on the SPXVI and ESXVII websites.
  • Differences: I reckon the spelling and pronunciation differences, especially the employment of "slanted vowels" (samogłoski pochylone, I have no idea what their name is in English), should be enough.
Plus, like, this would help with attestation. Hythonia (talk) 11:08, 30 July 2022 (UTC)Reply
Middle Polish is also thusly defined on Wikipedia. I also think it would make more sense to have Middle Polish as an LDL. The alternative would be having a label. If we split, we'd have to add Middle Polish both to Proto-Slavic descendant entries and as an intermediate in etymologies. Vininn126 (talk) 11:52, 30 July 2022 (UTC)Reply
Also pinging @KamiruPL, as an editor for Old Polish. Do you think we should fully split Middle Polish, create a label, or some other alternative? Vininn126 (talk) 13:44, 30 July 2022 (UTC)Reply
@Vininn126: I treat Arabic before the spread of printing in the Arab world, which is from 1800 (Napoléon brought the press to Egypt, which was then a state business that over time was rented by privates who would copy it), as LDL. The reason becomes more obvious for Hebrew where we are eager to include hapax legomena in the Tanakh and due to lacking distinctness of the Modern to the Biblical language, from which the former has been resurrected, have little desire to split. This is in analogy to the split of English from Middle and Old English, where basically the split happens following the new medium of printed books—accordingly if Polish literacy in the same fashion starts only somewhere in the 18th century then we become stricter only then.
Circumventing attestation criteria is no reason to split language headers, as your perception about whether something is another language is the same and only disingenuously modified by that consideration of its description. So more appropriate attestation criteria – and I think of the many carefully collected variants sadly left even unmentioned as a consequence of no sense of proportion applied to the teleology of our rules – by no means should serve motivation to split languages; we can already derive them by the accepted statutory interpretation methods.
To be clear, since legal thinking is unwonted and mysteriously strange to many in spite of people rightly being appointed for it in any society: In this case this is really just systematic interpretation: Since the community authoring the policies was biased towards English but the splits of other languages wrought comparative inconsistency with its situation according to which it has been split by chronolects, we break the criteria down to be suited for the languages they were only roughly devised for. Fay Freak (talk) 09:51, 31 July 2022 (UTC)Reply
In all honesty a label is likely the best option. Vininn126 (talk) 10:05, 31 July 2022 (UTC)Reply
@Hythonia @Sławobóg @KamiruPL I've gone ahead and added Middle Polish as a label. Vininn126 (talk) 12:11, 8 August 2022 (UTC)Reply
I've thought about this more, and I think there might be a case for Middle Polish as an L2. If we agree it should be split, I can help convert the existing entries to Middle Polish.
Here is my reasoning:
Old Polish, Middle Polish, modern Polish, and Silesian are four lects that are hard to separate accurately. Part of this argument hinges on Silesian, which we currently treat as an L2, and I don't see that changing. There are political, historical, and linguistic reasons.
===Why Silesian should be an L2===
  • Its speakers feel strongly that it is a language, not a dialect. The Polish linguists pushing that it is a dialect include Jan Miodek, a notable prescriptivist who pushes more nationalistic views of how languages should be treated, and I believe that treating Silesian as a dialect is done partially to stifle any sense of individuality and to further Polish control. However, I recognize that this theory has some tinfoil-hat conspiracist vibes to it, so I'll stick to the point that its speakers strongly feel it is a language.
  • Significant linguistic difference: Silesian has a different phonology to Polish, and other grammatical features, such as retaining the Proto-Slavic aorist in an analytical past tense, as opposed to a more agglutinative/morphological one in Polish. It also recently has undergone strong standardization, as can be seen on silling.org and the ślabikŏrzowy szrajbōnek.
  • Significant lexical differences: Silesian differs quite a bit from Polish in terms of lexical information. Core inherited words are of course similar, but look at other Slavic languages. It's also been heavily "Policized", but so has Kashubian, which we also treat as an L2 and is recognized as a separate minority language in Poland, and both Kashubian and Silesian are recognized by ISO and Glottolog.
  • Finally, the key point to the overall argument: Silesian is a descendant of Middle Polish. Most claims that it is Czechoslovakian are refuted by Silesian philologists.
===Why Middle Polish should maybe be an L2===
So if we decide that Silesian is an L2, that would give Middle Polish multiple descendants. This would "fix" many inherited etymologies, such as wszystek. This would also fix Latinate borrowings, where Silesian inherited an older pronunciation of Latinate words, and the chain generally works better as a learned borrowing into Middle/Old Polish -> Polish + Silesian, as opposed to setting multiple learned borrowings.
Furthermore, Middle Polish was significantly different from Modern Polish in terms of phonology and grammar (I recently updated the Middle Polish Wikipedia page). In terms of lexical content - there were significant shifts, I would say less than the standard differences between Slavic languages, but there were still trends, and dictionaries such as {{R:pl:SXVI}}, {{R:pl:SXVII}}, and occasionally {{R:pl:SJP1807}} or {{R:pl:SJP1900}} would be key in this. Furthermore, Middle Polish is otherwise resource poor, and should be treated as an LDL, label or not. Having it as an L2 is cleaner in terms of citations.
If we agree that this should be done, I would recommend setting the cutoff dates as c. 1500-c. 1780, with a language code of zlw-mpl. Vininn126 (talk) 12:39, 24 April 2023 (UTC)Reply
@Atitarev@Fay Freak@Hythonia@Sławobóg@Thadh@ZomBear@Ентусиастъ Vininn126 (talk) 17:30, 24 April 2023 (UTC)Reply
Update: there is debate as to whether Silesian should be listed as from Old Polish or Middle Polish, which really affects the above argument. Vininn126 (talk) 14:53, 25 April 2023 (UTC)Reply
Just flagging up that it's possible to give Middle Polish an etymology-only language code, and to set it as the ancestor of Polish (and Silesian, if desired). This would be a way to keep its entries under the Polish L2, while allowing etymologies to formally mention it. In turn, Middle Polish could have Old Polish set as its ancestor.
Of note is the fact we already have Middle Russian, Old Ukrainian, Old Belarusian, Middle Bulgarian and Early Modern Czech, which are all currently handled in the same way. Theknightwho (talk) 16:14, 25 April 2023 (UTC)Reply
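
The arrangement described in the two comments above (an etymology-only code whose entries stay under the Polish L2, with ancestors chained behind it) can be sketched roughly as follows. This is a toy model under stated assumptions: `Lect`, `LECTS`, `entry_header` and `ancestor_chain` are hypothetical names, `zlw-mpl` is the code suggested earlier in this thread, and `zlw-opl` for Old Polish is assumed here rather than quoted from the discussion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lect:
    code: str
    name: str
    # Code of the full language whose L2 header hosts this lect's entries, if any.
    etymology_only_of: Optional[str] = None
    # Code of the lect's direct ancestor, if any.
    ancestor: Optional[str] = None


# Illustrative data only; "zlw-opl" is an assumed code for Old Polish.
LECTS = {
    "zlw-opl": Lect("zlw-opl", "Old Polish"),
    "zlw-mpl": Lect("zlw-mpl", "Middle Polish", etymology_only_of="pl", ancestor="zlw-opl"),
    "pl": Lect("pl", "Polish", ancestor="zlw-mpl"),
}


def entry_header(code):
    """Name of the L2 header under which entries for this lect would sit."""
    lect = LECTS[code]
    host = LECTS[lect.etymology_only_of] if lect.etymology_only_of else lect
    return host.name


def ancestor_chain(code):
    """Names of the ancestors of a lect, nearest first."""
    names = []
    current = LECTS[code].ancestor
    while current is not None:
        names.append(LECTS[current].name)
        current = LECTS[current].ancestor
    return names
```

In this model Middle Polish can be named in etymologies and appears in the ancestor chain of Polish, while `entry_header("zlw-mpl")` still resolves to the Polish header; adding Silesian would just mean one more record pointing at the same ancestor.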

Proposal to rename Ottawa (otw) to Odawa

I think Ottawa should be renamed to Odawa; it's the more common English name used to refer to the language nowadays, and preferred by speakers. What do you think? /mof.va.nes/ (talk) 15:47, 15 April 2022 (UTC)Reply

Split [zhx-pin] into [cnp] and [csp]

[zhx-pin] is an etym-only code added back in 2014 (diff) as [pinhua] and later renamed to [zhx-pin] in 2019. [cnp] and [csp] are ISO 639-3 codes added in January 2020. Note that the current data module incorrectly suggests [yue] (Cantonese) to be the parent of [zhx-pin], but they are generally considered to be distinct, which is mentioned in ISO's comment on the change request. -- Wpi31 (talk) 14:40, 23 August 2022 (UTC)Reply

Support 12:29, 6 October 2022 (UTC)Reply
Support — justin(r)leung (t...) | c=› } 16:10, 6 October 2022 (UTC)Reply
Support Theknightwho (talk) 15:39, 13 December 2023 (UTC)Reply

Given this has been open for over a year, I'm going to close this as split. Theknightwho (talk) 15:39, 13 December 2023 (UTC)Reply

Re-merge Kven and Meänkieli into Finnish

@-sche, Chuck Entz, Rua, Tropylium, Hekaheka, Surjection, Brittletheories, Mölli-Möllerö

In the previous discussion on this topic ([1]) it seems everyone has agreed that it's best to merge Kven and Meänkieli into Finnish. However, the discussion was closed without actually merging the codes, and currently we (again) have 40 Kven and 30 Meänkieli lemmas, many of which are also duplicated as Finnish for the reasons discussed in the above discussion. Has anyone changed their opinion or does anyone have anything to add to this or can we actually go ahead and merge the languages?

I guess related to this is also the question of how to handle dialectal morphology of Finnish dialects, but maybe that's a bit out of scope for this discussion. Thadh (talk) 16:24, 23 September 2022 (UTC)Reply

The strongest arguments in favour of splitting them are political and should therefore be ignored. Our task is to best present the most information, and that would best be achieved by merging the three lects. The dozen or so new dialectal terms will fit in quite well with the 1250 pre-existing ones. brittletheories (talk) 16:49, 23 September 2022 (UTC)Reply
Incubator says "Wikimedia does not decide for itself what is a language and what is a dialect. We follow the ISO 639 standard." This means that it's up to the agency that grants language codes, not to us, right? Meänkieli and Kven have written standards so they should stay as they are. (In my view, Tver Karelian should also be treated as a language so I could add Tver Karelian words without knowing if they're used in the more usual "vienankarjala" dialect.) Mölli-Möllerö (talk) 19:55, 23 September 2022 (UTC)Reply
The Incubator standards are not the same as our standards. Our language treatment does not strictly follow ISO 639. — SURJECTION / T / C / L / 20:33, 23 September 2022 (UTC)Reply
@Mölli-Möllerö: On the Tver Karelian issue, you could also just leave the first parameter of {{krl-regional}} empty or |1=? it, and it will automatically be sorted in Category:Karelian term requests, and I'll be able to add the terms later. Or you could use either {{R:krl:KKS}} or another Viena source, the correspondences are usually quite easy. Thadh (talk) 20:44, 23 September 2022 (UTC)Reply
Wrong. There's a big difference between Wikimedia's administrative needs and the lexical needs of a dictionary. As for written standards: the world is full of languages with multiple written standards: Brazilian and European Portuguese, European and Canadian French, Austrian and German German, etc. We can't let others decide for us- each case needs to be considered on its own. We've chosen to merge languages treated as separate by ISO and recognize languages with no ISO codes. In other cases we've gone with the ISO. Chuck Entz (talk) 20:59, 23 September 2022 (UTC)Reply
For outsiders, Meänkieli (in Sweden) and Kven (in Norway) are languages or rather dialects that have become languages by virtue of being across the border (the Finnish-Swedish border and the Finnish-Norwegian border, respectively). Finnish speakers can easily understand nearly 99% of Meänkieli or Kven, and the main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other).
Linguistically they are 100% dialects, but politically both Sweden and Norway respectively have recognized them as separate languages, which is also what their speakers think. A more cynical person might say that they have deluded themselves into thinking their language is not Finnish in order to avoid persecution of Finnish that was prevalent in Sweden and Norway in the 19th and 20th centuries ("Finnish? what Finnish? we're not speaking Finnish, it's Meänkieli/Kven").
How Wiktionary should best handle cases like these, I don't know. 200 years is not enough for what is generally a phonologically conservative language for it to become anywhere near unrecognizable. It could be compared to how Karelian is now almost universally treated as a separate language, even though it forms a dialect continuum and has been diverging now for at least about 800 years (ever since the 1323 Treaty of Nöteborg).
Finnish sources almost exclusively consider Meänkieli and Kven to be dialects, even more so when these sources are linguistic-oriented (some other sources take a political stance and recognize that they are considered "minority languages" in their respective countries). — SURJECTION / T / C / L / 20:34, 23 September 2022 (UTC)Reply
"The main differences are either dialectal features also found in Far Northern Ostrobothnian dialects or (the lack of) recent developments within the past 200 years (in one or the other)"... and the additional Swedish/Norwegian loanwords found in Meänkieli/Kven, of course. But many of these are also found in Finnish dialects. — SURJECTION / T / C / L / 21:37, 23 September 2022 (UTC)Reply
The divergence of Karelian from Finnish, FWIW, almost certainly goes back at least 1200 years (to the archeological / mentioned-in-Novgorod-sources Old Karelian culture). The initial split-off of Northern Finnish dialects is probably about as old too.
What I would think of as the best argument against treating Meänkieli and Kven as languages is that they're not even internally well-defined — typically they're just catch-all terms for "Northern Finnish in Sweden" and "Northern Finnish in Finnmark" with relatively various dialects encompassed by each. There's some efforts (schoolbooks, etc.) towards a "standard" Meänkieli based on the Torne Valley dialect but I don't think it could be called actually standardized just yet. I suppose one thing we could do is to document whatever is done on this specifically under "Meänkieli" and leave anything else as dialectal Finnish, but that might be a bit premature still too. --Tropylium (talk) 07:44, 24 September 2022 (UTC)Reply
I would not say that "everybody" agreed on the merger. I didn't. I can only comment on Meänkieli, but I would not be surprised if similar arguments also applied to Kven:
  • The overall small number of Meänkieli words in Wiktionary only proves that we don't have an active editor in Meänkieli. There seem to be some 30,000 entries in this Meänkieli-Finnish-Swedish dictionary[1]
  • The small sample of words we have proves nothing of similarity of the vocabularies. If you study the dictionary I mentioned (press "tutki") you'll find that there are considerable differences between Finnish and Meänkieli. In addition to vocabulary, conjugation of verbs seems to differ (e.g. Meänkieli: tukeat - Finnish: tuet - English: you support).
  • This article[2] promotes the opinion that Meänkieli is a dialect. However the writers admit that the two are not readily mutually understandable: Finnish-speakers usually understand Meänkieli relatively well, partly because of their knowledge of Swedish, but for Meänkieli speakers Finnish isn't as easy. If we took a Finn who does not know a word of Swedish, they would be lost with a Meänkieli speaker.
  • This article[3] starts from the maxim that Meänkieli is a dialect of Finnish but finishes with the conclusion that at the end of the day it is the speakers of a language themselves who decide the status of a language/dialect. Meänkieli speakers have made their opinion clear: they want it treated as a language. How competent are we to second-guess their point of view? Has any of us studied Meänkieli more than superficially?
Here is also a link to a Kven-Norwegian dictionary[4]--Hekaheka (talk) 09:44, 24 September 2022 (UTC)Reply
To be fair all these points would still hold for Ingrian and Savonian dialects, too, and of Ingrian dialects I'm fairly certain no Finnish speaker would readily understand them much better than, say, Izhorian or Karelian. Thadh (talk) 09:51, 24 September 2022 (UTC)Reply
A clear-cut solution would be to stick to ISO. Ingrian has an ISO code, Savo hasn't. Is Ingrian currently treated as Finnish dialect? I think it shouldn't. --Hekaheka (talk) 12:05, 24 September 2022 (UTC)Reply
You're confusing Ingrian (inkeroinen) and Ingrian (inkerin (suomalainen)). The first one is the same as Izhorian and is handled as a distinct language, has an iso code, and is spoken by the orthodox Izhorians. The latter one is the same as Ingrian Finnish and is handled as a Finnish dialect, does not have an iso code, and is spoken by the lutheran Ingrian Finns. My remark concerned the latter. Thadh (talk) 13:46, 24 September 2022 (UTC)Reply
I've come around to say that I think they should be merged. We don't consider Valencian, Ulster Scots nor Lemko (the linguistic case is very similar between those examples and this one) to be their own languages despite political arguments that they should be considered as such (and even some recognition like in the ECRML). We shouldn't do so here either. And don't even mention the whole thing going on with Serbo-Croatian... The general trend on en.wikt seems to be to consider the linguistic argument more important than any political ones (which I can appreciate). — SURJECTION / T / C / L / 11:51, 3 October 2022 (UTC)Reply
As a Norwegian, I find it odd that there is a proposal to merge Kven with Finnish - as Kven is an officially recognized minority language in Norway (Finnish is not). I do not agree with this merge, for the following reasons:
  • At least in Norway, Kven and Finnish are considered separate languages. You are able to get elementary school education and books in Kven (but not in Finnish, as far as I know) - you can even study Kven at the University of Tromsø and receive a bachelor's and master's degree in the language (there is a Finnish one as well, and they are considered two separate degrees). Kven people are considered a separate ethnicity, along with their language, descended from Finns/Finnish.
  • Political reasons are of course relevant, not just linguistic ones. The average Kven speaker has never set foot in Finland, never studied any Finnish, nor consumed any part of Finnish culture and media (music, literature, etc.). An argument was that Finnish speakers understand 99% of Kven - as a Norwegian I understand up to 99% of Swedish and Danish, but they are not getting merged into one language called Scandinavian (for political reasons).
  • If merged, then in theory thousands of new Finnish entries on Wiktionary would emerge, in the form of "dialectal" words which are actually Kven words. If someone bothered to add them all (I, stubbornly, might) - then every Kven word and declension would need to be added under Finnish, and certain words and forms which don't even exist in Finnish dialects in Finland would be present. Every Kven word, even if the nominative singular is identical to Finnish, has a separate declension chart, every single one - there would then need to be a separate template to show these (I think Finnish Wiktionarians would be quite annoyed by this).
  • Kvens in Norway have fought very hard for their language; they have gotten their own language institute, which promotes literature and culture in the Kven language. Erasing their language from Wiktionary and treating it as a dialect of a language they don't even speak would be a huge slap in the face. Finns in Finland who speak a dialect of Finnish also all know standard Finnish; Kven people do not. If a Kven person handed in an essay at a school in Finland, every other word would be marked as wrong or a typo. Supevan (talk) 22:49, 2 November 2022 (UTC)
This entire argument can be boiled down to "Kven is standardized". So are Valencian and Croatian, but we still don't treat them as separate languages. — SURJECTION / T / C / L / 14:57, 5 November 2022 (UTC)
@Surjection: Actually, Kven isn't firmly standardised afaik. Thadh (talk) 14:58, 5 November 2022 (UTC)Reply
We should. Supevan (talk) 17:35, 5 November 2022 (UTC)Reply
@Supevan Most of these points were already raised for Meänkieli. I will try to answer them anyways.
1) First, our standard procedure is to emphasise linguistics over politics, even when much more controversial (see WT:Serbo-Croatian).
2) Secondly, and most importantly, you claim all Kven inflection should be incorporated into Finnish. This is false. There is already a ridiculous amount of variation in the inflection of the various Finnish dialects, and none of it is represented here. We simply do not have the capacity to maintain 30 different tables containing dozens of inflected forms. Additionally, natives do not stick to one variety of Finnish but mix standard Finnish grammar with that from various dialects and registers. It would also be naive to assume that Kven speakers all use one well-defined standard themselves. A language with a morphology as rich as that of Finnish leaves much space for variation.
3) You say, "thousands of new Finnish entries [– –] would emerge, in the form of 'dialectal' words which are actually Kven words", but this is only true if one assumes Kven not to be a collection of Finnish dialects, which is not a popular opinion among linguists. Besides, only a small number of these terms are exclusive to the Ruija dialects.
brittletheories (talk) 13:46, 27 January 2023 (UTC)Reply

2023


Church Slavonic and Moravian


Technically Old Church Slavonic and Church Slavonic should be two separate languages (?), but we only have the former, probably because of the small number of editors. These languages are always treated as two different languages in etymology. For now, in etymologies and Proto-Slavic pages (*viňaga), we trick it as Church Slavonic: {{l|cu|асдф}} or Church Slavonic: {{desc|cu|асдф|nolb=1}}. That is not very convenient; we should have a separate etycode for Church Slavonic.

We should also have an etycode for Czech Moravian, which is also pretty often used in Proto-Slavic pages (and many etym dictionaries); Serbo-Croatian has templates like that (ckm, sh-kaj, sh-tor). Sławobóg (talk) 12:53, 5 February 2023 (UTC)

@Павло Сарт, Atitarev, Kamen Ugalj, Skiulinamo, Rua, ZomBear, Bezimenen, IYI681, Vininn126 pinging some people that might be interested. Thadh (talk) 13:03, 5 February 2023 (UTC)Reply
Support @Sławobóg I completely agree with you, we need a separate etymological code for the usual Church Slavonic language. I constantly thought about it, why is it not there.. --ZomBear (talk) 19:32, 5 February 2023 (UTC)Reply
Support for Church Slavonic Безименен (talk) 13:45, 7 February 2023 (UTC)Reply
Oppose for Czech Moravian: there would be 20-30 more regional varieties that could spring if one started Balkanizing Slavic languages + I don't want to give food for thought to Z-Russians. There are already talks for forging Novorussian, Transnistrian, or Lipovan Russian in order to justify their expansive aspirations over former Imperial Russian territories. Безименен (talk) 13:45, 7 February 2023 (UTC)Reply

I also propose to do away with similar problems in the tree of Slavic languages once and for all. I suggest:

  • South Slavic:
1. Add etymological code for Old Serbo-Croatian (zls-osh). With a redirect to modern Serbo-Croatian. Occurs regularly in {{R:sla:ESSJa}}.
2. Add etymological code for Old Slovene (zls-osl). With a redirect to modern Slovene. Occurs regularly in {{R:sla:ESSJa}}.
3. Move the Macedonian language to the descendant of Old Church Slavonic, as it was done some time ago with the Bulgarian language.
4. Add etymological code for Church Slavonic (cu-chu). Perhaps even with a division into Russian Church Slavonic (cu-rcu), Serbian Church Slavonic (cu-scu) and others, if any.
  • West Slavic:
1. Add etymological code for Middle Polish (zlw-mpl). With a redirect to modern Polish or (?). @KamiruPL, Vininn126
2. Add etymological code for Old Slovak (zlw-osk). With a redirect to modern Slovak. It is high time to do it! Occurs regularly in {{R:sla:ESSJa}}. Especially since even Early Modern Czech (cs-ear) was awarded a separate code.
3. Possibly add (family code) a Czech–Slovak languages (zlw-csk) ?. Just like there are Lechitic (zlw-lch) F.
4. It's possible: add etymological code for "Old Sorbian" (see Wendish/Lusatian ?) (zlw-osb)? Perhaps with a redirect to Upper Sorbian or (?).
  • East Slavic:
1. Rename etymological codes Old Ukrainian (zle-ouk) & Old Belarusian (zle-obe) → Middle Ukrainian (zle-muk) & Middle Belarusian (zle-mbe), respectively. A similar request from another user was about six months ago (Wiktionary:Beer parlour/2022/September#“Old Ruthenian” language). Therefore, with "Old" for those languages, these are "parts" of Old East Slavic until the 14th c. (this is indicated on the en.Wikipedia).
2. Probably it is worth removing the Old Novgorod from the descendants of the Old East Slavic. Make it a separate and parallel ancient language in the East Slavic subgroup. --ZomBear (talk) 19:32, 5 February 2023 (UTC)Reply
3. Add etymological code for Pannonian Rusyn with a redirect to Rusyn (rue).
  • PS: LOL, I'm serious, add an etymological code for "Early Proto-Slavic" (sla-ear) (?) with a redirect to Proto-Balto-Slavic (?). Because Wiktionary "for the standard" uses a rather late version of the Proto-Slavic language. And sometimes in the Etymology section it may be necessary to indicate an earlier form, and the presence of a separate etym-code for "Early PSl." would not be superfluous. --ZomBear (talk) 19:50, 5 February 2023 (UTC)Reply
I don't think any "Old Sorbian" is attested. Both Upper Sorbian and Lower Sorbian are attested only from the 16th century, and they were already distinct at that point. In theory there could be a code for Proto-Sorbian, but it would have to be a full-fledged protolanguage, not an etymology-only language. —Mahāgaja · talk 20:17, 5 February 2023 (UTC)Reply
@Mahagaja Yeah, I'm not sure about "Old Sorbian" either. This suggestion is only possible. I relied on the fact that in {{R:sla:ESSJa}} sometimes there are words with abbreviations "ст.-луж."/"др.-серболуж." ("старолужицкий"/"древнесерболужицкий" = translation "Old Sorbian") without specifying where the word belongs - to the Upper or Lower Sorbian language. --ZomBear (talk) 21:09, 5 February 2023 (UTC)Reply
@ZomBear: I agree with most of your suggestions, except for Old Serbo-Croatian and Old Sorbian. Serbs and Croats never had an organized shared language until 17-18 century. One could perhaps talk about an Old Serbo-Croatian stage in the development of the Dinaric Slavic complex, but there never was a common language that could be associated with this period (leaving aside the Bosno-Rascian recension of Church Slavonic or Glagolitic Croatian). The same holds in even greater magnitude for Sorbian. Sorbs may self-identify as one people ethnically, but linguistically their languages are noticeably divergent.
PS I also don't see much educational value in copying all the distinctions that you can find in ESSJa. Note that it often gives old spellings that precede various spelling reforms, dialectal forms which don't follow any orthographic standard, morphological variants (like diminutive forms, etc.) which don't contribute much additional insight, and local colloquial meanings which are clearly recent innovations, etc. I personally prefer a more concise and economical presentation for reconstructed terms rather than having 10-15 dialectal spellings of Serbo-Croatian or those monstrosities that are given as dialectal variants of Polish/Bulgarian/Slovenian by ESSJa. In my opinion, such information should go on the respective page of the daughter language, rather than bloating the Proto-Slavic Descendants section.
PS2 Early Proto-Slavic is a useful designation; however, I don't know exactly where one should draw the border between Early, Middle and Late Proto-Slavic and what notation should be applied. Безименен (talk) 13:30, 7 February 2023 (UTC)
As it stands, Middle Polish is listed as a variant of Modern Polish. We do see some significant phonological changes and a few semantic ones as well, however, it's hard to say whether it should have its own code or not. Even if it did, it would certainly be a redirect to Modern Polish, seeing as it's a period of only about 1250 years. (1500-1750). Vininn126 (talk) 13:36, 7 February 2023 (UTC)Reply
@Vininn126: That's 250 years. —Mahāgaja · talk 15:16, 7 February 2023 (UTC)Reply
The one and the two are right next to each other.


Polish Silesian and Silesian


@Shumkichi @KamiruPL The Cieszyn Silesia Polish category has many terms that should probably be moved to Silesian proper. Can we figure out which ones we need to fix? Vininn126 (talk) 12:29, 8 March 2023 (UTC)Reply

Also maybe @Hythonia, @Sławobóg Vininn126 (talk) 12:30, 8 March 2023 (UTC)Reply
Idk where Silesian proper starts and Silesian Polish ends so I don't think I'll be of much help o_ _ _ _ _ _ _ _ _ _ _ _ O Maybe let's just assume they'd all be used in Silesian anyway, and then we can add Polish headers to the few entries that can be considered dialectal Polish after we find some sources later??? Shumkichi (talk) 13:33, 8 March 2023 (UTC)Reply
@Vininn126, Shumkichi Not to throw a monkey wrench into this discussion but ... I read the Wikipedia article on Silesian and it seems there's debate over whether it's a separate language as well as a not-yet-established writing system. Given this, I wonder if it wouldn't be better to unify Silesian and Polish similarly to the way that all Chinese lects as well as Serbo-Croatian are unified. The motivation here is practical: it's significantly more difficult to implement and maintain all the infrastructure for two separate L2's vs. one unified L2, and the minority status of Silesian means it's likely to not get much love as a separate L2 (compare the situation with Jeju vs. Korean and Scots vs. English). Benwing2 (talk) 06:19, 16 March 2023 (UTC)Reply
@Benwing2 I've actually been trying to do some research on this. One problem with that system is the politics involved - there is a considerable Silesian group that considers it separate. I've also been trying to do some research on the pronunciation, and there are some major differences that point to Silesian having come from an older variant of Polish, as opposed to a modern one. And as to the orthography, recently, Ślabikorz śląski was introduced and has been fairly widely adopted, even silling.org has a normalizer - I've included all of this in WT:About Silesian, and I would actually like to go through all the entries and do a major cleanup. I've even been trying to set up other infrastructure. Vininn126 (talk) 09:59, 16 March 2023 (UTC)
As to the fact of it coming from an older variant - there are significant sound differences, such as maintaining distinctions from previous long vowels, having more of a 7 vowel system like in Italian, and some significant grammatical differences like continuing the old aorist in a past tense system that's completely different. Vininn126 (talk) 10:22, 16 March 2023 (UTC)Reply
@Vininn126 I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary. IMO the latter question should be determined by what makes for less work and duplication. If the majority of terms in Silesian are the same as in Polish (which I suspect they are), it might make sense to unify them. The current set of lemmas is non-representative in that it mostly covers lemmas that are different in Silesian. Benwing2 (talk) 15:25, 16 March 2023 (UTC)Reply
@Benwing2 In order to determine that we need more data, and currently there aren't any major Silesian dictionaries aside from Silling, which is relatively new and is currently doing a massive import of words. Currently they are importing a Polish-Silesian dictionary, so based on that alone it would suggest a lot of sharing. However, further work needs to be done to determine how different they really are. As someone who works with it more, I'd say it's not any more different than some of the differences between other Slavic languages, which are remarkably similar. Vininn126 (talk) 15:34, 16 March 2023 (UTC)
@Vininn126: Makes sense, thanks. Benwing2 (talk) 15:42, 16 March 2023 (UTC)Reply
@Benwing2 And I think you didn't understand his point. Silesian is not a dialect of Polish since it doesn't come from modern Polish - they both come from Middle Polish (or you could call it Middle Silesian, it doesn't matter, it's just that Polish's always had more speakers, hence the privileged position of Polish over other dialects). That's why your comparison to Serbo-Croatian makes no sense since S-C. is a single language with most of its officially recognised "varieties" not even being different dialects nor even subdialects but simple local variants with at most a few different words, lol. Silesian and Polish, on the other hand, are full of seemingly small but SYSTEMATIC differences that all add up to them being sufficiently different (more so than e.g. Czech and Slovak, I'd say). And the important thing is that they differ not only in vocabulary but also in syntax.
"If the majority of terms in Silesian are the same as in Polish (which I suspect they are)" - no, they are not the same, and your suspicion is wrong. It's as if you looked at the spelling of some Kashubian words and compared them to their Polish cognates - yes, their orthographies are quite similar but it's just a superficial similarity. Shumkichi (talk) 20:17, 16 March 2023 (UTC)
@Shumkichi Don't get all worked up over this. You didn't even read the first line of my comment: "I think it's a mistake to conflate whether language A and B are different languages with whether they need separate L2's in Wiktionary." Benwing2 (talk) 20:33, 16 March 2023 (UTC)Reply
@Benwing2 I'm not worked up??? And I did read it, that's why I said the orthographies are different, and that's enough NOT to merge Silesian entries with Polish ones. Polish has an official body that regulates its orthography so it can't use two different spelling norms that also differ in pronunciation. Capisci? Shumkichi (talk) 20:55, 16 March 2023 (UTC)Reply
Also, according to your argument, we should merge Czech and Slovak. But KKK, as they say in Polent. Shumkichi (talk) 20:56, 16 March 2023 (UTC)Reply
Alright, let's cool it here. It seems like Silesian is here to stay at least for the time being. Vininn126 (talk) 21:17, 16 March 2023 (UTC)Reply


Renaming Proto-Mon-Khmer to Proto-Austroasiatic


Proto-Mon-Khmer is deprecated. The name of Category:Proto-Mon-Khmer language needs to be changed to Category:Proto-Austroasiatic language, just like how we have Category:Proto-Sino-Tibetan language rather than Category:Proto-Tibeto-Burman language. See the Wikipedia article on Austroasiatic languages to get an idea of why Mon-Khmer is no longer valid, because Munda and Nicobarese are simply regular branches that are sisters of the other so-called Mon-Khmer languages.

The page names can simply be renamed, and the lemmas do not need to be changed. Category:Proto-Sino-Tibetan language is a perfect example of this. The Proto-Sino-Tibetan lemmas are actually all Proto-Tibeto-Burman reconstructed forms by James A. Matisoff, who considers Tibeto-Burman to be a branch of Sino-Tibetan. Now, more scholars are thinking that Chinese is simply another regular sister branch of the various Sino-Tibetan languages out there, rather than its own special branch. Same goes for Mon-Khmer.

So how can this name change be done? Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)Reply

Formerly:

  • Austroasiatic
    • Munda
    • Mon-Khmer (which Shorto reconstructed)
      • (about a dozen branches)

Now the consensus is that the tree has a rake-like structure (per Sidwell):

  • Austroasiatic
    • (about a dozen branches including Munda)

That's why Mon-Khmer is an obsolete term now.

Similarly, with Sino-Tibetan, it formerly was:

  • Sino-Tibetan
    • Chinese
    • Tibeto-Burman (which Matisoff reconstructed)
      • (dozens of branches)

Now the consensus among many scholars is that the tree has a rake-like structure with many "fallen leaves" (quoting George van Driem), making Tibeto-Burman obsolete:

  • Sino-Tibetan
    • (dozens of branches including Chinese)

Ngôn Ngữ Học (talk) 22:27, 18 March 2023 (UTC)Reply

Support. If this change happens we should delete Category:Mon-Khmer languages. Benwing2 (talk) 23:41, 18 March 2023 (UTC)Reply
Abstain. I prefer to wait until an actual new reconstruction of Proto-Austroasiatic is published to do the move (see what I wrote at Wiktionary:About Proto-Mon-Khmer), but I do not actually oppose moving now. However, if the move does happen, I would like to see a line like "This reconstruction is from Shorto (2006) for the obsolete concept of Proto-Mon-Khmer, and should not be treated as an actual reconstruction of Proto-Austroasiatic, which as of now has not yet fully materialized, and is simply a "placeholder" for the actual Austroasiatic etymologies" (probably as a template) to be added as a warning to every reconstruction item. I very much want the same thing to happen to "Proto-Sino-Tibetan", considering a lot of them are nowhere near actual Proto-Sino-Tibetan, and the reconstruction items themselves are "icky" to say the least. PhanAnh123 (talk) 01:52, 19 March 2023 (UTC)
@PhanAnh123: Take a look at Sidwell's Proto-Austroasiatic reconstruction and Shorto's Proto-Mon-Khmer reconstruction. Sidwell's inclusion of Munda and Nicobarese had virtually no impact on his Proto-Austroasiatic reconstruction (versus if he had only included the "Mon-Khmer" languages) because he considered Munda to be highly innovative and restructured, with few original retentions from Proto-Austroasiatic. Furthermore, it would be very confusing to have duplicates for both Proto-Austroasiatic and Proto-Mon-Khmer. I would just merge them as Proto-Austroasiatic. Ngôn Ngữ Học (talk) 19:25, 19 March 2023 (UTC)Reply
I have no intention to keep Proto-Austroasiatic and Proto-Mon-Khmer separated (I consider Proto-Mon-Khmer to be likely a ghost after all); what I mean is that we either should keep the entries as they are until an actual Proto-Austroasiatic reconstruction comes about, or move the "Proto-Mon-Khmer" items to Proto-Austroasiatic but with the warning added. I know what you mean by "inclusion of Munda and Nicobarese had virtually no impact", because like Sidwell, I do think these branches are quite innovative; however, that does not mean I agree to move Shorto's Proto-Mon-Khmer reconstruction to Proto-Austroasiatic without any warning, since Austroasiatic linguistics has progressed quite a lot even outside of those two branches. The vocalism in Shorto (2006) was reconstructed in a very rudimentary way, which the reconstructions of the descendant branches as well as the recent "sneak peek" at Sidwell's Proto-Austroasiatic reconstruction improved upon; furthermore, the syllable structure itself has also slightly changed: it is now thought that a glottal stop was phonetically present in any Proto-Austroasiatic word that ended in a pure vowel (meaning any word that ended in *aːj would still have *aːj, but those that ended in **aː would automatically become *aːʔ), plus there is the status of *ʄ- that very much awaits assessment in the actual reconstruction of Proto-Austroasiatic. Like I said, I don't oppose moving, but there must be strings attached. PhanAnh123 (talk) 01:53, 20 March 2023 (UTC)
@PhanAnh123, Ngôn Ngữ Học Such a warning can be added by bot to the top of all entries if both of you agree. Benwing2 (talk) 03:30, 20 March 2023 (UTC)Reply
@Benwing2: Agree, a warning placed by a bot should be sufficient. Also @PhanAnh123, we can use Sidwell & Rau (2015) for some of the basic Swadesh list words, but a full reconstruction of Proto-Austroasiatic is currently being done by Sidwell. It should come out in a few years. Ngôn Ngữ Học (talk) 10:19, 20 March 2023 (UTC)Reply
We are all in agreement then, so obviously now I support moving. With this Munda cognates can be directly added to the entries. PhanAnh123 (talk) 10:29, 20 March 2023 (UTC)Reply
Agree on the support.
Abstain Support. I've seen assertions that Mon and Khmer actually form a subgroup within the traditional Mon-Khmer grouping. Of course, it could be something messy as with Indo-European, where we have at least Indo-Iranian and Balto-Slavonic. --RichardW57m (talk) 16:19, 21 March 2023 (UTC)Reply
There is no such thing as a Mon+Khmer grouping within Mon-Khmer. Some classifications propose Eastern, Southern, and Northern groupings within Mon-Khmer, but none of them put Monic and Khmeric together. Please consult the Austroasiatic languages article on Wikipedia to get a basic refresher of all the major previous classifications. Ngôn Ngữ Học (talk) 15:04, 23 March 2023 (UTC)
The cited articles do show that their crown group is larger than Monic + Khmeric, but it does look as though we don't need to worry about anyone using 'Mon-Khmer' to denote their (weak) association. --RichardW57m (talk) 11:36, 28 March 2023 (UTC)Reply

Renaming Proto-Hmong to Proto-Hmongic

  1. Category:Proto-Hmong language needs to be changed to Category:Proto-Hmongic language. See Hmongic languages and Hmong language on Wikipedia.
  2. Category:Proto-Mien language needs to be changed to Category:Proto-Mienic language. See Mienic languages and Iu Mien language on Wikipedia.

The Hmong-Mien language tree is like this:

  • Hmong-Mien
    • Hmongic
      • Hmong
      • (dozens of languages)
    • Mienic
      • Iu Mien
      • (several languages)

Proto-Hmong thus refers only to Hmong, not Hmongic. There are dozens of Hmongic languages that are not Hmong. They include Hmu, Pa Hng, Bunu, She, and others.

Same goes for Proto-Mienic. Proto-Mien technically refers to Proto-Iu Mien, but does not include Kim Mun, Biao Min, and Dzao Min.

Ngôn Ngữ Học (talk) 22:23, 18 March 2023 (UTC)Reply

Support. If we make this change we also need to rename the families, i.e. Category:Hmong languages -> Category:Hmongic languages and Category:Mien languages -> Category:Mienic languages. This is similar to the change from Category:Korean languages -> Category:Koreanic languages, which was implemented in Jan 2022. Benwing2 (talk) 23:45, 18 March 2023 (UTC)Reply
Support. Theknightwho (talk) 17:57, 1 June 2023 (UTC)Reply


Okinoerabu and Tokunoshima

Discussion moved from Wiktionary:Beer parlour/2023/June.

These are two Ryukyuan languages that we currently call Oki-No-Erabu and Toku-No-Shima, because that’s how they’re spelled in ISO 639. However, literature invariably uses the unhyphenated forms, and they’re also much easier to read.

Could we please therefore rename them to the unhyphenated forms? Theknightwho (talk) 19:39, 4 June 2023 (UTC)Reply

I dislike the EN penchant for glomming Japanese names into long undifferentiated strings, as I find that this instead makes them harder to read, and it erases the distinction between the actual component terms.
In some cases, the resulting interpretation or partial-expansion goes sideways, as we see at w:Tokunoshima, where the English text describes this as "Tokuno Island" -- the no portion is simply the genitive particle (no), so as Japanese, this is better thought of as "Toku Island".
That aside, I do see that w:Tokunoshima language lists the alternative rendering "Toku-No-Shima", and the w:Okinoerabu dialect cluster similarly lists the alternative rendering "Oki-no-Erabu". A quick-and-dirty Google hits comparison (including "the" to filter for English hits):
In the English-language web, the allthewordsruntogether renderings appear to be most common. Meanwhile, the Language Subtag Registry based on ISO 639 and maintained by IANA (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry) does indeed use the hyphenated descriptors.
Meh. After digging into this some, I realize I just don't care all that much one way or the other. ‑‑ Eiríkr Útlendi │Tala við mig 22:09, 9 June 2023 (UTC)Reply
Searching on Google Scholar, it seems the unhyphenated forms are more common, but I concur with Eirikr's views that they look worse.
However, I would suggest that if we were to retain the hyphens, the two languages should be renamed to "Oki-no-Erabu" and "Toku-no-shima" (or the rarer "Toku-no-Shima"), since these are more common on Google Scholar, and also because "no" is a particle that shouldn't be capitalised in a proper noun, cf. Southend-on-Sea, Stoke-on-Trent or von, de, etc. in surnames. – Wpi (talk) 11:20, 21 June 2023 (UTC)


Correct language names


Could you correct Juǀ'hoan to Juǀʼhoan, Kwak'wala to Kwakʼwala, and K'iche' to Kʼicheʼ? There's no punctuation in the ethnonyms. If we want to use assimilated English forms, then the latter would be Quiché; I'm not sure about Juǀʼhoan. kwami (talk) 19:16, 13 July 2023 (UTC)Reply

  • Support. To clarify for people using low-resolution screens: the request is to use the modifier letter apostrophe character ʼ rather than the typewriter apostrophe '; the categories are currently at Category:Juǀ'hoan language (ktz) and Category:K'iche' language (quc). Our usual practice is to use the spelling most common in contemporary English-language discussions of the language. Which is more common in current books and journal articles, Kʼicheʼ or Quiché? —Mahāgaja · talk 19:30, 13 July 2023 (UTC)Reply
    Just to be clear, I personally don't care about ASCII substitutions in category names; what I'm concerned about is proper headers in the dictionary entries. But it's fine by me if the two go together.
As for Kʼicheʼ or Quiché, the English-language lit has been moving from the Spanish form to the ethnonym. That's an ongoing trend, though of course not universal (e.g. 'German', 'Greek', 'Armenian' etc.). kwami (talk) 21:15, 13 July 2023 (UTC)Reply
The L2 headers and category names do need to match, at least for readers using tabbed browsing. Otherwise, the categories won't appear in the correct language tab. I think there are also bots that require the L2 header to be the canonical language name in order to work properly. —Mahāgaja · talk 22:20, 13 July 2023 (UTC)Reply
Okay. Works for me. kwami (talk) 22:24, 13 July 2023 (UTC)Reply
@Kwamikagami Normally at Wiktionary we use typewriter apostrophes rather than curly single quotes, and this issue is somewhat controversial, so this change is unlikely to happen without significant further discussion and consensus. Benwing2 (talk) 04:27, 24 July 2023 (UTC)Reply
I'm not requesting quote marks. That would also be incorrect. Rather, since we are attempting to use the endonym, IMO it should be the glottal stop or ejective diacritic that's in the orthography. kwami (talk) 04:41, 24 July 2023 (UTC)Reply
Indeed, no one is advocating curly single quotes. The modifier letter apostrophe is a different character; it's a letter, not a punctuation mark. There are several other language names besides these two that ought to be using it. —Mahāgaja · talk 06:23, 24 July 2023 (UTC)Reply
Sarci, for example, which was just moved to its endonym (minus tone marking). But I thought I'd wait to see how things went before attempting a more comprehensive proposal. kwami (talk) 06:27, 24 July 2023 (UTC)Reply
Support - this isn't a matter of using curly quotes vs straight ones; it's a matter of using the correct letter instead of punctuation. We already do this extensively in entries for languages that use it anyway. Theknightwho (talk) 15:39, 24 July 2023 (UTC)Reply
Going through WT:LOL, these are the languages whose names have the modifier letter apostrophe at Wikipedia but the typewriter apostrophe here:
Other languages with typewriter apostrophe whose Wikipedia article uses a different character include:
  • gez Ge'ez → Geʽez with ʽ (U+02BD modifier letter reversed comma)
  • hps Hawai'i Pidgin Sign Language → Hawaiʻi Pidgin Sign Language with ʻ (U+02BB modifier letter turned comma)
  • num Niuafo'ou language → Niuafoʻou with ʻ (U+02BB modifier letter turned comma)
  • tct T'en → Tʻen with ʻ (U+02BB modifier letter turned comma)
  • tsl Ts'ün-Lao → Tsʻün-Lao with ʻ (U+02BB modifier letter turned comma)
I support making all of these changes. —Mahāgaja · talk 19:54, 24 July 2023 (UTC)Reply
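For readers who cannot tell the characters in this thread apart on screen, here is a minimal Python sketch (an illustration only, not part of any Wiktionary module) that prints the code point, Unicode name and general category of the typewriter apostrophe and the three modifier letters named above. The helper name `describe` is made up for this example.

```python
import unicodedata

# The typewriter apostrophe plus the three modifier letters named in this thread.
CANDIDATES = ["'", "\u02bc", "\u02bb", "\u02bd"]


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(*describe(ch))

# The two spellings of the language name are different strings, even though
# they look nearly identical in most fonts.
print("K'iche'" == "K\u02bciche\u02bc")
```

The general category is what separates them: the typewriter apostrophe is classed as punctuation (`Po`), while the three modifier letters are classed as letters (`Lm`).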
I oppose these changes. What is the actual benefit? From the above discussion, there are at least three different Unicode apostrophe-like characters involved, which are easily confused, and it will make it significantly harder to type the language names into headers, categories and the like. This is going to be a major pain in the ass for people like me who will have to clean up wrongly-typed apostrophes in language headers in innumerable articles created by IP's and other occasional contributors, who are unlikely to be able to type the right character. Furthermore, even with these changes, the language names in many cases will not actually match their endonym spelling; cf. the proposed Oʼodham, which is actually spelled ʼOʼodham natively with two apostrophes. Similarly, as pointed out by User:Kwamikagami, our spelling of the CAT:Tsuut'ina language doesn't include the tone mark that is present in the native orthography, and wouldn't even with the change in apostrophe. I should add that Wikipedia uses these Unicode chars specifically because Kwami went around renaming all the articles (formerly they used the straight apostrophes), and is not consistent, e.g. the article on the name of the people is still at O'odham with a straight apostrophe. Glottolog uses straight apostrophes for O'odham; so does [5], the Endangered Languages Project. In general, our policy is to use the *English* names for languages; we are not forced to use the exact native spelling. While I agree it's a good idea to approximate the spelling (e.g. avoiding exonyms where possible), I disagree we have to take this to the extreme of using the "correct" Unicode apostrophes (which I bet you will find native speakers not using in many cases as well). Benwing2 (talk) 20:22, 24 July 2023 (UTC)Reply
Other people's carelessness in using Unicode is no excuse for us to be careless, and anyway, language names can always be inserted by typing {{subst:\|xyz}}, which doesn't involve any non-ASCII characters. Latin a and Cyrillic а look identical in every font and font style too, but substituting one for the other is an error; it's no different with ' and ʼ. —Mahāgaja · talk 07:05, 25 July 2023 (UTC)Reply
I think you're missing the point. We don't include Cyrillic letters in language names, either. Benwing2 (talk) 07:13, 25 July 2023 (UTC)Reply
I know that. My point is that using ' where ʼ belongs is as bad as using Cyrillic letters in Latin-script language names. —Mahāgaja · talk 07:24, 25 July 2023 (UTC)Reply
I would support the changes, but only if they're truly the most used forms in terms of literature. Ideally we'd have people from each community give their opinions here, but alas, we're not afforded that. If the specific respective unicode apostrophe is used in literature, then we can use it here too. I can see the problem with inputting the apostrophes that's been brought up, but let's be real here, how many people are actually working on these languages to where this'd be a serious problem? I feel like this could be fixed with just an about:XYZ page or something. These languages unfortunately don't get enough traction. But again, I'd only support this if it can be proven that they're the forms used in English literature. AG202 (talk) 01:49, 17 August 2023 (UTC)Reply
@AG202 I agree with you, that is one of the points I made above, which has gotten lost in this thread. Benwing2 (talk) 02:08, 17 August 2023 (UTC)Reply
Ahh, got it, missed that, apologies. AG202 (talk) 02:11, 17 August 2023 (UTC)Reply
Hmm... like Benwing, my initial inclination is to oppose this, because the odds of anyone being able to type names with the fancy characters when adding entries is low (and given recent events, I wonder if one or more admins would block people for 'adding wrong language names' if people keep typing the names they're able to type). OTOH, I recognize that we require entries themselves to be input using correct spellings (with accents etc) and not in hacky ways... If we had a system like the French Wiktionary where no-one had to type the language names (instead only typing language codes, which only consist of easily-typeable ASCII characters), then changing the displayed character would be less of a problem (though still hard for navigating to categories, etc). Do we have a template with a simple short name people could subst: to produce the untypeable names, so they could write =={{subst:langname|foo-bar}}== to get ==Fooʾbar==? Or if we took this type of functionality and had a button people could periodically press (hosted on here like that Javascript is, not as a Python script on the computer of a user who might leave the project or be too busy to run it) that would search the database for instances of the typeable names and update them to the untypeable names, then it would be less of a problem (although it'd still be creating an unending maintenance task). - -sche (discuss) 16:22, 16 August 2023 (UTC)Reply
We do have {{subst:x2i}} that will convert the string _> to ʼ, but more helpfully we have (as I mentioned above) {{subst:\}}, which converts a language code to its canonical name. —Mahāgaja · talk 21:55, 16 August 2023 (UTC)Reply
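The periodic clean-up idea raised above (finding headers typed with the typewriter apostrophe and replacing them with the canonical name) could be sketched roughly as follows. This is a hypothetical Python illustration, not how {{subst:\}} or any existing bot works: the table `CANONICAL`, the helper `fold` and the function `fix_header` are all invented for this example, and the two names in the table are simply the spellings proposed in this thread for quc and ktz.

```python
import re

# Hypothetical code-to-name table (assumption): spellings as proposed in this thread.
CANONICAL = {"quc": "K\u02bciche\u02bc", "ktz": "Ju\u01c0\u02bchoan"}

# Apostrophe-like characters folded to the typewriter apostrophe, for matching only.
FOLD = str.maketrans({"\u02bc": "'", "\u02bb": "'", "\u02bd": "'", "\u2019": "'"})


def fold(name):
    """Return the name with apostrophe-like characters replaced by U+0027."""
    return name.translate(FOLD)


# Lookup from the easily typed spelling to the canonical spelling.
BY_FOLDED = {fold(name): name for name in CANONICAL.values()}

# Matches a level-2 header such as ==K'iche'== but not ===Noun===.
HEADER = re.compile(r"^==\s*([^=]+?)\s*==$")


def fix_header(line):
    """Rewrite a level-2 language header to its canonical spelling, if known."""
    m = HEADER.match(line)
    if not m:
        return line
    canonical = BY_FOLDED.get(fold(m.group(1)))
    return f"=={canonical}==" if canonical else line
```

Headers for languages not in the table, and deeper headers such as ===Noun===, are returned unchanged, so a pass like this would only touch the names it knows about.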
Even with these workarounds, it seems extra work for no gain. There is no rule that says we need to follow native orthography to the T in our English names for languages; otherwise we'd have Deutsch in place of German, and русский in place of Russian, etc. I have seen no arguments that indicate why having these special apostrophes in language names gains us anything except some nebulous sense of "correctness". Benwing2 (talk) 23:07, 16 August 2023 (UTC)Reply
Deutsch is the endonym. What we're talking about here is using the proper Unicode characters for whichever name we decide to use. The apostrophe is a punctuation mark, and the glottal stop is not punctuation. Using the letter for glottal stop is analogous to using en-dashes and minus signs rather than hyphens. kwami (talk) 00:28, 17 August 2023 (UTC)Reply
"Deutsch is the endonym"
Yes exactly. The exonym can have apostrophes while the endonym has Unicode whatever. Nothing wrong with that. Benwing2 (talk) 00:56, 17 August 2023 (UTC)Reply
@Benwing2 I think we’re getting too focused on Unicode. The thing we should care about is what character is actually intended, which isn’t necessarily the same as what they actually wrote. To use an analogy: we don’t lemmatise the palochka with the numeral 1 or Latin l, even though both are probably more common than the actual palochka character, and that’s because we all know that the writer intended to use a palochka irrespective of what character they actually wrote in Unicode. Theknightwho (talk) 02:18, 17 August 2023 (UTC)Reply
@Theknightwho I think we'll just have to agree to disagree here. I don't think the analogy you are making here with palochka is very applicable and you're still missing the point made by User:AG202 about what's the most common usage in scholarly and other English sources. Benwing2 (talk) 02:24, 17 August 2023 (UTC)Reply
@Benwing2 The whole reason I brought it up is as an example of when the most common usage isn’t necessarily an indicator of what’s most appropriate. I’ve also seen plenty of typography mistakes in scholarly sources, too, or fonts that map common characters to a glyph of what is actually intended. You can’t just rely on the codepoint. Theknightwho (talk) 02:27, 17 August 2023 (UTC)Reply
Just to be clear, when I said common usage, I meant what character is actually intended, not necessarily parsing specifically based on codepoints. However, this isn't an easy task for sure, unfortunately. AG202 (talk) 02:49, 17 August 2023 (UTC)Reply
Doesn't matter whether it's the endonym or exonym: the apostrophe is a punctuation mark, and these are not punctuation marks. Yes, we can substitute, and that's common enough. We could also use a hyphen for a minus or a double hyphen for an em dash -- those substitutions are common too -- but that doesn't mean we should do that. We could substitute click letters with exclamation marks and pipes. But if we want Wiktionary to look professional, then IMO we should typeset it professionally, and not use ASCII substitutes just because they're easier to type. kwami (talk) 04:06, 17 August 2023 (UTC)Reply


Ktunaxa, Secwepemctsín


Could we rename Kutenai (kut) to Ktunaxa, and Shuswap (shs) to Secwepemctsín please? The first names are the Anglicized terms for the languages, and are somewhat outdated and/or not in use among speakers. GKON (talk) 22:46, 12 August 2023 (UTC)Reply

@-sche Can you weigh in here? There is nothing wrong per se with having exonyms for languages (we say "German" not "Deutsch" for example), and I note that Wikipedia still uses Kutenai and Shuswap. The main issue in my view is (a) avoid pejorative terms, and (b) use the most common terms as found in English-language sources. Benwing2 (talk) 23:37, 15 August 2023 (UTC)Reply
For Shuswap, almost no-one uses Secwepemctsín in English, either in books overall as tracked by Ngram Viewer, or in reference works about the language at Glottolog. For kut, Kutenai was the main name (in reference works/Glottolog and overall/Ngrams) until a few years ago, when Ktunaxa started to just barely overtake it. - -sche (discuss) 17:45, 16 August 2023 (UTC)Reply
That is true; however, I would argue that for Shuswap, the use of this term is declining, as seen in Ngram. The replacement is looking like Secwepemc, which is another word for the language that is kind of a good middle ground between Shuswap and Secwepemctsín, wouldn't you say? Also, the actual communities in Secwepemc traditional territory mostly use Secwepemc. For example, if there is some quote or phrase on a billboard in Shuswap, the billboard will say that it's in Secwepemc. Another real-life example was a board in Banff town, which had greetings in multiple languages. Among them were Blackfoot, Stoney, Ktunaxa, and Plains Cree; apart from Ktunaxa, these are all Anglicized terms. However, the greeting in Shuswap was said to be in Secwepemc.
Shouldn't we be using this term, seeing as it gets the most use in these modern times? GKON (talk) 17:09, 20 August 2023 (UTC)Reply

Akan varieties


@-sche This is another mess. Wikipedia has an article Akan languages yet according to both Glottolog and Ethnologue, all varieties are mutually intelligible and better classified as dialects, and indeed we have a single Category:Akan language (code 'ak'). The correct family tree seems to include a top level division into Fante, Twi and Wasa, all of which have ISO 639-3 codes (respectively fat, twi, wss; and Twi has the ISO 639-1 code 'tw' as well). Twi in turn is divided into Asante, Akuapem and Bono. Fante and all three Twi varieties have their own literary standards, and there is also a unified Akan literary standard based primarily on Akuapem. Up until recently, we had {{dialectboiler}} categories for Fante and Twi, called Category:Fante Akan and Category:Twi Akan. I added etym-only varieties for those two as well as for the Twi lects of Asante, Akuapem and Bono. Then I discovered we also have separate languages under Akan for Category:Abron language (= Bono), Category:Wasa language and Tchumbuli (which has no lemmas, and I have no idea what it is). None of these Akan languages have very many lemmas (< 10 each), and as mentioned Tchumbuli has none. I would recommend either we convert Akan into a family and fix up the hierarchy appropriately, or (preferably) we maintain the single Akan language and convert the sublanguages into etym-only varieties. The list of varieties under Category:Akan language is also somewhat messed up (e.g. what is 'Twi-Fante'?), but that is less important. Benwing2 (talk) 18:10, 17 September 2023 (UTC)Reply

Looking into the history (of the codes, on Wiktionary), I think the sub-dialects simply escaped notice at the time Twi, Fante, and Akan were merged. I note that the two Wasa entries we have are identical to Akan, and the Abron ones are very similar. I would merge them; AFAIK the difference was historically in spelling, not in speech, and since the 70s also not anymore in spelling. (I entered the Abron entries a year before the lects were merged, using a reference published two years before the speakers of Abron and the other dialects of Akan unified their orthographies. The Wasa entries were added in 2021 by a Japanese editor, also using an old pre-reform ref, which the user also used for the Akan spelling: we should check what the modern spelling is...) Re "Twi-Fante" being listed as a "variety" of Akan: it was originally listed as an alternative name of Akan; when 'alternative names for the language' and 'names of varieties' were split into being separate parameters, someone must've mis-assigned it. - -sche (discuss) 06:02, 23 September 2023 (UTC)Reply

New language codes for nested Persian translations


Per Wiktionary:Beer_parlour/2023/October#Persian_nested_translations_-_split_or_labelled?

@Sameerhameedy, @Benwing2, @Theknightwho.

New codes and labels, under "Persian" to work with MediaWiki:Gadget-TranslationAdder.js

  1. "prs" - Dari
  2. "fa-cls" - Classical Persian

Considering "fa-ira" for Iranian Persian. Anatoli T. (обсудить/вклад) 05:13, 4 October 2023 (UTC)Reply

Don't we normally use ISO 3166 codes for countries? I'd say it should be "fa-IR". —Mahāgaja · talk 09:24, 4 October 2023 (UTC)Reply
@Mahagaja: Not sure what is right in this case but it must have been done.
Both "prs" and "fa-ira" seem to be working already, but {{t+|اَفْغانِسْتان}} fails to link to fa:افغانستان
Since the code is already working (apart from the interwiki links), automatic nesting should be possible as well.
We need to make "fa-ira" link to the "fa" Wiktionary, just as "cmn" links to the "zh" Wiktionary: {{t+|cmn|阿富汗}} links to zh:阿富汗
@Benwing2, @Sameerhameedy, @Theknightwho: can someone please fix the interwiki link? I think it was @Ruakh who made it work for Mandarin. I'll take a look at nesting. Anatoli T. (обсудить/вклад) 00:07, 13 October 2023 (UTC)
Actually the new codes still don't work with the translation-adder. Some changes to Module:languages/data submodules need to happen. Anatoli T. (обсудить/вклад) 00:19, 13 October 2023 (UTC)Reply
@Mahagaja: "fa-ira" is correct per Module:etymology_languages/data Anatoli T. (обсудить/вклад) 00:33, 13 October 2023 (UTC)Reply
Update: @Sameerhameedy: Language code "prs" can now be used for automatic nested translations: Persian\Dari. Just use the language code "prs" in the translation adder. However, I wasn't able to tweak the modules for "fa-ira" or "fa-cls". Anatoli T. (обсудить/вклад) 02:58, 13 October 2023 (UTC)


Merging Mengisa and Leti (Cameroon); Rename Leti (Indonesia) to Leti


Per Wikipedia, Leti and Mengisa are the exact same thing (Leti is spoken by the Mengisa). We currently don't have the Mengisa language. Googling "Mengisa language" and "'Leti language' 'Cameroon'" show about an equal number of results. I wonder if we could rename leo to "Mengisa" (since the two names are equally used), thus also freeing up place to rename lti to "Leti", making editing both more accessible. Any objections? Thadh (talk) 16:46, 18 October 2023 (UTC)Reply

@Thadh No objections. Benwing2 (talk) 04:51, 19 October 2023 (UTC)Reply
Merged/Renamed. Thadh (talk) 21:20, 25 September 2024 (UTC)Reply

Splitting Mazurian


I would like to open a discussion about the pros and cons of splitting Masurian as an L2 with the langcode zlw-maz and as a descendant of Old Polish. I would also like to preface this by saying that while I am leaning towards a split, I am not dead-set on it. The argument is as follows:

w:Masurian dialects would benefit a lot from having a separate L2. There are significant differences in pronunciation (extra vowels nonexistent in Polish and a loss of quite a few consonants), grammar (different endings from standard Polish), and vocabulary, especially outside the "core" vocabulary. Even a significant number of basic forms end up looking different from Polish, and it has many inflections and conjugations. I could place them in the tables for Polish, but it might get cluttered. I would also like to point out that {{R:pl:SgOWiM}} exists as a good, reliable source for entries.

Problems of splitting - most people do consider this specifically a dialect, even most speakers, and most forms of it today are heavily Polonized. However, at least up until the 20th century it was distinct and much more difficult to understand in comparison to standard Polish. My problem is that some of these differences are so vast it might not make sense to put them all under Polish. Vininn126 (talk) 21:43, 12 November 2023 (UTC)

A point for not splitting is that some other dialects of Polish, such as Łowicz, might be equally divergent in some respects. So what might be better is including multiple declension tables and the like. (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg), @Benwing2, @PUC, @Thadh Vininn126 (talk) 12:58, 14 November 2023 (UTC)
Here is some sample text The little prince in Mazurian. This channel has some other examples. As someone with high proficiency in Polish, I can understand large parts of it but there's also a significant portion that is very difficult, maybe 65% for me. Vininn126 (talk) 17:28, 14 November 2023 (UTC)Reply
@Vininn126 As you know, I tend to lean towards not splitting in cases of doubt, while Thadh leans towards splitting. Comparisons to multi-dialect languages like Occitan and Ancient Greek might be useful. In this case I don't know, but I think we're hampered by the lack of standardization. Benwing2 (talk) 23:06, 14 November 2023 (UTC)Reply
@Benwing2 There is a notation system widely used for Masurian which is present in the Wikipedia article that I'd be able to use for WT:About Masurian if split. Also, using this system would yield 1) a different pagename, 2) a different pronunciation section (as the notation system is based on the different pronunciation), 3) a different definition section, at least outside of "core" vocab, and core vocab would only share 1-2 defs, as opposed to all of the obsolete senses as well, and 4) a different conjugation/declension section as well. Vininn126 (talk) 08:53, 15 November 2023 (UTC)
I’d be in favour of the split. As a native Polish speaker I find it difficult to understand some Mazurian texts, and e.g. parts of this Mazurian rendition of Colors of the Wind, Farbi Zietrżu, would be straight ungrammatical in Polish (the infinitive construction in cÿsz ti słicháł zilkä wicz ‘have you heard the wolf howl’, which looks more Czech than Polish and in Pl. would have to be reworded as ‘czyś ty słyszał jak wilk wyje’ or ‘wycie wilka’ or something, but the infinitive doesn’t work).
Also I’ll note that Mazurian also keeps some phonemes long gone from standard Polish (like the /r̝/ phoneme written in the song above which Polish merged with ż /ʐ/).
And, @Vininn126, could you include me too in Polish-related discussions when pinging people? I feel left out ;-) // Silmeth @talk 12:28, 15 November 2023 (UTC)Reply
@Silmethule I can add you to the Polish ping group. Yes, the completely different set of phonology and grammar are both big points for me. Masurian also keeps reflexes of Old/Middle Polish pochylone vowels while getting rid of quite a few consonants. Reading up on the Wikipedia article, quite a few experts also claim it's a language. Vininn126 (talk) 14:16, 15 November 2023 (UTC)Reply
Having boned up more on Polish dialectology I'm definitely leaning now more towards split. I haven't been able to find another dialect (that we would mark as such) as divergent as Masurian. There's also a big gap of mutual intelligibility Vininn126 (talk) 15:48, 20 November 2023 (UTC)Reply
I'll also add I was wrong in the original post - the Masurians had a stronger sense of identity even more so than the neighboring regions. Vininn126 (talk) 16:47, 20 November 2023 (UTC)Reply
I'm still wavering, upon listening to more recordings. It might be possible to automatically generate pronunciation sections (even though they would be very, very different), and then it would just be a matter of giving special definitions a label and then I suppose conjugation/declensions... Vininn126 (talk) 09:17, 28 November 2023 (UTC)Reply
@Silmethule @Mahagaja Another question would be the langcode. Is the one I proposed best? I doubt it. At this point I'm fairly sure we are splitting. Vininn126 (talk) 13:48, 7 December 2023 (UTC)
@Vininn126 Depending on the choice of Mazurian vs. Masurian, it should be zlw-maz or zlw-mas. Benwing2 (talk) 22:21, 7 December 2023 (UTC)Reply
@Benwing2 You're right, so it's probably gonna end up being zlw-mas. Vininn126 (talk) 22:23, 7 December 2023 (UTC)Reply
I'm going to go ahead with this today and make an entry. I've also been able to contact someone educated in this lect and they'll be able to check anything that I (or potentially we, me and him) make. There is a weak consensus it should be split, and if it's handled right I think it will be much better than smushing everything into Polish. Vininn126 (talk) 17:55, 8 December 2023 (UTC)Reply
@Benwing2 @Mahagaja @Silmethule Sorry for all the pings as of late. I figured now would be a good time to take a pause and look at the current state of things after the decision. We currently have 428 Masurian lemmas, Appendix:Masurian pronunciation, Appendix:Masurian Swadesh list, along with various infrastructure. I know this is a lot of material, but I ask you to please take a look at some of it and give your input. I thought now would be a good time, before things got too big; at this point I am also going to slow down.
Of the existing lemmas, I added mostly cognates, so there aren't many words unique to Masuria, but there are plenty of definitions and of course, pronunciations. I haven't been able to do any work with declensions, as Masurian declensions are too complicated for me at the moment, but I can assure you there are plenty of differences.
I also know I gave the impression I was gung-ho for a split, and also for a split for Goral, which isn't the case, I simply found resistance everywhere I went when trying to add Masurian information - some felt it clogged up the main Polish entry, didn't want particular information, other times I heard that it's remarkably different.
Having added all these terms, I can still see it going either way. On one hand, having it split as a language is a view held by some linguists, but not all (always a problem), and I think the orthography we few Masurian editors have been using easily demonstrates the phonemic difference (the template is phonemic except for (literally) 1 or (potentially) 2 phones, those being the ones represented by <ä> (which might be phonemic) and <ÿ> (which I believe is phonemically /i/)).
However, if we merged, as I have seen various reactions to the split, and understandably so, I'd have a few questions.
What would be the best way to represent Masurian pronunciation? We could ignore spelling and put everything under the Polish spelling, using a respelling in the pronunciation module. This is the approach I take with Middle Polish, and it serves me well. For Masurian-only terms (such as szmanta), I'd prefer to keep {{zlw-mas-IPA}}, similar to what we currently have with {{zlw-mpl-IPA}}. However, this leaves us with the issue of <ä> and <ÿ>.
Another potential approach would be to keep the spellings, but I'd be less sure about this, as it works better for British/American English. One potential issue this would solve is the problem of standard Polish definitions absent from Masurian.

One other potential issue is the fact that Masurian would ideally be treated as an LDL. Currently Middle Polish is (not standardly!) treated as an LDL, despite being part of Polish, and it would be a shame to see the potential for someone to RFV all of them (perhaps they won't, but the option exists) and have certain very real terms deleted just because it's considered part of a WDL.

I know there's been a lot of talk about this lately, hopefully there isn't too much fatigue. That is why I decided it might make more sense to review this now and press on later. Vininn126 (talk) 23:40, 18 January 2024 (UTC)Reply
I was asked by Vininn to add my two cents on the issue, so here I go.
I must say I am worried about using language splits in order to circumvent the WT:WDL policy. I understand the frustration of having dialectal terms left undocumented, but there is no way to objectively draw a line between one dialect and another. In the end the smallest unit of a complete language system is an idiolect, and between that and a language family any grouping is ultimately either political or arbitrary.
I'm not sure how to define what is and isn't a language. I would say ISO codes are a good start, and after that splits may be warranted provided that there is abundant literature in the lect, a solid written language, or some major problems in mutual intelligibility... Knowing how Slavic languages are, the last one is probably not the case with these Polish lects. I don't know enough about them to comment on the first two.
With historical lects, a different issue comes up. In my opinion, it is only possible to treat a standard language as a WDL after its standardisation, and so I would prefer lects like Middle Polish to stand separate, like Old Ruthenian, and in my opinion the same should be done with Middle Russian (although this discussion led nowhere). Thadh (talk) 13:26, 19 January 2024 (UTC)
@Thadh As to intelligibility, as mentioned above, I'd say that Masurian (and to a lesser extent Goral) is as intelligible as two other Slavic languages, so somewhat, but also quite difficult for a lot of people. Middle Polish is also the period when standardization really began and, to some extent, solidified. Vininn126 (talk) 13:42, 19 January 2024 (UTC)
@Thadh, Vininn126: regarding mutual intelligibility, my subjective opinion is that Middle Polish is easier for a modern Polish speaker than Masurian (if not because of anything else, then due to exposure in school to 16th and 17th century texts) – but since modern standard Polish does continue the standard that was established during Middle Polish period, I think there’s more to it. Masurian truly feels “foreign”. So if we’re willing to keep Middle Polish as a separate lang, IMO Masurian deserves the treatment too.
But then, regarding the factors of attestation in literature, separate grammar, recognition in separate ISO code, etc. – we’ve merged Classical Gaelic with modern Gaelic langs and it’s still not split – despite having its own ISO code, having very rich literature in 13th–18th centuries, its own grammar schooling tradition, established (if changing in time) spelling conventions, etc. So even if we acknowledge those factors provide good guidance, we definitely don’t always follow it very closely. // Silmeth @talk 14:20, 19 January 2024 (UTC)

Proposal for several languages without ISO codes


Tagging @-sche and @Benwing2 who are likely to be interested in this. Here is a list of languages that currently lack ISO codes, with a brief explanation as to why they probably justify an L2 code. In a couple of cases, we're never likely to have more than a handful of entries for the language in question due to the scant number of attestations we have, but I don't think that should be used as justification for exclusion.

Baltic

  • Splitting Galindian (xgl) into East Galindian (xgl-eas) and West Galindian (xgl-wes).
    This seems to have been a genuine mistake by the ISO: "Galindian" refers to two separate extinct languages within the Baltic family, which don't even seem likely to have been part of the same sub-branch. Both are poorly attested, however.
    What is there to add in either language? WP says both are "poorly attested", but I'm having trouble finding whether they are actually attested or this is just an editor's euphemism for "not attested". (All I've found so far is a random website mentioning that some placenames are known or inferred for "Galindian".) This would help with deciding whether to just retire xgl, add full codes for East and West, or add etymology-only codes for them. - -sche (discuss) 19:29, 16 January 2024 (UTC)Reply

Creoles and pidgins

  • Scots-Yiddish (crp-syi)
    A Scots-Yiddish creole spoken in the first half of the 20th century. Attestations are scanty, but some records do exist.
    I'd like to see good evidence that this is a genuine creole (or even pidgin) rather than Scots with some Yiddish loanwords or simple code-switching. Pidgins rarely arise when there are only two languages in contact, and not all pidgins undergo creolization. —Mahāgaja · talk 07:36, 8 December 2023 (UTC)Reply
    Yeah, I don't think we have enough evidence of this being a real, distinct language to add it. (Several of the relatively few works "in" the "language" appear to be inventing, or as they put it, "reimagining" it like a conlang.) - -sche (discuss) 19:05, 16 January 2024 (UTC)Reply

Dravidian

Created. Theknightwho (talk) 01:18, 3 February 2024 (UTC)Reply
  • Malamuthan (dra-mal)
    A small tribal language related to Malayalam - we have quite a few of these already, and I see no obvious reason to exclude this one.
    I'm having trouble finding any reference works about this; Mikhail S. Andronov (in A Comparative Grammar of the Dravidian Languages and A Grammar of the Malayalam Language in Historical Treatment) speaks of "the Malamuttan dialect". Perhaps we should just wait until someone has content they're wanting to add in this lect, to judge how distinct it is. - -sche (discuss) 19:38, 16 January 2024 (UTC)Reply
    @-sche I'm not sure if you've seen it, but pages 37 to 39 of Tribal Languages of Kerala has some information about it, which notes a number of distinctive qualities; not least because they have a very strong tradition of isolating themselves from outsiders. That paper cites a 1981 reference work, but I assume it's in Malayalam. Theknightwho (talk) 14:35, 20 February 2024 (UTC)Reply

Germanic

  • Greenlandic Norse (gmq-grn)
    A descendant of Old Norse spoken in Greenland until sometime in the 15th century, which diverged likely due to isolation (compare Icelandic and Norn). Some linguistic innovations and conservations have been noted, though the number of attestations is relatively small.
    Oppose: This is considered a dialect of Old West Norse, for which we already have a code: non-own. --{{victar|talk}} 19:22, 7 December 2023 (UTC)
    @Victar That's an etymology-only code, not a full language code. Theknightwho (talk) 20:22, 7 December 2023 (UTC)Reply
    I'm aware. This is a subdialect of a larger dialect. --{{victar|talk}} 20:30, 7 December 2023 (UTC)Reply
    My initial inclination is to keep treating this as ==Old Norse== as far as L2s go (or if we really want to, treat it as ==Old West Norse== and upgrade OWN to being attested like Proto-Norse). Various Old Norse dialects including this one have some differences from one another, but I do not know that it makes sense to speak of Greenlandic Norse as a "descendant" of Old Norse when it was contemporaneous and stopped being spoken at around the same time as other Old Norse, and other members of the dialect continuum do not seem to have had trouble understanding it, or at least modern scholars don't (given the uncertainty over whether various texts or inscriptions represent Greenlandic Norse or e.g. the Icelandic dialect of Old Norse, and that it sometimes even comes down to just the shapes of runes rather than anything about which letters or words are used); it seems like we can continue to treat it as a dialect in the dialect continuum. It would be reasonable to add an etymology-only code, for use in various Greenlandic terms' etymologies (since we are extremely free with these, and have ety-only codes even for things like en-NNN vs en-US ... I see we even have "en-US-CA" although this does not appear to be used anywhere and I am going to suggest it be deleted along with Template:User en-us-ca...). - -sche (discuss) 20:12, 16 January 2024 (UTC)Reply
Closing this by giving it the etymology-only code non-grn under Old West Norse. Theknightwho (talk) 01:33, 7 February 2024 (UTC)Reply

Indo-Aryan

  • Kishtwari (inc-kst)
    Closely related to Kashmiri (and sometimes classified as a dialect), but only retains partial mutual intelligibility, and (unlike Kashmiri) appears to be written using the Takri script.
    Oppose: I have never seen Ka/ishtwari referred to as anything other than a dialect of Kashmiri, alongside Kohistani, Poguli, Rambani, and Siraji. --{{victar|talk}} 08:32, 8 December 2023 (UTC)
    @Victar Poguli has an ISO code, so I’m not sure how much value your assertion has. Theknightwho (talk) 08:42, 8 December 2023 (UTC)Reply
    And just because an ISO code exists, doesn't mean we on the project should create a language for it. Often times, village dialects have codes just because someone put out a paper on it, not because it's any more unique than any other dialect on the continuum of dialects. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
    @Victar It calls into question the value of your statement that you have never seen it referred to as a language, if you’re putting it on the same level as a lect which does, in fact, have a language code. It also directly contradicts your previous statement as to the weight we should put on language codes. There is also the matter of the Takri script. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
    It doesn't contradict my opinion at all. In my experience, particularly when it comes to Indo-Iranian, ISO over-assigns language codes, so trying to give a language code to a dialect when even ISO doesn't is saying something. --{{victar|talk}} 10:22, 8 December 2023 (UTC)
    @Victar None of which is relevant to the fact there is evidence it isn’t even written with the same script - please present something more substantive than a personal hunch, or a selective approach to the weight you put on language codes. Theknightwho (talk) 10:29, 8 December 2023 (UTC)Reply
    A language written in multiple scripts is practically a hallmark of Indo-Iranian languages and to cite that as a reason to call it a different language would be naive. --{{victar|talk}} 10:39, 8 December 2023 (UTC)Reply
    @Victar You’re being highly misleading: when a “dialect” is written in a different script, its speakers do not consider themselves to be speaking the same language, and it’s also highly divergent (to the point where it is tonal, unlike Kashmiri), then it creates a compelling case for separating it out. Theknightwho (talk) 10:44, 8 December 2023 (UTC)Reply
    That is such an absurd statement. Script usage is frequently dependent on region and religion. Most literate Kashmiri speakers write in Perso-Arabic but the Hindu population uses Devanagari, regardless of any dialectal differences. Also I can't find any paper that states Kishtwari is any more or less tonal than standard Kashmiri. You're overreliant on a Wikipedia article for your facts. --{{victar|talk}} 11:41, 8 December 2023 (UTC)
    @Victar Except this is the Takri script and it is directly related to “dialectal” differences, so your comparison is nonsensical because it shows that script usage in this case is affected by the lect, not other factors like religion. Standard Kashmiri isn’t tonal at all, as you very well know. Theknightwho (talk) 11:48, 8 December 2023 (UTC)Reply
    Yes and the Kishtwari dialect is spoken in the region of the Kishtwar Valley, and the use of Takri is regional. Again, no paper I read remarks anything on tone. Unless you can provide a paper, your statement is meaningless. --{{victar|talk}} 11:57, 8 December 2023 (UTC)Reply
    @Victar we also have a code for Haryanvi, considered a dialect of Hindi. So should it be removed? Word0151 (talk) 12:48, 8 December 2023 (UTC)
    🤷 Plenty of Hindi project users that can decide that. --{{victar|talk}} 01:33, 9 December 2023 (UTC)Reply
  • Urtsuniwar (inc-unr)
    Closely related to Kalasha, but appears to be divergent enough to constitute a separate language with around 70% mutual intelligibility (compare Spanish/Portuguese with 85-90%).
    Oppose: Urtsuniwar is a synonym for Kalasha, see Decker (1992). Some speakers just use more Khowar borrowings than others. --{{victar|talk}} 08:32, 8 December 2023 (UTC)Reply
    @Victar Patently untrue - numerous references in the sources provided by WP (and elsewhere), and you’ve failed to explain the issue of mutual intelligibility. Theknightwho (talk) 08:45, 8 December 2023 (UTC)
    How is it "patently untrue"? Did you read Decker (1992): "Kalasha speakers in the Urtsun Valley sometimes call their language Urtsuniwar." I did explain the "issue of mutual intelligibility" -- speakers of Kalasha use varying degrees of Khowar borrowings. --{{victar|talk}} 09:30, 8 December 2023 (UTC)Reply
    @Victar 70% mutual intelligibility is far below the threshold typically used to classify something as a dialect (80-85%) - the fact that one citation says they are the same does not discount the wealth of evidence to the contrary. Theknightwho (talk) 09:44, 8 December 2023 (UTC)Reply
    What "wealth of evidence"? The first reference on the Wiki page literally lists Urtsuniwar under "Other Names" for Kalasha, beside Bashgali, Kalashwar, Kalashamon, and Kalash. Shall we make Kalashwar its own language as well? Another reference there is titled, I shit you not, "Kalasha of Urtsun". --{{victar|talk}} 10:22, 8 December 2023 (UTC)Reply
    @Victar Insufficient levels of mutual intelligibility, as stated several times. Theknightwho (talk) 10:32, 8 December 2023 (UTC)Reply

Iranian

  • Gorgani (ira-gor)
    An extinct Caspian language attested in the 14th century, which appears to have formed a dialect continuum with Mazanderani. Previous discussion here.
    Oppose: The few texts we have in Gorgani are almost indistinguishable from Old Tabari, the ancestor of Mazanderani, and should be considered a dialect of it, not its own language. There are actually more differences between Old Tabari and Mazanderani, but, like Classical Persian and Modern Persian, we treat them as the same language, in large part due to their use of an abjad alphabet. @Fay Freak --{{victar|talk}} 19:35, 7 December 2023 (UTC)Reply
    @Victar In all seriousness: given you clearly respect the views of Borjian, how do you explain his apparent change in view from the line you quoted from 2004 and his 2008 paper on Gorgani in which he invariably refers to it as a language (not a dialect)? Theknightwho (talk) 22:43, 7 December 2023 (UTC)Reply
    By its only being apparent. If you search for such a distinction. I’ve just looked into the 2008 paper again just for you. Normal(ly) people don’t look upon the statistical distribution of the employment of “language” and “dialect” in previous publications to find “changes in view” of linguists. Their views are rarely that sophisticated that one could make meta publications as one does on philosophers, and even then following such a bright shiny object is not an argument. language has multiple languages like sublanguage, including dialect, and one is not only not always anxious to make a distinction, there is usually nothing gained at all from such a “turf war”. All is language and words, rarely isolects or lexemes. Whether or not something should be treated separately is decided long before you realize you could beat the topics of this dichotomy again to fill your publication history.
    In this case the talk of “language”, I may argue, is purposefully misleading people, to market one’s publication career. It’s just much more zhoosh to publish about whole “languages” than dialects. But it’s okay to embellish things a bit since the core message of a paper does not hinge on these concepts. All historical sciences use to be much less exact in their design than that of the jurist who has the peculiar task to weigh or find a balance for a final decision. Like how I formulate etymologies in probability terms is secondary to what information is provided, in other words: it is mostly rhetorics to present the material, the related forms, reconstructions, and bibliography—this is the science, the result is of little practical relevance, unlike in the legal art where in the end you get a sentence or recommend an action. There is a principal misunderstanding of what linguistic papers are about here I can make out. Benwing noticed. You take publications of an author and read them with an exactitude that they don’t provide, with “research results” that they didn’t care about. One could enjoy that there are still naive academics whose subjects are recondite enough for their not bewaring of a lawyer around the corner attempting to misinterpret them. Fay Freak (talk) 00:35, 8 December 2023 (UTC)Reply
    @Fay Freak This seems like a very cynical answer, and it’s difficult to see how you’re not simply accusing Borjian of academic dishonesty. Also Benwing2 didn’t add anything on this topic - he simply asked for consensus. Theknightwho (talk) 08:48, 8 December 2023 (UTC)Reply

Nuristani

  • Zemiaki (iir-zem)
    Spoken by around 500 people and related to Waigali, but I'm not seeing any indication it should be treated as a dialect in the literature.
    Oppose: Morgenstierne (1974) calls it a dialect of Waigali, and Edelman (1999) is unsure, labeling it "jazyk/dialekt". We should play it safe and treat it like a dialect. --{{victar|talk}} 21:46, 7 December 2023 (UTC)Reply

Tungusic

  • Alchuka (tuw-alk)
    A language in the Jurchenic branch (i.e. close to Jurchen and Manchu), which went extinct at some point in the 1980s. Records of the language aren't great, but there are a handful of works which go into detail.
  • Bala (tuw-bal)
    A very similar situation to Alchuka above, though the language may still be moribund.
  • Kili (tuw-kli)
    Formerly thought to be a dialect of Nanai (a Southern Tungusic language), but now thought to be a Northern Tungusic language influenced by Nanai due to geographical proximity; it had 40 speakers in 1990, and is likely moribund.
With no objections, creating these three. Theknightwho (talk) 18:28, 4 February 2024 (UTC)Reply

Yeniseian(?)

  • Jie (qfa-yen-jie)
    Likely to be a Yeniseian language (though possibly Turkic), with only a single attestation from the 4th century (though it wouldn't be the first).
In the absence of objections, I'll create this, given the number of potential entries is capped at 4. Given the contention over its affiliation, und-jie is preferable as a code. Theknightwho (talk) 16:57, 4 February 2024 (UTC)Reply

Unknown

  • Xiongnu (und-xnu)
    Attested only via Old Chinese records of the language [edit: and potentially some inscriptions - see below], but nevertheless, a handful of terms have been recorded (and we can, at least, make broad reconstructions as to how they would have been read): e.g. the Old Chinese borrowing 谷蠡.

Theknightwho (talk) 16:03, 4 December 2023 (UTC)Reply

Oppose Xiongnu (Old Chinese is Old Chinese). West Galindian is also unattested. Is East Galindian attested outside of borrowings? If not, maybe keep as a substrate language? Provisional support Zemiaki, Kishtwari, Urtsuniwar, based on the assumption there are no good arguments to keep these together. Abstain for the others: poorly attested, extinct languages are usually subject to a lot of debate and usually dictionary entries in these don't turn out well, but they at least seem valid. Thadh (talk) 16:25, 4 December 2023 (UTC)
@Thadh The issue with Galindian is that we need to deal with the present situation, since having a single language code for both is simply incorrect. Re Xiongnu, I'm not referring to borrowings - I'm referring to specific records of the Xiongnu language in Old Chinese sources. Theknightwho (talk) 16:30, 4 December 2023 (UTC)Reply
@Theknightwho: Do you mean mentions of terms à la Uindiorix, or do you actually mean texts à la Luwian? Because in the former case, I'm inclined to call it a borrowing rather than an attestation, whereas the second one is fair enough. Thadh (talk) 17:18, 4 December 2023 (UTC)Reply
@Thadh It's a bit tricky - for example, see [6], where Vovin argues (quite convincingly) that they're inscriptions in Xiongnu which used Old Chinese characters for their semantic values, except for terms that needed to be transcribed phonetically, such as titles or personal names. There's obviously precedent for this - compare Japanese, Korean, Vietnamese etc. Theknightwho (talk) 18:01, 4 December 2023 (UTC)Reply
@Thadh: Discussion will be considerably less confusing if people put their Supports, Opposes and Abstains under each individual case rather than grouping them together at the bottom. —Mahāgaja · talk 18:06, 4 December 2023 (UTC)Reply
@Mahagaja: I had quite general remarks: Living languages - split. Unattested languages - no split. Rest - abstain. I think repeating this ten times is a bit overkill. Thadh (talk) 21:12, 4 December 2023 (UTC)Reply
I'm usually sympathetic to adding extinct language X even if it's only attested as quotations/mentions/etc in old records in language Y, as long as we're sure X was a language (and different from, not just a dialect of, Y or another language). With Xiongnu, it seems like no one is sure which of various unrelated ethnolinguistic families the Xiongnu people and language(s) might have been from, or even if it was composed of multiple ethnolinguistic groups. That last part gives me pause. Are scholars generally in agreement that the attested words from the Xiongnu are all in one language, or is this like e.g. "Loup" where it's multiple different languages? (We currently have Category:Loup B language, but this is questionable and it seems good that we don't have any entries.) - -sche (discuss) 21:15, 4 December 2023 (UTC)Reply
@-sche A lot of that lack of certainty comes from two factors:
  • Because Xiongnu is filtered through Old Chinese characters, any kind of reconstruction relies on us being able to accurately reconstruct the readings of those characters. This is something that is gradually improving, and - for example - we are in a much better position to make this kind of judgment than Pulleyblank was in the 1960s.
  • There’s been a huge amount of (understandable) speculation as to whether the Xiongnu and the Huns were one and the same. If I had to put money on it I’d say they probably were related, but I strongly suspect there was a large dialect continuum involved (just as there was with the Mongolian languages a millennium later). However, I’m certainly not proposing we merge Hunnic with Xiongnu or anything as radical as that. What we do know is that the inscriptions which were found were created by the same Xiongnu who are written about in Old Chinese sources, because they were excavated in the old Xiongnu capital of Longcheng in Mongolia, which was discovered quite recently. The question is whether they’re in Old Chinese or Xiongnu, but I’m inclined to agree with Vovin that the evidence suggests the latter.
Theknightwho (talk) 03:36, 5 December 2023 (UTC)Reply

2024


Medieval Greek from Ancient Greek


Please, as in Wiktionary:Beer_parlour/2024/January#Petition_to_upgrade_Medieval_Greek, from Category:Ancient Greek language. (I am sorry that my browser has difficulty to read much of this page.) ‑‑Sarri.greek  I 09:45, 2 January 2024 (UTC)Reply

Support. The request is to split grk-gkm Medieval Greek out of grc Ancient Greek. Previous discussion at Wiktionary:Beer parlour/2023/March#Medieval Greek. @Fay Freak, Al-Muqanna, Nicodene, Vahagn Petrosyan, JohnC5, Benwing2, -sche, the people who participated in that discussion which (like most discussions at Wiktionary, unfortunately) ended inconclusively. By the way, we've been using gkm as if it were an ISO 639-3 code, but in fact it isn't one. A request was made for that code many years ago, but it's never been approved or denied. Therefore if the split is approved, we need to use the exceptional code grk-gkm. —Mahāgaja · talk 11:10, 2 January 2024 (UTC)Reply
Note: The proposal in question was rejected on Hallowe’en 2023. 0DF (talk) 19:54, 19 June 2024 (UTC)Reply
Support, but only if any editors are willing to clean up the mess left behind by the split, otherwise this should wait a bit. Also, we have to first figure out which of the many modern Greek varieties (Standard Greek, Mariupol Greek, Pontic Greek, Italiot Greek, Tsakonian, etc.) are to be descendants of Medieval Greek, and which shouldn't. Thadh (talk) 11:39, 2 January 2024 (UTC)Reply
I'm fairly familiar with Attic Greek, but not with Medieval apart from what I've read on Wikipedia. The sources that I've typically used for Ancient Greek entries when I used to create them don't cover Medieval. I wouldn't be opposed if you and a team of other people familiar with Medieval want to split it. I don't know if I can be of much use unless there are bugs in modules or something. — Eru·tuon 08:25, 4 January 2024 (UTC)Reply
Thank you. I will "clean up the mess left behind the split", @Thadh. It is only 248 words that need fixing, plus all related Modern Greek (el) etymologies; I have a list of 711 corrections. I do a lot of Medieval Greek at el.wiktionary, please do not worry, I will not destroy anything. I need one week to fix everything. Please (@Erutuon), Module:grc-pronunciation, section Period, for Template:grc-ipa-rows, Template:grc-ipa-rows-byz, Template:grc-ipa-rows-koi also needs to say 10th century Medieval (or Mediaeval, according to your HomeRules), not 'Byzantine'. Adding med1 and med2 at its /data would also be a nice addition. I am very happy to resume work for med.greek! ‑‑Sarri.greek  I 04:51, 6 January 2024 (UTC)
I suppose actually the lines for Medieval Greek should be removed from {{grc-IPA}} and moved into a separate {{grk-gkm-IPA}}. Likewise the option for |dial=gkm needs to be removed from all grc inflection tables and new grk-gkm inflection tables created. —Mahāgaja · talk 08:19, 6 January 2024 (UTC)Reply
@Mahagaja, no, not needed. IPA will be with parameter period=byz1 (or period=med1, if Erutuon might give an alias to this parameter). Also: learned medieval inflections are identical to the standard ancient inflections and there is no need to provide them separately. Nothing different. At el.wikt, if we care to repeat them, we add title: learned medieval inflection as in ancient greek. But we shall not provide any of that now. Never mind for vulgar inflections (I'll let you know about these) Thank you for your concern. ‑‑Sarri.greek  I 08:26, 6 January 2024 (UTC)Reply
We're really not supposed to use one language's templates in another language's entries, so if grk-gkm and grc are two different languages, then we're really not supposed to use things like {{grc-IPA}}, {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} in grk-gkm entries. And there may still be some differences; for example, does Medieval Greek ever use the dual number? If not then the dual shouldn't be shown in {{grk-gkm-decl}} and {{grk-gkm-conj}} as it is in {{grc-decl}} and {{grc-conj}}. —Mahāgaja · talk 09:10, 6 January 2024 (UTC)Reply

Thank you, (sorry, this page gives me page unresponsive at my Chrome browser, and is often difficult to write here.) Thank you @Mahagaja, The code gkm is in wide use, and although not -still- activated by ISO; there have been attempts to draw attention to its acceptance, and will notify if something changes officially. At el.wikt there are also dialectal gkm‑crt and gkm‑cyp as subordinate codes.
Thank you @Thadh, I will check all instances of insource:xxx and intitle:xxx occurrences of relevant words and correct them. For the update of Module:families/data/hierarchy#Hellenic and Module:etymology languages/data#gkm I submit here (quoted) the official Greek source: Modern Greek Dialects What is a dialect? - Research Centre for Modern Greek Dialects, Academy of Athens

Nowadays we consider as dialects the Pontiac (in which the Greek of Crimea-Mariupol are included), the Cappadocian, the Tsakonian and the Southern Italian. All the other regional variants of the Modern Greek Standard are known as idioms. In particular, the Cretan and Cypriot idioms are exceptionally known as dialects, thus acknowledging an intermediate level of language variation.

All the modern Greek dialects (Cappadocian.cpg, Italiot.grk-ita, Pontic.pnt, which includes the Mariupol idiom) and Modern Greek.el itself come from Medieval Greek, except Tsakonian.tsd, which is a special case. Thank you ‑‑Sarri.greek  I 13:07, 2 January 2024 (UTC)

A bit off-topic, but most researchers I have read claim Mariupol Greek is, in fact, not a Pontic lect and doesn't share much if anything in common with Pontic that it doesn't share with other Greek lects. Thadh (talk) 13:34, 2 January 2024 (UTC)
I kinda doubt editors are willing to clean up, or review the dialectology of the Abstandsprachen. The ideological distinction is barely worth the effort for that and for always checking in which chronolect a word has been used, an argument I often use, as we do not go completely without distinction if we don’t split at the L2 level: now it means we write a label if we know and abstain if we don’t bother. The result could become more often that someone doesn’t add a valid entry or etymological note due to fear of making a mistake. Fay Freak (talk) 19:46, 2 January 2024 (UTC)Reply

I oppose the change in name from “Byzantine Greek” to “Medi(a)eval Greek” for referring to this chronolect. I’m undecided about the split itself. @Sarri.greek: Could you point us to some well-developed Byzantine Greek entries in το Βικιλεξικό to give us some idea what they’d look like, and to what extent they’d contrast with Ancient Greek and Modern Greek entries, please? 0DF (talk) 02:19, 7 January 2024 (UTC)Reply

@0DF. _For the term, professors of linguistics might answer your question (ref). _Examples Παραδείγματα at wikt:el:Κατηγορία:Μεσαιωνικά ελληνικά. ‑‑Sarri.greek  I 08:45, 7 January 2024 (UTC)Reply

@Sarri.greek: Thank you for your response. I'll address the παραδείγματα first.
That category you linked (el:Κατηγορία:Μεσαιωνικά ελληνικά = “Category:Mediaeval Greek”) contains 1,804 entries, so I hope you'll forgive me that I only checked out the first column of entries (from el:ἀβαμπαρλιέρης to el:ἀλλάγιον — 63 pages). Of those, none of the gkm entries contained IPA transcriptions, and the only ones with inflection tables are el:αἰγοβοσκός and el:ἀλλάγιον. Those don't appear to be what I'd call "well-developed". As to contrast, the declension tables in αἰγοβοσκός and ἀλλάγιον are identical to Ancient Greek ones, even including the δυϊκός (duïkós, dual). As they are, those 63 entries suggest there would be no benefit to splitting gkm out of grc and that doing so would only create useless redundancy. That being said, I suspect that there could be some value in the split in the cases of entries like el:-άγρα, el:-αινα, and el:-αλγία, which present (currently unseized) opportunities to explain the loss of the accusative , the loss of the dative entirely, and the collapse of the Ancient nominative–vocative plural -αι and accusative plural -ᾱς into the Modern -ες. I also see cases like the Modern Greek entry καλοκαίρι (kalokaíri, summertime, summer), which currently traces the word's etymology, via Byzantine Greek καλοκαίριν (kalokaírin, good season, good weather), to Ancient Greek καλοκαίριον (kalokaírion, fine weather). It would be great to know how καλοκαίριν (kalokaírin) declines; that being said, is there any reason why its declension couldn't be showcased perfectly well as a {{lb|grc|Byzantine}} {{alternative form of|grc|καλοκαίριον}}?
Now to the nomenclatural issue.
I've taken a look at the authority you cited; for the benefit of others reading this, here are its bibliographical details:

  • David Holton with Geoffrey Horrocks, Marjolijne Janssen, Tina Lendari [Stamatina Lentari], Io Manolessou, and Notis Toufexis [Panagiotis Toufexis] (2019) The Cambridge Grammar of Medieval and Early Modern Greek, four volumes, Cambridge · New York · Port Melbourne · New Delhi · Singapore: Cambridge University Press

The authors' rationale for their disuse of the term Byzantine Greek is to be found in the introduction to the work, in this paragraph from page xix:

The system of periodization that we have used is not based on external criteria, which might relate to historically significant dates, such as wars, conquest or independence. For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700 (for details see Holton 2010, Holton/Manolessou 2010). Consequently, we employ the following terminology in order to denote sub-periods of the history of Greek, terms that also conveniently correspond to those widely used for periodization in Western historical thought: Early Medieval (EMedG) from about 500 to 1100; Late Medieval (LMedG) from about 1100 to 1500; Early Modern (EMG) from about 1500 to 1700.

Appeals to authority are all well and good, but that is poor reasoning. Yes, politics affect language, and the Byzantine Empire, whilst it existed, was (I think you'll agree) the political, cultural, and linguistic "centre of gravity" of the Greek world. The authors write that “for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not ‘Byzantine’ in a political sense” (my emphasis); however, a person's language doesn't (immediately) change with political borders. Earlier op. cit., on page iii, there occurs the sentence “The geographical area where Greek has been spoken stretches from the Aegean Islands to the Black Sea and from Southern Italy and Sicily to the Middle East, largely corresponding to former territories of the Byzantine Empire and its successor states.” Doesn't that show the centrality of that polity to the history of the Greek language during this period? The authors' reason is weak, and I reject it.
I see another problem here, which is that Holton et al. seem to be treating this chronolect as existing between AD ~500 and ~1700. As you probably know, the Middle Ages (a.k.a. the Mediaeval period) are traditionally bookended by the falls of two Roman Empires, starting with the fall of the Western Roman Empire in AD 476 and ending with the fall of the Eastern Roman Empire (i.e. the Byzantine Empire) in 1453; it's not too much of a stretch to push it later, to 500–1500, but I don't know any informed person who calls the seventeenth century mediaeval, so we couldn't call this chronolect “Medi(a)eval Greek”. Holton et al. are not alone in this, either: on page xviii op. cit. they mention the “dictionary of Kriaras and the Vienna-based Lexikon zur byzantinischen Gräzität”; that “dictionary of Kriaras” is Emmanuel Kriaras' Λεξικό της Μεσαιωνικής Ελληνικής Δημώδους Γραμματείας, 1100–1669 (Dictionary of Mediaeval Greek Vernacular Literature, 1100–1669, my emphasis). Maybe the Greek Μεσαίωνας (Mesaíonas) is conceived of differently from the English Middle Ages. It would be possible to call the chronolect “Mesaeonic Greek”, but we'd very much be neologising there; I could only find one instance of meseonic, so the adjective alone wouldn't even satisfy the criteria for inclusion.
Finally, I note that the other dictionary mentioned alongside Kriaras' is entitled Lexikon zur byzantinischen Gräzität (Lexicon of Byzantine Graecity), so it's apparent that not everyone rejects the term Byzantine Greek. Indeed, a text search for the string byzantin (case- and diacritic-indifferent) in the bibliography of The Cambridge Grammar of Medieval and Early Modern Greek (which occupies pages xxxvii–clxvi thereof) finds 201 instances. Some of those may be false positives, but that search would also have missed any instances hyphenated across a line break (byz-antin, byzan-tin, vel sim.) or in languages that spell the word bizant- or otherwise. My point is that Byzantine Greek is still a common term and one we should use.
0DF (talk) 09:23, 8 January 2024 (UTC)

A bit of a nitpick: Byzantine Greek isn't any better than Medieval Greek as a label for the language after the fall of the Byzantine Empire. Strictly speaking it wasn't Byzantine Greek at that point, but Ottoman. But either term applies well to the majority of the period. — Eru·tuon 00:42, 9 January 2024 (UTC)
I don't like the term Byzantine Greek because a naive reader could think it referred to a regional dialect rather than a chronolect. It would be easy for someone to think it referred to Greek as spoken in Byzantium as early as the time of Alexander the Great, and that it would not refer to Greek as written in Athens or Alexandria in AD 600. Also, 0DF, Holton et al. explicitly do not call the period from 1500 to 1700 medieval; they call it Early Modern Greek, just as we call the English of the same period Early Modern English. Wiktionary already uses 1453 as the border between grc and el; there's no reason separating grk-gkm out from grc should entail shifting the starting date of el later than it currently is. —Mahāgaja · talk 08:02, 9 January 2024 (UTC)Reply
I don't have much of a stake in this but I also favour Medieval Greek, though I wouldn't be opposed to having Byzantine Greek as an etym-only language attached to it. Theknightwho (talk) 08:43, 9 January 2024 (UTC)
Side issue: if we split Medieval from Ancient, I suppose the Byzantine flag which is currently used for Ancient Greek in the "Add country flags next to language headers" gadget will need to be moved to Medieval Greek, and Ancient Greek will either need a new flag or no flag. - -sche (discuss) 19:48, 12 January 2024 (UTC)
Preferably none. —Mahāgaja · talk 22:23, 12 January 2024 (UTC)

@Mahagaja, Erutuon, Thadh, since I do not see any more objections: _phase_1: I have already cleaned up Modern Greek etymologies involving gkm (need 70 more to do, also supplying sources, ipa etc), to be ready for the term Medieval instead of Byzantine. This is

These steps are for the name change. If you give permission and agree to the upgrade from grc, then _phase_2 is the move from Module:languages/data/3/g to Module:languages/data/exceptional. The working alias gkm is already in place, and I will be able to proceed with corrections to section titles, sources, etc., wherever needed, especially where Modern Greek etymologies need a Medieval lemma. Thank you for your help. ‑‑Sarri.greek  I 10:56, 3 February 2024 (UTC)

There are objections. I would like to add that I too oppose renaming from Byzantine Greek or extending its time frame past the 15th century. Nicodene (talk) 02:17, 4 February 2024 (UTC)
@Nicodene, I have suggested nothing about the post-15th-century period (= Early Modern Greek), which we deal with in polytonic at el.wikt, not monotonic. But we are at _phase_1 now, which is to rename 'Byzantine language' to Medieval Greek. I am glad that you are interested in the periodisation of the Hellenic language; it is rare that non-Hellenists are interested or take the time to study this. We can discuss it on our Talk pages, if you wish. Thank you ‑‑Sarri.greek  I 02:35, 4 February 2024 (UTC)
(Why not here?)
I see. For the record I do support splitting it out of Ancient Greek, even if the (prescriptively correct, 'learned') inflections are going to be largely the same.
So far I don't see any real argument against the label 'Byzantine'. The point about political control is a bit spurious as the label 'Byzantine' is in no way limited to the political level. It is civilisational.
The point about 'Byzantine Greek' being misinterpretable as 'the dialect of the colony of Byzantion' might be convincing if not for the unlikelihood of someone being simultaneously knowledgeable enough about history to even be aware of the (let's be honest) rather unimportant pre-Constantine city, yet also historically illiterate enough to be unaware of what 'Byzantine' means 99 times out of 100. Nicodene (talk) 02:58, 4 February 2024 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sarri.greek: Respectfully, I think you're being too hasty with this. I acknowledge I've been slow to respond; that has in large part been due to my work researching Atticism (see it, Citations:Atticism, and some of the word's relations) in connection with the more substantive question of whether there is value in splitting gkm from grc. My understanding of that matter is more-or-less in line with this paragraph from the website for Trinity College Dublin's 2024 International Byzantine Greek Summer School (IBGSS):

Byzantine Greek is the dominant form of Greek written during the Byzantine Empire (AD 330–1453). The spoken language changed significantly in this period and came close to Modern Greek, but most Byzantine authors use conservative forms of Greek that looked back to Classical Attic, the Hellenistic Koine and Biblical Greek. Therefore much of the vocabulary, morphology and syntax of Byzantine Greek are not significantly different from Classical Greek, which makes this course a suitable preparation also for reading Classical literature and the New Testament.

But to the matter of the nomenclature: I had previously been arguing that Byzantine Greek is just as good a term as Medieval Greek, but it appears that they may not be entirely synonymous. Please see the quotations I've collected at Citations:Medieval Greek. You'll see that Evangelinos Apostolides Sophocles uses the term Byzantine Greek (for 330–1453) and remarks that “if the expression Mediæval Greek is to be used at all, it should be restricted to the language of [the second epoch of the Byzantine period]” (622–1099), whereas Irach Jehangir Sorabji Taraporewala states that “Byzantine Greek is a direct development from the literary dialect of the second transition period [300–600]” but that “[l]iterary Mediaeval Greek [1000–1450] is a development of the colloquial of the previous (Neo-Hellenic [= Byzantine Greek]) period [600–1000]”; those two sources directly contradict on the details, but they both distinguish the two chronolects. Edward Augustus Freeman speaks explicitly of “a literature, mediæval Greek or Romaic, as distinguished from Byzantine” and the writer for UNESCO discusses in a single sentence borrowings into “Byzantine Greek”, “mediaeval Greek”, and “Neo-Greek”; they appear to have particular time periods in mind, but I'm not sure what they are. And George Leonard Huxley refers to “Byzantine Greek” and “mediaeval Greek language and literature” in consecutive sentences, presumably synonymously, but not obviously so. Many more sources use both terms within the same work, without it being clear whether the terms mean different things or whether they're making a distinction without a difference. Can you explain these distinctions? Are they valid? If not, why not? If so, do you propose more than one offshoot to grc? If not, why not? If so, how many, and what should they be?
@Erutuon: I would argue that, in the same way that Greek writers contemporaneous with but geographically outside the bounds of the Byzantine Empire may nevertheless conform to Byzantine literary norms, Greeks writing after the Empire's fall may, from inertia or nostalgia, also conform to Byzantine literary norms, despite the change in their political context. By contrast, the Middle Ages are strictly chronological and have an exact terminus in the 1453 fall of Constantinople.
@Mahagaja: In my experience, Byzantium is used far more frequently to refer to the Byzantine Empire than it is to refer to the city; most people are unaware that the usage is originally a synecdoche and, whilst a lot of people know Istanbul used to be called Constantinople, far fewer know that Constantinople used to be called Byzantium (and fewer still know that Byzantium used to be called Lygos, but I digress). As such, I don't think that it is at all likely that a naïve reader would make that mistake. A mistake I know some people make, however, is with the qualifiers High or Upper and Low or Lower in geographical and geographically-based terms like Upper Egypt vs. Lower Egypt and High German vs. Low German, with High and Upper mistaken to mean "north(ern)" and Low and Lower used to mean "south(ern)"; I assume the confusion arises from the conventional orientation of maps in the Anglosphere. Despite that confusion, I would not, and I doubt you would, advocate replacing those terms with ones less susceptible to such naïve confusion. For another example, I'm sure a naïve reader could mistake Andalusian Arabic for Arabic spoken in the (present-day) Spanish region of Andalusia; the synonym Moorish Arabic is not susceptible to that confusion, so should we use that instead? There are other confusables as well, I'm sure. ⸻ Re Holton et al., I know they don't call Greek 1500–1700 "Medieval"; the fact that I quoted above a paragraph of theirs that ends "Early Modern (EMG) from about 1500 to 1700" should make that clear. My meaning was that Holton et al. are treating Greek 500–1700 as a single chronolect, which they call "Medieval and Early Modern Greek" and which Kriaras calls Μεσαιωνική Ελληνική (Mesaionikí Ellinikí). Holton et al. 
make a point of saying that their “system of periodization…is not based on external criteria” and that their “criteria are instead internal ones, based on clusters of important linguistic changes that [they] see as occurring around 1100, 1500 and 1700”. If we did the same, that might indeed entail shifting the starting date of el later than it currently is.
@-sche: I don't have country flags beside language headers turned on and neither am I inclined to turn them on, but if you're interested in having them, you could use the Argead star (commons:File:Vergina Sun WIPO.svg) for Ancient Greek; the English Wikipedia uses that image in its country infoboxes as the flag of the Empire of Alexander the Great, as well as in many other places.
@Nicodene: I largely agree with you, but if we're going to split out gkm, wouldn't it be better to give the inflections that show the changes taking place between Ancient and Modern Greek? Wouldn't it be rather redundant if they had the same inflectional information as that given in Ancient Greek entries?
0DF (talk) 03:46, 4 February 2024 (UTC)

More than one set of inflections could be shown - the learned and Atticising versus the humble and 'demotic', at least by the time of the Digenes Akritas. Or, working with one set of inflection tables, cases or endings falling out of vernacular use could be placed in brackets with an explanatory note regarding register. Apart from that there would be differences in phonology and in various cases semantics as well. Nicodene (talk) 03:56, 4 February 2024 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Nicodene: To give us all some idea of the kind of inflectional variability we're dealing with, I added a table to βαθύς (bathús) of its Byzantine forms. There's already a lot there, but that's an underrepresentation, if anything. Annoyingly for our purposes, Holton et al. specifically omit the dative from their paradigms, despite the fact it occurs:

Nominative, genitive and accusative cases continue to exist in LMedG and EMG. The dative case, however, had gradually disappeared from the spoken language during the first millennium and its main functions were reassigned (see Humbert 1930, Lendari/Manolessou 2003, Horrocks ²2010: 183–5, 284, Holton/Manolessou 2010: 546–7). Nonetheless, datives survive in many of the written texts that this Grammar is based on, though mainly in documents and other texts in mixed or higher registers, and they may have a range of inherited functions. Particularly common are datives governed by the prepositions ἐν and σύν. Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.¹
¹The only exception that has been made is the dative reciprocal pronoun ἀλλήλοις, on the basis that its occurrence, which is quite rare, seems to be as much a lexical survival as a morphosyntactic feature (see 5.12).

—volume II, § 1.1, pages 241–242

and even in novel formations:

In addition to instances like the above, which could be deemed grammatically “correct” (i.e. in accordance with AG morphology and syntax), we also find dative forms with innovative phonology, stress or morphology, or new lexical items: [τουποθεσίᾳ, Ρεθέμνει, βοθρακοῖς, παρρησιᾷ, Ὀγκριᾷ, Ἀράβοις, ἑταιρίδαις, νήσαις, συνπάσοις, Ἑλλήνοις, δοράτοις, ὀξέοις, ἐμπιστευτηόδες, ἐμπιστευτιόδαις (toupothesíāi, Rethémnei, bothrakoîs, parrhēsiâi, Onkriâi, Arábois, hetairídais, nḗsais, sunpásois, Hellḗnois, dorátois, oxéois, empisteutēódes, empisteutiódais)]
Of particular interest is the use of dative forms for loanwords: [σούγλᾳ, μπασταρδικῷ, σερραγίῳ, ὀντάσι (soúglāi, mpastardikôi, serrhagíōi, ontási)]

ibidem, pages 242–243

Moreover, Holton et al. exclude Atticist texts entirely (“the texts on which this Grammar is based – i.e. texts that are not systematically archaizing” — ibidem, page 243); accordingly, if we're to produce accurate and (in aspiration) exhaustive inflection tables, we shall have to supply the missing Attic forms and datives.
Holton et al. mention the dual number, as far as I can tell, exactly twice in their entire four-volume grammar:

The AG reciprocal pronoun (“one another”) had dual (gen. ἀλλήλοιν) and plural (gen. ἀλλήλων) numbers, and was declined for gender and case (genitive, accusative and dative).

—volume II, § 5.12, page 1,183

The London manuscript can be consulted at http://www.bl.uk/manuscripts/. Ms. Athous Pandel. 538, edited by Vasileiou (2003) has the unusual form εγκρεμίζεσθον Varl. & Ioas. (Pantel.) 303, which is unlikely to be an archaic dual (as the subject is 2 sg.), and probably a writing mistake for ἐγκρεμίζεσουν.

—volume III, § 4.3.1.2, page 1,551, footnote 54

so I don't know whether to infer from their silence that the dual saw no use in Byzantine Greek, or that its use was restricted to Atticist texts, and that it is for that reason that Holton et al. make no mention of it.
Certainly, we can't rely on Holton et al. alone to guide what we do about Byzantine Greek. Nevertheless, that table at βαθύς (bathús) is something concrete to work from. 0DF (talk) 23:37, 7 February 2024 (UTC)

@0DF The effort is quite admirable, thank you. I can't imagine it is sustainable across hundreds of entries, so generating variants with an automated template would be the long-term approach. The tables would probably include a prominent disclaimer like 'not all forms necessarily attested'. The automated romanisation can probably be prevented somehow to alleviate crowding. Nicodene (talk) 23:19, 8 February 2024 (UTC)
@Nicodene: I agree that the transliterations take up too much space and that they probably are best removed by default from Byzantine inflection tables. I also agree with including a prominent disclaimer of the kind you describe. I got the forms of βαθύς I added to that table from Holton et al., volume II, pages 746–757, wherein βαθύς serves as their paradigm for “Adjectives with Originally 3rd-Declension Endings” (§ 3.3), specifically “Oxytone Adjectives in -ύς” (§ 3.3.1). On the basis of Holton et al., volume I, page xxxiii (“When whole words are enclosed in brackets in the tables, the forms in question may reasonably be assumed to have existed, but no example has been located in the LMedG and EMG texts examined, e.g. (μιανοῦ), (χρυσοῦ).”), I presented each form which they give in parentheses instead with a preceding asterisk, as is standard in historical linguistics. Despite there being many forms already, the forms given by Holton et al. are an under-representation, if anything (Holton et al., volume I, page xxxvii, prefacing the Bibliography: “Classical, post-classical, early medieval and other learned Byzantine texts are not included below.”; volume II, page 746, below the synoptic table for βαθύς: “Residual [scil. inherited Attic] forms, e.g. βαθέος, βαθεῖς, are not included in the above table, but will be discussed below where relevant.”; ibidem, page 242: “dative forms are not included in the paradigms set out in the chapters that follow”; ibidem, page 243: “the texts on which this Grammar is based – i.e. texts that are not systematically archaizing”); the forms Holton et al. give are only those non-dative forms which occur in lower-register texts written 1100–1700: a rather limited subset of the “Medieval and Early Modern Greek” whole that you'd reasonably expect that they're trying to describe. 
On the other hand, if we are to adhere to the 1453 cut-off for Byzantine Greek, we need to be careful to exclude those forms that occur only in texts from the seventeenth, the sixteenth, and/or the latter half of the fifteenth centuries.
I am increasingly recognising that inflection tables for Byzantine Greek terms ideally require certain features that are different from those that befit inflection tables either for Ancient Greek terms or for Modern Greek terms. One of those features would be the indication of pronunciation for each form, because the vocalic mergers of Byzantine Greek render its graphemes surjective upon its phonemes (i.e., with the exception of the bijective α/a/ and ου/u/, each vowel may be written in multiple ways, namely: αι, ε/e̞/; ο, ω/o̞/; ει, η, ι/i/; οι, υ, υι/y/, then, upon the completion of iotacism in the eleventh century, /i/) and because the representation of significant phonological processes that Byzantine Greek underwent (synizesis and various deletions) are only haphazardly reflected in spelling; this would call for a tie-in between a module such as that behind {{grc-IPA}} on the one hand and modules such as those behind {{grc-decl}}, {{grc-adecl}}, and {{grc-conj}} on the other. Another desirable feature, in the light of Holton et al., volume I, page xxxiii (“smaller tables classify the allomorphs as ‘General’ (if they occur widely in the texts examined), ‘Restricted’ (if they are found in only part of the period covered by the Grammar, or only in certain areas or certain types of text), or ‘Rare’ (if their occurrence is very limited)”), would be the means seamlessly to mark each form for its respective period, locale, genre, register, and frequency. The need for bespoke inflection tables, distinct from those designed for Ancient and Modern Greek, is an infrastructural and thematic argument in favour of treating Byzantine Greek terms separately from Ancient Greek terms on the one hand and Modern Greek terms on the other. 0DF (talk) 22:51, 6 March 2024 (UTC)Reply
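As a rough illustration of the many-to-one relation between vowel spellings and vowel sounds described in the comment above, here is a minimal Python sketch. The correspondences are exactly those listed in that comment; the names `EARLY`, `vowel_map` and `spellings_by_phoneme` are hypothetical, and nothing here reflects how {{grc-IPA}} or any existing Wiktionary module actually works.

```python
from collections import defaultdict

# Vowel grapheme -> phoneme correspondences, as listed in the comment above
# (before /y/ merges into /i/). Illustrative data only, not a module API.
EARLY = {
    "α": "a", "ου": "u",
    "αι": "e̞", "ε": "e̞",
    "ο": "o̞", "ω": "o̞",
    "ει": "i", "η": "i", "ι": "i",
    "οι": "y", "υ": "y", "υι": "y",
}

def vowel_map(after_iotacism: bool) -> dict:
    """Return the grapheme -> phoneme map for the chosen stage."""
    if not after_iotacism:
        return dict(EARLY)
    # Once iotacism is complete, every /y/ spelling is read as /i/.
    return {g: ("i" if p == "y" else p) for g, p in EARLY.items()}

def spellings_by_phoneme(mapping: dict) -> dict:
    """Invert the map: each phoneme -> list of graphemes that spell it."""
    inverse = defaultdict(list)
    for grapheme, phoneme in mapping.items():
        inverse[phoneme].append(grapheme)
    return dict(inverse)

if __name__ == "__main__":
    for stage in (False, True):
        label = "after iotacism" if stage else "before iotacism"
        print(label)
        for phoneme, graphemes in spellings_by_phoneme(vowel_map(stage)).items():
            print(f"  /{phoneme}/ <- {', '.join(graphemes)}")
```

Only α and ου map one-to-one in this table; every other phoneme has several spellings, which is why a spelling alone cannot be recovered from a pronunciation and why per-form pronunciation would be useful in the tables.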

phase 1


Notifying administrators for grc: @Mahagaja, JohnC5, Erutuon; also @Thadh, Theknightwho, Benwing2. More than one month has passed. Am I to proceed with _phase_1: rename Byzantine to Medieval? Do I have the administrators' permission to start? Would an admin help with Module:etymology languages/data, setting the name to "Medieval Greek" with aliases = {"Byzantine Greek"}? (Because I am not an administrator, I cannot edit it myself.) Thank you.
On some other points: (I did not expect “σχοινοτενεῖς, prolix” discussions on this page, but at the corresponding Beer talk. Nevertheless, I am obliged to respond and clarify:)

  • Early Modern Greek: @Mahagaja, yes, the phase 1453–1669 (ending with the termination of Cretan literature) is Early Modern Greek (πρώιμη νεοελληνική), interchangeably 'Late Medieval' (όψιμη μεσαιωνική). Why? _1. Because of its retained mediaevalisms, many prominent linguists use the terms 'Late Medieval' and 'Early Mod.Gr' interchangeably (we can discuss this further). And mainly _2. I would not propose a split of Modern Greek, or further splits in general. We study it under Med.Gr. because its original script is polytonic and all modules, translit, IPA, etc. are already in place; probably some modifications will be needed, or a few templates will be developed.
  • Period versus Style/Register. Hellenistic Koine, or even the Attic dialect, is used by authors long after the 6th century (even until the 20th century, in the form of Katharevousa). The typology (inflections etc.) of their words follows the grammatical rules of Ancient.Gr. (A label like learned may be used for some medieval Koine-style neologisms that might interest Med.) We will not duplicate existing Ancient Greek inflections.
  • Polytonic original script. Please note that conservative Greek linguists of the past century would 'correct' forms in their editions according to Anc.Gr. rules, while the progressive ones (who were prosecuted during those polemical times: see the Trial of Accents), like Kriaras, at some point switched to monotonic. Nowadays, it is inconceivable to change the script of an original source in a critical edition. Please note that everything Greek up to 1982 was written in polytonic. Nowadays everything, ancient texts too, might be seen (e.g. on the internet, in new books) written monotonically, because it is easy to type and cheap to print.
  • Polytypy in Hellenic: @Nicodene, yes, it is a fact. It is a stubborn language: fluctuation in suffixes runs through all grk. See modern verbs like εκλέγω#Conjugation, Template:el-conjug-'ακούω'. One cannot avoid giving Modern Greek inflections just because they have so many allomorphs (and see how much is omitted at Appendix:Greek_verbs#Omitted!).
  • Will Medieval Greek acquire inflections at once? No. It takes some time for discussions, proposals and trials to crystallise a method. A Working/Trials/Feedback.for.Med page would be a good way to start. (Please check some first attempts at wikt:el:παλληκάριον, wikt:el:σκοῦπα, wikt:el:Template:gkm-κλίση-ουσ, and, for a neologism that is learned, i.e. in the ancient fashion, wikt:el:ἀπόκρεως.) Med.Greek does not have Dative. The table of our learned friend 0DF at βαθύς is a fusion of Koine datives into Med. for the difficult categories -ύς, -ής of adjectives, with lots of learned forms preserved. We do not add Dative at Mod.Greek either (Mod. terms with dative exist at el.wikt, but not in its tables).
    I have not proposed any trial for an Appendix of clitic paradigmata and/or tables with the distinction of 'expected versus attested' forms yet.

Why try to form a new 'Medieval Greek' section here rather than at el.wikt? Because here there are so many learned and informed editors, experts (some of them professionals), who can help with their bibliography and their valued opinion in this project. At el.wikt I am totally alone in this project, and I found it exhausting to update and patrol and make trials with no feedback and no help for Med.Greek. All the experts are assembled here. This wiktionary is the avant-garde of all wikts.
Admins! Please help to begin this project. Give me permission to start with _phase_1: rename Byzantine to Medieval. Allow _phase_2 (upgrade from etymology language to an autonomous section), so that I can use the title Medieval Greek for poor Τζέτζης, who has been waiting for this for a long time. Help, please, please, allow this long phase of Greek at en.wikt to exist! Thank you. ‑‑Sarri.greek  I 04:58, 9 February 2024 (UTC)

@Sarri.greek I tried to read through this discussion. It is confusing because there are two different issues (rename Byzantine -> Medieval, and split out Byzantine/Medieval from Ancient Greek). For issue 1 (rename), it looks like maybe two people (User:0DF and User:Nicodene) disagree with the name change and up to four are in support (User:Sarri.greek, User:Thadh, User:Mahagaja and User:Theknightwho). This is possibly enough for a rename but I feel uncomfortable without a clearer consensus, esp. given that I'm not sure whether User:Fay Freak opposes the name change and/or split (their prose is, as is typical, somewhat impenetrable). User:Erutuon and User:-sche seem willing to accept one or both changes but without a strong opinion. For issue 2 (split), it looks like User:Sarri.greek, User:Thadh, User:Mahagaja and User:Nicodene are in favor of a split, while User:0DF is undecided, User:Fay Freak possibly opposes (?), and User:Theknightwho has not expressed an opinion. Can all the people I just named let me know (1) did I get your opinion correct on both issues and (2) if not, what is your opinion, both about issue 1 (the rename) and issue 2 (the split)? Benwing2 (talk) 05:29, 9 February 2024 (UTC)Reply
Yes, support. Thadh (talk) 11:19, 9 February 2024 (UTC)
Support — long overdue!   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)
@Benwing2: I was more warning with respect to the ambiguous consequences, without obstructing. If people are willing to invest work for a split, it is not my due to oppose it, since I do not expect to do Greek in the medium-term anyway, as it is low on my priority list, relatively to other interesting languages – I have not even followed the forthgoing of the discussion and don’t know what you all exactly intend, especially with respect to the 300–600 time, when I have derived Arabic terms from Byzantine Greek when I am not really sure whether they are from before Islam or right after it or a century later etc. and it might be split to Late Koine and Medieval Greek, which I am not particularly keen to revisit either and Greek editors might be good enough to pinpoint. Fay Freak (talk) 07:07, 9 February 2024 (UTC)Reply
@Fay Freak OK thank you, that clears things up. Benwing2 (talk) 07:12, 9 February 2024 (UTC)
Thank you @Fay Freak for not opposing. Indeed, the period of Late Koine, 300–600 (600 being accepted as the turning point, language-wise, with the original-Greek parts of the Novellae in Iustinianos' legal reforms, while history has a different periodisation), is under the jurisdiction of Ancient Greek administrators. As seen at {{R:DGE}} and Bailly2020, these dictionaries extend to authors of up to the 6th–9th, 10th, or 13th centuries, when such authors use Koine as a high register. ‑‑Sarri.greek  I 07:36, 9 February 2024 (UTC)
@Benwing2, sorry to bother you again: what is going to happen? Would you like me to call more people to vote? Mr @A. T. Galenitis, who edits all phases of Greek including Medieval, is away. As you see, not many are interested in Greek. But I am, I am: I am willing and available! Every year, fewer and fewer people will be voting. In the end, I will be the only voter! I am waiting, anxious to start editing. Thank you. ‑‑Sarri.greek  I 18:26, 15 February 2024 (UTC)
@Sarri.greek I'd like to give it a couple of weeks. As it is looking, the split seems pretty clear and the name change is leaning towards, although User:0DF has not added their votes yet. Note that in general you should not canvass votes, i.e. ping people specifically for voting purposes esp. if you believe they will vote in a particular way that you desire. Benwing2 (talk) 00:51, 18 February 2024 (UTC)
@Benwing2, of course, of course! People vote if they agree, not because I called them. I am just informing people with whom we have been discussing this for more than a year, people who edit, or want to edit, Greek. Just Mr A. T. Galenitis, an excellent editor, who strongly supports this. But they do not come very often, and they do not get messages except on their Talk pages. I always check Related changes for el and for grc, and I am sorry to say that there are very few people interested. Perhaps some editors working on very many languages create some exotic lemmata. Thank you very much, I can wait; I know how busy you are. ‑‑Sarri.greek  I 08:53, 18 February 2024 (UTC)
Thank you very much @Sarri.greek for bringing this to my attention and for once again putting in the effort for this very worthwhile change. Indeed, I have been rather inactive lately, but as the creator of many gkm lemmata I am adamant about the need for this split, for reasons which have been repeated multiple times. I would be more than happy to put in the required work for my own lemmata and create more while at it. Regarding the naming, both approaches have some historical value (with varying power of persuasion), yet from a functional point of view it doesn't make much sense to oppose the recent literature and main body of research within the field, where "Medieval Greek" has become dominant (vide the recent monumental Cambridge Grammar of Medieval and Early Modern Greek by Holton et al.). A. T. Galenitis (talk) 17:09, 16 March 2024 (UTC)
I am sorry that I write so schoenotenically; I have difficulty with concision. I'm also sorry that I have taken so long to respond; I have done a lot of research regarding this topic since η Δις Σαρρή first petitioned the Beer parlour for these changes. As you'll see below, whilst I still oppose the change of this chronolect's name, I have come to support its split into a lect with its own L2 header, at least in principle. I feel I should explain my position, especially regarding my “concern…that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here”.
Δις Σαρρή· When you write things like:
I am left with the impression that you want the label “Medieval Greek” to refer only to the relevant period's basilect of the Greek diglossia. If that is your position, what then happens to the acrolect of that period? Does it remain part of Ancient Greek (grc)? And if so, should Katharevousa be treated similarly? Ultimately, is post-Classical Greek to be split primarily by register? Perhaps I've misinterpreted you, but if so, please clarify your position. If this is your position, you should make it explicit, so that everyone knows exactly what's being voted on. Perhaps this is what Fay Freak meant by “ambiguous consequences”. I could support either one, be it a split by period or by register. Here's a litmus test: In what variety of the Greek macrolanguage was the Suda originally written?
What I could not support is a split by period that excludes from Byzantine Greek its higher-register elements. You seem to want to do that when you say “Med.Greek does not have Dative.” and “We do not add Dative at Mod.Greek either”. It is untrue that Byzantine Greek does not have the dative; on the contrary, as Staffan Wahlgren writes, “The most important observation…is that the dative is so surprisingly alive and productive in such a wide range of Byzantine texts.” (Wahlgren 2014: Abstract) Even Holton et al. (2019: II, 241–243), whom I've already quoted at length above, acknowledge that “datives survive in many of the written texts that th[eir] Grammar is based on” and that “[p]articularly common are datives governed by the prepositions ἐν and σύν”, before recording their decision nevertheless to exclude all datives (except ἀλλήλοις) with the single sentence “Because the dative had ceased to be part of the spoken vernacular by about the 10th c., dative forms are not included in the paradigms set out in the chapters that follow.” Blink and you'll miss it! And those datives aren't all just learned preservations; especially noteworthy is the Early Modern Cypriot Greek noun ἐμπιστευτιός (empisteutiós), which is one of the “[w]ords belonging to [a] paradigm [which] have only been found in LMedG and EMG texts from Cyprus. In all cases these words are local variants of masculine words in -τής…. The earliest examples are from Assizes B (15th-c. ms).” (Holton et al. 2019: II, 451), and which has the dative plural form ἐμπιστευτηόδες (empisteutēódes) attested in a sixteenth-century text.
As a general concern, I think you lean on Holton et al. too much: their work has a far more limited scope than is immediately apparent. As Martin Hinterberger writes, despite the recent appearance of the Cambridge Grammar of Medieval and Early Modern Greek, it is not the “comprehensive linguistic description of written Byzantine Greek (in all its multifarious variants) [which] remains one of the desiderata of Byzantine literary studies” (Hinterberger 2021: 21); in my opinion, though not (explicitly) Hinterberger's, Holton et al. have treated the Greek of 1100–1700 “as a degenerated, deficient form of classical Greek, [which they have ignored,] or as an immature form of modern Greek” (Hinterberger 2021: 37). We should not do the same.
I want to end this on a note of praise. I admire the enthusiasm and hard work you pour into this. If I have the effect of applying brakes, please understand that I do so only to ensure clarity prevails and that the best decisions are taken, even if it might not seem that way to you. I notice that you are writing a module to handle the declension of all Greek nouns. I think this is a worthwhile effort, and it has a precedent in Module:zlw-lch-headword. It would certainly be good to have a common theme for all Greek nominal declension, since that would avoid such aesthetically objectionable clashing as currently exists in Λεϊβνίτιος (Leïvnítios). Keep up the good work! 0DF (talk) 01:51, 24 March 2024 (UTC)

Rename to Medieval Greek

  1. Support ‑‑Sarri.greek  I 05:54, 9 February 2024 (UTC)
  2. Support   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)
  3. Oppose - Byzantine is the more common term and no valid argument has been given against it. Nicodene (talk) 08:19, 9 February 2024 (UTC)
    Thank you @Nicodene for your support for this language. Yes, the term Byzantine is extremely common because we have Byzantine studies, Etudes Byzantines at Sorbonne, Byzantine Music, Byzantine Iconography, Byzantine Empire and so on. But I do not recall any language taking its name from an empire e.g. Roman Empire Latin, British Empire English? Is there any example? Mandarin perhaps, as a non-linguistic term? The term was used pre-2000, influenced by the very common 'Byzantine' epithet. Greek linguists also used it, but later preferred the term μεσαιωνικός, 'medieval'. But, thanks anyway. ‑‑Sarri.greek  I 08:38, 9 February 2024 (UTC)
    The actual comparison to *[British Empire English] would be *[Byzantine Empire Greek], which nobody says either. And it'd be strange to argue that British English, British music, and British art are all "named after an empire" just because there was also a British Empire. They're all named after Britain and the British people, just as all the things you mention are named after Byzantium and the Byzantines. Nicodene (talk) 09:08, 9 February 2024 (UTC)
    @Sarri.greek: As Nicodene wrote, Byzantine Greek isn't named for the Byzantine Empire; rather, both are named for the Byzantines, who are named for Byzantium. Languages are usually named for people, places, or polities (and polities are usually named for either of the former). Because of what people and places can be named for, this can result in pretty weird language names. For example, Big Nambas (nmb) and Nez Perce (nez) are named for peoples with the same designation, and those peoples are named for their codpieces and misnamed for the Chinooks' nose piercings, respectively. Toponymically, East, South, and West Bird's Head are named for Bird's Head, a peninsula of Papua that looks, indeed, like a bird's head; I can only assume that Port Sandwich (psw) was named for the Vanuatuan coastal settlement that has since been renamed Lamap; and Western Desert (nine dialect codes) is named for desert areas in western Australia (chiefly Western Australia). Many creoles have strange names. Other language names are odd for etymological reasons; for example, Ukrainian (uk, literally “borderlandese”, although this etymology is disputed) and Zamboanga Chavacano (cbk, literally “poor-taste mooring-place”). And then there are names that are picturesque, like Cœur d’Alêne (crd, literally “heart of awl”), Hill (mrj) and Meadow Mari (mhr), Large (hmd) and Small Flowery Miao (sfm), and Blue (hnj), Green (also hnj), and White Hmong (mww). By comparison, Byzantine Greek is not at all strange or particularly romantic (pun intended).
    I admit I got a bit carried away with the examples there. Sign languages are generally more clearly named for polities; for example, American (ase) and British Sign Language (bfi); compare the more obscure Maritime Sign Language (nsr). Dari (prs and gbz) supposedly derives from Classical Persian دربار (darbār, royal court) and one could argue that Dano-Norwegian is named for the political union Denmark–Norway. However, the language name most unambiguously named for an empire is probably Imperial Aramaic (arc), named for the Neo-Assyrian, Chaldean, and especially Achaemenid Empires. Finally, consider Ashokan Prakrit, which goes one step further by being named for a specific emperor, namely the Mauryan Emperor Ashoka the Great (regnavit circa 268–232 BC). 0DF (talk) 00:34, 7 March 2024 (UTC)
  4. Support {{abstain}} Both names seem about equally common, and I don't really care which one we use. I'm not opposed to either name. Thinking about it some more, I've decided I prefer "Medieval". —Mahāgaja · talk 09:53, 9 February 2024 (UTC)
  5. Support Thadh (talk) 18:32, 15 February 2024 (UTC)
  6. Abstain {{support}} Following the contributions of user 0DF to the discussion, I also see the merit of the term Byzantine Greek. Most importantly, I understand that I require additional reading before coming to a final conclusion. For the time being, abstaining (i.e. agreeing with either terminology to be adopted). A. T. Galenitis (talk) 21:28, 21 March 2024 (UTC)
  7. Oppose To avoid further perceptions of prolixity, I shall be terse:
    Reasons for “Byzantine Greek”:
    1. As I've argued before, the language should be called Byzantine Greek “because its production is inextricably linked to Byzantine civilization” (Hinterberger 2021: 22).
    2. Other things being equal, endonymy is desirable. However, ready apprehensibility by Anglophone readers often supersedes this consideration. The Byzantines usually called themselves Ῥωμαῖοι (Rhōmaîoi, literally Romans), their country Ῥωμανία (Rhōmanía), and their language Ῥωμαϊκή (Rhōmaïkḗ). English Romaic and Rhomaic exist, but I wager they're little-known, and likely to be mistaken as relating to Romani or Romanian. Ancient Greek Ἕλληνες (Héllēnes) exists, but is not specific to the Byzantine period, and “Hellenic Greek” Hellenistic Greek Koine Greek. There's Ancient Greek Γραικοί (Graikoí), but that's used for the macrolanguage “Greek”. There is marginal self-reference by Byzantines to their histories as Βυζαντιακαὶ (Buzantiakaì) and to themselves as Βυζάντιοι (Buzántioi), so “Byzantine Greek” is endonymic. By contrast, no people in the Middle Ages called themselves “Mediaeval” anything.
    3. “Byzantine” is a fairly familiar term to the average educated Anglophone. It is an epithet applied to a great many disciplines, journals, and phenomena pertaining to the empire of that name (v. e.g. [1], [2], [3]), the vast majority of the primary sources for which are written in Byzantine Greek. Cet. par., it is desirable that referents systematically related in such a manner should share a nomenclature. I doubt that those various disciplines would adopt the relatively cumbersome “Mediaeval Greek X” nomenclature to replace the relatively concise “Byzantine X” nomenclature, and it would be ungrammatical to do so in compound modifiers such as Serbo-Byzantine.
    4. The alphabetical and chronological orders of the three chronolects of Greek (that are written in the Greek alphabet) are the same. For any word homographic in the three chronolects — many (most?) consonant-initial ((pro)par)oxytones — this allows one to trace its development from Ancient Greek, through Byzantine Greek, and all the way up to the Greek of the present day by scrolling down the page and reading in order: a boon for comprehension. This serendipity would be lost if Byzantine Greek were renamed Mediaeval Greek.
    Reasons against “Mediaeval Greek”:
    1. Mediaeval means “of or pertaining to the Middle Ages (Latin Medium Aevum)”, but those Middle Ages were not universally significant. Traditionally, the Middle Ages are regarded as beginning in 476 with the fall of the Roman Empire in the West and as ending in 1453 with the fall of the Roman Empire in the East. Linguistically, the former had a considerable impact on Medieval Latin: the dissolution of Roman institutions, radical decentralisation, vernacular drift, development of feudalism, and immigration of unassimilated peoples led to linguistic innovations and borrowing on a massive scale; often regarded as corruptions, various attempts were made to restore Classical Latinity, as in the Carolingian Renaissance, but these saw only partial success until the triumph of humanist Ciceronianism in the Italian Renaissance. Thus, Mediaeval Latin was succeeded by Renaissance Latin and then by New Latin. This makes the epithet “Mediaeval” highly suited to that chronolect of Latin. By contrast, Byzantine Greek saw no such dissolution, decentralisation, or feudalism, at least not until the Fourth Crusade; for Greek, the fall of 1453 was vastly more consequential than the fall of 476 — the opposite was true for Latin. This makes the epithet “Mediaeval” highly unsuited to that chronolect of Greek. For more, see Kaldellis 2019: ch. 4 (“Byzantium Was Not Medieval”), pp. 75–92.
    2. The adjective has four justifiable spellings: mediaeval, medieval, mediæval, mediëval. Byzantine has only one. Cet. par., that a term's spelling be uncontested is desirable.
    3. The English Wikipedia has three articles entitled “Medieval X” for languages (Medieval Greek, Hebrew [4th–19th CC.!], and Latin); in other articles I saw, they give Medieval Catalan as a synonym of Old Catalan, Medieval Spanish and Old Castilian as synonyms of Old Spanish, and for Galician–Portuguese they give the five synonyms Medieval Galician, Medieval Portuguese, Old Galician, Old Galician–Portuguese, and Old Portuguese. That would give the impression that, in language names, medieval and old are synonymous; not so Medieval Greek, which has the synonym Middle Greek (alongside Byzantine Greek and Romaic). Middle and Old are much more common as chronolect descriptors than Medieval (CAT:en:Languages has 2 members named “Medieval X”, 25 named “Middle X”, and 64 named “Old X”). AFAIK, no one calls Byzantine Greek “Old Greek”. IMO, “Middle X” only really works for languages with a threefold chronolectal division designated “Old–Middle–New X” or “Old–Middle– X”. Greek, however, has a four- or even six-fold division — Mycenaean–Ancient–Byzantine–Modern or Mycenaean–Homeric–Classical–Koine–Byzantine–Modern — one would be hard-pressed, especially in the latter, to describe the Byzantine chronolect as being in the “Middle”.
    4. Pace Κ. Α. Τ. Γαληνίτη, it is not at all apparent that the term “‘Medieval Greek’ has become dominant”, and contra Holton et al., here are uses of Byzantine Greek from three authors, with many more available. The ISO received three proposals in 2006–2009 to create new codes for Medieval Greek gkm, Ecclesiastical Greek ecg, and Katharevousa Greek elr; last year, the ISO rejected them all, partly due to “the lack of consensus among them” (p. 2). It is noteworthy that § 4 of the original change request for Medieval Greek gkm gave the language's name as “Middle Greek” and said of it that “[t]he language is distinct from Ancient Greek in vocabulary, phonology, and grammar, and displays linguistic attributes which are characteristically Byzantine and uncharacteristic of Ancient Greek” [my emphasis], whereas the first page of the request for the new language code element gkm gave, as the reason for preferring the name “Middle Greek” over the autonym “Romaiki” and the alternative names “Byzantine Greek” and “Medieval Greek”, that “Middle Greek” was the “[m]ost common amongst scholars” (!); it's only because Anastassia Loukina emailed SIL International to write that “the more common term used in Greek linguistics to refer to this stage of Greek is ‘Medieval Greek’ rather than ‘Middle Greek’” that the proposal was changed (by the ISO?) to one for “Medieval Greek”, although Δις Loukina merely asserted her claim, not citing anything. Is there any real evidence that any one term predominates?
    Alas! So much for avoiding prolixity…
    @A. T. Galenitis, Benwing2, Erutuon, Fay Freak, Mahagaja, Nicodene, Saltmarsh, Sarri.greek, -sche, Thadh, Theknightwho: For those of you who have voted or who intend to vote, I humbly request that you consider what I've written. For those of you not voting, I ping you in case you're interested and because you've taken part in this discussion before. To all of you, I apologise for the length of this post; I seem not to be very good at brevity. 0DF (talk) 07:37, 20 March 2024 (UTC)
    I've read all you wrote above but am not convinced by it, certainly not enough to change my vote. Points 2 and 4 pro Byzantine strike me as irrelevant, and point 3 sounds like it could equally be an argument to use the term "Anglo-Saxon" instead of "Old English", which I trust no one in this day and age still wants to do. None of the arguments contra Medieval strike me as particularly strong. —Mahāgaja · talk 07:56, 20 March 2024 (UTC)
    And what argument for 'medieval' struck you as strong? Nicodene (talk) 08:27, 20 March 2024 (UTC)
    I think somewhere in this discussion or an earlier one I said I prefer "medieval" because it makes it clear that the lect in question is a chronolect, not a regiolect. —Mahāgaja · talk 08:37, 20 March 2024 (UTC)
    Wut, even if Greek writing is located far away in Arabia or Ethiopia, I still call it Byzantine Greek provided it matches the period. Fay Freak (talk) 11:23, 20 March 2024 (UTC)
    Right, but calling it Medieval Greek makes it clearer that what's relevant is the time period, not the location. —Mahāgaja · talk 11:39, 20 March 2024 (UTC)
    The case can be made that 'Medieval' is chronologically explicit, but it is simply unimaginable that anyone could know the term Byzantine yet mistake Byzantine Greek for a regional label. Nicodene (talk) 11:59, 20 March 2024 (UTC)
    I don't find that unimaginable at all. It's certainly more plausible than someone thinking Byzantine Greek referred to overly complex or intricate Greek, but we can't entirely rule that interpretation out either. —Mahāgaja · talk 12:56, 20 March 2024 (UTC)
    It would require someone who knows about the city of Byzantium and yet is unaware of the existence of the Byzantine Empire, in other words a person that does not exist. As for the other potential sense of ‘Byzantine’, that is simply not an argument as it applies just as well to someone mistaking ‘medieval Greek’ as referring to a brutal or savage dialect. Nicodene (talk) 13:16, 20 March 2024 (UTC)
    Was Byzantine Greek also used outside the borders of the Empire? —Mahāgaja · talk 13:32, 20 March 2024 (UTC)
    Certainly, as it doesn't have to do with borders either.
    If anyone has ever actually used ‘Byzantine Greek’ to distinguish one variety of Greek from another based on region or geopolitical control I've yet to see any sign of it. Nicodene (talk) 13:59, 20 March 2024 (UTC)
    So the language in question is used outside of the geographical area denoted by "Byzantine" but not outside of the chronological era denoted by "Medieval". That's why I prefer to call it Medieval Greek. —Mahāgaja · talk 14:11, 20 March 2024 (UTC)
    ‘Byzantine’ is not a geographical area.
    The one, and only, valid point in this is as stated above - that ‘Medieval’ is more chronologically transparent. Nicodene (talk) 14:21, 20 March 2024 (UTC)
    @Mahāgaja: Thank you for reading my rather overlong post. Responding to your points:
    1. Do you regard point 2 pro Byzantine as irrelevant because you disagree with the statement “other things being equal, endonymy is desirable”? If so, I understand you, since that statement is my axiom for that point. Otherwise, I would appreciate a rationale.
    2. I don't see how you could call point 4 pro Byzantine irrelevant for this project. In a dictionary of Byzantine Greek only, it indeed would be irrelevant, but since that's not what Wiktionary is, it's simply an error to call that point “irrelevant”.
    3. AFAICT, “Anglo-Saxon” — itself a compound modifier — is on all fours with “Old English” in terms of its suitability for forming compound modifiers. That seems like a disanalogy to me.
    4. Whereas “mediaeval” is traditionally clear vis-à-vis period (viꝫ 476–1453), a lot of usage muddies the waters. Jacques Le Goff throughout his career (or at least from 1977 onward) sought to extend the Middle Ages into “the eighteenth century, when, he believe[d], the European nation-states properly emerged” (Kaldellis 2019: ch. 4, p. 77). And conversely, some scholars of chronologically preceding and succeeding fields annex parts of the Middle Ages to their own periods: “The field of ‘late antiquity’ has been pushed by some to the early Carolingians (i.e., to the ninth century), whereas at the other end some historians of early modernity have reached back to claim everything after the twelfth century, when the European economy embarked upon a trajectory that would arc to modernity. With late antique and early modern historians claiming so much territory, that leaves only a rump Middle Ages squeezed around the turn of the millennium. [¶] Byzantium has little standing or stake in this debate.” (ibidem: pp. 77–78)
    0DF (talk) 15:27, 20 March 2024 (UTC)
    I do disagree with the statement "other things being equal, endonymy is desirable". At Wiktionary, as at Wikipedia, what matters is what a language is commonly known as in English, not what its native name is. That's why we call German German, not Deutsch, and Dutch Dutch, not Nederlands. And no ancient language was known to its speakers with modifiers like "Old", "Ancient", "Classical", "Primitive" and so forth. And you yourself point out that Greek speakers of the era under discussion generally referred to their languages as (the Greek equivalent of) Romaic; but absolutely no one here is suggesting that Wiktionary's canonical name for this language should be Romaic. So that point is actually not an argument in favor of Byzantine at all; it's an argument against both Byzantine and Medieval. Point 4 is irrelevant because that's simply not a consideration we have ever had or ever should have. The names "Old Irish", "Middle Irish" and "Irish" are in reverse alphabetical order; so what? —Mahāgaja · talk 15:57, 20 March 2024 (UTC)
    @Mahagaja: Re “what matters is what a language is commonly known as in English”, I already wrote that “ready apprehensibility by Anglophone readers often supersedes th[e endonymy] consideration”, so we don't disagree on the overriding importance of that. However, given a choice between two English names identical in their recognisability (which is an instance of that “other things being equal” qualifier), would you really maintain that endonymy wouldn’t even be a consideration to break the tie? That's not a strictly irrational position, but I would be surprised if you held it. Anyway, with regard to Romaic, Byzantine, and Mediaeval, my point is that Romaic would be best in terms of endonymy, but its obscurity disqualifies it; whereas Byzantine and Mediaeval are comparably familiar to educated Anglophones, so Byzantine’s endonymy can break that tie. Is my position on this point any clearer now? That “Point 4” is nothing other than a consideration about page layouts which has some bearing on this issue; I'm not saying that it's a be-all and end-all, just that it's a relevant consideration, even if other considerations are primary. 0DF (talk) 00:09, 21 March 2024 (UTC)

Split from Ancient Greek

  1. Support, as creator of this proposal ‑‑Sarri.greek  I 05:54, 9 February 2024 (UTC)
  2. Support   — Saltmarsh🢃 06:26, 9 February 2024 (UTC)
    Thank you @Saltmarsh, my guru, mentor and administrator at Modern Greek! I promise to work as you have taught me. ‑‑Sarri.greek  I 06:33, 9 February 2024 (UTC)
  3. Support Nicodene (talk) 08:13, 9 February 2024 (UTC)
  4. Support Mahāgaja · talk 08:26, 9 February 2024 (UTC)
  5. Support Thadh (talk) 18:32, 15 February 2024 (UTC)
  6. Support A. T. Galenitis (talk) 16:46, 16 March 2024 (UTC)
  7. Support in principle — I am concerned, however, that η Δις Κατερίνα Σαρρή has a different understanding of what this vote endorses from the understanding of the other voters here. 0DF (talk) 07:46, 20 March 2024 (UTC)
    See § phase 1 (above) for an explanation of this comment. 0DF (talk) 01:55, 24 March 2024 (UTC)

?


Happy month: καλό μήνα (kaló mína), @Benwing2, Mahagaja and everyone! Are we still on hold? I would like so much to come back, but how? having to write {m|gkm|xxx} all the time in pages with Ancient title... for example, @παπᾶς. I need: a month to review what exists. A year to do some labels for Learned Medieval (=archaisms and Hellenistic style), for Early Modern Greek (with medievalisms), some ready-to-fill-in inflection tables, some reference templates etc. I cannot even start without a code. Thank you. ‑‑Sarri.greek  I 17:00, 1 March 2024 (UTC)

@Sarri.greek: I'm working on responses. Sorry for the delay. Please bear with me. 0DF (talk) 02:06, 2 March 2024 (UTC)
Oh, M @0DF. What do you mean 'working on responses'? Please do not flood this page? We understand you are against. I shall make a special workpage-plan for MedGr once it is allowed. And with a talk page, and sections for every subject about it, where you can write as long texts as you like. Thank you. ‑‑Sarri.greek  I 06:04, 2 March 2024 (UTC)
@Sarri.greek It looks like we have consensus for both changes, esp. for the split: 6-0 plus one undecided (User:0DF) for the split, 5-2 for the rename (User:Nicodene and User:0DF opposing). User:0DF, you never gave a response concerning the rename. Do you have anything you'd like to register (e.g. concerns, alternative suggestions, etc.)? Keep in mind that renames are easier to do than splits, so if for some reason it's decided in the future to undo the rename or switch to a third term, it wouldn't be such a big deal. Benwing2 (talk) 01:46, 17 March 2024 (UTC)
Thank you all, thank you M @Benwing2! Great Sunday! I'm ready to start work! and will be checking the changes. I have prepared a trial-User:Sarri.greek/About Medieval Greek (in the pattern of WT:About Ancient Greek), a trial Template:User:Sarri.greek/gkm-IPA which needs to 'show' visibility, and more. Proposals and suggestions for the first-time-presentation of MedGr are welcome and needed from everyone, especially the administrators of Ancient and Modern Greek. e.g. at User About's Talkpage (or open an extra page?, please tell me, Sir, and everyone.) Thank you. ‑‑Sarri.greek  I
I don't have a ton to add to this discussion, any work to offer, or any great expertise - but in terms of periodizing, I wonder if it would also make sense to periodize Koine or Classical as separate from Ancient (in the sense that I guess Ancient Greek sort of goes until 300 BC, and Koine/Classical goes until whenever we consider Byzantine/Medieval to start). My main thought here is that beyond new vocabulary borrowings replacing other vocabulary, or changes in grammatical forms or pronunciations, it is my extremely amateur perception that meanings, of some words at least, gradually shifted over the Ancient->Classical->[Byzantine to Medieval]->Modern period, especially as a result of Christianization. Or possibly that most attested pre-medieval Greek texts are Classical rather than Ancient texts. I also think it's fine to call it Medieval Greek, that seems to be what English Wikipedia uses anyway. Anyway, that's my very late 2 cents to add to this discussion. -Furicorn (talk) 09:57, 31 August 2024 (UTC)
@Furicorn: Thank you for your contribution. Currently, Ancient Greek is all Greek prior to 1453 except for that written in Linear B (which is Mycenaean Greek). Classical Greek and Koine Greek are not synonyms. The core of Classical Greek is the Attic Greek of the 5th century BC. Koine Greek is the form of Greek that developed as a consequence of the language's spread by the empire of Alexander the Great. It would certainly be possible to split Greek five ways — Mycenaean–Ancient/Classical–Koine–Byzantine/Mediaeval–Modern — but I expect that would result in a lot of redundancy, and I'm not sure it would be worth it. In reality, there is more difference between Homeric Greek and the rest of what we currently call Ancient Greek than there is between Classical Greek and Byzantine Greek, but that is not the split that was originally proposed in this discussion. As to what to call this chronolect, “that seems to be what English Wikipedia uses” is not a very strong argument unless you can tell us why it uses that name. 0DF (talk) 19:17, 12 September 2024 (UTC)

Continuation (originally on Sarri's talk page)


(moved from User_talk:Sarri.greek/2024#The_old_discussion_in_its_new_place)

Hello, Sarri. It was good of you to create that updated signpost. How is your health nowadays? Do you feel up to answering those questions I posed you at WT:LTR#Medieval Greek from Ancient Greek yet? No pressure if not; I just thought I'd check. All the best. 0DF (talk) 05:00, 27 September 2024 (UTC)

Hello M @0DF, thank you for your interest. Healthwise, I am under therapies (sometimes very hectic). I apologise, that I cannot remember your questions. I am typing with difficulty and I cannot participate in discussions that are too long.
If they rename 'Byzantine' to 'Medieval' or 'Mediaeval Greek', I will be able to check all occurrences. If they split Medieval Greek from Ancient Greek, I will slowly edit the not so many pages involved, marking the too many unmarked Koine entries too.
Attn @Benwing2, Chuck Entz as linguists and bureaucrats: I believe I will have the time to do it. It is simple: There IS a mediaeval period for Greek (grk) (working code everywhere: gkm, or more 'officially' proposed here as grk-gkm). Please include it in en.wiktionary, filling a gap of some 10 centuries in grk's c. 3,000+ years of history. I might make a few very simple templates when necessary.
I will be happy to answer questions here; excuse my short answers. Thank you ‑‑Sarri.greek  I 13:28, 27 September 2024 (UTC)
@Sarri.greek: I'm sorry to hear it's still rough for you. The questions are all in WT:LTR#Medieval Greek from Ancient Greek, but that became quite a long discussion by the end. It doesn't seem very ethical to subject you to more questioning in your current condition. Since renaming the chronolect was and is a matter of some contention, what's stopping you from being “able to check all occurrences” of Byzantine [Greek]? 0DF (talk) 01:16, 28 September 2024 (UTC)
@0DF, thank you. gkm automatically gives 'Byzantine'. I have already cleaned up all old manual edits for the language. If administrators that are professional linguists do not prefer the title 'Medieval' to 'Byzantine', there is nothing I can do. ‑‑Sarri.greek  I 06:00, 28 September 2024 (UTC)

@0DF @Sarri.greek I don't think either of you is going to change their mind with further discussion, so I don't think more questions are in order. Given that there is a (bare) supermajority of 4-2 (pro: @Sarri.greek @Saltmarsh @Thadh @Mahagaja; con: @Nicodene; @0DF) with one abstention, I am inclined to go ahead with the rename of Byzantine -> Medieval. More importantly, there is strong support for splitting Medieval/Byzantine out of Ancient Greek, and I don't want this name dispute to be a blocking issue. If it turns out that later on we decide to go back to the name Byzantine, that is not hard to do and I can do it by bot (I've done plenty of such renames before). @Sarri.greek Please be aware that logistically, splitting out gkm into its own L2 language requires adopting a temporary code for either the new L2 language or the old etym-only language while both are coexisting, until everything is moved. My inclination is actually to do the following:
  1. Set up tracking for both the gkm and newly adopted grc-gkm codes.
  2. Rename the etym-only code gkm -> grc-gkm by bot. Leave its name as "Byzantine Greek".
  3. Remove the etym-only tracking for gkm once there are no more references. Leave the tracking for grc-gkm; this will make Sarri's job easier below.
  4. Create a new L2 language gkm named "Medieval Greek". (Having them have different names is fortuitous as it will avoid some complaints about duplicate language names.)
  5. Sarri, over time, moves the relevant entries to the ==Medieval Greek== header and cleans up any existing references to grc-gkm.
  6. When all references to grc-gkm are gone, we can remove this etym-only code.
Also pinging @Theknightwho and @Surjection (who have been involved in prior language splits) for any technical comments. Benwing2 (talk) 07:12, 28 September 2024 (UTC)
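For readers following the logistics, the six steps above can be sketched as a toy simulation. This is only an illustrative Python model, not Wiktionary's actual module or bot code: the dictionaries, the `references` list, and the function name `migrate` are all hypothetical, and step 5 (which the plan leaves to manual editing over time) is collapsed into a single pass.

```python
def migrate(references):
    """Toy walk-through of the six-step plan.

    `references` is a hypothetical list of language codes as they occur
    in entries; every "gkm" in the input is an etym-only use.
    """
    etym_only = {"gkm": "Byzantine Greek"}  # etymology-only codes -> names
    l2 = {}                                 # full (L2) language codes -> names

    # Step 1: track both the old code and the temporary one.
    tracked = {"gkm", "grc-gkm"}

    # Step 2: rename the etym-only code gkm -> grc-gkm, keeping its name.
    etym_only["grc-gkm"] = etym_only.pop("gkm")
    references = ["grc-gkm" if code == "gkm" else code for code in references]

    # Step 3: once nothing refers to etym-only gkm, stop tracking it.
    assert "gkm" not in references
    tracked.discard("gkm")

    # Step 4: the code gkm is now free to be reused for a new L2 language.
    l2["gkm"] = "Medieval Greek"

    # Step 5 (manual in the plan, simplified here): entries move to the new
    # L2 language and remaining grc-gkm references are cleaned up.
    references = ["gkm" if code == "grc-gkm" else code for code in references]

    # Step 6: with no references left, retire the temporary code.
    if "grc-gkm" not in references:
        del etym_only["grc-gkm"]
        tracked.discard("grc-gkm")

    return etym_only, l2, tracked, references
```

The point the sketch tries to bring out is the ordering constraint: the code gkm cannot serve as an etym-only code and an L2 code at once, so the temporary code grc-gkm has to absorb the old references before gkm is re-created.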
M @Benwing2, thank you so much for your reply and thorough plan! I can see how busy you are, dealing with so many languages. I appreciate your work, and your decision; a true gift to grk but also to me, personally. Please note that admin @Mahagaja has proposed the official code grk-gkm, not grc-gkm, as it is a period of the Hellenic language (grk). I'll follow your edits closely and will do my best, a bit slowly, but diligently; I shall rename Categories, update wikidata and do all the work where administrators need not be bothered. I search with insource: and intitle:. I might make a little label to produce: Late Medieval or Early Modern Greek +cat, if I encounter words of 1500, 1600. High register (with datives etc, similar to Koine), can be covered by {lb|gkm|learned}. Thank you, thank you. ‑‑Sarri.greek  I 08:31, 28 September 2024 (UTC)
@Sarri.greek I'm not sure the context behind grk-gkm but grc-gkm is a temporary label used because it refers to a variety of grc. The temporary label will go away once everything is converted to the L2 language gkm. Benwing2 (talk) 08:40, 28 September 2024 (UTC)
@Sarri.greek with apologies - this is really beyond my "pay grade" and terra incognita to me. I rarely venture there. To you personally Sarri - I have friends who have been through similar tribulations and worries, my best wishes.   — Saltmarsh 17:48, 28 September 2024 (UTC)
@Benwing2: I'll agree that no one objects to splitting Byzantine Greek out of Ancient Greek, and that the split should go ahead; however, I don't see how “this name dispute [is] a blocking issue”. Why can't the split take place without changing the name? Unfortunately, I think the original discussion suffered for its obscure location in WT:RFM (now moved to the even more specialised WT:LTR); perhaps there would have been greater engagement had it taken place in WT:BP, where Sarri.greek initially posted about it. To remedy this, shall I draft a vote about the naming issue?
AFAICT, most of what needs to be done on “the front end” is to edit the 289 member-entries of Category:Byzantine Greek to rename, split out, or duplicate their contents as appropriate; of its member-subcategories, only Category:Byzantine surnames needs (presumably) to be renamed Category:Byzantine Greek surnames. As for the changes on “the back end”, I don't really understand why that six-stage process is necessary. Would it not be sufficient to make {{lb|grc|Byzantine}} (and its aliases) categorise into the temporary topic category Category:grc:Byzantine Greek until all the relevant entries are edited to use the new L2 header? I apologise if that is a naïve question. 0DF (talk) 09:14, 30 September 2024 (UTC)
M @0DF. They are not Byzantine. Not necessarily. I would edit under such a title only for historical, artistic fields, probably at wikipedias. Thank you. ‑‑Sarri.greek  I 09:29, 30 September 2024 (UTC)
@Sarri.greek: Sorry, what aren't Byzantine? 0DF (talk) 09:39, 30 September 2024 (UTC)
@Sarri.greek: Do you mean the surnames? Are they, rather, Koine? Shall I recategorise them? 0DF (talk) 14:14, 30 September 2024 (UTC)
M @0DF, aimez-vous les byzantinismes? ‑‑Sarri.greek  I 16:44, 30 September 2024 (UTC)
@Sarri.greek: I'm sure that's very witty, and perhaps I should respond « Pas du tout ! » or something; I'm also sure you're better acquainted with French literature than I am. But to interpret you literally, rather than literarily, no, I don't tend to adopt overcomplicated solutions, and I fail to see how my technically naïve suggestion to Benwing is in any way more complicated than the plan he laid out. 0DF (talk) 17:40, 30 September 2024 (UTC)
@0DF I don't honestly see the need to relitigate this with a formal vote. WT:RFM is the normal place where language moves and splits used to happen (and now WT:LTR). AFAIK everyone has been pinged and had a chance to comment, and the process requesting yes/no votes has been open far longer than a standard vote. However, I will defer to what User:-sche says, who has been the overall person shepherding language moves through; do you think a formal vote is needed? As for the plan I suggested, yes this is necessary because of the existence of the current gkm code; the labels are not the only place that Byzantine/Medieval Greek is being referred to. Benwing2 (talk) 19:08, 30 September 2024 (UTC)
@Benwing2: I figured I'd wait a while before replying, with a view to letting things cool off and in the hope that Ms Sarri might acquiesce to my supplication for a rationale, but no such luck. If this were a līs, the prosecution would be expected at least to make a case. That has yet to happen; consequently, a judgment notwithstanding the verdict is appropriate. There has also been canvassing (see Special:Diff/78033207, Special:Diff/78033252); those are grounds for a “retrial”, surely. We might expect a safer finding from a superior court. In the meantime, I reiterate my suggestion that we make the split without changing the name, since there are no objections to doing that. 0DF (talk) 00:53, 21 October 2024 (UTC)
Plan for Medieval Greek

Dear M @Benwing2, I keep checking Watchlists and your contributions, awaiting your #plan for Medieval Greek (WT:LTR#Medieval Greek from Ancient Greek). I know how busy you are with more important languages. But I am available for Greek and waiting... Long hours in front of the computer... Thank you ‑‑Sarri.greek  I 20:02, 18 October 2024 (UTC)Reply

@Sarri.greek: I ask again that you present some actual reasons for this proposed name-change, especially since you said in February that you “wouldn't mind terribly either” name. What's changed? 0DF (talk) 00:53, 21 October 2024 (UTC)Reply
[by Sarri.greek: Please excuse a short recap of a long discussion.] The essence of my proposal (as at Jan2024petition & sources for documenting our lemmata in March2023) was to humbly inform the community of editors of the English Wiktionary of the newest developments on Medieval Greek language studies, defined now as a Medieval period of a language instead of "Byzantine" language as we often have been seeing at chairs of "Byzantine Studies" in universities all over the West. This was not MY opinion, but the opinion of professors like w:David Holton, w:Geoffrey Horrocks and many others. I humbly asked the linguists of en.wikt to take a look at their introduction at T:Cambridge Grammar of Medieval and Early Modern Greek
p.xvii … "as Greek scholarship was relatively slow to catch up with the advances made in textual criticism and editorial practice for medieval texts in other major European languages. Over the past thirty or so years much has changed in relation to the situation described above"
And they conclude at p.xix "For this reason we do not employ the term “Byzantine Greek”: for almost the whole of the period that we are concerned with, a substantial part of the Greek-speaking world was not “Byzantine” in a political sense. Our criteria are instead internal ones, based on clusters of important linguistic changes that we see as occurring around 1100, 1500 and 1700"
Whether there are linguists specialising in Greek (ancient, medieval or modern) who object, I did not hear any references from editors who object to the name "Medieval". The vote for naming languages is not about how one feels about it, but to give a chance to all the community to bring in references and enrich the information available.
Who is the linguist that opposes to the term "Medieval Greek" or the existence of it as a distinct period? ‑‑Sarri.greek  I 20:17, 21 October 2024 (UTC)Reply
@Sarri.greek @0DF We're looking for consensus rather than litigating something in a court of law, so I am reluctant to do something like a judgment notwithstanding the verdict, which would involve overturning the consensus. I asked on Discord in the #hellenic channel and several additional people expressed support to one degree or another for the term "Medieval" and none for "Byzantine", so I am going to go ahead with the rename. Keep in mind that consensus does not have to be (and often is not) completely unanimous, and that renames are relatively easy to undo if for some reason this needs to happen. At the same time, however, a few people on Discord expressed strong reservations about the split and also said the amount of work required to effect the split might be a lot more than we think. Combined with the fact that in my experience, merges are even harder than splits, suggests to me that we should go slow in putting a split into practice. Let's first effect the rename, get the kinks worked out, and only then revisit the issue and look more deeply into what the split will involve. Benwing2 (talk) 20:38, 21 October 2024 (UTC)Reply
No problem with "slow", thank you M @Benwing2. Who are the (contemporary) linguists referred to, who think that it is not a distinct period? ‑‑Sarri.greek  I 20:55, 21 October 2024 (UTC)Reply
@Sarri.greek I'll let them reveal themselves if they want; they said they didn't want to participate in this discussion to avoid causing further upset and distress. Benwing2 (talk) 01:02, 22 October 2024 (UTC)Reply
@Sarri.greek, Benwing2: My apologies for taking a while to reply here. I am exceptionally busy in real life ATM, so haven't had the combination of time and lucidity to respond until now.
@Sarri.greek: In response to your question (“Who is the linguist that opposes to the term ‘Medieval Greek’…?”): I already cited above the Byzantinist professor Αντώνιος Καλδέλλης (Antónios Kaldéllis), whose 2019 book, Byzantium Unbound, has a chapter explicitly entitled “Byzantium Was Not Medieval”, which I excerpted in Citations:medieval. (Kaldellis is an Athens-born Greek, which I mention because it seems to matter to you.) The gist of his argument is that the Greek world didn't undergo the Middle Ages (the approximate millennium of notional benightedness intermediate between the dissolution of Classical civilisation and its Renaissance – literally “rebirth”), so the adjective “mediaeval” is improper when applied to it. Really, the closest analogue to the Middle Ages in the Greek world was the Τουρκοκρατία (Tourkokratía) of 1453–1821. Sure, you can call 1821 Greece's “delay[ed R]enaissance”, though only if you call 1453 its “delayed Fall”, but then it becomes clear that Latin Europe's Middle Ages and the Greek world's “Middle Ages” have basically nothing to do with each other. I have more to say, but I'll leave it there to keep it short.
@Benwing2: I applied the “court of law” analogy because you used the term relitigate, that's all. Still, I think it was illuminating: I'm sure you won't deny that this whole proposal is ill-conceived and ill-pursued. I do find it galling that Ms Sarri can so grossly mischaracterise the foregoing discussion with statements like “I did not hear any references from editors who object to the name ‘Medieval’” having, by dint of her sheer obstinacy, eventually found a forum that will wave her proposal through; independently of the merits of her proposals, that kind of evasive and dishonest behaviour should not be rewarded. Finally, you wrote that you are “reluctant to do something like a judgment notwithstanding the verdict, which would involve overturning the consensus,” before immediately invoking a conversation on Discord in order to overturn the consensus! How is such a double standard justifiable? But as a general principle, there is no way that “people on Discord said” should carry such weight, least of all when people on-wiki have not been made privy to the text of the discussion and those who opined on Discord decline to “own” their comments in the publicly-accessible on-wiki record. Wiktionary:Discord server neither permits nor prohibits such trust-me-bro invocations, but w:Wikipedia:Discord#Consensus clearly states that the relevant part of Wikipedia's policy on consensus (“Consensus is reached through on-wiki discussion or by editing. Discussions elsewhere are not taken into account. In some cases, such off-wiki communication may generate suspicion and mistrust.”) applies. We should have the same regulation, and judging by the comments by AG202, Mnemosientje, Sgconlaw, CitationsFreak, and koavf in Wiktionary:Beer parlour/2024/October#change in color to nyms, usex, affixusex, such a regulation has considerable community support.
0DF (talk) 17:56, 3 November 2024 (UTC)Reply

Solombala English


Howdy folks! Am wondering if it may be a good or a bad idea to add a new language code for Solombala English, which is a very sparsely attested pidgin that has some features in common with Russenorsk. It has only 20 known words, and two of them were obviously misunderstood by the later translators (but can be seen in the original sources). All the words, as far as I know, are presented here: w:ru:Соломбальский английский язык (I added some commentary and sources there as well, but a long time ago). The main reason for my request is that Solombala may be useful in the etymology of some Russenorsk words. Tollef Salemann (talk) 17:43, 9 February 2024 (UTC)Reply

Support. Theknightwho (talk) 08:54, 25 February 2024 (UTC)Reply

Created as crp-slb, since this has been open for a couple of weeks, and no-one else seems to have much to say. @Tollef Salemann.

I have given it the Cyrillic and Latin script codes because, having checked, the original 1849 source uses (pre-reform) Russian Cyrillic, but modern sources seem to prefer a Latin transcription exclusively: e.g. "vat ju vanted, asej!" is actually "ватъ ю вантетъ, асей!" in the 1849 source (pp. 406-7); note that вантетъ (vantet) has been transcribed as vanted, for instance. I can't find the 1867 source referred to, but I assume it's also in Cyrillic.

Please let me know if you think we should be handling the scripts in a different way, though. Theknightwho (talk) 09:30, 25 February 2024 (UTC)Reply

Thank you! There is also "my" instead of "tu". This was a mistake by Broch, I guess, and it seems like I'm the only one who noticed it. There is also a funny story with his translation of "milek", because it was used in some adult context. As far as I remember, there is no original Latin-script Solombala, but I'm going to check through all the sources first to be sure. The 1867 source took me a while to find last year, but I remember it wasn't impossible. Tollef Salemann (talk) 11:07, 25 February 2024 (UTC)Reply
@Tollef Salemann Alrighty - let me know if you think we should remove Latn. I should have also said that I've also set it to use Russian transliteration, for obvious reasons. Theknightwho (talk) 03:04, 27 February 2024 (UTC)Reply


Converting Min Nan into a family


Currently, we classify Min Nan (nan) as a language, despite it being a family of several Chinese lects. Because of this, the way we treat those lects is arbitrary and inconsistent.

  • Hokkien and Hainanese are both classified as etymology-only languages, despite Hokkien covering several major (dia)lects in its own right, and it being very common for entries to have a large number of Hokkien readings. It's not currently possible to add Hainanese to {{zh-pron}}, but it's also on the roadmap. In terms of how they are used, nothing distinguishes them from how we handle any of the full languages under the Chinese header, so there's no reason to classify them like this.
  • On the other hand, Teochew and Leizhou Min are classed as full languages, but they both have Min Nan set as their "ancestor", which is nonsense. I assume this was done so that the family tree looked right (see Category:Old Chinese language), but this has clearly happened because editors think of Min Nan as a family, not a singular language.

Currently, there is a pending request at the ISO in order to split Min Nan into a macrolanguage (though I won't address those which we don't currently have codes for, since that discussion is for another time).

  1. nan should be converted to a family code.
  2. Hainanese (nan-hai) should be converted to a full language.
  3. Hainanese, Hokkien (nan-hok), Leizhou Min (zhx-lui) and Teochew (zhx-teo) should be on the immediate level below.
  4. Given the large number of entries with numerous Hokkien readings, there are two options:
    1. Convert Hokkien to a full language, with Quanzhou, Zhangzhou and Xiamen etymology-only languages, possibly with the addition of Taiwanese Hokkien.
    2. Convert Hokkien to a family, and have Quanzhou Hokkien, Zhangzhou Hokkien and Xiamen Hokkien as full languages on the level below. I have no opinion on whether Taiwanese Hokkien (which is split out in the ISO proposal) should be treated separately if we do this.

Theknightwho (talk) 13:07, 17 February 2024 (UTC)Reply
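For readers who find a data structure easier to follow than prose, the hierarchy proposed in points 1-3, together with option 1 of point 4, can be pictured roughly as below. This is only an illustrative Python sketch, not Wiktionary's actual module data format; the keys `x-quanzhou`, `x-zhangzhou` and `x-xiamen` are hypothetical placeholders, since the proposal does not give codes for those varieties.

```python
# Illustrative sketch only: NOT Wiktionary's real data format.
# The "x-..." keys are hypothetical placeholders, not real language codes.
PROPOSED = {
    "nan": {"name": "Min Nan", "kind": "family", "parent": None},
    "nan-hai": {"name": "Hainanese", "kind": "language", "parent": "nan"},
    "nan-hok": {"name": "Hokkien", "kind": "language", "parent": "nan"},
    "zhx-lui": {"name": "Leizhou Min", "kind": "language", "parent": "nan"},
    "zhx-teo": {"name": "Teochew", "kind": "language", "parent": "nan"},
    "x-quanzhou": {"name": "Quanzhou Hokkien", "kind": "etymology-only", "parent": "nan-hok"},
    "x-zhangzhou": {"name": "Zhangzhou Hokkien", "kind": "etymology-only", "parent": "nan-hok"},
    "x-xiamen": {"name": "Xiamen Hokkien", "kind": "etymology-only", "parent": "nan-hok"},
}


def full_language_of(code):
    """Walk up from an etymology-only variety to the full language it belongs to."""
    entry = PROPOSED[code]
    while entry["kind"] == "etymology-only":
        code = entry["parent"]
        entry = PROPOSED[code]
    if entry["kind"] != "language":
        raise ValueError(f"{code} is a {entry['kind']}, not a language")
    return code


def children_of(code):
    """Return the codes placed on the immediate level below the given code."""
    return sorted(c for c, e in PROPOSED.items() if e["parent"] == code)
```

Under option 2 of point 4, Hokkien would instead have the kind "family" and the three varieties below it would have the kind "language".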

Support the first three bullet points, but Weak oppose on the fourth:
  • a potential slippery slope: Singapore, Penang, Longyan, etc. could warrant full languagehood if ZXQ and Taiwan are split
  • treatment of the above would be ambiguous due to the nature of Hokkien potentially not being monophyletic and the fact that eg. Taiwanese can’t really be called “a dialect of Amoynese” despite their shared transitionary nature
  • to draw a parallel with Northern Wu: Shanghainese and Suzhounese, neither of which is a full language, occupy a very similar genealogical level to ZXQ, though on the current trajectory they will not be gaining full languagehood any time soon
Just my two cents — 義順 (talk) 02:57, 18 February 2024 (UTC)Reply
@ND381 Just to be clear, does that mean you support option 1 of point 4? Theknightwho (talk) 14:42, 18 February 2024 (UTC)Reply
ah yeah I misread what that said — yes, I would be in support of option 1 of the fourth point — 義順 (talk) 16:38, 18 February 2024 (UTC)Reply
@ND381 What do you mean by "transitionary"? (talk) 11:31, 28 February 2024 (UTC)Reply
I don't particularly know too much about Hokkien linguistics (I do Northern Wu), but from what I understand Amoynese and Taiwanese both exhibit features of both Zhangzhou and Quanzhou lects — 義順 (talk) 12:01, 28 February 2024 (UTC)Reply
@ND381 I see. This is the common wisdom, I guess.
In truth, it makes little sense to pretend that "Zhangzhou" & "Quanzhou" are cardinal dialects. For one thing, there is a great deal of variation within what are supposed to be "Zhangzhou" Hokkien & "Quanzhou" Hokkien. Quemoy & Tâng-oaⁿ 同安 dialects of "Quanzhou" Hokkien, as a clear example, are themselves "transitional to Zhangzhou". So the entire "Zhangzhou-Quanzhou" framework is made of duct tape. "Zhangzhou-Quanzhou" reflects Confucian administrative loyalties more than anything else, as the English terminology (via Mandarin Pinyin) suggests. And the exclusion of Amoy Hokkien from "Quanzhou" is arbitrary & inconsistent in itself. So, there's "nothing there", even if certain isoglosses unsurprisingly bundle along the old prefectural border. (talk) 08:59, 29 February 2024 (UTC)Reply
Similar to ND381, Support the first three points. The second subpoint of point 4 is a terrible idea, since it leaves out Zhangzhou-Quanzhou mixed varieties of Hokkien, which is one of the reasons why "Hokkien" isn't monophyletic. It's also unclear whether dialects like Jinjiang and Philippine Hokkien would be subsumed under Quanzhou. While we're at this, we would also need to see how certain other varieties of Min Nan are dealt with under the structure based on the first three points, namely Longyan (including Zhangping), Datian, Youxi, southern Zhejiang and Zhangzhou-based varieties spoken in Guangdong/Guangxi. While the Language Atlas of China groups Longyan with other Quan-Zhang varieties, it seems that it traditionally isn't considered "Hokkien". We might also want to see where Hailufeng Min fits here. (I'm writing this in a little rush, so there might be more points that come along after.) — justin(r)leung (t...) | c=› } 14:30, 18 February 2024 (UTC)Reply
@Justinrleung No, "Longyan" is most definitely not part of Hokkien, either linguistically or sociolinguistically.
Hai Lok Hong Hoklo is clearly parallel to Hokkien & Teochew.
The Hokkien dialects of southern Zhejiang are clearly part of Hokkien.
Many or most pieces seem poised to fall into place. (talk) 11:36, 28 February 2024 (UTC)Reply
@ I agree with you on this - Longyan should definitely be treated separately. I omitted it from the proposal because I specifically wanted to address the issue of whether we should treat Southern Min as a family, so I only mentioned the codes we currently have. It’s not supposed to be comprehensive, and in fact I was hoping it could set the stage for further additions, as I thought this change should probably happen before we add anything else. Theknightwho (talk) 13:07, 28 February 2024 (UTC)Reply
No particular vote as I don't think I'm qualified enough to discuss about Southern Min here as I very rarely edit it, but I share similar views with ND and Justin based on my limited understanding of the internal structure of Southern Min after reading Kwok (2018).
I reckon the treatment of Zhongshan Min should perhaps also be discussed here, given that Glottolog treats it as a subbranch of Southern Min, although it seems like some of it is Eastern Min. Either way, I think it will need a code. – wpi (talk) 14:09, 23 February 2024 (UTC)Reply
Seconding this. Apparently, so-called "Zhongshan Min" is three mutually unintelligible languages, two of which may not belong to the NAN family (?) at all. (talk) 11:42, 28 February 2024 (UTC)Reply
I don't have much knowledge of the relationship between ZQX Hokkien and other Hoklo varieties like Chaozhou and Hainan.
However, the Amoy, Quanzhou and Zhangzhou varieties are mutually intelligible to some extent. Linguistically, the Amoy varieties should be treated as a dialect of the ZQX language, just as the entry for Irish deirfiúr lists various pronunciations from different dialect areas of Ireland.
As for whether Taiwanese (Taigi) should be treated as a full language or as a dialect of Hokkien, it's something like the Serbo-Croatian language separation issue.--Yoxem (talk) 10:50, 28 February 2024 (UTC)Reply
@Theknightwho Supporting Item 2.
Not opposing Item 1 (nor Item 3) in this context, but — even disregarding misplaced outliers — how much evidence is there that these languages (say, Hainanese & Hokkien) belong to one family in a historical sense? (Wikipedia doesn’t treat Singlish & Jamaican Creole, for instance, as being in the same language family as English. Or do we use the term “family” differently around here?)
Supporting Item 4.1, excluding Taiwanese.
The “Zhangzhou-Quanzhou-Amoy” split reflects the mapping of Confucian loyalties. It corresponds somewhat to linguistic reality, but attempts to package “Zhangzhou” Hokkien & “Quanzhou” Hokkien in a systematic manner seem to give off more smoke than light, as suggested by Mar_vin_kaiser’s comment clarifying what “Zhangzhou Hokkien” should mean.
So so-called “Zhangzhou” Hokkien or “Quanzhou” Hokkien or Amoy Hokkien are all just Hokkien. The “Zhangzhou-Quanzhou” split reflects Confucian psychology, not linguistic reality, and “Amoy” was set up as a third group not for linguistic but for Confucian or face-related (“face truce”) reasons. If some words have lots of pronunciations, in part this reflects the sociolinguistic reality of a wide range of dialects being recognized as a single language. Also, marginal pronunciations seem to find their way into Wiktionary for Hokkien much more than for most other languages, but as long as they exist (and not just idiolectally) & are non-extinct, this is good & well. If extinct or poorly attested pronunciations are swelling the ranks, methods may need examined, but that’s for some other day.
There is something to be said for treating Penang-Medan Hokkien as another language. Even w/o getting into the genesis of Penang Hokkien, the phonology of the variety seems to bend the rules of plain Hokkien. But the convention seems to be to treat it as a dialect within Hokkien, and this in turn reflects the sociolinguistic reality. (talk) 11:54, 28 February 2024 (UTC)Reply

Pinging @Mar vin kaiser, Singaporelang, Mlgc1998, 幻光尘, LeCharCanon, MistiaLorrelay, Kangtw, The dog2, TagaSanPedroAko, Janinga Chang, Yoxem, 汩汩银泉, RcAlex36, Geographyinitiative for comment, who are all users who've edited recently that have some knowledge of Min Nan. Theknightwho (talk) 11:16, 27 February 2024 (UTC)Reply

Thanks for calling - but actually I'm not proficient on the historical & comparative linguistics of Minnan, so I'll report the opinion from @S.G.Junge1997 who is currently working on various Southern Han varieties (I'm doing so because he's currently suffering from IP block).
“As almost all the Sinitic languages that we discuss here, including Southern Min, Northern Wu and so-on, are de facto macrolanguages, it would be not proper to list just some variety of these macrolanguages as distinct languages while to consider other least-concerned languages a part of the huge dialect continuum, not mentioned the phonological, lexicological or genetic differences between the least-concerned varieties are much larger than these varieties with metropolitan native speakers. Janinga Chang (talk) 15:55, 27 February 2024 (UTC)Reply
...Taking Southern Min as an example, the macrolanguage Southern-Min itself is emerged among a group of coastal Min varieties in Dàtián, Fújiàn and surrounding area. Genetically, Southern Min can be divided into three varieties, the Western varieties used in Lóngyán and Zhāngpíng, Fújiàn Province, some remnants in Guǎngdōng Province (namely Zhōngshān Hokkien and some varieties of Leizhou Min), while the majority of Southern Min languages are in fact dialects of the massive Eastern varieties, including Chaozhou, Southern Min proper and Taiwanese Southern Min, these varieties shared a huge amounts of vocabularies and intelligibility, with only some of the characteristic vocabularies shared inside different branches. I'm not arguing about not list Chaozhou and Southern Min proper as different languages, but if one should consider listing Chaozhou and Quanzhou-Zhangzhou Southern Min or even Taiwanese Southern Min as separate languages appropriate, they must consider listing Dàtián qiánlù, Dàtián hòulù, Kǒngfūhuà, Sūbǎnhuà, Yànshí-Báishā, Lóngyán proper, Yǒngfú-Héxī, Zhāngpíng proper, Xīnqiáo-Xīnán and other small varieties concerned way less as distinct languages as well, (apart from Dàtián qiánlù and Dàtián hòulù, all these languages are different varieties of Western branch of the Southern Min which are using in different valleys around Lóngyán, most of which have less native speakers than 10k and are critically endangered, and although most of these languages share some common features, their differences in vocabularies and phonologies make them less intelligible internally than most of Eastern branch varieties, even not considering Chaozhou and Southern Min proper as different languages, some of these languages are still so diverse to be okay to be listed as separated) as it wouldn't be so appropriate to have "endangered" language varieties with often more than 1000k metropolitan native speakers listing as different languages while ignoring the real endangered 
languages with less than 10k native speakers and trying to hide their differences using a leftover garbage can discarded by the metropolitan people who think their language is absolutely unique.”
Although this might sound offensive to some who value the traditional Quanzhou-Zhangzhou-Amoy-Taiwan layout more, his opinion is definitely worth considering, since he has actually been to Longyan for fieldwork several times. Janinga Chang (talk) 16:05, 27 February 2024 (UTC)Reply
Hi! I Support the first three points, same as the ones above. I also reject the second subpoint of point 4 for the reasons mentioned. For the first subpoint of point 4, I support making Hokkien a full language. As for "etymology-only languages", I find it vague to say that a word from language X originates from "Zhangzhou Hokkien" when the way we've been using the term "Zhangzhou Hokkien" is the dialect specific to Zhangzhou city proper, and the word might have borrowed it not from Zhangzhou city proper. Seeing the reply of S.G.Junge1997, I'd be open to proposing Datian Min be listed as a separate language. --Mar vin kaiser (talk) 16:13, 27 February 2024 (UTC)Reply
@Mar vin kaiser Just FYI: "etymology-only language" is a misnomer; a much better description is "variant", as it covers everything from written standards like British English (en-GB) to chronolects like Old Latin (itc-ola) to regional varieties like Penang Hokkien (nan-pen). The thing that matters is that they're "part of" a full language (or, in some cases, another etym-only language). We already have codes for a few varieties of Hokkien, so that part isn't proposing anything new; just that they're nested under the new language code for Hokkien, instead of as sub-variants like they are now. Theknightwho (talk) 17:00, 27 February 2024 (UTC)Reply
@Theknightwho: Thanks for explaining! Then I see no problem with it. If ever, my question is why it should not be extended to Penang Hokkien, Singapore Hokkien, and Philippine Hokkien. --Mar vin kaiser (talk) 17:07, 27 February 2024 (UTC)Reply
@Janinga Chang Seconding parts of this. It was careless for all these varieties to have been anonymously swept into NAN w/o careful examination & debate beforehand. (talk) 12:09, 28 February 2024 (UTC)Reply
Support as well 1., 2., 3., and 4.1. as per further explanation of Theknightwho about variants under/part of Hokkien as a full language, e.g. Quanzhou, Zhangzhou, Xiamen, Penang, Singaporean, Philippine, Taiwanese, etc. etc. and also later expansion of no. 3 as well for the others under nan as a family to be their own as full languages under the nan family/branch of Min of Sinitic if they show divergent enough linguistic features and are realistically practically socially regarded by their speakers as separate from their closest of kin anyways by now, such as those mentioned above by Justinrleung and S.G.Junge1997 and those listed in the ISO pending request and other more there may be. Also, 4.2 is a bad idea due to there still being a lot of structurally similar or reasonably identical enough terms shared with these variants (ZXQ++) still tying them together despite some observable differences, whether in phonemic structure, vocabulary choices, tonal differences, and other tendencies of these variants. The gulf of difference with these variants (ZXQ++) is not yet like the difference with say what makes nan-hok, zhx-teo, nan-hai, zhx-lui, etc. different from each other, enough to definitively split them.
Also pinging as well other users I remember seeing them edit or create nan entries before: @Fish bowl, @Wikijb, @, @TongcyDai, @A-cai, @Hongthay for comment on this. Mlgc1998 (talk) 20:45, 27 February 2024 (UTC)Reply
Support the first three bullet points. RcAlex36 (talk) 04:30, 29 February 2024 (UTC)Reply
Thanks for calling and sorry for my bad english.
For point 4, I Support option 1 and do not support option 2.
As a native speaker (of ZC), I think the differences between Zhangzhou Hokkien and other Hokkien tongues are too small to split them into separate languages. I dare say they are just accents of Minnan/Hokkien.
For Teochew, Leizhou-ish and Hainanese, indeed their "ancestor" is not Min Nan, but they are also southern descendant languages of ancient Min, different from northern descendants like Fuzhou-ish.
(ZC: the Zhangzhou City accent of Hokkien)
MistiaLorrelay (talk) 10:06, 29 February 2024 (UTC)Reply

Split with option 1 of point 4, given the overwhelming support in the last two weeks. Taking inspiration from @Benwing2's process to split the Khanty languages above (see #Splitting Khanty Languages), I think this is what needs to happen:

  1. Assign new language codes to Hokkien (nan-hbl) and Hainanese (nan-hnm), and change over Leizhou Min (zhx-lui → nan-luh) and Teochew (zhx-teo → nan-tws). For the sake of forward-compatibility, I've used the proposed codes from the pending ISO proposal, since that will make things simpler if they're accepted.
  2. Assign a temporary family code to Min Nan (zhx-nan), which will be used while nan still exists as a language code.
  3. Track any uses of the nan code.
  4. Move all current {{nan-*}} templates to {{nan-hbl-*}}, since they all relate to Hokkien.
  5. Convert any existing entries with the Min Nan headword to the relevant language (which I suspect will be Hokkien in the vast majority of cases, if not 100%).
  6. Change any references to nan to use the appropriate code. Again, I suspect Hokkien will predominate.
  7. Change any references to the existing etymology-only codes to use the appropriate code.
  8. Delete nan as a language code, and add it as a family code, replacing the temporary code zhx-nan mentioned above.

At this point, I also suggest that we start a new thread to discuss any additional languages which should be added to the Min Nan family, as several have been suggested above. Theknightwho (talk) 18:52, 2 March 2024 (UTC)Reply
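As a rough illustration of what steps 1, 3, 4 and 7 of the plan above might look like when applied to a piece of wikitext, here is a minimal Python sketch. It is not the bot or module code actually used for the split. The mapping of nan-hok to nan-hbl and of nan-hai to nan-hnm is inferred from the codes mentioned in this thread, the template name `nan-pron` in the usage example is hypothetical, and bare `nan` is only counted for manual review because the right replacement depends on the entry.

```python
import re

# Old code -> new code. The zhx-* pairs come from step 1 of the plan; the
# nan-hok/nan-hai pairs are an assumption based on codes named in this thread.
CODE_MAP = {
    "zhx-lui": "nan-luh",
    "zhx-teo": "nan-tws",
    "nan-hok": "nan-hbl",
    "nan-hai": "nan-hnm",
}

# Step 4: templates named {{nan-*}} move to {{nan-hbl-*}} (skip ones already moved).
TEMPLATE_NAME = re.compile(r"\{\{nan-(?!hbl-)")
# Step 7: an old code used as a whole template parameter, e.g. |zhx-teo|
CODE_PARAM = re.compile(
    r"(?<=\|)(" + "|".join(map(re.escape, CODE_MAP)) + r")(?=[|}])"
)
# Step 3: bare "nan" parameters are only tracked, not rewritten automatically.
BARE_NAN = re.compile(r"(?<=\|)nan(?=[|}])")


def migrate(wikitext):
    """Return (rewritten wikitext, number of bare `nan` uses left for review)."""
    text = TEMPLATE_NAME.sub("{{nan-hbl-", wikitext)
    text = CODE_PARAM.sub(lambda m: CODE_MAP[m.group(1)], text)
    needs_review = len(BARE_NAN.findall(text))
    return text, needs_review


if __name__ == "__main__":
    sample = "{{bor|en|zhx-teo|茶}} {{nan-pron|tê}} {{m|nan|茶}}"
    print(migrate(sample))
```

The sketch leaves bare `nan` untouched on purpose: steps 5 and 6 say the appropriate code has to be chosen per entry, even if Hokkien is expected to predominate.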

I Support points 1-3. However, ZXQ, Taiwanese, Penang, Singapore, and Philippine are really just variable accents with some regional vocabulary, like English dialects throughout England (are all those words recorded in Wiktionary too? They can't be as separate languages though?). Here in Taiwan, Taiwanese is getting more and more standardisation as the years pass, but I agree with another post comparing it to Serbo-Croatian (all accents of a single Stokavian dialect). There are different regional words used in Taiwan, but we start to understand them all as synonyms and I don't even know anymore which words belong to which specific location, like 日頭花 vs 太陽花, or 葉仔 vs 樹葉 vs 樹仔葉 vs 樹葉仔. I frequently travel throughout Southeast Asia and try to use Taiwanese in Penang and Singapore as much as possible. As someone mentioned, Penang has some interesting phonology, but I'm still able to hold conversations with taxi drivers--they speak in their way and I in mine. Though in Penang I've encountered drivers who talk freely at length and at times I find it hard to understand some of the details--they probably understand Taiwanese better than the other way around due to television dramas. But this interaction would not be possible for Chaozhou, which I consider so different as a separate language, and also Hainan and Leizhou--the phonology is far too different and they grammatically use different words. I feel that adding all the various regional pronunciations for ZXQ/Taigi clutters Wiktionary, and I believe that a better unifying meta-spelling would be better that enables regional pronunciations to be deduced through a few simple rules. I think it's best to mention whether a location has a completely separate word for something, rather than providing multiple pronunciations of the same word/字/morphemes. I also dislike the clutter and use of "invented" alternate romanisations that are not widely used or accepted, nor can anybody actually read. 
POJ or better, TâiLô, function just fine. Kangtw (talk) 09:36, 5 March 2024 (UTC)Reply
Sorry, when I posted support above, the green + button did not automatically appear when I posted. In spite of that, please consider my vote. Kangtw (talk) 09:39, 5 March 2024 (UTC)Reply
@Kangtw The vote has actually already closed, but everyone seems to have shared your view that Hokkien shouldn’t be split and should be treated as one language, so that’s how I’ve been carrying it out. Theknightwho (talk) 18:34, 7 March 2024 (UTC)Reply
@Theknightwho what remains to be done here? Cat:Min Nan language looks mostly empty. This, that and the other (talk) 09:51, 2 October 2024 (UTC)Reply

Add etymology-only codes for Proto-Anglo-Frisian and Proto-North Sea Germanic

As variants of Proto-West Germanic. This should hopefully be relatively uncontroversial, since we already have a healthy number of entries in Category:Anglo-Frisian Germanic and Category:North Sea Germanic, and there's a need for these due to both (sub-)families being mentioned in various etymology sections:

No doubt there are many more entries where these could be referred to. Theknightwho (talk) 02:19, 27 February 2024 (UTC)Reply

@Theknightwho Anglo-Frisian is a well-established clade but I'm not so sure about North Sea Germanic. Cf. Wikipedia's comment:
North Sea Germanic, also known as Ingvaeonic /ˌɪŋviːˈɒnɪk/, is a postulated grouping of the northern West Germanic languages that consists of Old Frisian, Old English, and Old Saxon, and their descendants.
Ingvaeonic is named after the Ingaevones, a West Germanic cultural group or proto-tribe along the North Sea coast that was mentioned by both Tacitus and Pliny the Elder (the latter also mentioning that tribes in the group included the Cimbri, the Teutoni and the Chauci). It is thought of as not a monolithic proto-language but as a group of closely related dialects that underwent several areal changes in relative unison.
Benwing2 (talk) 04:36, 27 February 2024 (UTC)Reply
@Victar as a major PWG editor.
Not to mention the fact PWG is already pretty controversial (@Mårtensås had some strong opinions on the topic).
I don't think an etym-only code for either is needed at this time, as the supposed differences were very minor, and we don't represent it in our PWG entries afaik. So while the label signifies a term's distribution, it is still supposedly the same language as any other PWG reconstruction in the model we handle. Thadh (talk) 07:24, 27 February 2024 (UTC)Reply
I've never had a need for either, and North Sea Germanic is generally considered an areal grouping. -- Sokkjō 07:39, 27 February 2024 (UTC)Reply
I can see the argument against NSG, but there is very clearly a need for Proto-Anglo-Frisian based on the etymologies mentioned above. It’s not about whether any particular editor has a need for it themselves, and nobody is suggesting we create separate entries for them outside of PWG. Theknightwho (talk) 11:00, 27 February 2024 (UTC)Reply
@Theknightwho I see you created a category Category:Old Frisian terms derived from North Sea Germanic languages as well as Category:Elfdalian terms derived from North Sea Germanic languages and Category:Elfdalian terms derived from Anglo-Frisian languages. Why did you do that, since this discussion is far from resolved? Benwing2 (talk) 22:29, 27 February 2024 (UTC)Reply
@Benwing2 I've already removed the North Sea Germanic family, as I thought better of it. The question of whether we have an Anglo-Frisian clade is separate from whether we have a protolanguage for it (and that category was created back in November). Theknightwho (talk) 22:35, 27 February 2024 (UTC)Reply
Ignoring the fact that a genetic Anglo-Frisian family is disputed, as far as I'm aware, no one has published "Proto-Anglo-Frisian" reconstructions, not even Boutkan or Siebinga, so we wouldn't even have anyone to cite. -- Sokkjō 00:57, 28 February 2024 (UTC)
@Sokkjo Then someone will need to deal with the etymology sections in those entries. Either we mention Anglo-Frisian reconstructions with a proper language code, or we don't mention them at all. Theknightwho (talk) 01:43, 28 February 2024 (UTC)Reply
Which entries, these: CAT:Anglo-Frisian Germanic? -- Sokkjō 02:11, 28 February 2024 (UTC)Reply
@Sokkjo English welkin (which refers to an "Anglo-Frisian Germanic" term), while Old English hriþer and metegian, Old Frisian hrither, and Saterland Frisian dusse all explicitly give Anglo-Frisian reconstructions. Theknightwho (talk) 02:15, 28 February 2024 (UTC)Reply
Amended. -- Sokkjō 04:16, 28 February 2024 (UTC)Reply
@Sokkjo You should also look at the entries mentioned in the North Sea Germanic list at the top of the thread. Once they're dealt with, I'll close this request as resolved. Theknightwho (talk) 06:44, 28 February 2024 (UTC)Reply
@Theknightwho Before resolving this, we need to clear up whether to let the existing 'Anglo-Frisian' family stand. You created it in November without discussion and it's not clear to me from this discussion whether there's consensus in its favor. Benwing2 (talk) 07:11, 28 February 2024 (UTC)Reply
@Benwing2 To explain the reasoning: I understood it to be an uncontroversial clade, which was reinforced by the existence of Category:Anglo-Frisian Germanic. I may have misunderstood the implications of that category, though. Theknightwho (talk) 07:26, 28 February 2024 (UTC)Reply
@Theknightwho I think what this shows is that all additions of clades, and more generally any addition of languages or families, needs discussion beforehand, no matter how uncontroversial it seems. Benwing2 (talk) 07:53, 28 February 2024 (UTC)Reply
@Theknightwho I see you also created the "High German" family back in November. Let me reiterate, you need to not create any more languages or families without discussion. Benwing2 (talk) 01:25, 1 March 2024 (UTC)Reply

Merging Tupinambá (tpn) into Old Tupi (tpw)

Tupinambá has only 3 entries, i, and ý, which are already covered by Old Tupi, i, and 'y/y. Also, Old Tupi is used as an umbrella term for all Tupi dialects in Wiktionary, so having a separate heading for Tupinambá doesn't make much sense. Trooper57 (talk) 17:11, 9 March 2024 (UTC)

I also wanted to merge Tupinikin (tpk) for the same reason; I just realised there's a page for it. This one is basically blank, except for an empty maintenance category. Trooper57 (talk) 21:15, 9 March 2024 (UTC)
tpw (Old Tupi) got merged into tpn (Tupinambá) in 2022, so we should probably follow suit. I don’t really understand why Tupinikin (tpk) should be merged, though. Theknightwho (talk) 21:52, 9 March 2024 (UTC)Reply
It's the same case as Tupinambá: what they call the "Tupinikin language" is the variant of Old Tupi spoken by the Tupinikin people. I called them dialects, but the difference is like that between General American and Southern American English: they differ in pronunciation at some points and call some things by different words, but aren't languages of their own. The category is just going to stay blank forever, as all lemmas will be put in Old Tupi anyway. Also, both Tupinambá language and Tupiniquim language redirect to Tupi language on Wikipedia.
About the code, I chose tpw over tpn because I prefer the name "Old Tupi", since it's neutral. I don't mind changing the code if we keep the name. Trooper57 (talk) 22:44, 9 March 2024 (UTC)Reply
@Trooper57 For reference ISO merged Old Tupi and Tupinambá to tpn, and the code tpw was deprecated. It also seems that all varieties of Tupi are extinct. If Tupinambá & Old Tupi [tpn] are not significantly different from Tupiniquim [tpk] perhaps they should all be merged into Tupi [tpn]? - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:54, 9 March 2024 (UTC)Reply
It seems theknightwho already said that while I was typing so my comment is now pointless 😞. - سَمِیر | Sameer (مشارکت‌ها · بحث) 21:56, 9 March 2024 (UTC)Reply

etymology codes for remaining Chinese varieties in Module:zh-usex/data

Current variety code Description Current langcode Proposed langcode Current romanization Comment
MSC MSC cmn cmn Pinyin
M-BJ Beijing Mandarin cmn cmn-bei? [SUGGESTION] Pinyin
M-TW Taiwanese Mandarin cmn cmn-TW? [SUGGESTION] Pinyin
M-MY Malaysian Mandarin cmn cmn-MY? [SUGGESTION] Pinyin
M-SG Singaporean Mandarin cmn cmn-SG? [SUGGESTION] Pinyin
M-PH Philippine Mandarin cmn cmn-PH? [SUGGESTION] Pinyin
M-TJ Tianjin Mandarin cmn cmn-tia? [SUGGESTION] Pinyin
M-NE Northeastern Mandarin cmn cmn-nea cmn-noe? [SUGGESTION] Pinyin
M-CP Central Plains Mandarin cmn cmn-cpl cmn-cep? [SUGGESTION] Pinyin
M-GZ Guanzhong Mandarin cmn cmn-gua? [SUGGESTION] Pinyin Guanzhong
M-LY Lanyin Mandarin cmn cmn-lan? [SUGGESTION] Pinyin
M-S Sichuanese zhx-sic zhx-sic Sichuanese Pinyin
M-NJ Nanjing Mandarin cmn-njn cmn-njn cmn-nan? [SUGGESTION; cmn-njn NOT DEFINED] Nankinese Pinyin
M-YZ Yangzhou Mandarin cmn-yaz cmn-yaz cmn-yan or cmn-yzh? [SUGGESTION; cmn-yaz NOT DEFINED] IPA IPA as a placeholder
M-W Wuhanese cmn-wuh cmn-wuh? [NOT DEFINED] IPA
M-GL Guilin Mandarin cmn-gli cmn-gli cmn-gui? [SUGGESTION; cmn-gli NOT DEFINED] IPA IPA as a placeholder
M-XN Xining Mandarin cmn-xin cmn-xin? [NOT DEFINED] IPA IPA as a placeholder
M-UIB dialectal Mandarin cmn cmn-bei-unk? [DO WE NEED THIS?] Pinyin UIB stands for "unidentified Beijingesque"; this is only used for dialects with similar phonology to one of Beijing dialect or MSC
M-DNG Dungan dng dng Cyrillic
CL Classical Chinese cmn cmn-cla lzh-cmn? [SUGGESTION] Pinyin
CL-TW Classical Chinese cmn cmn-cla-TW lzh-cmn-TW? [SUGGESTION] Pinyin (Taiwanese Mandarin)
CL-C Classical Chinese yue cmn-cla-TW lzh-yue? [SUGGESTION] Jyutping
CL-C-T Classical Chinese zhx-tai zhx-tai-cla lwz-tai? [SUGGESTION] Wiktionary
CL-VN Vietnamese Literary Sinitic vi ??? [DO WE NEED THIS?] Sino-Vietnamese
CL-KR Korean Literary Sinitic ko ??? [DO WE NEED THIS?] Sino-Korean
CL-C Classical Chinese yue yue-cla lzh-yue? [SUGGESTION; DUPLICATE ENTRY] Jyutping
CL-PC Pre-Classical Chinese cmn cmn-pcl lzh-pre? [SUGGESTION] Pinyin
CL-L Literary Chinese cmn cmn-lit lzh-lit? [SUGGESTION] Pinyin
CI Ci cmn cmn-cip lzh-cip? [SUGGESTION] Pinyin
WVC Written Vernacular Chinese cmn cmn-wrv cmn-wvc? [SUGGESTION] Pinyin
WVC-C Written Vernacular Chinese yue yue-wrv yue-wvc? [SUGGESTION] Jyutping
WVC-C-T Written Vernacular Chinese zhx-tai zhx-tai-wrv zhx-tai-wvc? [SUGGESTION] Wiktionary
C Cantonese yue yue Jyutping
C-GZ Guangzhou Cantonese yue yue-gua? [SUGGESTION] Jyutping
C-LIT Literary Cantonese yue yue-lit? [SUGGESTION] Jyutping
C-HK Hong Kong Cantonese yue yue-HK? [SUGGESTION] Jyutping
C-T Taishanese zhx-tai zhx-tai Wiktionary
C-DZ Danzhou dialect yue-dan yue-dan? [NOT DEFINED] IPA IPA as a placeholder
J Jin cjy cjy Wiktionary
MB Min Bei mnp mnp Kienning Colloquial Romanized
MD Min Dong cdo cdo Bàng-uâ-cê / IPA
MN Hokkien nan-hbl nan-hbl Pe̍h-ōe-jī
TW Taiwanese Hokkien nan-hbl nan-hbl-TW? [SUGGESTION] Pe̍h-ōe-jī
MN-PN Penang Hokkien nan-hbl nan-pen Pe̍h-ōe-jī
MN-PH Philippine Hokkien nan-hbl nan-hbl-PH? [SUGGESTION; we have nan-plp but this is badly named] Pe̍h-ōe-jī
MN-T Teochew nan-tws nan-tws Peng'im
MN-L Leizhou Min luh luh Leizhou Pinyin
MN-HLF Haklau Min nan-hlh nan-hlh IPA IPA as a placeholder
MN-H Hainanese hnm hnm Guangdong Romanization
W Wu wuu wuu Wugniu
SH Shanghainese wuu wuu-sha Wugniu
W-SZ Suzhounese wuu wuu-szh Wugniu
W-HZ Hangzhounese wuu wuu-hzh Wugniu
W-CM Shadi Wu wuu wuu-sha wuu-chm? [SUGGESTION; wuu-sha conflicts with suggestion for Shanghainese] Wugniu wuu-cm? including Chongming, Haimen, Changyinsha etc
W-NB Ningbonese wuu wuu-ngb Wugniu
W-N Northern Wu wuu wuu-nor? [SUGGESTION] Wugniu general northern wu, incl. transitionary varieties
W-WZ Wenzhounese wuu-wz wuu-wen? [SUGGESTION; wuu-wz NOT DEFINED and badly formatted] Wugniu
G Gan gan gan Wiktionary
X Xiang hsn hsn Wiktionary
H Sixian Hakka hak hak-six? [SUGGESTION] Pha̍k-fa-sṳ
H-HL Hailu Hakka hak hak-hai? [SUGGESTION] Taiwanese Hakka Romanization System
H-DB Dabu Hakka hak hak-dab? [SUGGESTION] Taiwanese Hakka Romanization System
H-MX Meixian Hakka hak hak-mei? [SUGGESTION] Hakka Transliteration Scheme
H-MY-HY Malaysian Huiyang Hakka hak hak-hui? [SUGGESTION] IPA IPA as a placeholder
H-EM Hakka hak hak-emo hak-eam? [SUGGESTION] IPA Early Modern Hakka, IPA as a placeholder
H-ZA Zhao'an Hakka hak hak-zha? [SUGGESTION] Taiwanese Hakka Romanization System
WX Waxiang wxa wxa IPA

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho We currently have 66 bespoke "variety codes" for Chinese lects in Module:zh-usex/data for use by {{zh-x}} (there are 67 entries in the data module, but CL-C occurs twice). The module maps them to language codes (full and etymology-only), but in a non-unique fashion. I haven't counted but I'm guessing maybe 40% of the variety codes have existing full or etymology-only language codes. I propose creating etymology-only codes for the remaining ones and then doing a bot run to replace the bespoke codes with Wiktionary language codes. In the above table I list my suggestions. Note that some of the currently listed lang codes don't exist and some of them are badly named or formatted and should be renamed. Benwing2 (talk) 05:52, 10 March 2024 (UTC)Reply
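Editorial illustration of the "non-unique" mapping described above: a minimal Python sketch, assuming a hand-copied excerpt of a few rows from the table. It is not the actual contents of Module:zh-usex/data (which is a Lua module), and the helper names `group_by_langcode` and `ambiguous_langcodes` are hypothetical.

```python
from collections import defaultdict

# Excerpt of (variety code -> current langcode) pairs copied from the table
# above. Illustrative only: the real data lives in Module:zh-usex/data.
VARIETY_TO_LANG = {
    "MSC": "cmn",
    "M-BJ": "cmn",
    "M-TW": "cmn",
    "M-S": "zhx-sic",
    "M-DNG": "dng",
    "C": "yue",
    "C-HK": "yue",
    "C-T": "zhx-tai",
    "MN": "nan-hbl",
    "TW": "nan-hbl",
}


def group_by_langcode(mapping):
    """Invert the mapping: langcode -> sorted list of variety codes using it."""
    groups = defaultdict(list)
    for variety, lang in mapping.items():
        groups[lang].append(variety)
    return {lang: sorted(varieties) for lang, varieties in groups.items()}


def ambiguous_langcodes(mapping):
    """Langcodes shared by several variety codes, i.e. not invertible."""
    grouped = group_by_langcode(mapping)
    return {lang: vs for lang, vs in grouped.items() if len(vs) > 1}


if __name__ == "__main__":
    for lang, varieties in sorted(ambiguous_langcodes(VARIETY_TO_LANG).items()):
        print(lang, "<-", ", ".join(varieties))
```

Any langcode reported here cannot be mapped back to a single variety code, which is why a bot run would need a distinct etymology-only code per variety before the bespoke codes can be replaced.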

lmao are we finally getting around to extirpating these?
Personally I believe that giving every dialect branch and family (and location?) a code will produce an unwieldy alphabet soup. I propose using full names for clarity and simplicity.
The Classical Chinese codes seem silly; perhaps we should use a bipartite system giving the text language and the pronunciation language, such as lzh/cmn-TW (Literary Chinese in Taiwanese Mandarin pronunciation).
(Can we do {{zh-pron}} next?) —Fish bowl (talk) 06:16, 10 March 2024 (UTC)Reply
Yeah I don't know much about Classical Chinese; I just added codes for them to correspond to the existing variety codes. I have no objection to merging some of the variety codes. The main disadvantage to using full names in etym codes is that they're long to type. Benwing2 (talk) 06:23, 10 March 2024 (UTC)Reply
(Counterpoint to the "alphabet soup" concern: these 2-letter ad-hoc codes have worked so far(?) —Fish bowl (talk) 06:33, 10 March 2024 (UTC))Reply
@Fish bowl We're going to need codes for quite a few of these anyway, when they eventually get added to {{zh-pron}}, so we might as well give them proper codes now. Quite a few of them will need to be made full languages at some point, but giving them etym-only codes is probably fine for now. Theknightwho (talk) 06:42, 10 March 2024 (UTC)Reply
Support the proposal generally. I don't have strong feelings about the particular etymology codes that are proposed here. While we're at it, perhaps what is called Literary Cantonese should be renamed to Hong Kong Written Chinese or the like. "Literary Cantonese" is kind of a confusing label because it's more like a Mandarin-based register with varying degrees of influence from Cantonese and Literary Chinese. — justin(r)leung (t...) | c=› } 06:37, 10 March 2024 (UTC)Reply
Support in general. CL-* and CI should perhaps be lzh-* instead of cmn-*, or what Fish Bowl has said above. The WVC ones could use *-wvc instead of *-wrv - the former would be easier to remember. Danzhou should be zhx-dan, since it is just often grouped under Yue for convenience but not really a Yue lect. Penang Hokkien should be nan-hbl-pen as it's a Hokkien dialect. Other than these no strong feelings, though I would caution against using too much of the syllables that are too frequently found in place names (e.g. zhou as zh in Suzhou wuu-szh and Hangzhou wuu-hzh, these are already defined so it's fine), otherwise we will run out of possible letter combinations very soon. Also concur with Justin's view on "Literary Cantonese". – wpi (talk) 07:00, 10 March 2024 (UTC)Reply
@Wpi @Theknightwho Yeah I think we should try to write up a proposed set of conventions for new etymology language codes. Generally I try to use the first three letters of the lect name unless that creates ambiguity (e.g. I used nea for Northeastern because nor would be ambiguous with Northern), but beyond that some thought is required. Benwing2 (talk) 07:28, 10 March 2024 (UTC)Reply
@Benwing2 @Wpi I’d oppose having 9 letter codes except where it’s unavoidable (e.g. some proto-languages), since they’re awkward and difficult to remember. In cases like Penang Hokkien, just using the family code as a prefix is probably fine. Theknightwho (talk) 14:36, 10 March 2024 (UTC)Reply
@Theknightwho Yes that is totally reasonable. Benwing2 (talk) 19:39, 10 March 2024 (UTC)Reply
Support
Shadi could be wuu-chm (ie. Chongming) to avoid overlap with Shanghainese if you so wish. wuu-nor looks fine. — 義順 (talk) 09:23, 10 March 2024 (UTC)Reply
Support the general proposal, but I'm wondering if we should have something for colloquial putonghua. There are some colloquial terms that are not specific to the Beijing dialect of Mandarin, and probably some that are more used among non-native Mandarin speakers in southern China. The dog2 (talk) 14:29, 10 March 2024 (UTC)Reply
Comment: I updated some of the suggested codes based on comments and based on my attempts to be more consistent. I am using the following logic for defining codes:
  1. Use the first three letters of the variety/dialect/lect name if possible.
  2. If that causes ambiguity:
    1. If the lect name has three components, use the first letter of each.
    2. If the lect name has two components and one of them begins with a digraph, use the digraph along with the first letter of the other, e.g. Yangzhou could be abbreviated yzh.
    3. Otherwise, if the lect name has two components, use the first two letters of the first component followed by the first letter of the second, e.g. Early Modern could be abbreviated eam and Northeast(ern) could be abbreviated noe.
In response to User:The dog2's comments about colloquial Putonghua, would the current M-UIB variety code suffice? The comment beside it reads: "UIB stands for 'unidentified Beijingesque'; this is only used for dialects with similar phonology to one of Beijing dialect or MSC".
In response to User:Wpi: I updated the Classical Chinese codes to use lzh-*. What about 'Vietnamese Literary Sinitic' and 'Korean Literary Sinitic'? Are these actual lects that are essentially Vietnamese/Korean-influenced usage of Classical Chinese, or are they merely the use of Chinese terms in Vietnamese and Korean? In the former case we could adopt codes lzh-VI and lzh-KO; in the latter case I'm not sure we need any codes. Benwing2 (talk) 23:32, 10 March 2024 (UTC)Reply
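Editorial illustration: the abbreviation rules listed in the comment above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not an agreed procedure: the function name `abbreviate`, the `taken` parameter (used here to model "ambiguity" as a clash with an existing suffix), and the digraph inventory are all hypothetical, and the caller is expected to split the lect name into components by hand.

```python
# Assumed digraph inventory (pinyin-style initials). The discussion does not
# enumerate which digraphs count, so this tuple is illustrative only.
DIGRAPHS = ("zh", "ch", "sh")


def leading_digraph(component):
    """Return the digraph a component starts with, or None."""
    lowered = component.lower()
    for digraph in DIGRAPHS:
        if lowered.startswith(digraph):
            return digraph
    return None


def abbreviate(components, taken=()):
    """Suggest a three-letter suffix for a lect name.

    `components` is the name pre-split by the caller, e.g. ["Yang", "zhou"]
    or ["Early", "Modern"]. `taken` holds suffixes already in use.
    """
    parts = [c.lower() for c in components]
    # Rule 1: first three letters of the full name, if not already taken.
    candidate = "".join(parts)[:3]
    if candidate not in taken:
        return candidate
    # Rule 2.1: three components -> first letter of each.
    if len(parts) == 3:
        return "".join(p[0] for p in parts)
    if len(parts) == 2:
        first, second = parts
        # Rule 2.2: one component begins with a digraph.
        if leading_digraph(first):
            return leading_digraph(first) + second[0]
        if leading_digraph(second):
            return first[0] + leading_digraph(second)
        # Rule 2.3: two letters of the first component, one of the second.
        return first[:2] + second[0]
    # The listed rules do not cover any other shape.
    raise ValueError("no rule covers this name")


if __name__ == "__main__":
    print(abbreviate(["Tian", "jin"]))                    # tia
    print(abbreviate(["Yang", "zhou"], taken={"yan"}))    # yzh
    print(abbreviate(["Early", "Modern"], taken={"ear"})) # eam
    print(abbreviate(["North", "east"], taken={"nor"}))   # noe
```

The sketch only checks the first candidate against `taken`; a fallback suffix that itself clashes would still need a human decision, which matches the remark that "some thought is required".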
@Theknightwho @Chuck Entz We have errors coming from the undefined codes that currently occur in Module:zh-usex/data. We either need to temporarily change the undefined codes to one of the currently in-use codes, or go ahead and define etymology-only language codes corresponding to the undefined codes (not necessarily using the codes already present; see my suggestions above). The set of undefined codes causing errors is wuu-wz (Wenzhounese), cmn-gli (Guilin), cmn-xin (Xining), cmn-njn (Nanjing), cmn-yaz (Yangzhou), yue-dan (Danzhou). Benwing2 (talk) 23:39, 10 March 2024 (UTC)Reply
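Editorial illustration of the two options mentioned above (finding the undefined codes, and temporarily substituting an in-use code): a minimal Python sketch. The `DEFINED` set and the `USEX_DATA` excerpt are hand-copied from the table and are assumptions, not the real module data; falling back to the part before the first hyphen is likewise only one hypothetical way to pick a temporary replacement.

```python
# Illustrative subset of codes treated as defined, taken from the table above.
# The authoritative list lives in Wiktionary's language data modules.
DEFINED = {"cmn", "yue", "wuu", "zhx-sic"}

# Excerpt of variety code -> langcode entries as listed in the table.
USEX_DATA = {
    "MSC": "cmn",
    "M-S": "zhx-sic",
    "M-NJ": "cmn-njn",
    "M-YZ": "cmn-yaz",
    "M-GL": "cmn-gli",
    "M-XN": "cmn-xin",
    "C-DZ": "yue-dan",
    "W": "wuu",
    "W-WZ": "wuu-wz",
}


def undefined_codes(data, defined):
    """Return the entries whose langcode is not in the defined set."""
    return {variety: lang for variety, lang in data.items() if lang not in defined}


def with_fallback(data, defined):
    """Temporarily replace an undefined code by its prefix, if that is defined.

    Assumption: the prefix before the first hyphen is an acceptable stand-in.
    Entries whose prefix is also undefined are left unchanged.
    """
    fixed = {}
    for variety, lang in data.items():
        parent = lang.split("-", 1)[0]
        if lang not in defined and parent in defined:
            fixed[variety] = parent
        else:
            fixed[variety] = lang
    return fixed


if __name__ == "__main__":
    print(sorted(undefined_codes(USEX_DATA, DEFINED).values()))
    print(with_fallback(USEX_DATA, DEFINED)["M-NJ"])
```

The prefix fallback is deliberately crude: it silences the errors but records a less specific code, so defining proper etymology-only codes remains the cleaner fix.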
@Benwing2 Of those, I'm pretty sure Danzhou and Nanjing should be full languages for sure, but I'm unsure about the others. @wpi, justinrleung? Theknightwho (talk) 23:43, 10 March 2024 (UTC)Reply
@Theknightwho Full languages take longer to gain consensus. IMO if there are no objections we can define them for now as etym-only languages and upgrade them later when the discussion has played out. (E.g. I have heard it said that Wenzhounese itself consists of several mutually incomprehensible dialects, meaning potentially we would need several full languages at some point.) Benwing2 (talk) 23:49, 10 March 2024 (UTC)Reply
@Benwing2 Sure. In the case of Danzhou, it should probably be made a child of Chinese (zh) for now, then. It's traditionally been counted as a Yue lect, but more recently it's been treated as an unclassified divergent lect; however, we treat Yue as a family (zhx-yue), and reserve the code yue for Cantonese. Whether or not Danzhou is part of Yue, it's definitely not part of Cantonese in the sense we're defining it as. Theknightwho (talk) 23:54, 10 March 2024 (UTC)Reply
I don't know about calling it "Beijing-esque". One thing is that you don't really get the erhua in Taiwan, or when people from southern China speak Mandarin. And someone from Beijing will always distinguish between 咱們 and 我們 when speaking standard Mandarin, but you don't see that among people from southern China when they speak Mandarin. Should we just have a generic "Southern Chinese Mandarin then"? The dog2 (talk) 00:09, 11 March 2024 (UTC)Reply
@The dog2 "Southern Chinese Mandarin" seems problematic because Mandarin is a huge area with lots of diversity, and here you don't mean "Mandarin as spoken in the southern part of the Mandarin-speaking area" so much as "Southern Standard Mandarin". Benwing2 (talk) 00:30, 11 March 2024 (UTC)Reply
@Benwing2: What I meant is Mandarin as spoken today in traditionally non-Mandarin-speaking areas like Fujian and Guangdong. The dog2 (talk) 01:40, 11 March 2024 (UTC)Reply
@The dog2 Right, but conceptually your "Southern Chinese Mandarin" is completely different from e.g. Southwestern Mandarin. The latter refers to the lects spoken natively in the southwestern part of the Mandarin-speaking area but the former refers not to the native Mandarin lects in the southern part of the Mandarin-speaking area (which would be something like Jianghuai Mandarin) but to the variety of Standard Mandarin (which is a northern Mandarin variety) as spoken in southern regions that natively don't speak Mandarin at all. Benwing2 (talk) 01:46, 11 March 2024 (UTC)Reply
@Benwing2: It's a little more complicated than that these days. In some places like Nanning and Fuzhou, most of the younger generations can't speak the local dialect anymore, and now speak standard Mandarin as their native language. The dog2 (talk) 04:27, 11 March 2024 (UTC)Reply
@The dog2 OK sure (what you are describing is unfortunately happening everywhere), but are you objecting to the term "Southern Standard Mandarin"? ("Southern Chinese Mandarin" seems to me both problematic for the reasons I have outlined, and redundant in that "Mandarin" is a variety of "Chinese".) Benwing2 (talk) 04:34, 11 March 2024 (UTC)Reply
@Benwing2: Maybe let's ask Justinrleung what he thinks is a good name, because I can't really think of one. And it really depends on which part of China. In the Teochew-speaking areas, the dialect is still going very strong. And people from Fuzhou often lament that people from southern Fujian have preserved the dialects much better than in Fuzhou. The dog2 (talk) 05:05, 11 March 2024 (UTC)Reply
@Benwing2, The dog2: I don't really know what the purpose of this "Southern Standard Mandarin" would be for zh-x or elsewhere on Wiktionary. Terms that are chiefly used in the south but still considered standard aren't usually marked as southern, and there isn't really a cohesive variety, just possibly some shared tendencies. — justin(r)leung (t...) | c=› } 05:17, 11 March 2024 (UTC)
The "Beijingesque" tag is chiefly used by @Dokurrat. Dokurrat, can you explain this tag a bit? —Fish bowl (talk) 22:32, 11 March 2024 (UTC)Reply
@Fish bowl: Sometimes a word or expression exists in both Beijing dialect and my native lexicon and I would construct an example sentence for it. In such a case, I feel weird labelling my example sentence as "Beijing dialect", as I'm not a Beijing dialect speaker. Hence I created this "UIB" tag. Now that I review it, I think I could've just used a tag that says "Mandarin" instead. I have no issue with retiring the "UIB" tag, renaming it, or doing whatever y'all see fit with it. Dokurrat (talk) 08:20, 13 March 2024 (UTC)
@Benwing2, Theknightwho: I'm not sure about Nanjing being a full language but not other varieties of Mandarin (Yangzhou is also Jianghuai, for example, so why would it be differentially treated?) Danzhou can be a full language since its status is disputed. BTW, I'm not exactly sure about "Cantonese" being not the same as "Yue" in current practice on Wiktionary. It just seems like that because all Cantonese entries in jyutping are based on Standard Cantonese (because Jyutping is inherently created for Standard Cantonese) and translations are 99.9% in Standard Cantonese because of our editors' knowledge; however, in "zh-dial" and "zh-pron", "Cantonese" means "Yue". I'm not exactly down with the idea that yue = Cantonese and zhx-yue = Yue, since it's kind of different from how we're treating nan, for example. — justin(r)leung (t...) | c=› } 01:00, 11 March 2024 (UTC)Reply
@Justinrleung I think it would be sensible to have a major thread each for Mandarin and Yue to hash out how we handle them, in a similar fashion to what we've been doing for (Southern) Min, since it would help to iron out these kind of issues, as I think the current piecemeal approach leads to a lot of confusion. In particular, there's the issue you point out as to what we should be using the yue and cmn codes for. Theknightwho (talk) 01:06, 11 March 2024 (UTC)Reply
@Theknightwho: Yes, I agree. — justin(r)leung (t...) | c=› } 01:12, 11 March 2024 (UTC)Reply

I added etym codes for the lects in Module:zh-usex/data that were causing errors, using my proposed codes above. They are marked as temporary, pending further discussion. Benwing2 (talk) 22:38, 11 March 2024 (UTC)

Done. I omitted the things listed as "DO WE NEED THIS?" above and also omitted "Literary Chinese" because I have no idea how it differs from just lzh, which is also "Literary Chinese". Please note, there are lots more lects mentioned in Module:labels/data/lang/zh and in qualifiers in Chinese thesaurus entries; I will post separately about these. Benwing2 (talk) 23:55, 17 March 2024 (UTC)

Additional Southern Min languages

(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381, Benwing2): Following the various discussions relating to Min in the last month or so, now seems a good time to propose the additional Southern Min varieties which we've been missing:

  1. Zhenan Min (nan-zhn)
  2. Datian Min (nan-dtn)
  3. Longyan Min (nan-lnx) - sometimes grouped as part of Hokkien
  4. Sanxiang Min (nan-zsh) - one of the Zhongshan Min lects; the other two are apparently Eastern Min
  5. Swatow Min (nan-swt) - also known as Shantou
  6. Hoklo Min (nan-hlh) - also known as Hailufeng or Haklau Min; currently etym-only but should be made a full language
  7. Proto-Southern Min (nan-pro) - see Appendix:Proto-Southern Min reconstructions

Although we will want codes for all of these, it might not be desirable to count all of them as separate languages. I also suspect the list is far from complete. Theknightwho (talk) 19:32, 10 March 2024 (UTC)Reply

Support although (a) are we stuck with the above codes (i.e. they are proposed ISO 639 standard codes)? If not some of them could stand to be rationalized; (b) we should clarify earlier rather than later whether these should be full or etym codes (although for Chinese I suppose it makes less difference than elsewhere as the L2 header used is always "Chinese"). Benwing2 (talk) 19:37, 10 March 2024 (UTC)Reply
Swatow Min is classified under Teochew, so we do not need additional codes for it. The term "Hoklo" is a bit ambiguous because Hokkien speakers will consider "Hoklo" to refer to Hokkien. The dog2 (talk) 19:44, 10 March 2024 (UTC)Reply
@The dog2 The difficulty with "Teochew" as a name is that it refers to two different things: (1) what Wikipedia calls Chaoshan Min as a whole, and (2) the specific lect as spoken in Chaozhou, which it calls the Teochew dialect. We will still need a code for it either way, but the question is whether it should be an etymology-only code or a full language code. Theknightwho (talk) 19:54, 10 March 2024 (UTC)Reply
The first definition of "Teochew" already has a code for it. It is "zhx-teo". But I'd be open to changing it to be in line with that of the other Southern Min dialect. In Southeast Asia, the term "Teochew" in common parlance is generally understood to refer to the first definition. The dog2 (talk) 20:00, 10 March 2024 (UTC)Reply
@The dog2 Yeah, that makes sense. Just as a side point, the Teochew code was changed to nan-tws with the split of Min Nan, because it makes sense to give all the Southern Min codes the nan prefix, and the pending ISO code is tws. Theknightwho (talk) 20:21, 10 March 2024 (UTC)Reply
@Theknightwho: Thanks for starting this discussion. There are a few issues here.
  1. Zhenan Min might be a confusing name because Southern Zhejiang has both Southern Min and Eastern Min varieties; we may want to look into what other names we can use.
  2. Datian Min might need to split further into Qianlu and Houlu dialects.
  3. Does Longyan Min cover all Southern Min varieties spoken in the prefecture city of Longyan? Otherwise, there are several (sub)varieties of Longyan Min.
  4. Swatow/Shantou should probably not be separate from Teochew - it's rare to consider them different varieties.
  5. I personally prefer Hailufeng over Hoklo for the varieties of Southern Min spoken in Haifeng/Lufeng, since Hokkien may also be called Hoklo.
— justin(r)leung (t...) | c=› } 20:11, 10 March 2024 (UTC)Reply
@Theknightwho
1. “Zhenan Southern Min” lies within Hokkien, both sociolinguistically & in terms of intelligibility. It’s pretty much an overseas cluster of Hokkien (and not only b/c it arrived by sea), and should be discussed in that context.
2. Yes, but “Datian Min” is not one language. Which “Datian Mins” belong within “Southern Min” (in any meaningful sense) is a question yet to be thoroughly considered.
3. Yes. “Longyan Min” is sociolinguistically not-Hokkien as well as mutually unintelligible vs Hokkien.
4. Yes. (Not sure if the other two are “Eastern Min”, but that’s a whole other ballgame.)
5. Swatow “Min” is part of Teochew, as others have pointed out.
6. Yes, most definitely. BTW, “Hoklo” refers to the language cluster that includes this language, Hokkien, Teochew, Taiwanese, & maybe others. So “Hoklo” & “Haklau” would be cognate non-synonyms, kind of like “Thai” & “Tai”, but not as striking.
7. Maybe the supposed proto-language should be fleshed out first? (+ I apologise if this is obvious, but Kwok’s “reconstructions” seem to be something quite different from what we usually mean by reconstruction. Also note (as with the ONESELF line) how much data it just flat-out ignores or omits (in this case perhaps in order to hang on to the presumed characters-of-etymology 家 & 己). (talk) 13:45, 11 March 2024 (UTC)Reply

Beserman

(Notifying Thadh, Tropylium, Surjection): Recently I’ve been adding Beserman Udmurt entries (Category:Beserman Udmurt), and contrary to my expectations, Beserman seems less similar to Udmurt than I initially expected (at least in terms of vocabulary and phonology). Beserman is usually considered to be a 'special' dialect of Udmurt, and since recently it also has its own written standard. As far as I can see, it definitely seems more convenient to create separate Beserman entries. I'm afraid that, if not, Udmurt might get pretty messy, with a Beserman alternative form for most Udmurt entries. A lot of information on the Beserman dialect can be found on http://beserman.ru/. I'll be glad to hear your opinions on this. Илья А. Латушкин (talk) 19:52, 13 March 2024 (UTC)

At minimum most of the Beserman entries so far should not be listed as synonyms. Most are simply the result of a regular sound change from ы /ɨ/ to ө. Currently it seems this is also transcribed on here as /ʌ/ and transliterated as å, where at least the latter seems weird; most often I have seen the sound described as /ə/ (= Finno-Ugric transcription ə̑, which beserman.ru also seems to use). In any case, these could be easily accommodated similar to differences between e.g. English dialects, as alternate pronunciations + spellings (besides, this is not unique to Beserman but is paralleled by other dialects). A few other phenomena also come down to simple systematic pronunciation differences, e.g. the replacement of ӧ by /e/. It is unclear to me (and per current literature, it seems, also to Uralistics at large) how much else really differs between Beserman and even standard Udmurt. --Tropylium (talk) 20:07, 13 March 2024 (UTC)
@Tropylium: The usage of synonym of stems from my usage of that format in Komi Izhma entries, e.g. асывыы (asyvyy). It's probably indeed a good idea to mark them as altforms, but the issue I have is mostly that Komi Izhma is actually semi-standardised alongside standard Komi, and the same issue is also present in Beserman.
On the differences between it and standard Udmurt, I honestly can't say a lot as I haven't worked too much with the language. It does feature some unique sound changes from the Proto-Permic language that set it apart from the other Udmurt dialects, like being the only Permic lect to (consistently) differentiate between the reflexes of *u and *ü. It also seems to have a national identity separate from other Udmurts. But other than that I would have to refer to Ilya, as they've worked with the language more closely. Thadh (talk) 20:47, 13 March 2024 (UTC)
Sorry, whose *ü and where? Beserman has a few unique-looking cases of /ə/ (< ? *ɨ), but only in words where southeastern Udmurt more generally also shows /ʉ/ (the generally accepted historical scenario is that Beserman arises from the SE dialects of Udmurt, after a migration towards the north leaves them slightly isolated). --Tropylium (talk) 21:03, 13 March 2024 (UTC)Reply
Lytkin's. I'm talking of words like мөнөнө (månånå, to go) and зөмөнө (zåmånå, to dive). And I do take issue with your identification of the vowel as being a schwa, it most definitely isn't one. If you listen to actual recordings I think you'll agree that it is a low vowel, sometimes even as open as [a]. Thadh (talk) 21:30, 13 March 2024 (UTC)Reply
/ə/ is not my identification but what the reference literature insists on calling it, e.g. the late Keľmakov's monographs on Udmurt dialectology like Udmurtin murteet (1994), Диалектная и историческая фонетика удмуртского языка (2003). A lot of beserman.ru's recordings do sound more like [ʌ] or [ɐ], I agree. This could be a recent development; the loss of ӧ, for example, is also only post-WW1. --Tropylium (talk) 20:43, 14 March 2024 (UTC)
Overall Permic languages have undergone some shifts in the recent century, also including the delabialisation of ӧ (ö) in practically all varieties of Komi. Since we are primarily a descriptive dictionary of the modern languages (earlier stages are a bonus!) I think we should stick to the modern pronunciation. The transcription of the vowel as å was taken over from Komi-Yazva, which has a very similar vowel written the same way. Thadh (talk) 09:07, 15 March 2024 (UTC)Reply
I know nothing about Udmurt, but I do agree that unless and until Beserman is considered a separate language, its entries should be formatted along the lines of {{alt form|udm|аску|from=Beserman}} rather than as synonyms of primary-dialect forms. —Mahāgaja · talk 21:40, 13 March 2024 (UTC)Reply
@Tropylium I have found some other sound correspondences between Udmurt and Beserman:
1. йырси ~ йөрчө 'hair', кырси ~ көрчө 'son-in-law'
2. кеч ~ кесь 'goat', ӟуч ~ дюсь 'Russian'
3. син ~ синь 'eye', кин ~ кинь 'who', нин ~ нинь 'linden'
4. тэй ~ тей 'louse', дӥсь ~ дись 'clothes', дэрем ~ дерем 'shirt'
5. ӝӧк ~ ӟек 'table', ӝыт ~ ӟөт 'evening', ӝужыт ~ ӟужөт 'high'
6. ньөм ~ ним 'name', йөвор ~ ивор 'news'
7. сылал ~ слал 'salt', плем ~ пилем 'cloud'
Илья А. Латушкин (talk) 18:24, 14 March 2024 (UTC)Reply
FWIW most of this is also within normal phonetic variation for Udmurt dialects, the /Te/ > /Tʲe/ change is the only systematic feature I don't recall seeing reported before (makes sense though, helps for not entirely losing the э/ӧ contrast).
One thing to consider is that even if we created Beserman separately, we'd then still want to note all forms like these in Udmurt entries, just now as etymological cognates rather than pronunciation variants. It might not save substantial work altogether. The etymologist in me at least thinks this would be probably the nicer option though, if you're already creating separate entries anyway. And it would be more consistent also with how we have split Komi-Zyrian and Komi-Permyak, instead of treating them as variants of single "Komi". --Tropylium (talk) 19:43, 14 March 2024 (UTC)Reply
The same thing has come to my mind as well, and at first sight the differences between Komi-Zyrian and Komi-Permyak do not seem to be much larger than those between Udmurt and Beserman.
I've found two more sound correspondences (1. ӟуч ~ дюсь 'Russian', ӟеч ~ десь ‘good’, 2. ньыль ~ ниль ‘four’, выль ~ виль ‘new’) and some Beserman words not found in standard Udmurt (most of them Turkic loanwords), eg. бикем ‘aunt’, биягам ‘husband's older brother’, бийөм ‘mother-in-law’, ўармиська ‘brother-in-law’, писяй ‘cat’ (also found as ‘писэй’ in dial. Udmurt), … Also some other, more sporadic, vowel correspondences have come up: изьыны ~ узьөнө ‘to sleep’, губи ~ гиби ‘mushroom’, чорыг ~ чорог ‘fish’, сюрес ~ сьөрес ‘road’, бугро ~ бөгра ‘felling’, … Илья А. Латушкин (talk) 08:50, 15 March 2024 (UTC)Reply

More etym codes for Chinese varieties, part 1


(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho Hopefully this ping isn't too noisy. There are two more sources of Chinese lects here at Wiktionary that I have found that may need etym-only codes: qualifiers in thesaurus entries and labels in Module:labels/data/lang/zh. The following table is derived from thesaurus qualifiers (I computed this as part of converting nan codes and qualifiers to appropriate lect codes):

Qualifier | Count | Comment | Wikidata entry (if any)
ACG | 1 | Does this mean "Anime, Comics, Gaming"? Not a lect. |
Anxi Hokkien | 2 | Need lect code? |
Australia | 1 | Ambiguous |
Buddhism | 5 | Not a lect |
Buddhist temple | 8 | Not a lect |
Chinese landscape garden | 1 | Not a lect |
Christianity | 1 | Not a lect |
Classical Chinese or in compounds | 1 | Ambiguous |
Classical Chinese | 59 | Ambiguous |
Classical | 8 | Ambiguous |
Eastern Min; Southern Min | 1 | Ambiguous |
Fuzhou | 1 | Ambiguous |
Guangdong | 1 | Ambiguous? |
Guiyang | 1 | Need lect code? Per w:Southwestern Mandarin, a subvariety of the Kun-Gui variety of Southwestern Mandarin | Q15911623
Harbin Mandarin | 1 | Need lect code; a variety of Northeastern Mandarin | Q1006919
Harbin | 2 | (same as above) |
Hong Kong | 24 | Ambiguous |
Hong Kong><tr:pot1 | 1 | Ambiguous |
Hsinchu & Taichung Hokkien | 1 | ??? Do we need two lect codes? Wikidata has a "Taichung Accent" (Q10914070) but it is a variety of Mandarin; can't find Hsinchu Hokkien in Wikipedia or Wikidata |
Internet slang | 9 | Not a lect |
Internet | 2 | Not a lect |
Japanese calligraphy | 1 | Not a lect |
Jilu Mandarin | 1 | Need lect code; primary subdivision of Mandarin | Q516721
Jinhua Wu | 1 | Need lect code | Q13583347
Korean calligraphy | 1 | Not a lect |
Liuzhou Mandarin | 2 | Need lect code? | Q7224853
Liuzhou | 1 | (same as above) |
Longyan Min | 2 | Need lect code (but will likely be transitioning to a full language, see #Additional Southern Min languages); per Wikipedia, a variety of Hokkien, but that may be wrong | Q6674568
Luoyang Mandarin | 1 | Need lect code; a variety of Central Plains Mandarin | Q3431347
Luoyang | 3 | (same as above) |
Macau | 2 | A variety of Cantonese? Do we need a lect code? |
Mainland China | 3 | Ambiguous |
Mainland | 2 | Ambiguous |
Malaysia | 11 | Ambiguous |
Mandalay Taishanese | 1 | An overseas variety of Taishanese; do we need a lect code? |
Min | 12 | Ambiguous |
Muping Mandarin | 1 | Do we need a lect code? This may be a variety of Shandong Mandarin (Q3285432) |
Muping | 2 | (same as above) |
Nanchang Gan | 1 | Need lect code | Q3497239
Northern China | 1 | Ambiguous |
Northern Mandarin | 2 | Ambiguous |
Philippines | 1 | Ambiguous |
Pinghua | 1 | Ambiguous |
Pingxiang Gan | 3 | Do we need a lect code? A variety of Yiliu Gan Chinese (Q8053438) |
Qing Dynasty | 1 | Not a lect |
Sichuanese or Internet slang | 1 | Sichuanese = zhx-sic; Internet slang = not a lect |
Singapore | 13 | Ambiguous |
Son of Heaven | 2 | What is this? Not a lect. |
Southeast Asia; dated or dialectal in Mainland China | 1 | Ambiguous |
Southwestern Mandarin | 2 | Need lect code | Q2609239
TCM | 3 | Traditional Chinese Medicine? Not a lect. |
Taichung & Tainan Hokkien | 1 | Do we need a lect code or two? See above under "Hsinchu & Taichung Hokkien" for Taichung Hokkien. Tainan Hokkien is mentioned in Wikipedia as being the prestige dialect of Taiwanese Hokkien but can't find it in Wikidata. |
Tainan Hokkien | 1 | (see above) |
Taiwan | 24 | Ambiguous |
Taiwanese | 2 | Ambiguous |
Taiyuan | 1 | Need lect code? Variety of Jin Chinese | Q10941068
Taoism | 1 | Not a lect |
Thailand | 2 | Ambiguous |
Urumqi | 2 | Need lect code? Variety of Lanyin Mandarin | Q10878256
Wanrong | 1 | This is a mountain indigenous township in Taiwan; I don't know what lect is being referred to, and whether it's even Chinese. Refers to Wanrong County in Shanxi; a variety of Central Plains Mandarin, mentioned in the Great Dictionary of Modern Chinese Dialects; apparently a subvariety of Fenhe Mandarin (Q10379509) |
Xi'an Mandarin | 1 | Subvariety of Guanzhong Mandarin (Q3431648); not sure if it needs to be distinguished from Guanzhong | Q123700130
Xi'an | 1 | (same as above) |
Xinzhou | 3 | Need lect code? Variety of Jin Chinese, doesn't seem to have Wikidata entry |
Yinchuan | 1 | Need lect code? Variety of Lanyin Mandarin |
Yongchun Hokkien | 1 | Need lect code? | Q65118728
Yudu Hakka | 1 | Need lect code? | Q19856416

There are 14 lects among the above qualifiers with Wikidata entries that I could find, and some others apparently without Wikidata entries that might need a code. Benwing2 (talk) 03:12, 18 March 2024 (UTC)Reply

@Benwing2 Thanks for putting this together. On Longyan Min in particular, it's likely going to be separated out as a full language as per #Additional Southern Min languages, despite Wikipedia calling it a variety of Hokkien. Theknightwho (talk) 03:27, 18 March 2024 (UTC)Reply
@Theknightwho Ah, I see that now, thanks. Benwing2 (talk) 03:33, 18 March 2024 (UTC)Reply
@Benwing2: Wanrong refers to Wanrong County in Shanxi; this is a variety of Mandarin (Central Plains IIRC). — justin(r)leung (t...) | c=› } 03:32, 18 March 2024 (UTC)Reply

More etym codes for Chinese varieties, part 2


@Theknightwho, Justinrleung Only pinging the people who responded to part 1 above. Here are the uncoded Chinese varieties with labels in Module:labels/data/lang/zh. As above, some have Wikidata items and some are too unspecific or ambiguous to turn into etym-only lects. Some are also clearly full languages or even families.

Canonical label | Label aliases | Comment | Wikidata item (if any)
dialectal Cantonese | | Not specific enough |
Changzhounese | Changzhou dialect, Changzhou Wu | subvariety of Northern (Taihu) Wu | Q1021819
Chuzhou Wu | Chuzhou dialect, Lishuinese, Lishui dialect, Fujian Wu, Lishui Wu | a variety of Chu-Qu Wu, a Southern Wu language; confusable with Quzhou Wu; not in Wikidata? |
Coastal Min | coastal Min | Not specific enough |
Datian Min | | likely becoming a full language | Q19855572
dialectal Eastern Min | dialectal Min Dong | Not specific enough |
Gansu Dungan | | basis of the Soviet written standard for Dungan; not in Wikidata? |
dialectal Gan | | Not specific enough |
Guangxi Mandarin | | This is possibly the same as Guiliu (Gui-Liu) Mandarin (supervariety of Guilin Mandarin) | Q11111664
dialectal Guangxi Mandarin | | Not specific enough |
dialectal Hakka | | Not specific enough |
Hong Kong Hakka | | Mentioned in the Wikipedia w:Hakka Chinese article | Q2675834
Huzhounese | Huzhou dialect, Huzhou Wu | subvariety of Northern (Taihu) Wu | Q15901269
Inland Min | inland Min | Not specific enough |
Jianghuai Mandarin | Jiang-Huai Mandarin, Lower Yangtze Mandarin, Huai | primary branch of Mandarin | Q2128953
Jiaoliao Mandarin | Jiao-Liao Mandarin | primary branch of Mandarin | Q2597550
Jilu Mandarin | Ji-Lu Mandarin | primary branch of Mandarin? | Q516721
dialectal Jin | | Not specific enough |
Korean Classical Chinese | | Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped |
Linshao Wu | Linshao, Linshao dialect, Lin-Shao Wu, Lin-Shao dialect, Lin-Shao | subvariety of Northern (Taihu) Wu; not in Wikidata? |
Liuzhou Mandarin | | a variety of Southwestern Mandarin | Q7224853
dialectal Mandarin | | Not specific enough |
Min | | Not specific enough |
Nanning Pinghua | | a variety of Southern Pinghua Chinese; not in Wikidata? |
North America | North American | Not specific enough |
Pinghua | | A family, not a language |
Shaoxing Wu | Shaoxingnese, Shaoxingese, Shaoxing dialect | variety of Linshao Wu, in turn a variety of Northern (Taihu) Wu | Q7489194
Shehua | | its own branch of Chinese | Q24841605
Shuangfeng | | dialect of Old Xiang | Q10911980
Siyi | | a Yue language? Includes Taishanese | Q2391679
Southern Min | Min Nan | Not specific enough |
dialectal Southern Min | dialectal Min Nan | Not specific enough |
Southern Wu | | appears to be a Wu subfamily, including at least three languages |
Standard Written Chinese | SWC | Per User:justinrleung, this refers to Standard Mandarin = Putonghua, different from Written vernacular Chinese which refers to the standard written vernacular varieties of the Qing and Ming dynasties, as opposed to Classical/Literary Chinese (NOTE: Wikipedia's Standard Written Chinese confusingly redirects to Written vernacular Chinese, and Wikipedia's article on that covers time periods from the Ming dynasty to the present, not just through the end of the 19th century) | Q727694
Sujiahu | Su-Jia-Hu Wu, Sujiahu Wu, Su-Jia-Hu | a subvariety of Northern (Taihu) Wu |
Vietnamese Classical Chinese | | Not quite sure what this is and how to classify it; one of the Module:zh-usex/data lects that was skipped |
dialectal Wu | | Not specific enough |
Wuzhou Wu | Jinhua dialect, Jinhuanese, Wuzhou, Wuzhou dialect, Jinhua Wu | one of the Southern Wu languages | Q2779891
dialectal Xiang | | Not specific enough |
Xinjiang | | subvariety of Lanyin Mandarin? Includes Urumqi Mandarin (Q10878256) |
Xinqu Wu | Quzhounese, Quzhou dialect, Shangraonese, Shangrao dialect, Xinzhou dialect, Xinzhou Wu, Quzhou Wu, Shangrao Wu | a variety of Chu-Qu Wu, a Southern Wu language | Q6112429

Benwing2 (talk) 04:32, 18 March 2024 (UTC)Reply

@Benwing2: Huzhounese is Q15901269. Guangxi Mandarin should be approximately the same as Guiliu Mandarin, which is Q11111664. Hong Kong Hakka is Q2675834. Standard Written Chinese is usually referring to the modern standard, whereas Written Vernacular Chinese seems to refer to written vernacular Mandarin in the Yuan, Ming and Qing dynasties.
BTW, Xinzhou dialect as an alias for Xinqu Wu is problematic, since Xinzhou is ambiguous. Xinzhou Jin is a completely different variety from a different Xinzhou. — justin(r)leung (t...) | c=› } 06:19, 18 March 2024 (UTC)Reply
@Justinrleung Thank you for finding those entries! I think we should remove all aliases that read 'Foo dialect' and consider only allowing aliases that include the language name in them. It is unfortunate that Wikipedia puts the primary entries for various Chinese lects under 'Foo dialect' instead of 'Foo Wu', 'Foo Jin', etc. for precisely the reason you mention. Even in the case of the same location mentioned, it's quite possible for a given location to have multiple dialects of different languages. Benwing2 (talk) 07:02, 18 March 2024 (UTC)Reply
@Benwing2: Thanks for tabulating these.
re: removing aliases that read 'Foo dialect', there are some dialects whose affiliation is not extremely clear, e.g. Huizhou dialect (not to be confused with Huizhou Chinese which is czh) and so we labelled it as "Huicheng dialect" ("Huizhou dialect" would also work but that will certainly be confused with czh).
Often the labels are used to achieve the desired display text rather than categories, which is why there is a relatively large amount of |_| in {{lb|zh}}. One slightly extreme example would be 鐳#Etymology 2 sense 3, {{lb|zh|Malaysia|&|Singapore|_|Cantonese|Hakka|Southern Min|;|Xiamen|Quanzhou|Zhangzhou|_|Hokkien|;|slang|_|in|_|Hong Kong Cantonese}}, which actually represents a large number of lects but is not categorised properly due to the limits of {{lb}}. This is why sometimes you will find labels like {{lb|zh|Taiwan Hokkien and Hakka}} so that the desired result is achieved, even though it should actually be {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}}.
I would suggest searching for additional items in the form of {{lb|zh|Foo|_|Cantonese}} or {{lb|zh|Bar|_|Wu}}, which should unveil more unencoded dialects, some of which may already be covered in the previous section (e.g. something as mundane as {{lb|zh|Xiamen Hokkien}} isn't a recognized label, so often it is inputted as {{lb|zh|Xiamen|_|Hokkien}}). (This is also why there is a relative abundance of Wu dialects in the labels data, probably the result of some dedicated user who added them.)
I'll go over the actual individual lects later. – wpi (talk) 12:55, 18 March 2024 (UTC)Reply
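A search of the kind suggested above could be sketched roughly as follows. This is only an illustration in Python, not a script anyone in this thread actually ran: the function name `find_underscore_joins` and the sample wikitext are hypothetical, and the parsing is deliberately naive (it assumes no nested templates inside {{lb|zh|...}}).

```python
import re
from collections import Counter

# Matches a whole {{lb|zh|...}} call; naive: assumes no nested templates.
LB_ZH = re.compile(r"\{\{lb\|zh\|([^{}]*)\}\}")

def find_underscore_joins(wikitext):
    """Count 'Foo Group' pairs written as Foo|_|Group inside {{lb|zh|...}}."""
    counts = Counter()
    for match in LB_ZH.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        # A '_' between two parameters joins them without a comma in the output.
        for i in range(1, len(parts) - 1):
            if parts[i] == "_":
                counts[f"{parts[i - 1]} {parts[i + 1]}"] += 1
    return counts

# Hypothetical sample text, for illustration only.
sample = (
    "{{lb|zh|Xiamen|_|Hokkien}} text "
    "{{lb|zh|Xiamen|_|Hokkien|slang}} {{lb|zh|Cantonese}}"
)
print(find_underscore_joins(sample))
```

The output would still need manual review, since `_` is also used to join non-lect text (e.g. slang|_|in), so not every pair found this way is a candidate lect.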
Personally I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code.
  • Australia, Malaysia, Singapore, Thailand etc.: these may need a code for each lect (as appropriate), e.g. Malaysian Cantonese, Thailand Teochew (Malaysia may need to be further subdivided by location, we already have Penang Hokkien) [see also my previous comment]
  • Guangdong: usually means Cantonese+Teochew+maybe Taishanese+maybe Leizhou+maybe Hainan, this should be replaced accordingly
  • Hong Kong, Macau: usually refers to the standard form of Chinese (not necessarily Cantonese, but often somewhat influenced by Cantonese) spoken in HK/Macau respectively [zh-HK and zh-MO?]
  • Taiwan: similar to above [zh-TW?]
  • Hsinchu & Taichung Hokkien: there may be some need to create code for the Taiwanese Hokkien dialects, but I'll defer to others for this (but IIRC Hsinchu is predominantly Hakka speaking?)
  • Mandalay Taishanese: might need a code but probably won't be used much
  • Shehua: a branch parallel to Neo-Hakka (which we call Hakka/which is the only part of "Hakka" that we have coverage of), "She" is likely the more common academic term (but this clashes with She the Hmong-Mien language, both names share the same etymology). [zhx-she?]
    • (the ancestor of Neo-Hakka and She is parallel to Paleo-Hakka, but this is another rabbit hole, plus coverage of it is relatively poor)
  • Anxi Hokkien, Yongchun Hokkien, Muping Mandarin, Wanrong: seems relatively minor to be assigned a code? I'm not certain however.
Some comments (partly based on my observation of the usage in {{lb|zh}} and also based on our[my] plans to increase coverage of dialects), grouped by branch:
  • Gan: label-wise we usually have Nanchang [gan-nan?], Lichuan [gan-lic?], Pingxiang [gan-pin?], Taining [gan-tai?], Yongxiu [gan-yon?]. These are all locations rather than subgroups (my understanding is that the subgrouping of Gan is quite undeveloped). It's worth noting that our Gan coverage is extremely lacking (due to both lack of data and lack of motivated editors), and most likely we will only have these four locations in the foreseeable future.
  • Hakka: Sixian may need to be divided into North Sixian/South Sixian. We might also want to add the rest of the Taiwanese Hakka dialects. Coverage of Yudu Hakka [hak-yud?] and Hong Kong Hakka [hak-HK?] seems OK.
  • Huizhou: this group is too small to have any meaningful subdivision, I think at most we can assign a code to Jixi [czo-jix?].
  • Jin: I think we could have Taiyuan [cjy-tai?] and Xinzhou [cjy-xin?]. The other dialects have poorer coverage. (I didn't find any usage of Xinzhou Wu)
  • Wu: besides the mentioned ones, we may also need Danyang Wu? I'll defer to ND381 and Musetta6729.
  • Eastern Min: representative dialect is Fuzhou [cdo-fuz?], other possible inclusion would be Fuqing [cdo-fuq?] and maybe Ningde [cdo-nin?]. The rest seems too sporadic.
  • Xiang: Changsha [hsn-cha?], Shuangfeng [hsn-shu?], Loudi [hsn-lou], Hengyang [hsn-hya] are major dialects. The coverage situation is similar to Gan.
  • Mandarin: the ones mentioned should be added generally.
  • Pinghua: Southern Pinghua [csp] is usually considered to be part of Yue. Worth noting Nanning Pinghua and Nanning Cantonese are different though.
  • Cantonese/Yue: I think we should add Siyi Yue [yue-siy?/zhx-siy?] and demote Taishanese [zhx-tai] to a variety of it. The usage of [yue] to refer to Cantonese or Yue is pending discussion. Other ones that could be added include Yangjiang [yue-yan?/zhx-yan?] and Dongguan [yue-don?], while the rest seems to have relatively poor coverage.
  • Southern Min is already dealt with elsewhere
  • Puxian Min: I believe this can have Putian [cpx-put?] and Xianyou [cpx-xia?]?
wpi (talk) 16:37, 18 March 2024 (UTC)Reply
@Wpi Thank you for all the details! I just realized there is a third source of varieties here at Wiktionary, which is the dialectal data found in the data modules for {{zh-dial}}, specifically Module:zh/data/dial. For example, under 討食 / 讨食 you have a whole set of "dialectal synonyms of 要飯 / 要饭 (yàofàn, to beg for food)" in addition to the Thesaurus entries for 乞討 / 乞讨 (qǐtǎo) fetched using {{syn-saurus}}. Ultimately IMO we should probably merge the dialectal data in the {{zh-dial}} modules with the Thesaurus entries, but that is another can of worms. For now I'll just note that the {{zh-dial}} data conveniently comes with links to English or Chinese Wikipedia entries so it should be easy to find the relevant Wikidata items. *HOWEVER*, there are an absolute ton of varieties listed; I count 1,122 of them currently. (Of these, 969 have Wikipedia links, but many of these links are to geographic entries rather than dialectal entries.) I doubt all of these varieties need to be assigned etym-only codes. I think one way to pare them down is to go through the dialectal data and count how many synonyms there are for each variety. This should reveal which varieties are important enough to warrant codes (I imagine a lot of the varieties listed have no synonyms at all in the data). Benwing2 (talk) 22:32, 18 March 2024 (UTC)Reply
Please see User:Benwing2/zh-dialect-counts. This table lists all the varieties/dialects found among the dialectal synonym data along with counts, the Chinese dialect group they're in and the Wikipedia link, if any. (There are 2,787 terms currently listed in the data.) I'm thinking we can start with the first 100 or 200 varieties listed, figure out what to do with them, and go from there. Also, the script I wrote to combine the counts with the variety data in Module:zh/data/dial output the following warnings concerning varieties for which there are synonyms but which aren't in Module:zh/data/dial:
WARNING: Found variety 'Luoyang' not in variety data
WARNING: Found variety 'Zhumadian' not in variety data
WARNING: Found variety 'Pingdingshan' not in variety data
WARNING: Found variety 'Zhoukou' not in variety data
WARNING: Found variety 'Xuchang' not in variety data
WARNING: Found variety 'Nanyang' not in variety data
WARNING: Found variety 'Luohe' not in variety data
Benwing2 (talk) 23:24, 18 March 2024 (UTC)Reply
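For anyone wanting to reproduce this kind of tally, the counting step might look something like the sketch below. It is not the script mentioned above, and it does not read the real Lua data modules: the function name `count_synonyms_by_variety`, the shape of `synonym_data`, and the sample data are all assumptions made up for illustration.

```python
from collections import Counter

def count_synonyms_by_variety(synonym_data, known_varieties):
    """Return (counts, warnings) for per-variety synonym counts.

    synonym_data: maps a term to {variety name: [synonyms]} (hypothetical shape).
    known_varieties: set of variety names present in the variety data.
    """
    counts = Counter()
    for varieties in synonym_data.values():
        for variety, synonyms in varieties.items():
            counts[variety] += len(synonyms)
    # Flag varieties that appear in the synonym data but not in the variety data.
    warnings = [
        f"WARNING: Found variety '{name}' not in variety data"
        for name in counts
        if name not in known_varieties
    ]
    return counts, warnings

# Made-up sample data, purely for illustration.
data = {
    "term A": {"Beijing": ["x1"], "Luoyang": ["x2"]},
    "term B": {"Beijing": ["y1", "y2"], "Taiyuan": []},
}
counts, warnings = count_synonyms_by_variety(data, {"Beijing", "Taiyuan"})
for name, n in counts.most_common():
    print(n, name)
print("\n".join(warnings))
```

Sorting by count in this way is what would let the most heavily attested varieties be reviewed first, while varieties with a count of zero can be set aside.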
@Wpi In response to some of your comments:
  1. As for 'Foo dialect' issues, I think in cases like 'Huicheng dialect' where the affiliation isn't clear, we should just identify them as 'Huicheng Chinese'. It's true that we usually do that for top-level groups but I think it's better in this case than using "dialect".
  2. I will search for labels specified using _ and such. Hopefully the usage isn't too inconsistent.
  3. Concerning your statement "I prefer to assign full language codes to a group, while the representative dialect(s) spoken in a specific place will have an etym-only code", what is the alternative you are responding to? Is it further full-language splits (e.g. with Southern Min)?
  4. For zh-HK, zh-MO, you say "standard language". If this is Cantonese, maybe we should use yue-HK, yue-MO?
  5. For the specific lect comments, I don't know enough to respond but it all looks reasonable. User:Theknightwho, what do you think of the proposal to demote Taishanese to a variety of Siyi Yue?
Benwing2 (talk) 05:25, 19 March 2024 (UTC)Reply
In re point #2, see User:Benwing2/zh-label-sets. Benwing2 (talk) 06:41, 19 March 2024 (UTC)Reply
OK, only a few uses of labels involving 'Foo dialect', and only one involving a label actually listed in Module:labels/data/lang/zh, which was 𠀫𠀪 (which, BTW, is being RFV'd) using 'Hangzhou dialect':
  28 Huicheng dialect
   4 eye dialect
   3 ancient Chu dialect
   1 title=zh:Grammaire du dialect
   1 southern dialect
   1 some Mandarin with a Southern Chinese dialect
   1 of one's speech of the local dialect
   1 ancient Qi or Wu dialect
   1 ancient Qi dialect
   1 [[w:Luoyang dialect
   1 Sòng-Lǔ dialect
   1 Sichuan dialect
   1 Shaanxi dialect
   1 Northeastern dialect
   1 Ningyuan dialect
   1 Hangzhou dialect
I changed that one usage to 'Hangzhounese' and deleted all the 'Foo dialect' labels. We might want to add something for the 'Huicheng dialect' labels (cf. your mention above of this). Benwing2 (talk) 08:10, 19 March 2024 (UTC)Reply
@Benwing2:
re #3, I'm referring to when we are assigning the codes, i.e. groups like Siyi will have a full code whereas local dialect points like Taishanese will have etym-only codes.
re #4, it's basically Standard Written Chinese as used in Hong Kong/Macau. It should be "written/used", not "spoken" as I previously mentioned. There's a difference between yue-HK (Hong Kong Cantonese) and zh-HK (Hong Kong); it's a bit like Norwegian Nynorsk vs Norwegian Bokmål.
Also pinging @Justinrleung for comments to specific lects.– wpi (talk) 11:31, 19 March 2024 (UTC)Reply
@Wpi OK thanks. As for #3, I agree with your idea of the separation between full and etym-only languages going along group lines. As for #4, didn't realize there is this difference but it makes sense. Benwing2 (talk) 15:04, 19 March 2024 (UTC)Reply
Thoughts on Wu codes (locality codes are just suggestions):
  • Northern Wu subbranches imo don't really need codes but individual localities would be beneficial. Of which:
Changzhounese wuu-chz
Danyangese wuu-dan
Shaoxingese wuu-shx
are in need of codes (due to relative abundance of data, and will also be gaining zh-pron support soon). Some others to consider may include
Cixinese wuu-cix
Huzhounese wuu-huz
and all the other lects currently in Module:wuu-pron/sandbox. We are currently still working on it so it may be worth delaying the addition of these lect codes until we finish the Northern Wu overhaul.
  • Currently extant Northern Wu localities (Hangzhounese, Ningbonese, Shadi Wu, Shanghainese, Suzhounese) should all be listed under Northern Wu (wuu-nor) in the family tree on (and any other system that may handle language families).
  • Southern Wu wise, I believe these would be helpful to have in the future, as we will be adding pages/making modules for them as soon as possible:
Jinhuanese / Wuzhou Wu wuu-jih
Taizhounese / Taizhou Wu wuu-tai
Lishuinese / Chuzhou Wu wuu-lis
Shangraonese / Xinzhou Wu wuu-shr
in descending order of importance. I decided to split "Chuqu Wu" as is described on the chart as there is no clear consensus as to how the non-coastal non-Northern Wu bits should be split, but in general these three areas (Wuzhou, Chuzhou, Xinzhou) can be seen reflected in some way.
  • A Southern Wu code (wuu-sou) should not be made. It is likely not a familial grouping but rather just a term to use to contrast it with Northern Wu. There have been some preliminary studies that investigate whether it does form a coherent family, but results are mixed and sample sizes are small.
Regarding why there are so many Northern Wu localities, yes, muset & I added them, as unlike Hokkien for instance, the sociolinguistic attitude towards these lects is first and foremost the locality rather than the family (which contrasts with the "Hokkien" identity).
@Musetta6729 - only other active Wu editor: let us know if you have any other/conflicting ideas — nd381 (talk) 19:38, 19 March 2024 (UTC)Reply
@ND381 Thank you! I will probably take all your suggestions. Benwing2 (talk) 20:26, 19 March 2024 (UTC)Reply
I only just got the chance to look at this thread. In terms of Wu I definitely agree with everything that ND has said so far; there are just two things I would like to mention:
First: Having Urban Shanghainese as a variety (maybe under something like wuu-ush) along with simply "Shanghainese" (wuu-sha) might be useful. This is due to a variety of reasons, but mainly that Contemporary "Urban" Shanghainese has showcased more convergent evolution with say, Ningbonese or Suzhounese during the last century, and has become more sociolinguistically and identity-wise distinct from many Non-Urban varieties surrounding it. With only the label "Shanghainese" now it is tricky to disambiguate between categories such as:
  • Primarily urban inventions not used in non-urban varieties, or that have spread out to non-urban regions as still recognisably "urbanite" speech
  • Common invention/retention in Non-Urban Shanghai varieties that are rare/obsolete/not used in Urban Shanghainese
  • Inventions in Non-Urban Shanghainese that are not geographically restricted to one specific region of Shanghai
  • Usage attested in both 1850s City-Center Shanghainese and contemporary Non-Urban, but not Contemporary Urban Shanghainese
Especially because all of this variance is also deeply interconnected with notions of locality, of new and old, of class, ethnicity and other sociolinguistic variables when looked at from an Urban Shanghainese standpoint. All of this has led to the use of ad hoc labels along with the Shanghainese tag like "old-period", "chiefly non-urban/suburban", "rare or obsolete" etc which is definitely not ideal. By having Urban Shanghainese as a variety I expect that this would be easier to manage - and as we go on to add more coverage on Non-Urban Shanghainese varieties we should hopefully be able to have more specific variety codes for lots of the Non-urban Shanghainese varieties too.
The second thing is a bit more minor - Suhujia (蘇滬嘉 - see linked Chinese Wikipedia article) might be a more commonly used term than Sujiahu (蘇嘉滬), which we seem to have now. The grouping seems to be somewhat areal and vaguely defined to me and I am doubtful of the extent to which having it might be useful, but nevertheless it's a fairly widely accepted grouping so thought I would bring this up in case we end up making the decision to add it. Musetta6729 (talk) 04:38, 24 March 2024 (UTC)Reply

Redid Chinese labels


(Notifying Atitarev, Tooironic, Fish bowl, Justinrleung, Mar vin kaiser, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly, Wpi, ND381): @Theknightwho I redid the label structure in Module:labels/data/lang/zh. I added missing labels corresponding to the new lects in Module:etymology languages/data, canonicalized the labels to include the group name (e.g. Xiamen Hokkien instead of just Xiamen), and added shorter aliases. Duplication is avoided in something like {{lb|zh|Xiamen Hokkien|Quanzhou Hokkien|and|Zhangzhou Hokkien}} (or equivalently, {{lb|zh|Xiamen|Quanzhou|and|Zhangzhou}}) by a new Chinese-specific label postprocessing function in Module:labels/data/lang/zh/functions, which attempts to remove duplicate group names as well as duplicate occurrences of "Taiwanese" in {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. Please let me know if you don't like the output in specific situations and I will tweak the function. Note that I removed the label Taiwanese Hokkien and Hakka and all its aliases, after converting all occurrences to use multiple labels like {{lb|zh|Taiwanese Hokkien|and|Taiwanese Hakka}} or similar. I also changed a few categories to better reflect the lect name, e.g. the label Philippine Hokkien now categorizes into Category:Philippine Hokkien instead of Category:Philippine Chinese. Benwing2 (talk) 00:50, 20 March 2024 (UTC)Reply

@Benwing2: Thanks for setting this up. The function looks like it works well generally, but there are some cases where it might lead to confusion, such as {{lb|zh|Taiwanese Hokkien|Taiwanese Hakka}} showing up as "Taiwanese Hokkien, Hakka", which could mean the unintended "Hakka (in general) and Taiwanese Hokkien". Perhaps one way to prevent this is to only remove duplicate group names when there is an "and" somewhere in the chain? Is that something that could be done? — justin(r)leung (t...) | c=› } 06:56, 20 March 2024 (UTC)Reply
@Justinrleung Yup, I can do that, thanks for the suggestion. Benwing2 (talk) 17:08, 20 March 2024 (UTC)Reply
@Justinrleung This should be done. Let me know if you see anything else needing fixing. Benwing2 (talk) 03:25, 22 March 2024 (UTC)Reply
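
For readers who want to picture the behaviour agreed above, here is a minimal Python sketch. It is not the code in Module:labels/data/lang/zh/functions (which is a Lua module and is not reproduced in this thread); the function name `collapse_group_names` and the exact output shape are assumptions made for illustration. It only handles a group name shared at the end of every label, and only when "and" appears in the chain, as suggested by Justinrleung. The shared-prefix case ("Taiwanese") is deliberately left out.

```python
def collapse_group_names(labels):
    """Drop a group name shared by every label, keeping it only on the last one.

    Hypothetical helper: nothing is changed unless "and" appears in the chain.
    """
    names = [label for label in labels if label != "and"]
    if "and" not in labels or len(names) < 2:
        return list(labels)
    # Every label must look like "<place> <group>".
    if any(" " not in name for name in names):
        return list(labels)
    # All labels must end in the same group name.
    groups = {name.rsplit(" ", 1)[1] for name in names}
    if len(groups) != 1:
        return list(labels)
    last = max(i for i, label in enumerate(labels) if label != "and")
    return [
        label if label == "and" or i == last else label.rsplit(" ", 1)[0]
        for i, label in enumerate(labels)
    ]


print(collapse_group_names(["Xiamen Hokkien", "Quanzhou Hokkien", "and", "Zhangzhou Hokkien"]))
print(collapse_group_names(["Taiwanese Hokkien", "Taiwanese Hakka"]))
```

Without an "and" in the chain the labels come back unchanged, which avoids the ambiguous "Taiwanese Hokkien, Hakka" reading raised above.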

Ramifying/filling out Yue Chinese

(Notifying Atitarev, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): Apologies once again for the wide ping, as I haven't received any responses to some of my other pings. I added a bunch of labels for Yue Chinese lects, but it is revealing some issues:

  1. We correctly classify Yue as a family, but it contains only two languages (Cantonese language and Taishanese language). Meanwhile per Wikipedia and Glottolog there are something like seven primary branches:
    1. Yuehai Yue, which is more or less Cantonese proper.
    2. Siyi Yue, which includes Taishanese.
    3. Goulou Yue, most notably including Yulin dialect and its sublect Bobai dialect.
    4. Yongxun Yue, with Nanning Yue as the representative dialect.
    5. Gaoyang Yue, most notably including Yangjiang Yue.
    6. Wuhua Yue.
    7. Qinlian Yue, partly intelligible with standard Cantonese.
  2. We are using the code yue for Cantonese proper and zhx-yue for the Yue family, which is inconvenient and contrary to ISO 639-3 usage.

I propose:

  1. Change to using yue for the family and use some more specific code for Cantonese, either yue-can or yue-yue (for Yuehai Yue).
  2. Create L2 languages for each of the above seven groups. We can reuse the "Cantonese language" for Yuehai Yue. This shouldn't entail any real splitting per se as we already have Yue as a family rather than a language.
  3. Demote Taishanese to an etym-only variety of Siyi Yue and assign it a code yue-tai in place of zhx-tai.

Please also note, in the labels I created, the canonical name for each label has "Cantonese" in it for all sublects of Yuehai Yue but "Yue" for Yuehai Yue itself and for all other lects. Almost everything called "Foo Cantonese" (except for variants of standard Cantonese) has an alias "Foo Yue", but not the other way around. For example, the Dongguan dialect is called "Dongguan Cantonese" because it is a variety of Yuehai Yue, and has "Dongguan Yue" as an alias; but the Yulin dialect is called "Yulin Yue" and does NOT have "Yulin Cantonese" as an alias, since it is a variety of Goulou Yue rather than Yuehai Yue. Benwing2 (talk) 22:17, 28 March 2024 (UTC)Reply
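
The naming rule in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the stated convention, not the actual label data; the function name `label_names` and its arguments are hypothetical, and it ignores the exception for variants of standard Cantonese.

```python
def label_names(place, branch):
    """Return (canonical label, aliases) for a Yue lect under the rule described above.

    Assumption: `branch` is the primary branch name, e.g. "Yuehai Yue" or "Goulou Yue".
    """
    if branch == "Yuehai Yue":
        # Sublects of Yuehai Yue are "Foo Cantonese", with "Foo Yue" as an alias.
        return f"{place} Cantonese", [f"{place} Yue"]
    # All other lects are "Foo Yue" and do NOT get a "Foo Cantonese" alias.
    return f"{place} Yue", []


print(label_names("Dongguan", "Yuehai Yue"))
print(label_names("Yulin", "Goulou Yue"))
```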

Thanks for the ping. Here are some of my questions, to make sure I understand this better:
  1. What would the categories of a normal entry like 不嬲 look like? I'm asking this because "Cantonese" and "Taishanese" are more recognisable than "Yuehai Yue" and "Siyi Yue" and I'm wondering if these more obscure names would end up in the entry. If this works like the other Chinese splits, I suppose the categories would not change, and just the categories of the categories would change?
  2. We have plans (maybe) to include more Yue languages than just Cantonese and Taishanese, which primarily means expanding the scope of the "pronunciation" section of the entries, and this would also generate more categories. Would your proposal benefit this project because we could more easily categorise the new Yue languages to come?
  3. While normal entries written using Chinese characters have the "Chinese" L2 header, romanisations have their respective header per language, such as xiànglái having the Mandarin L2 header and boán-liân having the Hokkien L2 header. We don't seem to do the same for Cantonese, and the pronunciation sections also don't link to the Cantonese romanisations, and I also can't seem to find any Cantonese L2 header. This might have been decided in an earlier policy that I don't know about, so I guess my question is, would it create problems if you demote Taishanese to an etym-only language?
  4. Per your last point I tried to google "Yulin Yue" but the main results are about someone named Yulin Yue, so I tried to google "Yulin Yue" + language and got 235 hits, while "Yulin Cantonese" got me 73 hits (and "Yulin Cantonese" + language got me only 8 hits). This isn't a question per se, just a comment about how little-known other Yue languages are.
  5. I feel like I just have to insert a comment about the choice of Mandarin exonyms vs. Cantonese exonyms vs. endonyms. I think the first option is generally how we do things (except for the names of the main branches), and I suppose this is just the result of the general scholarship, and I'm not really trying to subvert this practice, but I would just like to raise some awareness to this phenomenon.
The above. Apologies if 1999. --kc_kennylau (talk) 23:01, 28 March 2024 (UTC)Reply
@Kc kennylau Thanks much for the detailed questions! In response to your questions, let me see if I can answer:
  1. There are two types of categories: (1) L2 language categories (e.g. Category:Mandarin lemmas); (2) etym-language categories (e.g. Category:Xi'an Mandarin). Under my proposal, we would probably use "Cantonese" in place of "Yuehai Yue" as the L2 language name, since they seem more or less equivalent; but "Siyi Yue" would be the L2 language subsuming Taishanese. This means that a Taishanese term would be categorized both under Category:Siyi Yue lemmas and Category:Taishanese Yue (or maybe just Category:Taishanese; there is some flexibility in the choice of etym-language categories). So essentially, things like Category:Taishanese lemmas would go away in favor of Category:Siyi Yue lemmas + Category:Taishanese Yue, but Category:Cantonese lemmas would remain (possibly with additional more specific categories like Category:Guangzhou Cantonese or Category:Hong Kong Cantonese, both of which already exist).
  2. This proposal is somewhat orthogonal to how we handle the pronunciation section entries; the ones for Cantonese and Taishanese can remain as-is, but might categorize differently (as explained above).
  3. If there were romanizations under a Taishanese header, they would have to be renamed to have Siyi Yue as the header and a label Taishanese attached, to make it clear that the romanizations are specifically Taishanese. (Similarly, entries like boán-liân used to be under a Min Nan header before Hokkien got split out as an L2 language.) But since we don't seem to have any such romanizations, this issue won't arise (at least for now).
  4. As for the obscurity of Yue varieties other than Cantonese and Taishanese, I completely agree. The terminology isn't well worked out and the term "Cantonese" is particularly problematic since it variously refers to (a) the speech of Guangzhou specifically; (b) the more general Yuehai Yue language that Guangzhou speech is part of [which is what I'm defining it as]; and (c) the entire Yue family. This issue doesn't seem to come up so much for other groups like Mandarin and Wu.
  5. As for Mandarin vs. Cantonese/Yue naming, I am not wedded to using the Mandarin terms; I just chose them because that is what Glottolog and Wikipedia largely use. If the consensus is to use Cantonese-language terms for all lects or to use native terms (endonyms), we can do that as well. I am guessing the Mandarin terms see more usage just out of a sort of default familiarity (pretty much everyone who works with Chinese languages is familiar with Mandarin but many aren't familiar with Cantonese or other varieties, and several Yue varieties don't even have standard romanization schemes). Benwing2 (talk) 23:50, 28 March 2024 (UTC)Reply
I support the move in general (with a strong preference of using yue-can), however here's a couple of problems I can foresee with this proposal:
  1. Goulou actually forms a dialect continuum with Southern Pinghua language, and therefore nowadays [csp] is usually thought of as part of Yue, but weirdly it has a separate language code. Should [csp] be included as well?
  2. Yongxun is a (quite recent) descendant of Cantonese spoken in the major towns and cities in the Pearl River with minor influences from the substrate Goulou varieties. Personally I don't think it should be a separate branch.
  3. As I mentioned before, there are (at least) two distinct varieties of Yue spoken in Nanning, we currently call them Nanning Cantonese (under Yongxun) and Nanning Pinghua (under Goulou-Southern Ping). How can the two be distinguished if it is renamed to "Nanning Yue"?
wpi (talk) 04:19, 29 March 2024 (UTC)Reply
@Wpi Thanks very much for responding. In response to your issues:
  1. I don't know enough about Pinghua to answer, but I note that Wikipedia's Pinghua article asserts that Pinghua has been treated as its own dialect group, separate from Yue, in most textbooks and surveys written since the 1980's. As for dialect continuums, there are many places where different branches form dialect continuums with each other but are still separated. (As an example, Western Bulgarian forms a dialect continuum with Torlakian, which in turn forms a dialect continuum with (other varieties of) Serbo-Croatian. Serbo-Croatian is considered a Western South Slavic language and Bulgarian an Eastern South Slavic language; despite what the Wikipedia article on Torlakian says, it's more often considered part of Serbo-Croatian than Bulgarian.) Maybe User:Justinrleung or User:沈澄心 can comment? There's an additional issue that if we group Southern Pinghua with Yue, what do we do with Northern Pinghua?
  2. Likewise I don't know enough about Yongxun Yue to have a firm opinion; in any case it seems like we won't have any lemmas in it, so whether we make it its own L2 or group it with some other L2 (which one? Cantonese or Goulou?) wouldn't make much difference.
  3. I think this is only an issue if (1) we leave Yongxun as its own group and (2) we put Southern Pinghua under Yue. If Yongxun is e.g. grouped with Cantonese and Pinghua left as-is, the current names are fine. If both dialects get considered non-Cantonese Yue, then one solution is to clarify them as 'Nanning Yongxun Yue' and 'Nanning Pinghua Yue' or something.
Benwing2 (talk) 04:55, 29 March 2024 (UTC)Reply
  • I would prefer to have Southern Pinghua be kept as its own group separate from Yue. It seems that generally speakers of Southern Pinghua would call their varieties Pinghua, distinguished from Baihua (traditionally Yue varieties). The situation in Nanning is a case in point.
  • I don't have a strong opinion on whether Yongxun should be a branch. The Language Atlas of China does mention a few criteria for separating Yongxun out as its own branch, but it seems like those criteria are retentions rather than innovations (from a cursory glance).
— justin(r)leung (t...) | c=› } 18:43, 20 May 2024 (UTC)Reply

────────────────────────────────────────────────────────────────────────────────────────────────────
There have been some discussions, and for reference this is our current categorization:

  1. Gwangfu Yue (廣府片) / Yuehai Yue (粵海片): the "main" branch of Yue that contains Cantonese (廣東話), which is the dominant language (besides Mandarin) within the Yue Chinese lects. Our current approach is to group other (more recent) descendants as sub-branches of this branch.
    1. Guan-Bao Yue (莞寶片/莞寶小片): contains Dongguan Cantonese (東莞話) which is genetically close to Cantonese but might be a bit hard to understand for Cantonese speakers because of the differences in phonology. Some classify it as a sister-branch of Gwangfu, but I think we prefer to group it under Gwangfu.
    2. Yong-Xun Yue (邕潯片/邕潯小片): contains Nanning Cantonese (南寧白話). Again this branch is sometimes considered separate from Gwangfu.
    3. Sanyi Yue (三邑小片): the Cantonese spoken in Sanyi (literally "three counties") is highly intelligible with Cantonese, but I want to group them together because they share the innovation that their Tone 4 ("light level") is particularly high.
    4. Xiangshan Yue (香山小片): contains Shiqi Cantonese (石岐話).
  2. Siyi Yue (四邑片): the second most famous branch of Yue that contains Taishanese (台山話). This branch is particularly distinct within Yue, and there should be no debate over the status of this branch.
  3. Gao-Lian Yue / Gao-Lei Yue (高廉片/高雷片): this branch is a merger of the traditional categories Gao-Yang Yue (高陽片) and Wu-Hua Yue (吳化片). (The Lian 廉 here refers to the River Lian 廉江, which is unrelated to the Lianzhou 廉州 below, 145 km away.) The brief reason for this merge is that Gaozhou Cantonese (高州白話, the Gao of Gao-Yang) is also sometimes classified with Wu-Hua Yue, so I think it's better to just merge the two branches. I chose this name because it was also used in earlier classifications for more-or-less the same span. This covers the Yue lects spoken in the Prefectures Yangjiang (陽江), Maoming (茂名), and Zhanjiang (湛江).
  4. Qin-Lian Yue (欽廉片): this category has more-or-less stayed the same across different classifications, but there are also (scholarly) opinions that this is more a regional grouping than a proper genetic branch. The following sub-branches have also been proposed in a paper where Qin-Lian is challenged (where I have removed Qinzhou Cantonese (欽州白話), which we consider to be a descendant of Cantonese instead):
    1. Lianzhou Yue (廉州小片)
    2. Lingshan Yue (靈山小片)
    3. Xiaojiang (小江小片)
    4. Liuwanshan (六萬山小片)
  5. Gou-Lou Yue (勾漏片): this category is also quite consistent, with the main distinguishing feature being that voiced stop initials in Middle Chinese tend to become unaspirated. It is also quite distinct among the Yue lects. This lect is primarily spoken in Gwangxi instead of Gwangdong.
    1. Luo-Guang Yue (羅廣小片): this is the Gou-Lou Yue which is spoken in Gwangdong. It might be a misnomer because the Luo stands for the City Luoding (羅定) in the Prefecture Yunfu (雲浮), but there might be no Gou-Lou Yue spoken here.

(Notes for non-Chinese speakers: 片 = branch, 小片 = sub-branch, 話 = dialect.)

There are some remaining problems:

  • Where does the name "Cantonese" belong? Should the sub-branches of Gwangfu Yue also bear the label "Cantonese"?
  • I support using yue for the whole branch and yue-can for "Cantonese" proper.
  • How should we treat sub-branches? Should they have their own codes?
  • Should the names be A-B Yue or AB Yue?

I am also pinging the Chinese editors again for more opinions. (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): --kc_kennylau (talk) 14:24, 23 May 2024 (UTC)Reply

Note that the proposed tree above is solely proposed by Kenny, and certain parts of it lack any sort of substantial discussion.
I strongly disagree with the proposed "Gao-Lian"/"Gao-Lei" group, as it clearly includes at least two groups with vastly distinct phonological features: Wu-Hua (1) has a three-way contrast with its voiced/implosive stops and (2) pronounces MC affricates (精 series) as dentals, while the Gao-Lei and Liangyang groups (1) only have a two-way contrast and (2) pronounce MC affricates (精 series) as affricates - among many other differences. Note that the reason why Wu-Hua is sometimes described as Gao-Lei (e.g. in Zhan Bowei's 廣東粵方言概要) is most likely due to the lack of data on Wu-Hua. I should also note that Wu-Hua is sometimes considered to be an incoherent group, but regardless that should not result in placing the entirety of Wu-Hua with Gao-Lei. As to the question of whether Liangyang is distinct or not, it seems to me that the arguments for a separate Liangyang group are stronger, especially because it has a tone system distinct from the surrounding dialects and an inflectional personal pronoun system for 1/2/3pl that is much more similar to Siyi.
Essentially, my view is identical to the divisions in Language Atlas of China (but not the classification of certain lects), with the exception of placing Yong-Xun under Guangfu (since the Yong-Xun "features" are also found in a lot of modern Guangfu lects or historical dictionaries/rime books, and it is well known that Yong-Xun is descended from Guangfu) and splitting out Liangyang from Gao-Yang (Yangjiang data is not mentioned at all in the Atlas!), and perhaps also splitting out Guan-Bao and Xiangshan (according to 廣東粵方言概要), but I am uncertain as to their position within the tree.
Moreover, it would be splitting hairs when we go for the subgroups (小片), as research is often lacking beyond first level groups (even if there is research being done, often there is only one work to reference from).
Some further comments:
  • I think the usage of "Cantonese" among Yue lects should be relatively liberal - the general rule would be to apply it to any Guangfu lect and any dialect described as 白話, e.g. Qinzhou, Gaozhou, Nanning.
  • Agree with the use of yue for the whole branch and yue-can for Standard Cantonese (i.e. what we are currently using yue for).
  • Regarding the use of hyphen, it should be present when the name is a combination of two names. Goulou is named after the mountain of Goulou, so there shouldn't be a hyphen.
wpi (talk) 16:10, 23 May 2024 (UTC)Reply
Thanks, Kenny and Wpi. I generally agree with Wpi's points. Kenny's Gao-Lian/Gao-Lei should be at least two groups: Gao-Yang and Wu-Hua. I don't have a strong opinion on whether Gao-Yang should be split further. As for the structure of the tree, such as whether certain groups belong under certain groups, I feel like we can be agnostic and have them placed under Yue without thinking too much about the internal groupings; this would mean we could have Yong-Xun, Guan-Bao, Xiangshan, etc. as sisters to Guangfu unless we have really strong feelings about the grouping. Luo-Guang seems to be a very erroneous idea that we should not bother adopting at all. — justin(r)leung (t...) | c=› } 17:38, 23 May 2024 (UTC)Reply
Indeed, I should have emphasized that the tree above is not final, and I only posted it here to attract more discussion. Thank you for bringing that up.
I will talk about the Gao-Lian/Gao-Lei group here first and leave the other points to later replies.
  1. The "three-way contrast" is not as simple as it seems. The evolution of Middle Chinese stops in Wu-Hua is not consistent. According to 粤语“吴化片”商榷 (2016) by 邵慧君, Middle Chinese *b- became /pʰ/ in Wuyang, and in Huazhou it was distributed (irregularly) between /p/ and /pʰ/. Using Jyutdict I was able to verify this (see table below). Note how 婆 became /p-/ in Shangjiang and /pʰ-/ in Xiajiang, and 抱 is the other way round. According to the paper, *p- became /ɓ-/ in Wuyang just like in Huazhou, but even so, since *b- became universally /pʰ-/ in Wuyang, that would only be a two-way contrast. Of course, the "number" of labial plosives isn't the important point here, but rather "how" they correspond with Middle Chinese and with each other. The situation becomes even more complicated if we account for the influence of dominant languages in this area, and I believe that *b- > /pʰ-/ in Wuyang is the effect of Hakka.
    In summary, if you take *p- > /ɓ-/ as the defining feature of Wu-Hua, then it fails because it is not universal (even though you might attribute the remaining lects that have /p-/ as Cantonese influence); if you take the evolution of *b- instead, then it also fails because it is inconsistent between the lects.
  2. As for pronouncing 精 as dental, if you look at the map in 醉 in Jyutdict, you will find that indeed the four Wu-Hua languages recorded all have a dental /t-/. However, if you keep going up from there, you will find that the dental initials continue to Yulin (鬱林) of Goulou Yue, and then even to Wuzhou (梧州) of Gwangfu Yue. To the right, though disconnected, you will find that Taishanese and Kaiping (開平) of Siyi Yue also have a dental initial. Indeed, it is possible that the dental initial spread from Wu-Hua to Yulin, just like how the guttural "R" spread all throughout Europe. However, I don't see an argument of why it has to be genetic in Wu-Hua in the first place.
  3. According to the paper, Li Jian (李健) said that "鉴江源出粤西信宜市北部山区,南流经信宜、高州、化州、吴川四市入海。......整个流域粤语不但极为相似,而且南北渐变的痕迹也十分明显。" (paraphrase: the dialects of Xinyi, Gaozhou, Huazhou, and Wuchuan form a continuum). I don't think this observation can be attributed to a "lack of data". While the dialect in Gaozhou seems to me to be highly similar to Cantonese, I did find that interestingly the character 坐 has an /-ɛ/ final in Gaozhou and also in the Wu-Hua lects.
  4. As for the Liangyang group, I have not looked a lot into this, so I will take your side and assume that Liangyang should indeed form a group. However, this does not contradict my proposed Gao-Lei group, where there can simply be a Liangyang sub-branch. I do wonder though how you view the "inflectional personal pronoun system" you mentioned, which is "much more similar to Siyi". Do you think Liangyang split off from Siyi, or do you think Proto-Cantonese had such a system that was lost in other lects, or do you think this feature arose by contact between Liangyang and Siyi?
Character | Middle Chinese initial | Tone category | Zhanjiang (湛江) | Wuyang (吳陽) | Huazhou Shangjiang (化州上江) | Huazhou Xiajiang (化州下江)
? | *p- | level (平) | /pa/ | /pa/ | /ɓa/ | /ɓa/
? | *ph- | departing (去) | /pʰa/ | /pʰa/ | /pʰa/ | /pʰa/
? | *b- | level (平) | /pʰei/ | /pʰei/ | /pɛi/ | /pɛi/
婆 | *b- | level (平) | /pʰɔ/ | /pʰɔ/ | /pɔ/ | /pʰɔ/
抱 | *b- | rising (上) | /pʰoɐu/ | /pʰoɐu/ | /pʰɔu/ | /pɔ̯ɒu/
? | *b- | departing (去) | /pʰei/ | /pʰei/ | /ɓɛi/ | /pɛi/
? | *b- | entering (入) | /pʰaʔ/ | /pʰaʔ/ | /ɓak/ | /pak/
--kc_kennylau (talk) 19:53, 23 May 2024 (UTC)Reply
By the way, we have three Yue lects currently covered by zh-pron, which are Dongguan Cantonese, Yangjiang Yue, and Yulin Yue. (COI: I added them.) Should we have language codes for these three varieties? Something like yue-dgx, yue-yjx, yue-ylx? --kc_kennylau (talk) 14:58, 25 May 2024 (UTC)
(Addendum: we just removed Yulin Yue) --kc_kennylau (talk) 15:00, 25 May 2024 (UTC)Reply
(You mean in addition to the two lects that have been here longer, so actually a total of four Yue lects now.) — justin(r)leung (t...) | c=› } 15:12, 25 May 2024 (UTC)Reply
Just to help me understand the "lay of the land", are there papers that specifically group the dialects traditionally classified as Gao-Yang and Wu-Hua together? If so, what is the name they use for such a grouping? (From the way this was described above, it feels a little original-researchy, which we don't want to do.) — justin(r)leung (t...) | c=› } 15:20, 25 May 2024 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── (cc @Benwing2) After more discussion, @Justinrleung and @wpi have mostly agreed with the following tree (the codes are added by me):

  • Guangfu Yue (廣府片) yue-guf
  • Guan-Bao Yue (莞寶片) yue-gub
  • Xiangshan Yue (香山片) yue-xis
  • Yong-Xun Yue (邕潯片) yue-yox
  • Siyi Yue (四邑片) yue-siy
  • Liangyang Yue (兩陽片) yue-liy
  • Gao-Lei Yue (高雷片) yue-gal (defined as Gao-Yang in the Atlas minus Liangyang)
  • Wu-Hua Yue (吳化片) yue-wuh
  • Qin-Lian Yue (欽廉片) yue-qil
  • Goulou Yue (勾漏片) yue-gol

I also mostly agree with this, but I would just like to note that Guan-Bao, Xiangshan, and Yong-Xun (and likely Gao-Lei as well) are descended from Guangfu, and the last four (Gao-Lei, Wu-Hua, Qin-Lian, Goulou) branches are more areal than genetic. From what I can gather, the reason this structure is preferred over a more nested one is because currently all the genetic relationships are still not clear, as Justinrleung explained above.

I also don't know if some of the above branches should have "~ Cantonese" as an alias.

--kc_kennylau (talk) 13:30, 26 May 2024 (UTC)Reply

Agree with the above list of groups. For Wiktionary purposes, we would simply treat all ten of them as direct descendants of Yue without being specific on their relationship. (yue "Yue" would be a family)
On top of these I think we should have the following full code:
  • yue-can, "Cantonese", equivalent to (some of) the current use of yue, parent yue-guf
and the following etymology codes:
  • yue-gzh, "Guangzhou Cantonese", equivalent to existing yue-gua, parent yue-can
  • yue-hkg, "Hong Kong Cantonese", equivalent to existing yue-HK, parent yue-can
  • yue-tai or yue-hsv, "Taishanese", equivalent to some of the existing zhx-tai, parent yue-siy
The "Cantonese" suffix could be applied to (dialects of) Guangfu, Guanbao, Xiangshan, Yongxun, and other "Baihua" varieties such as Qinzhou and Gaozhou, all of which are often considered to be related to Standard Cantonese.
wpi (talk) 14:11, 26 May 2024 (UTC)Reply
Agree. --kc_kennylau (talk) 21:24, 29 May 2024 (UTC)Reply
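
As a rough illustration of the flat structure agreed above, the codes and parents can be written down as a simple parent map. This is a sketch in Python, not the format of Module:languages or Module:etymology languages/data; the names `PARENT` and `ancestors` are hypothetical, and `yue-tai` is used for Taishanese although `yue-hsv` was also floated.

```python
# Parent of each proposed code, taken from the lists above.
PARENT = {
    # The ten groups, all treated as direct children of the Yue family.
    "yue-guf": "yue", "yue-gub": "yue", "yue-xis": "yue", "yue-yox": "yue",
    "yue-siy": "yue", "yue-liy": "yue", "yue-gal": "yue", "yue-wuh": "yue",
    "yue-qil": "yue", "yue-gol": "yue",
    # Full code for Cantonese, under Guangfu Yue.
    "yue-can": "yue-guf",
    # Etymology-only codes.
    "yue-gzh": "yue-can",  # Guangzhou Cantonese
    "yue-hkg": "yue-can",  # Hong Kong Cantonese
    "yue-tai": "yue-siy",  # Taishanese (assumed code; yue-hsv was the alternative)
}


def ancestors(code):
    """Return the chain of parents from `code` up to the family code."""
    chain = []
    while code in PARENT:
        code = PARENT[code]
        chain.append(code)
    return chain


print(ancestors("yue-hkg"))
print(ancestors("yue-tai"))
```

Keeping the ten groups as siblings means the map stays valid even if the internal relationships (for example Yong-Xun under Guangfu) are revised later; only single entries would need to change.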

About Hawaiian Creole

According to Wiktionary:Language treatment, Hawaiian Creole (hwc) is not a separate language and should be treated as English; however, Category:Hawaiian Creole language still exists with over 50 lemmas. Shouldn't it have been merged into English? Protegmatic (talk) 18:34, 10 May 2024 (UTC)

Yes, but there's a world of difference between deciding that something ought to be done and someone actually getting off their ass and doing it. —Mahāgaja · talk 21:11, 10 May 2024 (UTC)Reply
I find that WT:Language treatment is often hopelessly out of touch with how we actually treat languages. It should reflect our actual practice, not the other way around. Theknightwho (talk) 00:22, 27 June 2024 (UTC)Reply
Okay, it was unmerged back in 2020, so the definitive answer is that no, the entries should not be merged: Wiktionary:Language treatment requests/Archives/2020-24#HWC Recognition (take 2). Theknightwho (talk) 00:29, 27 June 2024 (UTC)Reply

Manipuri vs Meitei language

I propose we change it to Meitei as the language is predominantly spoken by the Meitei people. Meitei is not the only language indigenous to Manipur. There are other ethnic groups in Manipur who speak different languages. So there are many Manipuri languages, Meitei is only one of them. 178.120.0.250 10:40, 9 May 2024 (UTC)Reply

FWIW; this is about renaming what we call Manipuri to Meitei. I told the IP to come here, but in hindsight, perhaps WT:RFM would be a better venue.
At least the English Wikipedia seems to use Meitei as the primary name for the language. — SURJECTION / T / C / L / 11:10, 9 May 2024 (UTC)Reply
Sure, btw you can call me 178 if you want. It's a bit more specific. 178.120.0.250 11:31, 9 May 2024 (UTC)Reply
Yes, WT:RFM is the usual place for discussions about renaming languages. —Mahāgaja · talk 13:34, 9 May 2024 (UTC)Reply
I oppose the proposal as it is unneeded; the rename neither adds nor removes anything valuable. There aren't any active editors in the language, and if such a user comes along and finds a problem with the name, he will naturally point it out and the discussion will be fruitful. Discussing it now will only waste time, given that in this case the current name is obviously not obstructive. Word0151 (talk) 14:42, 9 May 2024 (UTC)
Support seems like Wikipedia already changed the name. Not that we need to match Wikipedia, but if they changed it and the only interested editors here wanna change it too... why not? — Sameer مشارکت‌هابحث﴿ 15:52, 9 May 2024 (UTC)Reply
FWIW:
  • Google Ngrams shows "Manipuri language" having about 4x the usage of "Meitei language" and over 12x the usage of "Meithei language" in the most recent year (2019).
  • Wikipedia says that "Meitei" is now used by most Western scholars, although it's sourced to a single source (Chelliah), so take it with a grain of salt.
  • Wikipedia says that Indian government sources and the Indian constitution call it Manipuri, which is probably easily verifiable.
  • Ethnologue calls it "Meitei".
  • Glottolog calls it "Manipuri".
  • "Meitei" is closer to the endonym for the language.
  • As for Wikipedia's name choice, this happened in 2016 or earlier, and there is debate on the talk page about whether to call it Meitei or Manipuri, with the people in favor of Manipuri claiming it is the common name in English.
Benwing2 (talk) 08:36, 13 May 2024 (UTC)Reply

Adding Proto-Micronesian

I don't edit Austronesian languages, but it seems sensible to decide whether we should add Proto-Micronesian (poz-mic-pro), since we've had the entry Reconstruction:Proto-Micronesian/faasa since February, which currently has to use the language code und. I note that Proto-Micronesian seems to be notable enough to warrant its own Wikipedia article, which sources the reconstruction, so my uninformed view is that we should add it. Theknightwho (talk) 20:34, 10 June 2024 (UTC)Reply

Added, given no objections and a need for a code. Theknightwho (talk) 23:16, 26 June 2024 (UTC)

Please help to sort out Scandoromani

See also: #Merger into Scandoromani

Lattjo dives! I have started to make some more Scandoromani entries, and there are 4 main problems I need advice on before I can go on.

Problem 1. As far as I understand, Tavringer Romani is Swedish Scandoromani, also known as Traveller Swedish. Tavring is not something exclusively Swedish, and we already have Traveller Norwegian. Might it be a good idea to rename Tavringer Romani to Traveller Swedish? Anyway, there's almost no difference between TS and TN, so might it be an even better idea to merge them into one L2 (Scandoromani)? See also the related problem number 4 about Månsing.

"orthographies are consistently different, which seems to be the case", said Theknightwho once about this problem. But is that really a good reason?

Problem 2. A more serious one. Some of my first edits on Wiktionary were in Scandoromani, and back then I was dumb enough not to include sources in most of the entries I created. Now many of my sources are completely gone from the internet. I also remember that some entries (I don't remember exactly which) are not even from sources; I created them together with my former neighbor, an old drunk guy who spoke the language. I mean, I checked them in dictionaries and found them, except for some, and now I don't remember exactly which ones, and some of the dictionaries are gone.

Dictionaries I remember but cannot find: an old web 1.0 Norwegian website with a black background; a long English PDF in an ugly monospaced font comparing Scandoromani and Kalo; a scan of an old Swedish book with big fat letters.

Problem 3. What is the "Tavringer Romani terms in nonstandard scripts" category? The script is unspecified, so why is this category coming up?

Problem 4. What to do with Rodi and Månsing? They are jargons of Swedish and Norwegian, so how should we refer to them? I usually refer to them as jargons, using the code "sv" (Swedish) and specifying that it's also used in Norwegian. I hope it's OK to do so. Otherwise, we may need them as independent L2s. Tollef Salemann (talk) 19:42, 15 June 2024 (UTC)

Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan

(moved from Wiktionary:Beer parlour/2024/August#Glottonym tweaks: Franco-Provençal, Venetian → Francoprovençal, Venetan)

These changes would bring Wiktionary in line with the naming conventions of modern English scholarship, as found in for instance the Oxford Guide to the Romance languages (2016).

Context:

  • Francoprovençal has been the name used in French scholarship since the 1970's. Removing the older hyphen lessened the misleading impression that the language is some sort of secondary blend of French and Provençal (Occitan). There is also an element of typographical convenience.
  • Veneto has always been the name used in Italian scholarship, if I'm not mistaken, with Veneziano predominantly or exclusively reserved for the varieties spoken in Venice and environs, as opposed to the rest of the Venetan domain (Ve1, Ve3‒7).

Nicodene (talk) 22:05, 9 August 2024 (UTC)Reply

Support, the Venetan proposal in particular has been a long-awaited change, and given that part of modern Anglophone scholarship handles this sensibly, we have little reason to stay behind. Catonif (talk) 22:15, 9 August 2024 (UTC)
Support. Never heard of Venetan but if this is the accepted term, so be it. Benwing2 (talk) 07:40, 10 August 2024 (UTC)Reply
Thoughts, @Apisite, IvanScrooge98, Samubert96, Sartma, Ultimateria, Urszag, Word dewd544?
(Active users who speak Venet[i]an or have contributed to its entries.)
Nicodene (talk) 20:52, 13 August 2024 (UTC)Reply
Thanks for pinging me. I am pretty indifferent to the hyphen question for Francoprovençal, while I am not fully convinced about Venetan; after all, Venetia is the anglicized name for the region of Veneto (if the linguistic reasoning is to distinguish the specific dialect of Venice from the language as a whole). But if Venetan is now most common in English-language professional literature, then I don’t think there is much to debate. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 21:21, 13 August 2024 (UTC)Reply
The region's name occurs ~15 times more often in English as Veneto than Venetia, according to a Google search for “region of ____” (119000 results versus 7960). The latter occurs generally in historical as opposed to modern contexts.
Also at the moment we have no (reasonable) way to indicate a term used in Venice proper, as opposed to, say, Padua. A dialect label like Venetian would be identical to the name we currently use for the overall language (contra, as mentioned, the name used in linguistics). Nicodene (talk) 22:05, 13 August 2024 (UTC)Reply
Yeah, as I said, I get the reasoning. The thing is that Venetian, despite most commonly being a word for stuff from Venice specifically, is not a strictly technical term like Venetan is, which strikes me as a bit off given that this project is directed not at linguists but at the general public. And we could still label entries from the dialect of Venice as Venice, Venice dialect, Venice Venetian or something along those lines. But, again, it doesn’t mean I strongly oppose changing Venetian to Venetan. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 22:19, 13 August 2024 (UTC)Reply
The general public in Italy would be surprised to hear the dialect of, say, Padua described as veneziano. E.g. on Italian Wiki Dialetto padovano redirects to this page, where veneziano is mentioned solely as an external entity: “le parlate dei centri più importanti…sono state influenzate dal veneziano”.
So this is more about the general public of English-speaking countries, which isn't aware that such a language exists, as opposed to a local variety of (Standard) Italian. Nicodene (talk) 23:00, 13 August 2024 (UTC)Reply
Fair enough. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 23:09, 13 August 2024 (UTC)Reply
How do you pronounce "Venetan"? Benwing2 (talk) 23:20, 13 August 2024 (UTC)Reply
For me it's /ˈvɛnətən/ < /ˈvɛnətəʊ/ (≈Italian /ˈvɛneto/) + /-ən/. Nicodene (talk) 23:31, 13 August 2024 (UTC)Reply
@Benwing2: I would rather pronounce the term as /ˈvɛneɪtʌn/. --Apisite (talk) 10:49, 14 August 2024 (UTC)Reply
Support If we are not going to have separate h2 for the main dialect groups of the Venetan language, then we must go for Venetan. As @Nicodene said, Venetian is the dialect of Venetan spoken in and around Venice. For instance, Paduans, Vicentines and Trevisans speak Paduan, Vicentine and Trevisan respectively, not Venetian. — Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 15:27, 15 August 2024 (UTC)Reply
@Benwing2 Shall we go ahead, then? Nicodene (talk) 18:00, 22 August 2024 (UTC)Reply
@Nicodene I'm finally getting around to this. For reference, here is (I think) the correct way to rename a language (e.g. "Venetian" -> "Venetan"):
  1. First, list all the categories in Wiktionary (this takes a little while as there are ~ 1,000,000 categories and the listing is only 5,000 per second). Then find all the categories containing the word "Venetian", e.g. using python3 list_pages.py --namespaces Category (it is not sufficient to use the prefix-listing functionality to list categories starting with "Venetian" because other categories contain "Venetian" elsewhere than at the beginning). Use this list to generate a list of category renames to supply to a script such as my rename.py script.
  2. Then, download the latest dump file from https://dumps.wikimedia.org/ (beware, it may be up to 20 days out of date) and search through it for all occurrences of 'Venetian' (e.g. like this: bzcat enwiktionary-20241001-pages-articles.xml.bz2 | python3 find_regex.py -e '^.*Venetian.*$' --all --stdin > find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1).
  3. Then, change the name in the language module itself (e.g. Module:languages/data/3/v for 'vec' = Venet(i)an), then regenerate the code <-> canonical name caches by going to Module:languages/code to canonical name and clicking on the Update button.
  4. Then, rename the categories containing the old name, using the script input created in step #1. You want to do this soon after renaming the language itself. It should follow the language rename rather than precede, so that when each page gets regenerated as it's renamed, the {{auto cat}} regeneration succeeds.
  5. Then, rename the language in the header of the lemmas and non-lemma forms, e.g. like this: python3 rewrite.py --from '==[ \t]*Venetian[ \t]*==' --to '==Venetan==' --cats 'Venetian lemmas,Venetian non-lemma forms,Venetan lemmas,Venetan non-lemma forms' --diff --track-seen --comment 'rename Venetian language headers to Venetan per [[Wiktionary:Language_treatment_requests#Glottonym_tweaks:_Franco-Provençal,_Venetian_→_Francoprovençal,_Venetan]]' --save > rewrite.venetan-venetian-lemmas-non-lemma-forms.venetian-to-venetan.out.1.save. This should follow the category renames so that e.g. the new categories don't end up in Category:Empty categories. Note that we loop over both "Venetian" and "Venetan" lemmas and non-lemma forms (the latter last) so that we get any terms that were regenerated and moved categories between this step and the previous one, or while this step is in progress.
  6. Then, rename the language in references to it in various places (especially but not exclusively in translation sections), using the output of step #2 as a guide. To do this, download the pages containing the word "Venetian", something like this: python3 find_regex.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1) -e 'Venetian' --text > find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig. Copy the file, e.g. cp find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1. Edit the latter file appropriately to change all occurrences of Venetian to Venetan that need to be changed. Push the changes using e.g. python3 push_find_regex_changes.py --direcfile find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1 --origfile find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.orig --comment 'Venetian -> Venetan per [[Wiktionary:Language_treatment_requests#Glottonym_tweaks:_Franco-Provençal,_Venetian_→_Francoprovençal,_Venetan]]' --diff --save > push_find_regex_changes.find_regex.find_regex.enwiktionary-20241001-pages-articles.xml.bz2.Venetian.out.1.Venetian.out.1.out.1.save.
Benwing2 (talk) 05:53, 15 October 2024 (UTC)Reply
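For readers following along, two of the steps above (finding categories that mention the old name anywhere in the title, and rewriting the L2 headers) can be illustrated with a minimal Python sketch. This is not the rename.py or rewrite.py script mentioned above; the function names candidate_category_renames and rename_l2_headers are hypothetical, the header pattern is anchored to whole lines as an extra assumption, and the category titles are sample data. The output is only a list of candidates: a category such as "Venetian Italian" would still need manual review.

```python
import re


def candidate_category_renames(category_names, old, new):
    """Return (old_title, new_title) pairs for titles containing `old` as a word.

    Hypothetical helper: matches the name anywhere in the title, not only as a prefix.
    """
    word = re.compile(r"\b" + re.escape(old) + r"\b")
    return [
        (name, word.sub(lambda m: new, name))
        for name in category_names
        if word.search(name)
    ]


def rename_l2_headers(wikitext, old, new):
    """Rewrite level-2 headers such as '== Venetian ==' to '==Venetan=='.

    The pattern is anchored to whole lines (an assumption of this sketch),
    so level-3 headers like '===Venetian===' are left untouched.
    """
    header = re.compile(
        r"^==[ \t]*" + re.escape(old) + r"[ \t]*==[ \t]*$", re.MULTILINE
    )
    return header.sub(lambda m: "==" + new + "==", wikitext)


if __name__ == "__main__":
    # Sample titles for illustration only.
    cats = ["Venetian lemmas", "Terms derived from Venetian", "Italian lemmas"]
    for pair in candidate_category_renames(cats, "Venetian", "Venetan"):
        print(pair)

    page = "==Italian==\n\n===Noun===\n\n== Venetian ==\n\n===Noun===\n"
    print(rename_l2_headers(page, "Venetian", "Venetan"))
```

Running a real rename would still go through the bot scripts and ordering described in the steps above; the sketch only shows the string matching involved.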
@Benwing2 Nice! Thank you for your work, this is a good day. :) Catonif (talk) 21:09, 15 October 2024 (UTC)Reply
@Catonif Thank you! @Nicodene I tried to find all the remaining instances of Venetian that should be changed to Venetan, but some I'm not sure about, e.g. the "Venetian" dialect of Italian (should that be "Venetan"? is this actually referring to the Venetan language?). The remaining instances are here: User:Benwing2/venetian-to-venetan Please look over them and change any pages needing changing. Thanks! Benwing2 (talk) 21:17, 15 October 2024 (UTC)Reply
@Benwing2 I went through that list, only a few needed to be changed, very well done! By the Venetian dialect of Italian, do you mean CAT:Venetian Italian? That's fine, it is the regional Italian of the city of Venice. Catonif (talk) 21:51, 15 October 2024 (UTC)Reply
@Catonif Thank you! Yes, I was referring to that category. Benwing2 (talk) 21:54, 15 October 2024 (UTC)Reply
Thank you. I had no idea the process was so complicated.
I’ve gone through the list and made one correction. The other cases were already addressed by Catonif. Nicodene (talk) 22:29, 15 October 2024 (UTC)Reply
@Nicodene Also, I'd like to get more input before renaming 'Franco-Provençal' -> 'Francoprovençal'. No one above commented on this change, and the Wikipedia article on the language (which has a hyphen in it) says this:
Although the name Franco-Provençal appears misleading, it continues to be used in most scholarly journals for the sake of continuity. Suppression of the hyphen between the two parts of the language name in French (francoprovençal) was generally adopted following a conference at the University of Neuchâtel in 1969; however, most English-language journals continue to use the traditional spelling.
Benwing2 (talk) 21:29, 15 October 2024 (UTC)Reply
It seems roughly 50/50 in English, judging by results from the last few years in Google Scholar. There doesn’t seem to be an official spelling in English, but there is one in both French and Italian (in both cases without the hyphen). The closest thing to an official English spelling that I could imagine is the one preferred by Oxford University, which is more or less the “capital” of anglophone scholarship in Romance Linguistics. Nicodene (talk) 22:49, 15 October 2024 (UTC)Reply
@Benwing2: In your edits here and here you changed Venetian to Venetan, citing this discussion. The thing is that you changed the names of external Wikimedia projects, which are still the "Venetian Wiktionary" and "Venetian Wikipedia" regardless of the spelling convention we use in our own entries. So I'm not sure those edits are worthwhile. Ioaxxere (talk) 23:06, 15 October 2024 (UTC)Reply
@Ioaxxere Oops, I didn't realize those are external links. Please undo them, thanks! Benwing2 (talk) 23:12, 15 October 2024 (UTC)Reply
OK went ahead and did this. Benwing2 (talk) 23:13, 15 October 2024 (UTC)Reply

Rename wca from Yanomámi to Yanomam

I suggest we rename wca Yanomámi → Yanomam.

Our current name for this language (Yanomámi) is extremely confusing, given that its close relative guu, which we call Yanomamö, is also commonly called Yanomami (with or without various diacritics). In addition, the language family to which both of these languages belong is also called Yanomami, even by us (cf. Category:Yanomami languages). (The accent mark on Yanomámi is irrelevant; it may be present or not in any of these uses, so it doesn't help in distinguishing one from the other.)

Current practice in the academic literature is to call wca Yanomam, avoiding this confusion. See Helder Perri Ferreira, Yanomama Clause Structure, page 6: 'To avoid confusion then, the following terms are used in this thesis: [] Yanomam = either refers to a language of the Yanomami family or to its speakers. It corresponds to what Ramirez (1994a: 35) called the “Oriental super-dialect of Yanomami” or “Oriental Yanomami” (Yor). Migliazza (1972: 34) calls this language “Yanomam” as well.' Glottolog uses the similar term Yanomám; see here. Jacques Lizot's work tentatively follows Migliazza and also labels the variety as Yanomam, as does the Endangered Languages Project; see here. 'Yanomam' seems by far the most common designation for this lect in the current literature; it would make sense to rename the language accordingly. — Vorziblix (talk · contribs) 14:21, 28 August 2024 (UTC)Reply

Since I am intending to do some work with this language in the immediate future, I’m going to go ahead and make this change now to avoid having to make many more changes down the line. If there end up being any objections to the move, we can still discuss and undo the change then if needed. — Vorziblix (talk · contribs) 13:27, 3 September 2024 (UTC)Reply

East Lechitic typology

  • Relevant Wikipedia articles:

In this thread I would like once and for all to try and determine what should be and what shouldn't be an L2 on en.wiktionary based on linguistic, technical, and other criteria.

It's no secret that, when dealing with dialect clusters and groups, it can be a headache to determine all of this.

When it comes to Lechitic, the West (Polabian)/North (Pomeranian)/East isn't even a strong grouping anyway; much of East Lechitic didn't even undergo the so-called Lechitic ablaut (some linguists argue that it was later levelled, some argue it never took place), and Old Polish, like many other "Old" languages, is not a single language, but rather a group of dialects with varying phonological features and changes that can be shown to go to a single etymological form, even if that form wasn't omnipresent across all the lects it represents (for example Masuration is a very early change).

The lects in question are Silesian, Masurian, and Goral.

In terms of linguistics, Silesian doesn't differ from other dialect groups as much and shares much in common with Greater Polish and Lesser Polish. However, it has undergone a huge standardization recently, and the socio-linguistic aspect of all this cannot be ignored, either. In terms of technical aspects on Wiktionary, there's not much that it needs that is special, to be honest, but I feel its status as an L2 is fairly safe. I mention this for later points and for context. Mutual intelligibility between Silesian and Polish can vary vastly - depending on the vocabulary used it may be intelligible or not, typical of other Slavic languages.

Masurian was split initially for being incredibly divergent from Polish. However, it shares far more with some neighboring dialects such as Kurpian, and mutual intelligibility between Masurian and Polish is limited. Even when more common vocabulary is used, it can be difficult to understand, and a large number of everyday terms differ either in etymology or by a significant number of phonemes. I feel the Appendix:Masurian Swadesh list demonstrates this well (Appendix:Polish Swadesh list for reference). As far as the orthography goes, Masurian is not a widely spoken lect, so levels of normalization within the culture are not high, and neither is its daily usage. It could be possible to normalize to a Polish orthography with a few additions (namely áéóôû, which we are going to need for other dialects anyway; an explanation can be found at w:Dialects of Polish). In terms of technical aspects, many Masovian dialects, such as Kurpian, might need similar support, such as a different declension module, as many more consonant alternations exist due to the decomposition of soft bilabials (i.e. budowa > budozie). Its status as an L2 is debatable.

Goral sits in between Silesian and Masurian in most regards. Culturally, it is one of the most spoken dialect groups (itself being a dialect group WITHIN the Lesser Polish dialect group, but the number of differences between dialects here is smaller than between other dialects within a dialect group) and its mutual intelligibility with Polish is much like that between Silesian and Polish. Depending on the vocabulary used as well as the "thickness" of the speaker's accent, mutual intelligibility can vary wildly. In terms of orthography, pagenames would differ about as much as some other dialect entries. What I mean is that in Middle Polish you had so-called "slanted vowels" (áéó), which all developed differently in different dialects, as well as w:Masuration. Goral would be spelled on the whole very similarly to other Lesser Polish dialect words, so lekarz would be lykorz for both groups. In terms of technical support, it would also need new declension templates, but it could be handled using most of the same infrastructure as the rest of the Polish dialects. However, one big difference is that many Goral dialects have initial stress, which stands in stark contrast to the rest of East Lechitic, where stress is penultimate.

Solutions:

  1. Split all. Keep Silesian and Masurian split and split Goral as well, setting it as a descendant of Old Polish.
  2. Status quo. Keep Silesian and Masurian split and do not split Goral.
  3. Remerge Masurian. Silesian remains an L2, and Masurian and Goral would be dialects of Polish.
  4. Remerge all.

I personally can see the first three options, or more specifically options 1 or 3. I'm strongly against merging Silesian, and I suspect most people here would be as well, but I am placing the option here for the sake of completeness. I have already set Polish dialects as LDLs on WT:About Polish, so questions of attestation can be put aside.

I am opting to leave out anything about Old Polish and Middle Polish here. Vininn126 (talk) 12:56, 1 September 2024 (UTC)Reply

I would prefer option 3. Almost no language is homogenous, and we can't endlessly split, we need to stop somewhere; I think written language is the most important thing for languages in (western) Eurasia: I'm pretty sure an average Masurian speaker will not see Standard Polish as a language separate from the one they write in day-to-day, and will have little problem to encode their variety in written Polish to a satisfactory degree. You can write a word like ony and pronounce it as /ónÿ/ without much of a problem. You can write /ôwtén/ as owten (which is probably attestable by the way!) and show you're a dialectal speaker. Just in the same way Finnish speakers write Finnish, Scots write Gaelic and Italians write Italian. Thadh (talk) 13:29, 1 September 2024 (UTC)Reply
Second idea: (Notifying KamiruPL, BigDom, Hythonia, Tashi, Sławobóg, Silmethule, Rakso43243, Skerillion): @Benwing2, @PUC, @Thadh How would you feel about having etymology codes for the major dialect groups? We already have one for Middle Polish, which has been very useful. I could see it being very useful to have, for example, pl-GP for Greater Polish, pl-LP for Lesser Polish, pl-MS (or something similar) for Masovian, maybe pl-BOR or something for both Borderlands (but I'm not sure we need that one) and also potentially pl-gor for Goral. This would be very useful for etymologies, as each group has different tendencies for borrowing and a different relation to other languages; some Greater Polish dialects, for example, have some vocabulary in common with Kashubian. Vininn126 (talk) 10:21, 12 September 2024 (UTC)Reply
Only for those that have an actual demonstrably significant number of borrowings into another language that set them apart from Standard Polish or other groups. Thadh (talk) 10:26, 12 September 2024 (UTC)Reply
This would be fairly easy to do if we consider dialectal borrowings - (dialectal) Prussian German often borrowed from Masovian dialects, Slovak dialects often borrowed from Goral/Lesser Polish. Greater Polish most assuredly gave certain words in dialects of Kashubian. I'm fairly sure we could find examples of non Standard Polish words for each, and the given lects mentioned are unlikely to have borrowed from other dialect groups. Vininn126 (talk) 10:31, 12 September 2024 (UTC)Reply
@Vininn126 I just received your "Second idea" ping: 4 days late. No objections to adding etym codes for the major dialect groups, but they should follow the standard etym code notation, hence pl-gre for Greater Polish, pl-les for Lesser Polish, pl-mas maybe for Masovian, pl-bor for Borderlands, pl-gor for Goral. Benwing2 (talk) 22:44, 15 September 2024 (UTC)Reply
And I assume you are for option 3 in the first. Vininn126 (talk) 05:21, 16 September 2024 (UTC)Reply
Yes, not strongly though; I trust whatever you think is best. Benwing2 (talk) 05:28, 16 September 2024 (UTC)Reply
Okay, I think everyone who's going to say something has said their piece. I have tried asking everyone for their opinion. The decision is: remerge Masurian, don't split Goral, and don't merge Silesian. Greater Polish, Lesser Polish, Masovian, and Goral will get their own etymology codes. I can implement this starting this week. Vininn126 (talk) 17:48, 25 September 2024 (UTC)Reply

Paraguayan Guaraní (again)

Guaraní is a mess. Its problems include a broken pronunciation module, dozens of conjugation templates with no documentation of what they are for[7][8] and a complete lack of references, but the worst one is the language codes. Currently, we have codes for both Guaraní (gn) and each one of its "varieties": Chiripá (nhd), Classical Guaraní (gn-cls), Eastern Bolivian Guaraní (gui), Mbyá Guaraní (gun), Paraguayan Guaraní (gug) and Western Bolivian Guaraní (gnw). All of these are treated as distinct languages with their own L2 heading, which raises the question: if we have a heading for each variety, what even is Guaraní? Looking through the lemmas, it seems to be a duplicate of Paraguayan Guaraní, an issue that was already raised seven(!) years ago, with no consensus on changing anything. Also, Classical Guaraní is currently listed as a descendant of Guaraní and a sister language of Paraguayan Guaraní, which is not ideal.

My proposal is:

  • Making gn a family code, similarly to Tupi-Guarani (tup-gua), putting Classical Guaraní as the ancestor of Paraguayan Guaraní and moving everything from the Guaraní L2 to Paraguayan Guaraní.
    • The position of the ancestor is still not clear to me, though. To my understanding, what Wiktionary calls Classical Guaraní is the language used in the 17th-18th century Jesuit missions of Paraguay, Argentina and South Brazil. It's the ancestor of Paraguayan Guaraní for sure, but its relation to Mbyá and Chiripá is not well explained, and authors just calling everything "Guaraní" doesn't really help...
  • Another way would be doing the opposite: merge everything into gn, make Classical Guaraní its ancestor and use {{lb|gn|x}} for the different varieties. This would be especially counterproductive because we would end up merging Mbyá and Paraguayan, and they certainly aren't the same language. The problem is aggravated by Mbyá having a different spelling that uses X instead of CH.

Taggin' the only active Guaraní editors I know @RodRabelo7, Ovey 56 and @Theknightwho who seemed interested :p. Trooper57 (talk) 17:27, 13 September 2024 (UTC)Reply

Thanks! Finally someone spoke about it! Yes, the Wiktionary pages on Guarani are certainly a mess, but I'd say I like your first proposal more, since the Guarani varieties are already considered by some to be different languages.
Just one thing on the language used by the Jesuits in their missions: it was its own language, just like Jesuitic Nahuatl (I don't remember the language's official name).
I've wanted for so long for Guarani to be recognized as a group of languages rather than a single language with many different dialects, not only because of the differences in their vocabulary, pronunciation and intelligibility with one another, but because the contemporary Guarani peoples do not consider the group of varieties a single language.
I totally agree on editing the pages to show they are different languages, as well as changing the automatic name that pops up when the code "nhd" is used. It should be either "Nhandeva", "Yandeva" or "Nandeva", since "Chiripa" is an outdated term that some Yandeva people consider derogatory/insensitive. Junior Santos (talk) 13:17, 14 September 2024 (UTC)Reply
Interesting, so Classical and Paraguayan Guaraní were actually spoken at the same time, with the first being like a "formal" version used by the Jesuits?
And I think the categories were created when these names were still in use lol, most of the Tupian languages have been left untouched for years. The Kaapor don't seem fond of "Urubu", too. Trooper57 (talk) 14:52, 14 September 2024 (UTC)Reply
Also pinging @Rodrigo5260 who commented on the issue on Discord. Trooper57 (talk) 14:54, 14 September 2024 (UTC)Reply
  • Thank you, Trooper57, for pinging me into this discussion. First of all, I would like to mention that I have indeed noticed this mess with the Guarani entries. I have worked on some (Paraguayan) Guarani entries, and from my experience, almost all Guarani (gn) entries are actually Paraguayan Guarani (gug). However, since it's more common to see just Guarani, I opted to record them that way... I agree with the question: if we have a code for each variety, what is actually Guarani? I must admit that I only know the differences between (Paraguayan) Guarani, the Mbyá, and the Kaiwá (to which I recently added some entries, such as yrygwasu). I am less familiar with the other varieties. Regarding Classical Guarani (I prefer the term Old Guarani, by analogy to Old Tupi), this is the origin of (Paraguayan) Guarani, Mbyá, and Kaiwá, at the very least. Old Guarani is to these varieties what Old Tupi is to Nheengatu, for example. I also note that I have created the very first entries for Old Guarani, such as cabayu and ĭgaratá. What to do? I'm not sure yet, but I would like others to share their ideas. By the way, it would be interesting if we could gather at least one dictionary for each variety to get a better idea of what we are dealing with. I have a dictionary for (Paraguayan) Guarani, Mbyá, Kaiwá, and, of course, Montoya's vocabulary of the so-called Classical Guarani, Tesoro de la lengua guaraní. RodRabelo7 (talk) 04:05, 15 September 2024 (UTC)Reply
    Oh, and I'd support removing the diacritic from "Guaraní". "Guarani" is way better... RodRabelo7 (talk) 04:08, 15 September 2024 (UTC)Reply
    About the last part, I haven't found any dictionaries yet, but there's some Eastern Bolivian Guaraní vocab in this pdf by UNIBOL Guarani. Trooper57 (talk) 16:22, 15 September 2024 (UTC)Reply

Add Guachí

Guachí is an extinct language known to have been spoken in Argentina in the 19th century; the only record is a word list of 145 words, from 1845. Apparently, it's usually classified as Guaicuruan, but WP says the data is insufficient to demonstrate that. For reference, we already have Appendix:Guachí word list. Theknightwho (talk) 14:18, 17 September 2024 (UTC)Reply

Hi, in the future I'd recommend not adding a language, even if you want to, when no one has replied to your suggestion to add it within 10 days. In general you need at least one other person to look over and agree with your suggestion. Please don't take silence as consent. In this case you should have pinged User:-sche, who can give you thoughts. I'm personally a bit skeptical as to whether a single word list is enough data to indicate even that it's a separate language as opposed to either a dialect of an existing language or a mishmash of randomly collected words. Benwing2 (talk) 10:19, 28 September 2024 (UTC)Reply
Same thing goes for Kalašma, which you recently added with a similar "silence = consent" assumption. Benwing2 (talk) 10:20, 28 September 2024 (UTC)Reply

Changing the canonical name of kla from "Klamath-Modoc" to "Klamath"

Wiktionary's canonical name for the language kla, spoken by the Klamath and Modoc peoples, is currently "Klamath-Modoc", which reflects the fact that the two peoples spoke different dialects. I propose that it be renamed "Klamath", which is the name that sources discussing the language predominantly (though not universally) call it.

  • The Klamath Tribes themselves call the language "Klamath". (The Modoc Nation could conceivably have a stake in the language being called "Klamath-Modoc", but I can't find any references to the language by name on their website.)
  • Most of the academic literature I can find about the language identifies it as "Klamath". In particular, the works of Albert S. Gatschet and M. A. R. Barker, who each produced by far the most extensive and most cited documentation of the language, call it "Klamath".
    • The search string "Klamath language" yields significantly more results in both Google Scholar and JSTOR than the string "Klamath Modoc language".
  • The English Wikipedia article for the language has been titled "Klamath language" since 2011. Also, almost all sources in that article's bibliography refer to the language as "Klamath".

(In the interest of a fully informed discussion, it's worth noting that the following sources use the name "Klamath-Modoc": SIL International, Ethnologue, Glottolog, OLAC, and the California Language Archive.)

— Äþelwulf (talk) 20:56, 24 September 2024 (UTC)Reply

Is there anything I can do to elicit input on this matter? — Äþelwulf (talk) 20:19, 15 October 2024 (UTC)Reply
@Athelwulf Maybe ping User:-sche, who is often involved in these discussions? -sche, can you ping anyone else who you think might have relevant comments? Benwing2 (talk) 21:20, 15 October 2024 (UTC)Reply
BTW the fact that both Ethnologue and Glottolog use the name "Klamath-Modoc" is significant, although not decisive. Benwing2 (talk) 21:22, 15 October 2024 (UTC)Reply
You are right that "Klamath" is the more common term, and although it is hard to be sure how many uses of it mean the language [that encompasses both 'Klamath' and 'Modoc'] and how many mean the dialect ("Klamath-Modoc" is arguably clearer about the scope), probably our preference for using the most common name should lead us to use Klamath here.
It is interesting that there are almost no uses of the native name. ("Klamath" is derived from the Upper Chinook designation for all the natives of the Klamath River Basin, including the Klamath and Karuk and Shasta and Yurok; Modoc is at least [a clipped rendering of] a Klamath-Modoc word for that variety; and Victor Golla, in California Indian Languages (2022), page 135, notes that after "Gatschet used 'Klamath' as the specific ethnographic name for the Indians of the reservation on Upper Klamath Lake and for their dialect of Klamath-Modoc, [...] this usage soon became standard among anthropologists [but] there was [initially] reluctance, however, to extend the term to the Modocs, who had been treated as a separate tribe since the Modoc War of 1872-1873 and their subsequent removal to Oklahoma.") - -sche (discuss) 21:26, 21 October 2024 (UTC)Reply

Ancestor of Azerbaijani

Hello, I wrote Wiktionary entries for Azerbaijani in the Azerbaijani Abjad (Turco-Perso-Arabic alphabet), but some other Azerbaijani users revert all my edits because the words are "too old for Azerbaijani". I constantly run into these rollbacks of entries written in the Azerbaijani Abjad, with the explanation "this word does not exist in modern Azerbaijani". This is because the ancestor of Azerbaijani is not properly defined in Wiktionary, or rather it is defined as Old Anatolian Turkish, which is too ancient an ancestor. For comparison, the ancestor of Turkish (of the Turkish Republic) is given as Ottoman Turkish and then Old Anatolian Turkish. This is logical, since Ottoman Turkish was used until the 1920s, and it completely solves the problem in the case of Turkish. There is no such solution for Azerbaijani: its ancestor is given in Wiktionary as Old Anatolian Turkish, which was used until the 14th century at the latest. Azerbaijani therefore has no ancestor covering the period from the 15th century to the early 1920s (according to various sources, modern Azerbaijani can be dated from 1922-1923, when the USSR occupied Azerbaijan, or from 1928, when the USSR switched Azerbaijani to the Latin alphabet).
However, historically the ancestor of Azerbaijani was Ajami Turkish (trk-ajm, "Turkish of Persia"), the language of the Qajars, Afshars, Qizilbash and Qashqai. It is also the ancestor of the Iraqi Turkmen and Sonqori languages, and possibly of Khorasani Turkish and Khalaj. For example, in The Turkic varieties of Iran (page 406), Christiane Bulut says that from the 16th century on the written language for these languages was Ajam Turkic. It is a good term. I could write Azerbaijani entries in the Abjad alphabet under this language so as not to run into restrictions, but as I understand it this is not possible at the moment. Please help me with this issue, since I have a lot of literature and I want to create pages for these words, but I encounter restrictions from other users.

At the moment the Azerbaijani language page says that Azerbaijani comes from:

  • Proto-Turkic
  • Proto-Oghuz
  • Old Anatolian Turkish

but it should be

  • Proto-Turkic
  • Proto-Oghuz
  • Old Anatolian Turkish
  • Ajami Turkish

Please create the language and category for Ajami Turkish (https://www.wikidata.org/wiki/Q110812703) and make it the ancestor of Azerbaijani. It would look like this: Azerbaijani comes from Ajami Turkish (trk-ajm), which comes from Old Anatolian Turkish:

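m["trk-ajm"] = {
    "Ajami Turkish", -- canonical name
    110812703, -- Wikidata item Q110812703
    "trk-ogz", -- family code
    "fa-Arab", -- script code
    ancestors = "trk-oat", -- Old Anatolian Turkish
    entry_name = {["fa-Arab"] = "ar-entryname"},
}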

Sebirkhan (talk) 19:31, 8 December 2024 (UTC)Reply
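As a rough illustration of the proposal above, the ancestor chain that would result can be modelled with a small Python sketch. This is a hypothetical stand-in, not Wiktionary's Lua language data: the dictionary and the ancestor_chain function are invented for illustration, trk-ajm and trk-oat are taken from the proposal, and the codes az, trk-ogz-pro and trk-pro are assumed here for Azerbaijani, Proto-Oghuz and Proto-Turkic.

```python
# Hypothetical, simplified model of "language code -> immediate ancestor".
# Only trk-ajm and trk-oat come from the proposal; the other codes are assumptions.
ANCESTORS = {
    "az": "trk-ajm",           # proposed: Azerbaijani <- Ajami Turkish
    "trk-ajm": "trk-oat",      # proposed: Ajami Turkish <- Old Anatolian Turkish
    "trk-oat": "trk-ogz-pro",  # Old Anatolian Turkish <- Proto-Oghuz
    "trk-ogz-pro": "trk-pro",  # Proto-Oghuz <- Proto-Turkic
}


def ancestor_chain(code, ancestors=ANCESTORS):
    """Return the ancestors of `code`, from nearest to most remote."""
    chain = []
    seen = {code}
    while code in ancestors:
        code = ancestors[code]
        if code in seen:
            # Guard against accidental cycles in the data.
            raise ValueError("cycle in ancestor data at " + code)
        seen.add(code)
        chain.append(code)
    return chain


if __name__ == "__main__":
    print(" < ".join(["az"] + ancestor_chain("az")))
```

With the proposed entry in place, the intermediate stage sits between Azerbaijani and Old Anatolian Turkish; without the "az" and "trk-ajm" lines pointing as shown, the chain would jump straight from Azerbaijani to Old Anatolian Turkish.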