Wiktionary:Beer parlour/2025/February

Global ban proposal for Shāntián Tàiláng


Hello. This is to notify the community that there is an ongoing global ban proposal for User:Shāntián Tàiláng who has been active on this wiki. You are invited to participate at m:Requests for comment/Global ban for Shāntián Tàiláng. Wüstenspringmaus (talk) 12:08, 2 February 2025 (UTC)Reply

Are dictionaries of other languages acceptable as a reference for an entry?


A while back I found that the entry palco has a monolingual Portuguese dictionary as the reference for the Spanish entry. This didn't seem right, so I deleted that reference, but I see it's been reverted. I'm wondering if perhaps it is in fact acceptable to use dictionaries of other (relatively closely related) languages as references for an entry? Laralei (talk) 23:35, 2 February 2025 (UTC)Reply

I don't see the issue with doing so for etymologies, as the entry in question does, as long as the source in question also mentions the Spanish term. — SURJECTION / T / C / L / 07:05, 3 February 2025 (UTC)

I won't weigh in on the use of the reference, but since it was my bot that reverted your edit I want to clarify why: your edit removed the <references/> tag without deleting the actual <ref>...</ref> reference. Since <references/> just tells the page where it should display any previously tagged references, after your edit, there was no explicit destination for the references and they ended up displayed at the end of the page after the categories. If there had been another language after Spanish that included <references/>, the reference would have been incorrectly displayed there. The bot detected that the Spanish entry included explicit references without a References section and added one. JeffDoozan (talk) 18:40, 4 February 2025 (UTC)Reply
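The check described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the function name and the regular expressions are assumptions, and it treats every level-2 heading (`==Language==`) as the start of a language section.

```python
import re

# Assumption: level-2 headings such as "==Spanish==" delimit language sections.
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)
# Matches "<ref>" and "<ref name=...>", but not "<references/>".
REF_OPEN = re.compile(r"<ref[\s>]")
REFERENCES_TAG = re.compile(r"<references\s*/>")


def sections_missing_references(wikitext):
    """Return the language sections that use <ref> but have no <references/>."""
    headings = list(L2_HEADING.finditer(wikitext))
    missing = []
    for i, heading in enumerate(headings):
        # A section runs from its heading to the next level-2 heading (or the end).
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[heading.end():end]
        if REF_OPEN.search(body) and not REFERENCES_TAG.search(body):
            missing.append(heading.group(1).strip())
    return missing
```

A section flagged by such a check is one whose footnotes would otherwise be rendered at the end of the page, or under a later language's `<references/>`.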

Reminder: first part of the annual UCoC review closes soon


Please help translate to your language.

This is a reminder that the first phase of the annual review period for the Universal Code of Conduct and Enforcement Guidelines will be closing soon. You can make suggestions for changes through the end of day, 3 February 2025. This is the first step of several to be taken for the annual review. Read more information and find a conversation to join on the UCoC page on Meta. After review of the feedback, proposals for updated text will be published on Meta in March for another round of community review.

Please share this information with other members in your community wherever else might be appropriate.

-- In cooperation with the U4C, Keegan (WMF) (talk) 00:49, 3 February 2025 (UTC)Reply

English verb conjugation tables


Occasionally I come across conjugation tables for English verbs, such as at clarify#Conjugation. Do we have a policy about including or not including these? {{en-conj}} says "In general, it should only be used for verbs with archaic forms", but doesn't every English verb (potentially) have archaic forms in "-est" or "-eth" at least? Even verbs invented in modern times can be cast in this archaic style for effect. Mihia (talk) 18:53, 4 February 2025 (UTC)Reply

@Mihia: in theory, yes, but I always check to see if at least one of the archaic forms is attestable before adding the conjugation table, and I don't create entries for forms which I can't find uses of. I think it is useful to include conjugation tables, otherwise there really isn't anything linking the lemma to the archaic forms. — Sgconlaw (talk) 19:49, 4 February 2025 (UTC)Reply

For clarity, should we change "should only be used for verbs with archaic forms" to "should only be used for verbs with attestable archaic forms"? Mihia (talk) 19:55, 4 February 2025 (UTC)Reply

@Mihia: I guess that would be OK with me. We should clarify, though, whether this means (1) the same standards of verifiability as lemmas (I think not, since as far as I can tell we don't require other inflections of verbs to be separately attestable with at least three quotations); and (2) whether unattestable archaic forms need to be omitted from the conjugation table altogether, or whether they can remain in the table as red or green links (I would say yes since, for example, in an Irish verb entry the mutation table notes that "[c]ertain mutated forms of some words can never occur in standard Modern Irish. All possible mutated forms are displayed for convenience."). — Sgconlaw (talk) 20:26, 4 February 2025 (UTC)Reply

I agree that there should be linkage to the archaic forms. I wonder, though, whether the conjugation table makes more of a meal of this than is necessary. Apart from vanishingly unusual special cases, such as perhaps the "be" verb, are there many (any?) verbs with "unexpected" entries in any of these conjugation fields, beyond what is already listed in the head anyway (notably irregular past tense / participle)? Is the conjugation table in fact making things look more complicated than they really are -- giving the impression, for instance, that certain verbs have an irregular imperative or subjunctive, or a (modern) past tense varying with person/number, etc.? Mihia (talk) 18:03, 5 February 2025 (UTC)Reply

I would say that it should be restricted to verbs which existed back when the forms were common, or modern verbs that use the archaic forms. (Maybe not even those, since only certain archaic verb forms are used a lot these days.) CitationsFreak (talk) 20:38, 4 February 2025 (UTC)

I agree. I always thought conjugations were sort of exempted from the WDL rules. I mean, they're conjugations. MedK1 (talk) 20:42, 6 February 2025 (UTC)Reply

IMO English verb conjugation tables are overkill in the vast majority of cases. I've removed several of them. I think the only situations where such tables are needed are (as mentioned by @CitationsFreak) verbs that existed <= 1650 AD (when the archaic forms were still common), and even then maybe only if the forms aren't completely predictable. Otherwise it just adds noise to the entries. Benwing2 (talk) 00:27, 10 February 2025 (UTC)

I have been in favour of conjugation tables in English; I find it weird that we (the English Wiktionary) have big tables showing all kinds of obscure or obsolete French and German verb and noun forms, but many English verb entries don't link to their less-common inflected forms at all, or acknowledge that they existed. I concede that the average user may not be interested in such forms. Perhaps we might alternatively have a little button at the end of the forms that are listed in the headword, that said "show archaic and obsolete forms", that would make them appear, but then, that's not all that different from having a collapsed table with its button to make the forms appear. I don't know, perhaps someone can think of some other way of linking from lemmas to their inflected forms. - -sche (discuss) 03:45, 10 February 2025 (UTC)Reply

A feature to "show archaic and obsolete forms", which showed just those forms, is a better idea, in my opinion. The present "conjugation table" gives the impression that any of those fields might be irregular in modern English, over and above what is already displayed in the head. For example, that English verbs routinely might have an irregular or unpredictable subjunctive form, or whatever. While archaic forms are shown too, there is no indication that the purpose of the table is to show these, and not deal generally with conjugation of the verb, including modern forms. It makes English verbs, which in modern use overwhelmingly have only the forms listed in the head, look more complicated than is necessary. A "show archaic and obsolete forms" feature would avoid that. Mihia (talk) 10:00, 10 February 2025 (UTC)Reply

I support this idea. I have the same concerns as @Mihia about having a big conjugation table that gives a misleading impression about English verbs, and I think something that only shows the archaic and obsolete forms, or (maybe) shows both but clearly indicates which ones are archaic vs. normal and is only present when there are archaic forms, would be the best. It is a bit similar to the way that I've segregated archaic/obsolete/literary/etc. forms for Italian verbs like essere, facere and avere into separate tables; similarly, Russian pre-reform conjugations get their own tables. The difference here is that English verbs are simple enough that the headword is able to show *ALL* non-archaic/obsolete info, so there's no need for the "regular" table at all. Benwing2 (talk) 10:14, 10 February 2025 (UTC)Reply

Worth noting that I tend to agree with -sche on this and would likely oppose the vote. I see value in, say, come#Conjugation; sure, we could present this in a textual form ("archaic second-person singular forms camest and camedst are attested for this verb"), but a table is a much clearer way of presenting that info. Whether the imperative and subjunctive need to be included is a different question to removing the table entirely. This, that and the other (talk) 01:52, 14 February 2025 (UTC)

@This, that and the other What exactly do you oppose/support? Do you want verb tables on *all* verbs (which I would very strongly oppose) or only on verbs with attested archaic forms? Are you opposed to labeling the verb table "(including archaic and obsolete forms)" to emphasize that that is the main point of the table? Please note that even in come, the verb table presented is misleading, because (as can be seen by clicking on it) the form camedst labeled "archaic or obsolete" is not a genuine archaism but a rare, hypercorrect pseudo-archaism, and the form comen labeled "rare" is in fact archaic. I would suspect quite a lot of the tables have basic mistakes in them like this, because many of the people who are inclined to add such tables are not doing it to impart actual information but because it's "kewl". I would be inclined in fact to rename the table to {{en-conj-archaic}} and remove the support for not setting |old=1, so it's obvious that it's only intended for verbs in existence <= 1650 AD and would not make any sense for e.g. "to text" or "to download". Benwing2 (talk) 03:41, 14 February 2025 (UTC)Reply

@Benwing2 My understanding is that {{en-conj}} is only to be used on verbs where at least one form not listed in the headword line is attested. I don't support use beyond that. As such I'd definitely support removing the old= parameter and would not be opposed to renaming the template. I could support the "(including archaic and obsolete forms)" caption as well. All very good ideas!

Honestly I think the template needs to be written in Lua; it would have to be one of the most complex pure-wikitext templates that we still have. This could make it easier for people to specify proper labels using <q:> syntax. You could even reformat the cells as

came

come (nonstandard)

camedst (archaic, rare, hypercorrect)

to avoid the proliferation of footnotes. This, that and the other (talk) 03:53, 14 February 2025 (UTC)Reply

Yeah I completely agree about rewriting it in Lua. Benwing2 (talk) 03:58, 14 February 2025 (UTC)Reply

@Benwing2, CitationsFreak, Mihia: I don't object if the template is rewritten so that it is only for displaying the archaic forms, and would just make the following comments on that:

It would be easier to understand the table if it compares the modern forms with the archaic forms. For example, if on the 3rd-person singular present row the table showed the cells for both clarifies and clarifieth, this would be helpful to readers as the label "3rd-person singular" alone may not be understandable by readers unfamiliar with grammar.
Should the table also allow for modern variant inflected forms to be specified if, say, there are two or more of such forms, so they do not clutter up the headword line? (Offhand, I can't think of any specific examples.)

— Sgconlaw (talk) 12:47, 15 February 2025 (UTC)Reply

I created a draft vote on the usage of en-conj at Wiktionary:Votes/2025-02/Retiring_the_English_verb_conjugation_table. Please put comments specifically on the wording of the vote at the discussion page there. Mihia (talk) 09:26, 14 February 2025 (UTC)Reply

Lutuv vs. Lautu Chin for language name


A majority of recent academic work on Lutuv (language code: clt), a Kuki-Chin language of the Maraic branch, uses the name Lutuv (as used by its speakers) rather than the term "Lautu Chin," as it is currently set on here. "Lautu" is an exonym used by the Hakha Chin and adopted in the earliest scholarship that mentions the language & people, much of which did not work directly with Lutuv people or their language, & all recent works either exclusively use the name Lutuv, or list "Lautu" merely as an alternative name. For an example of this among recent working papers on the language, see here, & for an example among papers working with the community in a non-linguistic context, see here. One can also find community centers like churches, both in the native region & among diaspora communities, preferring the "Lutuv" spelling.

So, this is my longwinded way of asking- could we change the name of "Lautu Chin" on Wiktionary to reflect the preferred name? CedarForest14 (talk) 01:38, 5 February 2025 (UTC)Reply

Pinging @-sche who seems to know a lot about obscure languages :) ... Benwing2 (talk) 00:24, 10 February 2025 (UTC)Reply

Support, "Lutuv" indeed seems to be more common these days. ("Lautu Chin" should be retained as an alt name; bare "Lautu" and "Lutuv Chin" also occur in a few papers and could be added as alt names for findability.) Glottolog's bibliography, which can sometimes (for languages that are written about more often) give indications of what names are commonly used to refer to certain languages, is no help here as it has only a forty-year-old French-language ethnography of the people, the ISO code request form, and two linguistic works that are more concerned with Proto-Chin than anything modern, and don't mention this language in their titles AFAICT. But on Google Scholar I indeed find more papers about "Lutuv" than "Lautu"/"Lautu Chin". - -sche (discuss) 02:12, 10 February 2025 (UTC)Reply

Thanks @-sche! If there are no objections, in a couple of days I'll switch the language name and move the lemmas, categories, etc. to use the new name. Benwing2 (talk) 10:15, 10 February 2025 (UTC)Reply

Thank you both very much, @-sche & @Benwing2, for your input- I'm hoping to add more entries for the language from existing publications soon, so I appreciate the help, & eagerly await the change! CedarForest14 (talk) 19:07, 15 February 2025 (UTC)Reply

Switching all Turkish verbs to the tr-conj-table template?


It is formatted more accurately (the 6 big blocks with 4 sub-blocks each are very systematic and reflect how verbs are formed) and has more information than tr-conj, so I think we should. If not, then I'll edit tr-conj to make it better, but that would be a waste of time considering the work has already been done in tr-conj-table. Zbutie3.14 (talk) 02:41, 5 February 2025 (UTC)

@Zbutie3.14 I support this, although the {{tr-conj-table}} template should be renamed {{tr-conj}} and the existing {{tr-conj}} table deleted. How would I bot-convert from the old templates to the new one? I don't know much of anything about Turkish, so I don't know what parameters need to be passed to {{tr-conj-table}}. Benwing2 (talk) 00:23, 10 February 2025 (UTC)

@Trimpulot Zbutie3.14 (talk) 00:27, 10 February 2025 (UTC)Reply

@Benwing2 {{tr-conj-table}} requires at most 1 parameter, in the form of any of the vowels <ı, i, u, ü> or the letter <d>: the vowel is required when the verb's stem is monosyllabic and its aorist is expressed with a -V⁴r suffix (current tables always have a parameter that is the aorist vowel), while the <d> is required only for the verbs etmek, gitmek and gütmek, plus their compounds (i.e. covering all the verbs currently covered by {{tr-conj-*tmek}}).

In all other cases, no parameters are needed.

Trimpulot (talk) 06:21, 10 February 2025 (UTC)Reply

If I understand this correctly, a current

{{tr-conj|stem|V₁|aorist|V₂|t-or-d}}

should be replaced by

{{tr-conj-table|V₂}}

if stem is monosyllabic (contains only one occurrence of an ⟨a⟩, ⟨e⟩, ⟨ı⟩, ⟨i⟩, ⟨o⟩, ⟨ö⟩, ⟨u⟩ or ⟨ü⟩) and V₂ is one of ⟨ı⟩, ⟨i⟩, ⟨u⟩ and ⟨ü⟩. Otherwise, it should be replaced by a simple

{{tr-conj-table}}.

For example:

at almak: {{tr-conj|al|a|alır|ı|d}} → {{tr-conj-table|ı}};

at yanmak: {{tr-conj|yan|ı|yanar|a|d}} → {{tr-conj-table}} (V₂ is ⟨a⟩);

at edilmek: {{tr-conj|edil|i|edilir|i|d}} → {{tr-conj-table}} (edil is polysyllabic).

Moreover, a current

{{tr-conj-*tmek|⟨letters⟩}}

should be replaced by

{{tr-conj-table|⟨letters⟩t|d}}.

For example:

at affetmek: {{tr-conj-*tmek|affe}} → {{tr-conj-table|affet|d}}.

--Lambiam 07:55, 10 February 2025 (UTC)Reply

@Lambiam Slight correction: {{tr-conj-table|d}} is enough for any {{tr-conj-*tmek}}, so affetmek would only need {{tr-conj-table|d}}. Trimpulot (talk) 08:39, 10 February 2025 (UTC)Reply
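Taken together with this correction, the conversion rules can be sketched as a small function. This is a minimal illustration only, not the bot code that was eventually run: the function name and error handling are assumptions, stems are assumed to be lowercase, and vowels with circumflex are not counted.

```python
# The eight vowels listed in the rule for deciding whether a stem is monosyllabic.
TURKISH_VOWELS = set("aeıioöuü")
# Only these aorist vowels are passed on to the new template.
AORIST_PARAM_VOWELS = set("ıiuü")


def convert_call(call: str) -> str:
    """Rewrite one old-style template call into a {{tr-conj-table}} call."""
    text = call.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError(f"not a template call: {call!r}")
    name, *args = text[2:-2].split("|")

    # Every {{tr-conj-*tmek}} call becomes {{tr-conj-table|d}}.
    if name == "tr-conj-*tmek":
        return "{{tr-conj-table|d}}"
    if name != "tr-conj":
        raise ValueError(f"unsupported template: {name!r}")
    if len(args) < 4:
        raise ValueError(f"expected stem|V1|aorist|V2: {call!r}")

    stem, aorist_vowel = args[0], args[3]
    vowel_count = sum(1 for ch in stem if ch in TURKISH_VOWELS)
    # The vowel parameter is kept only for monosyllabic stems with a high aorist vowel.
    if vowel_count == 1 and aorist_vowel in AORIST_PARAM_VOWELS:
        return "{{tr-conj-table|" + aorist_vowel + "}}"
    return "{{tr-conj-table}}"
```

For the entries mentioned above, this gives `{{tr-conj-table|ı}}` for almak, a bare `{{tr-conj-table}}` for yanmak and edilmek, and `{{tr-conj-table|d}}` for affetmek.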

I agree that ultimately the new template should replace the old one. To avoid switch-over problems, I think this is best done in phases.

Phase 1A: Move page {{tr-conj}} to page {{tr-conj-obs}}, leaving a redirect behind.

Phase 1B: Edit all uses (transclusions) of {{tr-conj}} to use {{tr-conj-obs}} instead.

Phase 2A: Move page {{tr-conj-table}} to page {{tr-conj}}, overwriting the redirect.

Phase 2B: Transform all uses of {{tr-conj-obs}} into uses of the new {{tr-conj}}.

Phase 2C: Transform the uses of {{tr-conj-*tmek}} into uses of {{tr-conj}}.

Phase 3: The now unused templates {{tr-conj-obs}} and {{tr-conj-*tmek}} (double check) may be relegated to the dustbin of unused bits.

--Lambiam 08:16, 10 February 2025 (UTC)Reply

@Zbutie3.14 Of course I agree with this decision, but we should also hear what the other Turkish editors have to say.

@Kakaeater, Keleci, Lagrium, Lambiam, Moonpulsar, Orexan, Whitekiko.

Trimpulot (talk) 06:31, 10 February 2025 (UTC)Reply

Fine with me. I don’t think anyone will miss forms like “almaz mıymışsın? ” and their ilk. --Lambiam 08:26, 10 February 2025 (UTC)Reply

This sounds good. What you describe about moving the old template out of the way is exactly what I've done in similar situations; essentially {{tr-conj}} -> {{tr-conj/old}}, then {{tr-conj-table}} is moved to overwrite {{tr-conj}}, and then the bot does its thing, and finally you delete {{tr-conj/old}} and the other old templates. If there are no objections, I'll do this in a couple of days. Benwing2 (talk) 10:08, 10 February 2025 (UTC)Reply

@Trimpulot @Lambiam I'm doing a bot run now to convert the calls. There are still ~300 verbs using {{tr-conj-v}}, as well as a few using {{tr-conj-mi}}, {{tr-demek-yemek}} and {{tr-conj-aux-bil}}, to convert, plus some miscellaneous leftover stuff to delete like {{tr-conj-exp}}. Benwing2 (talk) 08:42, 14 February 2025 (UTC)

Also the module can't handle the suffixes -akalmak and -ekalmak and throws an error; please fix, thanks! Benwing2 (talk) 08:46, 14 February 2025 (UTC)Reply

@Benwing2 tr-conj-mi should stay for now, whilst tr-conj-aux-bil verbs are already treated in the template in their non-bil form.

I will add support for suffixes right away. Trimpulot (talk) 08:48, 14 February 2025 (UTC)Reply

@Trimpulot Thanks! What about {{tr-conj-v}}, how should they be converted? Also there are three verbs still using {{tr-conj/old}} that my module refused to convert because there was something wrong with the params. Can you convert them by hand? Then I can remove {{tr-conj/old}}. Benwing2 (talk) 09:48, 14 February 2025 (UTC)Reply

@Benwing2 tr-conj-v shouldn't need any params to be converted. As for those three pages, I'll take care of them. Trimpulot (talk) 10:46, 14 February 2025 (UTC)Reply

@Trimpulot I converted {{tr-conj-v}}. There are still three terms that use {{tr-conj-head}}. Can you add the requisite support for them (bilmek using {{tr-conj-aux-bil}} and dayak yemek and kazık yemek using {{tr-demek-yemek}}) to {{tr-conj}} and convert them to {{tr-conj}}? Also can you add support to {{tr-conj}} for {{tr-conj-mi}} and convert mı to use {{tr-conj}}? The overall appearance of the new {{tr-conj}} templates is radically different from the old ones and it would be best to have all Turkish verbs use the same (new) look and feel. Once you convert all the templates, you should convert the colors in {{tr-conj}} so they support dark mode; I can help you with that. Benwing2 (talk) 07:25, 15 February 2025 (UTC)Reply

I've fixed dayak yemek and kazık yemek, and I'll get to work on implementing a table for mi. As for suffixed bilmek, I'm not sure we should even give it a table of its own anymore, since it's included in the standard verb table. I'll greatly appreciate your help to make the table dark-mode friendly since I have no idea how to do that. Trimpulot (talk) 09:50, 15 February 2025 (UTC)Reply

@Trimpulot The only thing left now preventing deletion of {{tr-conj-head}} is {{tr-conj-aux-bil}}. The text says that it is suppletive as an auxiliary; how should we handle this? Can you format the bilmek entry appropriately? As for dark mode, it would be good to restructure Module:tr-conj to avoid duplication and use CSS classes rather than raw inline styles for the colors, along with a separate style.css file containing the CSS class definitions. For an example, see Module:is-noun/style.css and its use in the make_table() function of Module:is-noun (starting on line 3829). Benwing2 (talk) 00:57, 16 February 2025 (UTC)Reply

@Benwing2 I'll add a special case for if the entry ends in "-bilmek", so that {{tr-conj-aux-bil}} will need to be replaced with {{tr-conj|pot}}, though I think other entries in -bilmek aside from -bilmek itself shouldn't exist. I'll try to see if I can make the CSS work; just where can I find all the --wikt-palette colours?

Trimpulot (talk) 07:06, 16 February 2025 (UTC)Reply

@Trimpulot see MediaWiki:Gadget-Palette/table. If there's a color you need that isn't in the palette, I may be able to add it. Benwing2 (talk) 07:12, 16 February 2025 (UTC)Reply

@Benwing2 Thanks! I'll get to it as soon as I can.

Trimpulot (talk) 07:14, 16 February 2025 (UTC)Reply

@Benwing2 It's done. Trimpulot (talk) 11:29, 16 February 2025 (UTC)Reply

French pronunciation


Why is no primary stress mark shown on the last syllable of French words in the pronunciation sections, although in speech it can very easily be heard there? Is it perhaps because the stress practically always falls on that syllable? 2001:14BB:112:8152:0:0:340:5F01 17:54, 5 February 2025 (UTC)

Our automatic transcriptions for French are phonemic, and stress is not phonemic in French. Nicodene (talk) 04:22, 6 February 2025 (UTC)Reply

@Nicodene: why is that? I’m curious. — Sgconlaw (talk) 04:33, 6 February 2025 (UTC)Reply

Why our transcriptions are phonemic, or why stress in French is not? Nicodene (talk) 05:39, 6 February 2025 (UTC)Reply

Perhaps it could be possible to create a template for phonetic transcriptions, also? 2001:14BB:AF:7F8A:0:0:5A68:101 08:23, 6 February 2025 (UTC)Reply

French doesn't always have stress on the last syllable of a word. That is how English speakers hear the pronunciation of French words in isolation, but when words are connected in phrases, this stress normally appears only on the last syllable of every phrase, not on the last syllable of each word in every phrase. Wikipedia gives the example "la.pə.tit.mɛ.zɔ̃ dɑ̃.la.pʁɛ.ˈʁi" for "La Petite Maison dans la prairie". That is another good reason not to transcribe it, aside from the fact that stress does not distinguish different words from each other in French and so, as Nicodene said, is not a phonemic property of French words.--Urszag (talk) 10:11, 6 February 2025 (UTC)Reply
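The point about phrase-level stress can be made concrete with a small sketch. It assumes a transcription whose syllables are separated by "." and whose words or groups are separated by spaces, and it marks only the final syllable of the whole phrase; the function name is made up for illustration and this is not how any Wiktionary module works.

```python
STRESS = "\u02c8"  # IPA primary stress mark


def mark_phrase_final_stress(transcription: str) -> str:
    """Put a primary stress mark before the last syllable of the whole phrase."""
    # Drop any word-level stress marks first: only the phrase-final one is kept.
    cleaned = transcription.replace(STRESS, "").strip()
    if not cleaned:
        return cleaned
    # The final syllable starts right after the last "." or space (or at index 0).
    cut = max(cleaned.rfind("."), cleaned.rfind(" ")) + 1
    return cleaned[:cut] + STRESS + cleaned[cut:]
```

Under these assumptions the same word gets a stress mark when transcribed in isolation but loses it when it is not phrase-final, which is why a mark stored on each entry would not carry over to connected speech.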

Yeah French is well-known for glomming together multiple words with various transformations applied, such as elision, liaison, enchaînement (= resyllabification?) and schwa-dropping. Hence Doukipudonktan = D'où (est-ce) qu'il pue donc tant at the beginning of Zazie dans le Métro. (FWIW similar things happen in spoken Egyptian Arabic, and maybe also in Icelandic.) Benwing2 (talk) 00:33, 10 February 2025 (UTC)Reply

Why is it assumed that every reader knows how isolated words are pronounced in French? Because it is a UN language? If a Chinese learner comes in here, what will s/he say? mérsi -ending up in English- or mersí? Why aren't there small explanatory notes for us the ignorantes? (plus a nice audio with isolated words and a whole phrase) Notes for basic things, like 'this language lemmatises verbs in the infinitive', 'that language lemmatises verbs in 1st person present', etc... En.wikt prides itself on accuracy and documentation (: ‑‑Sarri.greek ^♫ I 02:15, 10 February 2025 (UTC)

Phonemic transcriptions by their nature do not show non-phonemic information. Their purpose is to abstract away from details that are not significant in a language (which can be distracting or irrelevant) and just show the aspects of pronunciation that are unique to a word, not phonetic features shared across the entire language. The latter would be discussed in a phonetic description of the language (if looking for a scholarly treatment), or a general guide to pronunciation (if looking for a practical resource for language learners). For some languages, we show phonetic transcriptions, but this is often more complicated to do. It's normal for a word to have many phonetic variants that could potentially be transcribed differently depending on the transcriber's preferences and what the transcriber wants to emphasize. Furthermore, audio recordings of French can easily be found, so it's not clear to me how much value it adds to attempt to include detailed, narrow phonetic transcriptions of French words.

For comparison, pitch is a phonetic aspect of the pronunciation of words in any language. But we do not include pitch in our pronunciation information for English words, because pitch is not phonemic in English and pitch patterns are not consistently associated with particular words (but instead depend on the intonation of phrases). If somebody wants to understand how pitch is used as a feature of English pronunciation, the way to do it is not to look up the pronunciations of individual words: instead, it's better to read descriptions of English prosody and listen to audio of English sentences. Likewise, French prosody is not communicated effectively by putting stress marks on the entries for individual French words.--Urszag (talk) 10:48, 10 February 2025 (UTC)Reply

Wiktionary:Audio whitelist


Hello,

I am considering a petition to create a project page titled Wiktionary:Audio whitelist. Users can consider major audio contributors for autopatrolled audio rights, and an administrator approves. Trusted audio contributors' sound files are presumed high-quality, thus do not need to be manually patrolled.

Thank you Flame, not lame (Don't talk to me.) 19:40, 5 February 2025 (UTC)Reply

If we can get the whitelisters' audio files automatically added to en.wikt after creation on Lingualibre, that'd be cool - save me some time. Father of minus 2 (talk) 22:17, 5 February 2025 (UTC)

psst! the project page Wiktionary:Approved Lingua Libre users already exists! Juwan (talk) 19:54, 6 February 2025 (UTC)Reply

We want Flamey on the list! Father of minus 2 (talk) 11:06, 7 February 2025 (UTC)Reply

You do not know how much that means to me. Flame, not lame (Don't talk to me.) 16:05, 9 February 2025 (UTC)Reply

I had no idea. Flame, not lame (Don't talk to me.) 16:04, 9 February 2025 (UTC)Reply

@Flame, not lame I've added you to the list. Please note that the only bot that imports LL files is currently blocked and wasn't very active before the block. I think I'll post in the GP soon asking if anyone else can do the job. Ultimateria (talk) 01:19, 18 February 2025 (UTC)Reply

tr table template to fix Proto-Turkic pages


This is what I have so far: https://en.wiktionary.org/wiki/User_talk:Zbutie3.14/trtable#example

The descendant list at https://en.wiktionary.org/wiki/Wiktionary:About_Proto-Turkic#Descendants needs to be fixed and then I can finish it.

@BurakD53 @Allahverdi Verdizade @Yorınçga573 @Blueskies006 @Ardahan Karabağ @Bartanaqa @Samiollah1357 @Zbutie3.14 @Rttle1 @AmaçsızBirKişi Zbutie3.14 (talk) 02:05, 6 February 2025 (UTC)Reply

Support, though your pings did not work.

AmaçsızBirKişi (talk) 10:35, 6 February 2025 (UTC)Reply

If I used desctree for the loanwords wouldn't it get pretty messy? Like if I used desctree for یوغورت (yogurt) then wouldn't there be way too many? Zbutie3.14 (talk) 14:10, 6 February 2025 (UTC)Reply

Maybe for such entries you can adopt the Indo-European reconstruction pages' model of just writing "(see there for more)", if that is applicable here.

That would also reduce the work needed to keep the pages consistent with each other.

AmaçsızBirKişi (talk) 15:40, 6 February 2025 (UTC)Reply

@BurakD53 @Allahverdi Verdizade @Yorınçga573 @Blueskies006 @Ardahan Karabağ @Bartanaqa @Samiollah1357 @Zbutie3.14 @Rttle1 Zbutie3.14 (talk) 14:06, 6 February 2025 (UTC)Reply

Languages without descendants should be set to not appear in the table. However, if a language, such as Turkish, exists but is not attested in its ancestor languages, the ancestor languages should still appear in the table as empty fields. The table should be arranged to reflect this structure. BurakD53 (talk) 08:41, 7 February 2025 (UTC)Reply
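One way to read this rule is as a pruning pass over the language tree: a language is kept if it has a term itself or if anything below it does, and kept ancestors without a term are shown as empty fields. The sketch below is only an illustration of that reading. The `Lang` class, the function names and the example tree (including the placeholder names other than Turkish) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Lang:
    name: str
    term: str = ""  # empty string means no term is attested for this language
    children: list = field(default_factory=list)


def prune(node):
    """Keep a language if it or any language below it has a term; else drop it."""
    kept = [c for c in (prune(child) for child in node.children) if c is not None]
    if node.term or kept:
        return Lang(node.name, node.term, kept)
    return None


def rows(node, depth=0):
    """Flatten the pruned tree into (depth, name, term) rows for display."""
    out = [(depth, node.name, node.term)]
    for child in node.children:
        out.extend(rows(child, depth + 1))
    return out


# Hypothetical example: only Turkish has a term, so its ancestors stay as empty rows.
tree = Lang("Ancestor stage 1", "", [
    Lang("Ancestor stage 2", "", [Lang("Turkish", "term")]),
    Lang("Unattested sibling"),
])
```

Here `rows(prune(tree))` lists the two ancestor stages with empty term fields followed by Turkish, while the sibling with nothing attested is left out.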

Support Yorınçga573 (talk) 17:35, 6 February 2025 (UTC)Reply

Support BurakD53 (talk) 08:30, 7 February 2025 (UTC)Reply

@AmaçsızBirKişi

https://en.wiktionary.org/wiki/User:Zbutie3.14/trtable

Can you look at the descendant structure I have now? I added Oghuric, but I still need the [*xbo-dnb] and [*xbo-vol] language codes. I also need the Orkhon Turkic [otk-ork] and Ajem-Turkic codes. For Siberian Turkic I followed the classification here: https://en.wikipedia.org/wiki/Siberian_Turkic_languages Zbutie3.14 (talk) 01:04, 13 February 2025 (UTC)

Should Category:Japanese terms spelled with jukujikun and Category:Japanese terms read with jukujikun be merged?


It looks like the former category is given to any entries which use the t:ja-jukujikun template, and the latter to any entries with the yomi= field in t:ja-kanjitab set to juku, but it's unclear to me if these are supposed to be separate categories, and if so, what the intended difference between them is supposed to be. Horse Battery (talk) 02:13, 6 February 2025 (UTC)Reply

Also pinging @Eirikr (I'm not sure who else to add). Horse Battery (talk) 14:03, 12 February 2025 (UTC)Reply

Nasalization of diphthongs


If I wanted to specify the phonetic realization of a word such as time or no, should both vowels of the diphthong be nasalized? [tãɪ̯̃m], [nõʊ̯̃]

I don't know if this varies between languages, or if there's a kinda universal phonetic rule.

Additionally, how about triphthongs in Received Pronunciation?

JMGN (talk) 10:42, 6 February 2025 (UTC)Reply

IMO nasalization in the pronunciation of English words is TMI and not necessary except for a few words like uh-uh (where it's quasi-phonemic) or can't (where the nasal that triggered the nasalization is clearly deleted). Benwing2 (talk) 00:19, 10 February 2025 (UTC)

Entries needing images

Latest comment: 18 days ago · 15 comments · 7 people in discussion

in order to improve Wiktionary per the think tank policy I've written, I wish to add images to most entries that would warrant them. as my interests focus on subcultures, which often lack good documentation and completely lack free images, I request a template and maintenance category for pages to publicly keep track of them. Juwan (talk) 19:50, 6 February 2025 (UTC)Reply

I think that a category tree of requested images by language is appropriate and second the proposal. Since so many entries are lacking images, it would be a little too unwieldy to list them on a page like Wiktionary:Requested entries, but I could be persuaded that this is the better option than a category. —Justin (koavf)❤T☮C☺M☯ 19:59, 6 February 2025 (UTC)Reply

{{rfi}} Vininn126 (talk) 20:02, 6 February 2025 (UTC)Reply

And Category:Requests for images by language. —Justin (koavf)❤T☮C☺M☯ 20:07, 6 February 2025 (UTC)Reply

I will dig a hole to bury myself in, anyone wanna join me? this was in front of my face. Juwan (talk) 20:11, 6 February 2025 (UTC)Reply

😌 Vininn126 (talk) 20:13, 6 February 2025 (UTC)Reply

BTW, {{rfi}}, unlike the most common request templates like {{rfe}}, {{rfd}} and {{rfv}}, doesn't play well with right-hand-side table of contents, with images, or with project boxes like {{wikipedia}}. It leaves a lot of white space at the top of the screen. (See Bovidae for a basic example.) DCDuring (talk) 22:32, 6 February 2025 (UTC)Reply

Sense IDs resulting in dead links

I apologize for posting this question here. I tried to convert the Belarusian entry каса (kasa) to senseid/senseno via this diff, but it somehow only results in dead links. Additionally, where exactly does the senseid need to be placed in the image description? In the beginning of it? Right after the bolded term? In the end? --Ssvb (talk) 07:39, 8 February 2025 (UTC)Reply

How important is it to use {{senseid}} / {{senseno}} anyway? It seems to require less effort to just keep images and senses in sync manually. --Ssvb (talk) 07:56, 8 February 2025 (UTC)Reply

@Ssvb: it's not mandatory to use those templates, but it does reduce the work required if someone adds or rearranges the senses in an entry. — Sgconlaw (talk) 10:37, 8 February 2025 (UTC)Reply

@Theknightwho: any idea why {{senseid}} and {{senseno}} are not working properly for the Belarusian entry mentioned above? I have not had any problems with these templates when using them with English entries. — Sgconlaw (talk) 10:36, 8 February 2025 (UTC)Reply

@Sgconlaw I'll have a look.

@Ssvb It's a lot more work to do it manually, because entries go out of sync over time, which is really annoying. Theknightwho (talk) 10:51, 8 February 2025 (UTC)Reply

@Ssvb I've fixed these. Part of the problem here was because you put things like {{lang|be|(etymology 1 {{senseno|be|hair}}) ...}} in the image caption, which isn't right, because the words "(etymology 1 sense X)" aren't Belarusian. The {{lang}} template should only be for the parts of a sentence which are in a given language.

There does look to be some underlying issue which I'll look into now, as it's not clear why {{senseno}} breaks if it's inside {{lang}}, but I also can't think of any reasons why it would ever need to be in the first place, which I suspect is why this bug hasn't come up before. Theknightwho (talk) 10:59, 8 February 2025 (UTC)Reply

@Theknightwho: Thanks a lot for your help! As for being in the middle of {{lang}}, the stock car entry that is given as an example puts it immediately after the term: A '''stock car''' ({{senseno|en|sports}}) in a race. The output of the {{senseno}} template is indeed not localized, and probably rightfully so, in the same way that the "etymology 1" section header and the description label aren't localized either. This makes putting the "(etymology 1 sense 1)" label before or after the image description reasonable. Such a label is also a bit longish and would disrupt the image description text if placed in the middle of it. --Ssvb (talk) 13:15, 8 February 2025 (UTC)Reply

@Ssvb That's a good point. I've fixed the underlying issue, anyway, which was down to the fact that {{senseno}} exposed an underlying bug in Module:links, as it produces links like [[#Belarusian:_hair|sense 1]], which weren't being handled properly. If the link module sees a #, it knows the target page is everything before the #, but if there's nothing before # then the target page is supposed to be the current page, which it wasn't accounting for. Instead, it thought the target was nothing, and assumed the # must be the start of the intended page name (i.e. the start of an unsupported title), which is why it was creating the link to the page we'd use if the term was #Belarusian:_hair. Theknightwho (talk) 13:36, 8 February 2025 (UTC)Reply
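The behaviour described above can be illustrated with a minimal Python sketch. This is not the actual Module:links code (which is written in Lua); the function name resolve_target and its parameters are hypothetical, and only the handling of a leading # is modelled: when nothing precedes the #, the target page falls back to the current page.

```python
from typing import Tuple


def resolve_target(link: str, current_page: str) -> Tuple[str, str]:
    """Split a link target into (page, fragment).

    Hypothetical helper: models only the rule that an empty page part
    before '#' refers to the page the link appears on.
    """
    # Everything before the first '#' is the page; everything after is the anchor.
    page, separator, fragment = link.partition("#")
    if separator and not page:
        # Nothing before '#': the link points at a section of the current page,
        # not at a page whose title begins with '#'.
        page = current_page
    return page, fragment


# A same-page sense link, as produced by {{senseno}} in the example above.
print(resolve_target("#Belarusian:_hair", "каса"))
# A link to an anchor on another page is left unchanged.
print(resolve_target("stock car#English:_sports", "каса"))
```

Under this sketch, the buggy behaviour corresponds to skipping the fallback and treating the whole string, # included, as the page name.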

Move all reference and quotation templates with appropriate language codes

Latest comment: 16 days ago · 5 comments · 2 people in discussion

several of the reference and quotation templates (those that start with R: and RQ:, respectively), mostly those for English, lack the appropriate language code in the template title. this is not great for searching and confusing for when you want to add several of them and they inconsistently implement it. templates that refer to multiple languages should likely have redirects with different codes that all point to one. Juwan (talk) 20:31, 8 February 2025 (UTC)Reply

I support this and it's not only English templates that lack the language code. I think there is already consensus for this (there was a previous BP discussion a year or two ago on this topic) as long as the reference or quotation templates refer to only one language; if a work covers several languages, it's less obvious what to do and may make sense to leave off the language code. In any case, at various times I've renamed reference templates appropriately. Benwing2 (talk) 00:13, 10 February 2025 (UTC)Reply

Oh I see you mentioned the case of multiple languages. IMO this should be a separate discussion as the consensus isn't so obvious there. Benwing2 (talk) 00:15, 10 February 2025 (UTC)Reply

@Benwing2 if you can generate a list of the templates that need work, it would be nice to then see what editing communities to reach out for comment. Juwan (talk) 21:17, 10 February 2025 (UTC)Reply

@JnpoJuwan I can try but the list might be huge ... Benwing2 (talk) 21:37, 10 February 2025 (UTC)Reply

Living languages hapax

Latest comment: 17 days ago · 3 comments · 3 people in discussion

Does it make sense to use the "hapax" label for living languages? I am asking this because I added the label to two entries in Nheengatu that I created, misa-pituna and gandú (both were only recorded in a 19th-century vocabulary published by Gonçalves Dias). However, since Nheengatu is a living language, it is hard to say for sure if any speaker, wanting to revive a so-called purer version of their language, might end up using these (archaic, perhaps obsolete) words, thus making them no longer hapax. Nheengatu is spoken by a few thousand people, so its presence on the internet is not very large. Therefore, if this were to happen, I would not know... Pinging Trooper57, who also contributes to Nheengatu entries. Opinions from other users, who are not familiar with the language, are of course welcome too; otherwise, I would not have started this topic. RodRabelo7 (talk) 04:45, 9 February 2025 (UTC)Reply

Yes, but not for this one. It is more of a thing if the corpus is somehow exposed, by a sizable literary tradition and subsequent digitization. You still don’t usually know whether something is a hapax however, the concept is somehow transferred from classical philology where dictionary authors historically read all texts.

Since I engaged in Arabic in the same way I acquired Latin and other philologists also imitated classical studies when foraging Arabic, by searching academic treatments I could make some guesses referring more to known than actually retained occurrence of a word, but you see even for this supposedly “well-documented” language we “cut corners”. Nor is it possible for Syriac which has editions running, unlike Latin which after the nineteenth century has few editions enriching its wordhoard with statistical significance (i.e. not that it would even increase our number of recognized hapax legomena, actually the obliteration of ghost words is more likely).

Only where an area’s documents have been completely studied we can actually claim, and apply the label of hapax without analogy to allow for the situation, e.g. Andalusi شُلُنْبَر (šulunbar), because Romancists appear to have complete dictionaries of Andalusi Arabic ({{R:xaa:ELA|II}} the latest, not widely different from {{R:xaa:Corriente}}, {{R:xaa:Corriente-Additions}} but there is some insanity in it), the rest of Category:Arabic hapax legomena is al-Qurʔān, the oldest prose source and excessively discussed in every culture employing the language, so there is cultural knowledge whether a word has not been recorded elsewhere; actually I did not implement any of the guesses except بِرْجار (birjār), though I used to believe that كُذِينَق (kuḏīnaq) and عَمْرُوسَة (ʕamrūsa) are unique as well: maybe it is just our collective attention.

What about the cases where a use is hapax but it is joined by a few mentions? I remember to have had this, especially for terms beginning with ك (k) or ل (l), the only letters covered by {{R:ar:WKAS}}. This is a distinction introduced by Wiktionary. Technically the Latinists and Grecists didn’t care, mentions in wordlists of antiquity always counted and also counted on Wiktionary but due to CFI we have a core belief in their bastardhood; for Classical Arabic which is not declared a well-attested language I preferred one use to be certain of a reading and reach the threshold of one rather than 0.75, as the words collected by old wordlist compilers have various reasons for mutilation, in both forms and meanings given.

Then there are those terms only introduced by conjecture into our texts, it broke the system at lausa vs. *lausa, attested in multiple states of existence, a schizolegomenon … Fay Freak (talk) 06:09, 9 February 2025 (UTC)Reply

It doesn't make much sense now that I think about it. We have this with Arabic, but their hapaxes are a thousand years old, while the oldest Nheengatu recording is less than 300. Also, is it still truly a hapax when it's mentioned in Avila's work? He never really claimed it was a historical dictionary, and some of the archaic terms seem to be attested in Amazonas. Trooper57 (talk) 16:13, 9 February 2025 (UTC)Reply

Voting majorities and supermajorities

Latest comment: 13 days ago · 14 comments · 5 people in discussion

Sorry to raise this issue again. It was a little while ago, and a fresh look may be beneficial.

I have just created Wiktionary:Votes/2025-02/Deletion of "Tennis player test", where we vote "support" to delete the test, and "oppose" to keep the test.

Suppose that 7 people want to delete the test, and therefore vote "support", while 5 people want to keep the test, and therefore vote "oppose". According to Wiktionary:Votes/2019-03/Defining a supermajority for passing votes, this results in "no consensus", i.e. no consensus to change, i.e. we keep the test.

On the other hand, suppose I had worded the vote so that people voted "support" to keep the test, and "oppose" to remove it. This time 5 people vote support and 7 people vote oppose, so the vote fails, and -- what?

What should happen?

Or should votes worded so that "support" supports the status quo be disallowed? Mihia (talk) 21:37, 9 February 2025 (UTC)Reply

@Mihia: Yes, this is why votes to "affirm" the status quo don't really work and imho should be avoided. AG202 (talk) 21:43, 9 February 2025 (UTC)Reply

I would say that votes that have no consensus retain the status quo. Although the best way to approach this is to make it a rule that "support" means "change status quo" and "oppose" means "don't change". CitationsFreak (talk) 23:07, 9 February 2025 (UTC)Reply

In reply to your first sentence, "no consensus" is presently defined as at least 50% but less than 2/3 in support. There is presently no concept of "no consensus to fail", and no apparent reason why a "fail" majority of any margin should not result in the "fail" mandate being carried out. Mihia (talk) 23:54, 9 February 2025 (UTC)Reply
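The thresholds under discussion can be made concrete with a short Python sketch. It assumes, based on the wording quoted above, that a vote passes at 2/3 support or more, is "no consensus" from 50% up to but not including 2/3, and fails below 50%; abstentions are ignored. The function name vote_outcome is hypothetical and is not part of any Wiktionary tooling.

```python
from fractions import Fraction


def vote_outcome(supports: int, opposes: int) -> str:
    """Classify a vote by its share of support (abstentions not counted)."""
    total = supports + opposes
    if total == 0:
        raise ValueError("no votes cast")
    # Exact fractions avoid floating-point trouble at the 2/3 boundary.
    share = Fraction(supports, total)
    if share >= Fraction(2, 3):
        return "passes"
    if share >= Fraction(1, 2):
        return "no consensus"
    return "fails"


# "Support" = delete the test: 7 support, 5 oppose.
print(vote_outcome(7, 5))
# Same opinions, reversed wording ("support" = keep the test): 5 support, 7 oppose.
print(vote_outcome(5, 7))
```

With 7 of 12 in support the share is about 58%, which falls in the "no consensus" band, whereas the same opinions recorded as 5 of 12 fall below 50% and simply fail. The rule as sketched therefore gives different labels to the same distribution of opinion depending on which side is called "support".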

@Mihia: Seeing the corresponding BP discussion, I think that this particular vote might be easily getting 2/3+ support for removing the tennis player test, so it wouldn't matter much either way here.

I initially considered suggesting that this could be kept as a bare-majority-passing vote since there was no formal vote when this test was added to the page. See also Wiktionary:Votes/2022-01/Label for lower register as another example of a vote that was designed to pass by simple majority instead of the standard 2/3. But the argument against doing this would be that controversial tests are allowed to stay on that page even if their application is not universal, e.g. Wiktionary:Votes/pl-2018-12/Lemming principle into CFI and WT:LEMMING paragraph.

So IMO it makes sense for this to be a regular vote that would need 2/3 or more supports to pass. In any case, it would always be possible to continue this discussion after the voting has completed. Svārtava (t ɕ) 12:04, 13 February 2025 (UTC)Reply

Although I raised this (again) using the "Tennis player test" vote as an example, it is not an issue specifically about that vote. It is a general point that the "supermajority" rule, as presently worded, is fatally illogical UNLESS there is also a stipulation that votes must be worded so that "support" is to change the status quo and not preserve it. Mihia (talk) 14:54, 13 February 2025 (UTC)Reply

@Mihia: Even if not written down explicitly, it is true that in practice we don't have any confirmation votes (where "support" would be a vote to preserve the status quo), so yes, a "support" vote would indeed be to change the status quo in votes. Svārtava (t ɕ) 15:16, 13 February 2025 (UTC)Reply

It would be as well to make it explicit. I'm pretty sure that last time I looked at this I found that we DID have such a vote, and, guess what, nobody noticed. If I'd worded my vote so that "support" supported the "tennis player" principle, would anyone have noticed any problem? I wouldn't bank on it. Mihia (talk) 15:23, 13 February 2025 (UTC)Reply

@Mihia: I'm pretty sure that it would be easily noticed. The only way this could be possible is if you first removed it successfully (i.e. without anyone objecting or reverting for doing it without a vote) without a vote and then started a vote on whether the test should be on the page or not such that the status quo is not having it.

Also, is the vote you mentioned one of your own votes like Wiktionary:Votes/2021-08/Scope of English prepositions? Svārtava (t ɕ) 15:31, 13 February 2025 (UTC)Reply

Unfortunately I don't remember now which vote(s) I noticed last time as being "wrong polarity". I would need to trawl through all the posts again. Anyway, it will not hurt to make it explicit. What is the downside? I think I may try again to get some wording agreed. Mihia (talk) 15:37, 13 February 2025 (UTC)Reply

@Mihia: Sure, there is no downside in making it explicit. I remember seeing Wiktionary:Votes/2021-03/Polarity of voting proposals and application of supermajority rule but that failed mainly due to the attempt to codify cases with an unclear status quo, which could be left out for now in favour of prioritizing making the part Voting proposals must be worded so that a "support" vote is a vote to change the status quo, while an "oppose" vote is a vote to leave things unchanged. explicit. Svārtava (t ɕ) 15:45, 13 February 2025 (UTC)Reply

In Wiktionary:Votes/2019-03/Defining a supermajority for passing votes, “supports and opposes” implies “supports for the proposed change and opposes to the proposed change”, even if you word your proposal as supports maintaining the status quo, which would only result in supports under such a proposal being opposes under the rule and opposes being supports under the rule. It does not explicitly disallow such motions, since there is no way to game this. Fay Freak (talk) 12:52, 13 February 2025 (UTC)Reply

There is no mention whatsoever in the wording to indicate that "supports" and "opposes" would be reversed in the way that you describe. Mihia (talk) 14:58, 13 February 2025 (UTC)Reply

Yeah, but a bidding can always be formulated as a forbidding, an action as an omission of an omission, and there is always context that is not inside the formulation. This is how language works, that there are things indicated by other means than “mention”; if you think a lot about it, I will think a lot about it. And if you make an effort making up cases it will also take us effort to apply the rules within their very scopes of application. The language is but typical. Fay Freak (talk) 17:52, 13 February 2025 (UTC)Reply

⁅ and ⁆

Latest comment: 14 days ago · 14 comments · 4 people in discussion

These are used in Swedish dictionaries. Would it be okay to create articles with a Swedish language heading for them? kwami (talk) 00:06, 10 February 2025 (UTC)Reply

To clarify, the question is whether they should be created as Translingual or Swedish entries. In general there doesn't seem to be consensus on how to handle single-character symbols and such. Benwing2 (talk) 00:10, 10 February 2025 (UTC)Reply

Yes. In this case this is, AFAICT, a specifically Swedish convention, so IMO it would be odd to claim, without supporting evidence, that it's translingual. But there have been objections to creating language-specific entries for Unicode characters. kwami (talk) 01:42, 10 February 2025 (UTC)Reply

For Swedish, the entry will need the possibility of being given three independent, durably archived quotes indicating usage. For translingual, the fact that it is described in one work is enough.

So I guess the question to Kwami here is, are you prepared to add three quotes to the Swedish entry, correctly formatted, or not? In the latter case, it would be best to create a translingual entry instead.

I would also like to make sure whether it is, indeed, only used in Swedish, or also Elfdalian, early Finnish, Danish, and Norwegian, as these languages could be influenced by the Swedish orthographical practices. If it is used there, as well, I would definitely not create a Swedish entry, just a Translingual one. Thadh (talk) 10:13, 10 February 2025 (UTC)Reply

So translingual use is assumed to be true unless proven otherwise?

I'm only aware of use in Swedish. These symbols are a dictionary convention, not an orthographic practice in the usual sense. It's possible they're only used in one line of Swedish dictionaries, Norstedts. If that's the case, there cannot be 3 independent quotations. We're then in the bizarre situation that a demonstrable use in Swedish cannot be added to Wiktionary, but that it is acceptable to falsely claim translingual use. kwami (talk) 18:32, 10 February 2025 (UTC)Reply

I'll go ahead and create a translingual entry. It can always be converted to Swedish if ppl think that's justified. kwami (talk) 18:52, 10 February 2025 (UTC)Reply

Done. I don't know how to suppress the request for translation, which isn't appropriate in this case. kwami (talk) 19:19, 10 February 2025 (UTC)Reply

Well, characters are characters, they don't by themselves necessarily belong to one language or another. I can write a nonsensical string "hekegob|{•×fje|[™ x_:&" and this will not be any language, but every single character there is now attested once (assuming I've printed this in a book/durably archived media). Translingual is the L2 we use for this. Thadh (talk) 10:10, 11 February 2025 (UTC)Reply

But you could do the same for any word as well. I can jumble together every word from every language on Wiktionary. By your argument, that makes every word translingual.

The common-sense meaning of 'translingual' is that it's used across languages, not just that we imagine that it could be. What happened to the idea that entries on Wikt need to be attested in the senses and languages that we claim for them? kwami (talk) 10:24, 11 February 2025 (UTC)Reply

@Kwamikagami: No, because words have meanings. You cannot have a word in a language without a meaning. You can however have a symbol without a meaning. This is why symbols are not language-specific when meaningless, but are when meaningful. Basically, a string of letters in Translingual would be SOP, a Sum of Parts. Thadh (talk) 11:36, 12 February 2025 (UTC)Reply

But if the symbol is meaningless, we wouldn't provide it with a definition. What you're saying is not only that the sense doesn't need to be verified, but that there doesn't need to be a sense at all. How is that appropriate for a dictionary? kwami (talk) 18:24, 12 February 2025 (UTC)Reply

Oof, a thorny question; I'm unsure of the best approach/answer, and (I think) can understand the arguments for both positions. FWIW, my gut reaction is that if we can only find something occurring in Swedish, we enter it as Swedish, and then if people later find it also attested in e.g. Finnish, the entry can be changed (to make it Translingual, or to add Finnish, etc).
Regarding hekegob|{•×fje|[™ x_:&, if three authors were to use hekegob|{•×fje|[™ x_:& in (let's say) Swedish texts, AFAICT we would consider it to be ==Swedish== and its gibberishness would belong on the (non-gloss) definition line, in the same way くぁwせdrftgyふじこlp is Japanese (or asdfghjkl is English). If the authors used the symbol in texts which were entirely devoid of L2-having-language or meaning, I don't think the symbol would be included at all. Can anyone think of counterexamples, where a text belonging to no language(s) and consisting only of meaningless characters is the sole basis for an entry? And if the authors used the symbol in texts which were meaningful language, and the language simply couldn't be identified, it might still get included but AFAICT it'd be as Undetermined, not Translingual... and that doesn't seem relevant to this situation, where the language of the texts using this symbol is identifiable as Swedish. - -sche (discuss) 14:10, 12 February 2025 (UTC)Reply

@-sche: My example wasn't for three quotations of hekegob|{•×fje|[™ x_:&, but rather three instances of any of these characters in a similar manner. For instance, "hekegob|{•×fje|[™ x_:&", "lrngkaowkm38($?#?{`°¥=®" and "wopalf|§{¥©{[¢]?" being three independent quotes for the character { existing. Now, regardless of whether we know what this character is used for, whether it has meaning in this case or could have in some other case, as far as I know, we include these characters simply because readers that find this character in some text or another, would want to know what it is, not necessarily what it means. If { is then used in Swedish in a specific meaning not found elsewhere and attested three times, then indeed it should also be included as a Swedish entry. However, the inherent characteristic of { being a character is imo not a feature of Swedish. Thadh (talk) 17:21, 12 February 2025 (UTC)Reply

What you seem to be advocating is that Unicode characters are inherently notable as Unicode characters. Indeed, it's easy enough to find probably any character in a string of gibberish where one of the non-Unicode conventions for Chinese is spuriously converted to Unicode, for example on Gbooks. But by that argument, we should have an article for every Unicode character, including every emoji, as a couple other wiktionaries do but which by consensus we do not.

Instead, Wk-en has the very nice feature that if you click on a red link for a character, you'll see the info box for that character, giving its Unicode definition. I use that feature all the time, but in the past when people created articles for characters, where the definition was nothing more than the Unicode name, those pages were deleted. kwami (talk) 18:34, 12 February 2025 (UTC)Reply

English or Translingual for constellations?

Latest comment: 9 days ago · 14 comments · 6 people in discussion

I notice that And, Cass and probably others are defined weirdly:

# {{lb|mul|astronomy}} {{abbreviation of|en|Cassiopeia|nodot=1}} {{n-g|or its genitive form {{m|mul|Cassiopeiae}}.}}

This is the doing of User:Moverton. @Chuck Entz corrected the corresponding definition of And to use mul in the first abbreviation and Moverton undid this later. Indeed, the genitive forms Cassiopeiae and Andromedae are Translingual but the nominatives are not. This seems strange, but I dunno how Translingual is supposed to work. Either: (a) we need to create Translingual entries for the nominatives or (b) we need to split the abbreviations into English and Translingual entries (or (c) remove the nominative as a possible abbreviation). Also pinging @DCDuring, @Thadh who may have a better understanding of what counts as "Translingual" than I do. Benwing2 (talk) 21:57, 10 February 2025 (UTC)Reply

I don't think we have anything but the general idea that, for a term (or letter, symbol, etc) to be Translingual it should be used (attestably?) in multiple languages, presumably with the same meaning (and pronunciation?). That fits with proper names that are mostly used in writing (pronunciation being secondary), especially when regulated by some multinational body, like the taxonomy, astronomy, and chemical bodies. We include the CJKV characters and taxonomic names systematically and, I suppose, many symbols. I know that at least some of the names of astronomical entities are treated as translingual. I don't know why chemical names other than abbreviations are not so treated, given the role of IUPAC. DCDuring (talk) 03:59, 11 February 2025 (UTC)Reply

Chemical names commonly have slightly different spellings in different languages, I think, don't they? (E.g. varying in the presence or absence of terminal -e, or of inflectional suffixes. It seems English "Barrelene" = German "Barrelen".) The IUPAC's 1993 Introduction makes some mention of language-specific aspects of spelling: "In this guide, efforts have been made to systematize the style (spelling, position of locants, typography, punctuation, italicization, etc.) of the names of organic compounds according to the IUPAC English style. As usual, IUPAC recognizes the needs of other languages to introduce their own modifications". In contrast, forms that originate as Latin genitive forms are probably less likely to be modified for language-specific spelling or inflection conventions, which could be a reason to have Translingual entries for the genitive forms of constellation names but not for their nominative versions (if the latter are not in fact widely used unchanged across languages). I see that the German Wikipedia article on Cassiopeia uses the spelling "Kassiopeia" to refer to the constellation directly but uses "β Cassiopeiae" (rather than e.g. "β Kassiopeiä").--Urszag (talk) 09:43, 11 February 2025 (UTC)Reply

Names of constellations are definitely not translingual, as various languages have traditional names for these (and languages with different scripts have their own form). Abbreviations however, are a different thing, as they may be used in scientific research worldwide regardless of the main body's text. I'm not an astronomer though, so I wouldn't know if in this case that is true. I would change the definition to simply "Abbreviation of the constellation Cassiopeia" without the template (which should be used for within-language abbreviations) or the mentioning of the genitive (as the abbreviation can probably be used for absolutely any case form). Translingual doesn't have grammar. Thadh (talk) 10:06, 11 February 2025 (UTC)Reply

We wouldn't be talking about "vernacular" names of astronomical entities, just whatever standardized names astronomers use, just as kangaroo isn't a taxonomic name. There are many abbreviations (of, eg, asteroids) that seem to be used a lot in different languages, even outside technical literature, and seem obviously translingual. DCDuring (talk) 14:58, 11 February 2025 (UTC)Reply

@DCDuring: "Standardised" astronomical entries afaik are still country- and language-specific, unlike taxonomic names. On the abbreviations I've elaborated above. Thadh (talk) 11:33, 12 February 2025 (UTC)Reply

See my ignored comment below. DCDuring (talk) 14:32, 12 February 2025 (UTC)Reply

The entries seem to suggest that abbreviations such as And, Cass are used specifically or particularly in star names constructed according to the formula "Greek letter + Latin genitive form", rather than being used willy-nilly as a replacement for the name of the constellation Cassiopeia in any context. I don't know if that's true, but if so, it would mean that this type of abbreviation is not necessarily "used for absolutely any case form". For example, the abbreviations appear in this table under the column "Const.", but the immediately preceding column, labeled "ID", provides the Greek letter designation (e.g. ξ And), so it might make sense to interpret the abbreviations in this context as being implicitly short for Andromedae, Cassiopeiae, etc.--Urszag (talk) 16:00, 11 February 2025 (UTC)Reply

Yeah I would prefer this approach to having a vague untemplated "Abbreviation of Foo" definition. In the case where an abbreviation is translingual but is based on a specific language, it should probably use {{abbrev}} in the Etymology section and rely on the explicit language code support that I'm about to add (if it's not already there), so Translingual IOP could use {{abbrev|mul|en:[[independent]] [[Olympic]] [[participant]]s}} or similar, and have a definition that says "a neutral designation for athletes competing in the Olympic games not under a specific country's flag" or similar. Benwing2 (talk) 00:08, 12 February 2025 (UTC)Reply

These 3-letter abbreviations do seem to be officially defined by the IAU, as mentioned by some articles related to the one DCDuring posted below; maybe a template specifically for these could be created that would automatically generate wording like what I have now put at And. Adjusting the "abbreviation" template to enable examples like the one you mention also seems like a good idea in general.--Urszag (talk) 15:26, 12 February 2025 (UTC)Reply

Per WP, in May 2016, the w:International Astronomical Union ("IAU") established the w:IAU Working Group on Star Names ("WGSN"). As of June 2018 they had approved 330 names, often enshrining traditional or historical names. Per WP there are w:Astronomical naming conventions for stars, constellations, galaxies, comets, novas, pulsars, black holes and geological or geographical features of some of these. I would think that, if attestable, these should be Translingual by default, but subject to challenge as to their use in multiple languages. I would think that Latinate forms such as Cassiopeiae could be treated as Latin inflected forms and as Translingual 'adjective' lemmas, as specific epithets are now, albeit very incompletely and unsystematically. DCDuring (talk) 15:28, 11 February 2025 (UTC)Reply

The current state of IAU naming policy can be found here. An 8/14/2022 list of 24,254 IAU names can be found here. For star names there is a downloadable list that includes "proper name", "designation", "constellation", etymological information, "reference", one or two star catalog designations, etc. This seems ripe for a template for the ~500 star names. Analogous templates might be worthwhile for several other classes of astronomical entities. DCDuring (talk) 15:59, 12 February 2025 (UTC)Reply

For the original question, the three-letter IAU abbreviations are international. In German, for example, Cassiopeia is Kassiopeia, but the abbreviation is still Cas with a 'C'.

The international forms of the full names are Latin. I don't know if those should be listed as Latin or as translingual, but 'Cas' is definitely translingual.

Note that there are also language-specific abbreviations, such as Cass in English. kwami (talk) 18:44, 12 February 2025 (UTC)Reply

What kwami says here is what I understood to be true. But I'm interested in what others think. The whole Translingual thing always feels a bit awkward since it isn't used by other dictionaries. Mike (talk) 05:41, 17 February 2025 (UTC)Reply

New records

According to stats.wikimedia.org, last month was our biggest month ever with...

224,806,388 page views (previous record: 221,640,467 in September 2024)
227,144 user edits (previous record: 215,498 in August 2012, but that was probably bot activity)
971 active editors (previous record: 957 in December 2024)

Looks like we're doing something right! Ioaxxere (talk) 03:02, 12 February 2025 (UTC)Reply

🍻 —Quercus solaris (talk) 04:57, 12 February 2025 (UTC)Reply

Yes, just to note that the 224.8 m figure includes everything, while human page views (as far as can be detected ... I don't know how reliable this can be) were 91.5 m. Mihia (talk) 15:46, 13 February 2025 (UTC)Reply

Any stats on number of added bits? Vininn126 (talk) 16:29, 13 February 2025 (UTC)Reply

WT:STATS includes the recent month. Fay Freak (talk) 17:54, 13 February 2025 (UTC)Reply

It coincides with Wonderfool being unemployed and single. In fact: of the 227,144 user edits, 69,069 were WF, of the 971 active editors 343 are WF. So it's not that impressive. Father of minus 2 (talk) 21:27, 16 February 2025 (UTC)Reply

Okay dude. Vininn126 (talk) 21:30, 16 February 2025 (UTC)Reply

Derived terms from a different part of speech we do not have

We had Walmart only as a verb, with firebomb a Walmart under “Related terms”, but @WordyAndNerdy changed it to “Derived terms” (before adding the proper noun) with the edit summary, “WT:ELE: "List terms in the same language that are morphological derivatives. For example, the noun driver is derived, by addition of the suffix -er, from the verb to drive." All these terms derive from the name of the store. That we don't have a noun sense for the store is a byproduct of WT:CFI/WT:BRAND. It doesn't change the derivation of these terms.” How should these situations be handled? This issue also came up with Dothraki, where Dothrakian is not derived from the proper noun (the language). Perhaps the “Derived terms” section should be in the entry as a level-3 heading instead of a subsection of the wrong part of speech? J3133 (talk) 10:05, 12 February 2025 (UTC)Reply

As another example, @LunaEatsTuna mentioned in the RfD of FedEx that, were the proper noun deleted (leaving the verb), the “Derived terms” section (with FedExer and FedEx quest), would be changed to “Related terms”. J3133 (talk) 10:19, 12 February 2025 (UTC)Reply

This seems like a solution in search of a problem. Mallwart and firebomb a Walmart derive from the company name Walmart (i.e., the retail store chain) regardless of whether we have a dedicated entry proper-noun definition for Walmart. Gaps in Wiktionary's coverage – whether rooted in policy or oversight – shouldn't shape how we document the relationships of words. Any company/brand/fictional concept that is linguistically productive enough to have multiple derived terms (such as Facebook, Twitter, etc.) may warrant its own entry/definition. Anyway, Dothrakian is also used as a non-standard synonym of the conlang, so there's now a one-for-one relationship between it and Dothraki. WordyAndNerdy (talk) 01:52, 13 February 2025 (UTC)Reply

"may warrant its own entry/definition" is the key thing here: should we implement a policy that would enable disallowed terms, such as corporations, fictional locations, political parties etc. to be allowed to have entries if they have a certain number of derived terms (perhaps two or three)? It might appear odd for us to have entries for websites like Mumsnet (which has three derived/related terms on here) but not more popular ones like Bilibili, Canva or xHamster. Would an “etymology hub” (allowing these entries similarly to THUBs) be a terrible idea? It would enable the more convenient categorisation of terms derived from the same source while simultaneously letting editors know that said entries should not be RfD'ed (and, it would not enable a massive flood of entries for websites and corporations either). Pinging @This, that and the other who had proposed a similar idea two months ago. LunaEatsTuna (talk) 05:14, 13 February 2025 (UTC)Reply

@LunaEatsTuna: The question is whether they should currently be under “Related terms” or “Derived terms”. From your mention in the RfD, I assume that you support the former. J3133 (talk) 06:02, 13 February 2025 (UTC)Reply

Yes—that is the logical option IMO if there is no proper noun sense listed. Otherwise including a “Derived terms” header under the incorrect word (like a verb) could be misleading to readers. LunaEatsTuna (talk) 06:04, 13 February 2025 (UTC)Reply

I also wrote that it is misleading, but WordyAndNerdy claimed that it is irrelevant because it “meets definition of a "derived term" laid out in WT:ELE” and “the word morphology is the same”. (I included one edit summary above, but you can see the history of the Dothraki entry). J3133 (talk) 06:18, 13 February 2025 (UTC)Reply

WT:ELE provides the only policy guidance on the distinction between "derived terms" and "related terms" of which I'm aware. The way it reads is that a "derived term" is one that directly derives from another. For example, Sherlockian derives from Sherlock through the addition of the -ian suffix, Tescoization through the addition of -ization, etc. Whereas a "related term" is one that shares a common/parallel etymology with another word but isn't directly derived from it. Examples would be broligarch and broligarchy. Broligarchy didn't derive from broligarch. Both words were formed by blending bro with oligarch/oligarchy. Whereas the hypothetical broligarchical would be a "derived term" of broligarchy since it would be formed by combining the latter with the -ical suffix. In the absence of other clear, codified policy guidance, this is the framework we should use. WordyAndNerdy (talk) 06:43, 13 February 2025 (UTC)Reply

I am aware of our policy, but we sort derived terms by the part of speech they derive from. It is misleading to state that firebomb a Walmart is derived from the verb Walmart. J3133 (talk) 06:47, 13 February 2025 (UTC)Reply

I'd hazard that most people looking at the previous version of the Walmart entry would have intuitively concluded that firebomb a Walmart derives from the name of the store. They wouldn't conclude it derived specifically from the verb sense of Walmart because most readers don't know – much less care – about inside-baseball considerations like header levels. This is a solution in search of a problem in the truest sense. WordyAndNerdy (talk) 07:10, 13 February 2025 (UTC)Reply

The solution is not to claim that firebomb a Walmart is derived from “To shop at Walmart” or “To outcompete, […]”. We used this solution but you decided that this is incorrect. J3133 (talk) 07:27, 13 February 2025 (UTC)Reply

Well, that might not be true in every case; the policy itself is still misleading in my view and, I do not entirely see why it must remain so? I would presume that “Derived terms” are terms actually derived from the entry or sense the subheading appears on. Changing it to “Related terms” is more correct as it removes any potential ambiguity or misinformation. LunaEatsTuna (talk) 07:32, 13 February 2025 (UTC)Reply

The problem is that the distinction you're making here is an entirely circumstantial one. It's based on the (former) absence of a proper-noun sense at Walmart rather than the inherent properties of firebomb a Walmart. The derivation of a term/phrase doesn't change simply because Wiktionary doesn't have an entry for its source. Firebomb a Walmart is a derived term as outlined by WT:ELE because it combines the store name with a verb in a fashion similar to prefixing, suffixing, or blending. However, I'm not opposed to J3133's suggestion of resolving such edge cases by having a level-3 "derived terms" section floating unattached to any POS sub-heading. Seems more constructive than applying "related terms" in a way that doesn't align with WT:ELE. WordyAndNerdy (talk) 08:06, 13 February 2025 (UTC)Reply

I don't see a need to remake the wheel as far as policy goes. WT:BRAND, WT:COMPANY, and WT:FICTION allow for the inclusion of otherwise disallowed terms as long as narrow (usually idiomatic) use is documented. We have had entries for Facebook, McDonald's, Darth Vader without issue for over a decade. I do like Luna's "etymology hub" idea though. Would allow for the inclusion of less-obvious productive proper nouns like Mumsnet. (The proliferation of Mumsnet-related terms is explained by the site's context in UK politics.) I agree that two or three derived terms would be a good threshold for such "hub" entries. WordyAndNerdy (talk) 07:33, 13 February 2025 (UTC)Reply

@LunaEatsTuna @WordyAndNerdy I called it "derived terms hub", because the intent is to provide a central place for all those derivations to be listed together and showcased. The "etymology hub" aspect (whereby we avoid having to repeat the proper noun's etymology in umpteen places) is also important, but, to me, secondary.

Honestly WT:COMPANY could do with being rewritten from scratch while we're at it. It currently says that company names can only be included if they're not company names, which might have been useful advice for Wiktionarians of 20 years ago, but is tautological by today's standards. This, that and the other (talk) 08:07, 13 February 2025 (UTC)Reply

I made a similar argument in the RfD nomination of Minecraft a while back. WordyAndNerdy (talk) 08:12, 13 February 2025 (UTC)Reply

@WordyAndNerdy ah, I knew I had seen that somewhere - it included the nice turn of phrase "linguistically productive".

For your or anyone else's interest, I drafted some text at User:This, that and the other/WT:COMPANY. One needs to be acutely aware of the failed 2022 vote on this topic, which I opposed as too prescriptive. This, that and the other (talk) 08:50, 13 February 2025 (UTC)Reply

Pinging @AG202, Mihia, Polomo47, Svartava for This, that and the other's proposal above since I know they might be interested. LunaEatsTuna (talk) 09:05, 13 February 2025 (UTC)Reply

I strongly agree with what WordyAndNerdy said on the RFD for Minecraft. I sort of lean towards including single-word brand/company name if they even satisfy a requirement of just having one inclusion-worthy derived term or sense (such as a verb or common noun having the same spelling). Svārtava (t ɕ) 09:27, 13 February 2025 (UTC)Reply

I would support WordyAndNerdy's suggestion. I find WT:COMPANY rather confusing as is. AG202 (talk) 14:55, 13 February 2025 (UTC)Reply

Babel rework

@-sche, benwing2 I've reworked the Babel template to a module, MOD:User:Saph/Babel, which allows default messages and has a parameter for disabling categorisation; it's also completely (I think) back-compatible with the current template. All that's missing now is for translations of the default message to be added, but I'm hesitant to do that before it's out of my user space so that there aren't a ton of data subpages that need to be moved. - saph ^_^^⠀talk⠀ 20:28, 12 February 2025 (UTC)Reply

You can find examples in my sandbox. - saph ^_^^⠀talk⠀ 20:30, 12 February 2025 (UTC)Reply

Just a heads-up that your template seems not to print the "This user cannot read or write any languages. Assistance is required." message correctly when no parameters are specified. Lunabunn (talk) 23:23, 12 February 2025 (UTC)Reply

Fixed, thanks. - saph ^_^^⠀talk⠀ 02:38, 13 February 2025 (UTC)Reply

Thank you. As another heads-up — I have already brought this up on Discord, but note on top of parsing JSON files from the extension repo, we also need to crawl user templates manually for some languages we have added on-site that aren't covered by the extension. We then have to decide upon how we will reconcile conflicts between the two sources; benwing has suggested that extension data should be prioritized.
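To make the reconciliation question concrete, here is a minimal Python sketch of merging the two sources with extension data taking priority, as suggested above. The data shapes (dictionaries keyed by language code and proficiency level) and the function name are assumptions for illustration; the actual module is written in Lua and may be organised quite differently.

```python
def merge_babel_messages(extension, onsite):
    """Merge two message tables, letting extension data win on conflict.

    Both arguments are assumed to map (language code, level) to message
    text. Returns the merged table plus the conflicting entries, so that
    overridden on-site messages can still be reviewed by hand.
    """
    merged = dict(onsite)
    conflicts = {}
    for key, text in extension.items():
        if key in merged and merged[key] != text:
            # Record (on-site text, extension text) before overriding.
            conflicts[key] = (merged[key], text)
        merged[key] = text
    return merged, conflicts


# Placeholder data, not real Babel messages.
extension = {("xx", "1"): "extension text"}
onsite = {("xx", "1"): "on-site text", ("yy", "N"): "on-site only"}
merged, conflicts = merge_babel_messages(extension, onsite)
print(merged)
print(conflicts)
```

Returning the conflicts separately keeps the priority rule simple while still leaving a list of entries someone may want to check.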

(Tangentially, I hope you don't mind the signature plagiarism ;-)) 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 10:34, 13 February 2025 (UTC)Reply

I've moved the module to mainspace, Module:Babel. - saph ^_^^⠀talk⠀ 13:51, 13 February 2025 (UTC)Reply

Arabic root links from category "terms derived from the Arabic root" broken

https://en.wiktionary.org/wiki/Category:Swahili_terms_derived_from_the_Arabic_root_%D9%84_%D8%AD_%D9%82 says "Swahili terms that originate ultimately from the Arabic root ل ح ق (l ḥ q).". But the link to the root page is broken, although the root page exists (Appendix:Arabic_roots/ل_ح_ق). Same for https://en.wiktionary.org/wiki/Category:Swahili_terms_derived_from_the_Arabic_root_%D8%AD_%D8%B6_%D8%B1 and Appendix:Arabic_roots/ح_ض_ر.

These links used to work before.
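For anyone checking similar categories, the following Python sketch shows how the percent-encoded category URL relates to the appendix page title. It does not reproduce the bug (its cause is not stated here); the helper names are made up for this example, and only the standard-library `urllib.parse` functions are real.

```python
from urllib.parse import quote, unquote

CATEGORY_MARKER = "_terms_derived_from_the_Arabic_root_"
WIKI_PREFIX = "https://en.wiktionary.org/wiki/"


def root_appendix_title(category_url):
    """Derive the expected appendix title from a root-category URL."""
    encoded_root = category_url.split(CATEGORY_MARKER, 1)[1]
    # unquote() turns the %XX byte escapes back into Arabic letters.
    return "Appendix:Arabic_roots/" + unquote(encoded_root)


def root_appendix_url(category_url):
    """Build the percent-encoded URL of the expected appendix page."""
    return WIKI_PREFIX + quote(root_appendix_title(category_url), safe=":/_")


url = (
    "https://en.wiktionary.org/wiki/Category:Swahili_terms_derived_"
    "from_the_Arabic_root_%D9%84_%D8%AD_%D9%82"
)
print(root_appendix_title(url))
print(root_appendix_url(url))
```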

CC @Fenakhay tbm (talk) 04:34, 13 February 2025 (UTC)Reply

Should be fixed (although note that bug reports of this nature should go to the WT:Grease pit rather than the WT:Beer parlour). Benwing2 (talk) 09:30, 13 February 2025 (UTC)Reply

@Benwing2 thanks, I can confirm it's fixed. Doh, I wanted to report it in Grease pit. I didn't notice I opened the wrong page. Thanks again! tbm (talk) 03:25, 14 February 2025 (UTC)Reply

Extended Mover Request: User:Lunabunn

I would like to request WT:Extended mover rights for easier cleanup of Middle (okm) and Old (oko) Korean entries. We have recently settled a new consensus on lemmatization policy, leaving us with several entries to be moved and many others to be reviewed. For context see WT:Beer parlour/2024/December#Rethinking Middle Korean verb lemmatization, WT:About Middle Korean#Lemmatizations. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 10:52, 13 February 2025 (UTC)Reply

All right, I have granted this. It's been 7 days, no one has specifically objected and this user seems responsible based on their prior edits to Korean pages and pronunciation modules and their participation in various discussions online and in Discord. Benwing2 (talk) 09:06, 21 February 2025 (UTC)Reply

Thank you always! I will pick up past editors' great work and see that Koreanic gets the housekeeping attention it needs. 🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 10:03, 21 February 2025 (UTC)Reply

proposed new set or POS category: Category:Postal abbreviations?

English and other languages have lots of postal abbreviations such as AZ for Arizona in the US, and Wilts for Wiltshire in the UK. I think we should have a set (or POS) category for this. I notice we have Category:Geographic abbreviations outside the category tree, with only 3 entries, so someone else had the same idea and was (half-assedly) trying to implement it. I was thinking the postal abbreviations category would contain abbreviations both for toponyms and other types of postal abbreviations (St. = "street", Dr. = "drive"; COD = "cash on delivery"), but maybe it makes more sense to separate out the ones referring to toponyms. If so, we could call the category postal toponym abbreviations or maybe just toponym abbreviations, to incorporate things like Calif. for California; and because at least in the US, postal abbreviations like AZ have expanded their use beyond the mail system; and also because of ISO 3166-2, which establishes standardized abbreviations for first-level political subdivisions of countries with broader application than just postal services. (In the case of the US at least, the abbreviations look like US-AZ for Arizona, i.e. they recycle US postal abbreviations.) (FWIW they also establish codes for some "lower-level divisions" in weird cases of de-facto countries that ISO doesn't consider countries; case in point, Taiwan, which is listed as "Taiwan, Province of China" [grrrr] because the UN seems to see things this way at China's behest, but where Taiwanese counties, independent cities and special municipalities still get codes).

Next question: Should this be a set category like Category:en:Postal abbreviations or Category:en:Toponym abbreviations etc., or a POS category like Category:English postal abbreviations or Category:English toponym abbreviations etc.?

@-sche, @Ioaxxere who have commented on past proposals for new categories and helped separate out the boundary between set and POS categories. Benwing2 (talk) 07:46, 15 February 2025 (UTC)Reply

@Benwing2: Since we have Category:ISO 3166-1 alpha-2, why don’t you add Category:ISO 3166-2 alpha-2? I realize there may be unofficial codes, but this would be a reason to rename both categories to something more generic. Fay Freak (talk) 08:31, 15 February 2025 (UTC)Reply

Okay, you think of “other types of postal abbreviations”, which has the potential to become an unorganized wastebasket from what I can see, the country subdivisions however are not language-specific, you print it on package labels sent from one EU country to another. (Just sent some designer drip from DE to IT.) Fay Freak (talk) 08:35, 15 February 2025 (UTC)Reply

To me, "toponym abbreviations" seems more maintainable (perhaps more useful?) than "postal toponym abbreviations", as it seems difficult to determine what constitutes a "postal" vs "non-postal" abbreviation: if you address a letter to "Willcox City Hall, 101 Sou. Railroad Av., Willcox, Ariz." rather than "...S. Railroad Ave., Willcox, AZ", or address a letter to "Nola City Hall, 1300 Perdido Str., NOLA" rather than "...Perdido St., New Orleans, LA", the postal service will still deliver it, so which of those are "postal" abbreviations? And while some historical postal abbreviations (e.g. official USSR or DDR ones) surely count, it seems likely there are cases where it's unclear whether a country in the past or present uses particular abbreviation(s) officially. (I found things like Chs. in the English Dialect Dictionary and other dictionaries and have no idea whether it's a Royal-Mail-recognised abbreviation or not.) But if there are official lists like "ISO 3166-2 alpha-2" (which is well-defined) that someone wants to categorize, that seems fine, and if someone wants to make a case for why postal and nonpostal toponym abbreviations should be in separate categories, please do!
Abbreviations like "St." and "COD" do not seem restricted to postal use(?) nor do they seem to have anything in particular in common except that some post offices use both, but don't post offices also use e.g. "i.e." or "e.g." in some publications? So I'd want to see more explanation of why "St." and "COD" should be together in one category (besides the overall "abbreviations" category), and how to decide what else should or shouldn't be in that category. It is possible we could assemble enough "postal service terminology" to merit a category (nutting truck comes to mind).
Regarding what type of category it should be, I guess it should be a subcategory (like Category:English case citation abbreviations) of, and thus the same type of category as, Category:English abbreviations...? - -sche (discuss) 19:25, 15 February 2025 (UTC)Reply

OK this makes sense. Should it be "toponym abbreviations" or "geographic abbreviations"? Some might argue that the latter uses a more familiar term, but I personally prefer "toponym abbreviations" because someone could argue that Mt. or mtn. = mountain is a "geographic abbreviation". Benwing2 (talk) 22:14, 15 February 2025 (UTC)Reply

Deprecating parameters in `{{ko-IPA}}`?

@AG202 @Chom.kwoy I am currently working on a complete rewrite of all Koreanic translit and pron modules/templates that I hope to roll out gradually in coming months (see my user page). This will hopefully bring easy maintenance and consistency by replacing huge data tables and disjointed code with shared, modular, imperative functions. That aside, @Solarkoid and I have found this opportunity to provide impetus for simplifying {{ko-IPA}}'s parameter interface. This will also be inherited by {{jje-IPA}}. I propose the following:

Add: |alt=alternative pron; allow alternative pronunciations to be specified without having to retype the headword
Strong Remove: |ui=; in the standard pronunciation, ui becoming i in non-word-initial position is completely regular and does not/should not need to be specified; |uie=; this is for one word, 의 (-ui). why?? manually specify; |svar=; this is for two words, 멋있다 (meositda) and 맛있다 (masitda). manually specify
Remove: |iot=; this is very seldom used & can be manually specified; |nobc=; this is seldom used & can be manually specified
Modify: ~~|com=~~ |tense=; specify the syllable being tensed, e.g. |tense=1 in 사이트 (saiteu), not the previous syllable, e.g. |com=0
Keep: |cap=; |l=; |bcred=; |nn=, |ni=; internally, one can be an alias of the other, but n-insertion and nl > nn are semantically distinct, so both can be kept for intuitiveness
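Two of the points above can be sketched in a few lines of Python. The first function applies the regular rule described for |ui= (the vowel ui read as i outside word-initial position) to a single word written in precomposed Hangul; the second converts an old |com= index to the proposed |tense= index, based solely on the one example given (|com=0 corresponding to |tense=1). Both function names are hypothetical, and the real modules are written in Lua and may handle far more cases.

```python
HANGUL_BASE = 0xAC00   # first precomposed syllable, 가
SYLLABLE_COUNT = 11172  # number of precomposed Hangul syllables
VOWELS, TAILS = 21, 28  # medial vowels and final slots per syllable
UI_INDEX = 19           # medial vowel ㅢ (ui)
I_INDEX = 20            # medial vowel ㅣ (i)


def regularise_ui(word):
    """Replace ui with i in every syllable of a word except the first."""
    out = []
    for position, ch in enumerate(word):
        offset = ord(ch) - HANGUL_BASE
        if position > 0 and 0 <= offset < SYLLABLE_COUNT:
            lead, rest = divmod(offset, VOWELS * TAILS)
            vowel, tail = divmod(rest, TAILS)
            if vowel == UI_INDEX:
                # Recompose the syllable with i as its vowel.
                recomposed = (lead * VOWELS + I_INDEX) * TAILS + tail
                ch = chr(HANGUL_BASE + recomposed)
        out.append(ch)
    return "".join(out)


def com_to_tense(com):
    """Map the old index (syllable before the tensed one) to the new one."""
    return com + 1


print(regularise_ui("회의"))  # non-initial ui is rewritten
print(regularise_ui("의사"))  # word-initial ui is left alone
print(com_to_tense(0))
```

If existing |com= values are always plain integers, a bot migration would amount to adding one to each value; whether other value formats exist in current entries would need checking first.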

🌙🐇 ^⠀talk⠀ ^{⠀contribs⠀} 10:14, 15 February 2025 (UTC)Reply

Strong Support. AG202 (talk) 15:56, 17 February 2025 (UTC)Reply

CFI edit request

At Wiktionary:Criteria_for_inclusion#Idiomaticity, after the sentence "Idiomaticity rules apply to hyphenated compounds, including hyphenated prefixed words, in the same way as to spaced phrases", could someone with permission please add a reference linking to the vote at Wiktionary:Votes/2019-10/Application of idiomaticity rules to hyphenated compounds. Many votes are linked, but this one seems to have been overlooked. Thanks. Mihia (talk) 12:41, 16 February 2025 (UTC)Reply

@Mihia: Added. J3133 (talk) 12:45, 16 February 2025 (UTC)Reply

Great, thanks. Mihia (talk) 12:46, 16 February 2025 (UTC)Reply

Allowing technically SoP entries involving highly polysemic words

There is already a "get-out clause" at WT:SOP ("In rare cases ... etc. etc."), but I have for some time thought that we should make specific allowance for inclusion of phrases involving words with very many meanings, where the phrase almost invariably has one specific meaning that is obtained by choosing the correct sense of each of the components, and where it may be unreasonable to expect readers to be able to readily do this. One recent example that comes to mind is patch file, though I don't want to focus particularly on whether that would or would not qualify, just on opinions about the general idea. Does anyone have any views? Mihia (talk) 12:55, 16 February 2025 (UTC)Reply

I assume we are just talking about two-part compounds. It could be desirable, where both terms in the compound used non-obvious tertiary senses for highly polysemic (including multi-etymology) terms. Unfortunately, our ability to efficiently come to a conclusion about this kind of thing (either a policy or individual definitions) has proven to be insufficient to prevent low-quality compounds from remaining in Wiktionary for years. I can hope that having well-defined criteria for inclusion would also mean well-defined criteria for exclusion, which would make it easier to remove some of the dreck. Unfortunately we seem to have an inclusionist bias, so that hope is probably unjustified. DCDuring (talk) 23:20, 16 February 2025 (UTC)Reply

harmonizing families and proto-languages, and other proto-language warnings

We have a whole host of warnings (17) issued concerning mismatches between proto-languages and families:

Proto-Central Togo (alv-gtm-pro) does not have the expected name "Proto-Ghana-Togo Mountain", even though it is the proto-language of the Ghana-Togo Mountain languages (alv-gtm).
Proto-Arawa (auf-pro) does not have the expected name "Proto-Arauan", even though it is the proto-language of the Arauan languages (auf).
Proto-Arawak (awd-pro) does not have the expected name "Proto-Arawakan", even though it is the proto-language of the Arawakan languages (awd). [harmonize under Arawak]
Proto-Ta-Arawak (awd-taa-pro) does not have the expected name "Proto-Ta-Arawakan", even though it is the proto-language of the Ta-Arawakan languages (awd-taa). [harmonize under Ta-Arawak]
Proto-Basque (euq-pro) does not have the expected name "Proto-Vasconic", even though it is the proto-language of the Vasconic languages (euq). [keep as-is]
Proto-Norse (gmq-pro) does not have the expected name "Proto-North Germanic", even though it is the proto-language of the North Germanic languages (gmq). [keep as-is but rename gmq-pro to non-pro]
Proto-Kamta (inc-krn-pro) does not have the expected name "Proto-KRNB lects", even though it is the proto-language of the KRNB lects (inc-krn). [rename family to KRDS languages, keep proto-language as-is]
Proto-Chumash (nai-chu-pro) does not have the expected name "Proto-Chumashan", even though it is the proto-language of the Chumashan languages (nai-chu).
Proto-Maidun (nai-mdu-pro) does not have the expected name "Proto-Maiduan", even though it is the proto-language of the Maiduan languages (nai-mdu).
Proto-Mixe-Zoque (nai-miz-pro) does not have the expected name "Proto-Mixe-Zoquean", even though it is the proto-language of the Mixe-Zoquean languages (nai-miz).
Proto-Pomo (nai-pom-pro) does not have the expected name "Proto-Pomoan", even though it is the proto-language of the Pomoan languages (nai-pom).
Proto-Mazatec (omq-maz-pro) does not have the expected name "Proto-Mazatecan", even though it is the proto-language of the Mazatecan languages (omq-maz).
Proto-North Sarawak (poz-swa-pro) does not have the expected name "Proto-North Sarawakan", even though it is the proto-language of the North Sarawakan languages (poz-swa).
Proto-Salish (sal-pro) does not have the expected name "Proto-Salishan", even though it is the proto-language of the Salishan languages (sal). [harmonize under Salish]
Proto-Samic (smi-pro) does not have the expected name "Proto-Sami", even though it is the proto-language of the Sami languages (smi).
Proto-Kuki-Chin (tbq-kuk-pro) does not have the expected name "Proto-Kukish", even though it is the proto-language of the Kukish languages (tbq-kuk). [harmonize under Kuki-Chin]
Proto-Saka (xsc-sak-pro) does not have the expected name "Proto-Sakan", even though it is the proto-language of the Sakan languages (xsc-sak).

We also have four warnings about proto-languages without associated families:

Proto-Amuesha-Chamicuro (awd-amc-pro) has a proto-language code associated with the invalid code "awd-amc".
Proto-Kampa (awd-kmp-pro) has a proto-language code associated with the invalid code "awd-kmp".
Proto-Paresi-Waura (awd-prw-pro) has a proto-language code associated with the invalid code "awd-prw".
Proto-Puroik (sit-khp-pro) has a proto-language code associated with the invalid code "sit-khp".

We also have two weird miscellaneous warnings:

Proto-Rukai (dru-pro) has a proto-language code associated with Rukai (dru), which is not a family.
Kelantan Peranakan Hokkien (mis-hkl) has its canonical name ("Kelantan Peranakan Hokkien") repeated in the table of aliases.
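The first three kinds of warning listed above appear to follow a simple rule, which can be sketched in Python as follows. The rule is inferred from the wording of the warnings (expected name = "Proto-" plus the family name; parent code = proto-language code minus "-pro"); the function name, data layout and message wording are assumptions for this example, and the real check lives in the Lua data modules.

```python
def proto_language_warnings(languages, families):
    """Return warnings for proto-languages that do not match a family.

    languages and families are assumed to map codes to canonical names.
    """
    warnings = []
    for code in sorted(languages):
        if not code.endswith("-pro"):
            continue
        name = languages[code]
        parent = code[: -len("-pro")]
        if parent in families:
            expected = "Proto-" + families[parent]
            if name != expected:
                warnings.append(
                    f'{name} ({code}) does not have the expected name "{expected}".'
                )
        elif parent in languages:
            warnings.append(
                f"{name} ({code}) is associated with {languages[parent]} "
                f"({parent}), which is not a family."
            )
        else:
            warnings.append(
                f'{name} ({code}) is associated with the invalid code "{parent}".'
            )
    return warnings


# Small sample taken from the lists above, plus one matching pair.
languages = {
    "sal-pro": "Proto-Salish",
    "awd-kmp-pro": "Proto-Kampa",
    "dru-pro": "Proto-Rukai",
    "dru": "Rukai",
    "gem-pro": "Proto-Germanic",
}
families = {"sal": "Salishan", "gem": "Germanic"}
for warning in proto_language_warnings(languages, families):
    print(warning)
```

Under this rule, a mismatch can be silenced by renaming either side, which is why each case needs a decision about which name to keep.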

I can look into the second miscellaneous warning, but for the others, I mostly don't have enough context. Proto-Norse being the ancestor of the North Germanic languages is a special case because it's attested, but for the other mismatches, I imagine a lot of them are unintentional due to the existence of multiple names for the same family. It should be possible in many cases to rename either the family or proto-language to avoid the mismatch. Pinging @-sche and @Theknightwho who might know something about this; please feel free to ping others. Benwing2 (talk) 04:09, 19 February 2025 (UTC)Reply

In some cases, I think the family uses a different name to avoid having the same exact name as a (non-proto) language (as described in WT:FAM). For example, "Proto-Vasconic" gets only 13 Google Books hits (that actually use that term; the subsequent pages upon pages of results that Google returns don't use the term or sometimes even have any particular relevance — who knows why Google returns them), whereas I find 10+ pages [of ten uses each] of "Proto-Basque", so "Proto-Basque" is clearly the more common name for the language ... but without even checking whether "Basque languages" or "Vasconic languages" is more common for the family, I can see that one benefit to calling them "Vasconic languages" is that if they were called "Basque languages", then things like {{der|en|euq|-}} would display identically to {{der|en|eu|-}}. (That might not matter that much in that particular case, but for larger families it'd be confusing. However, {{der|en|qwm|-}} and {{der|en|trk-kip|-}} do display identically... so maybe we need to rename one of those, or find some way of solving this "same name" issue...)
In some cases, the proto-language and family might really have different common names.
In the case of Salish, it looks like the family could be renamed "Salish" to match the proto-language; "Proto-Salish" gets 11 pages of relevant Google Books results vs only 9 pages for "Proto-Salishan", and "Salish languages" is apparently also more common. - -sche (discuss) 05:04, 19 February 2025 (UTC)Reply
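The "same name" issue described above could be surveyed with a check along these lines. This is only a Python sketch: the function name and data layout are assumptions, and the sample names are illustrative (the comment above states that the qwm and trk-kip codes display identically, and the name used for them here is taken on that basis).

```python
from collections import defaultdict


def display_name_collisions(languages, families):
    """Find display names shared by more than one language/family code."""
    by_name = defaultdict(list)
    for code, name in languages.items():
        by_name[name].append(code)
    for code, name in families.items():
        by_name[name].append(code)
    return {
        name: sorted(codes) for name, codes in by_name.items() if len(codes) > 1
    }


# Illustrative data only.
languages = {"eu": "Basque", "qwm": "Kipchak"}
families = {"euq": "Vasconic", "trk-kip": "Kipchak"}
print(display_name_collisions(languages, families))
```

Running such a check before renaming a family to match its proto-language would show whether the rename introduces a new collision with an existing language name.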

"Ta-Arawak" seems to be marginally more common than "Ta-Arawakan", if we wanted to synchronize that pair: on Google Scholar, "Ta-Arawak" gets 40 hits, "Ta-Arawakan" 26; on Google Books, each one gets about 14 hits (discounting a few which are not in English and are only using ta as a particle while mentioning the Arawak/an languages). "Proto-Ta-Arawakan" gets 1 GBooks hit and "Proto-Ta-Arawak" gets none; "Ta-Arawakan languages" returns 2 copies of 1 book, "Ta-Arawak languages" returns 1 book. On Google , "Ta-Arawakan languages" returns 0 hits while "Ta-Arawak languages" returns 7 (of which 3 are duplicates of a single work). - -sche (discuss) 18:31, 19 February 2025 (UTC)Reply

@-sche What about Proto-Arawak vs. Arawakan? Wikipedia has w:Arawakan languages and w:Ta-Arawakan languages (although the w:Arawakan languages article uses "Ta-Arawak" in reference to the family). Since Ta-Arawakan is a subfamily of Arawakan, it seems we should be consistent in the names of these two families. (Meanwhile, confusingly, Category:Arauan languages is an apparently unrelated family; Wikipedia's article is at w:Arawan languages, which looks more "modern".) Benwing2 (talk) 00:59, 21 February 2025 (UTC)Reply

Although both names seem to be common enough that the Google (Books) Ngram Viewer should be able to plot them (both seem to get well over 40 hits), it doesn't like the hyphens, so this claims no results, and I can't be sure whether this is actually a graph of "Proto-Arawak" or instead of how many books have "Proto" minus "Arawak". Nonetheless it seems like "Arawak" is more common, if we wanted to standardize everything on that. (Google Scholar also claims to find slightly more results for "Proto-Arawak" than "Proto-Arawakan", and significantly more for "Arawak" than "Arawakan".) - -sche (discuss) 18:32, 22 February 2025 (UTC)Reply

For Kamta, I notice there's the added oddity that the language family/category is named "... lects" rather than "... languages", even though the languages in the category are named "Category: ... language". AFAICT, that part of the name should be regularized (from "lects" to "languages"). For the name itself, google books:"KRNB" languages Kamta turns up zilch (and I spy only three Google Scholar hits), but "Kamta languages" also turns up zilch (and if the family were renamed "Kamta" to match the proto-language, we would run into the Kipchak issue where {{der}} etc would return the same name whether the family or the [non-proto] language that's already called "Kamta" was called). Wikipedia uses a third name, "KRDS", which I can find a couple of Google Books and a couple of Google Scholar hits using. There are a couple of Google Books and Scholar hits for "proto-Kamta", and none for "Proto-KRNB" or "Proto-KRDS", so maybe we leave the proto-language name as "Proto-Kamta" but change the family from "KRNB lects" to "KRDS languages"? Or maybe some Indian-language editors have better knowledge/ideas: pinging User:AryamanA who created Category:Rajbanshi language (and you already pinged TKW, who created Category:Surjapuri language). - -sche (discuss) 18:32, 22 February 2025 (UTC)Reply

In general, I'd follow the literature; if they generally use a different name for the proto-language vs. the group by which the proto-language is reconstructed, so be it. If it's an even split between multiple names: sure, harmonize it for convenience. However, I have a few suggestions.

Rename "Kukish" to "Kuki-Chin" (Kuki-Chin is more common)
Change the code of Proto-Norse from gmq-pro to non-pro but keep the "Proto-Norse" name (since that's what the literature calls it). It doesn't really make sense for Old Norse to be non but Proto-Norse to have "gmq" instead.

— Ceso femmuin mbolgaig mbung, mellohi! (投稿) 17:06, 24 February 2025 (UTC)Reply

Definitely, in cases where one name is more common for the proto-language and another for the group, I agree it's fine for them not to match. - -sche (discuss) 17:50, 25 February 2025 (UTC)Reply

@-sche, Mellohi! I added the results so far in bold. There's a trend here in that so far generally the name of the proto-language has remained and the name of the family changed. I don't know if that applies to the remainder, though. Benwing2 (talk) 20:30, 24 February 2025 (UTC)Reply

Turkish IPA module proposal

Latest comment: 6 days ago | 9 comments | 3 people in discussion

(Notifying İtidal, Fytcha, Vox Sciurorum, Lambiam, Whitekiko, Ardahan Karabağ, Orexan, Moonpulsar, Lagrium): I'd like to propose introducing a pronunciation module to tidy up the mess currently found in the pronunciation sections of Turkish pages. I have already made a prototype of such a module at Module:User:Trimpulot/tr-IPA-test. It is currently capable of guessing the pronunciation of most words:

it understands that â, î and û cause palatalization and are usually long in open syllables
in other cases the palatalization of consonants needs to be marked with capital letters (K, G or L), and the length of vowels with a following colon (:)
stress must be marked with an apostrophe (') preceding the stressed syllable, unless it's the last one
if a word's last vowel becomes long before a suffix beginning in a vowel, the word must be followed by a plus sign and the accusative vowel; this plus sign should be replaced by a dash if the final consonant also undergoes lenition
if the word's pronunciation is perfectly understandable from its spelling, the template needs no parameters, except for a potential +V or -V, which can stand on its own (e.g. on mahbup, the only needed parameter is |-u)

Trimpulot (talk) 09:06, 19 February 2025 (UTC)Reply
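The respelling conventions listed above can be illustrated with a small sketch. This is not the code of Module:User:Trimpulot/tr-IPA-test (which is a Lua module whose internals are not shown here); it is a hypothetical Python tokenizer that only records the three manual markers (capital K/G/L, colon, apostrophe) and the +V/-V parameter. The names `Segment`, `parse_respelling` and `parse_suffix_param` are invented for illustration, and the automatic handling of â, î and û is deliberately left out.

```python
from dataclasses import dataclass

# Capital letters that mark palatalization in a respelling (per the proposal).
PALATAL_MARKERS = {"K": "k", "G": "g", "L": "l"}


@dataclass
class Segment:
    letter: str                    # the letter, with K/G/L lowered to k/g/l
    palatalized: bool = False      # set by a capital K, G or L
    long: bool = False             # set by a following colon
    starts_stressed: bool = False  # set by a preceding apostrophe


def parse_respelling(respelling: str) -> list[Segment]:
    """Split a respelling into segments, recording the manual markers."""
    segments: list[Segment] = []
    stress_pending = False
    for ch in respelling:
        if ch == "'":
            # The apostrophe precedes the stressed syllable.
            stress_pending = True
        elif ch == ":":
            # The colon lengthens the segment it follows.
            if not segments:
                raise ValueError("':' must follow a vowel")
            segments[-1].long = True
        else:
            palatalized = ch in PALATAL_MARKERS
            letter = PALATAL_MARKERS[ch] if palatalized else ch
            segments.append(Segment(letter, palatalized, False, stress_pending))
            stress_pending = False
    return segments


def parse_suffix_param(param: str) -> tuple[str, bool]:
    """Read a '+V' or '-V' parameter.

    Returns (accusative vowel, lenition flag): '+' means only the last
    vowel lengthens before a vowel-initial suffix, '-' means the final
    consonant also undergoes lenition.
    """
    if len(param) < 2 or param[0] not in "+-":
        raise ValueError("expected '+V' or '-V'")
    return param[1:], param[0] == "-"
```

For instance, `parse_suffix_param("-u")` (the parameter quoted for mahbup) yields the vowel `u` with the lenition flag set, while a made-up respelling such as `'Ka:Le` yields a stressed, palatalized k, a long a, a palatalized l and a plain e.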

Judging from the example, the "dash" is an ordinary hyphen (-)

Some entries give only a phonemic pronunciation (e.g. abi: “IPA^(key): /aːbi/”), some give only a phonetic pronunciation (e.g. açım: “IPA^(key): [ɑˈtʃɯm]”), and some have both phonemic and phonetic ones (e.g. kar: “IPA^(key): /ˈkaɾ/ [ˈkʰɑɾ̞̊]”). How will the new module handle this?

How is the module invoked? ({{#invoke:User:Trimpulot/tr-IPA-test|???|...|???}}). The documentation should include some examples. For example, how should mal and hal be handled? --Lambiam 12:43, 19 February 2025 (UTC)Reply

@Lambiam As of now, the module is designed to only give a phonemic pronunciation (which I deem to be sufficient). I will add documentation at Template:User:Trimpulot/tr-IPA-test to better explain how it works.

Trimpulot (talk) 15:44, 19 February 2025 (UTC)Reply

I agree with only giving a phonemic pronunciation; see my argument at Wiktionary:Tea room/2021/March#Turkish pronunciation. But others may disagree, seeing how much work appears to have been put in these narrow transcriptions. In particular, User:Science boy 30 writes on his talk page: “This user is against broad transcription.” His latest contribution, at gâvur, has been to replace [ɟɑˈβ̞uɾ̞̊] by [ɟɑˈβ̞ʊɾ̞̊]. So it may be wisest to at least allow room for narrow, phonetic transcriptions and use it to retain existing ones. I also suggest testing the new module by comparing its results with currently given phonemic pronunciations. --Lambiam 17:34, 19 February 2025 (UTC)Reply

@Trimpulot @Lambiam My personal view is that we should provide a "lightly phonetic" transcription that includes aspects of pronunciation that may not be phonemic but which significantly impact the actual pronunciation and may be non-obvious to language learners. An example is Spanish voiced stops /b d g/, which become approximants [β̞ ð̞ ɣ̞] in certain positions (e.g. between vowels). The pronunciation as approximants is very salient and audible, and pronouncing them as stops marks you as a foreigner with a bad accent. OTOH the exact quality of Spanish mid vowels /e o/ is less important and probably doesn't need to be indicated. I don't know Turkish well but it seems to me that a "lightly phonetic" rendition would include palatalization of /k g l/ whenever it occurs but not necessarily things like aspiration of voiceless stops or the other details found in a transcription like [ɟɑˈβ̞ʊɾ̞̊] (which seems too detailed). Overall though I'm strongly in favor of having a pronunciation module; manually generated pronunciations always end up messy and inconsistent. Benwing2 (talk) 22:16, 19 February 2025 (UTC)Reply

@Trimpulot One other thing ... you should probably come up with a different way of marking palatalization than capital letters, because capital K G L will clash with proper names that happen to have capital letters in them that aren't palatalized. You could for example use an apostrophe to indicate palatalization (k' g' l') and switch to using an acute accent to mark stress (á é ...), or mark one of them with an apostrophe and the other with a double quote ". Actually, apostrophes might not be so good if there are Turkish words that have apostrophes in their normal spelling (I don't know if that's the case, but you don't want people to be forced to provide respelling of words that happen to have capital letters, apostrophes, etc. in them). Also your module should be able to handle multi-word terms correctly; I can help you come up with a syntax for this, as I've written several pronunciation modules. Benwing2 (talk) 22:27, 19 February 2025 (UTC)Reply

@Benwing2 Capital letters in proper names do not constitute a problem, since titles are converted to lowercase before being analysed (of course, if a proper name needs to be manually transcribed, the editor should not use capital letters for anything but palatalization). As for the stress, using accent marks would make it difficult to stress special characters such as ü, ö and ı, in turn making it necessary to use an apostrophe for stress. Furthermore, I don't see a valid reason to switch to an apostrophe-double quote system, since that would require more characters to transcribe what it can already handle well. Multi-word terms are already handled, I only need to figure out where it's best to automatically place the stress. As for phonetic transcriptions, I might add a way to add them manually.

Trimpulot (talk) 09:22, 20 February 2025 (UTC)Reply

If a proper name needs to be respelled, it's IMO going to be very awkward to require that lowercase letters be used in place of capital letters in respellings so that capital letters can be used for palatalization. You're likely to have bad output coming from editors who forget they need to lowercase all capital letters in respelling. It also makes the substitution notation (see e.g. {{ca-IPA}}, {{fr-IPA}}, {{cs-IPA}}, {{pt-IPA}} for examples of this in action) significantly more awkward. Trust me that it would be better to use something other than capital letters for palatalization. Benwing2 (talk) 09:27, 20 February 2025 (UTC)Reply

I can't tell when the impact on the actual pronunciation should be considered significant. The minimal pair kar – kâr shows the distinction is not always purely phonetic, but I don't know any comparable /ɡ/ – /ɟ/ pairs. I tend to indicate palatalization of /ɡ/ in entries I create, mainly for consistency with how native Turkish editors tend to handle this (e.g. using /bɛlˈɟe/ for belge). Other native Turkish editors, however, may disagree (as shown by a preference for the broad /t͡ʃiˈzel.ɡe/). Some anomalous pronunciations, such as the common monosyllabic pronunciation [diːl] for değil, are IMO also worth recording. So I think we should allow one or more narrow transcriptions in conjunction with the broad one. (We will still miss colloquial sandhi phenomena like [nɑˈbæɾ] for ne haber and [nɑpˈtɯn] for ne yaptın.) --Lambiam 09:00, 20 February 2025 (UTC)Reply

FYI: "About [Language] pages" are being moved to "[Language] Entry Guidelines"

Latest comment: 6 days ago | 3 comments | 3 people in discussion

Per Wiktionary:Requests for moves, mergers and splits § Wiktionary:English entry guidelines vs "About (language)" in every other language, all the About Language pages, like WT:About Jeju, are being moved to Language entry guidelines, such as WT:Jeju entry guidelines, by @ExcarnateSojourner. This change blindsided me a bit considering what I've been used to, and looking at the discussion, I don't feel that there was enough participation (and it should've been mentioned here). Nonetheless, this is more so a message out there for other folks so that they're not confused as well. AG202 (talk) 06:00, 20 February 2025 (UTC)Reply

We should also consider renaming Cat:Wiktionary language considerations and replacing references to "language considerations pages" to something like "Wiktionary language guidelines"/"language guidelines pages". ("Wiktionary language-specific entry guidelines"/"language-specific entry guidelines pages" is too much of a mouthful.) This, that and the other (talk) 10:23, 20 February 2025 (UTC)Reply

@AG202 Thanks for the feedback, and sorry to have caught you off guard. Counting RFM discussions there were twelve participants, which is a lot for an RFM (though I get that this is a particularly large change). — excarnateSojourner (ta·co) 20:46, 20 February 2025 (UTC)Reply

Social media account

Latest comment: 6 days ago | 8 comments | 4 people in discussion

@Chuck Entz, CitationsFreak, DCDuring, Ioaxxere, Thadh, Theknightwho, Vininn126: there was a previous discussion in March last year about whether it would be a good idea to start one or more social media accounts to publicize the English Wiktionary, and maybe also to interact with people (though I'm slightly sceptical about that). Having tried out Bluesky for a while now, I wonder if we want to experiment by setting up an account which can be accessed by a few trusted users. I can put up a daily Word of the Day post, and maybe someone can do one for the Foreign Word of the Day too. Maybe others would like to highlight other entries which are relevant to current affairs, or talk about how they improve the dictionary. To register an account on the main Bluesky Social platform, we'd need to put down an e-mail address (one that the trusted users can access, I suppose), a password and a "birth date" (the date when the dictionary launched??). A possible account name is @en.wiktionary.

Alternatively, "Bluesky is an open network where you can choose your hosting provider. If you're a developer, you can host your own server." See https://atproto.com/guides/self-hosting. Not sure if that's better, but someone else would have to set up and maintain this.

Thoughts? — Sgconlaw (talk) 11:37, 20 February 2025 (UTC)Reply

I made a Bluesky account a few days ago. Vininn126 (talk) 11:43, 20 February 2025 (UTC)Reply

@Vininn126: for yourself or for the English Wiktionary? — Sgconlaw (talk) 11:44, 20 February 2025 (UTC)Reply

For English Wiktionary. Haven't done much to set it up, but it exists. As it stands, access is generally limited to admins. Setting up some code or something to automatically post (F)WOTD's would be nice. Vininn126 (talk) 11:47, 20 February 2025 (UTC)Reply

@Vininn126: great! Well, as I mentioned, I’m happy to post WOTDs. No idea if this can be automated, but it might be nice to do it manually as I can add interesting comments about the etymology or meaning, as well as an image from the Commons. I could start, say, on 1 March 2025. — Sgconlaw (talk) 11:58, 20 February 2025 (UTC)Reply

I think experimenting is a good idea. Starting on a platform that doesn't have a vast number of users seems wise. But we wouldn't get a lot of new users without going big (FB, etc.). OTOH, going big scares me. DCDuring (talk) 17:19, 20 February 2025 (UTC)Reply

Speaking of which, Sgconlaw, your latest WOTD picks have been absolutely popping. ―⁠K_(ə)tom (talk) 18:12, 20 February 2025 (UTC)Reply

@Ktom: ha ha, thanks! The holoalphabetic month has been fun to work on. — Sgconlaw (talk) 19:40, 20 February 2025 (UTC)Reply

The continuity of Foreign Word of the Day

Latest comment: 3 days ago | 21 comments | 13 people in discussion

This is a notice that you will soon need someone new to prepare Foreign Words of the Day if you desire to continue it. I shall not detail why, but I am no longer available, neither for setting them nor mentoring somebody else nor for standing by.

All slots to the end of March have been filled.

I wish good luck to the next person in charge. General advice is to learn the WDL and LDL rules fast, always feature one definition with a quotation at the least (if applicable), and to look at older examples and copy them when in doubt (in particular for Chinese and Egyptian). ~~←₰-→~~ Lingo ^Bingo _Dingo (talk) 17:40, 20 February 2025 (UTC)Reply

@Lingo Bingo Dingo: thanks for all your hard work! — Sgconlaw (talk) 17:42, 20 February 2025 (UTC)Reply

You will be missed. Thank you for all you've done! Vininn126 (talk) 17:44, 20 February 2025 (UTC)Reply

@Lingo Bingo Dingo: I'm also sorry to see you go. Your userpage picture is beautiful; I hope it stays. I also hope you won't object to the restoration of your talk page and its archives, for the sake of retaining accessible and searchable discussion records. Thank you for all your work. Whatever has happened, is happening, or will happen in your life, I wish you the best. 0DF (talk) 18:23, 20 February 2025 (UTC)Reply

Thanks for your work on FWOTD. I may be interested in filling in the role a little bit. (New around here but I have 3½ years experience posting words every day) Hftf (talk) 22:29, 20 February 2025 (UTC)Reply

Wonderfool is also leaving. Father of minus 2 (talk) 22:33, 20 February 2025 (UTC)Reply

When Equinox left, Wonderfool said he was leaving too. That was a year or so ago. Benwing2 (talk) 00:48, 21 February 2025 (UTC)Reply

See you, mister. Polomo47 (talk) 23:20, 20 February 2025 (UTC)Reply

Working with you has been a treat. Flame, not lame (Don't talk to me.) 16:27, 22 February 2025 (UTC)Reply

@Lingo Bingo Dingo: Thank you for all the work, and I am sad to see you go :( Thadh (talk) 16:30, 22 February 2025 (UTC)Reply

@LBD: Thank you for all your hard work maintaining FWOTD for so long! - -sche (discuss) 18:01, 22 February 2025 (UTC)Reply

@everyone: when someone isn't available to set an English WOTD, the system recycles last year's word, whereas with FWOTD (unless this has changed in the time since I was familiar with it) the system fails. If no-one is able to step up and maintain FWOTD, two ideas that'd allow for a smaller workload are (1) switch to a fallback system like WOTD, and/or (2) reduce the frequency, e.g. make it "f. word of the week". (Of course, if someone has time to maintain the current system, great!) - -sche (discuss) 18:01, 22 February 2025 (UTC)Reply
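The fallback idea (1) above can be sketched roughly as follows. This is a hypothetical illustration, not how the WOTD or FWOTD templates are actually implemented: the function names are invented, the page-title pattern is taken from the one page title quoted elsewhere in this thread (Wiktionary:Foreign Word of the Day/2025/January 8), and the assumption that earlier years use the same pattern is untested.

```python
from datetime import date
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Assumed title prefix, based on the page title quoted in this thread.
PREFIX = "Wiktionary:Foreign Word of the Day"


def candidate_titles(day: date, years_back: int) -> list[str]:
    """List the page for this date, then the same date in earlier years."""
    return [
        f"{PREFIX}/{day.year - offset}/{MONTHS[day.month - 1]} {day.day}"
        for offset in range(years_back + 1)
    ]


def pick_page(day: date, existing: set[str], years_back: int = 3) -> Optional[str]:
    """Return the first candidate that exists, or None if nothing was ever set."""
    for title in candidate_titles(day, years_back):
        if title in existing:
            return title
    return None
```

With a set of existing page titles in hand, `pick_page(date(2025, 1, 8), existing)` would return this year's page if it has been filled and otherwise the most recent earlier year's page for the same day. Movable holidays would still need manual attention under such a scheme.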

In the Information Desk post complaining about a lack of entries, I mentioned how I would be interested in adding words. Also, it doesn't have to be a single person — say, everyone with the autopatroller role should be able to edit FWotD. Polomo47 (talk) 02:14, 23 February 2025 (UTC)Reply

We should let AI choose and set up FWOTD. Or what Polomo47 says, sometimes one can (excelling editors can) shortcut it and dump some quoted terms into FWOTD because it spares braincells of one who would maintain setting FWOTD from nominations; I never understood numerological criteria, to be frank.

As a middle ground, I fancy a “FWOTD adder” equivalent to the translation adder where we can just drop ready lemmas and some computer program will arrange it according to which languages have been too recently featured and are stocked. Fay Freak (talk) 04:47, 23 February 2025 (UTC)Reply

@Polomo47: if you are interested in taking on the FWOTD, you should go for it and try it out. I also think it's a good idea to set up a fallback system for the FWOTD like the one used for the WOTD. (Actually, the WOTD's fallback isn't fully implemented yet. I've been (slowly) adding permanent fallbacks for various days of the year now and then, but I'm sure there are some present fallbacks that are incorrect because they refer to movable holidays in past years.) — Sgconlaw (talk) 22:20, 23 February 2025 (UTC)Reply

Well, FWotD, like WotD, is (was?) locked from editing — I found that out at the start of the year. Not sure how one gets access... But yes, I'd like to try it out! Polomo47 (talk) 22:22, 23 February 2025 (UTC)Reply

@Polomo47: I assume you have been around long enough to be autoconfirmed? I see you've been editing since last year. I'm not very sure how the FWOTDs are protected; perhaps @Chuck Entz can advise on this. — Sgconlaw (talk) 22:27, 23 February 2025 (UTC)Reply

It seems back when I tried I was not autopatrolled; tried it now and it worked. That settles it, then. Polomo47 (talk) 22:31, 23 February 2025 (UTC)Reply

@Polomo47 What page were you trying to edit? I just looked at Wiktionary:Foreign Word of the Day/2025/January 8 and it has no protection at all, not even autoconfirmed. I was able to edit it logged out. Benwing2 (talk) 22:33, 23 February 2025 (UTC)Reply

It wasn't protection, per se, but an abuse filter. I tried to add a FWotD on January 1st but got hit with what I now know is Special:AbuseFilter/119. That might've been because I tried to edit it on the day itself, and I might not've been autopatrolled back then. Polomo47 (talk) 22:41, 23 February 2025 (UTC)Reply

@Lingo Bingo Dingo: Really appreciate your work, especially FWOTD! The care and effort you took to maintain this for so long is really admirable. All the best in your endeavours! (Note for whoever takes over: Feel free to bug me for/about more Chinese entries if relevant.) — justin(r)leung _{{ (t...) | c=› }} 01:21, 23 February 2025 (UTC)Reply

Transliteration of Ethiopic ቐ

Latest comment: 4 days ago | 5 comments | 2 people in discussion

Currently the transliteration norms listed at Wiktionary:Ethiopic transliteration give the transliteration of ቐ (and other glyphs with the same consonant) as <ḳʰ>; I feel this is misleading, as this consonant does not indicate an aspirated stop in Tigrinya (the only language widely using this glyph), but rather indicates an ejective fricative [x']~[χ'].

I feel that <x̣> would be a more representative and internally consistent transliteration, as it ties its transliteration to ኸ <x> (another velar stop commonly spirantized post-vocalically in Tigrinya) and continues to follow the established norm of using the underdot to indicate its emphatic articulation, as well as more clearly showing its actual phonetic realization. Rsmit274 (talk) 00:36, 21 February 2025 (UTC)Reply

Then is the Wikipedia page for Geʽez script wrong in giving ቐ as "qʰ [q]" ? Exarchus (talk) 15:18, 21 February 2025 (UTC)Reply

Thanks for flagging that; it is indeed wrong. If there are any scholarly sources that identify the phone as [q], I'd certainly be interested to hear about it; however, on a quick (informal) literature review, it looks like Maria Bulakh, Niguss Mehari and Rainer Voigt identify it as [χ'], and Tsehaye Teferra, Colleen Fitzgerald and Wolf Leslau identify it as [x'], with none identifying [q]. Rsmit274 (talk) 19:30, 21 February 2025 (UTC)Reply

I was looking for the ISO standard for Geʽez transliteration, but it doesn't appear to exist.

ቐ does seem to be used for [q] in the Awngi language, so that might explain the statement on Wikipedia. But given the greater importance of Tigrinya (we only have one Awngi lemma), using <x̣> for ቐ seems a good idea. Exarchus (talk) 20:52, 21 February 2025 (UTC)Reply

I took the liberty of making the proposed move from 〈ḳʰ〉 to 〈x̣〉 in the relevant modules and pages. It would still be possible to have a different transliteration for Awngi if needed. Exarchus (talk) 11:23, 22 February 2025 (UTC)Reply

Entries with no Etymology headers

Latest comment: 3 days ago | 25 comments | 12 people in discussion

Is there, or could there be a category or other tool to track down entries which contain lemmas without Etymology section? Saumache (talk) 20:29, 21 February 2025 (UTC)Reply

The usefulness of such a category might even still be minimal as there are plenty of things such as alt forms or English multiword phrases where the etymology is clear/unnecessary. Vininn126 (talk) 20:31, 21 February 2025 (UTC)Reply

In what cases would it be so clear as to be unnecessary? Even doghouse has an etymology and I really cannot imagine the scenario where someone is confused as to how that word came about. —Justin (koavf)❤T☮C☺M☯ 21:00, 21 February 2025 (UTC)Reply

I listed two such cases. I'm kind of wondering how you missed those. Vininn126 (talk) 21:02, 21 February 2025 (UTC)Reply

But how is a multi word phrase more obvious than the compound word "doghouse"? Is "doghouse" somehow less clear than "dog house"? The only reason why a multi-word phrase may not need an etymology is because the header template is likely to just link to each word individually, making it redundant in that regard. That said, there are clearly plenty of multiword phrases where how it was coined or why it exists as a phrase is actually far more obscure than "this is a house for a dog, so it's called a 'doghouse'", so an etymology would be helpful. I will grant that I don't know of any alternative forms that have separate etymologies (e.g. only one etymology at color, not colour or yogurt but not yoghurt), so that may be a case where we don't in practice have them, but that doesn't mean they shouldn't. Has there been discussion on this? —Justin (koavf)❤T☮C☺M☯ 21:12, 21 February 2025 (UTC)Reply

The space shows where the gap is between words. Without it, we can't tell "psycho- + therapist" from "psycho- + the + rapist". 2A00:23C5:FE1C:3701:DCF2:CDF7:FC1F:D3F 21:14, 21 February 2025 (UTC)Reply

Okay, but who would think that "doghouse" is "do-+gho+-use"? It's a house for a dog, so it's a "doghouse". Is there anyone who is confused by this? —Justin (koavf)❤T☮C☺M☯ 21:16, 21 February 2025 (UTC)Reply

Some foreign learners might look up a complex word, with little knowledge of its constituents. (I do this with Finnish.) It's good to be consistent and include these things anyway (where there is any ambiguity, i.e. no spaces), for example to allow machine parsing. 2A00:23C5:FE1C:3701:DCF2:CDF7:FC1F:D3F 21:18, 21 February 2025 (UTC)Reply

I'm not sure what your point is IP EQ. Vininn126 (talk) 21:20, 21 February 2025 (UTC)Reply

I think he means that some people might be a tad confused, and think that "doghouse" has a different etymology than "dog+house" (e.g. coming from Spanish *perrocasa). It also helps etymology bots, by telling them that "dog" and "house" are the origins of "doghouse", leading to stuff like etymology trees and the like. CitationsFreak (talk) 01:42, 22 February 2025 (UTC)Reply

Given terms like anethole and cathode, it may not be immediately obvious to a non-native speaker that cathole is not the spelled form of a word pronounced /ˈkæθ.oʊl/. ‑‑Lambiam 19:28, 23 February 2025 (UTC)Reply

@Koavf I would say the main thing to consider is that dog house has links to dog and house in the headword title, and doghouse doesn't (aside from the etymology section). But at the end of the day a blanket rule is easier than trying to figure out what's "obvious" enough. For example I doubt most people could point out that haphazard is made up of hap +‎ hazard even though in principle it's equivalent to doghouse. Ioaxxere (talk) 22:02, 21 February 2025 (UTC)Reply

My sister for years thought that "misled" was pronounced like a past tense (MAI-zuld), as she had only seen it in print. 2A00:23C5:FE1C:3701:DCF2:CDF7:FC1F:D3F 22:03, 21 February 2025 (UTC)Reply

I'm not saying all multiword entries shouldn't have etymologies. Vininn126 (talk) 21:15, 21 February 2025 (UTC)Reply

Granted, but I'm asking what is the difference between what is an apparently sufficiently clear multiword phrase and a sufficiently clear compound like "doghouse"? If we have an etymology at one, why not the other? You also seemed to not see my other questions. —Justin (koavf)❤T☮C☺M☯ 21:17, 21 February 2025 (UTC)Reply

I do not think you understand my point and are accidentally making a strawman. I said that it's a numbers game, that the number of such entries not needing a section could easily be larger than those needing it. Reread my first comment. Vininn126 (talk) 21:19, 21 February 2025 (UTC)Reply

Apart from it being sufficiently clear, it may also be that we do not know any wording or formatting that could make the matter more clear than it is, which is why we only include {{ar-rootbox}}, {{syc-rootbox}}, {{he-rootbox}}, {{aii-root}} when a word has a native root with a transfix serving vague purposes. (Just found out that {{shi-rootbox}} exists for two months but has not had success in deployment yet.) If you can only be superficial you can just as well leave it at the linked constituents. Fay Freak (talk) 22:22, 21 February 2025 (UTC)Reply

Eh, not sure what this would do that {{rfe}} doesn't already cover. Also, to be quite honest, for some languages, there simply isn't anything to mention etymology-wise. It's not unclear, and putting {{unk}} everywhere doesn't feel right. AG202 (talk) 21:47, 21 February 2025 (UTC)Reply

If it isn't unclear, why not put down the etymology for the languages? If it's unknown, then I could see the argument. CitationsFreak (talk) 01:46, 22 February 2025 (UTC)Reply

It is unknown. For a lot of underrepresented languages' base morphemes, there hasn't been much major research into their etymologies and they have no written ancestors. Other than possible cognates, there's really nothing to add. Ex: Yoruba bùn and most other Yoruba monosyllabic verbs; the etymology sections are empty, and if the header "Etymology" exists, it's only to separate out lemmas per our entry layout. Reconstructions have only been made for certain words and outside of those words, there's quite literally no information out there. AG202 (talk) 05:50, 22 February 2025 (UTC)Reply

You could use -insource:/\=Etymology/ to eliminate pages with etymology headers and incategory:"English lemmas" in Special:Search to find English lemmas without etymology sections anywhere on the page, but you would want to narrow it down further, and it would take some tweaking to keep the search from timing out. Chuck Entz (talk) 21:56, 21 February 2025 (UTC)Reply
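The search suggested above can be packaged as a link. The sketch below only builds a Special:Search URL from the two CirrusSearch keywords quoted in the comment (`incategory:` and a negated `insource:` regex); it makes no network request. The function name and the default host are assumptions for illustration, and, as noted above, the query would likely need narrowing to avoid timing out.

```python
from urllib.parse import urlencode


def build_search_url(category: str, header: str = "Etymology",
                     host: str = "en.wiktionary.org") -> str:
    """Build a Special:Search URL for pages in `category` whose wikitext
    does not match the regex /\\=<header>/ (the pattern quoted above)."""
    query = f'incategory:"{category}" -insource:/\\={header}/'
    params = {"title": "Special:Search", "search": query}
    return f"https://{host}/w/index.php?" + urlencode(params)


if __name__ == "__main__":
    # Prints a link that can be pasted into a browser.
    print(build_search_url("English lemmas"))
```

Swapping the category (for example to another language's lemma category) covers the non-English entries the original question was about.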

@Chuck Entz Thanks! I wasn't narrowly thinking of English entries in making my query and most of the comments are far off what I intended to do with such a tool. The idea is that, apart from the fact I deem them mandatory, entries lacking Etymology headers (and that should have one, most of these lemmas simply being of affixational origin) are more or less all stubs, old entries that need some clean up and/or added content. I keep stumbling upon these randomly and wanted to really address the issue. Saumache (talk) 22:49, 21 February 2025 (UTC)Reply

And, by the way, where do I find documentation on search box "templates"? Saumache (talk) 22:59, 21 February 2025 (UTC)Reply

@Saumache the Help button in the top-right of Special:Search takes you to the documentation at mw:Help:CirrusSearch. There is also the advanced search dropdown at Special:Search. This, that and the other (talk) 00:25, 22 February 2025 (UTC)Reply

Please don't add "etymology" sections to taxonomic species names. It is more useful to make sure that there are entries with etymologies for genera and for specific epithets. DCDuring (talk) 16:14, 22 February 2025 (UTC)Reply

Upcoming Language Community Meeting (Feb 28th, 14:00 UTC) and Newsletter

Latest comment: 4 days ago | 1 comment | 1 person in discussion

Hello everyone!

We’re excited to announce that the next Language Community Meeting is happening soon, February 28th at 14:00 UTC! If you’d like to join, simply sign up on the wiki page.

This is a participant-driven meeting where we share updates on language-related projects, discuss technical challenges in language wikis, and collaborate on solutions. In our last meeting, we covered topics like developing language keyboards, creating the Moore Wikipedia, and updates from the language support track at Wiki Indaba.

Got a topic to share? Whether it’s a technical update from your project, a challenge you need help with, or a request for interpretation support, we’d love to hear from you! Feel free to reply to this message or add agenda items to the document here.

Also, we wanted to highlight that the sixth edition of the Language & Internationalization newsletter (January 2025) is available here: Wikimedia Language and Product Localization/Newsletter/2025/January. This newsletter provides updates from the October–December 2024 quarter on new feature development, improvements in various language-related technical projects and support efforts, details about community meetings, and ideas for contributing to projects. To stay updated, you can subscribe to the newsletter on its wiki page: Wikimedia Language and Product Localization/Newsletter.

We look forward to your ideas and participation at the language community meeting, see you there!

MediaWiki message delivery 08:30, 22 February 2025 (UTC)Reply

Transliteration of Bactrian υ /h/

Latest comment: 4 days ago | 1 comment | 1 person in discussion

Would it be an idea to transliterate Bactrian υ (Greek script) as 'h'? I noticed that Bactrian φ is already transliterated differently (viz. as 'f') than Greek ('ph'). Exarchus (talk) 14:05, 22 February 2025 (UTC)Reply

Inaccurate label and usage notes on non-standard English verb forms

Latest comment: 2 days ago | 3 comments | 3 people in discussion

English conjugations like knowed and swimmed are marked as mistakes typically made by non-native speakers or children, but these forms are extremely common in the South. From visiting my kin in Kentucky, I have heard "knowed" from native speakers probably more often than "knew". Conjugating irregular verbs in the past tense as tho they are standard is more the rule than the exception in these dialects, particularly in years/decades/centuries past. I want to be conservative about removing the labels and usage notes as they are or modifying them, so I wanted to get some validation here that these are not merely or even primarily mistakes made by someone who doesn't know better, but a perfectly normal part of some American dialects. —Justin (koavf)❤T☮C☺M☯ 00:21, 23 February 2025 (UTC)Reply

The quotes for knowed show it is common in dialectal English. nonstandard is the normal label for this, and indeed these terms have this label. You could expand on this by writing something like {{lb|en|nonstandard|;|dialectal|or|non-native speaker error}} (which displays as (nonstandard; dialectal or non-native speaker error)) and remove the usage note. Benwing2 (talk) 01:32, 23 February 2025 (UTC)Reply

Generally agreed on all of the thinking above. One nuance that can be added is the concept of "nonstandard in most dialects but a standard alternative form in some." Thus I would word the label more like "...|nonstandard in most dialects|..." rather than "...|nonstandard|;|dialectal|...", for full accuracy. The examples that leap to my mind for that aspect are come, run, and seen as preterite inflections (in addition to being the past participle inflection), which can fairly be said to have been traditionally standard (i.e., alternative but not-nonstandard) in working-class sociolects of AmE in the 19th and 20th centuries, and still today for plenty of people. The only reason it was taught in schools that they were "wrong" is the theme that "if you want to participate in 'upper-class' discussions, you must shed those forms from your usage." The difference is conflating upper-class usage with the only usage that can be standard in a language, versus the linguistically accurate understanding that each lect can have some standards that are different from those of other lects. It is interesting how in the 21st century there is more room, culturally, for people to properly understand how working-class sociolects are not inherently "backward" (just different), whereas in the 19th and 20th centuries there was no room for admitting that. A complex topic of course. Quercus solaris (talk) 17:46, 24 February 2025 (UTC)Reply

Hokkien (or Southern Min) as a separate language again


I'm a frequent Wiktionary user frustrated by this enough to look up ways to open a discussion here. This appeal will be for Hokkien, but the same also applies to at least Cantonese and Hakka. This is effectively an appeal to reverse Wiktionary:Votes/pl-2014-04/Unified Chinese.

There is no dictionary other than Wiktionary that treats Hokkien as Unified Chinese. For a learner trying to look up words in Hokkien, the experience has them looking for their language in just a random "Etymology 2" entry. For example, try looking for the Hokkien definition of 阮.

In the Unified Chinese vote from 2014, it was stated:

the reason for the marginalisation of other varieties is that it is practically troublesome and unnecessary to have to duplicate everything (...) except the pronunciation for all 17 ISO-coded Chinese topolects.

This is not the reason for the marginalization. That runs much deeper; cf. me trying to make a case here that the language shouldn't be relegated to a subsection of the macrolanguage.

The vote was also rooted in an incorrect understanding of Chinese languages. They are not simply different pronunciations of the same language; this remains true even if one thinks of them as dialects of Chinese. A reminder that Northern Thai gets to have its own section, while the non-mutually-intelligible language/dialects Cantonese, Mandarin, Hokkien have to use the same section.

The perceived de-duplication also has no such effect, as other unquestionably-non-dialect languages that use Han/Chinese Characters still necessitate separate templates, etymologies, and definitions.

I would also like to note that Southern Min is the only case in the current language treatment policies (Wiktionary:Language treatment) that is a subdivision that's also treated as a language family. For that matter, Chinese is also the only language family with this Unique Treatment, and it hurts English Wiktionary as a whole. Kisaragi Hiu (talk) 06:59, 23 February 2025 (UTC)Reply

It's an interesting piece of Wiktionary history that the user who was (to my recollection, at least) most responsible for Sinitic languages being merged under one header (and for traditional Chinese being lemmatized rather than the modernly-more-common simplified Chinese!), and was doing the work of implementing and maintaining that system, Wyang, subsequently became quite argumentative, edit- and wheel-warring with people, and ultimately left the project (and thus stopped doing that work), but now it'd be a lot of work to undo or modify either of the changes. You're not the first person to suggest this, and I'm glad it's being discussed. There are benefits and drawbacks to either approach, merging or splitting; the current approach indeed makes it harder / less intuitive to find content on a specific lect, or tell whether or not a given (unlabelled) definition exists in a given lect or not, but it's more compact. - -sche (discuss) 08:53, 23 February 2025 (UTC)Reply

I will add that in the past I've seen people argue that splitting would result in less coverage of smaller lects, but I don't see how: surely all the information we currently have on them should be preserved in any split, and for that matter, I don't see why we couldn't retain any "unified" infrastructure (e.g. dialect maps) that editors found useful to maintain in a unified way; and any claim that it's easier to enter Hokkien [etc] information under the current system is [citation needed]. - -sche (discuss) 18:30, 23 February 2025 (UTC)Reply

Strong support - shouldn't have been merged. Chihunglu83 (talk) 09:11, 23 February 2025 (UTC)Reply

I actually saw some problems in the current "Unified Chinese" representation:

The "Traditional Han script" vs. "Simplified Han script" part doesn't respect the differing Han simplification standards: for example, "個"=>"个" is the simplification in the Mandarin standard, while "个"=>"个", "個"=>"個" (unmerged) is the simplification standard in Hakka, Hokkien, and Wu.
The current "Unified Chinese" implementation does not make clear whether a word is used only in Mandarin or merely lacks pronunciations in other Sinitic languages; this is the case for most entries that have only Mainland Chinese Mandarin/Taiwanese Mandarin pronunciations in the "Pronunciation" section.

-- 2402:7500:586:3B29:0:0:34C5:81A6 11:42, 23 February 2025 (UTC)Reply

I agree with the view that the current treatment of Chinese is flawed (there have been multiple posts and discussions on this in past years), and it certainly needs improvement. I should also note that the original 2014 vote is deeply flawed in its rationale, assuming that the main differences are in pronunciation and sometimes (quote: 1%) in vocabulary (and the proposer later asserts in the discussions that there are zero grammatical differences between Sinitic languages, when in fact there are many).

There are two ways to approach the problem, splitting or merging.

Splitting Chinese up might seem straightforward, but there are outstanding problems with how the grouping should be done (it's known that the traditional or ISO groupings are problematic in certain parts, and often omit minor dialect groups, e.g. She), and with how deep we want to go in splitting up (e.g. should Southern Min be a macro-L2? Or should Hokkien, Teochew, Leizhou, and Hainanese each be an L2? What about marginal dialects that don't really fall under a proper grouping?).

On the other hand, I'm not opposed to putting the entirety of "Chinese" under one L2 (if done properly) – but the current approach clearly doesn't work (arguably this is because Wyang created the Chinese L2 by merging other lects into Mandarin). At the minimum we should distinguish between senses that are pan-Sinitic, or "MSC" (i.e. put {{lb}} onto every definition no matter what), and split classical/literary Chinese off. – wpi (talk) 16:38, 23 February 2025 (UTC)Reply

In my personal opinion Chinese shouldn't be an L2 at all, and all Chinese languages should be split into individual ones (by whatever classification seems best; for instance, a separate Dungan L2 not being poorly linked to a [China] Mandarin L2 would maybe be a good idea). However, I can understand why that would be a problem for the editors of Sinitic languages on Wiktionary, since that likely means years of work carefully splitting up the definitions and re-designing the entire infrastructure.

So the main question in my opinion should be: Are our Sinitic editors (e.g. @wpi, Justinrleung, TongcyDai and others, forgive me if I've forgotten to ping anyone else, I'm not too familiar with our editor base) prepared to put in the work right now, or not? And in which domains? Thadh (talk) 17:08, 23 February 2025 (UTC)Reply

My opinion hasn't really changed much from what I have said in Wiktionary:Beer parlour/2022/March#Why are all Chinese varieties stuffed under one Chinese?. I do think there are trade-offs with either approach. I may be less opposed to splitting Chinese up than before, but I still think the value of the current infrastructure allows us to worry less about the fuzziness of boundaries among varieties and focus on the lexical items one by one. I guess this can be too much of an editor-centric convenience and really make it less usable for users. If we are to continue with the current format, labelling is definitely an issue that needs to be dealt with, especially with single-character entries. Another issue of the current format is the problem of Mandarin/"mainstream Chinese"-centric writing standards applied to other varieties, as pointed out above, rather than respecting regional variation. This is partially the problem of overusing {{zh-see}}, which often forces us to pick a "standard" form, even though sometimes this is a rather arbitrary process. — justin(r)leung { (t...) | c=› } 19:08, 23 February 2025 (UTC)Reply

When it comes to phonetic loan words into English, I treat the Cantonese, Hokkien and Mandarin derived words as if those varieties are the languages of origin. (I'd like to see y'all try to reverse that!) That is, there are no phonetic loans from "Chinese", only semantic loans from Chinese. I support division of the Chinese header. It is an inevitability that it will be divided, so I don't need to really push too hard. Geographyinitiative (talk) 19:45, 23 February 2025 (UTC)Reply

I'd like to raise several practical concerns.

The first major question is the granularity of division. Even within Southern Min, we face complex decisions: should Teochew be in the same L2 as Hokkien? What about Longyan, which currently shares a pronunciation module with Hokkien despite their limited mutual intelligibility? Similar questions arise for other varieties - Northern Wu alone could potentially be split into at least three L2s. Each decision to split one variety could create a precedent for further divisions, potentially leading to a very large number of L2s with substantially duplicated content.

This leads to the scale of the proposed changes. Given that you mentioned this would apply "at least" to Cantonese and Hakka as well, we're looking at restructuring over 300k entries (90k for Hokkien, 180k for Cantonese, and 32k for Hakka, among others). Before we could even begin such restructuring, we should ensure every definition is properly labeled with its variety (as wpi just mentioned) - a substantial task in itself. Do you have specific plans for managing such a large-scale reorganization? Additionally, how would we handle the numerous synonym templates that currently work across varieties? These modules are still of considerable linguistic/dialectological value even though they span multiple unintelligible variants, and splitting these could make them significantly more fragmented and harder to maintain.

As volunteer editors, we need to be mindful of the long-term maintenance burden of any major structural changes. While the current system has multiple drawbacks, it provides a workable framework for handling the fuzzy boundaries between varieties and focusing on lexical items individually. TongcyDai (talk) 20:25, 23 February 2025 (UTC)Reply

I have strong concerns about splitting on the same lines as @TongcyDai. For some data points:

We were unable to merge North and South Levantine Arabic (respectively 310 and 2,872 lemmas) due to the enormity of the task despite the fact that ISO merged them and that we had a specific request from the instigator of the ISO merge process to merge them here; he initially offered to help but then vanished once the scope of work was realized.
@Theknightwho instigated a split of Min Nan maybe 2 years ago (?), which is still far from complete and currently stalled (and this didn't involve major reorganization of the infrastructure since all the resulting lects still sit under the Unified Chinese umbrella).
I tried to propose a split and reorganization of the Yue lects along the lines of what we did with Min Nan but it stalled due to disagreements among the various Chinese editors over how to partition the Yue space into languages and general lack of will to carry out the resulting work.
When @Vininn126 decided to re-merge Masurian (c. 750 lemmas) into Polish, it was decided that it would be easier to delete the entire language and start from scratch rather than try to merge the existing lemmas.

It is true that splits, in my experience, are generally easier than merges, but in the one case where I was able to carry out a large split (Kurdish, with about 4,000 lemmas), it was helped enormously by the fact that Northern Kurdish and Central Kurdish generally use different scripts. In this case, the macro-language we're talking about has orders of magnitude more lemmas (c. 300,000) and everything is written in the same script. If we were unable to finish a much smaller split (the case of Min Nan) and couldn't even agree on how to split a subfamily of Chinese (the case of Yue), how are we going to have a prayer of carrying out such a task as splitting Chinese? This is even apart from the major concerns I have about potential duplication of data across potentially dozens or even hundreds of Chinese varieties (depending on how many separate L2's we end up with).

I would instead suggest identifying the main pain points of the current organization and seeing how we can resolve them without throwing away the baby with the bathwater. Some examples:

Links to Mandarin, Hokkien, etc. currently show up yellow because the corresponding pages usually only have a Chinese header, not a Mandarin or Hokkien header. We can fix that in Module:links with a system that, for example, redirects links for any Chinese lect that is written in Chinese characters to the Chinese header. (We can also consider a system where we actually check the page to see whether a specific lect header exists, but I have concerns about running up against memory or expensive-call limits. Maybe this is overblown though; @Theknightwho can comment more.)
@JnpoJuwan complained that all the Chinese lect labels are under the zh code and don't work with any other code. I am already about to add family-level categories and I have considered family-level labels, which could solve this issue. We already have support for label handlers to display labels in a smart fashion as well as a Chinese-specific label handler that removes duplication when multiple labels of the same subfamily are given, and we can extend this so that e.g. the "Taiwanese Hokkien" label displays "Taiwanese Hokkien" when the language is zh but just "Taiwanese" when the language is Hokkien.
We already have ad-hoc "lect" codes for several dozen written Chinese lects for use with {{zh-x}}. I have an existing proposal to replace these with proper etymology codes, but it stalled due to some disagreements about how to handle some of the edge cases. If we can resolve these disagreements, we can scrap the ad-hoc codes in favor of standard codes, which should simplify etymologies for terms borrowed into other languages and similar such things.
We (meaning mostly TKW and I) have been gradually deprecating some of the Chinese-specific infrastructure in favor of using the language-independent infrastructure, which is generally more robust, more featureful and easier to maintain. We did this with {{zh-syn-saurus}}, {{zh-syn-list}} and mostly with {{zh-der}}; the next target is probably {{zh-abbrev}}. This can be continued.

Benwing2 (talk) 21:08, 24 February 2025 (UTC)Reply
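Two of the fixes listed above (falling back to the Chinese header for lect links, and shortening a label when the section language already names the lect) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Module:links or the label handlers are actually written: the `UNIFIED_LECTS` table, its codes, and both function names are hypothetical.

```python
# Hypothetical table of lects that are written in Chinese characters and
# currently sit under the unified "Chinese" header. Codes are illustrative.
UNIFIED_LECTS = {
    "cmn": "Mandarin",
    "nan-hbl": "Hokkien",
    "yue": "Cantonese",
    "hak": "Hakka",
}


def link_target_header(lect_code, headers_on_page):
    """Pick the page header a link for this lect should point at.

    Prefers a lect-specific header if the page has one, otherwise falls
    back to the unified "Chinese" header. Returns None if neither exists
    (the link would then be shown as missing).
    """
    name = UNIFIED_LECTS.get(lect_code)
    if name is None:
        return None  # not a lect covered by this sketch
    if name in headers_on_page:
        return name
    if "Chinese" in headers_on_page:
        return "Chinese"
    return None


def display_label(label, section_language):
    """Drop the lect name from a label when the section already implies it."""
    suffix = " " + section_language
    if label.endswith(suffix):
        return label[: -len(suffix)]
    return label


print(link_target_header("nan-hbl", ["Chinese", "Japanese"]))
print(display_label("Taiwanese Hokkien", "Hokkien"))
print(display_label("Taiwanese Hokkien", "Chinese"))
```

Checking `headers_on_page` corresponds to the second variant mentioned above (actually inspecting the page), which is the one with the memory and expensive-call concerns; the cheaper variant would return "Chinese" unconditionally for any code in the table.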

I will add that one of the biggest reasons was orthography and also the source used, which covered two neighboring (but very different) dialects. Vininn126 (talk) 21:10, 24 February 2025 (UTC)Reply

I've been of the opinion that Chinese shouldn't have been merged, but alas, trying to split it now would be way too daunting of a task. I do have two main thoughts though:

I do believe that historical lects should be split out, as @Wpi brought up. The way that it's set up now is a mess when it comes to descendants, as I've mentioned since 2022. Chinese 筆 / 笔 (bǐ) and 白菜 (báicài) are some of the main culprits. The former is entirely unclear as to what descendants come from what historical lect, and uninformed readers could assume that everything under "others" comes from Modern Chinese! Similar thing with 白菜 (báicài), it doesn't make clear which entries come from anything other than Sino-Xenic & Early Mandarin. The English descendants make it even more clear: bok choy comes from Cantonese 白菜 (baak⁶ coi³), pechay comes from Hokkien 白菜 (pe̍h-chhài), baicai from Mandarin 白菜 (báicài), it's not clear at all, and is fairly misleading when compared to the etymology sections of the descendants. I also feel that it obscures inter-lect borrowing when the term is spelled the same. I don't believe that there are only 9 Cantonese terms borrowed from Mandarin. Same with the weird way we handle Chinese 麥當勞 / 麦当劳 (Màidāngláo) and its etymology and descendant Cantonese 牡丹樓 / 牡丹楼 (maau⁵ daan¹ lau⁴). It says that the latter is borrowed from the former in Mandarin, which in turn is from Cantonese, but 麥當勞 / 麦当劳 (Màidāngláo) does not make this clear at all, and doesn't even list the descendant. Something needs to be done, as it's harming the way we present information. CC: @Benwing2
Additionally, I am a bit concerned about the discrepancy in the number of usage examples & quotations and overall coverage between Chinese lects, as wpi and @Justinrleung brought up. With merges like this, the "main" lect, for lack of a better term, tends to almost completely eclipse the other lect when it comes to usage examples, since they could be seen as nonstandard or almost unworthy of usage example creation. This is made even more evident in the case where the vast majority of terms are spelled the same way across lects. Ex: Hakka only has 60 terms with usage examples, with many, if not most, of them being only found at Hakka-specific senses. Imho having separate L2s for Chinese lects could incentivize more dedicated coverage to the smaller ones, if there are editors willing to work on the effort. It's worked very well for Jeju, as the coverage we have now would not have been possible if not for it being a separate L2. (That being said, the typical language vs dialect issue still applies, I'm not saying that an L2 should be made for every dialect out there) Maybe the macro-L2 idea could work.

That being said, I don't speak any Chinese lect, but I'd be willing to help out if needed, since I do think that this would be a net benefit for users in the long run. AG202 (talk) 06:49, 25 February 2025 (UTC)Reply

@AG202 What was your ping in reference to? Can you expand? As for the issue concerning discrepancy of usage examples and quotations, I think that's inevitable when you have one dominant lect among many. Compare Arabic, which is handled in exactly the opposite fashion (one L2 for every lect), and where almost all lects other than MSA and Maltese are sorely lacking in every way. (In fact I would use Arabic as a good cautionary tale of what happens when you have too many splits.) As for historical Chinese lects, I was a bit surprised myself to see them merged under the Chinese header; possibly they could be split out, but that would be a lot of work and would need a really well-thought-out and fleshed-out plan of action before we proceed. (Min Nan didn't have that which is part of the reason it's sitting in a stalled half-split stage.) Benwing2 (talk) 07:01, 25 February 2025 (UTC)Reply

Sorry, the ping was specifically in reference to the historical lects section. And as for Arabic, yeah I've seen that and I do think that a middle ground could be found between the two extremes. AG202 (talk) 07:24, 25 February 2025 (UTC)Reply

@AG202 I don’t have any strong feelings on whether the contemporary varieties of Chinese should be split or not, but splitting the historical forms of Chinese is neither feasible nor desirable.

My understanding is that Old Chinese and Middle Chinese are essentially phonological constructs that do not correspond one-to-one with any attested written language. The written language itself existed as a spectrum between the exemplary classical Warring States models and the dominant vernacular of the period, so that all “Old Chinese” structures and lexemes, even if obsolete in the spoken language, could be used in writing in the right context. There are plenty of late imperial texts that partly imitate even the style of the Shijing, from 700 BCE or earlier. And what about texts that are in a perfect mix of vernacular Early Modern Mandarin and Literary Chinese, or texts that are mostly Literary but use vernacular terms for effect?

The best way to deal with this is to use {{datedef}} more extensively, not to split the languages. —Saranamd (talk) 09:07, 25 February 2025 (UTC)Reply

@Saranamd: Technically the "conventional" way of handling Old Chinese would be to transport it to the reconstruction mainspace as a purely phonological reconstruction of the attested Sinitic varieties. I think that is best if we split, but if we don't, it's pretty worthless. But essentially, this is what it is, a Proto-Sinitic reconstruction that happens to be attested in a logographic script. Thadh (talk) 11:58, 25 February 2025 (UTC)Reply

@Saranamd: Unfortunately, {{datedef}} does not solve the problems I mentioned. And if Old Chinese & Middle Chinese can't be separated out, could we at least separate out Classical & Literary Chinese? AG202 (talk) 19:27, 25 February 2025 (UTC)Reply

@AG202 I think the descendants section is honestly the least important section of a well-attested language, since it pertains entirely to other languages. What matters most for a dictionary is the definitions section, the quality of which will be severely impaired by splitting Literary Chinese and Standard Written Chinese because the two written languages even now exist in a continuum. Even today, virtually any Literary Chinese term can be used in SWC in the right (historical or literary) context, and of course splitting the two would be even more impossible for older written forms of Mandarin. Any split would lead to massive duplication of definitions.—Saranamd (talk) 05:42, 26 February 2025 (UTC)Reply

Okay but we have the Descendants section, so something needs to be done about the major confusion that exists currently. We can’t just hand-wave it away. Otherwise there’s no point in having Descendants sections in the first place. AG202 (talk) 06:36, 26 February 2025 (UTC)Reply

@Saranamd: While OC and MC are phonological constructs, I strongly disagree that Classical Chinese can't be split out. (and if we do treat Classical Chinese separately, OC and MC should be placed under it due to the time period)

Although Classical vocabulary can still be used within modern texts and dialects, the grammatical structure of Classical is fossilized and cannot be altered (this often also applies to non-grammar words, which creates fossilized idioms i.e. Category:Chinese four-character idioms), and some constructs like anastrophe and 互文 no longer work.

There is also a very clear dividing line (New Culture Movement and May Fourth Movement) which marked the change from Classical Chinese to (early) MSC. – wpi (talk) 05:42, 26 February 2025 (UTC)Reply

@Wpi What about Baihuawen texts that incorporate classical constructions extensively—are they Mandarin or Literary Chinese? Do we say that a single chapter in the same novel alternates freely between two different Wiktionary languages? What about Baihuawen or mixed Baihuawen-Wenyanwen texts written in Korea which were always read out as Sino-Korean, or Tang-era vernacular texts that really cannot be called Mandarin? My impression is that the clear dividing line only looks clear from the vantage point of today, and when we get down to the historical sources it’s much less clear.

Furthermore, as a dictionary and not a grammar, the lexicon is what matters most. There are many languages where the literary and colloquial varieties differ in important grammatical structures but much less in vocabulary, and where the colloquial variety can borrow freely from the literary variety. Splitting harms the functionality of the dictionary when there is no lexical dividing line between the two varieties.—Saranamd (talk) 05:51, 26 February 2025 (UTC)Reply

And even from a lexical viewpoint, given that the death of Literary Chinese was not spontaneous, there are plenty of texts that use late nineteenth- and early twentieth-century neologisms in a mostly Classical grammatical framework. So even words like 自由 (zìyóu, “liberty”) or 民主主義 / 民主主义 (mínzhǔzhǔyì, “democracy”) could be said to be “Literary Chinese” words.—Saranamd (talk) 05:58, 26 February 2025 (UTC)Reply

@AG202: Fully agree with both points. Regarding point #1, I think it would definitely be helpful to list all descendants from OC (including internal ones if appropriate). (previous failed discussion). As for usage examples, I agree the uxes and quotations are heavily focused on MSC, but I'm also concerned about duplication of collocations, for example sense 2.4 of 落 repeats 落車 twice (and if more collocations are added, there will only be more duplication, which arguably is the thing that "we" originally tried to avoid).

(Category:Cantonese terms borrowed from Mandarin should only include phonological borrowings – there's probably quite a bit more, but I reckon it's less than 100, perhaps maybe in the low hundreds (?), so not super far off) – wpi (talk) 16:32, 25 February 2025 (UTC)Reply

Spitballing an idea for testing the feasibility of a split: (bot-)duplicate Chinese entries to subpages of some project or userspace page (e.g. WT:Chinese split demo/天, WT:Chinese split demo/馬, etc), making whatever tweaks are needed to let our modules also function in that new non-mainspace place, and then apply whatever tactics would be used to split Chinese, to those pages: e.g. if someone is prepared to write a bot to go through and split out separate Mandarin and Hokkien L2s for all the pages that have Mandarin and Hokkien pronunciations, then have the bot do that to the project/userspace pages. Start manually (or automatedly) adding Hokkien usexes. Etc. (Or, instead of duplicating all entries and then starting to modify them, only duplicate entries when modifying them, e.g. only duplicate 天 into the user-/project-space at such a time as you're splitting it up into different L2s.) If it proves feasible to split the pages in user-/project-space, then either the same techniques can be used to split the mainspace pages, or the project/user pages can be moved to mainspace. If the project proves infeasible and gets abandoned, the pages can be (bot-)deleted en masse. - -sche (discuss) 18:22, 25 February 2025 (UTC)Reply
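To make the demo-subpage idea above a little more concrete, here is a minimal Python sketch of what a splitting pass over one entry might do. Everything in it is an assumption for illustration: the in-memory entry structure, the `demo_title` and `split_entry` names, and the placeholder glosses are hypothetical, and a real bot would have to parse wikitext rather than receive tidy lists. The pronunciations for 白菜 are the ones quoted earlier in this thread.

```python
DEMO_PREFIX = "WT:Chinese split demo/"  # demo location suggested above


def demo_title(page_title):
    """Title of the non-mainspace copy of an entry."""
    return DEMO_PREFIX + page_title


def split_entry(pronunciations, senses):
    """Partition one unified entry into per-lect sections.

    pronunciations: dict mapping lect name -> romanisation
    senses: list of (lect_labels, gloss) pairs; an empty label list
            means the sense is not labelled with any lect
    Returns (sections, needs_review): sections maps each lect that has a
    pronunciation to its glosses, and needs_review collects unlabelled
    glosses that a human would have to assign.
    """
    sections = {lect: [] for lect in pronunciations}
    needs_review = []
    for lect_labels, gloss in senses:
        if not lect_labels:
            needs_review.append(gloss)
            continue
        for lect in lect_labels:
            if lect in sections:
                sections[lect].append(gloss)
    return sections, needs_review


sections, needs_review = split_entry(
    {"Mandarin": "báicài", "Hokkien": "pe̍h-chhài"},
    [
        (["Mandarin"], "placeholder gloss 1"),
        ([], "placeholder gloss 2"),
        (["Hokkien", "Mandarin"], "placeholder gloss 3"),
    ],
)
print(demo_title("白菜"))
print(sections)
print(needs_review)
```

The `needs_review` list is the interesting part of the sketch: its size across a sample of entries would give a rough measure of how much manual labelling a split would need before any bot could finish the job.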

@-sche Although I respect your judgment greatly, I'm a bit concerned that you're suggesting something like this in a "spitballing" kind of way. Splitting Chinese would be an enormous task, and before even beginning on something like this, particularly splitting the main body of modern-lect definitions into separate L2's, we'd need (a) buy-in from a large majority of Chinese editors, (b) a detailed plan about how to proceed with some estimates of how much work this would involve. Simply spitballing a proof-of-concept like this without either buy-in or a plan would, in the best case, waste a lot of someone's effort once it gets abandoned, or in the worst case, create a major fork in Wiktionary, essentially splitting the Chinese editing community, with resulting mutual animosity, and forcing people to either choose to contribute to one or other fork or double their effort by contributing to both. Something like this would likely take several person-years of effort at least, meaning we'd potentially have a long-lasting fork hanging around causing innumerable problems. If we're really serious about splitting of some sort (which I don't at this point see the buy-in for), it would be more practical to split out certain smaller chunks (e.g. historical lects, although @Saranamd has several issues with this) rather than trying to split the whole thing at once. Benwing2 (talk) 05:57, 26 February 2025 (UTC)Reply

Oh, I certainly don't mean to suggest that one person should unilaterally do this right now! I mean to bring the idea up for discussion here to see if anyone thinks it'd be a good idea. My rationale is that a fair few people seem to agree that merging all of Chinese was inappropriate, but a fair few people also agree that splitting Chinese has the potential to create a mess while the split is in progress, so this struck me as an idea for a possible way to determine / demonstrate the feasibility (or infeasibility) of a split, iff enough users want to try it. It does occur to me that, in the vein of my second idea (only duplicating entries as they're modified, instead of duplicating everything and then modifying it), duplicating just a random thousand Chinese entries might provide a sufficient testbed for people to try splitting techniques on. (Or perhaps people don't even need to actually modify entries but can just post what their code would do.) I'm trying to think, since many people think "splitting has the potential to create a mess while it's in progress, and it might not finish" is a blocker to trying to split, of ways people could determine / demonstrate the feasibility of splitting. - -sche (discuss) 17:45, 26 February 2025 (UTC)Reply

But the problems are more fundamental than just "splitting has the potential to create a mess while it's in progress, and it might not finish". First of all it's not at all clear to me there's even consensus to split, and secondly no one has even remotely come up with a feasible plan for what a split-Chinese system would look like that would be demonstrably better than what we have. Plenty of people have complained about the deficiencies of the current system, but no one has proposed any workable alternatives, particularly concerning splitting the modern lects. The only proposal I see so far is coming from @AG202, about historical lects only. I would be strongly opposed to a split that put every "mutually incomprehensible" lect (however we define that) under its own L2; this is effectively what we did with Arabic, following the ISO splits almost to the letter, and the result IMO is absolutely not any better than the current Chinese system. I don't want to be a party pooper but I really think people are both underestimating the magnitude of the task and failing to appreciate the serious problems that are likely to ensue if a split is begun in a willy-nilly fashion, without a detailed plan of operation that has strong consensus behind it. It's kind of like we're jumping right into talking about doing open-heart surgery before we've seriously considered all the less-invasive options, or even enumerated what the actual problems are. Benwing2 (talk) 22:09, 26 February 2025 (UTC)Reply

Question: how do other Wiktionaries handle Chinese? (Do any split it?) zh:天 seems to merge everything under one Chinese L2, and so does fr:天, despite fr.Wikt splitting a lot of other lects (especially any that ISO assigned separate codes). de:天 doesn't even have a Chinese section. (Some of those wikis are borderline-unusable in dark mode, as an aside.) - -sche (discuss) 18:22, 25 February 2025 (UTC)Reply

@-sche: It seems like fr.wikt does have separate headers (at least for Cantonese) as in fr:德國. AG202 (talk) 19:30, 25 February 2025 (UTC)Reply

In my experience zhwiktionary tends to follow us quite closely, so it's no surprise to see it merging Chinese like we do.

As for dewiktionary, they only have 356 Chinese entries and their coverage of the language seems to be in a rudimentary state (de:Project:Chinesisch opens with the truism "Das Chinesische umfasst verschiedene Varietäten").

As for other large Wiktionaries, ruwiktionary does appear to split Chinese (see ru:烏鴉 for instance) but the coverage of non-Mandarin lects is so limited it's difficult to know what direction they have chosen, and jawiktionary seems to merge Chinese like us. This, that and the other (talk) 12:54, 26 February 2025 (UTC)Reply

Regarding the valid point above that some aspects of our Chinese infrastructure (like synonym templates) are of considerable linguistic/dialectological value even though they span multiple unintelligible variants, and splitting these could make them significantly more fragmented and harder to maintain: iff people think such things are useful, and iff people also want to split Chinese, couldn't we just keep those templates and not split them, even if we give lects their own L2s? Make whatever tweaks are needed to let those templates/modules handle full language codes (not just dialect codes) and link to different L2s, but if we think that "showing synonyms from (certain) other, mutually-unintelligible lects" is useful, why not just keep (the infrastructure that is) doing it? I don't see why a split would necessarily have to throw out any usefully-centralized infrastructure. Sure, it might make Chinese lects a somewhat special case if they linked to each other despite having different L2s, but it's already a quite special case (merging varieties under one L2 despite them being mutually unintelligible), it can't get any specialer. (I've (rarely) put links to different languages/L2s in the "see also" sections of a few entries. If it's useful, why not do it?)
If the issue is that some information is not being stored centrally, but is being input on each entry, and might have to be input in each L2 section if we were to split Chinese, then let's evaluate whether there are feasible ways of centralizing, whether that's putting a "see list at X" notice in each L2 other than one — say, the alphabetically first, or the Standard Mandarin Chinese entry — or whether it's moving the lists to one centrally-editable and transcludable place, the way we have e.g. usage notes that are transcluded across multiple entries. But I don't see why "linking between different varieties is useful" would block splitting, iff (again: iff) people want to split, or at least want to play out what splitting would look like and entail. - -sche (discuss) 17:50, 26 February 2025 (UTC)Reply

There is precedent in using {{dial syn}} across multiple L2s (e.g. Yoruba), and there is precedent in having modules being shared between related languages (e.g. Module:Jpan-headword). I think the point about throwing out existing infrastructure is a non-issue. (Some minor changes will definitely be needed but those will not be significant) – wpi (talk) 19:04, 26 February 2025 (UTC)Reply

Also with Koreanic as well, ex: at Korean 가위 (gawi), which is why we renamed the title to "Historical and regional synonyms". AG202 (talk) 21:25, 26 February 2025 (UTC)Reply

I strongly oppose having a "see list at X" notice, which is just equivalent to merging but worse. Theknightwho (talk) 20:55, 26 February 2025 (UTC)Reply