Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

December 2024

Use of y instead of ij in Early Modern Dutch

In the modern Dutch alphabet, the digraph ij is used instead of y (although it's often written like a y with an umlaut), but in Early Modern Dutch y was used (up until 1804 apparently). But if you search Wiktionary for any of the y versions, you won't find them. Should we be including the y versions in Wiktionary? And if so, should they be listed under Early Modern Dutch or just Dutch (Early Modern Dutch is not listed at Wiktionary:List of languages). Nosferattus (talk) 01:35, 1 December 2024 (UTC)Reply

Yes, these forms should be included. Wiktionary views any term written after 1500 to be modern Dutch. There are already a few of these y-forms added: zyn, cyfer. As you can see, they use the {{obsolete spelling of}} template. I think their inclusion is limited, not because they shouldn't be included, but because editors mainly focus on adding terms in current use.

It would perhaps be a good idea to create a template similar to {{pt-pre-reform}}, to better organize the obsolete and superseded spellings.

Stujul (talk) 09:38, 1 December 2024 (UTC)Reply
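As a rough illustration of the pattern described above, the following Python sketch derives a candidate y-spelling from a modern Dutch headword and prints a definition line using {{obsolete spelling of}}. The helper names are hypothetical, the parameter order (language code, then modern term) is an assumption, and the plain ij-to-y replacement is only a heuristic: it will misfire wherever i and j are not the digraph, and every candidate still needs attestation before an entry is created.

```python
# Hypothetical helpers, not part of any Wiktionary tooling.
# Assumption: the obsolete form differs from the modern one only in ij -> y.

def y_spelling(modern: str) -> str:
    """Return the candidate obsolete spelling, replacing each 'ij' with 'y'."""
    return modern.replace("ij", "y")


def obsolete_spelling_stub(modern: str) -> str:
    """Build a definition line pointing the y-form back to the modern form.

    Assumed parameter order: language code first, then the modern term.
    """
    return "# {{obsolete spelling of|nl|" + modern + "}}"


# Modern forms of words mentioned in this thread.
for word in ["zijn", "cijfer", "tijd", "belangrijke", "misdrijf"]:
    print(y_spelling(word), "->", obsolete_spelling_stub(word))
```

The output pairs each candidate page title (for example zyn) with the line that would go under its Dutch header.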

We still find this in Max Havelaar, written in 1860. For example, on just one page we find myn, my, blyken, hy, tyd, belangryke, twyfel, pryzen, stryden, misdryf and zyn.^[1] The author used his own, somewhat idiosyncratic spelling, though. --Lambiam 10:30, 1 December 2024 (UTC)Reply

Thanks! I've started to add some of the more common words like hy and my. Nosferattus (talk) 02:19, 4 December 2024 (UTC)Reply

Regarding Dutch spelling, pannekoek is given as a superseded spelling of pannenkoek, and the latter is the official spelling, but not everyone agrees with that; see Witte Boekje. Shouldn't 'pannekoek' rather be labelled a 'non-official spelling'?

More examples here. Exarchus (talk) 19:18, 5 December 2024 (UTC)Reply

Reveal potentially shocking/NSFW images only upon clicking?

I visited loxoscelism to add a translation and was greeted by a slightly revolting image.

I would be in favor of hiding such images behind a "click to reveal" message so that they aren't shown by default. This should be quite easy to do using JS.

What do others think? — Fytcha〈 T | L | C 〉 13:13, 1 December 2024 (UTC)Reply

@Fytcha: The image would still be visible in the mobile search: [2]. It would be better to just make that image an external link. Ioaxxere (talk) 15:17, 1 December 2024 (UTC)Reply

FWIW, this was discussed in 2015 and last year. If we start censoring images, it's a slippery slope: people made headlines the very week we last discussed this for censoring Michelangelo's David. People—you see them on Talk:gay as recently as this week—complain that gay people are pornography / NSFW, and pass laws to that effect. There are people who think, and seek laws saying, trans people are pornography / NSFL. There are people (conservative Jews, Muslims, Christians) who think images of any women are NSFL. People complain about the image at swastika, or issue legal challenges (at least to WP) over maps of countries they'd prefer had different borders. Some people object to the image at penis, or the image at areola, but I think they're worth a thousand words, and don't see why a workplace would be OK with someone looking up penis, and only freak if the dictionary were illustrated—as others said in prior discussions, if one works at such a place, one may need to avoid Wiktionary at work, since images are liable to show up.
With that said, I acknowledge that it's reasonable that we unofficially have some practices, e.g. the entry for mangle doesn't contain an image of a mangled body even though it theoretically could. I don't mind the image at loxoscelism, but I'm not entirely opposed to hiding some images behind a click... I'm just very wary of the slippery slope.
One idea, if this doesn't exist already, is an opt-in gadget which would hide all images and require a click to see them; that avoids the slippery slope by being image-agnostic and opt-in. - -sche (discuss) 16:43, 1 December 2024 (UTC)Reply

@-sche: Thanks for the reply as well as the links. One of my take-aways is that selectively hiding pictures (behind a "click to show" message) is not at all "politically unviable" on Wiktionary.

As pointed out plentifully, finding a sane demarcation will prove difficult. Reading these discussions, the impression I got was that the wisest strategy would be to start with a very liberal policy (that is however enacted for everyone by default in an opt-out fashion) and then have people incrementally work out amendments in subsequent (BP) discussions. These kinds of demarcations are not found conclusively in a single sweep. My mentioning of "NSFW" above was probably ill-advised, so what I'd suggest now as a starting point for which images to hide by default is (medical) gore, i.e. photos of wounds, deformations, the effects of disease, photos taken during surgeries, etc.

“one may need to avoid Wiktionary at work, since images are liable to show up.” That's true; currently, people cannot access Wiktionary at work (or in similar situations) free of risk. What I would point out is that this is an unusual and thus surprising fact about Wiktionary as a dictionary. Of all the dictionaries I've used, I don't think there has ever been another one where I had to be cautious using it in front of other people. -- Fytcha〈 T | L | C 〉 19:07, 1 December 2024 (UTC)

Whether it is a slippery slope depends on the art of formulating policy, otherwise of course it can be watered down if we are unsure about it. We can distinguish motivations by which people might avoid images. For cases of medical irregularities the hardiness which we expect differs — one may well prefer a certain time of decision and mental preparation to see the image because there is only so much repulsive content any one can consume without his affective wellbeing being called into question – from the responsiveness to the regularly behaving exposed human body. If someone does not suffer locally appropriate coverage of it on كَتَبَ (kataba, “to write”) it is his problem and it is not even easy to have a depiction of an action while on the other hand the majority of the internet is porn anyway, and grounds for much greater dissonances and contradictions to scripture offended readers would have to care about, calling the survival of Islam in the 21st century into doubt, a question of available and appropriate attention we have to put into the balance. We do not have to equate illness, violence, nudity, and making love. There is also a historical depth to the matter: I guess Nazi stuff falls under “violence” but we can expect a distance towards things because of how long ago a thing prevailed, possibly again leaving only a limited number of images.

However yes, I’d rather not burden our editors with dealing with thinking about the general guidelines even, and keep a policy of deliberate ambiguity beyond what we have written. You could try some technical execution anyway of course, just that, unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images by reason that “I have 10,000 edits per year/I am admin and I know well enough which pictures are appropriate in the given context, you however have an ideological agenda, from what I can see.” It would result in templates and/or gadgets which, in effect, new users would be discouraged to use, not to say disallowed. Fay Freak (talk) 17:48, 1 December 2024 (UTC)Reply

@Fay Freak: “[...] unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images [...]” I think I agree, which is why I'm leaning towards some kind of consistent policy. We're currently in the Wild West with respect to images. -- Fytcha〈 T | L | C 〉 19:14, 1 December 2024 (UTC)

FWIW this is being discussed on Wikipedia, too: Wikipedia:Village_pump_(policy)#Can_we_hide_sensitive_graphic_photos?. (On the whole, I find myself in the NOTCENSORED camp.) - -sche (discuss) 01:45, 3 December 2024 (UTC)Reply

Here's the right link: w:Wikipedia:Village pump (policy)#Can we hide sensitive graphic photos?. FWIW, I think we should hide them, mostly for the low-bandwidth peoples. CitationsFreak (talk) 03:02, 3 December 2024 (UTC)Reply

Can one use user CSS or similar to suppress all images (to address NSFW problem)? DCDuring (talk) 15:23, 3 December 2024 (UTC)Reply
Should "File:"/"Image:" be replaced by templates that could carry information to allow selective filtering of images by type? DCDuring (talk) 15:23, 3 December 2024 (UTC)Reply

Template:defdate and pre-1500 dates

A number of English entries contain things like this (at sky):

(obsolete) A cloud. [13th–16th c.]

However, we take the cutoff between Middle English and Modern English to be around 1500, so it has always struck me as anachronistic to say that the Modern English word arose in the 13th century. That information should be at the Middle English entry.

I'd like to propose moving the origin date of these senses to Middle English, then replacing the Modern English {{defdate}} invocation (some of which can be found using this crude search) with

(obsolete) A cloud. [Middle English–16th century]

or

(obsolete) A cloud. [until the 16th century]

(Also, this is not a paper dictionary, so there's no good reason to abbreviate "century" as "c.")

Thoughts? This, that and the other (talk) 05:39, 2 December 2024 (UTC)Reply
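To make the three display options in the proposal above easier to compare, here is a small Python sketch that renders the same attestation range in each style. It is not the logic of the real {{defdate}} template: the function names, the `style` values and the century-number inputs are all hypothetical, and it assumes that any start date before the 16th century falls in the Middle English period (it does not handle Old English).

```python
# Sketch only; the real {{defdate}} template takes free text, not century numbers.
MODERN_ENGLISH_START = 16  # first century treated as Modern English (cutoff c. 1500)


def ordinal(n: int) -> str:
    """Return an English ordinal such as '13th' or '21st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def defdate(start: int, end: int, style: str = "dates") -> str:
    """Format an attestation range given as century numbers.

    style 'dates'  keeps both centuries (current practice),
    style 'period' replaces a pre-cutoff start with 'Middle English',
    style 'until'  drops a pre-cutoff start altogether.
    """
    pre_cutoff = start < MODERN_ENGLISH_START
    if style == "period" and pre_cutoff:
        return f"[Middle English–{ordinal(end)} century]"
    if style == "until" and pre_cutoff:
        return f"[until the {ordinal(end)} century]"
    return f"[{ordinal(start)}–{ordinal(end)} c.]"


# The obsolete 'cloud' sense of sky, in each of the three styles.
for style in ("dates", "period", "until"):
    print(defdate(13, 16, style))
```

A range that starts after the cutoff, such as `defdate(17, 19, "period")`, falls through to the plain century form, which is the case the proposal leaves unchanged.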

Commenting and subbing as I have been wondering the same for Lechitic lects. I similarly do not use {{etydate}} in Polish if the term was inherited from Old Polish, etc. Vininn126 (talk) 08:31, 2 December 2024 (UTC)Reply

Support. Tollef Salemann (talk) 08:51, 2 December 2024 (UTC)Reply

I don’t see a contradiction perforce. You propose to water down, information that could later be used, to edit history. If these datings are credible information at all and not random attestation ages that can happen with the Middle Ages; we still have not solved the problem of regularly labelling “reconstructed” lects, which would allow us to cleanly state things like “probably in the 4th century already, but attested from the 9th”; okay I sometimes use the etymology for this, as on بَال (bāl), if a reconstruction entry is not feasible. How is the move of Byzantine Greek going? 🙄 Fay Freak (talk) 18:05, 3 December 2024 (UTC)Reply

Would sister languages also be marked as being attested since that time? Would we use Latin to mention since when we see attestation dates of Spanish? Vininn126 (talk) 18:47, 3 December 2024 (UTC)Reply

Our coverage of Latin is larger. The decision depends somewhat on how secure an individual language’s editors are expected to be with the corpora, and what they can expect to be created any time soon. If we had lots of Greek entries having such phrasing as proposed, the planned reorganization would be considerably more challenging, demanding to revisit the attestation situation in affected cases. Just let the editors—including you—leave as much as they know, in so far as it is not overwhelming to the eye?

To ever halt before your problem, one has to construe one’s task as an editor gigantic enough that one boasts to never leave any gap, inconsistency, or inconsequence, which also does not align with reality, inasmuch as the presence of a gap, inconsistency, and inconsequence appears to align not with the actual reality of a language. Instead we acknowledge our finite manpower. Not some imaginary limit stemming from language cutoffs, the purpose of which apparently one has to remind editors about once in a while again: inasmuch as they are justified by mutual intelligibility of languages, they do not constitute impermeable walls, though we may remember them as such and speakers constitute their identities by such ideas to some degree; instead the language headers, subheaders and labels are there to communicate something which you otherwise wouldn’t immediately relate to them. Seen in such a way, the defdates to senses are, beyond their situation in time and place (as identified by dialect and chronolect headers), exactly what the dictionary glosses to senses of a word are supposed to do. What you bring up as a question of logic turns out to be a question of balance. Fay Freak (talk) 22:27, 3 December 2024 (UTC)

I'm still overall against including it over our arbitrary boundaries. Vininn126 (talk) 22:33, 3 December 2024 (UTC)Reply

Adding the information to the Middle English entries definitely seems like a good idea. While I can see the theoretical justification for replacing dates before 1500 with "Middle English", I'm not sure that change is really an improvement: it obviously removes some information, and the periodization convention of distinguishing between Middle English and Modern English is not particularly significant in and of itself.--Urszag (talk) 22:35, 3 December 2024 (UTC)Reply

I support using dates with definitions, so 13th to 16th century and not from Middle English to the 16th century. The date range is more informative and easier to read. The sometimes arbitrary boundaries between stages of a language can live in the etymology section and the categories generated from it. Vox Sciurorum (talk) 00:30, 4 December 2024 (UTC)

I would delete that from the list of Modern English senses and move it to the etymology section (‘from Middle English foo “bar”…’) and to the Middle English entry. Nicodene (talk) 21:09, 4 December 2024 (UTC)Reply

@Nicodene: 16th century is Modern English, hence if attested it should not be removed. J3133 (talk) 07:58, 7 December 2024 (UTC)Reply

Obviously. I was going by the “until the 16th century”. Nicodene (talk) 09:30, 7 December 2024 (UTC)Reply

By all means add information to Middle English entries, but I don't see any reason to remove it from English entries. The proposal just makes things vaguer and more imprecise. The distinction between ‘Middle English’ and ‘Modern English’ is just a historical convention anyway; for a linguist, enforcing this distinction in practice is next to impossible if you're working with texts from the 16th century (as I have done here in the past). At least with Old English there is a clear break in the written record which makes the change in grammar and vocabulary pretty sharp. Ƿidsiþ 07:23, 7 December 2024 (UTC)Reply

I agree. Plus, anyone who cares about the distinction between Middle English and modern English can extrapolate from the dates given. But anyone who has never heard of Middle English (which is probably most people) won't find the information meaningful. Andrew Sheedy (talk) 05:04, 20 December 2024 (UTC)Reply

I still feel this doesn't hold up for languages like Latin with multiple children and also the fact it's well known. Vininn126 (talk) 08:54, 20 December 2024 (UTC)Reply

I agree. English is a bit of a special case relative to other major languages, because so much of its vocabulary entered the language late. I wouldn't go past Old English, and maybe not even that far (I would be fine with the defdate template reading [Old English to present], but not [Middle English to present]). Andrew Sheedy (talk) 17:18, 20 December 2024 (UTC)

So would I, I suppose – especially since the dates of OE texts are often a bit speculative. Ƿidsiþ 06:40, 21 December 2024 (UTC)Reply

Support, provided the 'removed' information is transferred over to Middle English. I disagree with changing "c." to "century" though. Regardless of whether you're doing things online or on paper, it's generally a good idea to optimize the space used and keep things concise; it just looks prettier that way. MedK1 (talk) 04:05, 26 December 2024 (UTC)Reply

FYI: December 2024 Unicode update

https://us11.campaign-archive.com/?u=c234d9aba766117eac258004b&id=d533f3804f —Justin (koavf)❤T☮C☺M☯ 23:19, 2 December 2024 (UTC)Reply

'LANG forms' -> 'LANG spellings'

IMO it is confusing that we use 'forms' to mean 'spellings' in categories like Category:American English forms and Category:European Portuguese verb forms and Category:Brazilian Portuguese forms superseded by AO1990; also for that matter, more generally in CAT:Obsolete forms by language, CAT:Archaic forms by language, etc. Most of the descriptions of these categories make clear that the "forms" referred to are superseded/archaic/obsolete/etc. spellings, not some other kind of form. Even opening up Category:Ukrainian archaic forms produces 5 subcategories whose names all contain "spellings" or "terms spelled with" in them. Unfortunately the term "form" is badly overloaded at Wiktionary; any action to reduce the overloading is welcome in my book. So I propose at first to rename ad-hoc language-specific categories containing 'forms' to 'spellings'; and if there are no objections, rename the more general 'LANG superseded/archaic/obsolete/dated/rare/uncommon/informal/nonstandard forms' -> 'LANG superseded/etc. spellings'. Any terms that are in a 'LANG foo forms' category but aren't mere spelling variations should be moved to the corresponding 'LANG foo terms' category (which exist for all 'foo' except for 'superseded', but 'superseded' seems specifically for spellings, so this is unlikely to be an issue). The only 'foo forms' category I've excluded is 'LANG short forms', which is using "forms" differently, and should be eliminated in favor of either 'LANG ellipses', 'LANG clippings' or 'LANG abbreviations' (depending on what sort of short form is involved), but that's a different can of worms. Benwing2 (talk) 09:08, 7 December 2024 (UTC)

@-sche Sorry to ping you directly but surprisingly no one has commented and I figure you might have something to say here. Benwing2 (talk) 09:46, 9 December 2024 (UTC)Reply

Thanks for the ping. The entries in these categories don't seem to be all of one type: it seems they will need pruning (especially, but not only, if renamed) iff people still want to distinguish spellings from forms. (Or are we abandoning that distinction? I know some later commenters in that discussion argued for that instead, and I'm not sure whether a decision was reached or, if not, which approach would be best.)
For example, I see we have anemia as an American form of anaemia (it should indeed rather be spelling if we're distinguishing those two words), but we also have airfoil in the same "American forms" category but using an "American spelling" label although it differs from aerofoil in more than just spelling. Likewise Abissinia, currently listed as an archaic "form", would be better as a "spelling", but the difference in adipsy and adipsia is not just spelling; abyssus, currently presented as an "archaic form of abyss", also does not seem like a mere archaic "spelling", but perhaps it is also best not as a "form" but as an archaic (or obsolete) synonym of abyss, or as (obsolete) Abyss. So, especially (but not only) if renaming the categories, it seems like we need to decide what we want the scope to be, and whether we want to distinguish "only the spelling is different" and "the pronunciation is also different" or combine those two things...? - -sche (discuss) 17:23, 9 December 2024 (UTC)Reply

Hmm. In practice I suspect people won't be able to distinguish mere archaic/obsolete/American/British spelling variants from those that also differ in pronunciation (aluminum vs. aluminium). At the same time I think "form" is far too overloaded. Maybe we could say "American English variants" etc.? Also technically the "European Portuguese verb forms" vs. "Brazilian Portuguese verb forms" reflect slight differences in pronunciation; they are mostly in past tense -amos (Brazilian) vs. -ámos (European), which is meant to indicate a difference in vowel quality. Likewise although the majority of "Portuguese forms superseded by AO1990" are just spelling differences, there are a few that are not, e.g. pre-reform abeto Douglas vs. modern abeto-de-douglas (although in that case the definition specifically says "pre-reform spelling of ..." and it seems there was also a pre-reform abeto de Douglas). So maybe we should use the term "variant". As for alt forms vs. alt spellings, I do think we should try to make that distinction since some of the things tagged as "alt forms" differ quite a bit from the form they are said to alternate with. Benwing2 (talk) 20:50, 9 December 2024 (UTC)Reply

Me and the Portuguese editors I know use {{alt spelling of}} when the difference is in spelling but not in pronunciation, at least “phonemically” — i.e., different spellings between European and Brazilian Portuguese are alternate spellings because the difference comes from each dialect’s pronunciation of phonemes, not just of that particular word. Meanwhile, I use {{alt form of}} when it’s a different pronunciation that doesn’t stem from a systematic difference between dialects.

However, this distinction in template usage is almost entirely moot if the category that gets assigned is the same. I think the most useful decision is to create new categories, 'LANG archaic spellings' etc., as daughters of 'LANG archaic forms' etc. Though this would need us to pay some real attention to replace the category tree definitions as well as the categorizations called by templates. Polomo47 (talk) 01:48, 10 December 2024 (UTC)Reply

Template:syncopic form / Template:syncopic form of

syncopic seems to be vastly less common than syncopal, which is itself less common than syncopated (see ngrams). Should we rename these templates? P U C – 16:09, 8 December 2024 (UTC)Reply

Maybe it should just be {{syncope}}/{{syncope of}}, since we already have {{clipping}}/{{clipping of}}, {{ellipsis}}/{{ellipsis of}}, etc.? Benwing2 (talk) 09:53, 9 December 2024 (UTC)Reply

This sounds better to me. Vininn126 (talk) 10:05, 9 December 2024 (UTC)Reply

Or just replace it entirely with {{clipping}} (of), since syncope is a type of clipping anyway, and it's not clear why one would want a specialized category for it. Nicodene (talk) 10:23, 9 December 2024 (UTC)Reply

I actually only had instances of clipping in Polish entries. Syncopy might be seen as more phonological and clipping is often a process in more colloquial things. Not sure. Vininn126 (talk) 10:25, 9 December 2024 (UTC)Reply

There isn't a difference, as far as I am aware, other than the fact that syncope refers to clipping in medial position. And that it sounds fancier. Nicodene (talk) 10:37, 9 December 2024 (UTC)Reply

I too was under the impression that syncopy is a purely "mechanical" phonetic process whereas clipping is a deliberate removal of syllables used to coin new words. Not that I have any source to support that interpretation but... P U C – 11:18, 9 December 2024 (UTC)Reply

I can't find any sign of such a difference outside the realm of (accidental?) Wiktionary convention. Google Books, for instance, brings up a laundry list of sources confirming that these are indeed synonyms. Nicodene (talk) 12:00, 9 December 2024 (UTC)Reply

I have to add my voice to the chorus of people saying that I find the present distinction to be valuable. I certainly wouldn't insist on the current terms used - and I am increasingly convinced we shouldn't keep using them as we are. But distinguishing between clipping that occurs as part of a gradual phonological process (e.g. Romance syncope, or English fancy) vs deliberate, conscious truncation (e.g. math) seems very valuable. This, that and the other (talk) 00:22, 10 December 2024 (UTC)Reply

Very well. In that case the issue is finding a pair of terms that can reasonably be specialized in the way that you have described.

We could try elision and shortening respectively. Nicodene (talk) 04:07, 10 December 2024 (UTC)Reply

I personally am fine with elision and clipping respectively since we're already using clipping in the sense of "conscious truncation". Benwing2 (talk) 07:27, 10 December 2024 (UTC)Reply

Seconded. Vininn126 (talk) 07:29, 10 December 2024 (UTC)Reply

I agree with @Nicodene here; I don't see why we need to categorize syncopes, apocopes and aphereses separately from clippings. See Wiktionary:Requests_for_moves,_mergers_and_splits#Template:clipping_of,_Template:aphetic_form_of;_Template:clipping,_Template:aphetic_form (but unfortunately there was pushback for this). Benwing2 (talk) 10:26, 9 December 2024 (UTC)Reply

Alternatively, if "clipping" seems specific to colloquial language, we could merge syncopes/apocopes/aphereses into "elisions". Benwing2 (talk) 10:29, 9 December 2024 (UTC)Reply

It could be that these are all part of the same process, I'm not sure I like the usage of ellipsis - differentiating skipping a word versus a syllable (and from there skipping a syllable in other ways) could be useful. Perhaps that could be a separate parameter. Vininn126 (talk) 10:32, 9 December 2024 (UTC)Reply

@Vininn126 elision, not ellipsis -- they are different processes and would be categorized differently. Benwing2 (talk) 11:07, 9 December 2024 (UTC)Reply

Ah, yes you're right! So perhaps there's something to that, then. Vininn126 (talk) 11:08, 9 December 2024 (UTC)Reply

Why is there no quote-thesis template?

We have cite-thesis. I was wanting to add a quote from a thesis to an entry, and the quote, quote-book, and quote-journal templates are not fit for purpose. Is it worth putting it to a vote? I don't know if I can do that as I've only been active on Wiktionary quite recently. Cameron.coombe (talk) 23:04, 9 December 2024 (UTC)Reply

@Cameron.coombe: I think {{quote-book}} is fine for this purpose, and don’t think a separate template is required. — Sgconlaw (talk) 23:10, 9 December 2024 (UTC)Reply

@Sgconlaw cheers, I assumed thesis titles needed to be set in quote marks, not italics, but that's Chicago style. Does Wiktionary not require this? Cameron.coombe (talk) 23:15, 9 December 2024 (UTC)Reply

I don't think we're that fussy 😊 This, that and the other (talk) 23:37, 9 December 2024 (UTC)Reply

I usually use quote-journal or quote-book and add "|genre=Thesis" as a ham-handed work-around. --Geographyinitiative (talk) 23:45, 9 December 2024 (UTC)Reply

Thanks all! Will proceed with boldness Cameron.coombe (talk) 00:11, 10 December 2024 (UTC)Reply

Request AutoWikiBrowser

I've been doing mass-correction of Portuguese pre-reform or otherwise archaic spellings — for reference, see how many pages are listed in WT:RFVI, and how I've cleared out [[Category:Portuguese superseded forms]]. My current project is clearing out the categories dated forms, archaic forms, and, above all, obsolete forms.

I think AutoWikiBrowser might just be able to help me with my antics — they must've helped my friend @MedK1 —, so I'd like to request access. Polomo47 (talk) 01:37, 10 December 2024 (UTC)Reply

@Polomo47 I gave you access. Benwing2 (talk) 07:26, 10 December 2024 (UTC)Reply

Use of titles in quotes and citations

I'm just fixing a quote here and noticed the editor put a title (Dr. med.) preceding the author name. Is this established practice here? I don't personally like it because it's clutter and likely not applied consistently. But I couldn't find a policy. Cameron.coombe (talk) 04:45, 10 December 2024 (UTC)

@Cameron.coombe: I don’t think we have a policy on this yet, but I always remove titles and forms of address unless they are strictly required to identify the author (for example, in some early works, female authors were named after their husbands, as “Mrs. John Smith”). — Sgconlaw (talk) 05:07, 10 December 2024 (UTC)Reply

@Sgconlaw thank you. Do you think it's worth me drafting a policy proposal? Cameron.coombe (talk) 05:26, 10 December 2024 (UTC)Reply

Do we not label attributive adjectives?

I noticed that common attributive-only adjectives are not labelled as such:

former
mere
elder (though in usage notes)
principal

Is there a reason behind this? Whether an adjective is general use (so no note), attr only, postpositive only, or pred only is important information, especially for non-native speakers, and it's provided in other dictionaries. There is an attributive label in the lb template, but it links to a gloss of the meaning for nouns, not adjectives, and, at least based on the common examples above, doesn't seem to be in use? Cameron.coombe (talk) 10:53, 11 December 2024 (UTC)Reply

It is a counsel of perfection that we should properly label every adjective sense that needs such a label. Add the label to the appropriate senses when you find them. I can't think of a practical way to detect all the cases where such labels are missing. A list from some source would be helpful, probably just for the more common cases.

We have labeled as "attributive" (mostly not "attributive only") some 200+ English terms. To label a sense of a polysemic adjective "attributive only" may risk user confusion. DCDuring (talk) 14:38, 11 December 2024 (UTC)Reply

@DCDuring thank you for the thoughts. I'm quite happy with simple "attributive," which is what I'm familiar with from other dictionaries. "Mostly attributive" can also be helpful if pred. sense is rare or nonstandard but attested. I'm not sure about the label auto-linking here though when I'd use it of adjs. Cameron.coombe (talk) 22:11, 11 December 2024 (UTC)Reply

If you are saying that we should link the attributive (and postpositive and predicate) labels to something explanatory, I agree, though I would usually settle for our entries for the terms or {{senseid}}ed definitions at the entries. It also might make sense to have categories for the terms that have such labels. Making the changes required is not in my wheelhouse. DCDuring (talk) 22:30, 11 December 2024 (UTC)Reply

@DCDuring cool, thanks. I wasn't familiar with senseid. I can have a play next time I need to. My only concern now though would be adding a whole lot of attributive labels and then having someone go through and revert them. I've got your support, but I don't know how universal that translates to! Cameron.coombe (talk) 23:15, 11 December 2024 (UTC)Reply

@Cameron.coombe: I would also support your additions, FWIW. 0DF (talk) 00:02, 12 December 2024 (UTC)

Maybe, we should give folks a chance to comment.

I'm already disagreeing with myself about my rejection of attributive only as a label rather than attributive. The normal ("unmarked") state of an English adjective is that it is prepositive and usable both attributively and as a predicate. The function of our labels is to mark exceptions to the unmarked state. Bare attributive does not do this, IMO. I don't know that we can be certain that only should follow attributive, because exceptions are likely, if not now, then perhaps in the past, and if not in UK and North America, then in Australia, the Caribbean, or India. Maybe the default for all of these should contain usually, with stronger only reserved for cases where the supporting evidence is strong. DCDuring (talk) 00:32, 12 December 2024 (UTC)

True, attributive only or usually attributive is more precise. Other dictionaries use simply attributive, but probably because of space restrictions. (I know space isn't a concern for Wiktionary, but is clutter?) For exceptions, I would handle these as subdefs:

former

(attributive) Previous.
1. (nonstandard, India) Predicatively [I can't think of a usage example lol]

I'm sure this isn't used in India, just an example. Cameron.coombe (talk) 00:57, 12 December 2024 (UTC)

The U4C is ordering an admin to respond to a block appeal

User Ghilt from the Universal Code of Conduct/Coordinating Committee (U4C) has proxy-posted a 3rd block appeal on User_talk:Gapazoid. They have stated that an en-wiktionary admin other than the blocking admin (Surjection) must read and respond to the unblock request. 2603:6011:C801:9FED:B33F:5C52:5400:3763 14:56, 11 December 2024 (UTC)Reply

Why not just change the blocking reason to "disruptive editing" and have done with it? 0DF (talk) 18:22, 11 December 2024 (UTC)

I'm fine with changing the block reason, but I maintain that this editor cannot be allowed to edit again. — SURJECTION ^{/ T / C / L /} 18:57, 11 December 2024 (UTC)Reply

@Surjection: As far as I can make out, your new block reason should satisfy the blocked editor. I think it's reasonable, FWIW. 0DF (talk) 19:23, 11 December 2024 (UTC)

Are you the IP who said they'd kill themself? Polomo47 (talk) 19:01, 11 December 2024 (UTC)

I note that User:Gapazoid is (in addition to being locally blocked) globally locked as a "Vandalism-only account", though User:Xaosflux stated that global unlocking might be considered if en.Wiktionary unblocks. Unless Gapazoid has deleted contributions on other wikis that I can't see, I actually find the global lock rationale harder to understand than the local block rationale; the user appears to have edited only en.Wiktionary and en.Wikipedia, and the few edits to en.Wikipedia appear to be mundane copyediting.
AFAICT Gapazoid has made only a single edit to Wiktionary content (?), to MAP; the only other (eight) edits the user has made were to his/her talk page; is this correct? (I see no deleted contributions.) If Special:Contributions/2600:387:0:803:0:0:0:95 (also locally blocked and globally locked) and/or Special:Contributions/2603:6011:C8F0:E4E0::/64 are the same person, their own sole contributions were to threaten, on Gapazoid's talk page, to commit suicide. If the user has made other edits I have missed, either on Wiktionary or elsewhere, I hope someone will bring them to light.
The user's edit to MAP was to change the usage note from commonly interpreted as a sign that the speaker supports (or is sympathetic to) such people to ...a sign that the speaker supports sexual contact between adults and children. That change seems mistaken / incorrect to me, and had I seen it, I would have undone it with an edit summary explaining that the "supports such people" language seems more accurate, but — if the edit had been made with no edit summary, or with a mundane edit summary — I would have taken it to be a mistaken but good-faith edit, not vandalism, and would not personally have issued a block. However, the edit was made with an edit summary which, like the user's posts on his/her talk page, states that he/she is a pedophile but is opposed to child sexual abuse.
I can understand the user's objection to the original block summary saying he/she engaged in "pedophilia advocacy", and I appreciate the improved block summary. I also understand the position that a user openly announcing himself/herself as a pedophile is disruptive, somewhat similar to w:WP:HID; threats of w:WP:SUICIDE also seem disruptive. I'd also note that my spider sense is that people who are blocked for things like this [edited to add for clarity: meaning "borderline disruptive things", not meaning "pedophilia-related things", which are rarer] and then spend this much time trying to get this many wikis / functionaries / organs of the WMF / etc involved in unblocking them . . . in the situations in the past where it's happened, such users have either been felt also by other wikis' admins to be NOTHERE (and so remained blocked not only here but also on other wikis that considered their appeals), or have been unblocked but then proven themselves to indeed be disruptive (NOTHERE, here only to bog people down in debates, etc) and gotten reblocked in time. Considering all of that, I, for my own part, as just one admin here, decline to unblock. If other admins (or other editors) want to weigh in, I encourage them to do so! I pinged Xaosflux above to make him aware of this discussion, and now ping User:Ghilt and will also leave a note on Gapazoid's talk page pointing to this discussion. - -sche (discuss) 00:44, 12 December 2024 (UTC)Reply

I remember a pedophile commenting on BP, saying that they are a pedophile. It was later hidden within a folder, though. CitationsFreak (talk) 03:53, 12 December 2024 (UTC)

Thank you for your statement, -sche. And also thanks to Surjection for changing the log entry. This concludes the matter for us. On behalf of the U4C, --Ghilt (talk) 09:27, 12 December 2024 (UTC)

We can entertain user lock appeals at m:Special:Contact/Stewards, and yes: overcoming community blocks is a good way to help such an appeal be successful. Xaosflux (talk) 12:24, 12 December 2024 (UTC)

After a private lock appeal I have unlocked the account. To respond to comments here, the lock was implemented after an SRG request due to pedophilia advocacy - similar to why we, for example, lock accounts for uploading CSAM on Commons, even if they only edited that project. With that being said, they have made a reasonable further explanation to me in private and I see it as a sign that this can currently be locally handled. EPIC (talk) 16:20, 19 December 2024 (UTC)Reply

Protecting pages as "model pages"

Saltmarsh (talk • contribs) has semi-protected a couple dozen Greek entries as "model pages". I don't think this is a good practice, since it deters editors who could materially improve these pages (no dictionary entry is ever complete), and there are much better approaches, e.g. having example entries in a separate namespace. — SURJECTION ^{/ T / C / L /} 18:46, 11 December 2024 (UTC)Reply

The full list of protected pages appears to be -τερος, Άγγλος, άγγλος, αγγούρι, αγγούρια, ακρογωνιαίος, ανηψιών, ασκί, βαθύς, βρέχει, λύνομαι, λύνω, μεταφρ., περισσότερος. Some of these were originally fully protected (i.e. only admins could edit them). — SURJECTION ^{/ T / C / L /} 18:48, 11 December 2024 (UTC)Reply

@Surjection, PUC, these pages were not locked; I have edited often (they used to be protected from anonymous Greek editors who mostly write vandalisms about soccer teams, and silly school jokes. That is because we -editors of modern Greek- are not around every single day). The models are in Category:Greek model pages so that we can copypaste from them. All languages should have copypaste models for us: because wikitext is getting harder and harder. Also see a trial at User:Erutuon/Ancient Greek model pages which is even more complicated. I always try to find copypaste patterns from recent edits by administrators; I would have liked to have them in some Cat with their endorsement, rather than going around Histories and their Contributions, hoping to find something similar to my task. If not protected, fine: but someone has to patrol them. Thank you. ‑‑Sarri.greek ^♫ I 10:47, 12 December 2024 (UTC)

These pages are still semi-protected and many of them were admin-protected. I don't see any "anonymous Greek editors who mostly write vandalisms about soccer teams, and silly school jokes" in the history of any of these pages, so they cannot simply have been protected to guard them from vandalism.

The idea of model pages on its own appears sound, but it's not a good idea to make the actual mainspace pages the 'model pages' and then protect them because they're 'model pages'. These should be in a separate namespace. — SURJECTION ^{/ T / C / L /} 10:51, 12 December 2024 (UTC)Reply

No problem: unlock them, M @Surjection. We can make a List and write specific examples -because they cannot be changed without discussion: they are heavily copypasted- at the About Greek page or Help Greek. My administrator @Saltmarsh has done SO much for Modern Greek! I would like to help him a bit. It's just... mmm I need a little help from programmers. For example, a little template for the Orthographic Reform to monotonic of 1982 (cf. Άγγλος.2024, cf. Notes). Little things like that. I could make it myself, but from experience, I see that only interface programmers check Templates and make them in a correct way. ‑‑Sarri.greek ^♫ I 15:52, 12 December 2024 (UTC)

@Surjection, Sarri.greek As far as I can see these "Model pages" do no harm (kindly point any out if you see one). New editors need help with layout, not always easily extracted from "Help". Protecting them (which again does no harm) ensures that any changes in suggested layout can be discussed. — Salt marsh ^☮ 14:29, 12 December 2024 (UTC)

Yes, thank you @Saltmarsh. Need to trust some pages; the ones checked by an admin. By the way, I am checking some of the pages. When robots finish their work, we can check again. (... I know only named parameters, cannot remember the sequences of positional params: I hate it). I have to throw away all my cheatsheets. Thank you, dear Salt!! ‑‑Sarri.greek ^♫ I 14:38, 12 December 2024 (UTC)

The harm they do by being unnecessarily protected is to prevent users from editing them. This goes against the entire idea of having a wiki. — SURJECTION ^{/ T / C / L /} 15:21, 12 December 2024 (UTC)Reply

I would oppose protecting any page in principal namespace on the grounds that it is a model page. Such model pages might be useful in Wiktionary space. I wonder how that could work in any page with multiple L2 sections.

It might be useful to have templates, possibly located on entry talk pages, that indicate that a given L2 section has achieved some stage of "completion", so that contributors could find such "models". DCDuring (talk) 14:56, 12 December 2024 (UTC)Reply

Nice idea, thank you M @DCDuring. Something analogous to wikisource's coloured bars. not reviewed / reviewed - see List so and so. A list of 'SOS' pages can be created, especially the ones with 3 Greek L2s, 2 Greek L2s, for every part of speech or inflectional group etc. Usually, I edit Ancient and Modern Greek in unison (lots of pages coincide and Modern refers all the time to previous etymologies and inflections. Especially with Hellenistic Koine -which has many problems and is usually ignored-). I hope robots will normalise the standard templates because it is very difficult to have 2 or 3 ways to write the same thing in the same page. I am also awaiting the pending Medieval Greek gkm. Thank you all for your attention. ‑‑Sarri.greek ^♫ I 15:11, 12 December 2024 (UTC)

I realized that the situation in Ancient/Modern/Medieval? Greek made the model-page-in-principal-namespace idea practical for those languages, as other languages do not use the same characters. But it wouldn't work so well for CJKV entries where the different L2s often have different levels of development. I would prefer an approach that worked across all kinds of entries with multiple L2s. Maybe it would be useful to see what works for Greek-character entries along the lines that you suggest, without protecting model pages. That might be a 'model' for entries with multiple L2s in other character sets. DCDuring (talk) 15:30, 12 December 2024 (UTC)Reply

I agree with DCDuring there may be a case to be made for putting such entries in the Wiktionary namespace, but I also agree strongly with Surjection that these protections should be reverted. This is a bad use of the page protection mechanism. — Mnemosientje (t · c) 19:53, 17 December 2024 (UTC)Reply

Well @Surjection "against the entire idea of having a wiki." Wikipedia has numerous such protected pages. These pages do no harm at all - I suspect that basically you "just don't like them". Well I do!, and these interminable discussions, which some people seem to relish, really piss me off !! — Salt marsh ^☮ 19:58, 12 December 2024 (UTC)

Pages are protected because of vandalism, because of high use rate (templates, modules, etc.) or because they are non-content pages that should not be edited by anonymous users. Neither applies here. Again, if we want to have "model pages" that are protected, then they need to be copies outside of mainspace. — SURJECTION ^{/ T / C / L /} 20:45, 12 December 2024 (UTC)Reply

I agree that entries should not be fully protected (admin-only) unless they have been or are likely to become the target of enough vandalism to warrant that (and even then, unless the vandalism has been enduring, protection should generally be temporary, like the protection applied to words that appear on the mainpage). Protecting pages (even at a lower protection level) simply because they are "good" is not the way to go; as recent edits to some of the pages mentioned here have shown, they were far from complete, so preventing some people from improving them is inadvisable. I agree with Surjection that if the goal is to show ideal formatting (or such), it is better to have examples (or even one single example, e.g. made-up word illustrating all possible things, e.g. how to format an adjective, a verb, a noun, all at once) somewhere in Wiktionary: space like the language's "About" / "Entry guidelines" page.

吃飽

Inspired by the discussion above, I looked at what other pages are indefinitely edit-protected at high levels. 吃飽 has been indefinitely protected, allowing only template editors(!) and admins to edit, since an edit war in 2019; is this still needed? The user who was edit-warring back then seems to have matured. (Even if there is still a problem, we now have the ability to block specific users from editing specific pages, while still allowing them to edit the rest of the site, which seems like it'd be better than protecting the whole page and thus blocking anyone from editing it.) - -sche (discuss) 00:08, 13 December 2024 (UTC)

Chiming in: Protecting pages for them to be models is obviously unsound. This is platonic idealism, which does not hold water empirically given that there is always room for improvement, you just have not exerted yourself long enough on it. And protecting pages always expresses distrust to users, which needs to have some basis other than the quality of the page. Fay Freak (talk) 00:57, 13 December 2024 (UTC)Reply

Cantonese, Hainanese, and Hakka lemmata treated as Chinese

I've recently done some work periodically to clear out Wiktionary:Todo/Lists/Derivation category does not match entry language. Currently, there are ten entries in categories entitled Cantonese, Hainanese, or Hakka terms derived from X which are not also part of the corresponding lemmas or non-lemma forms categories. Chinese/Cantonese 0T used to be an eleventh such entry, but then I edited it to resolve that problem with it. However, doing so seemed to cause other problems.
Firstly, the entry is still a member of Category:Chinese lemmas and Category:Chinese nouns despite the lang-code changes. Secondly, whereas {{lb|zh|HKC}} correctly displays (Hong Kong Cantonese) and adds the entry to Category:Hong Kong Cantonese, {{lb|yue|HKC}} just displays (HKC) and does not categorise the entry at all. And thirdly, my changes moved the entry from Category:zh:Beverages (173 members) and Category:zh:Chinese restaurants (48 members) to Category:yue:Beverages (0 member[s]) and Category:yue:Chinese restaurants (0 member[s]), both of the latter of which were red-linked until WingerBot created them.
This seems a manifestly suboptimal way to resolve the language-mismatch issue in the cases of these Cantonese, Hainanese, and Hakka lemmata. What is the proper way to deal with these cases? 0DF (talk) 00:37, 12 December 2024 (UTC)Reply

Pinging ND381 (who added Cantonese sorry), Wpi (who added Cantonese 0T and the other Latin-script Cantonese terms), Justinrleung (who added Hainanese 弄 and 枚), and Mar vin kaiser (who added Hakka 雪文). 0DF (talk) 03:31, 16 December 2024 (UTC)Reply

@0DF: Chinese is a special case, because terms are simultaneously Chinese and any of a huge variety of sublects. The writing system has a lot to do with this, since it allows writing things that are basically the same words in writing but completely different when spoken. It's very complicated, with variations in grammar, in pronunciation, and in writing that only partly overlap.

There's a whole universe of Chinese-specific templates and modules that do things in a completely different way from anything else on Wiktionary. When I'm going through the Todo lists, I treat most of the Chinese-related stuff as false positives and leave it alone. In all likelihood, "fixing" things will just cause other problems. The other CJKV languages share some of the same issues and are best left alone, for the most part.

I do fix things like Chinese etymologies that use language codes for Tibetan, and any {{lb|en}} on CJKV definition lines- but I know my limits (I took a year of Beginning Mandarin at UCLA, but that was 38 years ago). Chuck Entz (talk) 04:16, 16 December 2024 (UTC)Reply

@0DF: This has happened before (see here) and the correct way would be something like Special:Diff/72937108/75521861, and not changing it to Cantonese L2.

There seems to be a bug(?) in mod:zh-pron where it did not add Cantonese lemmas if |c= is empty and | (which adds Cantonese nouns) - I'll look at this later this week. – wpi (talk) 06:43, 16 December 2024 (UTC)Reply

P.S. I should note that on Wiktionary:Todo/Lists/Derivation category does not match entry language/description#Cleanup instructions it says Occasionally, the L2 language header, etymology template and {{head}} template may disagree on the language of the entry. If you do not speak the language(s) involved, it is best to ask the entry's creator to resolve the issue. (bold text mine) – wpi (talk) 06:47, 16 December 2024 (UTC)Reply

@Chuck Entz: Yes, I was somewhat aware that Chinese is a unique case: unified in writing, but divided in speaking. Thank you for chiming in.

@Wpi: OK, I'll add {{cln|yue|lemmas}} vel sim. henceforth. That should fix things. Thanks for pointing me to the correct solution, and I hope you're successful in fixing the issue with Module:zh-pron. I'd already clocked that “If you do not speak the language(s) involved,…” caveat, but if I observed that literally, I wouldn't be being nearly as productive or helpful as I would be by being bold in editing. I'd already noticed that my changes to 0T were inadequate, hence my raising the issue in this BP section and then pinging you and the other relevant editors, which has led to the proper resolution, so I think I have my boldness–caution level fairly well calibrated.

0DF (talk) 13:34, 16 December 2024 (UTC)

@wpi: I've employed your {{cln|…|lemmas|[POS]}} solution. However, I'm apprehensive that that led to the creation of Category:Hainanese verbs and Category:Hainanese classifiers, at least the former of which I would have expected already to have existed (like Category:Hakka nouns already did). Is there some reason why Hainanese terms shouldn't get POS categories? Justinrleung? 0DF (talk) 14:03, 16 December 2024 (UTC)

Thanks. Regarding Hainanese, I believe it's because {{zh-pron}} does not support Hainanese (yet), so there hadn't been any category infrastructure for it. – wpi (talk) 14:40, 16 December 2024 (UTC)

@wpi: Ah, OK. In that case, the categories are ready-made for a time when {{zh-pron}} does support Hainanese. Thanks for your help. 0DF (talk) 17:02, 16 December 2024 (UTC)

Temporary Accounts - introduction to the project

The Wikimedia Foundation is in the process of rolling out temporary accounts for unregistered (logged-out) editors on multiple wikis. The pilot communities have the chance to test and share comments to improve the feature before it is deployed on all wikis in mid-2025.

Temporary accounts will be used to attribute new edits made by logged-out users instead of the IP addresses. It will not be an exact replacement, though. First, temporary users will have access to some functionalities currently inaccessible for logged-out editors (like notifications). Secondly, the Wikimedia projects will continue to use IP addresses of logged-out editors behind the scenes, and experienced community members will be able to access them when necessary. This change is especially relevant to the logged-out editors and anyone who uses IP addresses when blocking users and keeping the wikis safe. Older IP addresses that were recorded before the introduction of temporary accounts on a wiki will not be modified.

We would like to invite you to read the first of a series of posts dedicated to temporary accounts. It gives an overview of the basics of the project, impact on different groups of users, and the plan for introducing the change on all wikis.

We will do our best to inform everyone impacted ahead of time. Information about temporary accounts will be available on Tech News, Diff, other blogs, different wikipages, banners, and other forms. At conferences, we or our colleagues on our behalf are inviting attendees to talk about this project. In addition, we are contacting affiliates running community support programs.

Subscribe to our new newsletter to stay close in touch. To learn more about the project, check out the FAQ and look at the latest updates. Talk to us on our project page or off-wiki. See you! NKohli (WMF) and SGrabarczuk (WMF) (talk) 03:27, 12 December 2024 (UTC)Reply

Could a user choose to use IPs instead of temp. accounts if they wanted? CitationsFreak (talk) 03:50, 12 December 2024 (UTC)

No, it will not be possible. The only choice will be between the accounts: logged-out (temp account) or logged-in (regular account). SGrabarczuk (WMF) (talk) 14:08, 12 December 2024 (UTC)

Great! It'll give me a chance to have even more usernames! P. Sovjunk (talk) 23:44, 24 December 2024 (UTC)

Banning Proto-North Caucasian and Proto-Northeast Caucasian reconstructions

1. Proto-North Caucasian. In my opinion, there are currently no reconstructions of the Proto-North Caucasian simply on the grounds that there are no reconstructions of the Proto-Northeast Caucasian. Here I would prefer to end any discussion about this superfamily and delete the category itself in order to avoid reconstructions.

2. Proto-Northeast Caucasian. As written above, I believe that there are no reconstructions of Proto-Northeast Caucasian. The so-called reconstructions of Starostin and Nikolaev are actually tentative pseudo-reconstructions. In addition, they do not give reconstructions of Proto-Northeast Caucasian forms anywhere. All their reconstructions in the database are Proto-North Caucasian, which are identical, apparently. Realizing this, Johanna Nichols uses the pound sign (#) for pseudo-reconstructions in her works.

This convention follows Williams 1989, who uses the asterisk for reconstructions based on regular sound correspondences and the # for "[p]seudo-reconstructions based on a quick inspection of a cognate set without working out sound correspondences".

It should be noted the recent case of a User:Qmbhiseykwos who began to add (in addition to pseudo-reconstructions by Nichols and "reconstructions" by Starostin and Nikolaev) "reconstructions" by the Dutch linguist P. Schrijver (2018, 2021, 2024), which should also be considered pseudo-reconstructions. For example, Reconstruction:Proto-Northeast Caucasian/rɔḳʷ(ə).

2.1. Appendix. Since the Wiktionary does not operate with the concept of tentative pseudo-reconstructions, all such "reconstructions" of the Proto-Northeast Caucasian should be indicated only in the appendix. For example, Appendix:Proto-Nakh-Daghestanian reconstructions

2.2. Renaming. I believe that it is necessary to rename the family to the (Proto-)Nakh-Daghestanian one. This must be done, since the name hints at the division of the North Caucasian and South Caucasian languages (Kartvelian), which is unacceptable.

2.2.1. Accordingly, it is necessary to rename the (Proto-)Northwest Caucasian to the (Proto-)Abkhazo-Circassian or (Proto-)Adyghe-Abkhaz, etc.

3. Proto-Daghestanian. It may be necessary to create a category for this family. Regarding this family, there are curious reconstructions by B. Giginejšvili (1977) and E. A. Bokarev (1981). But they don't seem to give any reconstructed forms. It is difficult for me to say anything here, since I have not studied these languages. I'll give you a comment by the American Caucasologist Alice C. Harris (2003: 180):

“It should be noted first that the phonetic reconstructions proposed by Nikolayev and Starostin (1994) and adopted by Alekseev (1985) are not widely accepted. For example, Nichols (1997) and Schulze (1997) show serious problems with the proposals in Nikolayev and Starostin (1994), and Giginejšvili (1977), Schulze (1988), Talibov (1980), provide reconstructions that are in various ways more rational”.

@Vahagn Petrosyan, კვარია ɶLerman (talk) 12:17, 15 December 2024 (UTC)Reply

Not even Nakh-Daghestani sound correspondences are fully understood, to even consider enrolling Abkhaz-Adyghe here is insanity. Proto-North Caucasian was never _not_ controversial, so I have no clue why it was even added to Wiktionary in the first place. Nuke Proto-North Caucasian. On the other hand, banning Proto-Nakh-Daghestani reconstructions is perhaps too extreme. Imho, there's no great harm in having them exist even if it turns out they're wrong/imprecise. კვარია (talk) 14:39, 15 December 2024 (UTC)Reply

Agree. Tollef Salemann (talk) 16:14, 15 December 2024 (UTC)

Nuke North Caucasian both as a group and as a reconstructed language, for Nakh-Daghestani/NE-Caucasian - I'm fine with agreeing not to create reconstructions, but I think having a code may be a good idea nonetheless. Just need to patrol them from time to time. Thadh (talk) 21:57, 15 December 2024 (UTC)

Nuke North Caucasian yes, already since it's unclear if even the family exists at all. Tentative cognates in NWC could be noted in NEC entries if we end up having/keeping them (same as we do with longer-standing Indo-Uralic, Altaic, etc. etymologies).
Keep Proto-Northeast Caucasian. NCED's (and Schrijver's) reconstructions may have many problems, but they are generally not "pseudo-reconstructions", and there's enough reason to think many of them are at least valid etymological groups. Any etymologies where Nikolayev & Starostin propose NWC reflexes are given in PNC form, but this is mainly because they set up very few changes from there to PNEC. The one I find on a lookover of their preface is *gg(w) > *ddɮ(w). In effect they admit the reconstruction is of PNEC in the first place, but it comes out so complex they end up able to derive (their reconstruction of) PNWC almost directly from it.
I do not follow the argument against "Northwest Caucasian" and "Northeast Caucasian", perfectly illustrative and mainstream names as far as I can tell. What is the "division [that] is unacceptable"? Treating Kartvelian / South Caucasian as an unrelated family? That if anything seems to be much closer to consensus than the question of North Caucasian, and I also do not see how this would be "hinted at" here.
"Daghestanian" as a distinct node is also not consensus, does not have distinct reconstructions for it either, and should not be added (seems IMO like an outdated typological unit against Nakh being more innovative). Probably we should not commit to any NEC grouping scheme beyond the unambiguous base units like Lezgic or Avar-Andic.

--Tropylium (talk) 13:58, 16 December 2024 (UTC)

I agree, let's ban Proto-North Caucasian. I have no opinion on the rest of the issues. I will only note that there are many weak scholars and outright charlatans dealing with the three Caucasian branches. All of their etymological works should be reviewed by our more intelligent editors. @Qmbhiseykwos, no mindless copying, please. Vahag (talk) 18:03, 16 December 2024 (UTC)Reply

Perhaps we should have a section in the Appendix namespace for StarLingish. It would fit right in with Klingon, Na'vi and other constructed languages from fictional universes... Chuck Entz (talk) 06:06, 17 December 2024 (UTC)Reply

I, too, am surprised about the presence of Proto-North Caucasian on Wiktionary. It must have slipped our eyes like the proverbial monkey walking down the street, few would be willing to see, and only been included as a consequence of Wikipedia or another reference not having been unequivocal about its unacceptedness. It has to be removed. Fay Freak (talk) 21:38, 16 December 2024 (UTC)Reply

Since there is consensus to delete (Proto-)North Caucasian, I'm going to implement that. — SURJECTION ^{/ T / C / L /} 13:29, 17 December 2024 (UTC)

How many users does it take to make a decision? ɶLerman (talk) 15:45, 17 December 2024 (UTC)

Adverbs?

I know how much everyone loves a part-of-speech question, so here is another one.

"I was indoors."

"He is upstairs."

"They were outside."

"Look, your keys are there!"

Sometimes I feel that some dictionaries, including Wiktionary, are coy about giving examples of this nature, as if they are unsure of the part of speech of the complements. While these are not "traditional" adverbs, in that they do not modify anything adverbially in a traditional sense, and cannot be removed leaving a relevantly valid sentence, nevertheless they do answer adverbial wh-questions, and do not seem like adjectives. Some people call these "adverbial complements", I think. Are we happy to place these uses under "adverb"? Another possibility for some cases -- e.g. "outside" in these examples -- is "intransitive" preposition, in that "They were outside" implies "They were outside (somewhere/something)", but I'm not sure that this concept is fully mainstream. What do you think? Mihia (talk) 21:52, 17 December 2024 (UTC)Reply

The 2021 vote against categorizing words as intransitive prepositions is still current, isn't it? "Adverb" seems acceptable to me.--Urszag (talk) 22:04, 17 December 2024 (UTC)

Gosh, I forgot entirely about that vote. Thanks for reminding me. Can we make any clear distinction between "I was indoors" being an adverb, and "Is Mr. Smith in?" being an adjective, as is currently listed at in — and, indeed, generally between my examples above and various other supposedly adjectival instances of other "short function words", where in some cases the philosophy, quite possibly perpetuated in part by myself, seems to be "if it's the complement of the be-verb then it's an adjective"? Mihia (talk) 22:54, 17 December 2024 (UTC)Reply

Rethinking Middle Korean verb lemmatization


@AG202 @Solarkoid @Chom.kwoy @Tibidibi

As per Wiktionary:About Korean/Historical forms#Lemmatizations and this discussion in October 2020, we currently lemmatize morphophonemic forms for nouns but allomorphic forms for verbs (faithful to "而餘皆爲入聲之終也然ㄱㆁㄷㄴㅂㅁㅅㄹ八字可足用也").

But I really cannot help but think (as I and others have already stressed) that this is misguided and adds needless confusion.

Etymology sections already use the faithful phonemic form by convention. This creates at best alt hyperlinks/double hyperlinks and at worst redlinks even when we have an entry for the MK verb in question. This is especially problematic because, let's be real, 99% of people ever going to MK entries on here do so through a MoK ety section.
In the discussion linked above, it was said that "by convention" Korean lemmatizes actual inflected forms for verbs.
1. Since when? Even in Modern Korean, 다 (-da) is defined as a "dictionary citation form ending," sufficiently demonstrating that even within our morphological orthographic framework we are specifically citing "dictionary forms," not any real form in use.
2. This should carry over, and has carried over, to Middle Korean dictionaries. Consider four popular Middle Korean dictionaries: 15세기 국어 활용형 사전, 우리말큰사전(옛말과 이두), 고어사전, and 한불자전. Consider now that the former two use the "morphophonemic spelling," and only the latter two use "faithful" spelling as we do now. Consider further that 고어사전 is a) from 1960 and b) also lemmatizes other forms such as the infinitive in some cases, with the express goal of being accessible for learners. We don't do this, we shouldn't do this, etc. 한불자전 is from 1880(!) and was written by a French missionary. Is this really the precedent for us to be following?

I would love to start adding more MK entries but there are a lot of gaps right now in infrastructure(?) that make this difficult. This is IMO the largest blocker; I've brought this up countless times on the Discord, but I'd love to reach an actual BP consensus. Any input appreciated. Lunabunn (talk) 03:48, 18 December 2024 (UTC)Reply

Agreed. I've already expressed my opinion on this several times before, but I'd rather the forms show the original stem. This will be beneficial for the learners and those curious in the long run, and it will help majorly advance the cause to create an automated conjugation template, most importantly for the header.

Additionally, I myself also want us to reach a consensus fast, as there seems to be confusion over whether the Modern Korean etymology header should contain the actual, attested form of the verb (=용언) or the root. The only real caveat is that syllable-final ㅸ looks pretty ugly in syllables: 어드ᇦ다, 셔ᇕ다, ᄠᅥᇕ다, etc. would be some of the roots we have to add. Other than that, I think this is for the best.

- Solarkoid (talk) 18:02, 18 December 2024 (UTC)

Support: Matches what we do for other Koreanic lects. AG202 (talk) 18:05, 18 December 2024 (UTC)

Strong support with a suggestion. This exact issue has been on my mind for the past few years ever since editors, including myself, have begun adding significant numbers of MK verb/adjective entries. I thought I should speak on this matter as someone who has added numerous MK entries throughout the years. Thanks @Lunabunn for finally bringing this up.

I can now see consistency for consistency's sake is really the only thing going for the current "historically faithful (allophonic)" framework we have. While unapologetically uniform in its lemmatization rules, I agree that this leads to needless confusion and is at the expense of navigability. This is especially true for readers who likely access MK entries through MoK etymology sections (who I assume are the overwhelming--I can't stress this enough--majority as you have mentioned). As for the "convention" from the previous discussion (I was there), I believe it referred not to dictionaries but the MK spelling convention, i.e., 표면형(表面形) (phonetic), as using the 기저형(基底形) (morphophonemic) would be anachronistic.

Speaking from personal experience, this has also been quite confusing and time-consuming for even editors who are familiar with MK orthography. For example:

Having to actively think about the proper "historically faithful" lemma when creating wikilinks (see how, in ᄉᆡᆷ, I had to link 기픈 to the "proper" 깁다, ATM a totally unrelated MoK entry, instead of the phonemic 깊다, at least the descendant MoK entry), which isn't intuitive at all to me as a native Korean speaker accustomed to MoK orthography. Although correctly linked according to current conventions, I would imagine this would be utterly baffling to a beginner.
Having to add "phonemically faithful" stubs to make up for this (e.g., see the MK "entry" for 및다, which would become the main MK entry under this proposal), unnecessarily adding workload to the already thin MK editor base.

All in all, it is clear that, in addition to the points Lunabunn has brought up, the positives--if any, really, other than doing it ostensibly for faithfulness' sake--of creating lemmas consistent with historical MK orthography do not outweigh its numerous negatives. Being anachronistic is not a good reason to continue this. I am now convinced that this is not the goal for which we should aim, especially Wiktionary being a word dictionary and not a spelling guide. This is also what modern monolingual dictionaries do, and this is what we should follow, which is more in-line with general Wiktionary policy. Moreover, we already don't do this for nouns, so the historicity argument is indeed moot.

However, I do not think an entirely phonemically faithful lemmatization scheme is desirable. As @Solarkoid specified, this would mean we would need to create entries such as 셔ᇕ다, which never appeared in actual MK or MoK texts (it's like an imaginary number) and is, well, yes, "ugly." Aside from looks, which shouldn't be something we consider in a dictionary, the general 표준국어대사전 and the academic 15세기 국어 활용형 사전 both actually list the historically faithful 셟다 as their headword, while mentioning the phonemic 셔ᇕ- as a "form" appearing before vowels (which is not wrong). Conversely, both list the phonemically faithful 맞다 rather than the historically correct 맛다 as the headword. Monolingual dictionaries (which conventionally cite verbs/adjectives with the ending -다) seem to treat the lenes ㅸ and ㅿ (distinct phonemes in MK) as exceptions, to align, I am pretty sure, with how MoK treats them. In fact, 15세기 국어 활용형 사전 explicitly states this in its preface. Therefore, you simply are not going to find 셔ᇕ다, particularly as the headword with the ending -다, in any mainstream dictionary or work (except, I suppose, research papers in Korean, which have the liberty of, well, not being a dictionary for learners; they could use forms such as 셔ᇕ다 all they want).

I believe we should not implement an entirely phonemically faithful lemmatization scheme for, again, the sake of uniformity, as neither do popular modern monolingual dictionaries do this; we would be the first dictionary to do this, as it is also demonstrably not a "convention," as with currently using historically accurate forms. As such, I think creating entries such as 셔ᇕ다, spelled in Hangul, would also be a source of confusion for those expecting the same lemma coming from popular MK dictionaries (as well as deviating from MoK morphophonemic standards [which treat vestigial W [-w-] and z [-∅-] as allophones calling it "irregular conjugations"] on which they by principle base lemmatization [for a stage of the language when W and z were still phonemes] yet with which most people would be familiar). It's not that using 셔ᇕ다, besides aesthetics (lol), is inherently wrong (it's actually correct); however, 셟다, technically "wrong" (read: an inconsistent treatment), is how contemporary dictionaries have chosen to lemmatize in order to make it easier for modern readers unfamiliar with MK phonology. So it's really nobody's fault; we would just be following precedents--conventions as you will.

Yet, this is not perfect either, as some words would be lemmatized according to a different principle from the rest (and for something as superficial as their spelling at that). Nevertheless, I propose that we still likewise make exceptions for cases like ㅸ and ㅿ (the only exceptions I could find with a cursory review of dictionaries) for the reason explained above, in Hangul, of which we use the historically faithful spelling, but apply an entirely phonemically faithful (containing the root) scheme in Romanization. We can do this as Wiktionary is unique in that it always provides both Hangul and Romanization for MK.

So, for example, in -ᆸ다 where ㅂ represents an underlying /β/ such as in 셟다, we would use the Yale W. Consequently, we would get 셟다 (Yale: syelW-ta), with the historically faithful Hangul spelling as the headword and phonemically faithful spelling as the romanization. We would not be the first to do this, as some English language works on MK, which only use Yale Romanization, do exactly this (see Martin 1992 p. 57, who uses the phonemic stem syelW- to refer to this exact word). In -ᆸ다 where ㅂ represents an overt /p/ such as in 저줍다, we would obviously still use the Yale p, and such verbs/adjectives are not affected by this proposal. Hence, instead of using -ᇦ다 and -ᆸ다, -ᆸ다, as an exception, could have two possible romanizations, -Wta and -pta, depending on the word, but the Hangul spelling won't reflect this. The same goes for -ᆺ다 with -zta and -sta, instead of -ᇫ다 and -ᆺ다 (e.g., ᄃᆞᆺ다 and 벗다). In all other cases, both Hangul and romanization would represent the phonemic spelling as opposed to the historically faithful spelling that we use now, as per the proposal. This compromise would follow the convention found in monolingual dictionaries while still being consistent in providing readers with at least one phonemically faithful representation throughout all MK verbs/adjectives; there is no ambiguity, and the two different phonemes are distinguished.

This seems like a simple enough solution for a problem of an otherwise commonsense change IMO. The only downside I could think of is the need for manual input for transliteration, but MK already has these cases.

For those who might not fully understand or tl;dr, here is essentially what would happen:

Current entries with "historically faithful" spelling must be moved and could be converted to non-lemma entries as an inflected form. 맞다 becomes the lemma while 맛다 is reserved for an entry for "inflected" forms (if they are ever created, though [this should not be the focus]; Middle Korean 다 (-ta) had a more complicated usage compared to its modern descendant, so it wouldn't be a mostly empty, redundant entry. Nonetheless, I think the entry at Middle Korean 다 (-ta) will suffice). 다 (-ta) would serve two functions: form part of the dictionary citation form as per the modern convention (with phonemic spelling) and as part of inflected forms (e.g., declarative mood suffix) (with historical spelling). The second case would ever only be seen in conjugation templates, quotations, or, as mentioned above, non-lemma stubs. This entails that, for example, 맞다 (mac-ta), as the lemma, is the only form you would see in most parts of Wiktionary, while 맛다 (mas-ta), despite being historically accurate, would only be seen in the above mentioned places. For the ㅸ/ㅿ cases, if accepted, the romanization 셟다 (syelW-ta), the lemma version, would be the one seen in most parts of Wiktionary, whereas 셟다 (syelp-ta), with the same Hangul spelling and "accurate/literal/surface" transcription, would, again, only be seen in the above mentioned places (telling the reader that it represents a real (attested or possible) form/inflection with 다 (-ta), for disambiguation purposes). And, of course, for anything else, normal romanization rules apply (e.g., 셟고 (syelp-kwo)); only ㅸ/ㅿ headword forms get this special treatment.

Examples of current entries whose main entry would be affected (if we adopt an entirely phonemically faithful lemmatization scheme) are:

더럽다 (telepta) and ᄃᆞᆺ다 (tosta) would be moved to 더러ᇦ다 (teleWta) and ᄃᆞᇫ다 (tozta), respectively. However, if the above-mentioned exception is applied, these would stay at their original locations.
벗다 (pesta) would stay where it is, as its Hangul phonemic and historic spellings are the same; ᄌᆞᆽ다 (cocta) would stay where it is, but ᄌᆞᆺ다 (costa) is correct under the current convention.
여다 (yeta) and 우다 (wuta) would be moved to 열다 (yelta) and 울다 (wulta), respectively.
깃다 (kista, “to rejoice”) would be moved to 기ᇧ다 (kiskta), whereas 깃다 (kista, “to cough”) would be moved to 깇다 (kichta).
됴타 (tywotha) and 나타 (natha) would be moved to 둏다 (tywohta) and 낳다 (nahta), respectively.

-- 123catsank (talk) 01:14, 21 December 2024 (UTC)

Thank you for your thorough contribution. I am relieved to hear that my opinions on the matter are shared by other editors (no doubt more experienced than myself). Just one thing I would like to comment on:

In fact, 15세기 국어 활용형 사전 explicitly states this in its preface.

This seems misleading. Yes, that dictionary does indeed state in its preface that W and z stems would be listed with p and s respectively, but it also explicitly states that it is for convenience only, not indicative of an analysis of these stems as p and s stems under any circumstance ("... 이런 어간들의 기본형을 'ㅅ, ㅂ'으로 하겠다는 인식을 반영한 것은 아니고 편의상의 조치임을 밝혀 둔다."). Indeed, modern scholarly practice does not treat W and z stems as irregulars (although 표준국어대사전 does, that's just because it's ass), so we shouldn't either.

Now, if we choose to lemmatize these forms with p and s instead of W and z anyway for convenience, I do not necessarily object. I do, however, find myself wondering what convenience we gain by lemmatizing p and s if that means we have to manually specify the headword for romanization.

If we decide to lemmatize with p and s, I would also like to suggest that we use the W/z form in the hangul headword as well, not just its romanization. This would be aligned with how we don't include diacritics in entry titles but still show it in the headline.

(p.s. Do you edit on a different account and/or are you in the English Wiktionary Discord?) Lunabunn (talk) 03:14, 27 December 2024 (UTC)Reply

I agree. Manual transliteration for the same "spelling" displayed would not be ideal. I would support Lunabunn's idea if we do decide to lemmatize with /p/ & /s/. AG202 (talk) 03:40, 27 December 2024 (UTC)

I support @123catsank's suggestion. Let's do it like this:

놉다 (nwop-ta) shall be 높다 (nwoph-ta), and ᄀᆞᆮ다 (kot-ta, “same”) shall be ᄀᆞᇀ다 (koth-ta).
깃다 (kis-ta, “to cough”) shall be lemmatized as 깇다 (kich-ta).
깃다 (kista, “to rejoice”) shall be lemmatized as 기ᇧ다 (kisk-ta), and 맛다 (mas-ta, “to take responsibility”) shall be lemmatized as 마ᇨ다 (mast-ta).
ᄌᆞᆺ다 (cos-ta, “frequent”) shall be lemmatized as ᄌᆞᆽ다 (coc-ta).
됴타 (tywotha) shall become 둏다 (tywoh-ta), and 여다 (yeta) shall become 열다 (yelta).
더럽다 (telep-ta) and ᄃᆞᆺ다 (tos-ta) shall stay.

-- Chom.kwoy (talk) 07:43, 16 January 2025 (UTC)
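
The relationship between the proposed phonemic lemmas and the currently used surface spellings in the lists above can be illustrated mechanically. The following is a minimal sketch in Python, assuming only the Yale stems and stem-final neutralizations that appear in the examples in this thread; it is not a full account of Middle Korean phonology, and the function names and the rule table are hypothetical, introduced here for illustration only.

```python
# Hypothetical rule table: stem-final Yale consonant(s) -> surface form before -ta.
# It covers only the patterns cited in the examples in this thread.
FINAL_NEUTRALIZATION = [
    ("sk", "s"),  # kisk- -> kis-ta
    ("st", "s"),  # mast- -> mas-ta
    ("ch", "s"),  # kich- -> kis-ta
    ("ph", "p"),  # nwoph- -> nwop-ta
    ("th", "t"),  # koth- -> kot-ta
    ("W", "p"),   # syelW- -> syelp-ta
    ("z", "s"),   # toz- -> tos-ta
    ("c", "s"),   # mac- -> mas-ta
]


def lemma_citation(stem: str) -> str:
    """Phonemic citation form: the stem plus the citation ending -ta."""
    return stem + "ta"


def surface_citation(stem: str) -> str:
    """Surface (historically spelled) form of stem + -ta, per the thread's examples."""
    # Plain stem-final h merges with the ending: tywoh- + -ta -> tywotha.
    if stem.endswith("h") and not stem.endswith(("ch", "ph", "th")):
        return stem[:-1] + "tha"
    # Stem-final l is dropped before -ta: yel- + -ta -> yeta.
    if stem.endswith("l"):
        return stem[:-1] + "ta"
    # Otherwise apply the first matching neutralization from the table.
    for final, surface in FINAL_NEUTRALIZATION:
        if stem.endswith(final):
            return stem[: -len(final)] + surface + "ta"
    # Stems such as pes- are spelled the same either way.
    return stem + "ta"


if __name__ == "__main__":
    for stem in ["mac", "syelW", "toz", "kich", "kisk", "tywoh", "yel", "pes"]:
        print(stem, lemma_citation(stem), surface_citation(stem))
```

Under the exception discussed above, stems in W and z would keep the surface Hangul spelling as the entry title while the romanization shows the `lemma_citation` form; all other stems would move to the phonemic form in both.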

Beekes


Bluntly, Beekes is neo-Vennemann, except for Greek, and without even an actual attested language (/family) from which to derive the substrate.

That may even be too polite. I personally have thought Beekes dubious since my first encounter with him (his grammar of Avestan, in which he identifies numerous Avestan roots without Sanskrit analogues, almost none of which are actually without obvious Sanskrit analogues). That said, I am joined by Meissner, de Decker, Vine, Verhasselt, Beckwith, Nikolaev, Woodhouse, Olson, Miller, Simkin, Colvin, Meester, Garnier, Nardelli, and countless others in my reservations about Beekes as a source in the specific matter of Greek etymology/'Pre-Greek'. Even *within* Leiden, Beekes was considered peculiarly dogmatic, even by very close colleagues (e.g. Lubotsky, Kloekhorst, etc.) - indeed, even van Beek, his prize student, has published numerous papers over the last several years, especially after Beekes' death, rejecting Beekes' particular approach to Pre-Greek. Kroonen's public critique is also worth noting.

I have numerous criticisms of Beekes' approach to Pre-Greek, and am happy to systematically go through them if anyone should wish, but I hardly need to, since the critical scholarly literature is, at this point, voluminous. That said, if anyone is curious, do ask.

I am not going to go so far as to say that Beekes should not be cited at all on matters of etymology, but his views should *always* be tagged as his, as opposed to in the voice of Wiktionary, and preferably with a modifier that makes it clear that his views do not reflect the communis opinio ('Beekes, typically, assigns...' or similar), where applicable, which is frequently the case.

GatlingGunz (talk) 18:11, 18 December 2024 (UTC)

Beekes's etymologies are all over Wiktionary not because we find him particularly reliable, but because his accessible dictionary is the only one in English, so it was easily copy-pastable into Wiktionary. Frisk is in German, Chantraine is in French. Others' English etymologies are sprinkled across inaccessible articles.

Now good luck finding someone to go over the several thousand pages referencing his dictionary and reviewing his proposals one by one. The damage may be permanent. Vahag (talk) 18:24, 18 December 2024 (UTC)Reply

By the way, apart from the pre-Greek stuff, Beekes is almost entirely a word for word translation of Frisk. —Caoimhin ceallach (talk) 23:04, 19 December 2024 (UTC)

Regular Wiktionary editors all have made pertinent observations and more or less openly concluded with remarks encouraging liberal dismissal of Beekes’ etymologies.

It would be more frank to mark etymologies as unknown or uncertain or otherwise speculated upon, while relegating Beekes’ claims to a mere reference, not worthy of taking space in serious etymology, since collectively they have to be regarded as a nuisance.

Of course, the silver bullet for anyone in the know about the particular philology is to cite one or more authors to positively provide a differing opinion. You don’t “need to” but it is a gain for all of humanity and your personal scholarly achievement. There is a mismatch between those who have an intimate familiarity with certain comprehensive university libraries and other historically interested people who attempt to have conceptions of the past, if only because one works on another language touching upon Greek. Our open Hellenic lexicography is seriously underdeveloped, and part of it is uncritical thinking, burdened by Beekes’ dogmatism and lifeless superficiality in place of inviting examples of how language science is actually done. Fay Freak (talk) 16:19, 19 December 2024 (UTC)

I heartily agree that Beekes' proposals should be differentiated from the 'voice of wiktionary' or 'the fact of the matter,' in some way that is also succinct and clear for people who are (to the point) unused to doing or engaging with etymology/philology themselves, though I think any wording alluding to 'typical' positions or proposals should be charitable. It's obvious to anyone with any exposure to the field that Beekes' opinions would be his alone; the issue is the lack of alternative proposals in entries (something that has to be done by hand) and that subset of people that don't understand that, who would see a reference and an etymology and say 'OK then, checks out,' and go no further.

I also agree that actually going through and addressing every single claim is a monumental task; however, I think there is little for it other than for people with the relevant background to simply chip away at it, little by little.

I (for one) would not support some modification that removes etymologies attributed to Beekes wholesale, or that simply marks them all as contravening common opinion or as 'typical of Beekes' without further comment (the obvious implication being with such wording that Beekes is typically dubious.) I think wholesale marking of opinions as typical or against common opinion would be committing more of the same sin that has saddled us with thousands of these entries - a quick convenience but one that is poor practice, even if it is the case (in my opinion or yours) that the wording would end up being accurate in the majority of cases. Instead, criticism or contrasting opinion and claims should be specific and given on a case-by-case basis, just as it would be if you were to publish a paper on the relevant etymologies and reconstructions - ie. you ought to provide supporting information as to why (via references is fine, being as it is that wiktionary isn't the place for primary research.) This is to say, if you are to add such notes on an entry, I think you'd do best to expand on your note with alternative views and why both exist (at least implicitly via references) instead of leaving it at 'this proposal is typical of XYZ.' This can only be done one-by-one or in smaller batches.

I'd also observe that this topic/issue reflects broader issues here on wiktionary: there are many etymologies that are unjustified and/or unsourced (but clearly copied almost verbatim from some source), or which have been mass-added from some random source, or which use some good quality source of yesterdecade that is nevertheless outdated and wrong, either on the particular lemma in question or in a more pervasive manner (eg. modern reconstructions have moved on in some systematic respect), or which have done one of these, someone has later added information to it (agreeing or disagreeing), maybe this has happened 2-3 times, but the referencing work hasn't been done to distinguish which claims are associated with which reference. Sometimes etymology sections cite differing reconstruction schemes in the same paragraph without giving any indication that they're different/why.

There are likewise many cases of a lack of supporting argumentation or reasoning for why two proposed etymologies may exist, often they are just mentioned, and when editors come to justify one etymology over another, the best you often get is some comment like 'X is preferred' with no justification or reasoning (which suggests to a reader that the editor doesn't understand why and is taking it on authority, or that they are biased.)

All of that is bad practice, though to differing degrees - someone may mass-add from either a good source or a bad source and as long as they include adequate referencing I will say in their defense that at least they have tried to improve coverage in some measure. If someone obfuscates the referencing later by adding additional claims it is hardly their fault. I suspect many will mass-add from some source, let's just take the Geiriadur Prifysgol Cymru as a hypothetical, knowing some subset of given etymologies are wrong/outdated with the intent of updating them with further references and information later, only never to get to it. If someone has used an egregiously awful or biased source, it is best to approach them on it via talk pages, and some cases could warrant mass spoilers/notes or even removal, but I am not sure Beekes or any number of other pet poor-historical-linguistics-opinions scholars (I certainly have a personal list...) fit in that category, even as it is that I may strongly disagree with their associated proposals.

Some languages and language families are a lot better/worse for this stuff than others (I notice a lot of etymologies that were state-of-the-art at one time but that are now out-of-date on Japanese pages particularly, for example.)

In all these cases, my intent is not to flame past contributors' efforts, instead, I think simply chipping away at the issue on a case-by-case, fixing references and providing new ones, and occasionally reaching out via talk pages, is the only realistic solution. Herthaz (talk) 22:08, 7 January 2025 (UTC)Reply

Being skeptical about Beekes' tendency for pre-Greek origins is all fine, but the idea is not to simply remove this without any understanding of what Beekes is saying. Exarchus (talk) 13:48, 19 January 2025 (UTC)

A prerequisite for commenting is being able to distinguish between X doesn't understand what Beekes is saying and X is dismissing what Beekes is saying. That you can read through a thread in which I demonstrate an exhaustive command of the literature on Beekes' work alone, including verifiable personal observations on his work on a language that is not Greek, and reply with this kind of contextually deeply stupid and hostile nonsense, marks your deficit of competence, not mine.

You were always, of course, welcome to ask me on my talk page as to why I think or have written <whatever>. But clearly you preferred hostility and condescension, which is both ironic and, as I have remarked elsewhere, very stupid, in context.

Grow up. GatlingGunz (talk) 14:13, 20 January 2025 (UTC)Reply

I'm not sure you actually read my edit comment when you wrote the above. Your comment on Beekes was: "rm Beekes nonsense ("this is Pre-Greek because there's a nasal, even though the nasal is attested in Latin, Germanic, and Sanskrit as well")"

My point was that the nasal would have disappeared in Greek, so if there's an -ν- it needs to have another explanation than the Latin, Germanic and Sanskrit terms. What that explanation should be, is another discussion.

I think you had a somewhat similar misunderstanding here (the wording was indeed not very clear). Exarchus (talk) 23:34, 20 January 2025 (UTC)Reply

One case where I'm a bit baffled by Beekes is his statement at θύω (thúō, “to rush in, storm, rage”) that it is unclear what "to shake" (धूनोति (dhūnoti)) has to do with the Greek meaning. The semantic connection doesn't seem far-fetched at all to me (storms can 'shake' things, right?). I doubt sources can be found that follow Beekes in this. Exarchus (talk) 11:20, 22 January 2025 (UTC)Reply

French Wiktionary Word of the Year


Dear colleagues,

In the French Wiktionary, we experimented this year with our first top 10 words of the year!

It started on November 15th with a call to suggest words, without any specific methodology in mind, such as an analysis of readership statistics or anything. A dozen people suggested about 50 words. Then in December, we had a vote with 30 participants and a simple result as a list. It wasn't perfect but it was not that complicated to do.

To my knowledge, this hasn't been tried yet on the English Wiktionary, has it? If you want to try next year, I suggest you create an ongoing draft to keep track of new words during the year; it would make the selection easier. Also, having an in-person meeting with seven Wiktionarians in December helped a lot. Finally, I am not expecting any echo in the press this year, but we may work to build something for 2025 and 2026, and I think we could be stronger together if several editions of the Wiktionary project organize a similar initiative in parallel. So I invite you to try it too! Cheers Noé 12:33, 19 December 2024 (UTC)

@Noé: there is currently an ongoing vote on whether to have a Word of the Year, and what that word should be. Currently, it looks like the vote will fail, as it failed last year. There doesn't seem to be enough support for the proposal at the English Wiktionary. — Sgconlaw (talk) 12:56, 19 December 2024 (UTC)Reply

Thanks for pointing out this discussion; I missed it in the November Beer parlour, and it was not brought up again in December. It is interesting to read the various opinions on this process and its goals. I did not ask for collective validation at first; I just started it, and I realize now that it would have been nice to open a discussion first. Well, sometimes it is hard to weigh the pros and cons of something completely new without being able to evaluate what words may be in the final list. Having two weeks to collect entries suggested by anyone and a simple vote with a top 5 was, I think, easier to manage than your way of doing it. I am not sure. Well, if someone wants to discuss this idea next year, in October maybe, I would be glad to help with more feedback on our experiment and the media responses.

Noé 13:20, 19 December 2024 (UTC)

WT:TRANS


I don't understand what kind of situations this sentence refers to: "If there are multiple paraphrases in the target language for an English term but no direct translations, one such paraphrase may be provided after {{no equivalent translation}}." Template:no equivalent translation/documentation isn't helpful either. What is "potentially unidiomatic / sum-of-parts descriptive" supposed to mean? I want to know when this template should be used and when it shouldn't.

More generally, I am often in doubt about what to do when the most direct translation (with the same part of speech) isn't actually the best translation. I know I'm not the only one with this issue, because such tricky translations are currently most often left blank. Can we clarify the relevant sections? —Caoimhin ceallach (talk) 10:08, 20 December 2024 (UTC)Reply

Dobrujan Tatar language name


Hi there, the Dobrujan Tatar language doesn't have a separate language code. Therefore [crh-ro] is used in Wikis. But in Wiktionary there is only [crh] Crimean Tatar, and when I add a word in Dobrujan Tatar it appears in Crimean Tatar categories. This is a problem, because the languages use different orthographies and are actually not as closely connected as it seems. Also, there is Category:Dobrujan Crimean Tatar, but this naming is wrong; it's Dobrujan Tatar. Would it be possible to use the code [crh-ro] Dobrujan Tatar, with Dobrujan Tatar categories? Zolgoyo (talk) 10:34, 20 December 2024 (UTC)

Hello! I personally do not think we need a separate name space for Dobruja Tatar specifically, for the following reasons:

1. It is a dialect of Crimean Tatar (as Ethnologue and Glottolog[3] assume).

2. From what I have seen, Wiktionary does not give dialects their own name space. To give examples from Turkic languages, we have:

Yenisei Kyrgyz (Old Turkic), which uses different letters altogether and would be illegible to someone familiar with the Orkhon script. Orthographical differences are not a big deal for inclusion.
Viryal and Anatri Chuvash, represented as just 'Chuvash' (except in etymologies).
Various dialects of Turkish and Azerbaijani, all shown with a lb tag.
Kumandy, Kuu-Kizhi and Kyzyl dialects (which can be quite divergent at times) of Northern Altai, which are under the same name space.

and so on... Dobrujan Tatar would be best shown with just a lb tag, like the rest.

3. There seems to be only one published dictionary[4] for this dialect ('Dobruca Kırım Tatar Ağzı Sözlüğü'), and the vocabulary is clearly reminiscent of the main Crimean Tatar one.

4. If Wiktionary added Dobrujan Tatar, then why shouldn't it add Nogai Tatar also? Spoken 10 km. north of the Dobrujan Tatar speakers with a far divergent lexicon?

However, this is my opinion. We really don't need this name space.

AmaçsızBirKişi (talk) 00:05, 22 December 2024 (UTC)

We have Nogai as a distinct code, CAT:Nogai lemmas. And we do often have a separate code for varieties traditionally considered 'dialects', if this is found necessary to effectively document the variety. I can't speak for this specific case though. Thadh (talk) 01:03, 22 December 2024 (UTC)Reply

These Dobrujan Tatar words are from a book which is probably not so good for etymology. It's not a bad book, but be careful. Check it out at Tomriga, in the references to Taner Murat. He writes "Tomri - queen of Mesagetes, also known under Persian form Tahm-Rayis, Greek Tomiris.... from her name came the name for Dobruja province, Tomriga". The guy is obviously a Turkic nationalist from a parallel world where the Massagetes speak Tatar and establish Dobruja. Tollef Salemann (talk) 01:39, 22 December 2024 (UTC)

It seems like that. The dictionary there is not quite academic I figured.

Moreover, it seems like Dobrujan Tatar is just a descendant of a (relatively) larger 'Romanian Tatar' family[5]. This article also says how similar the Dobrujan Tatar dialect is to Crimean Tatar, saying how children use Crimean Tatar primers/reading books in schools.

There's also a poem in Dobrujan Tatar; that's the extent of what I could find about this language[6].

AmaçsızBirKişi (talk) 09:32, 22 December 2024 (UTC)

Note that the Nogai Tatar you speak about is probably quite different from the Nogai lemmas listed in the category which Thadh speaks about. The Nogais of Dobruja are related to the Nogais of the Caucasus, but they split up in the 1850s-60s because of the war with Russia. I mean, they had split even before that, but had some contact until the 1850s. So their language is probably closer to the Crimean one. Tollef Salemann (talk) 02:00, 22 December 2024 (UTC)

Limited prior discussion: Wiktionary:Grease_pit/2023/November#Add_Category:Dobrujan_Tatar_language_to_the_relevant_language-related_modules_if_appropriate. Unfortunately, other than your own writings on other wikis, I am having a hard time finding evidence that Dobrujan Tatar is a separate language. I am trying to think if we have any editors who might be able to find relevant resources in other languages (Romanian?). - -sche (discuss) 06:12, 23 December 2024 (UTC)Reply

WT:TENNIS


It seems curious that tennis player, the archetype of the "Tennis player test", supposedly a test of idiomaticity, is itself listed not on its own merits, but only as a "translation hub". Does this make sense? Mihia (talk) 18:38, 22 December 2024 (UTC)Reply

Is WT:Idiom supposed to be a policy page? DCDuring (talk) 23:38, 22 December 2024 (UTC)Reply

No. Svārtava (t ɕ) 09:11, 23 December 2024 (UTC)Reply

It does, to be fair, say at the top of the page that "Tests can be used as guides during RFD, but they are not hard/fast rules", but, even so, one would expect the guidelines to at least apply to the examples given. Mihia (talk) 09:47, 23 December 2024 (UTC)Reply

The closing statement from the 2016 RFD is quite interesting. I wonder if there's more to the history of the ‘tennis player test,’ because this alone makes it pretty questionable. Seems THUB was the keep reason all along?

RFD kept as no consensus for deletion: ≥ 12 keep votes. Note that translation target was used often as the keeping rationale, while the "tennis player test" was rejected by multiple participants. Polomo47 (talk) 00:08, 23 December 2024 (UTC)Reply

The text was added by @Catonif, though I'm not sure why. I personally strongly support WT:TENNIS for its usefulness. AG202 (talk) 06:18, 23 December 2024 (UTC)Reply

The usefulness is limited by our incomplete coverage of names of professions. Where are emergency services dispatcher,[7] franchise opening trainer[8] and heavy equipment operator[9]? --Lambiam 22:17, 23 December 2024 (UTC)

Whoops, I wasn't aware of the policy when I did that; given the policy's existence, it would need to be removed. But IMO the policy itself sounds pretty dubious: by its wording it would also allow professions such as turtle feeder or cookie taster. I would personally ditch the policy and keep tennis player for THUB. The test's paragraph itself claims its partial redundancy to THUB anyways. Catonif (talk) 08:51, 23 December 2024 (UTC)

@Catonif: For the record, unvoted tests given at WT:IDIOM are not binding policy; only WT:COALMINE is since that is voted upon. Svārtava (t ɕ) 09:11, 23 December 2024 (UTC)Reply

It wouldn't hurt to make the distinction clear, preferably by having WT:COALMINE on a separate page, ONLY including it by reference, and placing a banner at the top of the WT:IDIOM page. DCDuring (talk) 13:30, 23 December 2024 (UTC)Reply

Noting that "COALMINE" is mentioned individually in the CFI. So is the "fried egg" test, which is also part of "WT:IDIOM", implying that that one is policy too, I suppose? I haven't checked all the others. Mihia (talk) 15:36, 23 December 2024 (UTC)Reply

I’m not that favorable to the test either. If almost all terms that qualify for it also qualify for THUB, then all it does is prevent us from adding (This sense is a translation hub). Is that desirable? I don’t think so, since the main reason for keeping them appears to be translation. Polomo47 (talk) 15:08, 23 December 2024 (UTC)Reply

I could be wrong, but I think the Tennis player test predates a consensus on keeping translation hubs. So it may have been a good workaround when it was first proposed, but it seems redundant now. Andrew Sheedy (talk) 16:06, 23 December 2024 (UTC)Reply

Yeah, exactly. Polomo47 (talk) 17:12, 23 December 2024 (UTC)Reply

Romance languages: reflexive verb forms and enclisis


This discussion is an offshoot from this RFM, which discusses reflexive verbs in Portuguese specifically. Said RFM in turn derives from this RFD discussion.

Currently, some Romance languages have a specific way of making entries for reflexive verbs; others do not have a pattern at all. Per @Benwing2, Spanish and Portuguese currently follow this scheme:

If a verb is only used reflexively
- It is listed at the page with an enclitical -se. See Portuguese automedicar-se and Spanish automedicarse
- The page without -se lists, for Spanish, that the word is only used with a proclitic pronoun; see Spanish automedicar. For Portuguese, the page without se usually does not exist.
If a verb has reflexive senses in addition to non-reflexive ones
- The reflexive verb is listed as a sense under the page without -se. See Portuguese suicidar, despedir; Spanish suicidar, despedir.
- The page with -se lists a stub reflexive of, or lists the entry as a combined form. See Portuguese suicidar-se, despedir-se; Spanish suicidarse, despedirse.

Some Portuguese editors complained about this arrangement a while ago. We proposed a new scheme in an RFM (linked above), but some editors felt the need for consistency with other Romance languages. Thus, this is a proposal on changing/standardizing how it works for most other Romance languages — the use of unhyphenated enclisis (despedirse vs. despedir-se) changes things slightly. For languages that do use a hyphen in their enclises, such as Catalan, a proposal closer to the one for Portuguese is more adequate.

The proposal, for languages with unhyphenated encliticals:

If entries exist for both the forms with -se and without it, they will get merged under the page without -se. The entry at the page with -se will list infinitive of verb combined with se.
If an entry exists only at the page with -se, it will be moved to the page without -se. In its place, the page will list infinitive of verb combined with se.
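The two rules above can be sketched as a small planning function. This is only an illustration in Python, not bot code that anyone has written: the function name, the `Action` record, and the action labels are all hypothetical, and the page sets used to try it out are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    page: str    # page the action is performed on
    kind: str    # hypothetical labels: "merge_into", "move_to", "soft_redirect"
    target: str  # page without the clitic


def plan_reflexive_cleanup(base: str, existing_pages: set[str],
                           clitic: str = "se") -> list[Action]:
    """Plan the edits for one verb in a language with unhyphenated enclisis.

    `base` is the bare infinitive (e.g. "suicidar"); the reflexive page is
    assumed to be simply base + clitic (e.g. "suicidarse").
    """
    reflexive = base + clitic
    if reflexive not in existing_pages:
        return []  # nothing to do: there is no page with -se

    actions = []
    if base in existing_pages:
        # Rule 1: both pages exist, so merge the content under the bare page.
        actions.append(Action(reflexive, "merge_into", base))
    else:
        # Rule 2: only the -se page exists, so move it to the bare page.
        actions.append(Action(reflexive, "move_to", base))
    # In both cases the -se page ends up as "infinitive of verb combined with se".
    actions.append(Action(reflexive, "soft_redirect", base))
    return actions
```

Calling it with a set containing both `suicidar` and `suicidarse` gives a merge followed by the stub entry; with a set containing only an `-se` page it gives a move followed by the stub entry.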

A brief list of applicable reasons follows. For more detail, please read the Portuguese RFM and RFD discussions (which also include some inapplicable arguments).

It is inconsistent and confusing to list reflexive-only verbs at the page with -se, but list verbs with reflexive senses only at the page without -se.
Listing reflexive-only verbs at their enclitical forms implicitly prescribes the use of enclisis, but proclisis is just as valid and may even be used more often.
- By having the entry under the page with no -se, we could format its headword to include both forms. Like, automedicar-se or se automedicar
Among dictionaries, there is no consensus on what URL reflexive verbs get put under. The only consensus is that the headword includes the enclitical pronoun, which we can do regardless per the above.

Ping, for Italian: @Samubert96, Federico Falleti, Emanuele6, Catonif, Imetsia

Ping, for Spanish: @Ultimateria, AG202, Ser be etre shi, JeffDoozan, Orrigarmi, Brawlio, Jberkel

Ping remaining members from the Galician-Portuguese usergroup: @Davi6596, Faviola7, JnpoJuwan, MedK1, Ortsacordep, Rodrigo5260, Stríðsdrengur, Trooper57

Please ping other editors you know who may be interested in the discussion.

Polomo47 (talk) 00:01, 23 December 2024 (UTC)

CC: @Benwing2: For Spanish, honestly, I'd match what the RAE does: if the verb is only used pronominally/reflexively, then they put the lemma at the version with -se. Ex: RAE entry for automedicarse. I really don't like the idea of putting "se" in the headword at the lemma without "se", especially when the page with "se" already exists. That seems to add a much higher level of inconsistency.

I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former, as the verb is only used pronominally. What we have now isn't my favorite way to go about things (I'd have the reflexive usages at the entries with -se, regardless of if the non-reflexive version exists), but it's better than having everything at the bare infinitive. There's also precedent, at least with Spanish. AG202 (talk) 03:00, 23 December 2024 (UTC)Reply

I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former. How so? My proposal is that we move the definitions over precisely to solve this type of issue.

Also, while the RAE categorizes URLs in that way, the RGL does not, and many Portuguese dictionaries don’t either. I don’t know about Italian, though. Polomo47 (talk) 03:56, 23 December 2024 (UTC)Reply

@Polomo47: Oops, I meant search for the former and be directed to the latter, sorry. Learners are more likely to search for the forms with "se" is what I wanted to say. AG202 (talk) 06:13, 23 December 2024 (UTC)Reply

Hm, I’m not confident that’s how people usually search for words. I would expect native speakers (even if we don’t particularly appeal to them) as well as more advanced learners to search without the enclitical. That’s what I do, at least — do others google differently? Polomo47 (talk) 15:04, 23 December 2024 (UTC)Reply

At least for Spanish, having been studying it since 2013, I've almost always seen learners search with the "se" form once they're aware of it, as that'll give them more direct hits, especially from learning websites. In pretty much every learner's text as well, they'll be listed as the "se" form in any vocabulary section. I personally still search that way as well. For (notably Brazilian) Portuguese, I'd expect the trends to be different, since the se forms aren't used as much. AG202 (talk) 17:38, 23 December 2024 (UTC)Reply

I actually find this to be very persuasive in Spanish's case. Having had some mild interactions with it over the years, I can say it's very true that Spanish speakers just love their "se" forms; comparatively, "lo" forms go essentially unused by Portuguese speakers around me.

While I'm really starting to think that 'it tracks' that reflexive clitics in Spanish are seen as more integral to the verb — and not necessarily because of the spelling — I can't help but wonder about other forms.

I hope I'm not bringing this up too early when we haven't even truly talked at length about the initial proposal, but do we really need pages for all the forms? This likely enters CFI territory, but I'd like to draw some attention to the non-reflexive forms. In Spanish medicar and mostrar, there's an entire table dedicated to combined forms, and yet I see several that might be missing?

Admittedly, I don't know a lot about Spanish, but one such form would be "medícote" (corresponding to "te medico" in proclisis) or something like "mostrárlela". Perhaps Spanish's rules forbid these pairings (tho I did get a hit for the latter), but Standard Galician's don't; you'll find many hits for, say, quérote and mostrarlla online. There's even a TV program named Dígocho Eu.

I guess we could include all of these combinations (every single tense of many many verbs with nearly every single clitic tacked on afterward — me, te, che, vos, os, o, ma, mo, ta, to, cho, cha, lle, lles, nos, lla, llo, possibly a couple more), but I can't help but think it'd be a more productive use of our time to instead draw a line somewhere.. I'm getting some serious COALMINE conversation flashbacks right now. MedK1 (talk) 19:13, 23 December 2024 (UTC)Reply

@AG202 just in case the mobile reply button didn't actually ping you. MedK1 (talk) 21:28, 24 December 2024 (UTC)Reply

Sorry for the late reply, but yes, the "se" forms are integral to the verbs. However, forms like "medícote" are no longer standard usage in Spanish. Pronouns can only be attached afterwards to the gerund, infinitive, and imperative forms. AG202 (talk) 03:30, 29 December 2024 (UTC)Reply

Thanks for the CC.

It occurs to me there are various possibilities for the way reflexives are handled, and this may have some bearing on the ultimate outcome (please expand with other languages):

Reflexives are always enclitic, and written as part of the verb. Examples: East Slavic (Russian, Ukrainian, Belarusian, ...) and North Germanic (Icelandic, Swedish, Danish, Norwegian, Faroese, ...).
Reflexives are normally proclitic, including in particular on the infinitive, and written as a separate word. Examples: German, French, apparently also Romanian. (Clarifications: German reflexives sometimes come after the finite verb, particularly when the verb is in V2 constructions and in imperatives. French reflexives come after imperatives and are joined by a hyphen, and when coming before the verb are joined with an apostrophe if the verb is vowel-initial.)
Reflexives are sometimes proclitic, sometimes enclitic. AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive.
1. When enclitic on the infinitive, the verb + reflexive is written as a single word. Examples: Spanish, Italian, Galician in standard spelling.
2. When enclitic on the infinitive, the reflexive is attached to the verb with a hyphen. Examples: Portuguese, Galician in reintegrationist spelling.
3. When enclitic on the infinitive, the reflexive is written as a separate word. Examples: West Slavic languages (Czech, Polish, ...), South Slavic languages (Bulgarian, Macedonian, ...).

I mention this because there is a lot of inconsistency in how reflexive verbs are lemmatized, and it may partially correlate with the way the reflexive infinitive is written.

Benwing2 (talk) 03:38, 23 December 2024 (UTC)

AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive. Is that really how it works? In the case of Portuguese, from what I gather automedicar-se is no more valid an infinitive than se automedicar — the former is just the preferred form used by dictionaries because (1) it’s a single word (2) it’s less predictable than proclisis (3) it’s something people generally like to prescribe, lol. I’ve yet to find another explanation for the preference for enclisis, but I have no reason to believe it’s because automedicar-se is the only possibility. Polomo47 (talk) 04:06, 23 December 2024 (UTC)Reply

Sorry, I meant to clarify that "all such languages have the reflexive pronoun enclitic on the infinitive" refers to how dictionaries express the forms. I know that Brazilian Portuguese, for example, leans towards proclisis in all cases and thus says vou me deitar, not #vou deitar-me. West Slavic languages similarly are very flexible in word order and sometimes have the reflexive pronoun before the infinitive and sometimes after, but all dictionaries I've seen lemmatize the reflexive pronoun after. In contrast, French dictionaries always list reflexive infinitives with the reflexive pronoun before, because it never comes after in actual usage. Benwing2 (talk) 05:15, 23 December 2024 (UTC)Reply

I mentioned above that Galician has a hundred forms (bare minimum) that Spanish completely lacks coverage for at the moment (potentially because they don't exist over there? I wouldn't know); you can mix and match any tense with any clitic for the most part.

It might be worth noting that for Portuguese, these countless forms exist as well, and often with more patterns: Galician roughly shares the European Portuguese rules prioritizing enclisis, while for Portuguese, we have Brazilian Portuguese's preference for proclisis to consider as well.

Since for Portuguese, they're framed as 'regional preferences' rather than the rules actively changing, you get far more possibilities than you would normally, all of them being SOP — you have either a separate word before, a separated suffix or a separated infix according to tense.

With Brazil liking proclises, the lemmatized enclitical can end up being quite rare in comparison to the proclitical ones. "Precisamos parar de automedicar-nos" even sounds weird in comparison to nos automedicar to me. You can have similar sentences for -te and others too. Do note that these are all considered impersonal infinitives (i.e. the ones that get lemmatized in Wiktionary).

For these and many, many other reasons, it stands to reason that one shouldn't include any of those clitical forms as separate pages, for Portuguese at least. This doesn't necessarily mean anything for Spanish; more and more I'm thinking their systems are different beasts altogether and as such should be treated differently.

PS: Priberam at least does express proclitical forms for verbs. MedK1 (talk) 00:06, 24 December 2024 (UTC)Reply

I'll also add that PT-PT dictionaries seem to prefer the -se forms: entry for "arrepender-se" at O Dicionário da Língua Portuguesa & entry for "arrepender-se" at Infopédia.pt. AG202 (talk) 03:33, 29 December 2024 (UTC)Reply

Yiddish in Latin characters


Please lift the ban on including Yiddish terms attested in Latin characters. I know that writing it with other scripts is uncommon (except to assist beginners), but there are a few lengthy Yiddish works written mostly or entirely in the Latin script. Examples:

https://books.google.com/books?id=o_P6DQAAQBAJ

https://books.google.com/books?id=nrCYDwAAQBAJ

There are probably even more. And in Cyrillic as well, I guess? Also, I remember owning "Di Avantures fun Alis in Vunderland", which has both Hebrew and Latin script in the same book. Tollef Salemann (talk) 02:20, 23 December 2024 (UTC)

The Brill article confirms that Cyrillic is another script, yes, but it is the rarest of the three (and the only other script in which Yiddish is attested, as far as I'm aware). I welcome lifting the prohibition on that as well. Anyway, cheers for suggesting another source. (((Romanophile))) ♞ (contributions) 04:06, 23 December 2024 (UTC)Reply

Does Wiktionary's transliteration of Yiddish terms match the spelling used in these books? I tried searching for some random words from the "Di Avantures fun Alis in Vunderland" preview sample and successfully found the relevant Yiddish entries on Wiktionary. Is this not good enough for the end users? --Ssvb (talk) 05:16, 23 December 2024 (UTC)

Usually they do match, though historically Romanizations of Yiddish have varied in form and consistency. In any case, utility is not the motive here. We already have Romanization entries for Chinese, Japanese, and Serbo-Croatian. I doubt that a proposal to delete them would succeed on grounds that there are already transliterations in the main entries, thereby making the Romanization entries 'redundant'. (((Romanophile))) ♞ (contributions) 06:31, 23 December 2024 (UTC)Reply

I wouldn't advocate deleting them. Just creating additional Latin script entries and keeping them in sync with the Hebrew script entries is an extra maintenance effort. If contributors are ready to spend their time and efforts on that, then it's fine. If attestable Latin spelling of some terms encountered in real books differs from the transliteration of their corresponding Hebrew script entries, then these can be probably prioritized. --Ssvb (talk) 09:11, 23 December 2024 (UTC)Reply

Yes. This has been requested at least twice in the past year, once by me at Wiktionary:Beer_parlour/2024/April#Latin-script_Yiddish and once after that by someone else somewhere else... but although there seems to be support for at least allowing Latin-script entries to point to the Hebrew-script entries, like is done for Arabic-script Afrikaans (pointing to Latin-script Afrikaans) (or, in a different vein, for Latin-script Gothic), neither I nor anyone else has gotten around to it yet. Well: unless there are objections, I will finally add "Latn" as another script to yi in, say, a week (ping me if I forget), with the understanding that Hebrew script will continue to be lemmatized at least in most cases. - -sche (discuss) 06:28, 23 December 2024 (UTC)Reply

I personally favor a treatment like Serbo-Croatian where both scripts are lemmatic, but I won't feel devastated if we treat the Latin script as secondary to the Hebrew one either. You may want to include the Cyrillic script as another option, too, though I don't have examples on hand. (Yiddish's cousin Ladino is more my field of expertise. Or should I say Spanish Yiddish?) (((Romanophile))) ♞ (contributions) 06:42, 23 December 2024 (UTC)Reply

For Japanese we have such entries as

jiyaku — Rōmaji transcription of じやく

Is there a reason not to use a similar approach for Yiddish? --Lambiam 21:15, 23 December 2024 (UTC)Reply

My understanding is that this is the intention, yes; in the April discussion, Benwing proposed using {{spelling of}}, which would look like this. @Romanophile, if at some point in the future we have the ability to lemmatize two different scripts/spellings without them falling out of sync (e.g. via them both "transcluding", with "smart" changes, some underlying central backend page), I would support "double-lemmatizing" a great many things, but for now it would just lead to duplication. - -sche (discuss) 16:52, 26 December 2024 (UTC)Reply

If it's just a stripped down soft redirect entry, then the required maintenance effort is low. BTW, does it need a declension table? And what would be the right place for book quotations in Latin script? I'm interested in this topic, because many of the same guidelines would probably also apply to Belarusian Łacinka, like the horny entry. --Ssvb (talk) 17:33, 26 December 2024 (UTC)Reply

As Yiddish has been a contemporary of Early New High German, there needed to be Yiddish text in blackletter, and certain Germanists on the continent regularly deal with these Early Modern equivalents, but from the perspective of Anglos it is a suppressed blind spot: fractura est, non legitur. We have to cover Yiddish in Latin script like we include Hebrew spellings of Arabic language as Judeo-Arabic. The current Hebrew-written standard is just a later Ausbausprache like Luxembourgish, but unlike Luxembourgish, which is within the ballpark of another broader dialect (Category:Central Franconian language), Yiddish, due to ethnic and cultural separation, always was a distinct dialect, though the Middle High German beginnings are difficult to oversee, of course. So I don’t see how it was ever banned, only a skewed perspective; more parsimoniously one may observe an oversight in the language data, which until now only lists Hebrew script for Yiddish, factually wrong. A few times I also added Serbo-Croatian terms in Arabic script only to be annoying, without any preference for it and without believing it to be prohibited, only that rendering is faster if we only check Latin and Cyrillic script. Fay Freak (talk) 17:16, 26 December 2024 (UTC)Reply

Did you add "Latn" as another script to yi yet? (((Romanophile))) ♞ (contributions) 18:04, 4 January 2025 (UTC)Reply

Thanks for the reminder;

Done. - -sche (discuss) 06:24, 6 January 2025 (UTC)

Dutch defective verbs


(Notifying Mnemosientje, Lingo Bingo Dingo, Azertus, Alexis Jazz, DrJos): I am working on an update of the Dutch verb conjugation module, and in that I came across the issue of how to handle defective verbs. These are verbs that act like they have a separable part, but are (generally) not actually separable.

I usually use woordenlijst.org for checking Dutch conjugation, and it seems to distinguish two types of defective verbs. The first is verbs like herinvoeren, for which the subordinate clause form is given, but the main clause form is omitted. The second is verbs like zakkenrollen, for which only the infinitive and present participle are given. However, searching online, it seems that in actual usage, the second type is used exactly like the first type (i.e., forms like zakkenrolt and zakkenrolde are attestable). I added the option to specify these types of verbs through a parameter |subonly= (see the bottom of the page at User:Stujul/test-nl-conj).
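To make the idea behind |subonly= concrete, here is a minimal sketch in Python. It is not the module code: the slot names, the `_main` suffix convention and the function name are assumptions made for illustration, and the main-clause value in the sample table is a placeholder that exists only to be suppressed.

```python
def apply_subonly(forms: dict[str, str], subonly: bool) -> dict[str, str]:
    """Return a copy of a conjugation table, blanking main-clause slots.

    Slots whose (hypothetical) name ends in "_main" hold main-clause forms;
    everything else (infinitive, participles, subordinate-clause forms) is kept.
    """
    if not subonly:
        return dict(forms)
    return {
        slot: ("" if slot.endswith("_main") else form)
        for slot, form in forms.items()
    }


# Sample table for herinvoeren; the "_main" value is a placeholder only.
herinvoeren = {
    "infinitive": "herinvoeren",
    "pres_1sg_sub": "herinvoer",
    "pres_1sg_main": "PLACEHOLDER",
}
table = apply_subonly(herinvoeren, subonly=True)
```

With `subonly=True` the table keeps the infinitive and the subordinate-clause form and leaves the main-clause slot empty, which matches the first type of defective verb described above.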

My main question is about how to categorise these verbs. Currently there are two categories for these verbs: Cat:Dutch defective verbs and Cat:Dutch uninflected verbs. The first is added manually and the second is added by a parameter in the headword template {{nl-verb}}. These should definitely be merged. But should the two types of defective verb I mentioned be categorised separately as different subcategories, because the forms of the second one are nonstandard?

I hope to hear your opinions on this.

PS - sorry if this is not the appropriate place for this discussion.

Stujul (talk) 13:36, 23 December 2024 (UTC)

If forms of zakkenrollen are missing, might it be the woordenlijst that is defective? In the conjugation table on the Dutch Wiktionary all seem to be present, although the subjunctive currently seems unattestable. Here, for example, is a use of gezakkenrold, and here of finite zakkenrollen in a main clause. Is it not just like stofzuigen (not only semantically, but also grammatically)? --Lambiam 21:07, 23 December 2024 (UTC)Reply

Maybe zakkenrollen was a bad example. It seems indeed to be used more like stofzuigen. This may have to do with the fact that rollen is a weak verb. For example geboogschiet and gelipleest return far fewer results than respectively booggeschoten and lipgelezen. About the Dutch Wiktionary's approach: I found a list of such verbs and most are listed as fully defective there. liplezen gives the main clause forms in parentheses, and on the main page gives a note that these forms appear sporadically. I also note that some verbs that you may expect to fall into this category are actually given as complete verbs on woordenlijst.org, e.g. hartenjagen.

It may just come down to a case-by-case analysis, but it would be nice to have a standard approach when dealing with such verbs, as we are currently very inconsistent about it.

Stujul (talk) 10:12, 24 December 2024 (UTC)

Gelipleest is orthographically wrong anyway; /ɣəˈlɪp.leːst/ should be written as gelipleesd. But liplezen is one of the entries on this list of defective verbs.

We are not prescriptive; shouldn’t three properly attestable uses of forms like gelipleesd or lipgelezen trump any lists and suffice for including these forms (with a note warning that they are not generally accepted)? Here are two uses “in the wild” of lipleesde: [10], [11]. --Lambiam 11:26, 24 December 2024 (UTC)Reply

Sure, we are not prescriptive, and three attestable uses do merit an entry for these forms, I don't disagree with that. But I'm not sure whether we should include these forms in the conjugation table on the lemma entry. You can find many "in the wild" uses of "ik leesde", but we don't include that form in the table at lezen. Of course, in that case, there is a clear "correct" and "incorrect" form, while for liplezen, there isn't a "correct"/"standard" form we can point to (should it be lipleesde, liplas, las lip,...).

Stujul (talk) 11:57, 24 December 2024 (UTC)

I see that the Dutch Wiktionary happily presents the unsplit conjugated form ik herindeel and the split form ik breng heruit. Both feel wrong to me; are these acceptable? --Lambiam 21:58, 23 December 2024 (UTC)

The Dutch Wiktionary is again inconsistent in this regard: indeed heruitbrengen is conjugated as a normal separable verb, herinvoeren gives an alternative construction "ik voer opnieuw in", and heruitzenden just leaves the main clause forms empty.

Both these forms that you gave also feel wrong to me.

Stujul (talk) 10:22, 24 December 2024 (UTC)

I'm amazed that I was completely unaware that these kind of verbs existed. Thinking about it I would indeed categorise them as defective, as the woordenlijst does. If you put a gun to my head I might indeed say "ik zakkenrol" or "ik herindeel", like other speakers, but they still don't feel quite right. My intuition is that these forms which can be sporadically attested are ad-hoc formations. Some standard strategy to deal with these in the language may crystalize at some point, but the fact that everyone feels unsure about them shows that it hasn't yet. —Caoimhin ceallach (talk) 18:02, 26 December 2024 (UTC)Reply

Extended Mover Request: User:AG202


Hi, I'd like to request extended mover rights, mainly to be able to fix issues like tones in entry titles where they're not supposed to be, such as with Igbo ákpị̀, per WT:About Igbo AG202 (talk) 18:21, 23 December 2024 (UTC)Reply

@AG202 Done. Benwing2 (talk) 21:35, 23 December 2024 (UTC)Reply

Thank you!!! AG202 (talk) 23:22, 23 December 2024 (UTC)Reply

@Benwing2: For the record, the process is WT:WL, see WT:Extended movers. Svārtava (t ɕ) 04:46, 24 December 2024 (UTC)Reply

Username pronunciations


Hello,

There is a new subpage for username pronunciations called User:Flame, not lame/Username pronunciations.

Thank you Flame, not lame (Don't talk to me.) 19:54, 25 December 2024 (UTC)

Love the page! Polomo47 (talk) 17:08, 29 December 2024 (UTC)

jive talk

We should categorise jive talk, like frolic pad; there's probs some good stuff in this website. P. Sovjunk (talk) 23:37, 26 December 2024 (UTC)

Hebrew transliteration

I'm probably not the first person to ask this, and I likely won't be the last, but what is the reason for Wiktionary to use conventional Israeli romanization (i.e. based on colloquial Israeli Jewish pronunciation) over something narrower and more scholarly like ISO 259? Narrower transliterations have a lot of bells and whistles, sure, but I think they still do a good job at being a compromise between various historical, regional and cultural variants of Hebrew. Why should ⟨צ⟩ be written as "ts" when that's not how Yemenite or Sephardic Jews pronounce it, or how it was historically pronounced during Biblical and Classical times? Why should ⟨ח⟩ and non-geminated ⟨כ⟩ both be rendered as ⟨kh⟩ when this merger pretty much only happens in Israeli Hebrew, while every other dialect still distinguishes the two? Why should ⟨א⟩ and ⟨ע⟩ not be rendered at all when, even inside Israel, some Jews do pronounce them? Even if Israeli Hebrew is the de facto standard dialect these days, the common transliteration isn't even the de jure standard; that would be the Hebrew Academy's, which is slightly different. I understand Hebrew is a living language, but if you're like me, a non-Jewish non-Israeli who has a mostly academic historical linguistic interest in Hebrew, the modern Israeli transliteration is just not very useful. Sure, it's more "phonetically accurate" (as discussed, for a single dialect anyway), but isn't that what the IPA section is for?

Obviously we'd have to agree on the details of the transliteration, and I have my opinions on the specifics, but overall, I think a narrower transliteration would make much more sense. It would also likely allow us to begin some sort of automatic transliteration template that languages like Russian, Arabic and Greek have got going on. Pescavelho (talk) 15:55, 27 December 2024 (UTC)Reply

No good reason, sure, only catering to cognitive biases of majorities. The thought of continuing to use your English keyboard without any acquired extra characters is just too appealing.

In recent months, I have increasingly succeeded to see through the grievances of the world as being the consequences of neurotypicals splitting up the world, they ever imagine, into social relations: what is relevant in the present context (see it again!), for this reason, is that they fail to imagine capable keyboard layouts or input methods, and rather configure six different keyboard layouts if they know French, Spanish, Romanian, Turkish and German, for instance, in addition to English, rather than to use the international version of any of these layouts, or a Unicode search made accessible on their machine for the very occasional but recurring goal of transcribing certain foreign phonemes faithfully.

Engaging the habit learning circuitry of the brain to switch to a more convenient, even if less intuitive (according to neurotypical cognitive biases), input setup would be easy though: it is just excusable, not defensible, not to switch to us(intl) or de(deadtilde) from us(basic) (in /usr/share/X11/xkb/symbols/), and many neurotypicals editing this dictionary or similar academic works already succumbed to this which is reasonable. I also use the actual Russian layout, with extensions, ru(prxn), for all Cyrillic languages, when my neurotypical bro is ticked off by it because its assignments do not phonetically correspond to the ones on the standard German layout—all being invented by someone around 1900 and hence carried forward, few ever questioning it, the social pressure to type the same layout with “ten fingers” is too high.

One just has to look up which combination can be utilized to get bonus characters, and repeat until one does not need to expend notable brainpower for it. Juggling multiple languages to maintain polyglotism is a context where one needs bonus characters, like it or not (everyone shall like it, following the adapt neuroscientific recipe). Fay Freak (talk) 16:33, 27 December 2024 (UTC)Reply

Is the point here that it's "too cumbersome to type"? That feels subjective; some people would feel that setting up all the templates an average Wiktionary page uses is rather cumbersome (I've certainly felt so at times). In any case, I'm hoping the adoption of a narrower transliteration would go hand-in-hand with automated transliteration, so this concern would be null and void. Pescavelho (talk) 21:38, 27 December 2024 (UTC)

@Fay Freak Talking about /usr/share/X11/xkb/symbols/, I've basically written my own keyboard layouts for a bunch of scripts, with the general idea of a ± correspondence to Azerty (rewriting them for Qwerty would be trivial). I've also added diacritics and IPA symbols to my Latin keyboard. Exarchus (talk) 20:19, 4 January 2025 (UTC)

@Exarchus: From 2020, Red Hat developer Peter Hutterer enabled the coming decades to have custom layouts right out of the X Input System. I have not tried it. I already just had designed mine with everyone in mind and got them merged; and after weeks of tingling in 2017 I was like: snap, computers are above my paygrade, yet I had to study law. So, after pushing the Ugaritic layout, I still have, by reason that I abhorred to change the merge request, Old South Arabian and Nabataean layouts lying around on Github. Later someone branched the Ugaritic out into a separate file ancient, I see just now, so they still are up to be added there if someone fielding these languages tests them and considers them satisfying – I came to wonder if I had to decide about the designs of keyboard layouts alone, without feedback of people who would use them, since man sorely needed to be autistic for that feat. Indeed for that project I already license them, in case nothing ever happens, though there wasn’t much originality in superimposing other Semitic alphabets on the preexisting Arabic one (which I also designed in its current version so everyone on Linux and BSD got BiDi control characters not only theoretically and all). Somebody is using all this stuff, I see, if only because someone added a QWERTY version to the QWERTZ IPA layout I created in the file trans (/z/ more common than /y/ across languages), but I have no statistics whatsoever due to all the freedom and no tracking on the free desktops. 😂 Fay Freak (talk) 22:08, 4 January 2025 (UTC)Reply

What I find very useful, definitely for unicameral scripts, is to use Caps as "ISO_Level3_Latch". Exarchus (talk) 23:26, 4 January 2025 (UTC)

I'm also not very happy about the transliteration situation for Hebrew. I don't edit it enough to have much sway in that sphere, but I would like to see a transliteration system that is actually transliteration and not transcription of a certain dialect that I am only marginally interested in. Andrew Sheedy (talk) 22:30, 27 December 2024 (UTC)

(Lurker/new Hebrew editor. I've read some of the past discussions on this topic.) I would prefer to see both Israeli and Biblical/liturgical/scholarly transcriptions next to each other (except contexts where one of them is irrelevant, of course), ideally (somewhat) automated by a module. This would satisfy both main Hebrew user bases. It's my understanding that a lot of work has already been done on automatic transliteration; it's about time it should be deployed, so we can iterate and check edge cases. I appreciate those still adding (inconsistent) manual scholarly transliterations in 2024, but think it may be useless once we apply the module. Contra the above replies (and I don't understand/ignore/tldr whatever the fuck fay freak wrote), I am satisfied with the gist of the status quo Israeli transliteration system, and generally am not convinced that one-to-one reversibility is a major virtue (compared to, say, readability and not being laden with diacritics); but I'll sooner take any reasonable automated module finally being made widespread over continued bikeshedding of the exact romanization scheme. Hftf (talk) 11:06, 28 December 2024 (UTC)Reply

@Pescavelho I agree with you, but Neo-Hebrew editors will never agree. They have no understanding for the perspective and needs of people like you and me, who are only interested in Hebrew from a historical point of view. Unfortunately, in my experience, they are incredibly biased and obtuse. Here is a discussion we had in the past:

Hebrew transliteration – time to clear the mess

— Sartma 【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】 19:42, 28 December 2024 (UTC)

Seems like the main argument is that, if you are someone who is only interested in modern Hebrew, then a narrow transliteration isn't helpful. OK?... What if one isn't just interested in modern Hebrew? Then the modern Hebrew transcription is probably less than useless. I feel like there's a bigger case for having only a narrow transliteration over a conventional transcription, given that modern Hebrew mostly experienced mergers rather than splits compared to Tiberian Hebrew (which is the de facto standard Hebrew orthography), so you can just ignore half of the diacritics and you're basically left with modern Hebrew, but there'd be nothing wrong with having both systems side-by-side. And again, pronunciation is what the IPA section is there for.

Personally, ISO 259, with a few modifications, would be my go-to system. I am willing to provide reasoning for each of the modifications in question, and, if we decide to go ahead with the transliteration system, it will be these modifications we'll spend the most time arguing about (the biggest issue will undoubtedly be the vowels). Pescavelho (talk) 15:45, 29 December 2024 (UTC)
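The point about mergers made above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Latin symbols chosen for the narrow side (ḥ, ḵ, ṣ, q, ʾ, ʿ) and the function name `to_conventional` are hypothetical, the table covers only the consonant mergers named in this thread (vowels are left untouched), and it is neither ISO 259 nor any existing Wiktionary module.

```python
import unicodedata

# Hypothetical symbol choices, for illustration only (not ISO 259, not a
# Wiktionary module). Only the mergers mentioned in this thread are listed.
_RAW_MERGERS = {
    "ḥ": "kh",  # ח
    "ḵ": "kh",  # non-geminated כ, merged with ח in Israeli Hebrew
    "ṣ": "ts",  # צ
    "q": "k",   # ק
    "ʾ": "",    # א, not rendered in the conventional romanization
    "ʿ": "",    # ע, not rendered in the conventional romanization
}

# Normalize the keys so that precomposed and decomposed input both match.
MERGERS = {unicodedata.normalize("NFC", k): v for k, v in _RAW_MERGERS.items()}


def to_conventional(narrow: str) -> str:
    """Collapse a narrow transliteration by applying the mergers above."""
    text = unicodedata.normalize("NFC", narrow)
    # Characters without an entry in the table are passed through unchanged.
    return "".join(MERGERS.get(ch, ch) for ch in text)
```

Because two narrow symbols map to the same output ("kh") and two map to nothing, the table is many-to-one: going from the narrow form to the conventional form is mechanical, while the reverse direction cannot be recovered from the conventional form alone.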

Wiktionary's transliteration of Hebrew has been discussed (and disputed) a lot over the years (search the archives of this page for various discussions). One idea which seems to me to have been gaining support is, as mentioned above, to have two transliterations, one scholarly and oriented to representing the distinctions of Hebrew script / Biblical Hebrew, beside the current one that is oriented to representing the modern (Israeli) Hebrew pronunciation. A two-translit approach would also help with certain other languages where some people want a transliteration that reproduces the distinctions of the original script, and other people want a transliteration that hints at the pronunciation in the manner of a simplified version of enPR or IPA. (The second group thinks of the first group: if you want to know the distinctions of the original script, why not just learn the original script? The first group thinks of the second: if you want a pronunciation, why not provide a pronunciation, rather than putting an ambiguous respelling in the transliteration parameter?) Having seen how consistently scholarly/"Biblical" transliteration is something people want, I support adding it. - -sche (discuss) 17:21, 29 December 2024 (UTC)Reply

Not an editor of Hebrew, but someone with casual interest in the topic/who reads Hebrew entries - I ultimately agree with those who mentioned multiple romanisations/transliterations being given and I would also agree that it would be best for this process to be automated.

There remains the question perhaps of 'which should be the default' if there is a toggle in whatever template is being worked on, though with only two romanisations it seems likely that there is no need for such a toggle.

I could see there being strong feelings and arguments going both ways if a default must be chosen - my preference is for maximum reversibility, but as is evident given the chosen default for Korean, this may not be shared by the majority. (Cf. the chosen scheme for Arabic romanisation.) Herthaz (talk) 20:52, 7 January 2025 (UTC)

Just one more unofficial vote here for a system that reflects the precise spelling and (therefore, more or less) the Biblical pronunciation. I don't work on Hebrew here, but when I look it up, I want to know how it was around 500 BCE and related to Arabic or Afro-Asiatic, not modern Israeli sieved through Ashkenazic. But this is probably selfish: most people who use this site probably do want Israeli, and those of us with philological interests presumably know enough to work backwards. So not a strong demand or vote, just a voice in favour of using the letter 'q' in words for "kill" because it can't hurt. Hiztegilari (talk) 22:37, 7 January 2025 (UTC)Reply

Honestly, I don't think we have to include the conventional Israeli romanization; knowing how Israelis pronounce a given word is what the IPA section is there for. That being said, I think the system proposed by @Sartma, which is similar to the one used in pages making use of cuneiform, is a good compromise. Pescavelho (talk) 23:29, 7 January 2025 (UTC)

@Pescavelho: I also don't think we have to include conventional Israeli romanisation. We could just follow the example of Modern Greek. See for instance οικογένεια (oikogéneia), transliterated oikogéneia but pronounced ikogénia /i.koˈʝe.ni.a/. That being said, I do understand that someone mainly interested in Neo-Hebrew would prefer a romanisation to a transliteration. I repeatedly tried to propose splitting Classical Hebrew from Neo-Hebrew, since in my view it's the only thing that would make Hebrew entries so much neater and less cluttered, but the majority here seems to abhor the idea. If we split the two languages, we could give transcriptions for Classical Hebrew and normalisations for Neo-Hebrew, plus numerous other improvements — from less cluttered headword lines (no need to give alternative forms), to more relevant references, &c.

As for the transliteration system, I obviously have a preference for one of my own systems, but I'd be happy with whatever as long as it is automatised. For reference, here is a summary of transliteration and romanisation proposals by @Erutuon and me: Hebrew transliteration. — Sartma 【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】 22:01, 11 January 2025 (UTC)

Splitting Classical Hebrew and Modern Hebrew doesn't make much sense because Classical Hebrew and Modern Hebrew are both based on the same written standard (Tiberian Hebrew), just with phonological and grammatical differences. Old Hebrew (i.e. Hebrew written in the Palaeo-Hebrew alphabet) and Samaritan Hebrew make more sense as separate categories, however, since they use different scripts.

For the record I also think that having Arabic dialects separate from each other doesn't make much sense either (I was wondering if there was any (re?)ignition of that discourse ever since North and South Levantine Arabic were merged in ISO).

I also have my own system for transliterating Hebrew, I'd be willing to post when/where appropriate. Pescavelho (talk) 23:09, 11 January 2025 (UTC)

I would be strongly opposed to splitting Classical Hebrew and modern Hebrew into two L2's, but I am all for having two transliterations as @-sche mentions. There is demand for doing this as well for e.g. Persian and probably several other languages, and it is something I can probably implement the underlying support for without an enormous effort (although I would need to survey the landscape to see what it would actually involve). @Sartma The modern Greek transliteration is quite controversial here at Wiktionary so I would not take that as a good precedent; here too, if people want a scholarly transliteration I would suggest two different transliterations, one scholarly and one pronunciation-based, rather than the current half-ass compromise we have. @Pescavelho There is general agreement that North and South Levantine Arabic need to be merged, but not currently the technical know-how available to help with it; the person who got the two merged in ISO was here for a while and offered to help, but when they realized it is a big task, they bowed out and said they didn't have time. What this and other similar cases show is that splitting is a lot easier than merging, so we should be careful when we propose splits. More generally I agree that we need fewer Arabic L2's, possibly only one. The decision to split them was made a long long time ago without much thought, simply based on the ISO classification, and there has never been the will to re-merge, since it will take significant effort, both politically in terms of getting an agreement (if it's even possible) and technically in terms of actual implementation. Benwing2 (talk) 00:08, 12 January 2025 (UTC)

Correcting myself a bit here: it's incorrect to say that Classical Hebrew and Modern Hebrew are both based on the same standard. Rather, Classical Hebrew wasn't a standardized language, Tiberian spelling being a post hoc standardization quite a handful of centuries after Hebrew was no longer spoken as a first language, that was nonetheless retroactively applied to texts like the Pentateuch and the Talmud (in fact, "preserving" as much of the "Classical Hebrew pronunciation" as was feasible before it was too late was the entire point of the vocalization). Modern Hebrew "full spelling" (mostly used in dictionaries and learning materials) nonetheless continues to be based on Tiberian vocalization in spite of modern (especially Israeli) Hebrew having gone through quite a few phonological shifts and mergers that make many Tiberian diacritics superfluous (a language that continues to use a spelling based on how it was pronounced centuries ago... could you imagine that?)

In this sense, the Torah and any given Israeli children's book are both written in the same orthographic standard (with slightly different grammar and vocabulary granted, but that's a different can of worms), so there's no need for splitting the two languages. Classical pronunciation can be marked in the IPA section (potentially alongside Ashkenazi, Yemenite, Sephardic and obviously modern Israeli pronunciations) and archaic grammatical forms can have a little disclaimer in the conjugation tables pointing out that these only really appear in the Bible and artsy-fartsy works of modern literature.

Regarding Arabic: at the risk of veering a bit into something that merits its own discussion, I think it'd be much more practical to have it be like Portuguese where the dialects are treated as a single language but there are separate regional IPA pronunciations and semantic definitions. It's kind of baffling having to scroll through an entire page (it's Arabic! It's right near the beginning of any given language list!! I shouldn't have to scroll at all!!!!!) because it turns out a definition of a normal Arabic word used across much of the Arab world for some reason only has a "South Levantine Arabic" entry. I also can't really weigh in on Persian but I always found it weird that at least Tajik wasn't considered its own language in Wiktionary, but that is out of my scope. Pescavelho (talk) 01:12, 12 January 2025 (UTC)

Adjective definitions

E.g.:

Whose first and last vertices are different.
That ends in a vowel.

My feeling is that adjectival definitions of this style seem old-fashioned or cryptic, and are potentially difficult for modern readers to understand. I would change them where I see them to e.g. "Ending in a vowel", but does anyone else have an opinion? Mihia (talk) 20:53, 27 December 2024 (UTC)

I agree about "that ends in a vowel" and similar; I would definitely change to "ending in a vowel". The first definition "whose first and last vertices are different" seems OK; paraphrasing an "open polyline" as "a polyline whose first and last vertices are different" seems fine to me. You could change it if you want to "having different first and last vertices", which seems about the same in terms of understandability. Benwing2 (talk) 02:20, 29 December 2024 (UTC)

Stress over hyphens (-́)

The official Spanish orthography mandates stresses over hyphens for compositional elements stressed on the immediate previous syllable, such as -́fobo (-phobe).

However, this rendering is not available in the English Wiktionary entry (-́fobo, which does not even mention that it is stress-attracting...). JMGN (talk) 23:15, 30 December 2024 (UTC)

I see no need to use this in the page title, but it makes sense to me to use it in the entry itself.--Urszag (talk) 23:34, 30 December 2024 (UTC)

@Urszag: Should I make a formal proposition for vote? JMGN (talk) 21:23, 3 January 2025 (UTC)

I think these pages are fine the way they are. I'd say this is an issue of stylization and not "mandates" (we're not beholden to the RAE anyway because we're not a prescriptive dictionary). Affixes are rarely written on their own, so it's hard to say that they "should" be written one way or another. Under the current titles, they can be typed and searched for easily; I think that's the biggest consideration. Ultimateria (talk) 23:33, 8 January 2025 (UTC)

@Ultimateria: But as @Urszag pointed out, it makes sense to use them in the entries themselves to provide, succinctly, important pronunciation information that entries currently lack. JMGN (talk) 04:22, 9 January 2025 (UTC)

While I don't find it strictly necessary, I wouldn't object to that. Ultimateria (talk) 17:30, 9 January 2025 (UTC)

@Ultimateria: How's stress placement of affixes not necessary? smh ... JMGN (talk) 18:29, 9 January 2025 (UTC)

Standardizing Alternative scripts heading for Pali and Sanskrit

Sanskrit and Pali entries show a word in multiple scripts by using {{sa-alt}} and {{pi-alt}}, which are placed inconsistently: sometimes below the heading ===Alternative scripts=== and sometimes below ===Alternative forms===. I propose standardizing this to ===Alternative scripts===, to be placed above ===Alternative forms===.

It's already used on 6000+ entries
Alternative scripts is much neater especially when there are real variants/alternative forms of the word too (e.g. at लघु (laghu))
Otherwise also it is nice to keep Alternative forms reserved for real variants instead of using it for transliterations of the same word in different scripts.

Per Wiktionary:Entry layout#Flexibility and Wiktionary:Entry layout#List of headings: "The list below is not an exclusive list; other headings may be essential in some circumstances." Per Wiktionary:Entry layout#Variations for languages other than English: "Some languages do have characteristics that require variation from the standard format. For links to these variations see Wiktionary:Language considerations." So, I think it is helpful to standardize ===Alternative scripts=== for these languages and add this to WT:About Pali and WT:About Sanskrit. This will result in consistency and the header not being flagged as incorrect/error. Svārtava (t ɕ) 14:46, 31 December 2024 (UTC)

I thought it was obvious, but evidently not. Using "alternative form" for the same exact word written in a different script makes no sense. That header should be reserved for actual alternative forms/variants. -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪 (𝘵𝘢𝘭𝘬) 02:54, 1 January 2025 (UTC)

"Alternative forms" is what is prescribed in WT:EL. It also covers the case of alternative writing systems in the same script (most notably Thai, but also the Myanmar, Tai Tham and Lao scripts). Also, how should we handle what are significantly different forms in each writing system? There is a recent example at sakkoti, with 4 different forms in the Roman script, each handled by its own invocation of {{pi-alt}}. There are also cases where the Lao-repertoire Lao writing system doesn't distinguish forms that other writing systems do, particularly cases of variable reflection, as in pāhuṇeyya. And there are occasionally multiple forms only for the Latin script (e.g. kat'añjalin, whose impact is reduced by categorising it as a misspelling); if these were merged into one call, the redesign of the parameter list would need some thinking about. (There's no academic interest in distinctively transliterating the corresponding Thai script form, กัตอัญชะลิน (katañjalin), which is currently not classified as a spelling error.) The banning of certain Roman script writing systems for Pali should also be reconsidered.

If we're going to change the heading, which I oppose, we should consider 'Alternative writing systems'. --RichardW57 (talk) 14:59, 8 January 2025 (UTC)

As written in the initial post, WT:EL allows some flexibility for languages if there is sufficient need. I can't think of languages worthier than Sanskrit and Pali of being allowed an ===Alternative scripts=== header, given the numerous scripts they are or were written in.
The proposed format for sakkoti would be as follows:

Alternative scripts

sakkoti (Latin script)
𑀲𑀓𑁆𑀓𑁄𑀢𑀺 (Brahmi script)
सक्कोति (Devanagari script)
সক্কোতি (Bengali script)
සක‍්කොති (Sinhalese script)
သက္ကောတိ or သၵ္ၵေႃတိ or သၵ်ၵေႃတိ (Burmese script)
สกฺโกติ or สักโกติ (Thai script)
ᩈᨠ᩠ᨠᩮᩣᨲᩥ (Tai Tham script)
ສກ຺ໂກຕິ or ສັກໂກຕິ (Lao script)
សក្កោតិ (Khmer script)
𑄥𑄇𑄴𑄇𑄮𑄖𑄨 (Chakma script)

Alternative forms

sakkati, sakkuṇoti, sakkuṇāti

I don't see any issue with this as the alternative scripts for the terms listed in ===Alternative forms=== would be findable at their respective entries.

Could you point out specifically what issue the proposed format would cause at pāhuṇeyya or kat'añjalin?
As for Alternative writing systems, that is the same in meaning as Alternative scripts but a longer, three-word, less used alternate of it.

Svārtava (t ɕ) 17:39, 8 January 2025 (UTC)

@Svartava: At present, we have yet to automate Lao-repertoire Lao script Pali (my name for Pali using only the letters of the original Unicode encoding for Lao). When we do, we would inevitably find ປາຫຸເນຢຢະ (pāhuneyya) listed twice on the page for ປາຫຸເຓຍຍະ (pāhuṇeyya), once under 'alternative scripts' for your labelling preference, and once under alternative forms, as both forms (in your parlance) are internationally well established. I would prefer to bite the bullet and distinguish two flavours for |Latn2=, one for Latin-only forms and another for propagation to the other scripts. Perhaps the latter usage should be |alt2=.

As you can't be bothered to update documentation for templates, e.g. for {{pi-alt}}, when you change them, I may be wrong, but we have no mechanism to record the forms of kat'añjalin in the other scripts. This odd form originates in the Thai script, has been copied into at least one Roman script publication, and seems to have spread no further, so does not have a Sinhalese script form. Its page currently uses:

{{pi-alt|Latn=katañjalin|Latn2=kat'añjalin|Thai=กตญฺชลินฺ|Thai2=กัตอัญชะลิน|Thai3=กะตัญชะลิน}}

|Latn2= is redundant in this entry; it was intended for propagation to other pages before I realised that documented spelling mistakes didn't need such propagation. We would have to extend the template, or more precisely its supporting module, to say that a form didn't occur in a certain script. That would also apply to another well-established Thai spelling mistake, สะวากขาตะ (savākkhāta), if we decided that its Roman script form was a well-established, as opposed to infrequent, spelling mistake.

In knowledgeable usage, 'writing system' and 'script' are not equivalent, though both terms suffer the same fuzziness as 'species'. For example, the first two Thai script parameters given above are in different writing systems, and it can be argued that UK and US English use the Latin script with different writing systems. --RichardW57 (talk) 10:34, 9 January 2025 (UTC)

Actually, we do have a mechanism to prevent {{pi-alt}} producing forms for a script - one just uses something like |deva=-. It just needs to be documented. --RichardW57 (talk) 22:21, 9 January 2025 (UTC)

@svartava: And are ᨻᩩᨴ᩠ᨵᩮᩣ (buddho) and ᨻᩩᨴ᩠ᨵᩮᩤ (buddho) different forms or different Tai Tham writing systems? I get a hint of regional preferences, but I'm not sure how strong they are. And there are definitely regional preferences between ᨾᩴᩈ (maṃsa) and ᨾᩘᩈ (maṅsa); the latter seems specific to Northern Thai Pali. And how do you propose to handle phonetically inappropriate gemination underneath repha in Sanskrit? That definitely turns up in Khmer (or Pallava), Bengali and what I presume is Brahmi script, even if it is now universally obsolete. Does it also turn up in Devanagari? --RichardW57 (talk) 14:24, 9 January 2025 (UTC)Reply

@RichardW57: Ideally, in such cases the Thai script entry would have both headers for scripts and forms, and we could suppress the Thai script display in the alternative scripts table, so that the variants would be placed below the alternative forms header only. I didn't clearly understand "phonetically inappropriate gemination underneath repha in Sanskrit", so could you give an example so that I can understand it correctly? Svārtava (t ɕ) 14:34, 9 January 2025 (UTC)

@Svartava: Could you please be a bit clearer about what you're suggesting for the Lanna script? I wasn't even thinking of Northern Thai variants of Thai script Pali, though come to think of it, มงฺส (maṅsa) looks like one. Are you saying that you would tell the Tai Khuen, the Northern Thai or the Lao that their Lanna script spelling of Pali is not the canonical one?

Svartava's changes to {{pi-alt}} starting in October 2024 have trashed manual alternative forms, so some of the examples I've been giving don't work - all the parameters but |Latn= are being ignored. For example, มงฺส (maṅsa) should be listed under the alternative forms of maṃsa, but the relevant parameter to {{pi-alt}} is being ignored. I'll try to fix it this afternoon, but if I run out of time I'll default |Latn= to avutta "not said" so as not to flood cat:E with pages exploiting your page. Did you even read the template's documentation?

For gemination under repha see ស្វគ្គ៌ (svargga). --RichardW57 (talk) 16:26, 9 January 2025 (UTC)

@Svartava What are you playing at? I said I would fix the module and you went and edited in parallel, publishing non-working code. I think we now have it working. RichardW57 (talk) 18:31, 9 January 2025 (UTC)

@RichardW57 That was a mistake, and by the time I fixed it you had already made the appropriate correction and I ran into an edit conflict. Svārtava (t ɕ) 18:35, 9 January 2025 (UTC)

@RichardW57 What do you think of the format for มงฺส (maṅsa) shown here? In complex cases, if there is need, we could always go back to whichever format is better, but I still think that for simpler pages keeping them under ===Alternative scripts=== instead of ===Alternative forms=== is a good approach. Svārtava (t ɕ) 19:04, 9 January 2025 (UTC)

@Svartava: I am reviewing on the assumption that the omission of everything after the definition line is just laziness. Please correct me if I am wrong, for I believe the omitted parts are required. --RichardW57 (talk)

I've now noticed that the header line in มงฺส (maṅsa) was wrong - it should have shown that transliteration, as it differs from the standard Roman script. I've now fixed it. We want the Pali-specific header template {{pi-noun}} so as to:

Nag for gender
Omit transliteration by default, for it is usually given by {{pi-sc}}, though not in this case. @Benwing2 objected to the duplication.
Allow |tr=+ to force automatic transliteration where transliteration is needed and automatic transliteration works.

Searching for the nominative/accusative forms of the various Thai script forms, I get:

มงฺสํ (maṅsaṃ) : 10 irredundant Google hits (iGh)
มํสํ (maṃsaṃ) : 126 iGh
มังสัง (maṃsaṃ) : 143 iGh

Going by the line {{pi-alt|Lana=ᨾᩴᩈ|Lana2=ᨾᩘᩈ|Latn=maṃsa}}, it is clear that มงฺส (maṅsa) is not considered the (Wiktionary-)principal form for its writing system. To me, that implies its definition line should be

{{alternative spelling of|pi|มํส|tr=-}}, {{pi-sc|t|maṃsa}}, yielding

Alternative spelling of มํส, Thai script form of maṃsa

Given the amount of effort involved at this point, we might as well do as @Octahedron80 would prefer and type

{{alternative spelling of|pi|มํส|tr=-}}, {{pi-sc|t0|maṃsa}}, yielding

Alternative spelling of มํส, Thai script (with implicit vowels) form of maṃsa

except that I would prefer more memorable codes than the likes of 't0' and 't1'. Just using {{pi-sc}} led to a massive reduction in errors, though that will still suffice for the Wiktionary-principal forms, the vast majority of entries.

For this word, the 'Alternative forms' section, which contains "มํส (maṃsa), มังสะ (maṃsa)", is unneeded and arguably wrong. These forms are the Wiktionary-principal forms for their writing systems.

In keeping with this style, the page for มํส (maṃsa) does need an 'Alternative forms' section, listing มงฺส (maṅsa), while มังสะ (maṃsa) does not merit an 'Alternative forms' section.

I have been assuming that there is no information specific to the individual Thai-script writing systems for Pali (or Thai-script Pali writing systems in general) that needs to be gathered together under these forms. Pronunciation sections are a possibility. I don't know whether deep regional differences in pronunciation have been stamped out, and if they survive one should check what script the monks are reading. It would be amusing, but complicating for us, if monks in Surin read Thai texts with the Khmer pronunciation. (I still haven't analysed the Mon pronunciations for the Mon variant of the Burmese script for Pali.) Mazard cautions us to expect chaos.

Note that, barring regional accents, all three of these Thai script forms should be pronounced the same.

Is anyone willing to help me plan out the handling of Lao-script spelling differences? I rather fear my research on Lao-script Pali spelling is not yet adequate. A fall back is to assume that different Lao spellings indicate different writing systems.

It is all so much simpler if these are all treated on an even footing as forms of maṃsa. --RichardW57 (talk) 21:26, 9 January 2025 (UTC)Reply
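The definition-line convention proposed above can be sketched in a few lines of Python. This is only an illustration of the rule as described in the comment (a Wiktionary-principal form gets a bare {{pi-sc}} line; a non-principal spelling gets {{alternative spelling of}} followed by {{pi-sc}}). The function name `definition_line` and its parameters are hypothetical and are not part of any existing module.

```python
def definition_line(roman, script_code, principal_spelling=None):
    """Build the wikitext definition line for a non-Roman Pali entry.

    roman              -- the Roman-script lemma, e.g. "maṃsa"
    script_code        -- the first argument passed to {{pi-sc}}, e.g. "t" or "t0"
    principal_spelling -- the Wiktionary-principal spelling in the same writing
                          system, or None if this entry is itself principal
                          (hypothetical parameter, for illustration only)
    """
    # Every non-Roman entry carries the script-form template.
    script_form = "{{pi-sc|" + script_code + "|" + roman + "}}"
    if principal_spelling is None:
        # Principal form: {{pi-sc}} alone suffices.
        return script_form
    # Non-principal spelling: point at the principal spelling first,
    # suppressing transliteration with tr=- as in the proposal.
    alt = "{{alternative spelling of|pi|" + principal_spelling + "|tr=-}}"
    return alt + ", " + script_form


# มงฺส (maṅsa) is not the principal form; มํส (maṃsa) is.
print(definition_line("maṃsa", "t", principal_spelling="มํส"))
print(definition_line("maṃsa", "t"))
```

The first call reproduces the line suggested for มงฺส (maṅsa); the second shows what a principal form such as มํส (maṃsa) would carry under the same assumption.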

@Svartava: I've drafted out what I think the non-Roman pages should look like for Wiktionary-script-principal lemma for สกฺโกติ (sakkoti) and the alternative form (in the same writing system) สกฺกติ (sakkati). You can see from the 'transliteration aid' comment in the 'Alternative forms' section that it is now more work to get reliable standard transliteration to Thai. Following the policy of preferring not to create entries without quotations, a lot of the links will be red links and thus with a relatively high risk of being typos. Of course, the Thai-script Pali we have splits up nicely into two writing systems. It's messier with the various spellings for the Tai Tham and Lao scripts. --RichardW57 (talk) 15:38, 10 January 2025 (UTC)

@Svartava I couldn't find much of a discussion of gemination under repha, but there is some text which just assumes it happens, e.g. the DHARMA project instruction "e.g. normalise varnna to varṇṇa (rather than fully standard varṇa) if the inscription normally doubles nasals after r". Bengali script Sanskrit নির্ব্বাণ (nirbvāṇa) is given as an example in one of Mazard's transcriptions of books about Pali, and Google will find examples of such a word, though probably obsolete Bengali নির্ব্বাণ (nirbban) rather than Sanskrit. --RichardW57 (talk) 21:37, 11 January 2025 (UTC)

@RichardW57 This is definitely absent in Devanagari. In the scripts like Khmer, are the non-geminated variants also used or is it just the one variant with gemination that is used? Svārtava (t ɕ) 05:28, 12 January 2025 (UTC)Reply

(Notifying Atitarev, Octahedron80, AryamanA, Pulimaiyi, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, Exarchus): The Khmer-script examples are from the time of Angkorian Khmer, so ultimately they might get converted to the Pallava script once it's been encoded in Unicode. I couldn't find enough examples to analyse the usage (inscriptions don't easily photograph well), but I suspect that the rule was that everything below a repha must occupy at least two storeys, and that determines whether gemination occurs for a phonetic rC-cluster. I saw no examples of this gemination in the 20th century Chuon Nath dictionary (Khmer-Khmer), the only source of Khmer-script Sanskrit I can view now that support for Adobe Flash is generally gone. --RichardW57 (talk) 12:20, 12 January 2025 (UTC)

Spellings with and without gemination occur in Bengali words, so I assume there are (at least) two Bengali-script writing systems. That's on top of whatever expedients were used to avoid or limit the collapse of 'r', 'b' and 'v'. Eastern Nagari systems are not consistent amongst themselves in how and to what extent they avoid the collapse; in this latter matter, we've encountered differences in how Pali is handled in the Bengali script. --RichardW57 (talk) 12:20, 12 January 2025 (UTC)

I think I saw, in a Mon context, an image of a flag saying something like Sanskrit ဓရ်္မ္မ (dharmma), but when I went back to record the URL for it, I couldn't find it. I'm sure of the stem and the (superscript) repha. The word's misrendering for me in the preview, but not in the editing window - the superscript mark should be part of the second akshara. --RichardW57 (talk) 12:20, 12 January 2025 (UTC)Reply

@Svartava: I've found a plausible hit for that form in what looks like a Burmese blog - https://listed.to/@thanhtunoo/37448/. I'm not sure that it's Sanskrit though. There's also a form သရ်္ဗ္ဗ (sarbba) there that seems to have inherited the b/v confusion of Angkorian Sanskrit spelling, but with v > b replacement as often in Thai, as opposed to the b > v replacement seen in Angkorian inscriptions. Of course, it might just be a Burmese Pali-Sanskrit hybrid - such are fairly common in Thai. However, I can't find that word in the SeaLang Burmese-English dictionary. There's a limit to what I can extract from the blog without someone (e.g.@Hintha) cleaning up the translation.--RichardW57 (talk) 14:27, 12 January 2025 (UTC)Reply

Notifying other Pali and Sanskrit editors. (Notifying AryamanA, Pulimaiyi, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, Exarchus): , @Octahedron80. RichardW57 (talk) 17:23, 8 January 2025 (UTC)Reply

I am in favour of an "alternative scripts" heading for the template. For Sanskrit and Pali, "alternative forms" should be for same-script terms that are spelled differently than the headword. Therefore, I think the current organisation at sakkoti is undesirable; the alternative scripts templates for the alternative forms (sakkati etc.) should be at the Latin-script entries for those terms. 4 of those templates on one page is unnecessary bloat. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 22:12, 8 January 2025 (UTC)Reply

@AryamanA: Are you proposing that the 'alternative scripts' table should be at the principal script's (usually Latin's for Pali, Devanagari's for Sanskrit) page only? This would simplify matters - I've been resorting to subsidiary templates to ensure consistent lists across scripts, mostly as I turn up various Lao script spellings. --RichardW57 (talk) 13:43, 9 January 2025 (UTC)

@RichardW57 That isn’t what they were proposing. Please re-read the comment. Theknightwho (talk) 13:57, 9 January 2025 (UTC)Reply

@Theknightwho: I have and it remains unclear. If 'Latin-script' qualified 'alternative forms' and not 'entries' it would indeed have a different meaning. My suggested interpretation is consistent with the popular (but not overwhelmingly popular) idea of minimising entries for alternative forms, and @AryamanA's stated preference for minimising alternative-script entries, which mostly aren't quite alternative forms. --RichardW57 (talk) 16:58, 9 January 2025 (UTC)

I think you understood right; I'm personally agnostic on whether "alternative scripts" should be at only the main script page or at all script pages. There are good arguments for either way. My only proposed restriction is that "alternative scripts" should only mean terms which are equivalent(ish) across different scripts (as I suggested in the example of sakkoti). —Aryaman^A ^{(मुझसे बात करें • योगदान)} 19:43, 9 January 2025 (UTC)Reply

Abuse of power by one of admins + the word "ministra" in Polish


I hereby want to report abuse of power by admin Surjection. Said admin twice reverted my edits on ministra then banned me from editing for a week. In their discussion page I pointed out that my change was clearly sourced with the highest authority on the Polish language and that while I could have made a technical mistake - which they pointed out - it's hardly the reason to revert the change or ban someone.

My argument: the word "ministra" in Polish used as feminine form of "minister" is an error. This was clearly stated by the Polish Language Council (RJP) in their statements from 2012 and 2019. The RJP stated that such form is atypical for the Polish language and in general used in colloquial way. The RJP suggested the conservative approach for feminine form in this case (ie. "pani minister") and pointed out that used correctly the form "minister" leaves no doubt on the gender of person it applies to. The RJP pointed out that correct way to create feminine forms of nouns is by adding -ka, and not -a (in example: doktor -> doktorka and not doktora). Both statements are available publicly. Mind that on the minister page it's also stated that correct Polish feminine forms are either minister or ministerka, but not ministra.

In response Surjection used following fallacious arguments:

"you don't like the word"
"you lie"
"you didn't read"
"you don't want to contribute"
"your motive is so obvious"

and so on...

As a dictionary, which is used by huge number of people, Wiktionary should present facts, not opinions. Now the fact that some political factions and media (in comparison to others that oppose it) want to enforce usage of specific word doesn't make this word correct and if the highest authority on the Polish language says "yes, let's make feminine forms of nouns but let's do it correctly and logically" then points out that certain forms, including the one in question, are not correct, not typical for Polish language, may have wrong associations, can be misread and misunderstood and so on, then Wiktionary should include that opinion and clearly mark such forms as incorrect, colloquial etc. And admins shouldn't put themselves above authorities on languages especially when they don't even know those languages.

It is also my opinion that admins should be impartial, should focus on facts and avoid being judgmental, self-righteous, closed-minded, stubborn and so on. Making decisions based on their imaginative assumptions and/or prejudice is simply wrong and against any rules I've read.

Based on quick look on their discussion page it would seem that admin in question had in past made multiple questionable decisions in which they acted on assumptions and prejudice rather than facts. Their discussion page proves they're unable to admit of being wrong and they tend to use fallacious arguments and not factual ones. Therefore I request the revocation of powers of said admin as they clearly violated the Code of Conduct of Wikimedia Foundation. I also ask for correction of the definition of word in question to indicate that said form is both colloquial and incorrect. 89.64.9.29 00:08, 1 January 2025 (UTC)Reply

Several notes:

not "banned from editing" - blocked from editing one page, i.e. the page in question,
the user first completely deleted the definition and then tried to add a completely subjective "corruption" label; it's obvious they do not like this term,
they seem to like appealing to authority, including dismissing an admin (@Vininn126) that tried to point them to policies,
I asked them multiple times on the thread they started on my talk page to contribute to the dictionary, but they are apparently interested in nothing else than trying to edit this entry to make it seem less favorable to the word in question.
Wiktionary doesn't specify that words are "incorrect", no matter how much you dislike the word.

— SURJECTION ^{/ T / C / L /} 00:12, 1 January 2025 (UTC)Reply

Obvious agenda-pushing because they don't like the use of the feminine form. No abuse of power. Theknightwho (talk) 00:39, 1 January 2025 (UTC)Reply

I will happily learn how making a changes based on opinion of Polish Language Council is "agenda-pushing" LOL. Read again: correct feminine forms of word minister in Polish are either minister or ministerka. I have nothing against feminine forms. This specific form ministra is simply incorrect. That's a fact not some "agenda". 89.64.9.29 00:45, 1 January 2025 (UTC)Reply

To add some context: There's been discussion in Poland regarding feminine forms for years (actually over a hundred years). There's been consensus regarding some forms for years, but in past 15 or so years some ideological powers decided to push for feminine forms for nouns that (for multiple reasons) had feminine form the same as masculine - like said minister. And while that is, in general, considered a good trend by language authorities the problem was that said ideological powers didn't like the correct feminine forms (like said ministerka) finding them derogatory. That's why they started pushing other forms, which in their opinion are less derogatory, like this specific form we're talking about - ministra. And yes - this form is in use by certain political powers and media, especially since 2023 elections in Poland. But that doesn't change the fact that - from the language point of view - this form is incorrect, that language authorities find it as incorrect and that, to the best of my knowledge, it doesn't exist in any serious dictionary. And that is what we should focus on here. I referred to two opinions of Polish Language Council, which clearly point out that such forms are not only incorrect, but also problematic from the language point of view. I don't really care if some people try to push their agenda by using incorrect forms of words. But in my opinion a dictionary shouldn't promote incorrect forms of words or should at least clearly indicate that while they may be in use they are considered colloquial or incorrect. And that is my only goal here. 89.64.9.29 01:27, 1 January 2025 (UTC)Reply

Your opinion on my label has been noted, but your assumptions here are the very reason you're being reported.
Your continuous attempts to dismiss valid argument by using manipulation or fallacious arguments are also the very reason you're being reported.
My response to clearly disparaging comment of another user on my profile has nothing to do with this case and doesn't matter.
You seem to have trouble understanding what is factual argument and fallacious argument. The fact that I didn't do any other edits (from this IP at least) does not matter at all in this case.
You shouldn't make comments in your own case. Everything here is public. People can check your page, my page, your comments, my comments. You made your point of view clear in your discussion page and this discussion is over. Stop trying to start it again here.

89.64.9.29 00:40, 1 January 2025 (UTC)Reply

As an uninvolved admin I'll just say that Wiktionary is a descriptive, not prescriptive dictionary. This means we describe actual usage, not usage as some authority says it should be. If a given authority says you shouldn't use a word in a specific way, but the word is nonetheless used that way, we may note this using the term proscribed, but we don't either delete such definitions or use language like "corruption". Benwing2 (talk) 01:20, 1 January 2025 (UTC)Reply

To explain: I found "corruption" in glossary and tried to use it, but wasn't sure how this works. I assumed someone will probably fix it when they see it. I agree that my action to delete the whole definition wasn't the best one despite my good intentions. I can admit to it. But with my second edit I actually tried to make the right thing, I looked into glossary trying to figure out what I should use to indicate that the form is incorrect - corruption was the closest thing I found. Maybe I made a mistake, but that's hardly a reason to attack me or ban me.

I added a little context for the ongoing linguistic dispute in Poland above.

I'd like to point out that currently there's a conflict between definition of minister which clearly states that feminine forms are minister and ministerka and definition of ministra. And that's one of reasons why, in my opinion, that definition should in some way indicate that the form, while in use, is considered incorrect and that correct forms are minister or ministerka (although to the best of my knowledge currently only minister is in Polish dictionaries).

On the side note in my opinion such behavior from admins is really discouraging. Especially if someone makes valid factual argument and gets nothing but false accusations in response and fallacious arguments in response. In my opinion admins shouldn't assume bad intentions. 89.64.9.29 01:44, 1 January 2025 (UTC)Reply

Could you please point out to me where the Polish Language Council calls these forms incorrect? I don't see it in their statement here, for instance. Even if we accept your appeal to authority (which isn't really how things work here), you seem to be misrepresenting what has actually been said. Theknightwho (talk) 02:02, 1 January 2025 (UTC)Reply

It doesn't even matter if they are proscribed, we include proscribed speech. This is either clearly view-pushing or bad-faith, but I see no reason to give this particular person much more attention. Vininn126 (talk) 02:11, 1 January 2025 (UTC)

Well, the article you linked has a section that translates as: "[T]he creation of feminine names by changing the inflectional endings, e.g. (ta) ministra, […] is not typical for the Polish word-formation system (the words blonda, szczęściara are clearly colloquial), and it is better to use the traditional suffix model, i.e. the creation of names like doktorka[.]" Doesn't say anything on if ministra is wrong, though. CitationsFreak (talk) 02:14, 1 January 2025 (UTC)Reply

@CitationsFreak Right, exactly. That's very different from what the other user said. I don't think it even supports the claim that they're colloquial either - it simply implies they may have been modelled on colloquial terms. Theknightwho (talk) 02:50, 1 January 2025 (UTC)Reply

There are two statements and it's not possible to understand second one without knowing the first one.

In statement from 2012 they say that in general we create feminine forms of names by adding -ka in case of noun names, for example nauczyciel changes into nauczycielka, and by adding -a for adjectival names, for example służący changes into służąca. They say that usage of form with -a is untraditional for noun names, but forms like ministra are being created because traditional forms like ministerka can be seen as colloquial and derogative (literally "showing smallness of the person they refer to") - which is part of broader social and ideological ongoing debate in Poland (as I explained above). Then they say that forms with -a also have cons, they also can be seen as derogative and can be ambiguous with honorifics, which creates conflict with intentions of use (and it's important to know that Polish language is full of formalities and usage of honorifics is normal in daily life). After that they talk about traditional use of masculine forms for both genders. Finally they remind that for years only adjectival names had feminine forms. In the end they say that the usage of feminine forms cannot be enforced in any way, especially not by decision of authorities, nor introduction of any laws.

In second statement, from 2019, they indicate that for certain names, the usage of feminine forms by adding honorific pani (ie. pani doktor) became a norm in second part of 20th century, however since 1990s the feminine forms started to gain popularity. They reject some popular arguments against creating feminine forms (which I totally agree with). Then they say that creating forms by "changing inflectional endings" (ie. ministra) is "not typical for Polish word-formation system" and explain that such forms are generally colloquial forms of feminine forms (like blondyna compared to blondynka). They also say that traditional forms are better (ie. ministerka) and that dominant form is by adding honorific (ie. pani minister).

So you're right that they don't directly call that form incorrect. However they say that such forms were created by changing inflectional endings, which they explained more in their first statement from 2012. That means that the ending which is normally used for adjectival names is being used for noun names. They also explain that such change is typically done to create colloquial form of feminine.

And since the use of incorrect inflectional ending is a grammatical (inflectional) error ([here in Polish about inflectional errors] - the use of incorrect ending is on top of the list) then consequently the word created with error is incorrect form.

To sum up: the masculine word minister normally has the feminine form as minister (optionally with honorific pani). The correct way - according to rules - to create feminine form is by adding the ending -ka to create the word ministerka. There is also a form ministra which is created by using incorrect ending -a, which by rules should be only used to create feminine forms of adjectival names and when it's used to create feminine forms of noun names it's done to create a colloquial form.

On the side note it's worth noting that the word ministra wasn't really in common use until current coalition government took over in 2023 and enforced its usage. To be more precise there was a short time discussion about it in 2012 when first mentioned statement was published and then nothing until the end of 2023. It seems even the second statement from 2019 didn't create any interest in media. 89.64.9.29 04:12, 1 January 2025 (UTC)Reply

These are not inflections, though; they are derivations. They may well be derived from inflections, and that might not be typical, but derivations are often unexpected or irregular in pretty much all languages, because they are a new word that has been derived from a pre-existing one in some way, and that can happen in many different ways for many different reasons. You definitely cannot conclude they are the "incorrect inflectional ending", which would only make sense if the word minister were being used with the wrong inflectional endings. That's not what's happening here, though: instead, they've taken the genitive/accusative form, reinterpreted it as a feminine noun, and inflect it in an entirely regular way. There's nothing incorrect or nonstandard about that. Theknightwho (talk) 05:50, 1 January 2025 (UTC)Reply

The alternative understanding is that it could be a back-formation of ministerka, or simply just a change in gender. You see nouns gain or lose vowels when changing genders in dialects, compare jud. Vininn126 (talk) 05:52, 1 January 2025 (UTC)Reply

I'm just gonna quote the 2012 statement from RJP here: "The argument that such names are regularly created word-formationally in other languages (e.g. German) is not accurate, because each language uses different ways of enriching itself, which depends on the grammatical structure of a given language, the word-formation possibilities and the customs established in it." 89.64.9.29 06:12, 1 January 2025 (UTC)Reply

That doesn't make it incorrect - it's simply a statement that it isn't typical in Polish, but that isn't relevant. Theknightwho (talk) 07:07, 1 January 2025 (UTC)

The admins have a natural prejudice against IP users and this is expected, because many of the IP users are vandals. Arguing with admins isn't a good idea either. You are in no position to change the rules, and Wiktionary lists all attested words, even if they look like https://en.wiktionary.org/wiki/Category:English_leet and are promoted or used only by a small fraction of the population.

You could count on the term getting appropriately labelled and/or categorized. And ministra already has had a "neologism" label. It's possible that this is not enough and some further clarifications were possible similar to the Polish Wiktionary entry for "ministra", but you made a mistake with the "corruption" label that got you blocked. Further angry comments didn't do you any favor, but just confirmed the admin's judgement. --Ssvb (talk) 13:04, 1 January 2025 (UTC)Reply

Prejudice is against the rules. Admins should be impartial and should always assume good intentions unless proved otherwise. Presumption of guilt is against basic human rights. In this situation the admin made a presumption "you don't like the word" and then showed a tendency to interpret everything in a way that would support this presumption and made further assumptions to support this claim. The admin clearly showed the symptoms of power intoxication.
I never argued that I didn't make a mistake with corruption. I clearly stated that I found it in glossary and attempted to use it based on the fact it was there. As new editor I don't need to know everything, but it's hardly a reason to attack someone.
In my comments I applied to logic and reason while admin continued to use fallacious arguments. This is kind of behavior that should never be seen from admins. Everyone has the right to defend themselves against false accusations from admins.

89.64.9.29 16:28, 1 January 2025 (UTC)Reply

You were not attacked. You were called out for problematic behavior. Not everything is unbiased, and it is possible for you to evoke such comments with your behavior. Vininn126 (talk) 17:07, 1 January 2025 (UTC)Reply

It's customary on Wikimedia projects to add welcome message for new users on their user page. It's not customary to repeat this message or its parts in a way that can be seen as disparaging or derogatory. The welcome message is usually constructed in a way that new user can find all the rules, guidelines and help they need.

Things can be pointed out in various ways, but there's huge difference between pointing out something (ie. "hey I noticed you tried to use label corruption but such label doesn't exist - could you read more so you can avoid such edits? If you need help with editing you can find it here...") and attacking someone (ie. "I subjectively declare that you don't like a word and ban you. I don't care what you have to say"). 89.64.9.29 18:40, 1 January 2025 (UTC)Reply

Your behavior to such comments was assuming bad faith on Surjection's part, and I was calling that as well. We see people act this way around words all the time, it's not rude, it's direct. Vininn126 (talk) 18:44, 1 January 2025 (UTC)Reply

That's called confirmation bias.

There's no doubt that Surjection's behavior violates Wikimedia's Universal Code of Conduct.

There's also no doubt that you both suffer from power intoxication.

And there's also no doubt that you will never see that as this condition blinds you from seeing it. But you prove it with every comment you make. 89.64.9.29 19:03, 1 January 2025 (UTC)Reply

Teamwork doesn't seem to be your strong point. I don't mean it in a negative way, just the end result looks non-productive and this makes me sad. Still, if you happen to be incompatible with the other contributors, then it's better to quit early rather than become invested in the project, torment yourself and become a hindrance for the others. The admins, not going out of their ways to accommodate everyone, effectively happen to filter out people with potentially problematic non-cooperative personalities. --Ssvb (talk) 08:37, 2 January 2025 (UTC)Reply

@Benwing2: this is a double edged sword, and it plays both ways. The IP user's story sounds like the new term might be artificially promoted (or in other words "prescribed") by a certain group, based on their desire to displace another term ministerka [12] that they perceive as derogatory. And thus the process might be not entirely natural. The term surely deserves an entry, but a "usage notes" section with some explanations would help non-Polish readers understand the situation. Also maybe it would be useful to find more quotations and verify that the sources are truly independent, ruling out the possibility of them colluding with each other? The opinion of native Polish speakers would be interesting to know. Do many or most of them perceive the new term as natural? I presume that the IP user is a native Polish speaker, who is a little bit upset. On the other hand, there's an audio recording of this term from a native Polish speaker [13], who presumably didn't perceive it as incorrect (unless she just recorded it as the genitive/accusative form of minister).

My personal concern is that the WT:CFI policy can be potentially gamed if somebody is up to no good. Merely three authors with publications in durably archived sources are enough to impose arbitrary new words on multi-million nations, to the excitement of non-native wiktionarists. --Ssvb (talk) 10:22, 1 January 2025 (UTC)Reply

The word ministra first appeared around 2011-2012. It was promoted by one politician, got some media attention and was even commented by Polish Language Council (as already mentioned above), then it disappeared from general public space for years. Back in 2012 the fact that minister Mucha expected to be called ministra was seen by some as one of her many faux pas (media article in Polish). During that time there was a lot of hate towards Donald Tusk government from football supporters (and others) in Poland and Joanna Mucha was also widely criticized and put on memes in negative context. It's worth noting that despite her request in general population the form (pani) minister was used, although at least some more liberal media (like Onet.pl) were using form ministra sometimes (after quick Google search it seems the form was used alternately with minister around 2012 and after 2012 only form minister was used in articles about her). After 2012 until late 2023 the word ministra was practically non-existent. It was reintroduced after 2023 election by left-wing politicians (at first only two of them: Agnieszka Dziemianowicz-Bąk and Katarzyna Kotula) and started to be widely used in liberal media and widely criticized by more conservative part of society. For majority of population in Poland it's more natural to use pani minister rather than ministra or ministerka. On side note some language authorities in Poland, like Jerzy Bralczyk (Polish Wikipedia), pointed out that in Latin ministra means maid or servant and therefore can be seen as derogative form. 89.64.9.29 17:49, 1 January 2025 (UTC)Reply

Claims like "majority prefer" would need sources, not one's own intuition. Feminatives are quickly gaining popularity, as well. But all that aside, it sort of doesn't matter. The only label that really works here is "neologism", the council doesn't proscribe it, and it has seen enough use to be documented. Any other labels would be based on personal opinion. Vininn126 (talk) 17:53, 1 January 2025 (UTC)Reply

I understand that it's very hard for you to leave your bias out of this discussion, but if you look at previously mentioned statement by Polish Language Council (2019) it's clearly stated there that the honorific form pani minister is dominant. The feminine forms like ministra or ministerka are artificial in Polish language and (again the same statement) despite the push in media to use them, there is wide resistance in general society. In daily conversations people would rather say (pani) minister Kotula than ministra Kotula. I wouldn't expect to see a change in that in majority of population for at least 10 to 20 years - but that's just my estimate. In most cases usage of those artificial feminine forms is seen as ideological rather than lexical change. 89.64.9.29 18:26, 1 January 2025 (UTC)Reply

The Polish language council is often quite out of date with a lot of things. By source, I mean an actual study published in a journal, not what a closed group of academics say. Please keep the assumptions and bias and assumption of bad-faith to a minimum. Vininn126 (talk) 18:28, 1 January 2025 (UTC)Reply

I'm sorry but who are you to question the authority of Polish Language Council?

I understand that due to power intoxication (which we already discussed elsewhere) you have hard time understanding how a discussion works. So let me explain it to you. I made an argument and provided a valid, recognized, authoritative source to support this argument. If you disagree with that argument it's your job, not mine, to provide sources that would support your statements or prove my argumentation to be wrong. Your personal opinions, assumptions, biases aren't valid arguments. Now unless you actually have any factual arguments I consider discussion with you closed. 89.64.9.29 18:55, 1 January 2025 (UTC)Reply

Being a descriptive dictionary, and also a good scientist, the average person. You should question those authorities, not agree blindly. Vininn126 (talk) 18:59, 1 January 2025 (UTC)Reply

Since we're making this discussion off-topic I want to congratulate you on achieving C1 proficiency in Polish. That's definitely quite an achievement for American.

Now during my 40 years of life in Poland I have never met a single person that would say ministra in casual conversation. Everyone I know will say (pani) minister. The only people to use form ministra are some politicians and some media and in past 12 or so months some more ideologically left-wing commenters on social media.

And yes - this will probably change as it seems at least some part of Polish society has a need to use feminine forms. But for now the form minister is dominant both in casual conversations and in media (e.g. 1, 2, 3, 4, 5, 6, 7 and so on). And while some media generally use form ministra we can see that even politicians of ruling coalition use the form pani minister (as seen for example in this article by TVN where author use form ministra but quoted politician use form pani minister). 89.64.9.29 19:39, 1 January 2025 (UTC)Reply

How is this relevant to the label? Vininn126 (talk) 20:36, 1 January 2025 (UTC)Reply

It's as much relevant as your continuous pointless comments. 89.64.9.29 20:51, 1 January 2025 (UTC)Reply

This entire discussion is getting pointless IMO. IP 89, you need to stop with the insults and bad-faith accusations. Snarky statements like "quite an achievement for an American" don't help. The citations prove that this term is used and IMO the label neologism is totally fine. Benwing2 (talk) 21:04, 1 January 2025 (UTC)Reply

You misunderstood. My congratulations were sincere. It is quite an achievement as Polish is considered one of the hardest languages to learn for English speaking person. 89.64.9.29 21:21, 1 January 2025 (UTC)Reply

Let's look at what actually has happened:

the IP removed the entire Polish entry at ministra with the summary:

"Ministra" is a genitive or accusative singular masculine form of minister. It doesn't exist as feminine form of the word minister despite the fact that some leftist politicians and media try to enforce it. It's considered an error and as such shouldn't be in dictionary. The correct feminine form is "minister".

Surjection undid this with the summary: "don't remove words just because do not like them, this has quotes to attest that it is used"
Vininn126 added the {{welcome}} template to the IP's talk page with the note:

I highly recommend you read our Criteria for Inclusion, and that of other major dictionaries, before pushing ideologies and making a fool of yourself.

The IP added the labels |colloquial|corruption to both senses
The IP replied to Vininn126 on the IP's talk page:

I highly recommend that you stop recommending things to others as you have neither knowledge nor authority to do so. I neither care nor ask for your opinion.

Surjection undid the IP's edit to ministra with the summary "'corruption' is not a valid label", then blocked the IP from editing that page for one week with the stated reason: "Disruptive edits: 'I don't like this word' kind of editing that is not in good faith"
The IP posted on Surjection's talk page:

Stop blocking people just because you don't agree with them. Do you think you're higher authority on Polish language than the Council for the Polish Language, which I put as the source?

Surjection replied:

You're trying to mess with an entry because you don't like the word. Go find something better to do, like contributing new entries

This was followed by arguing back and forth, after which the IP came here.

In hindsight, Vininn126's comment about "making a fool of yourself" was not the best way to address someone who obviously takes this very seriously. Surjection let his annoyance at the IP's high-handed rhetoric color his responses. Given the huge amount of sheer garbage he deals with that people are always trying to hide in our entries, I don't feel comfortable passing judgment on him for that.

The IP, on the other hand, has been acting the part of an admin abusing power without the power: indirectly pulling rank at every opportunity, lecturing everyone about every detail of what they've said, and talking like a one week block from a single page accompanied by a few dismissive replies is a Crime Against Humanity. If Surjection was really into abusing his authority, he could have just blocked them sitewide and taken away their talkpage access. No one but other admins would have been aware of it for weeks.

If the IP had just brought their concerns here with a simple statement that they had been blocked and an explanation as to why they think their version should be accepted, they would have had a much better chance of getting what they wanted. As it is, it's very hard to read their comments here without wanting to reject everything they have to say.

I see their recent comments are much less angry and more helpful- I hope we can all take a deep breath, step back, and work this all out. Chuck Entz (talk) 22:50, 1 January 2025 (UTC)Reply

I think this is a good summary. Thanks Chuck. Vininn126 (talk) 23:02, 1 January 2025 (UTC)Reply

How about we just add some usage notes explaining that the 'traditional' forms for "female minister" are ministerka and pani minister? If there's some sort of underlying connotation for the use of the term, that should probably be mentioned as well, right?

This convo feels very similar to the presidenta controversy here in Brazil back when Dilma was president. It's not considered wrong per se afaik, but some groups of people definitely poked fun at it. It's seriously very similar to what someone mentioned above, I thought: 'seen by some as one of her many faux pas(ses?), more often spoken by people leaning left than otherwise'. I added "sometimes proscribed" to the word I mentioned here, since indeed some people sometimes proscribe it despite not being authorities on the matter or anything. Could that work here? I don't speak Polish, but I'd imagine you'd be able to find a couple of blogs/articles of people poking fun at ministra, yes? MedK1 (talk) 21:03, 1 January 2025 (UTC)

Another example that might be more relatable to folks here is the singular they. Though it's perfectly acceptable, we still have "occasionally proscribed" as a label since it is indeed a little controversial, even though unlike ministra, it is not a neologism. And indeed, we even mention that part right in the article! It's very informative. MedK1 (talk) 21:09, 1 January 2025 (UTC)

Some Polish language authorities like professor Bralczyk (Polish wikipedia) suggested it can be seen as derogatory due to the fact that in Latin ministra means servant - as result some people made memes about it. And of course a lot of people, especially more right leaning, are making fun of it.

On the other hand professor Miodek approves all three forms - he says that in his youth feminine forms like ministerka, profesorka etc. were normal and only later masculine forms gained popularity and were considered as more dignifying for women. 89.64.9.29 22:11, 1 January 2025 (UTC)Reply

Those are their musings. Not actual usage. Bralczyk also told people not to say pies zmarł when those people were emotional and Miodek is a notorious normativist who prescribes many things that are not used. Their opinion isn't always gospel. Vininn126 (talk) 22:39, 1 January 2025 (UTC)Reply

Ahh Bralczyk just quoted my opinion from 20 years ago ;) and he wasn't wrong too. But people got all emotional about it.

But it has nothing to do with current discussion.

I see you have something against linguistic authorities. First you denied authority of Polish Language Council, now Bralczyk and Miodek. Do you only accept those linguists who think like you?

Anyway I gave two examples that even between linguistic authorities there's no consensus. Bralczyk opts for ministerka, while Miodek says "use whatever form you prefer".

The fact is that there were memes created after Bralczyk's opinion that were making fun of word ministra and that there is a lot of people who make fun of this word - which was the question here.

Generally speaking this word is used by a very small group of people (some politicians, some journalists), while the majority of population uses the form minister. 89.64.9.29 00:30, 2 January 2025 (UTC)Reply

No, I will often cite certain authorities - I have an issue with sticking to prescriptivism for the sake of prescriptivism, as a descriptive dictionary. WSJP is also prescriptivist, but usually much more neutral and also has good sourcing. But my point is, we are here to document how people talk, not how authorities think we should talk. If they line up, great. It's not about whether they agree with me or not, it's about questioning what they are saying and checking how much validity there actually is in these statements, because they, just like other people, make mistakes, and it's our job here on Wikis to cross-verify and try to find what is actually accurate. I have even added several notes from the language in the past where they made sense, but it needs to be done from the perspective of a neutral observer, rather than telling the reader how they should talk. Vininn126 (talk) 00:35, 2 January 2025 (UTC)

There's obviously a contention and a debate about this term in Poland, as evidenced by various articles on this topic and the fact that the language authorities had been even asked for their opinion. The Appendix:Glossary#proscribed label description says "Some authorities or commentators recommend against or warn against the listed usage", so it's probably applicable here. The label doesn't seem to imply strict ban or prohibition, merely some commentators warning against the term seems to be enough. The other neologisms don't have this kind of contention around them.

I'm also not comfortable with a dismissive statement that disliking the term is bad. If more than one native speaker happens to dislike the term and doesn't feel like the term sounds natural to them from the linguistic perspective (rather than disliking the thing or phenomenon described by the term), then this probably means something. Even one well educated unbiased native speaker, who is immersed in the language environment, normally has the capacity to make relatively accurate judgements on this matter in many cases. However, having the consensus of more than one native speaker is, of course, much better, as age, profession, and other factors can influence opinions. Having "actual study published in a journal" would be perfect, but this shouldn't be a blocker if such study isn't readily available. How many of the Wiktionary editors are native Polish speakers? Would Wiktionary be interested in attracting more native Polish speakers?

I also happen to dislike aggressive activism. For example, I remember how the git version control system users had been forced to change the primary branch name from "master" to "main" [14], which caused some inconveniences and disrupted workflow, but refusing this wasn't an option because of the slavery and racism allegations. I'm not sure if the activists succeeded in eradicating the term from all other spheres of life, such as the term master's degree, etc. This Polish term introduction also reeks of a similar aggressive activism, peddled by a vocal minority in the name of feminism or whatever. And yes, rejecting this probably isn't an option because of the possible bigotry accusations. I'm a bit old and conservative, I dislike changes for the sake of the changes, which are enforced via blackmail-alike methods. But the youngsters can adapt their speech to the new rules much faster. --Ssvb (talk) 07:07, 2 January 2025 (UTC)Reply

@Ssvb You need to separate your own personal like or dislike of the term from our duty to report usage, which is the only thing that matters here. Personally, I could not give less of a shit whether a git branch is called "main" or "master", and if it makes some people happier to change it then that's great, because I genuinely don't care either way. It's each person's prerogative to care about what they want, but Wiktionary is not your soapbox to air your personal grievances about these kinds of trivialities. Please let's retain some perspective. I do agree that we should label things appropriately, but the IP user clearly wants us to be normative (explicitly or implicitly), which isn't appropriate. We simply describe usage, and sometimes that entails describing how speakers feel about that usage. Theknightwho (talk) 08:46, 2 January 2025 (UTC)Reply

@Theknightwho: That's precisely what I do and I'm a neutral party here. I merely point out that the Polish Language Council is a prescriptivist authority. But the active promoters of the new terms effectively act as prescriptivist authorities too. Both of these sides need to be represented in Wiktionary because it's your duty to report usage. And I think that the contention between these authorities should be preferably mentioned too, but maybe I'm asking too much.

Some people claimed that the English term "master" is offensive [15] and this did have a real life impact, such as the git branches renaming. Do you feel like it's your duty to document this somehow in the master article? Leaving your emotions aside, what's your opinion with a Wiktionary editor hat on? --Ssvb (talk) 09:44, 2 January 2025 (UTC)Reply

January 2025

Bad ledes in Thesaurus namespace

@qwertygiy Standard practice in the Thesaurus namespace is currently putting a blank line between {{ws header}} and the first L2. See, for example, Thesaurus:person, Thesaurus:berry, etc. This gives a warning to anyone who makes a new Thesaurus entry (e.g. at this trigger of filter 115), because it's in violation of WT:NORM. So either the Thesaurus namespace should be excluded from filter 115 or the practice should be changed to match NORM. -saph (user—talk—contribs) 03:12, 1 January 2025 (UTC)Reply
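The conflict described above can be illustrated with a small check. This is only a sketch under stated assumptions: the function name `blank_line_before_first_l2` is hypothetical, the test for an L2 heading is a simplification, and the actual logic of filter 115 is not shown in this thread, so the code does not claim to reproduce it.

```python
import re

def blank_line_before_first_l2(wikitext: str) -> bool:
    """Return True if at least one blank line separates the
    {{ws header}} call from the first level-2 heading.

    Hypothetical helper: it only looks at the lines directly after
    the header template and assumes the header sits on one line.
    """
    lines = wikitext.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("{{ws header"):
            saw_blank = False
            for following in lines[i + 1:]:
                if following.strip() == "":
                    saw_blank = True
                    continue
                # First non-blank line: is it an L2 heading such as ==English==?
                is_l2 = re.fullmatch(r"==[^=].*?==", following.strip()) is not None
                return saw_blank and is_l2
            return False
    return False
```

A bot or report built on something like this could list the Thesaurus pages that follow the current practice, which would show how many pages are affected by either resolution.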

2024 – Top pageviews statistics

The top for en.wiktionary.org (unfiltered list from dump files) is:

	25840383 Special:Search
	21800424 Wiktionary:Main_Page
	 1993839 Appendix:Glossary
	 1645478 rainbow_kiss
	 1506823 xxx
	 1451758 -
	 1244194 黑料
	 1145183 吃瓜
	  938861 Category:English_swear_words
	  681510 bokep
	  672460 I'll
	  633276 视频
	  599259 XXXX
	  598818 aww
	  567081 colmek
	  527691 麻豆
	  523495 bocil
	  493837 Appendix:Protologisms/Long_words/Titin
	  463888 Appendix:Filipino_surnames
	  444811 Wiktionary:International_Phonetic_Alphabet
	  427258 XXX
	  395379 لا_إله_إلا_الله_محمد_رسول_الله
	  395192 
	  386054 «
	  377134 astaghfirullah
	  356207 変態
	  349388 Category:English_surnames_from_Old_English
	  343450 سکس
	  342540 ‘
	  338717 pajeet

Details and other WMF projects: https://archive.org/details/2024-top_2k_user_pageviews Dušan Kreheľ (talk) 08:49, 1 January 2025 (UTC)

Well, rainbow kiss is the most popular. I wonder which website(s?) are sending people (or bots) to it. Also, we can assume it is the favourite sexual act on en.wiktionary. I wonder what Wiktionnaire's top ~~sex act~~ ~~word~~sex act is... Father of minus 2 (talk) 22:22, 12 January 2025 (UTC)Reply

Pronunciation of irregular plurals

Currently there is no way of knowing how to pronounce, for example, ibices or sphinges. JMGN (talk) 00:46, 2 January 2025 (UTC)Reply

@JMGN: As īʹbĭsēz, /ˈaɪbɪsiːz/ and sfĭnʹjēz, /ˈsfɪnd͡ʒiːz/, respectively. 0DF (talk) 13:30, 4 January 2025 (UTC)Reply

@Chuck Entz: Should we add them to the headword entries too, since they appear there and are irregular? Namely, in ibex & sphinge. JMGN (talk) 18:03, 4 January 2025 (UTC)Reply

Then add the pronunciations or add a pronunciation request? -saph (user—talk—contribs) 03:42, 2 January 2025 (UTC)Reply

To all of them? I thought this was Beer parlor... JMGN (talk) 12:17, 2 January 2025 (UTC)Reply

There's nothing preventing anyone from adding them now, hence, no need to bring the issue to Beer parlour. See, for instance, vertices or indices. Andrew Sheedy (talk) 18:33, 2 January 2025 (UTC)Reply

Really surprized that this cannot be automated as the rest of the pronunciations though... JMGN (talk) 21:21, 3 January 2025 (UTC)Reply

English is a special case: it represents the collision of two branches of Indo-European, followed by a thousand years of history including serving as the main language in two whole continents and in numerous countries worldwide, and as a second language for over a billion people. It has huge numbers of loanwords coming from languages all over the world throughout known history. On top of all that, it has no authoritative standard. Although it may be possible to automate the pronunciation, it would be a huge project and would probably add considerably to system overhead on the million+ pages where it would be deployed. Chuck Entz (talk) 02:01, 4 January 2025 (UTC)Reply

Hard to believe that there're over a million entries with irregular plurals... JMGN (talk) 12:57, 4 January 2025 (UTC)Reply

There don't have to be. What is there about irregular plurals that requires special treatment? Chuck Entz (talk) 14:01, 4 January 2025 (UTC)Reply

@JMGN: We have 19,653 entries for English nouns with irregular plurals. 0DF (talk) 15:43, 4 January 2025 (UTC)Reply

@0DF: Thnx. Let's do it then! JMGN (talk) 18:00, 4 January 2025 (UTC)Reply

"number" or "numeral"?

We currently have a POS "numeral", hence CAT:Numerals by language, CAT:English numerals, etc. but for some reason we have CAT:Cardinal numbers by language and CAT:Ordinal numbers by language not #cardinal numeral or #ordinal numeral. Category:Numerical appendices has a mixture of appendices called "Foo numbers" and "Foo numerals". I'd like to straighten this out by using a consistent naming scheme, probably numeral instead of number. [The root of the issue seems to be that numeral is normally taken to be a symbol (like 2, 3, 4 in the Hindu-Arabic system) that refers to a number, which is an abstract concept, but (a) whether numerical words like two, three, four are considered "numerals" or "numbers" is less clear (technically it appears they are numerals, being symbols of sorts, maybe more correctly signs, that refer to abstract numbers), and (b) in common parlance, the distinction between numeral and number is elided.] Mixing both terms is unhelpful, so if we can settle on consistent terminology, either "numeral" or "number", I can do the renames. Benwing2 (talk) 03:33, 2 January 2025 (UTC)Reply

I think I have a preference for numeral. Vininn126 (talk) 03:45, 2 January 2025 (UTC)Reply

If we are to standardize to one of them, then "numeral" is the better option of the two. But I wonder if we could instead come up with some consistent distinction between the two. — SURJECTION ^{/ T / C / L /} 07:58, 2 January 2025 (UTC)Reply

@Surjection Can you be more specific? What sort of distinction were you thinking of? Benwing2 (talk) 09:17, 2 January 2025 (UTC)Reply

Numerals are a POS, numbers are a semantic category. Ordinal numbers are often not numerals in many languages, same goes for adverbial numbers (once, twice) and fractional numbers (half, third), for instance.

In many languages, the numeral POS has different grammatical properties than other POS: in Finnish and Russian, it governs very specific cases on the adjacent noun, for instance. In many other languages, there is no numeral POS at all (e.g. Afar nammáy or Tokelauan lua). They are still numbers though.

So, basically, the appendices calling these "numerals" are 'wrong' (in the sense that they don't follow the above distinction), and should probably be standardised to numbers. Thadh (talk) 09:10, 2 January 2025 (UTC)Reply

I'm not sure where you got the idea that a numeral word has to be its own part of speech to be a numeral. See Ordinal numeral on Wikipedia. That seems a Thadh-ism (if I may call it that), which you've extrapolated from a handful of languages. Benwing2 (talk) 09:16, 2 January 2025 (UTC)Reply

OK, that was a bit snarky and I apologize for that, but I'm still confused as to where you've gotten your ideas from. Benwing2 (talk) 09:27, 2 January 2025 (UTC)Reply

Yes, it was... No worries. I don't think I've said that, but rather that for now that's the distinction we do (or should/could) handle. 'Numeral' as a grammatical category is very useful for these languages that do have one, whereas they still have a distinct semantic category including other parts of speech. We can and should make a distinction between the two, and I think calling them by the same name will lead to much more confusion than the status quo. Thadh (talk) 09:16, 3 January 2025 (UTC)Reply

I am not convinced of this. Not all Russian numbers work the same way by any means; they range from один (pure adjective) to миллион (pure noun), with in-between numbers getting progressively more noun-like and less adjective-like. I don't know Finnish but I wouldn't be surprised things are similar. If you want to make a number vs. numeral distinction, you need to spell out when one term is used and when another is used, and what renames need to happen; otherwise I have no idea what you're getting at. Benwing2 (talk) 09:38, 3 January 2025 (UTC)Reply

All right:

- number is used to denote any member of the semantic category/ies that denote a specified amount, position etc. that can be theoretically counted.

- numeral is used to denote any member of a syntactic category of nominals that exhibits syntactic behaviour not found in other nouns and adjectives, and is typically associated with amounts that can or cannot be counted.

Since POSs already are language-specific, I can't give you hard-and-fast rules where to use the latter: just like I can't give you a way to recognise a noun or a verb or an adjective, it differs by language. However, it is clear to me that there are languages where numerals are nominals (in flectional languages, they can usually be inflected, in others, they can be used as a head of a nominal phrase) that do not syntactically behave the same way as nouns or adjectives or (if those exist) determiners, and have very specific rules of governing the head noun.

Maybe indeed the numeral 'one' (yksi) could better be analysed as an adjective, but that doesn't take away that kaksi is neither an adjective nor a noun, as it agrees in the oblique cases but takes a partitive-case noun in the nominative (non-agreement). Same thing for Russian два (dva): две девушки, but двум девушкам - partial agreement, not identical to adjectives or nouns. Thadh (talk) 09:55, 3 January 2025 (UTC)Reply

Extended Mover request: User:Rex Aurorum

Hello. I'd like to request extended mover rights, mainly to be able to fix issues: 1. Typos made by earlier editors (non-Indonesian speakers) 2. Typos made by myself (frequently made typos in certain clusters) ―Rex Aurōrum^{｢Disputātiō｣} 10:29, 2 January 2025 (UTC)Reply

Nominated at WT:WL. Svārtava (t ɕ) 16:18, 5 January 2025 (UTC)Reply

Sundanese main entries

I noticed that Sundanese entries have the main entries in the Sundanese script, but according to @Udaradingin, the most common script used nowadays is the modern script. Shouldn't the main entries (and possibly also links) be moved to Latin-script entries, just like how Tagalog uses Latin script and the Baybayin spelling is just shown as an alternative spelling? Thanks. 𝄽 ysrael214 (talk) 08:51, 3 January 2025 (UTC)

I agree. There are some Sundanese entries in Latin script that list both the definition and a redirecting link to the Sundanese-script version of said entry. I think it would be great if the su-noun template showed the Sundanese-script form as an alternative spelling (as also seen in ms-noun or tl-noun). For now, cleanup of some entries is underway. Udaradingin (talk) 10:08, 10 January 2025 (UTC)

Stray Arabic-script digit entries

We have entries for the main series of Arabic-Indic digits in Unicode, and for what Unicode refers to as the Extended Arabic-Indic digits:

Comparison of digits
Digit	Main series	Main series language sections	Extended series	Extended series language sections
1	١	Translingual	۱	Ottoman Turkish, Persian, Punjabi, Urdu
2	٢	Translingual	۲	Ottoman Turkish, Persian, Punjabi, Urdu
3	٣	Translingual	۳	Ottoman Turkish, Persian, Punjabi, Urdu
4	٤	Translingual, Ottoman Turkish	۴	Persian, Punjabi, Urdu
5	٥	Translingual, Ottoman Turkish	۵	Persian, Punjabi, Urdu
6	٦	Translingual, Ottoman Turkish	۶	Pashto, Persian, Punjabi, Urdu
7	٧	Translingual	۷	Ottoman Turkish, Persian, Punjabi, Urdu
8	٨	Translingual	۸	Ottoman Turkish, Persian, Punjabi, Urdu
9	٩	Translingual	۹	Ottoman Turkish, Persian, Punjabi, Urdu
0	٠	Translingual	۰	Ottoman Turkish, Persian, Punjabi, Urdu

As you can see, the main series are all straightforward Translingual entries like we have for the Latin-script digits, though a few also have Ottoman Turkish entries. The Extended series, however, are all nothing but entries for individual languages. What's more, many of them have no headword templates, and the ones that do treat them as entries for the spelled-out word that the symbol represents in that language. I did my best to fix the entries at ۱ (in the Extended series), but then I realized that there shouldn't be entries for specific languages at all, just a Translingual section at the top.

I'm not sure what the Translingual entries for these characters should look like- some, at least, seem like variants used in some Arabic-script languages, but not others. Others seem like identical glyphs that are separate due to some quirk in the early history of Unicode. There's a task listed in WT:Todo for fixing entries that use the wrong character for a given language, so there's probably a lot more to this.

I do think that all of the pages in both series should have only a Translingual section, and all the other language sections should be merged into the spelled-out versions that the digits represent (if there's anything worth keeping). I didn't see any idiomatic senses like "4" used for "for" in texting.

The main problem is that I'm not really proficient in these languages, so I'm not sure how, exactly, to fix this- but I am sure it needs to be fixed, somehow. Thanks, Chuck Entz (talk) 01:28, 4 January 2025 (UTC)Reply
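For anyone checking which character a given entry actually uses, the two series can be told apart by code point. The sketch below uses only Python's standard `unicodedata` module; the names `MAIN`, `EXTENDED` and `describe` are illustrative and not part of any Wiktionary tooling.

```python
import unicodedata

# Main series: U+0660..U+0669; Extended series: U+06F0..U+06F9.
MAIN = [chr(0x0660 + d) for d in range(10)]
EXTENDED = [chr(0x06F0 + d) for d in range(10)]

def describe(ch: str) -> tuple:
    """Return (code point label, Unicode character name, digit value)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch))

for d in range(10):
    # Both characters carry the same digit value but are distinct code points.
    print(d, describe(MAIN[d]), describe(EXTENDED[d]))
```

Both series have the same numeric value per digit, so only the code point (or the character name) separates an entry title in one series from its counterpart in the other.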

You are making sense. The Ottoman entries were added by @Moonpulsar in 2023, the Persian and Urdu ones in 2006 and 2007, when rules, standards and consistency on Wiktionary had not yet developed to the extent relevant now and formatting was wild. The analogy to non-Arabic-script languages suggests that we keep only Translingual entries, perhaps even with hard redirects for alternative forms.

I have seen both series of numbers in Arabic, Persian, and Ottoman prints alike. A second series did not need to be encoded by Unicode in the first place; it could have been relegated to fonts varying the display by language, just as italic б looks different depending on whether it is Serbian or Russian, and less clearly Bulgarian.

The numbers did not even have distinct names from the ones we use in Europe, once again I note that the terminology of Eastern Arabic numerals vs. Western Arabic numerals is Wikipedia’s citogenesis, with their frequent problem of citing terms added to some list but barely used, in case anyone attempts to conceive what Wiktionary has to portray. Fay Freak (talk) 02:18, 4 January 2025 (UTC)Reply

Affix template standardization

The templates prefix, suffix and confix (and their shortcuts pre, suf and con, respectively) can all be handled by affix (and its shortcut af). The template compound (and its shortcut com) can also be handled by af, although compound+ (and its shortcut com+) provides additional text that is not currently replicable with af. Both pre and suf are designated as "less-preferred" on category pages in favor of af, so it appears that af is the de facto standard. However, the other templates can still be found on many pages, so they will need to be converted to af. Once that is done, the templates prefix, suffix and confix (and their shortcuts pre, suf and con, respectively) can be formally deprecated, similar to circumfix. Netizen3102 (talk) 17:11, 4 January 2025 (UTC)

What is the rationale? By analogy, changing all the for loops in a computer program into while loops reduces the number of keywords but why is that better? Makes it harder to understand. 2A00:23C5:FE1C:3701:C9DA:1ED4:BE2C:8235 17:13, 4 January 2025 (UTC)Reply

One rationale is that it would be easier for NEW users. Less template complexity means a lower barrier to entry. The downside is that these are fairly widely used templates and editors who have been using them for a while will have to adjust. I still think we should try to make it easier for new people, however, even if it is in a small way. Vininn126 (talk) 17:15, 4 January 2025 (UTC)

Just remember that most people know what prefixes and suffixes are, but quite a few have never heard of affixes. Chuck Entz (talk) 19:51, 4 January 2025 (UTC)Reply

What would be interesting is if we were to merge these templates, would we suddenly see threads popping up about the lack of {{pre}} and {{suf}}, asking how to deal with prefixes and suffixes? A hypothetical, to be sure, but I think an interesting one. Vininn126 (talk) 19:55, 4 January 2025 (UTC)Reply

This can be addressed by having a few well maintained perfectly formatted role model entries for each language, provided as examples for new users. Similar to how parrot is used as an example for Wiktionary:Quotations. --Ssvb (talk) 12:55, 5 January 2025 (UTC)Reply

Oppose: I don't think consolidating the complexity of these several templates into one would really make things easier for new editors. — excarnateSojourner (ta·co) 01:19, 6 January 2025 (UTC)Reply

The af template requires dashes similar to how prefixes and suffixes are traditionally written, whereas pre, suf and con do not. For example, (af) un- +‎ do and (pre) un- +‎ do both produce the same output, but pre does not require the dash after the prefix, which could be confusing to editors. Netizen3102 (talk) 17:22, 4 January 2025 (UTC)Reply
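To make the dash difference concrete, here is a rough sketch of what a conversion from pre/suf to af could look like. It is not an existing bot: the function `to_af` is hypothetical, and it deliberately handles only the simplest case of a language code plus two positional parameters, leaving anything with named parameters or more parts untouched.

```python
PREFIX_NAMES = {"pre", "prefix"}
SUFFIX_NAMES = {"suf", "suffix"}

def to_af(template: str) -> str:
    """Rewrite a simple {{pre|lang|x|y}} or {{suf|lang|x|y}} call as {{af|...}}.

    Assumption: exactly three positional parameters and no named ones;
    any other call is returned unchanged.
    """
    text = template.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        return template
    parts = text[2:-2].split("|")
    name, args = parts[0].strip(), parts[1:]
    if name not in PREFIX_NAMES | SUFFIX_NAMES:
        return template
    if len(args) != 3 or any("=" in arg for arg in args):
        return template
    lang, first, second = args
    if name in PREFIX_NAMES:
        # af needs the trailing dash to know the first part is a prefix.
        if not first.endswith("-"):
            first += "-"
    else:
        # af needs the leading dash to know the second part is a suffix.
        if not second.startswith("-"):
            second = "-" + second
    return "{{af|" + "|".join([lang, first, second]) + "}}"

print(to_af("{{pre|en|un|do}}"))    # {{af|en|un-|do}}
print(to_af("{{suf|en|do|able}}"))  # {{af|en|do|-able}}
```

The dash is the only thing that tells af which role each part plays, which is why a converter has to add it wherever pre or suf allowed it to be omitted.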

I also personally find it easier to keep track of affixes when the dashes are present - the other prefixes do ALLOW for dashes, but they are not required. Vininn126 (talk) 17:25, 4 January 2025 (UTC)Reply

I'm not convinced at all. Continuing the for/while analogy, that's like pointing out that while only needs a single expression and doesn't require semicolons or a "stepwise rule". Sure! But that's why we don't use it for everything. Because humans are not instruction sets, and benefit from context. I'm sure there are ways to DRY it by having one template call into another. Reducing everything to eventual Turing tape is nasty. 2A00:23C5:FE1C:3701:C54E:F82E:FAA1:E7A5 05:54, 6 January 2025 (UTC)Reply

Fewer varieties of different templates make Wiktionary more machine readable, even though I'm not sure whether this is considered to be a desirable goal. --Ssvb (talk) 19:11, 4 January 2025 (UTC)Reply
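
To make the dash point above concrete, here is a minimal Python sketch. It is not the actual logic behind {{af}} or {{pre}}; the function names `format_affix` and `format_prefix` are hypothetical and the output format is simplified. It only illustrates the difference described in the thread: the af-style call takes the parts as written, hyphens included, while the pre-style call supplies a missing hyphen itself. It also shows the "one template calling into another" idea, since the pre-style helper just delegates to the af-style one.

```python
def format_affix(*parts: str) -> str:
    """af-style (hypothetical): parts are joined exactly as written,
    so the editor must type the hyphen that marks a prefix or suffix."""
    return " + ".join(parts)


def format_prefix(prefix: str, base: str) -> str:
    """pre-style (hypothetical): the hyphen is allowed but not required;
    it is added when missing, then the work is delegated to format_affix."""
    if not prefix.endswith("-"):
        prefix += "-"
    return format_affix(prefix, base)


print(format_affix("un-", "do"))
print(format_prefix("un", "do"))
```

Under these assumptions both calls print the same line, which matches the observation that the two templates produce identical output from differently written input.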

Category:Ojibwe stem-building elements

There are a few Ojibwe entries that keep showing up in WT:Todo lists because they're not in Category:Ojibwe lemmas or Category:Ojibwe non-lemma forms, and lots more that don't show up in the lists, but have similar problems.

First, some background: as with many American Indian languages, Ojibwe is polysynthetic, meaning that it uses mostly complex systems of morphemes bound together instead of separate words. That makes it hard to analyze Ojibwe grammar using the categories established for the better known European languages. There are prefixes, suffixes, infixes, and circumfixes that attach, not just to a central root or stem, but also to each other in very complicated ways.

Apparently Ojibwe carries this even farther by having stems that are made up of separate sub-elements: initials, medials, and finals, as explained on the page for Category:Ojibwe stem-building elements. These aren't completely arbitrary: they each have specific roles and carry specific types of information.

For 5 months in 2020, @SteveGat spent a great deal of time expanding our coverage of Ojibwe, but in ways that never really got integrated into our POS headers and categories. I would like to do that part now.

The question is: how should we do that? I can see a few approaches:

Make initials, medials and finals into prefixes, suffixes, and/or infixes
Make all of them just plain morphemes
Add them to the modules as lemmas

For the first two options, we would want to have secondary categories to preserve the information. These entries already have secondary categories such as Category:Ojibwe noun finals and Category:Ojibwe verb finals and tertiary categories attached to those. For the third option, we would want to also integrate the new lemma types into the category-tree modules so the categories can use {{auto cat}}. For that matter, we could do the same for the secondary and tertiary categories no matter what we do with the rest.

I should also mention Category:Ottawa initials, which indicates that there are probably more languages with similar issues that I don't know about. That one shows up in Wiktionary:Todo/Lists/Uncategorised pages (all namespaces)#Category, but there may be more with categories added by hand. Chuck Entz (talk) 19:38, 4 January 2025 (UTC)Reply

I forgot to ping @-sche, who knows a lot more about Algonquian languages like this one than I do. Chuck Entz (talk) 19:40, 4 January 2025 (UTC)Reply

That's also a feature of other Algonquian languages like Cree (however that language(s) happens to be treated on Wiktionary), though Ojibwe is the one with the most content. Circeus (talk) 17:21, 5 January 2025 (UTC)Reply

@Circeus: Most of the language is treated under Plains Cree, although several dialects have their own categories. The macrolanguage is a leftover that for some reason still hasn't been deleted, even though it's a hazard. Thadh (talk) 14:56, 20 January 2025 (UTC)Reply

I can't say I know a lot about Ojibwe but in general I would prefer to try and fit things like initials, medials and finals into existing categories like prefixes, suffixes and infixes rather than just use the language-specific terminology directly. This latter approach, in the extreme, leads to a proliferation of lemma types that is singularly unhelpful, e.g. as was done with Lojban, where someone added Lojban-specific lemma types cmavo, cmene, fu'ivla, gismu, lujvo and rafsi to Module:headword/data. I and most people can't tell a gismu from a Ginsu knife, making these terms completely opaque. I went through a year or two ago and tried to rewrite the opaque Lojban grammatical terminology into the most similar comprehensible term, hence the terms in the category Category:Lojban gismu now have a header "Root" instead of "gismu"; similarly "Predicate" in place of "lujvo"; etc. The actual categories haven't yet been renamed but should be. IMO if there's a one-to-one mapping between initial <-> prefix, final <-> suffix, etc. there is no need to have the same term categorized into both CAT:Ojibwe prefixes and CAT:Ojibwe initials (just use the former), but if there is some extra information in the CAT:Ojibwe initials category, I am not averse to having the term categorized both ways. I can add the {{auto cat}} support for language-specific (or family-specific) terminology like "initials", "finals", etc.; this is not hard as the underlying functionality for language-specific categories is already present. The only other thing I'd add is that we have nastily-named categories in Special:WantedCategories like Category:Unami animate intransitive (vai), Category:Unami verb transitive inanimate and Category:Unami inanimate intransitive verb. 
I remember having a discussion with someone (probably the same SteveGat) about putting these into separate Category:Unami animate verbs and Category:Unami intransitive verbs categories; he eventually convinced me that there is a reason for combining them, as apparently a "transitive inanimate" verb is a different beast from an "inanimate intransitive" verb, not just the transitive equivalent. But these definitely should use the Wiktionary standard naming format Category:Unami inanimate intransitive verbs and such, not Category:Unami verb inanimate intransitive or some other weirdness, even if the latter is the standard format used in the grammar of these languages. Benwing2 (talk) 06:07, 6 January 2025 (UTC)Reply
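
As a rough illustration of the renaming described above, the following Python sketch maps the three nonstandard Unami category names quoted in this thread onto the "Category:Language class verbs" pattern. Everything here is an assumption made for illustration: the function name `normalize_verb_category` is hypothetical, the language name is assumed to be a single word, and nothing is implied about how {{auto cat}} or the category-tree modules actually work. Matching on the set of words, rather than their order, keeps "transitive inanimate" and "inanimate intransitive" distinct, since they differ in the words themselves.

```python
# The four traditional Algonquian verb classes, keyed by their identifying words.
VERB_CLASSES = {
    frozenset({"animate", "intransitive"}): "animate intransitive",
    frozenset({"inanimate", "intransitive"}): "inanimate intransitive",
    frozenset({"transitive", "animate"}): "transitive animate",
    frozenset({"transitive", "inanimate"}): "transitive inanimate",
}


def normalize_verb_category(name: str) -> str:
    """Rewrite e.g. 'Category:Unami verb transitive inanimate'
    as 'Category:Unami transitive inanimate verbs'."""
    namespace, _, rest = name.partition(":")
    language, *words = rest.split()  # assumes a one-word language name
    # Ignore parenthesised abbreviations such as "(vai)" and the word "verb(s)".
    key = frozenset(
        w.lower()
        for w in words
        if not w.startswith("(") and w.lower() not in {"verb", "verbs"}
    )
    verb_class = VERB_CLASSES.get(key)
    if verb_class is None:
        raise ValueError(f"unrecognised verb class in {name!r}")
    return f"{namespace}:{language} {verb_class} verbs"


for old in (
    "Category:Unami animate intransitive (vai)",
    "Category:Unami verb transitive inanimate",
    "Category:Unami inanimate intransitive verb",
):
    print(old, "->", normalize_verb_category(old))
```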

How about...

creating a template called "all", that can do everything? You just need to know what to put in the parameters, as described in the template documentation subpages (we're limited to 2 MB per page, so there would have to be a number of them). In other news, there's a new Swiss Army Knife™ that can do thousands of different things. The only problem: with all the attachments, it's over a meter wide...

There's a certain amount of complexity inherent in any given task. The question is, how do we distribute it?

With lots of templates, we don't have to know as much to do any one task. With fewer templates, we have to know about more things to do one task, but if we do multiple tasks, the information is in fewer places.

We should be thinking about what tasks go together, and have one template that does the things that go together, but multiple templates to do things that don't.

Also, we need to think about the range of things the individual user deals with: someone who edits Mandarin Chinese needs to know about Han characters, tones, and various particles like classifiers, but not affixes or grammatical gender. Someone who works with most European languages, on the other hand, needs to know about the morphology for things like cases, gender, number, mood, voice, tense, aspect, etc. Someone who works with Celtic languages needs to consider the interactions in sounds between syllables, separate words, and even sentences, while a Hawaiian doesn't really encounter synchronic phonological changes in some vowels, and nothing at all in consonants.

Considering this, we should think about whether all of those people need to use the same templates for everything. Yes, we have specialized templates for specific languages that do extra things, but we should also think about whether to have templates for specific languages that do less so users don't have to think about as much. We've been deleting lots of language-specific templates that can't do things that the general templates can, but also can't do anything that the general templates can't. The question that doesn't get asked is: are the things that the templates can't do things that editors in those languages will want to do?

Another thing to think about: knowing that a template called "xyz-noun" is all you need for headwords in language-xyz noun entries should make it easier to get started in that language. If it doesn't have features needed for that language, you can always use {{head}}, or learn to customize it. It's also nice to have things that are just for you and your community of editors.

That's not to say that such things should be used as barriers to keep others out or as a way to claim ownership over language entries or anything else. All of the things I mentioned above should be considered as needed, but shouldn't override all the other things we already look at: I'm talking about broadening the discourse, not replacing it. Chuck Entz (talk) 21:45, 4 January 2025 (UTC)

How about we not be so reliant on templates? There's too many templates as is and they change way too frequently. (And while we're at it, you shouldn't be required to code to create or edit a category). Purplebackpack89 16:12, 5 January 2025 (UTC)Reply

Lua may have allowed incredible flexibility in templates, but it has also made them impossible to edit for 99% of people. I do not consider this a good thing. Circeus (talk) 17:23, 5 January 2025 (UTC)Reply

Plus, any given template relies on a stack of dependencies that is completely impenetrable. I've given up trying on many templates and modules and I'm more knowledgeable than your average person (but still pretty ignorant about programming). —Justin (koavf)❤T☮C☺M☯ 17:24, 5 January 2025 (UTC)Reply

Hear, hear DCDuring (talk) 17:39, 5 January 2025 (UTC)Reply

Lua is by far nicer, cleaner and easier to read than wiki templates. Complex templates are way too cryptic. --Ssvb (talk) 21:27, 5 January 2025 (UTC)Reply

The closest thing I've ever come up with is a universal definition template. I don't understand the bellyaching here, where we already have to deal with tons of functions, regardless of language (those that think you can't are sorely mistaken) - on the other hand, I'm not sure you can create something THAT universal, at least not in one fell swoop. Vininn126 (talk) 17:50, 5 January 2025 (UTC)

I think the point is that templates shouldn't be designed for those who put in more than 30 hours a week on Wiktionary and/or have IQs over 200. {{en-noun}} is wonderfully powerful, but even high-volume contributors have had trouble with the keystroke-saving features using "+", "-", "~", not to mention the complexities of auto-pluralization. Why should users have to consult the documentation every third time they try to use the template? DCDuring (talk) 19:43, 5 January 2025 (UTC)Reply

Sweet Heaven, you are singing my song. At the very least, there's no reason to not have more intelligible fall-back aliases like "plural=[foo]" or something for a normal person who casually edits. If you've ever edited SVGs at c: and tried to use c:Template:Valid SVG and its successor Templates, it's completely infuriating how clipped and counter-intuitive all the inputs are. —Justin (koavf)❤T☮C☺M☯ 19:58, 5 January 2025 (UTC)

@Chuck Entz What is this in reference to? Is there something specific you're annoyed about? Benwing2 (talk) 21:33, 5 January 2025 (UTC)Reply

I'm going to take a wild guess and say that it is not a matter of one or a few templates or even one or a few types of templates, but rather an attitude toward usability and the population of potential contributors. DCDuring (talk) 03:04, 6 January 2025 (UTC)

The downside is that the "xyz-noun" templates for inflected languages are enormously difficult to use and have a steep learning curve. Very few of the new editors can use them correctly on their first try, so their initial edits tend to need corrections. At least that's what I observed when looking at the new Belarusian entries added by new people. And I suspect that many potential new editors probably just give up rather than contributing incorrect edits if they notice problems in the previews of their edits. --Ssvb (talk) 21:42, 5 January 2025 (UTC)Reply

@Ssvb I agree that many of the xyz-noun templates are complex, but I'm not sure there's anything much that can be done about this. The root of the issue is that your typical inflection system is itself quite complex, and if you want to support the system fully, the template itself will necessarily be complex. One alternative is only to support the most regular inflections, but (a) most people are more interested in the harder, less regular words, which also are usually the most common words; and (b) the templates are already designed (at least the ones I've designed) so they have sensible defaults in most cases that make it relatively easy to specify the inflection of words with regular declensions or conjugations. Another alternative is to require people to specify a lot more information manually in the case of irregular inflections (e.g. just type out the entire inflection by hand); on the surface that may make it easier to enter for a native speaker who knows the inflection but doesn't want to or can't figure out the syntax of something like {{be-ndecl}}. But in practice (a) it's extremely tedious, with the result that a lot of words never get inflections; (b) it leads to lots of mistakes. Whenever I design a new template for entering the noun, verb or adjective inflection of language Foo and convert old template uses, I invariably find tons of mistakes due to bad design in the previous template where too much info has to be given manually. So I'm not really sure what a better approach would be. Benwing2 (talk) 05:38, 6 January 2025 (UTC)Reply

th-cls

(Notifying Alifshinobi, Octahedron80, YURi, Judexvivorum, หมวดซาโต้, Atitarev, GinGlaep, RichardW57, Noktonissian):

I've made an inline classifier template for Thai similar to {{zh-mw}}. Here's an example of what it looks like:

(Classifier: ลูก (lûuk); ใบ (bai); ผล (pǒn); หวี (wǐi); เครือ (krʉʉa))

Are there any objections to me moving it into mainspace / any feedback? - saph ^_^^⠀talk⠀ 00:28, 6 January 2025 (UTC)Reply

I already made {{cls}}, which can be used by many languages, not only Thai. (Tai languages and Vietnamese also use classifiers.) I oppose making a template for Thai only. You should expand this template instead. (It is used a lot at thwikt.) --Octahedron80 (talk) 01:07, 6 January 2025 (UTC)

I'm not sure I would agree that this is a situation where a one-size-fits-all template is ideal, especially not a wikitext template. {{th-cls}} has automatic translit where {{cls}} does not, for one. - saph ^_^^⠀talk⠀ 01:13, 6 January 2025 (UTC)Reply

That is because I add tr=- to suppress transliteration, since it results in too many parentheses. There is no need to show them all. --Octahedron80 (talk) 01:17, 6 January 2025 (UTC)

See th:อาทิตย์ th:ຄຳ th:ᦋᦲᧃᧉ th:ကျား for example. --Octahedron80 (talk) 01:25, 6 January 2025 (UTC)Reply

Didn't notice that, fair enough. I'll wait for other people to comment. - saph ^_^^⠀talk⠀ 01:20, 6 January 2025 (UTC)Reply

@Saph I support @Octahedron80's view that we should have a single language-independent {{cls}} template, since there are a lot of languages with classifiers and otherwise we'd end up with a proliferation of incompatible and subtly different templates. This template can have language-specific behaviors for certain languages if it makes sense to do so, e.g. we could make the default transliterating and turn it off for certain languages. (IMO however, transliteration should usually be enabled, since most non-Latin scripts are unfamiliar and hard to read for the average Wiktionary user; it might make sense, for example, to turn off translit in some circumstances for Greek and Cyrillic, which aren't so hard to read and with which many people will be familiar, but for most scripts transliteration is helpful. If the issue with transliteration is display-related, we should be able to come up with a display format that works better.) If Thai needs some special behavior of some sort, that could be supported under the hood in {{cls}}.

@Octahedron80 My main complaint about {{cls}} is not its implementation but the default positioning before the headword. This is nonstandard (we usually put labels and other information after the headword) and IMO looks bad. If you're OK with it, I can do a bot run moving the {{cls}} invocations after the headword. Benwing2 (talk) 05:27, 6 January 2025 (UTC)Reply

~~No. Don't do that.~~ Originally we put classifier(s) after th-noun (and lots of Tai noun headwords). But there are many cases where a sense cannot share the same classifier(s) with other senses, or where some senses cannot have a classifier at all. So the template cls was born to add classifiers per sense (just like zh-mw, you know; what is mw anyway?). --Octahedron80 (talk) 05:49, 6 January 2025 (UTC)

About transliteration, you can turn the tr display on or off as you like. By the way, zh-mw doesn't show pinyin, so I just followed that. --Octahedron80 (talk) 06:10, 6 January 2025 (UTC)

About the Tày language, the template is not intended to be used before the headword in Tày entries, but someone is already using it that way widely, and I cannot stop them. Their classifiers should be integrated into the Tày tyz-noun template, as with Vietnamese vi-noun. See bó for comparison. If tyz-noun supported classifiers itself, we could remove cls there. --Octahedron80 (talk) 05:54, 6 January 2025 (UTC)

@Octahedron80 You are misunderstanding me. I'm not objecting to putting classifiers per sense, following the sense definition. What I'm objecting to is putting the classifier directly *before* the headword. If it goes on the headword line, it needs to follow. So I'm suggesting moving {{cls}} uses from before the headword to after the headword. BTW this is largely with Vietnamese, not with Tày or Thai. If it's better to not have it on the headword line at all, but instead on a sense line, that's fine, but I can't do that by bot; in the meantime it's better to have the classifiers after the headword than before. And I assume the issue with per-sense classifiers occurs with all languages using classifiers (since classifiers are essentially semantic-based), so I don't see how it's useful to integrate classifiers into the headword. BTW "mw" means "measure word". See measure word and classifier on Wikipedia. Benwing2 (talk) 06:13, 6 January 2025 (UTC)

Tày and Vietnamese use the classifier before the noun (same as Chinese), unlike other Tai languages, which use the classifier after the number and noun. Does Wiktionary need to show the classifier in the headword before the noun? If you asked Vietnamese users, they would say yes, I guess. --Octahedron80 (talk) 06:28, 6 January 2025 (UTC)

Whether the classifier comes before the noun or after the noun in the grammar of the language has nothing to do with where we should put the classifier in the headword. All headword-related information always goes after the headword itself. There is no other situation that I know of where we put any headword-related information before the headword. Thus, putting the classifier before the headword is highly nonstandard and looks really awful (IMO) and janky. So it's important we move its position. Benwing2 (talk) 06:34, 6 January 2025 (UTC)Reply

Okay. You can move cls to the end of the Tày headword for now, until we can make the tyz (and vi?) templates better. --Octahedron80 (talk) 06:39, 6 January 2025 (UTC)

@Octahedron80: Aside: I think classifier before noun is quite common amongst Tai languages in northern regions - quite possibly alignment with Chinese. --RichardW57 (talk) 08:18, 6 January 2025 (UTC)Reply

If we wanted to do per-language transliteration (/per-language turning off transliteration), would we keep it as wikitext? That seems like it would make the template a lot less readable. - saph ^_^^⠀talk⠀ 11:33, 6 January 2025 (UTC)Reply
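
For what it's worth, the per-language default discussed above is easy to express outside wikitext. The Python sketch below is only an illustration under stated assumptions: the names `TRANSLIT_OFF_BY_DEFAULT`, `show_translit` and `format_classifiers` are hypothetical, the two language codes in the set are placeholders rather than a proposal, and transliterations are passed in by hand instead of being generated automatically as {{th-cls}} does. The only behaviour taken from the thread is that tr=- suppresses transliteration.

```python
from typing import List, Optional, Tuple

# Hypothetical list of languages whose scripts are shown without
# transliteration unless the editor asks otherwise (placeholder codes).
TRANSLIT_OFF_BY_DEFAULT = {"el", "ru"}


def show_translit(lang: str, tr: Optional[str] = None) -> bool:
    """Decide whether to display transliteration for one invocation."""
    if tr == "-":  # explicit opt-out, as with tr=- in the thread
        return False
    return lang not in TRANSLIT_OFF_BY_DEFAULT


def format_classifiers(
    lang: str,
    classifiers: List[Tuple[str, str]],
    tr: Optional[str] = None,
) -> str:
    """Render (term, transliteration) pairs as an inline classifier note."""
    with_tr = show_translit(lang, tr)
    parts = [
        f"{term} ({translit})" if with_tr and translit else term
        for term, translit in classifiers
    ]
    return "(Classifier: " + "; ".join(parts) + ")"


thai = [("ลูก", "lûuk"), ("ใบ", "bai")]
print(format_classifiers("th", thai))
print(format_classifiers("th", thai, tr="-"))
```

A table-driven default like this keeps the language-specific part in one place, which is the readability concern raised about doing the same thing in wikitext.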

Category:Artsakh and subcats

Are these needed anymore? The Republic of Artsakh dissolved a year ago. 115.188.138.105 11:16, 6 January 2025 (UTC)Reply

Cf. Category:Soviet Union. The words still exist or existed in regular use. Why would we delete this category? —Justin (koavf)❤T☮C☺M☯ 11:19, 6 January 2025 (UTC)Reply

To OP's point, the category descriptions are worded as if Artsakh still exists. - saph ^_^^⠀talk⠀ 12:43, 6 January 2025 (UTC)Reply

What? His point was about the existence of the category, not the wording of the description. No one needs to start a conversation about modifying the module's wording (which I will do now). —Justin (koavf)❤T☮C☺M☯ 12:54, 6 January 2025 (UTC)Reply

https://en.wiktionary.org/w/index.php?title=Module%3Aplace%2Fshared-data&diff=83485136&oldid=83484470 —Justin (koavf)❤T☮C☺M☯ 12:58, 6 January 2025 (UTC)Reply

Well, there's a Category:Rivers in Artsakh but no corresponding Category:Rivers in the Soviet Union or other toponym categories. 115.188.138.105 20:19, 6 January 2025 (UTC)

Sure, but the premise is "this place no longer exists (i.e. the state was dissolved), therefore, should we delete the categories?" and the answer is "no". There may be some subcats that shouldn't have existed or should be deleted, but that's not because the breakaway republic has been reintegrated into Azerbaijan. —Justin (koavf)❤T☮C☺M☯ 11:19, 7 January 2025 (UTC)Reply

By that argument we should have categories for all polities that have ever existed. Category:en:Rivers in the Aztec Empire? I'd rather just have categories for currently-existing polities, as well as those which are of particular historical significance to particular languages (Category:la:Towns in the Roman Empire?). This, that and the other (talk) 12:00, 7 January 2025 (UTC)Reply

By what argument? My argument was "there are enough words about [topic] to have a category", so yes, if there are enough words about that topic, go for it. Why would polities be any different than sports or political movements or breads or any of the other things we have categories about? —Justin (koavf)❤T☮C☺M☯ 12:06, 7 January 2025 (UTC)Reply

As a general rule, geographic features such as rivers, cities, etc. are categorized according to current political boundaries. So there should be no Category:Rivers in Artsakh any longer. Possibly an exception could be made for cities that no longer exist, but I'm skeptical of that, and rivers usually don't come and go, so there's no reason to put anything in Category:Rivers in Artsakh. Benwing2 (talk) 00:10, 9 January 2025 (UTC)Reply

The river categories can go but village and city categories should stay, because the invaders have either destroyed or renamed them. For example, Karin Tak belongs in the Category:en:Villages in Artsakh because it was a village only when Artsakh existed. Now neither the population, nor the village nor the name are there anymore. Vahag (talk) 08:37, 9 January 2025 (UTC)Reply

Para-Nakh languages

I would like to discuss here the addition of a reference to the Para-Nakh languages to the etymology for the Nakh languages. In my opinion, it should work the same way Ancient Greek forms refer to a pre-Greek substrate. An explanation of this is given in detail in Johanna Nichols' work {{R:cau-nkh:Nichols:2004}}, particularly from page 145 onwards. I think this option works well for explaining forms with phonologically close but irregular correspondences. For example, (1) Ingush ӏаж (ˀaž, “apple”), Chechen ӏа̄ж (ˀaaž, “id.”); (2) Ingush нихь (niḥʳ, “hide, animal skin”), Chechen неӏ (neˀ, “id.”); (3) Ingush зӏамига (zˀamiga, “little, small”), Chechen жима (žima, “id.”); (4) Ingush чил (čil, “ashes”), Chechen чим (čim, “id.”); (5) Ingush муа (mwa, “scar”), Chechen мо (mo, “id.”); (6) Ingush миинг (miı̇ng, “alder”), Chechen маъ (maʔ, “id.”), муъ (muʔ), Bats მურყაჼ (murq̇ã, “id.”) → Georgian მურყანი (murq̇ani, “id.”) as a suggestion from user:კვარია; (7) Ingush шуа (šwa, “abomasum”), Chechen шуа (šwa, “id.”) and their doublet forms with normal development Ingush шоа (šoa, “id.”), Chechen шо (šo, “id.”) as my example. Still, I think it would be wrong to reconstruct the Proto-Nakh form on the basis of these irregular daughter forms. I would very much like to avoid a situation like the one with Proto-Finnic *omëna (“apple”), where the daughter forms have no regularity. If you have a better idea on how to handle it here, please let me know. @Vahagn Petrosyan, კვარია, Tollef Salemann, Tropylium, Chuck Entz, Thadh, Fay Freak, Surjection ɶLerman (talk) 15:36, 6 January 2025 (UTC)

Just to be clear, do you propose simply making a code for a pre-Nakh substrate? Thadh (talk) 16:08, 6 January 2025 (UTC)Reply

@Thadh Yes, that's right, although Nichols doesn't have that. ɶLerman (talk) 16:16, 6 January 2025 (UTC)Reply

Why can't you simply say "borrowed from a {{bor|ce|qfa-sub}} language"? That would put the term into Category:Chechen terms borrowed from substrate languages. By the way, I consider Category:Ancient Greek terms borrowed from a Pre-Greek substrate redundant to Category:Ancient Greek terms borrowed from substrate languages. I don't believe people who claim they can distinguish between different substrate sources within the same language. Vahag (talk) 17:18, 6 January 2025 (UTC)Reply

Ok, I'll try to use this template, thanks. ɶLerman (talk) 10:53, 7 January 2025 (UTC)Reply

Splitting WT:RFVE?

This page is one of the slowest-to-load high-usage pages we have. Despite User:Pious Eterino's best efforts, it has been above 700K almost all the time since 12/7/24. It would help to find ways to split it. I can imagine three basic ways:

by whether or not another dictionary has the challenged definition.
by whether or not the challenged definition is labeled as restricted geographically (or otherwise?).
by whether the challenged definition is hard to cite because it is for a term that is highly polysemic.

I don't know which one is the best to start with. The first might encourage people to at least look at a few other dictionaries (I like to use OneLook.com for convenient access to multiple dictionaries, but OED is an obvious resource for those who have access). The second would encourage those familiar with the restricted domain to focus their efforts on those areas. The third would be useful for isolating terms that should have long dwell time in RfV.

It might also be useful to use categories and subcategories of RfVed items to isolate, say, challenged UK- or Commonwealth-specific definitions and those with other attributes suggested above as bases for splitting the page. Such a category system could be applied in other languages as well. DCDuring (talk) 22:14, 7 January 2025 (UTC)Reply

In my view the problem is that the rate of new RFVs exceeds the capacity of cite-seekers to process the requests.

I would point the finger particularly at an IP-hopping user who has, in recent weeks, been posting large volumes of words from Webster, without (as far as I can tell) making any effort to assist with other requests. I believe it is incumbent on such users to help out by looking for cites for RFVs posted by others on the page. Perhaps we need to be a bit heavy-handed in making this into a proper obligation and enforcing it, a bit like Wikipedia's "quid pro quo" system for "did you know" entries on their Main Page.

I'd prefer to try this before splitting the page. But if a split is absolutely necessary, I think the best way would be to create a (hopefully temporary) subpage WT:Requests for verification/English/Old, which would be for RFVs of {{Webster 1913}} words, words/senses marked (obsolete) and the like. This, that and the other (talk) 23:30, 7 January 2025 (UTC)Reply

@This, that and the other: I like that idea. Alternatively, we could impose a rule of, say, no more than two nominations a day. (In the case of nominations by IP addresses, nominations originating from the same IP range would be deemed to be from the same editor.) Also, I have asked this IP to sign and date nominations. If this request is ignored, I feel the nominations should simply be removed. — Sgconlaw (talk) 23:41, 7 January 2025 (UTC)Reply

"incumbent on such users to help out" — I don't think that forcing Wonderfool to cite and close RFVs will end well. He'll just make stuff up etc. I like the rule idea though. Anyone can lazily RFV something without doing any research. 2A00:23C5:FE1C:3701:CC32:2372:B6E0:DBC7 20:42, 9 January 2025 (UTC)Reply

I think that the lowest-priority candidates for verification are words from very old dictionaries and words that can be found used by a "significant" author from history but so far no others. Yes, some truly dictionary-only words no doubt exist, but most words in old dictionaries probably have an original basis in fact. Also, the further back you go, the less and less searchable material we have, so the three-citation rule in practice becomes more stringent. I wouldn't mind a policy relaxation of the three "independent" cites rule for single-author words from hundreds of years ago. It is somewhat of a thankless task looking for some of these, and then if we don't find any further examples, what? We delete the entry? Does that really benefit anyone? Better to mark them "only known in X", I would say. Mihia (talk) 20:50, 10 January 2025 (UTC)

RFVs for old words are certainly lower-priority in some sense (hence why I think shunting to a subpage would work, if a split was really necessary).

Not sure if I can agree with your comments about keeping old hapaxes. We'd then have trouble deciding what was a legitimate use and what was simply a single author's error for another word, typo in EEBO etc. Even for the correctly written words, I'm not sure that a hapax of, say, Browne is any more valuable than a hapax made up by a modern author. Who, besides us, is reading Browne today anyway? His failed coinages are just clutter.

The situation is different for famous and enduring works of literature (Shakespeare etc). I was sad to see the "one use in a well-known work" rule abolished, and all the Shakespearianisms subsequently deleted, as these are entries that people actually use. (There's one at RFVE now: solidare.) I would have preferred to see the rule tightened to "one use in a work of enduring literary interest found on a community-agreed list of such works on the language considerations page". This, that and the other (talk) 22:32, 10 January 2025 (UTC)

@User:This, that and the other IMO we have plenty of uncited, low-quality English entries, so limiting the number of RfVs seems likely to guarantee ever-declining quality of our English dictionary.

"incumbent on such users to help out" Do we want to have a "social credit" system to evaluate contributors, with only sufficiently net-positive contributors allowed to use credits to make RfVs? I think not.

Our rules allow items that have been in RfV for 30 days without sufficient citations to be deleted. Only about 20% of the RfV page is from items added after November 30, 2024. Most discussions peter out after a week. Perhaps the best thing would be to give User:Pious Eterino two hands: 1., a round of applause for his efforts to move resolved items off the page and, 2., assistance in those efforts. We should also acknowledge and follow the efforts of User:-sche and others to try to bring older items to a conclusion.

Some of the older unresolved items may be there because of unresolved policy issues, eg, Discordian#Etymology 2 (durably archived media). But many just seem to need decisiveness. DCDuring (talk) 23:05, 10 January 2025 (UTC)Reply

I do not agree that entries that are, on the face of it, plausible, i.e. excluding "obvious rubbish", should be automatically deletable after 30 days simply because no citations have been added. It could be that they were listed in error and nobody has looked at them. I think there should be some kind of check or safeguard, even with many listings and limited resources. Mihia (talk) 23:40, 10 January 2025 (UTC)

Further on this point, although my view is that few people actually bother to read instructions, I do think that the RFV pages should, even so, in the blurb at the top, ask editors to record negative findings, even if this just repeats what someone else has said. I wonder whether everyone does this. If after 30 days we have two or three experienced editors saying "couldn't find anything", then we can be more confident about deleting it. Mihia (talk) 21:03, 11 January 2025 (UTC)

That wouldn't hurt. I can't say I record that I can't find anything, unless I've searched really thoroughly. Andrew Sheedy (talk) 04:38, 13 January 2025 (UTC)

I assumed that I wouldn't be able to edit that text myself, but actually I can, so I have added the following:

Recording negative findings: Editors who make a fair effort to find citations but fail to do so should state their negative result on this page (even if it only repeats another editor's negative result).

Mihia (talk) 19:48, 14 January 2025 (UTC)
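The safeguard discussed above (a 30-day window plus recorded negative findings before an uncited entry is deleted) can be sketched as a simple decision rule. This is only an illustration: the `RfvItem` record, the `status` function and the thresholds (three citations, two negative findings) are assumptions chosen for the example, not an existing tool or agreed policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RfvItem:
    """Hypothetical record for one RFV listing."""
    term: str
    listed_on: date
    citations: int = 0          # qualifying citations added so far
    negative_findings: int = 0  # editors who searched and found nothing


def status(item, today, min_age_days=30, min_citations=3, min_negative=2):
    """Classify a listing as 'cited', 'failed' or 'open'.

    A listing only fails when it is old enough AND enough editors have
    recorded that they looked and found nothing; an old listing that
    nobody has examined stays open.
    """
    if item.citations >= min_citations:
        return "cited"
    age_days = (today - item.listed_on).days
    if age_days >= min_age_days and item.negative_findings >= min_negative:
        return "failed"
    return "open"
```

Under these assumptions, an item listed 44 days ago with no comments stays open, while the same item with two recorded negative findings would be closed as failed.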

I do agree with you (This, that and the other, I mean) about a single author's errors, typos, totally made-up words used only once etc., that these don't have greater value just because they are in old books. I guess what I was referring to are words that in all probability did exist with that meaning at the time, just that because of paucity of material we can't verify this -- a bit like the words in old dictionaries that "probably" existed. For words in these categories, if we can distinguish them, I would support applying the benefit of the doubt, even if this means we risk including a small amount of junk. Mihia (talk) 20:55, 11 January 2025 (UTC)Reply

To me, it's odd that we include reconstructions, but not these types of words. I would be in favour of including such words with a disclaimer in the entry (along the lines of the defunct {{LDL}} template that's being discussed below). And as mentioned above, I think we should have all words in exceptionally famous and widely studied works, like Shakespeare (again, with a disclaimer/notice). Andrew Sheedy (talk) 04:38, 13 January 2025 (UTC)Reply

The size of RFVE has not always been so massive, see this graph that shows when it really started to blow up. Ioaxxere (talk) 04:55, 13 January 2025 (UTC)

Someone certainly had a "spring clean" at the start of 2023! Mihia (talk) 10:29, 13 January 2025 (UTC)

Could a major source of the problem be that, as Wiktionary covers more of the English language, the average quality and degree of editor interest in new entries is declining? That would suggest policies aimed at new entries, new L2s, and new definitions. Examples would be:

requiring all new definitions to have
1. at least one citation (or two or three) and/or
2. per-definition references to other dictionaries.
reviewing all new definitions before they "go live".
using filters to identify (and exclude?) new definitions from IPs
directing some classes of users to the entry talk page and then actually following up on proposed new definitions.

As new definitions are necessary to follow the expansion and evolution of English and to fill in missing definitions, any policy should not be so hard to follow that we excessively reduce desirable new content or cause would-be contributors to circumvent the restrictions by editing existing definitions to convert them into their new definition. DCDuring (talk) 14:19, 13 January 2025 (UTC)

Some of my devices are so old that they are not capable of downloading this page at all. And these devices are not so old compared to some of the devices used by many other people. Tollef Salemann (talk) 21:07, 14 January 2025 (UTC)

Analysis of words in terms of Pali roots

Latest comment: 12 days ago · 10 comments · 3 people in discussion

Is it legitimate to present an analysis of Pali words inherited from Old Indic in terms of Pali roots? For example, textbooks on Pali will present many participles sensu lato as being root + -ta or root + -ya, even though they have been inherited from Sanskrit. I believe that these are worthy of inclusion as surface analyses at the very least.

@Pulimaiyi has objected, «It has been brought to my attention that you have been creating Pali roots - some of which, have a questionable form - and then using these to synchronically derive terms which are clear-cut cases of inheritance from Old Indo-Aryan and as such cannot be analysed as intra-Pali derivations. This is very misleading. Case in point: satta and sakka. How can sakka, for instance, be categorised or analyzed as a "-ya" formation if it is indistinguishable from another hypothetical form "sakka", which could hypothetically derive from a hypothetical adjective *śakra (which would be a "-ra" suffix adjective)? These derivations were not done at the Pali-level. sakka is inherited from a "-ya" formation, it is not itself a "-ya" formation. Please desist from such edits. Thanks. -- 𝘗𝘶𝘭𝘪𝘮𝘢𝘪𝘺𝘪(𝘵𝘢𝘭𝘬) 17:07, 7 January 2025 (UTC)»Reply

In so far as sakka is indeed a gerundive (aka future passive participle), this analysis is legitimate. Multiple ancestries are possible, and can be noted in the etymology section. Kindly curb your objections to the presentation of Pali internal analyses, but rather enhance etymology sections if appropriate. So the short answer to your request is 'no', but I will heed a consensus. --RichardW57 (talk) 15:14, 8 January 2025 (UTC)

There was some debate regarding another word on surface analysis in terms of Pali roots, but sakka and satta are exceptionally clear-cut non-surface-analysable terms. Their function as participles is duly documented on the respective pages; but that is not necessarily a reason to be able to analyse them using the participle-forming suffixes inherited from Sanskrit - as a result, I have removed the surface analysis from these two pages. Svārtava (t ɕ) 18:01, 8 January 2025 (UTC)

sakka cannot be analysed as sak + ya. Where is the -ya component in it that the etymology claims? -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪^{(𝘵𝘢𝘭𝘬)} 18:45, 8 January 2025 (UTC)

See Duroiselle^[1] for roots dā to kam and Buddhadatta^[2] for more of what happens to -ya. The commonest pattern, as in present (as in the third conjugation where div yields dibbati) and passive stem formation, is that the 'y' totally assimilates to an immediately preceding consonant. With dentals, it merges to form a geminate palatal. Are you just being obstructive? --RichardW57 (talk) 00:36, 10 January 2025 (UTC)
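The assimilation pattern described in the comment above can be written out as a small rule table. This is a minimal sketch of the synchronic rule as stated there, not a claim about how the forms actually arose: the function name `add_ya` is hypothetical, the treatment of aspirates (e.g. dh + y giving jjh) is an assumption added for the example, and vowel-final roots and retroflex consonants are deliberately not covered.

```python
# Dentals merge with a following y into a geminate palatal.
DENTAL_PLUS_Y = {"t": "cc", "th": "cch", "d": "jj", "dh": "jjh", "n": "ññ"}
# v + y gives bb, as in div -> dibb-ati.
SPECIAL_PLUS_Y = {"v": "bb"}
# Aspirates are written as digraphs in romanised Pali.
ASPIRATES = ("kh", "gh", "ch", "jh", "th", "dh", "ph", "bh")
VOWELS = set("aāiīuūeo")


def add_ya(root: str) -> str:
    """Attach -ya to a consonant-final root, assimilating the y."""
    final = root[-2:] if root.endswith(ASPIRATES) else root[-1:]
    if not final or final in VOWELS:
        raise ValueError("only consonant-final roots are covered by this sketch")
    stem = root[: len(root) - len(final)]
    if final in DENTAL_PLUS_Y:
        cluster = DENTAL_PLUS_Y[final]
    elif final in SPECIAL_PLUS_Y:
        cluster = SPECIAL_PLUS_Y[final]
    elif len(final) == 2:
        # Aspirate: double the stop and keep one h (bh -> bbh).
        cluster = final[0] + final
    else:
        # Plain consonant: the y assimilates completely (k -> kk).
        cluster = final * 2
    return stem + cluster + "a"
```

With these rules, `add_ya("sak")` gives `sakka` and `add_ya("div")` gives `dibba`, the stem seen in dibbati. Whether such a derivation belongs in an etymology section is the point under dispute in this thread, and the sketch does not settle it.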

@RichardW57: The pattern you describe is an observation-based mixture of inheritance sound changes and Sanskrit combining rules (e.g. शक् (śak) + -त (-ta) = शक्त (śakta) -> Pali satta). This may be done by those studying grammar within Pali as an observation, but this shouldn't allow the inclusion of surface analysis. As an example, any Prakrit grammarian can make a rule from observation that while forming compounds, word medial -p- "assimilates" into -v- and surface-analyse 𑀲𑀯𑀢𑁆𑀢𑀻 (savattī) as 𑀲- (sa-) [ < Sanskrit स- (sa-) ] + 𑀧𑀢𑁆𑀢𑀻 (pattī), which I'm sure is uncontroversially undesirable and unuseful. Svārtava (t ɕ) 06:38, 10 January 2025 (UTC)

@Svartava: You're wrong. If it was obvious to Prakrit speakers, it's not unhelpful. And I feel modern students of Pali are expected to recognise suffixed -ya in most of its forms rather than recognise the suffixed forms in their own right. (-yir- for -r + y- may be an exception.) And automatically mentally restoring the assimilation of final stops of roots of words in context cannot be too difficult - the Indic scripts of the Philippines used not to write final consonants. --RichardW57 (talk) 16:14, 10 January 2025 (UTC)

@RichardW57: It might have been obvious at some point before it changed even more into forms like 𑀲𑀉𑀢𑁆𑀢𑀻 (saüttī). However, the understanding comes from an understanding of inheritance sound changes only, not surface-analysability. Svārtava (t ɕ) 16:20, 10 January 2025 (UTC)Reply

@Svartava: I think you underestimate the human ability to interpret speech. --RichardW57 (talk) 17:35, 10 January 2025 (UTC)

@Svartava: Note that the purpose of Duroiselle's work is not to explain how Pali came about, but to help one understand it. Unlike Buddhadatta's work, it's not even aimed at helping one to write or speak it. For example, he gives guidance on interpreting an aorist, but not how to form it. --RichardW57 (talk) 17:33, 10 January 2025 (UTC)Reply

References

^ Charles Duroiselle (1921) A Practical Grammar of the Pali Language (overall work in English), Rangoon, section 472
^ A. P. Buddhadatta Thera (1956) The New Pali Course: Part II, 4th edition (overall work in English), Colombo, section 144, page 176

Exceptional behavior for modern Greek?

Latest comment: 13 days ago · 4 comments · 3 people in discussion

A lot of modern Greek pages do things differently from other languages, usually for no clear reason that I can see, e.g.:

1. Many uses of {{col}}, {{col2}}, etc. set |sort=0 and |collapse=0.
2. Many terms in {{col}}, {{col2}}, etc. manually disable transliteration.
3. Modern Greek and Ancient Greek seem to be essentially the only users of {{see}}, which is used heavily in these two languages, esp. modern Greek.
4. Several pages do unusual things like {{l|el|αγριοκοιτάω|αγριοκοιτάω/αγριοκοιτώ|t=}}, {{l|el|αγριοκοιτάζω}} (instead of just e.g. {{l|el|αγριοκοιτάω}}/{{l|el|αγριοκοιτώ}}, {{l|el|αγριοκοιτάζω}}).

The page κοιτάζω illustrates the first three, and κοιτάω illustrates (1) and (4). I am in the process of cleaning up {{col}} and variants and I'm going to fix (1) and (2) pending clear reasons why these things should remain. (1) requires manual auditing to see whether any invocations of |sort=0 should stay, but I expect there to be few cases of this. @Sarri.greek @Saltmarsh Benwing2 (talk) 00:24, 9 January 2025 (UTC)Reply
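The manual audit mentioned above could be narrowed down with a small scan of page wikitext for {{col}}-family calls that set |sort=0 or |collapse=0. The sketch below is an assumption-laden illustration, not the tooling actually used: the helper names are hypothetical, it treats any template whose name is `col` followed by optional digits as part of the family, and it only does simple brace matching rather than full wikitext parsing.

```python
import re

COL_NAME = re.compile(r"col\d*$")  # col, col2, col3, ...


def find_templates(text):
    """Yield the inner text of each outermost {{...}} call."""
    depth, start, i = 0, None, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i + 2
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i]
            i += 2
        else:
            i += 1


def split_params(inner):
    """Split on '|' but not inside nested templates or [[links]]."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            buf.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(inner[i])
            i += 1
    parts.append("".join(buf))
    return parts


def audit_col(text, flags=("sort", "collapse")):
    """Return (template name, disabled flags) for each col call with a flag set to 0."""
    hits = []
    for inner in find_templates(text):
        parts = [p.strip() for p in split_params(inner)]
        if not COL_NAME.match(parts[0]):
            continue
        named = {k.strip(): v.strip()
                 for k, v in (p.split("=", 1) for p in parts[1:] if "=" in p)}
        disabled = sorted(k for k in flags if named.get(k) == "0")
        if disabled:
            hits.append((parts[0], disabled))
    return hits
```

Each hit would still need a human decision on whether the unsorted order is deliberate, which is the manual part of the audit.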

Dear @Benwing2, Happy 2025! Please do as you wish so that we can copypaste your final style to our cheatsheets. I see that you discuss your changes at Discord. Just to explain:

_1 I used to copypaste it from an older style found in 2019. collapse=0 when the reader should view the whole table (without the silly-hide of a few lines). Later, I found the Template:topx handling columns ad libitum, now I see top2, 3 etc.- which does not use pipes, but normal links with asterisks: they can be copypasted easily, here and at other wiktionaries. I was about to use it at all Related sections, but if you do not like it, please use the pattern that is advisable to copypaste everywhere.

_2 tr=- for repetitive similar transliterations; but if a reader wishes, repeat them.

_3 Template:see, used heavily at Related section. When the Related section is “polyplethes” -more than 5 words- we do not repeat it at each of its members. At a member (a derived or related word) we give links for its closest Rels +all its compounds and for the rest we urge the reader to see the full list at the central lemma which has the complete index of the etymological field (as we often see it in Ety.Dictionaries). Example: modern πείθω (peítho) with Rels by stem. Or ψήφος (psífos) with Rels by meaning, plus a large α...ω index (with Hide) to be able to find words easily. E.g. at the verb ψηφίζω (psifízo, “I vote”) or at ancient ψηφίζω (psēphízō, “I count; vote”) a selection of close Rels +the compounds are given plus the {{see|el|and=1|ψήφος}} link to the rest of the ety.field. The ancient ψῆφος (psêphos) has so many Derived terms, that, for the moment, I just gave a link to Perseus.

_4 modern -άω/ώ twin.variant verbs Appendix:Greek_verbs#2nd_Conjugation my.2024.notes. The {{link|el|ωωωάω|ωωωάω/ωωωώ}} was used when it is judged that the only thing to see at the -ώ variant is "go to -άω". A separate link for -ώ variant is used when it is still in use, sometimes as isodynamous to the -άω, not dated. Rarely, an -ώ is the main lemma, not the -άω (τηλεφωνώ (tilefonó, “I phone”)). Or, an -ώ variant does not exist at all in practice. The modern -ώ is not a contraction of -άω (-áo) which is a modern suffix unlike the ancient uncontracted -άω (-áō), also -έω, -όω > contracting to -ῶ (-ô) which is the basic verb from Hellenistic Koine onwards. The -ίζω (-ízo) is a completely different verb and always has a separate link.

Thank you for your hard work. PS ...#Waiting for Medieval Greek. ‑‑Sarri.greek ^♫ I 10:31, 9 January 2025 (UTC)

I would add that I think the {{see}} template is a rather good idea, and the only reason it is restricted to the Greek lects is because non-Greek-focused editors don't know about it! This, that and the other (talk) 23:30, 9 January 2025 (UTC)Reply

Yeah you are probably right. BTW I am thinking of adding parameters |keepfirst= and |keeplast= indicating a number of lines at the beginning or end of {{col}} to keep in that position and not sort elsewhere, so you can e.g. use |keeplast=1 to prevent a use of {{see}} at the end of {{col}} from getting sorted among other lines. Maybe there is a better way of doing this. Benwing2 (talk) 23:37, 9 January 2025 (UTC)
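The proposed |keepfirst= and |keeplast= behaviour amounts to sorting only the middle of the list while pinning a fixed number of lines at either end. A minimal sketch of that idea follows; the function name `sort_with_pins` and the plain case-folded sort key are assumptions for illustration, and the real {{col}} module presumably uses its own language-aware sort keys.

```python
def sort_with_pins(lines, keepfirst=0, keeplast=0, key=str.casefold):
    """Sort lines, leaving the first `keepfirst` and last `keeplast` in place."""
    if keepfirst < 0 or keeplast < 0:
        raise ValueError("keepfirst and keeplast must be non-negative")
    if keepfirst + keeplast > len(lines):
        raise ValueError("more pinned lines than lines in the list")
    end = len(lines) - keeplast
    head = list(lines[:keepfirst])
    middle = sorted(lines[keepfirst:end], key=key)
    tail = list(lines[end:])
    return head + middle + tail


# Hypothetical usage: keep a trailing {{see}} line at the bottom.
terms = ["ψηφίζω", "αψήφιστος", "{{see|el|and=1|ψήφος}}"]
pinned = sort_with_pins(terms, keeplast=1)
```

Without the pin, a plain code-point sort would move the `{{see}}` line to the top of this list, since `{` sorts before the Greek letters; with `keeplast=1` it stays last.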

Waiting for Medieval Greek

Latest comment: 9 days ago · 3 comments · 2 people in discussion

Still waiting for Medieval Greek (2024) L2 title to be implemented. En.wiktionary could become a pioneer, rectifying the absurd "Ancient Greek up to 1453" seen at all official catalogues, still standing as a relic, in 2025! Happy New Year! ‑‑Sarri.greek ^♫ I 10:31, 9 January 2025 (UTC)

@Sarri.greek: I've been thinking about this issue some more. Would I be right in saying that your objection to the two-way split (pre- and post-1453) is primarily nomenclatural? If so, would you be happy if Ancient Greek, qua all Greek written in Greek before 1453 (i.e., excluding Mycenaean Greek, written in Linear B), were renamed "Classical Greek"? I assume it would seem less strange to call AD-second-millennium Greek "Classical" than it would to call it "Ancient". Alternatively, what about a two-way split based on orthography, denominated "Polytonic Greek" and "Monotonic Greek"? According to the latter dichotomy, Katharevousa would be included under Polytonic Greek, which would presumably be a good thing, given that it already uses grc templates for its declension and given that it is in large part a continuation of the Classical idiom (Atticism). These are just some thoughts from outside the box to attempt to cut through the impasse. What do you think? 0DF (talk) 03:02, 14 January 2025 (UTC)Reply

Dear @0DF, think grk, not grc for a moment and forget script (polytonic-monotonic, which led to misconceptions at en.wikt since 2006). We can discuss somewhere else lexicographic conventions of presenting any grk: briefly, ANYTHING grk, can be, and indeed was, written in polytonic. Our convention is that we do not duplicate pages with different scripts without reason (differentiating observations), and that we lemmatise Modern Greek in its monotonic version, which is the current script after 1982, although everything was written polytonically until then (quotations may be polytonic for 1st editions, thus el‑transliteration has to include every symbol as Module:grc-translit does).

Ref to wikt:el:Template:ελληνικά for the broad image. There is ancient (including Mycenaean, studied separately, and including Hellenistic Koine), medieval (including the Byzantine polity's output mentioned here) and modern (including Katharevousa) Plus dialects, regional idioms for each period. That is the question. Whether 'tis nobler in the mindset of en.wikt to separate grk in THREE periods. It is a language with too long a history of 3,000 years. Because of that, as a case study, it has too many lang-specifics, some probably absent from en.wikt, or systematically ignored, or even mocked as obsessive "force-feed" (sic) from us, native editors who are trained and daily exposed to different kinds of grk since the age of 12. The borders of periods and registers of Greek are diaphanous and “διάτρητος” “diatretus” ("perforated"); words reappearing with a different garment in each lemma creating intrinsic new behaviours & phaenomena. This is a difficulty for any grk-editor, even for native philologists. But are we going to put our heads in the sand?

Here, I am trying to ask the board of bureaucrats Why is this little vote not implemented?, why this procrastination? (because the turnout for any question about this 'little language', grk, is 5-8 people -most of whom do not specialise in any kind of grk in particular-). The excellent bureaucrat @Benwing is left alone to judge and carry out everything here?

The state of grk: At the moment, of the three directors (ref) of Ancient Greek @Erutuon, JohnC5, Mahagaja only M Mahagaja is active and has responded; also my mentor Erutuon responded. Generally, admins (all seriously trained linguists, as I understand) abstain and avoid expressing a decision when not specialised in Koine or Medieval Greek, whereas other editors advertise their decisions instantly; still any linguist could take a look at sources and bibliography, including greek bibliography -or is it discarded?- for such a broad question. The ModGreek admin, and my mentor, @Saltmarsh (studying and building Modern Greek since 2005), is less active now but always present, and has agreed manyfold. Plus, presently there is no active specialist (known as trained) in grc‑koi‑lat, or gkm, or gkm‑lat = el‑ear. How many times is this vote going to repeat itself? Everyone is fascinated with grc and its dialects (its texts being most prestigious), or exotic or very rare lemmata, grc‑koi under AncGr is in a very bad state and needs full review, MedGr gkm is mentioned under AncGr, el-kth under ModGr needs a polytonic reviewing plus sources, modern dialects and regional idioms under ModGr are promoted to 'Languages' indiscriminately(?), and ModGr el etymologies need review with refs. Thank you. PS could Template:code be nobr? The hyphen breaks lines. ‑‑Sarri.greek ^♫ I 11:49, 14 January 2025 (UTC)Reply

SoP hyphenated compounds

Latest comment: 12 days ago · 6 comments · 4 people in discussion

The CFI presently states "Idiomaticity rules apply to hyphenated compounds in the same way as to spaced phrases", but does not explicitly mention hyphenated prefixes. I think it would be beneficial to clarify this section to make it explicit that the same also applies to hyphenated prefixes (where the prefix is, or should be, individually defined in the dictionary), consistently with the decision to delete all these (excepting those saved by a separate rule). To this end I propose that we amend the above sentence to read "Idiomaticity rules apply to hyphenated compounds, including hyphenated prefixed words, in the same way as to spaced phrases". Mihia (talk) 20:36, 9 January 2025 (UTC)Reply

Sounds good to me but we might need an explicit vote to change CFI; I'm not sure what the policy for this is. Benwing2 (talk) 23:41, 9 January 2025 (UTC)

It seems worth mentioning that it has also been proposed to adopt the opposite policy, as discussed at Wiktionary:Votes/2024-11/Updating COALMINE rule and Wiktionary:Beer_parlour/2022/September#Including_hyphenated_prefixed_words_as_single_words.--Urszag (talk) 17:19, 10 January 2025 (UTC)Reply

I was thinking it might be nodded through as merely codifying something that we are already doing. Can anyone, perhaps an administrator, advise whether this should go to a formal vote? Mihia (talk) 18:28, 10 January 2025 (UTC)
I went ahead and added the proposed amendment, because, as stated, it is simply a clarification of the current practice, and the ex-teacher example given in the same paragraph is arguably already an example of a SOP hyphenated prefixed word.

Relatedly, I am interested to know @Mihia's position on entries like non-existent, non-essential, etc. since these are also essentially SOP but are saved by COALMINE; as a proponent of revoking COALMINE, do you believe these entries should be deleted? Svārtava (t ɕ) 19:22, 10 January 2025 (UTC)
Thanks for making that edit. To answer your question, if "non-X" means "non- + X", and nothing more, then I think it should count as SoP even if "nonX" exists, so yes, delete it. I don't see why the existence somewhere of "nonX" -- which is probably the case for almost all "non-X" anyway, for some value of "almost" -- should make any difference. I know there are arguments why it should, but last time round I remember not being convinced by these. In fact, I would go even further: if "nonX" means nothing more than "non + X" then I believe it is also SoP -- in other words, the accident of how we write things doesn't make any difference, logically -- but, then again, actually implementing this generally in any useful way seems impractical. But with the hyphen, the component division is obvious. I do think, however, that if you search for "non-X" and it does not exist in the dictionary then ideally you should be more helpfully directed to the parts (and similarly for other defined prefixes). Mihia (talk) 20:41, 10 January 2025 (UTC)

Category for dog whistles

Latest comment: 10 days ago · 5 comments · 4 people in discussion

I wish to propose a new category, and possibly labels, for dog whistles: words that work as political allusions whose intended meaning only a certain audience is supposed to know. This is a specific type of word that currently has no categorisation. Juwan (talk) 00:17, 11 January 2025 (UTC)

I don’t know, it is heavily context-dependent. Even at our quotes for dog whistle, examples like globalist are given. What about slangy terms which are only understood at all at a certain demographic, or statistically make inferences about political convictions possible? If somebody calls a television idiot box, he is kind of an intellectual, if he calls it electric Jew, the intellectual circles the speaker moves in are discernibly narrow, yet the audience is not double-layered, which would allude us that an essential part of the definition of dog whistle is not satisfyingly reflected by us on its definition page. But our labels are supposed to describe lexemic meanings. Does dog whistle even belong to linguistic terminology rather than being a political catchword, perhaps presupposing epistemological, not to say ideological, paradigms not all dictionary readers and editors might comprehend? Fay Freak (talk) 07:22, 11 January 2025 (UTC)Reply

Would you object to it being an appendix? —Justin (koavf)❤T☮C☺M☯ 07:31, 11 January 2025 (UTC)

No, even less so to a user-page, which one may always try to be more convincing. We should see first how much rhyme or reason there is behind it, to assess the eventual useful content of a category. It requires an overview of political ideologies to which we believe us able to assign terms, doesn’t it? Would a new label and category not be a combination of existing labels and categories like Category:Neo-Nazism and Category:Conservatism plus the presence of a certain degree of double-entendre? Fay Freak (talk) 08:50, 11 January 2025 (UTC)Reply

I think this is far too subjective to be a useful category. — Sgconlaw (talk) 22:48, 12 January 2025 (UTC)

Moving User:JnpoJuwan/Images into main

Latest comment: 8 days ago · 4 comments · 3 people in discussion

I am not sure about the process for adding policy into main. Many other contributors have already taken a look, and I request that it be moved to the Wiktionary namespace, replacing the pages Wiktionary:Images and Help:Images (the latter becomes a redirect). Juwan (talk) 10:03, 11 January 2025 (UTC)

@JnpoJuwan: if it is to be policy, you’ll need to list it at “Wiktionary:Votes” to be formally voted on. — Sgconlaw (talk) 06:01, 12 January 2025 (UTC)

I support the move; the two pages linked to are not useful in their current state. This could easily be a think tank policy without a formal vote. Ultimateria (talk) 19:15, 13 January 2025 (UTC)

Moved as think tank policy. Juwan (talk) 22:40, 14 January 2025 (UTC)

Sporadic senseid

Latest comment: 10 days ago · 3 comments · 2 people in discussion

If one wants to make reference to a specific sense of a lemma, is it bad practice to add {{senseid}} for just one sense? If applying it to one sense, should one ensure that all senses are identified by {{senseid}} or {{etymid}}? In the case prompting this question, I wanted to use the sense ID for a meaning of short, but saw that {{senseid}} had been used for none of them. --RichardW57 (talk) 13:26, 11 January 2025 (UTC) Incidentally, is it allowed to use sense IDs for translations to English, or does that breach the rule that non-English lemmas get translations, not meanings? I don't trust commonsense to apply. --13:26, 11 January 2025 (UTC)

@RichardW57 I would say it is perfectly fine to add {{senseid}} as and when necessary, if you need to link to an individual sense, no matter whether it is a one-off or not. I do this fairly regularly. No need to add {{senseid}} to all senses unless you have a specific reason to!

As for "is it allowed to use sense IDs for translations to English" - not sure what you mean exactly, but {{senseid}} can be used in entries of any language. This, that and the other (talk) 00:17, 13 January 2025 (UTC)

@This, that and the other: I can see an objection that using sense IDs is giving the meaning of a word, as opposed to translation. This urge to be helpful and save user clicks seems to irritate some users. For example, I often add |t= to forms and other script forms to remind the user of the rough meaning of a word if he's simply temporarily forgotten it, but that seems to irritate other editors, who seem to think it should only be used for disambiguation. --RichardW57 (talk) 14:13, 13 January 2025 (UTC)Reply

I'm finding it hard to monitor the Beer Parlour and allied pages for new topics. Would it be in order to add something like {{minitoc}} so that one can immediately see new topics without being swamped by detailed discussions of no personal interest? I can imagine that adding it may need some careful crafting. Or am I missing a trick I should adopt? --RichardW57 (talk) 13:36, 11 January 2025 (UTC)Reply

Replacement for LDL

Latest comment: 11 days ago · 4 comments · 2 people in discussion

@This, that and the other: {{LDL}} was removed in 2024 on the basis that it wasn't being used where WT:CFI said it should be. I, however, was finding it useful for words which I suspected would not meet the requirements even if the language were well-documented, i.e. that the inability to furnish 3 examples was not due to accidents of preservation (or modern publishing or inadequate research on my part). What mechanism should now be used to sound a note of caution? --RichardW57 (talk) 14:05, 11 January 2025 (UTC)Reply

Incidentally, I can't find the vote removing the policy requiring the use of LDL or its like, only the discussion at Template talk:LDL. RichardW57 (talk) 14:05, 11 January 2025 (UTC)

@RichardW57 we have {{lb|...|hapax}} if you think it's reasonably likely that only one attestation is available. If the word is attested twice, I do have to wonder whether such a fact needs any special flagging or attention for users. If a "note of caution" needs sounding, surely it would be better to write a usage note afresh for each entry to explain exactly why you feel "caution" is required. (Or at a maximum, a language-specific usage note template could be created.)

As for the policy change, I didn't bother with the formality of a vote, as the RFDO discussion was closed as delete (not by me) and it seemed absurd to leave a reference to a deleted template in the policy. However, I accept your perspective that you feel the whole thing should have been done by vote. If you feel the change should be retrospectively ratified by vote, I could hardly argue against you. This, that and the other (talk) 02:50, 12 January 2025 (UTC)Reply

@This, that and the other: Part of the objection to the template was that its output was far too prominent. The words I am worried about appear only in dictionaries (which could prompt their uses in mediaeval or modern texts), either as entries or parts of a definition - they barely rise to the status of hapax! If the 'Pali editor community' complied with the requirement to maintain a list of acceptable sole sources, the dictionaries I'm using should be on it. (I'd be inclined to make the dictionary requirement the presence on two or more of a list of dictionaries, but that wouldn't be compliant either.) At the moment I'm resorting to {{rfq}} to ask for attestation, but it asks for interesting examples of usage, and relies on HTML comments for one to say what one would like to see shown. (The justification for not maintaining such a list is that the editors have better things to do with their time.) There are some instances where it would be good to show the syntax associated with the word, but it's difficult to ask for that. I'm working on the principle that any old 'durably archived' attestation is better than none. I find lawfully (especially working from a non-rogue jurisdiction) providing literal enough translations hard work. We also seem to lack a decent mechanism for acknowledging translations, perhaps because it's not needed in the main rogue jurisdiction. Still, 'hapax' will do for words I can only find, directly or indirectly, in a single dictionary definition. Now what I need to find is a reference for the recommended syntax of an annotated translation to English. --RichardW57 (talk) 11:01, 12 January 2025 (UTC)

redoing list templates

Latest comment: 8 days ago · 41 comments · 12 people in discussion

We have a ton of list templates like {{list:countries of Africa/en}}, most of which use an antiquated list-helper framework that dumps the results into a raw, hard-to-read list like this:

countries of Africa (appendix)

In languages with transliteration, as @Fenakhay points out, it is even worse, like the corresponding Japanese list:

countries of Africa: アフリカの諸(しょ)国(こく) (Afurika no shokoku) (appendix)
- アルジェリア (Arujeria)
- アンゴラ (Angora)
- ウガンダ (Uganda)
- エジプト (Ejiputo)
- エチオピア (Echiopia)
- エリトリア (Eritoria)
- ガーナ (Gāna)
- カーボベルデ (Kābo Berude)
- ガボン (Gabon)
- カメルーン (Kamerūn)
- ガンビア (Ganbia)
- ギニア (Ginia)
- ギニアビサウ (Ginia-Bisau)
- ケニア (Kenia)
- コートジボワール (Kōto Jibowāru)
- コモロ (Komoro)
- コンゴ共(きょう)和(わ)国(こく) (Kongo Kyōwakoku)
- コンゴ民(みん)主(しゅ)共(きょう)和(わ)国(こく) (Kongo Minshu Kyōwakoku)
- サントメ・プリンシペ (San Tome Purinshipe)
- ザンビア (Zanbia)
- シエラレオネ (Shierareone)
- ジブチ (Jibuchi)
- ジンバブエ (Jinbabue)
- スーダン (Sūdan)
- スワジランド (Suwajirando)
- セーシェル (Sēsheru)
- 赤(せき)道(どう)ギニア (Sekidō Ginia)
- セネガル (Senegaru)
- ソマリア (Somaria)
- タンザニア (Tanzania)
- チャド (Chado)
- 中(ちゅう)央(おう)アフリカ共(きょう)和(わ)国(こく) (Chūō Afurika Kyōwakoku)
- チュニジア (Chunijia)
- トーゴ (Tōgo)
- ナイジェリア (Naijeria)
- ナミビア (Namibia)
- ニジェール (Nijēru)
- ブルキナファソ (Burukina Faso)
- ブルンジ (Burunji)
- ベナン (Benan)
- ボツワナ (Botsuwana)
- マダガスカル (Madagasukaru)
- マラウイ (Maraui)
- マリ (Mari)
- 南(みなみ)アフリカ (Minami Afurika)
- 南(みなみ)スーダン (Minami Sūdan)
- モーリシャス (Mōrishasu)
- モーリタニア (Mōritania)
- モザンビーク (Mozanbīku)
- モロッコ (Morokko)
- リビア (Ribia)
- リベリア (Riberia)
- ルワンダ (Ruwanda)
- レソト (Resoto)

I would like to redo these using {{col}}. Any objections?

BTW the current framework is really bad in that it pretty much forces you to format your own raw list; the badly named {{list helper 2}} template (and we inexplicably have both {{list helper}} and {{list helper 2}}) takes a |list= param whose value is a list of comma-separated pre-formatted links, which makes it difficult to change the format into something else. Instead the entries should be a comma-separated list of raw terms with inline modifiers as needed; or the entries can each be in their own parameter, again raw with inline modifiers. Either format makes it easy for the underlying framework to choose how to display the entries optimally.

Any other suggestions for improvements/etc.?

Benwing2 (talk) 00:27, 12 January 2025 (UTC)

Have you mocked up a ferinstance for us to judge? —Justin (koavf)❤T☮C☺M☯ 00:35, 12 January 2025 (UTC)

@Koavf The use of {{col}} would be something like this:

(countries in Africa):

Here I have linked the title to the actual category. I haven't yet mocked up an edit button but it could be displayed just to the right of the title, or right-justified. Benwing2 (talk) 00:44, 12 January 2025 (UTC)

Looks nice enough and generally prettier, but I'd prefer no "show more" dropdown. —Justin (koavf)❤T☮C☺M☯ 00:52, 12 January 2025 (UTC)

@Koavf Do you mean you want all of them displayed by default? There is a param for that but I'd want to make sure there is consensus for that as it takes up a certain amount of space. Benwing2 (talk) 00:56, 12 January 2025 (UTC)

Yes, that's what I want. I don't think that showing basically 12 arbitrary names out of a list of 55 is helpful. —Justin (koavf)❤T☮C☺M☯ 00:58, 12 January 2025 (UTC)

I agree. I find the show-more dropdowns a bit inconvenient.--Urszag (talk) 11:14, 12 January 2025 (UTC)

Personally I'm a fan. I need to scroll past long lists more often than I need to look through them, especially on basic English entries and pages with many language sections. Ultimateria (talk) 19:01, 13 January 2025 (UTC)

I agree with changing it to {{col}}; {{list:countries of Asia/en}} is also pretty convoluted. Also, if the problem is showing just part of the list, wouldn't it be better to do the opposite and have it fully collapsed by default? I think the Japanese one would be pretty big and a bunch of uncollapsed lists could be a nuisance in pages like Kuwait. Trooper57 (talk) 01:30, 12 January 2025 (UTC)

I'm against collapsed by default, but collapsed entirely is better than sneak peeks of various parts of an alphabetical list. —Justin (koavf)❤T☮C☺M☯ 01:40, 12 January 2025 (UTC)

I support this proposal, and see no reason not to use the built-in collapsibility of {{col}} for consistency with other term-list boxes in our entries. I note that this style of collapsibility for term lists was the beneficiary of clear community consensus. This, that and the other (talk) 02:38, 12 January 2025 (UTC)

I hate the existing list format. I'd love for the replacement to not have any visible members, just the "show more". DCDuring (talk) 03:10, 12 January 2025 (UTC)

Support Vininn126 (talk) 05:48, 12 January 2025 (UTC)

Other than having the boring antiquated style, is the old list really hard to read? The new one takes up much more space. And the collapsibility, that is used to improve the space usage, is a usability issue on its own, because it requires a click to see the full list.

Speaking as someone having zero Japanese skills, the old-style Japanese version of the list would be easier for me to navigate if it were alphabetically sorted by the English transcriptions. But, I guess, I'm not supposed to be reading the Japanese entries in the first place, so my opinion is irrelevant. That's just the only reason why the demonstrated Japanese list subjectively feels awkward to me, but I don't see any other problems with it. --Ssvb (talk) 09:26, 12 January 2025 (UTC)

Comment: How would it work with countries that have multiple names? I don't think {{col}} has support for multiple links in the same line, right? This doesn't really happen with African countries, but I don't think it'd be right to hide either Turkey or Türkiye from the European list. To do so would be to get prescriptive about it imo. And then the alternative of listing them in two bullet points seems even worse.

...Actually, I guess we do have Côte d'Ivoire vs Ivory Coast and Eswatini vs eSwatini vs Swaziland. It's worth thinking about those and Czech Republic vs Czechia. I'm sure other languages have similar conundrums as well. MedK1 (talk) 18:59, 12 January 2025 (UTC)

@MedK1 See my comment below. {{col}} now has support for multiple links on a given line, either separated by a comma (intended for synonyms or alternative forms with significant differences) or a tilde (intended for alternative forms with minor differences). So we could write Côte d'Ivoire,Ivory Coast or Eswatini~eSwatini,Swaziland etc. (the latter displays as Eswatini ~ eSwatini, Swaziland). Benwing2 (talk) 21:22, 12 January 2025 (UTC)
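For readers who want to see the separator rule spelled out, here is a rough Python sketch of it. It is not the actual Lua code behind {{col}}; the function name `format_item` is made up for illustration, and it ignores inline modifiers, escaping and link generation entirely. It only shows how a raw item splits on commas first and tildes second.

```python
def format_item(raw):
    """Render one raw list item as plain text.

    Assumption for illustration only: commas separate synonyms or
    significantly different forms, tildes separate minor variants,
    and nothing else in the item is special.
    """
    rendered_groups = []
    for group in raw.split(","):
        # Within one comma-separated group, tildes delimit minor variants.
        variants = [variant.strip() for variant in group.split("~")]
        rendered_groups.append(" ~ ".join(variants))
    return ", ".join(rendered_groups)


if __name__ == "__main__":
    print(format_item("Eswatini~eSwatini,Swaziland"))
    print(format_item("Côte d'Ivoire,Ivory Coast"))
```

Run on the two examples from the comment above, the first call reproduces the quoted display, Eswatini ~ eSwatini, Swaziland.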

Ah, awesome! Support then, though I do think the format of a label and a colon preceding the list looks very very ugly. MedK1 (talk) 21:29, 12 January 2025 (UTC)

@MedK1 Are you referring to my suggestion for (continents: mabara) or similar? This would go on a line by itself rather than preceding the list on the same line. If you think that's ugly, do you have any suggestions for how to improve it? I'm not sure what part of it you find ugly. Maybe we could potentially dispense with the mabara part entirely; I've just kept this because all the existing lists have it. If your objection is to the overall format of the header, maybe User:This, that and the other has some ideas how to prettify it. Benwing2 (talk) 21:44, 12 January 2025 (UTC)

Yeah, I don't like the titles (the *(countries in Africa): bit before {{col}}). I don't like them in the other lists either — they always look jarring. I feel they don't look like titles per se either.

Factors that contribute to this feeling may be them being justified to the left, being in italic rather than bold, and them starting with lowercase characters. Changing any of these (but especially the last two) would go a long way.

But then they wouldn't fit right with being on a list, now would they? Ugly as they are, I get how they came to be that way. I can't think of something that'd both 'work' with lists while also looking good; it's why I hadn't made any suggestions before. I hope someone can come up with something nice... MedK1 (talk) 21:57, 12 January 2025 (UTC)

Can you mock up some ideas that would look better to you? I'm not at all opposed to changing the way that {{col}} handles headers; it's something that has kinda come to be without a lot of thought put into it. Benwing2 (talk) 22:44, 12 January 2025 (UTC)

@Benwing2 @MedK1 if you are looking for inspiration I made some mockups at User:This, that and the other/NavFrame and list-switcher#Interim improvement: add title to list-switcher. This, that and the other (talk) 00:11, 13 January 2025 (UTC)

@This, that and the other Thanks. 3a and 3b look the same to me and both look a lot better than what we have. Benwing2 (talk) 00:17, 13 January 2025 (UTC)

Also I'd like to add an [edit] box on the right side of the title bar, for use with lists; how hard is that to do using the 3a/3b style? Benwing2 (talk) 00:19, 13 January 2025 (UTC)

@Benwing2 I was referring specifically to the "Interim improvement: add title to list-switcher" section - sorry I didn't make that clear. Maybe I should have made a separate page. The rest of that page is for a discussion I want to start at some point about unifying the style of NavFrames and list-switchers.

Anyway, yes it's very easy to add an "edit" link there. The whole thing will need some CSS implementation - not necessarily the way I have mocked it up on that page. This, that and the other (talk) 00:21, 13 January 2025 (UTC)

So yeah, interim B looks better. Benwing2 (talk) 00:31, 13 January 2025 (UTC)

Thanks!! I like these; ignoring interim A (which is not a significant improvement, imo), they're all better than what we have and I'm fine with them! Interim B is the best I think! MedK1 (talk) 00:30, 13 January 2025 (UTC)

I do want to add that I think navboxes (er, aka NavFrames?) are fine as-is however. They're their own thing, so it's fine if they look a bit different from your standard page text. MedK1 (talk) 00:33, 13 January 2025 (UTC)

Support. big lists deserve some love and better formatting. Juwan (talk) 22:46, 14 January 2025 (UTC)

Comment: So far everyone except for User:Ssvb seems in favor of using {{col}}, although there is some disagreement about whether to display all the elements by default or only some of them (or none). I am writing the new list-helper code, which is likely to require each list:foo of Bar/CODE to do something like this (for {{list:continents/sw}}):

{{#invoke:topic list|show|sw
|hypernym=[[bara|mabara]]
|Afrika<t:Africa>
|Antaktika~Antaktiki<t:Antarctica>
|Asia<t:Asia>
|Ulaya,Uropa<t:Europe>
|Amerika ya Kaskazini<t:North America>
|Amerika ya Kusini<t:South America>
|Australia<t:Australia>
}}

This code is smart enough to know that the English hypernym to be displayed in the title is continents (based on the template name), and the corresponding category is Category:sw:Continents, although you can override either using |enhypernym= or |cat=. The format of each element is exactly as in {{col}}, meaning it can take inline modifiers; multiple comma-separated or tilde-separated items (the tilde is meant to delimit slight variants and the comma to delimit synonyms or variants with more significant differences); pre-formatted elements (e.g. for Japanese using {{ja-r}} or similar); etc. The title will currently display as something like (continents: mabara), where the English hypernym "continents" links to the appropriate category and the language-appropriate hypernym mabara links to the singular equivalent of the term mentioned. This means that the language-appropriate hypernym needs to be in the plural but typically linked to the singular, which isn't always the case in the current lists (a lot of the time, the hypernym is in the singular form, so I may need some help auditing the lists to rewrite the hypernyms in the plural). The reason that each list template directly invokes a module instead of having a wrapping {{topic list}} template is so that the user can specify parameters to the list template (e.g. |nocat=) and they are handled automatically without each list template having to explicitly pass all such parameters through. I assume this won't be such an issue as most people will just copy an existing list to create a new one.
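To make the name-based defaults concrete, here is a small Python sketch of how the English hypernym and category could be derived from a template name such as list:continents/sw. This is only an illustration under stated assumptions: the real code is a Lua module, the function name `title_info` is hypothetical, and the assumption that |cat= takes a bare category name (without the "Category:sw:" prefix) is mine, not something confirmed above.

```python
import re


def title_info(template_name, enhypernym=None, cat=None):
    """Return (english_hypernym, category) for a list template name.

    Hypothetical helper: expects names of the form "list:<hypernym>/<code>".
    The optional arguments mimic the |enhypernym= and |cat= overrides.
    """
    match = re.fullmatch(r"list:(.+)/([^/]+)", template_name)
    if match is None:
        raise ValueError("not a list template name: " + template_name)
    default_hypernym, lang_code = match.group(1), match.group(2)

    hypernym = enhypernym if enhypernym is not None else default_hypernym
    # By default the category is the hypernym with its first letter capitalized.
    bare_cat = cat if cat is not None else default_hypernym[0].upper() + default_hypernym[1:]
    return hypernym, "Category:" + lang_code + ":" + bare_cat


if __name__ == "__main__":
    print(title_info("list:continents/sw"))
    # The default would not match lists whose category is worded differently,
    # so an override is needed there.
    print(title_info("list:countries of Africa/ja", cat="Countries in Africa"))
```

The second call shows why an override is useful: the default would give "Countries of Africa", whereas the existing category is worded "Countries in Africa".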

One nice thing about this is that we can easily decide the exact display format later, and change or optimize it at any time; we could even add a per-user parameter to control the appearance of such lists if there is a lot of disagreement over the appearance. Benwing2 (talk) 11:50, 12 January 2025 (UTC)

One factor no one has mentioned: see Category:Latin script templates, most of which are transcluded in all the single-Latin-letter pages like "a". We need to be very careful not to add more Lua overhead to most of those pages. If anything, we need to come up with lite versions that use modules even less than before. Even if we split off letter/symbol entries to a subpage, that subpage could still end up having problems all by itself. Chuck Entz (talk) 23:13, 12 January 2025 (UTC)

@Chuck Entz Those templates use their own dedicated module, Module:letters, which is out of scope at this point. IMO they might be better implemented using {{flatlist}}, which is non-Lua and uses a simple CSS implementation to display items horizontally with bullets in between them (which looks a lot better than commas). Benwing2 (talk) 23:31, 12 January 2025 (UTC)

@Benwing2: I just asked what people find unreadable in the compact look of the old list. Maybe giving it the same light blue background could make it look a bit better and stand out more distinctly? Clicking on the link https://en.wiktionary.org/wiki/Category:en:Countries_in_Africa of the old list is functionally roughly equivalent to clicking on the "show more" link of the new list, as both present the data formatted as columns. It's one click here versus one click there. Do small lists of only 3-5 items look nice with the new design? And the configurability of the template is a powerful feature, but it enables formatting discrepancies between different entries. I don't object to the changes per se, but I see a potential for a "simplification initiative" a few years from now, essentially reverting this redesign. --Ssvb (talk) 07:51, 13 January 2025 (UTC)

@Ssvb for long lists, the existing style is a barely-readable jumble of text. There's a reason typesetters use techniques such as spacing, columns, and tables to lay out long lists.

You're right that for very short lists it may be overkill to use the term-list style. The one that springs to mind is {{list:days of the week/en}}. @Benwing2 I wonder if we should keep using a similar style to the current style for short lists? "List" templates are, generally speaking, lists of coordinate terms, and so I wonder if it could make sense to have some kind of inline {{cot}}-style template for lists of, say, 8 items or less. (And also move the boxes up from a "See also" L3 to a "Coordinate terms" L4 - but I digress.) This, that and the other (talk) 10:13, 13 January 2025 (UTC)

The terms should definitely be listed under "Coordinate terms" regardless of style, and I've been doing that as I encounter them listed under Related terms or See also. I've also had the thought that maybe short lists should use a horizontal layout style, although I somewhat prefer the {{flatlist}} style with bullets between the items rather than commas, something like this:

That would also allow commas and tildes to be used with their current meaning. Benwing2 (talk) 10:42, 13 January 2025 (UTC)

BTW I'm now convinced that allowing for bullet or comma separation with horizontal display is a good idea, but it has to be specified manually. Something like doing it automatically if there are less than a certain number of items won't work because some items take up much more width than others. See for example ਖ਼#Punjabi. Under See also there are three lists of characters (Gurmukhi script letters, vowels and diacritics). The first list has 42 elements but is definitely better displayed horizontally with comma separators; the current display with three horizontal lists looks decent (other than IMO the category should be linked to the title rather than displayed separately). OTOH something like {{list:fundamental interactions/ru}} has only four items but looks terrible laid out horizontally like it currently is. Possibly we could do something involving the total character width of all elements but I think that would be tricky to get right. Benwing2 (talk) 23:58, 13 January 2025 (UTC)

The manual override to enable horizontal display with the bullet/comma choice sounds good to me for creating new content. Still, what will happen to the existing entries, such as the mentioned ਖ਼#Punjabi? Will they all need to be manually edited to enable horizontal display for them? --Ssvb (talk) 07:36, 14 January 2025 (UTC)

@Ssvb No. The override would be done once in e.g. {{list:Gurmukhi script letters/pa}}, which is the template that displays that list. In fact, the system I am designing will automatically override the default value for horizontal display so as to enable it with a comma separator (and set certain other defaults, such as disabling transliteration and providing appendix lists for the script and alphabet if they exist) for *ALL* lists of the form list:Foo script letters/CODE. This is on the assumption that all lists of letters should be displayed in a similar fashion. An individual list can always override the default and supply different settings if the defaults don't look right for that particular list, but on the assumption that most lists won't do that, we'll get fairly uniform, good-looking lists of letters while other sorts of lists may display in a different fashion. (For example, days-of-the-week lists may set horizontal display with bullets as the default, since there are only 7 entries and they're usually not too long; but some particular languages may want to override this and go back to a columnar display if the horizontal list gets too long, e.g. if there are several synonyms of each day name, each one with translit, or something.) The only thing that might need to change on each page calling these lists is to remove the bullet to the left of the list template invocation, since the list module itself will generate it when the list is displayed horizontally. Benwing2 (talk) 07:50, 14 January 2025 (UTC)
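The layering described above (global defaults, then defaults keyed on the list name, then per-list overrides) can be sketched roughly as follows in Python. Everything here is an assumption for illustration: the setting names `horizontal` and `translit`, the rule table and the function `resolve_settings` are invented, and the real system is a Lua module whose parameters may differ.

```python
import re

# Hypothetical global defaults: columnar display, transliteration shown.
BASE_SETTINGS = {"horizontal": None, "translit": True}

# Hypothetical name-based defaults, checked in order; first match wins.
NAME_RULES = [
    # Any "Foo script letters" list: horizontal, comma-separated, no translit.
    (re.compile(r".+ script letters$"), {"horizontal": "comma", "translit": False}),
    # Days of the week: horizontal with bullets between the items.
    (re.compile(r"^days of the week$"), {"horizontal": "bullet"}),
]


def resolve_settings(list_name, overrides=None):
    """Merge base settings, name-based defaults and per-list overrides."""
    settings = dict(BASE_SETTINGS)
    for pattern, defaults in NAME_RULES:
        if pattern.search(list_name):
            settings.update(defaults)
            break
    # A specific list (e.g. one language's days of the week) may opt out.
    settings.update(overrides or {})
    return settings


if __name__ == "__main__":
    print(resolve_settings("Gurmukhi script letters"))
    print(resolve_settings("days of the week", {"horizontal": None}))
    print(resolve_settings("continents"))
```

The point of the design is that a single list template only has to state what differs from the defaults for its kind of list, so pages that call the list need no per-page settings.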

Comment: I have created an implementation of this, and a sample list in {{list:countries of Africa/ja/sandbox}} (it needs to be in mainspace because its name is parsed to get part of the title). It displays as follows:

countries of Africa: アフリカの諸(しょ)国(こく) (Afurika no shokoku) (appendix)edit

ja (ja)
アルジェリア (Arujeria)
アンゴラ (Angora)
ウガンダ (Uganda)
エジプト (Ejiputo)
エチオピア (Echiopia)
エリトリア (Eritoria)
ガーナ (Gāna)
カーボベルデ (Kābo Berude)
ガボン (Gabon)
カメルーン (Kamerūn)
ガンビア (Ganbia)
ギニア (Ginia)
ギニアビサウ (Ginia-Bisau)
ケニア (Kenia)
コートジボワール (Kōto Jibowāru)
コモロ (Komoro)
コンゴ共(きょう)和(わ)国(こく) (Kongo Kyōwakoku)
コンゴ民(みん)主(しゅ)共(きょう)和(わ)国(こく) (Kongo Minshu Kyōwakoku)
サントメ・プリンシペ (San Tome Purinshipe)
ザンビア (Zanbia)
シエラレオネ (Shierareone)
ジブチ (Jibuchi)
ジンバブエ (Jinbabue)
スーダン (Sūdan)
スワジランド (Suwajirando)
セーシェル (Sēsheru)
赤(せき)道(どう)ギニア (Sekidō Ginia)
セネガル (Senegaru)
ソマリア (Somaria)
タンザニア (Tanzania)
チャド (Chado)
中(ちゅう)央(おう)アフリカ共(きょう)和(わ)国(こく) (Chūō Afurika Kyōwakoku)
チュニジア (Chunijia)
トーゴ (Tōgo)
ナイジェリア (Naijeria)
ナミビア (Namibia)
ニジェール (Nijēru)
ブルキナファソ (Burukina Faso)
ブルンジ (Burunji)
ベナン (Benan)
ボツワナ (Botsuwana)
マダガスカル (Madagasukaru)
マラウイ (Maraui)
マリ (Mari)
南(みなみ)アフリカ (Minami Afurika)
南(みなみ)スーダン (Minami Sūdan)
モーリシャス (Mōrishasu)
モーリタニア (Mōritania)
モザンビーク (Mozanbīku)
モロッコ (Morokko)
リビア (Ribia)
リベリア (Riberia)
ルワンダ (Ruwanda)
レソト (Resoto)

As noted above, we are going to change the title bar to look better. Benwing2 (talk) 01:50, 13 January 2025 (UTC)

And I have just gone ahead and changed the title bar, as envisaged.

On the Vector 2022 skin this template takes up 4 columns. This is a bit cramped and it should preferably only take up 3 columns in this skin. A solution to this problem is in the works. This, that and the other (talk) 06:19, 13 January 2025 (UTC)

Thanks. Do you feel your prototypical algorithm in JavaScript is good enough to convert to Lua? If so I can go ahead and do it. Benwing2 (talk) 06:34, 13 January 2025 (UTC)

I'll reply on your talk to avoid cluttering BP with technical discussion. This, that and the other (talk) 06:59, 13 January 2025 (UTC)

obnoxious collapsibles on mobile


Many Wiktionary entries are blessed with a veeery rich list of linked terms (e.g. derived terms at hand). On desktop, these appear in a neat, handy collapsible box that can be opened to view the deluge of links. When opening a page, the table is shortened and can be scrolled past easily. I personally use mobile more often, though, and in my (Firefox) browser these collapsibles pose a serious obstacle to accessing the rest of the page, simply because they are expanded by default with the "show less/more" button at the bottom, so I have to scroll past hundreds of words to get to the sections below.

I have no clue whatsoever how these are programmed or by whom, but I can imagine two solutions: either have them start off closed when opening a page (like translations) or move the button to the top, so users don't have to scroll past to close them.

PS: I found these two discussions, https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2021/April and https://en.wiktionary.org/wiki/Wiktionary:Grease_pit/2021/June#Experience_on_mobile, on the same problem with language headers. This is even worse because users can at least immediately close a language header. Jan R Müller (talk) 13:11, 12 January 2025 (UTC)

People still use mobile phones? I think the general trend in the world is moving away from those old things. Father of minus 2 (talk) 18:42, 12 January 2025 (UTC)

I'm sure issues like this have been discussed more recently than 2021, and some solutions have been proposed. I don't recall what the current blocker is; I recall some longstanding Phabricator ticket to do something or other. @Ioaxxere @Surjection @This, that and the other Can you comment? I vaguely remember some proposals for having a gadget to improve the mobile experience. Benwing2 (talk) 21:50, 12 January 2025 (UTC)

There are two separate issues here. Ioaxxere can probably say more about the language headers one.

As for the collapsible term-list boxes, I have actually been noticing something similar on my phone, except for me, it is all the quotations in the entry that appear uncollapsed - the collapsible boxes are collapsed as they should be. It turns out that the sticky "visibility" links that are shown in the sidebar in the desktop Vector skin (and in the Tools menu on Vector 2022) appear at the very bottom of the page on mobile, and I have at some point turned on the "Show quotations" option. So @Jan R Müller please scroll to the very bottom of an affected entry and tap "Hide derived terms" or similar.

I wonder if we should move these links somewhere else on mobile... This, that and the other (talk) 00:10, 13 January 2025 (UTC)

@Jan R Müller: The language header thing you mentioned is tracked at phab:T376446 but the developers are yet to fix this for us. Ioaxxere (talk) 04:27, 13 January 2025 (UTC)

Good news as of Jan 9 (3 days ago):

We have decided to not block this on T374883. Anything that reduces the complexity of the section collapsing code is worth doing as it simplifies the transition to Parsoid. This means articles with only one heading will display with the heading collapsed by disabled, but given there is a user preference to override this behaviour, we don't see this as a problem. We've queued this up for next sprint since the patch is written, it's mostly removing code and we don't see this as a big time sync.

Benwing2 (talk) 04:32, 13 January 2025 (UTC)

@Benwing2 @Ioaxxere just noting that this appears to have happened. Now all section headings appear collapsed by default, even when there is only a single section. This, that and the other (talk) 04:45, 18 January 2025 (UTC)

Can we label Five-Percent Nation lingo or should we consider it slang?


Hi, I've been thinking about including a lot of Five-Percent Nation lingo in the dictionary. Some of the words are already there, like a-alike or 85er, but I thought that maybe we could create a label for it that would automatically add a category, let's say "Five-Percent Nation" or "Five-Percent Nation jargon"? Of course not all words should be allowed, as most of them are just English words that carry a similar meaning, but since the lingo was popularised in the 90s by many rappers and some of it is still used today in hip-hop, I think it should have its own category. What do you guys think? Tashi (talk) 21:34, 12 January 2025 (UTC)

I don't know much about the Five-Percent Nation but if the inclusion of these terms can be justified according to CFI and there are enough of them, then IMO yes we should probably have a category. Category:African-American English and especially Category:African-American Vernacular English are a mess and a lot of recategorization is in order. @-sche Benwing2 (talk) 21:56, 12 January 2025 (UTC)

I don't know how many terms we could include. Is there a minimum number of entries a jargon should have? Tashi (talk) 22:08, 12 January 2025 (UTC)

IMO a category should have at least ~10 items to be worth creating. Benwing2 (talk) 23:32, 12 January 2025 (UTC)

Agreed. Most people don't realize that Wikimedia categories aren't really for categorizing, they're navigation aids to help find pages that have specific things in common. If there's only a few pages, you might as well just link directly to the other pages from each of them (i.e. in a "See also" section or a usage note). Chuck Entz (talk) 00:04, 13 January 2025 (UTC)

I think it's doable :) If I want to add it to the module, do I need permission, or can I just do it now? Tashi (talk) 23:02, 13 January 2025 (UTC)

mismatches between list templates and categories


I made a list of all the current list templates and the categories they categorize into. In the following list I chopped off the language-specific component of the template and category names, counted up each (template, category) combination and sorted the result by count:


 224 days of the week: cat=Days of the week
 107 Gregorian calendar months: cat=Months
  55 seasons: cat=Seasons
  46 sexual orientations: cat=Sexual orientations
  43 continents: cat=Continents
  42 Gregorian calendar months: cat=Gregorian calendar months
  29 oceans: cat=Oceans
  29 fundamental interactions: cat=Physics
  29 countries of Europe: cat=Countries in Europe
  19 times of day: cat=Times of day
  17 states of India: cat=States of India
  15 countries of Africa: cat=Countries in Africa
  14 lunar months: cat=Lunar months
  12 countries of Asia: cat=Countries in Asia
  11 parts of speech: cat=Parts of speech
  11 Islamic calendar months: cat=Islamic calendar months
   9 taxonomy: cat=Taxonomy
   9 senses: cat=Senses
   9 quarks: cat=Quarks
   9 fingers: cat=Fingers
   9 canids: cat=Canids
   9 Arabic script letters: cat= 
   8 units of time: cat=
   8 books of the New Testament: cat=Books of the Bible
   8 Western astrology signs: cat=Astrology
   7 tastes: cat=Taste
   7 regions of Italy: cat=Administrative regions of Italy
   7 provinces of the Netherlands: cat=Provinces of the Netherlands
   7 planets of the Solar System: cat=Planets of the Solar System
   7 compass points: cat=Compass points
   7 Chinese zodiac signs: cat=Chinese zodiac signs
   6 states of Austria: cat=States of Austria
   6 provinces of Pakistan: cat=Provinces of Pakistan
   6 books of the Protestant Old Testament: cat=Books of the Bible
   6 books of the Catholic Old Testament: cat=Books of the Bible
   5 union territories: cat=Union territories of India
   5 provinces of Belgium: cat=Provinces of Belgium
   5 camelids: cat=Camelids
   5 Islamic prophets: cat=Islamic prophets
   5 Cyrillic script letter names: cat=Cyrillic letter names
   4 religions: cat=Religion
   4 principal signs of the Ifa divination system: cat=Divination
   4 musical notes: cat=Music
   4 cantons of Switzerland: cat=Cantons of Switzerland
   4 Latin script letter names: cat=Latin letter names
   4 Egyptian calendar months: cat=Egyptian calendar months
   4 Chinese sexagenary cycle terms: cat=Chinese sexagenary cycle terms
   4 Chinese heavenly stems: cat=Chinese heavenly stems
   3 religious texts: cat=Religion
   3 provinces of Armenia: cat=Provinces of Armenia
   3 poetic meter: cat=Prosody
   3 countries of Africa: cat=Countries
   3 brain lobes: cat=Brain
   3 Hebrew calendar months: cat=Hebrew calendar months
   3 French Republican Calendar months: cat=Months
   3 Chinese earthly branches: cat=Chinese earthly branches
   2 titles: cat=Titles
   2 territories of Canada: cat=Territories of Canada
   2 states of Mexico: cat=States of Mexico
   2 states of Brazil: cat=States of Brazil
   2 solar months: cat=Solar months
   2 romantic orientations: cat=Romantic orientations
   2 regions of the Philippines: cat=Regions of the Philippines
   2 reds: cat=Reds
   2 provinces of the Philippines: cat=Provinces of the Philippines
   2 provinces of Turkey: cat=Provinces of Turkey
   2 provinces of Nepal: cat=Provinces of Nepal
   2 provinces of Indonesia: cat=Provinces of Indonesia
   2 provinces of Canada: cat=Provinces of Canada
   2 prefectures of Japan: cat=Prefectures of Japan
   2 lunar months: cat=Hindu lunar calendar months
   2 human anatomy direction adjectives: cat=Medicine
   2 grammatical numbers: cat=Grammar
   2 grammatical genders: cat=Grammar
   2 felids: cat=Felids
   2 era names of Qing: cat=Chinese era names
   2 divisions of Bangladesh: cat=Divisions of Bangladesh
   2 districts of Kerala: cat=Districts of Kerala
   2 dentistry location adjectives: cat=Dentistry
   2 days of the Elvish week: cat=Days of the week
   2 countries of Oceania: cat=Countries in Oceania
   2 countries of North America: cat=Countries in North America
   2 counties of Romania: cat=Counties of Romania
   2 blues: cat=Blues
   2 Hebrew script letter names: cat=Hebrew letter names
   2 Bengali calendar months: cat=Months
   2 Assamese calendar months: cat=Months
   2 Armenian calendar months: cat=Armenian calendar months
   2 Arabic script letter names: cat=
   1 ʾinna and her sisters: cat=Sisters of ʾinna
   1 vilayets of Turkey: cat=Vilayets of Turkey
   1 varieties of Latin: cat=
   1 varieties of Egyptian: cat=
   1 traditional units of time: cat=Units of measure
   1 times of day: cat=Time
   1 taxonomic ranks: cat=Taxonomy
   1 states of the United States: cat=States of the United States
   1 states of Sudan: cat=States of Sudan
   1 states of Malaysia: cat=States of Malaysia
   1 squarks: cat=Squarks
   1 sequence of days: cat=Georgian point-in-time adverbs
   1 scale degrees: cat=Music
   1 religionists: cat=Religion
   1 regions of France: cat=Administrative regions of France
   1 punctuation: cat=Punctuation marks
   1 provinces of Laos: cat=Provinces of Laos
   1 provinces of Iran: cat=Provinces of Iran
   1 provinces of Cambodia: cat=Provinces of Cambodia
   1 provinces of Algeria: cat=Provinces of Algeria
   1 provinces of Afghanistan: cat=Provinces of Afghanistan
   1 nouns with long construct singular: cat=Arabic nouns with long construct singular
   1 moon of Earth: cat=Moon
   1 months-Nigeria: cat=Months
   1 months-Benin: cat=Months
   1 ministers: cat=Government
   1 military ranks: cat=Military ranks
   1 lunar phases: cat=Calendar
   1 lunar months: cat=Months
   1 leptons: cat=Leptons
   1 kinship generations: cat=Family
   1 kartvelian languages: cat=Language
   1 influential stars: cat=
   1 gregorian calendar months: cat=Months
   1 governorates of Egypt: cat=Governorates of Egypt
   1 foreignisms: cat=Foreignisms
   1 fingers-humorous: cat=Fingers
   1 felids: cat=
   1 family members: cat=Family
   1 elementary particles: cat=Subatomic particles
   1 dwarf planets of the solar system: cat=Dwarf planets
   1 dwarf planets of the Solar System: cat=Dwarf planets of the Solar System
   1 divisions of West Bengal: cat=Divisions of West Bengal
   1 divisions of Myanmar: cat=Divisions of Myanmar
   1 districts of Mymensingh: cat=Mymensingh Division
   1 dialectisms: cat=Dialectisms
   1 descendants: cat=Family
   1 degrees of comparison: cat=Grammar
   1 decennial cycle: cat=Decennial cycle years
   1 days: cat=
   1 days relative to today: cat=Time
   1 cranial nerves: cat=Neuroanatomy
   1 countries of the Indian subcontinent: cat=Indian subcontinent
   1 countries of South America: cat=Countries in South America
   1 countries of Central America: cat=Countries in Central America
   1 countries of Central America: cat=Countries
   1 countries of Asia: cat=Countries
   1 countries in the Middle East: cat=Middle East
   1 counties of Albania: cat=Counties of Albania
   1 computer keys: cat=Computing
   1 clause elements: cat=Clause elements
   1 canids: cat=
   1 bones of the tarsus: cat=Skeleton
   1 blues: cat=Colors
   1 blacks: cat=Blacks
   1 arithmetic operations: cat=Arithmetic
   1 antileptons: cat=Leptons
   1 administrative divisions of South Korea: cat=
   1 Wiccan Sabbats: cat=Wicca
   1 Vikram Samvat calendar months: cat=Hindu lunar calendar months
   1 Vaṅga Bengali calendar months: cat=Months
   1 Tamil script vowels: cat=Tamil letters
   1 Tamil script letters: cat=Tamil letters
   1 Tamil script consonants: cat=Tamil letters
   1 Syriac script letter names: cat=
   1 Sylheti calendar months: cat=Months
   1 Sundanese script consonants: cat=Sundanese letters
   1 Shahmukhi script vowels: cat=Punjabi letters
   1 Shahmukhi script digraphs: cat= Punjabi letters
   1 Russian script letters: cat=
   1 Roman mythology Olympian gods: cat=Roman deities
   1 Relative quality adjectives: cat=Relative quality-->
   1 Regions of Morocco: cat=
   1 Prophets of Judaism: cat=Prophets of Judaism
   1 Parsi calendar months: cat=Persian months
   1 Page 7266 Template:list:noble ranks/th: Found match for regex: 
   1 Page 7091 Template:List:Peruvian ecoregions/qu: Found match for regex: 
   1 Page 4 Template:list preload: Found match for regex: 
   1 Page 12118 Template:mkj-months: Found match for regex: 
   1 Page 12117 Template:ang-months: Found match for regex: 
   1 Okinawan calendar months: cat=Okinawan months
   1 Odia calendar months: cat=Hindu lunar calendar months
   1 Meetei Mayek script letters: cat=Manipuri letters
   1 Maori calendar months: cat=Months
   1 Malayalam script letters: cat=Malayalam letters
   1 Kojoda months: cat=Kojoda months
   1 Kamrupi Assamese calendar months: cat=Months
   1 Japanese scribal abbreviations: cat=Japanese months-->
   1 Japanese calendar months: cat=Japanese calendar months
   1 Hindu calendar months: cat=Hindu calendar months
   1 Hindu Jovian years: cat=Hindu Jovian years
   1 Gurmukhi script vowels: cat=Punjabi letters
   1 Gurmukhi script letters: cat=Punjabi letters
   1 Gurmukhi script diacritics: cat=Punjabi letters
   1 Gujarati script letters: cat=Gujarati letters
   1 Greek script letter names: cat=Greek letter names
   1 Greek mythology Titans: cat=Greek deities
   1 Greek mythology Olympian gods: cat=Greek deities
   1 Greek mythology Muses: cat=Greek deities
   1 Fingers: cat=Fingers
   1 Ending at Sun Jan 12 21:03:12 2025
   1 Elapsed time: 0 mins 0.04 secs
   1 Devanagari script vowels: cat=Translingual letters
   1 Devanagari script letters: cat=Sanskrit letters
   1 Devanagari script letters: cat=Newar letters
   1 Devanagari script letters: cat=Nepali letters
   1 Devanagari script letters: cat=Maithili letters
   1 Devanagari script letters: cat=Kashmiri letters
   1 Devanagari script letters: cat=Hindi letters
   1 Devanagari script letters: cat=Gamale Kham letters
   1 Devanagari script consonants: cat=Translingual letters
   1 Days of the week: cat=Days of the week
   1 Chinese radicals: cat= 
   1 Chinese calendar months: cat=Chinese months
   1 Cereals: cat=Grains
   1 Burmese zodiac signs: cat=Burmese zodiac
   1 Bengali script vowels: cat=Translingual letters
   1 Bengali script letters: cat=Bengali letters
   1 Bengali script letters: cat=Assamese letters
   1 Bengali script consonants: cat=Translingual letters
   1 Bengali calendar months: cat=Bengali calendar months
   1 Beginning at Sun Jan 12 21:03:12 2025
   1 Baybayin script letters: cat= 
   1 Assyrian letter names: cat=Syriac letter names
   1 Assyrian Neo-Aramaic vowel names: cat=
   1 Arabic script letters: cat=Brahui letters
   1 Arabic script letters: cat= Punjabi letters
   1 Arabic script diacritics: cat= Punjabi letters
   1 Ancient Greek diacritics: cat=Diacritical marks

What you can see is:

42 languages (including English) categorize their Gregorian calendar months into LANG:Gregorian calendar months but 107 categorize just into LANG:Months. I'd like to fix that, and I think we have one of two options. (1) categorize only into LANG:Gregorian calendar months; (2) categorize both into LANG:Gregorian calendar months and LANG:Months. (English does (1) but then manually adds Category:en:Months to all the months.)
Templates named countries of Continent vs. categories countries in Continent. The logic used in Module:place and related data modules is that "of" is usually reserved for political/administrative subdivisions of a country or sub-country-level entity while "in" is used for random lists (hence Category:en:Counties of Texas, USA and Category:en:Municipalities of Minas Gerais, Brazil but Category:Cities in Texas, USA and Category:Towns in Minas Gerais, Brazil). By this reckoning, a continent is not a political entity so it's Category:en:Countries in Africa not #Category:en:Countries of Africa. I propose making the template names follow the same logic.
Templates like Devanagari script letters map to language-specific categories. Should they instead map to a topic cat LANG:Devanagari script letters (and similarly for other Foo script letters categories)? Benwing2 (talk) 03:23, 13 January 2025 (UTC)Reply
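
For anyone who wants to re-check the dump above, here is a minimal Python sketch of how lines of the form `COUNT NAME: cat=CATEGORY` could be grouped by template name. The line shape is assumed from the visible dump only. The function names (`parse_dump`, `uncategorized`, `multi_category`) are hypothetical, and stripping a trailing `-->` is a guess about leftover comment markers, not something the original script is known to do.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the dump: "<count> <template name>: cat=<category>"
LINE_RE = re.compile(r"^\s*(\d+)\s+(.+?): cat=(.*)$")


def parse_dump(lines):
    """Map each list-template name to the set of categories it appears with."""
    cats_by_template = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            # Log lines ("Beginning at ...", "Found match for regex:") carry no cat= field.
            continue
        name = match.group(2)
        cat = match.group(3).strip()
        if cat.endswith("-->"):
            # Assumption: a trailing "-->" is comment residue, not part of the category.
            cat = cat[:-3].strip()
        cats_by_template[name].add(cat)
    return dict(cats_by_template)


def uncategorized(cats_by_template):
    """Templates that only ever appear with an empty cat= value."""
    return sorted(n for n, cats in cats_by_template.items() if cats == {""})


def multi_category(cats_by_template):
    """Templates that map to more than one non-empty category."""
    return sorted(n for n, cats in cats_by_template.items() if len(cats - {""}) > 1)
```

Run over the dump, `multi_category` would surface templates such as Devanagari script letters, which map to several language-specific categories, and `uncategorized` would list the ones with a bare `cat=`.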

Importing hatnote templates from Wikipedia

for the purposes of non-mainspace (Wiktionary and Appendix namespaces), I wish to import some hatnote templates from Wikipedia, such as the ones below:

Template:Main (as {{main}})
Template:For (as {{for}}; or {{for2}} for backwards compatibility)
Template:Further (as {{further}})
Template:See also (as {{also2}})
Template:Self-reference (as {{selfref}})

these would be helpful as they are already intuitive for wiki readers and writers. some of these are also being imported to fix their visual output, as some are misusing the bare : ''a'' syntax, which is bad for both desktop and mobile. Juwan (talk) 21:52, 13 January 2025 (UTC)Reply

@JnpoJuwan: could you give some examples of how you intend to use them? — Sgconlaw (talk) 22:55, 13 January 2025 (UTC)Reply

Generally I discourage importing code from Wikipedia as it almost always makes for a maintenance headache and usually there is already a way to do that in Wiktionary. In this case, we already have {{also}} for confusable terms, and a name like {{also2}} would be very confusing to editors. I think if it's desired to have a hatnote functionality for Wiktionary and Appendix there should be a single template called {{hatnote}} or similar, and in fact it already exists; so do {{selfref}} and {{main}}, along with {{maincat}}. I'd definitely discourage having more hatnote templates that don't obviously differ from each other. Benwing2 (talk) 23:30, 13 January 2025 (UTC)Reply

I see no valid use for the templates not created yet. The first three take account of the encyclopedic character of a project, which Wiktionary lacks, the fourth displays the same as our {{also}}, the fifth is already created for links to internal pages. Fay Freak (talk) 21:38, 14 January 2025 (UTC)Reply

Well, turns out we have encyclopedic content on appendix pages, which is where {{main}} is used on 152 pages, some uses of which are ugly web design (some rhyme pages such as Rhymes:Indonesian/i-), some redundant (e.g. Wiktionary:List of languages/special, where the link to the main page is auto-created due to the present page being a subpage). I doubt that Wikipedia editors are going to write those linguistic appendices, though, so like Benwing I am shrewdly afeared that the presence of the templates is a disproportionate maintenance burden. Fay Freak (talk) 21:47, 14 January 2025 (UTC)Reply

I believe that these would be helpful exactly for quick editing in internal pages, on the appendix and Wiktionary namespaces. it's not that only Wikipedia editors will write those, but I am suggesting these are globally well-known and helpful in the case that I suggested, as I find them missing sometimes. Juwan (talk) 22:38, 14 January 2025 (UTC)Reply

for a concrete example, I wanted these when writing Wiktionary:Images. I would find it also helpful for navigation around other policy pages. Juwan (talk) 22:43, 14 January 2025 (UTC)Reply

@JnpoJuwan You need to be specific about how the existing {{hatnote}} and such templates aren't sufficient. Just saying "let's do what Wikipedia does" isn't enough. Benwing2 (talk) 22:46, 14 January 2025 (UTC)Reply

I may be doing a bad job at explaining myself. the {{hatnote}} templates may be sufficient; however, these would be practical in quickly writing the wikitext with simpler formatting, including links and all. I don't mean that these templates actually have to be copies from the code on Wikipedia, as the templates sure are a lot to deal with. rather, their outputs should be similar.

I apologise for this discussion having a lot of words to try to explain a few. Juwan (talk) 22:58, 14 January 2025 (UTC)Reply

Especially since I specifically deny that people who know them from alleged global usage would deploy them for some quick editing in Wiktionary-internal pages. Editors are not equally invested in Wiktionary as well as Wikipedia or any project where these templates equally make sense and if so it would take extra steps for them to try to use these templates and be seriously disappointed if they are referred to an existing more general method ({{hatnote}}) to format their text. We aren’t that heavy in formulated guidelines and policy pages. So currently the prospective uses of the additions – they hardly deserve the attribute “practical” – are pushed into the background by the maintenance concern. We actively remove rarely used templates and template redirects that do the same job as other templates that are used more often. Fay Freak (talk) 23:07, 14 January 2025 (UTC)Reply

Old Albanian

Discussion moved to Wiktionary:Language treatment requests#Old Albanian.

Usage of etym-only codes: what should or shouldn't be appropriate?

One thing I see often debated is how etym-only codes are to be used. Some argue only in etymology sections for categories (why not in {{cog}} or {{nocog}}?). Some argue they should be allowed in descendants sections. Some complain about uses such as купорос, where Middle Russian is an etym-only code for Russian, and this would be better handled as an alt form.

I'm not going to lie, I'm not optimistic about editors coming to some sort of consensus, but I figure it's worth a try to start the discussion and get input as well as potential problems or solutions to each approach.

Personally, I feel that etym-only codes are fine in any etymology template as well as descendants, which is a sort of "reverse" etymology section, and it could also be helpful in organizing things such as dialectal forms. For example, Polish has a few etymology codes - instead of everything being in just the alts section, it would be possible to show how each dialect group handled a given reflex to show better comparison of regional differences. Vininn126 (talk) 21:39, 14 January 2025 (UTC)Reply

Removing non-lemmas from Special:Random

In Wikipedia, a series of pages called disambiguation pages were removed from the pool of pages that Special:Random can direct you towards. Even though disambiguation pages make up 225,920/6,939,921 or ~3.26% of the total pages on Wikipedia, their repetitive and auxiliary rather than informative nature was enough that their removal was deemed an improvement to the overall "Random article" button experience.

If one is to consider the state of affairs prior to the above change as an issue, then here on Wiktionary, what we have is a much bigger version of it. Like disambiguation pages, our non-lemma pages are built to direct readers to the word's canonical version. Unlike disambiguation pages, however, non-lemmas actually tend to form a sizable chunk of entries for most languages with active editors; our Wiktionary:Statistics page and the many non-lemma and lemma category pages (highlighted blue just now) can provide a rough view of the situation, where the corresponding categories for English and French show that the amount of non-lemma entries is respectively ~61% and ~310% (!!!) the amount of lemmas.

It's worth noting that English is a weakly inflected language: The proportion of non-lemmas becomes far more noticeable if you take a highly inflected one like Finnish. User:Jberkel keeps a list of wanted entries, where most of them are non-lemma entries. If we took the stats for his latest data dump and began creating all the pages in there, the creation of the Finnish pages alone would nearly quadruplicate our total amount of entries as well as heavily skew the Special:Random output.

With these and other more anecdotal considerations in mind (such as my own experience with successive "Random entry" clicks and commentary by other users in the Wiktionary Discord server), I'd like to know if users here would be in favor of changing Special:Random to remove non-lemma pages from its pool. That is, pages where all of their definitions are non-lemmas. The way I imagine it, it'd be implemented by having code check for the absence of any "LANG lemmas" categories in the page, perhaps even by checking "whether any categories end in lemmas"; if that's not the case, then the page is not eligible for showing up in Special:Random and would therefore be invisible/ignored by it.
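
A minimal sketch of that check, written in Python purely for illustration (any real change would have to live in MediaWiki itself, and the function names here are hypothetical). It assumes the page's categories are available as plain names without the "Category:" prefix, e.g. "English lemmas" or "Finnish non-lemma forms":

```python
def has_lemma_category(categories):
    """Return True if any category name ends in ' lemmas', e.g. 'English lemmas'."""
    return any(name.strip().endswith(" lemmas") for name in categories)


def eligible_for_random(categories):
    """A page stays in the Special:Random pool only if it has at least one lemma category."""
    return has_lemma_category(categories)


# A page with only non-lemma sections is excluded from the pool...
print(eligible_for_random(["Finnish non-lemma forms", "Finnish noun forms"]))
# ...while a page with at least one lemma section stays in it.
print(eligible_for_random(["English lemmas", "English nouns", "French non-lemma forms"]))
```

Note that a page mixing lemma and non-lemma sections stays eligible, which matches the "pages where all of their definitions are non-lemmas" wording.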

Should people agree, I aim to get this proposal to Phabricator and use this discussion as evidence of community consensus (as has been done in the past) to hopefully get this sorted out! I'm looking forward to potentially going back to enjoying my favorite Wiktionary pastime. MedK1 (talk) 22:16, 14 January 2025 (UTC)Reply

Strong support Fay Freak (talk) 22:47, 14 January 2025 (UTC)Reply

Support Trooper57 (talk) 20:29, 15 January 2025 (UTC)Reply

Support. — Sgconlaw (talk) 22:33, 15 January 2025 (UTC)Reply

Support! Polomo47 (talk) 02:30, 16 January 2025 (UTC)Reply

Support Vininn126 (talk) 09:16, 16 January 2025 (UTC)Reply

Strong support Chihunglu83 (talk) 18:24, 16 January 2025 (UTC)Reply

Support, although I have little hope this will get implemented. Jberkel 13:44, 17 January 2025 (UTC)Reply

Indeed. Vininn126 (talk) 13:45, 17 January 2025 (UTC)Reply

Well, depends what people use Random for, doesn't it? If you want a word for its meaning and definition: lemmas are best. If you want something like a dice roll (a random valid word for a word game, or for composing surreal Dada poetry) then maybe you want all possible forms. I don't use it so I don't know the use cases. 2A00:23C5:FE1C:3701:2921:96CC:86C1:8A99 13:48, 17 January 2025 (UTC)Reply

Well, sure, but good luck getting this implemented - I don't think any such feature exists in MediaWiki.

Here's a more workable proposal: Many years ago we used to have a link in the sidebar, immediately below "Random page", which took you to WT:Random page, where you pick a language and get sent to a random lemma in that language. I'm not sure why it was removed. We could very well add that back, or at least make it easier to find. This, that and the other (talk) 14:17, 17 January 2025 (UTC)Reply

In the MediaWiki codebase there seems to be a hook to override Special:Random, but this would require custom development just for Wiktionary, which is unlikely to happen (unless there's a patch coming from the community, but even then there might be security concerns). Jberkel 14:37, 17 January 2025 (UTC)Reply

Support TTO's proposal. I would use a "random English lemma" feature more than Special:Random even if Special:Random only returned lemmas, because I'm monolingual. — excarnateSojourner (ta·co) 18:29, 17 January 2025 (UTC)Reply

Strong support despite technical limitations pointed out by Jberkel and TTO. Making Wiktionary:Random page (a very interesting page I wasn't aware of before this) more visible and accessible is also a great idea. Svārtava (t ɕ) 18:16, 20 January 2025 (UTC)Reply

Support although @MedK1 you have a typo up above; you wrote:

perhaps even by checking "whether any categories end in lemmas" and if that's the case, then the page is not eligible for showing up in Special:Random

when you presumably mean "if that's not the case". Benwing2 (talk) 09:21, 21 January 2025 (UTC)Reply

Oh, true! I'll fix it right away, thank you! MedK1 (talk) 15:29, 21 January 2025 (UTC)Reply

adding a topic category 'Religions'

Very surprisingly, we don't seem to have a set category listing religions, although we have CAT:en:Religion (a related-to category, which has 1,241 items and is in serious need of subclassifying) and even CAT:en:Religious occupations (?!). Any objection to me adding one? There are currently four lists of religions e.g. Template:list:religions/en and similarly for Telugu (although it's garbage, with English terms mixed in), Syriac and Georgian, and I can redirect the terms in those lists to the new 'Religions' categories. (Or should it be 'Religious movements'?) Benwing2 (talk) 05:18, 15 January 2025 (UTC)Reply

I also propose a set category 'Taxonomic ranks' (kingdom, phylum, class, order, family, genus, species and many others, e.g. subspecies, superclass, tribe, etc.). Benwing2 (talk) 08:04, 15 January 2025 (UTC)Reply

This can easily be filled by Template:list:taxonomic ranks/fa and 9 templates of the form Template:list:taxonomy/CODE (zh, ko, my, th, vi, ja, km, hi, ml), which should be renamed to Template:list:taxonomic ranks/CODE. Benwing2 (talk) 08:06, 15 January 2025 (UTC)Reply

@Benwing2 How would you propose to handle some things that are not exactly taxonomic ranks, to wit, clade, taxon, group? There are also ambiguous ranks like section, which is used both as a subgeneric and suprageneric 'rank'. I assume that we are talking only about current/recent definitions. I am not sure where exactly group names like cohort, series, and division fit in. There is also a bit of ambiguity because botanists, zoologists, prokaryotists, virologists, horticulturalists, etc. each have different codes which seem to affect terms like empire, domain, kingdom, and realm at the very top (or bottom?) of the chain. It is probably just a matter of presentation, with some items appearing outside of the main sequence of ranks. DCDuring (talk) 21:49, 20 January 2025 (UTC)Reply

If I'm understanding things right, terms like clade and taxon are at the meta-level; they are types of groupings. So maybe we should have a Category:Taxonomic groupings or Category:Types of taxonomic groupings to include them. Benwing2 (talk) 22:10, 20 January 2025 (UTC)Reply

No, those first three are terms for taxonomic names that may appear in the hierarchy anywhere, in principle. (Also, it is desirable that any taxonomic name be a clade, in the sense of a taxonomic group that includes a single group of organisms and all its descendants. For taxa in general this is at best a hypothesis or a goal.) I don't see why these and things like nothospecies, nothogenus, variety, oogenus, form, section, and cohort, etc., should appear in a separate category. Each of them can make sense in an inheritance/descent taxonomic sequence of groups of organisms. DCDuring (talk) 22:36, 20 January 2025 (UTC)Reply

OK you know more than I do about this. I just think that 'taxonomic ranks' should only include actual ranks (inclusively; any term used as a rank by any related field should count); meta-terms like clade should go somewhere else, or only in Category:Taxonomy. Benwing2 (talk) 22:58, 20 January 2025 (UTC)Reply

I agree, except that I think the group names that can appear in multiple positions in taxonomic trees should also be included, specifically clade, section, series, division, as well as names such as cohort, and also group (which is often used in nearly SoP terms like species group, stem group, informal group, etc.) and, possibly, taxon. DCDuring (talk) 16:21, 21 January 2025 (UTC)Reply

Support both. — excarnateSojourner (ta·co) 18:08, 17 January 2025 (UTC)Reply

No opinion either way, but note that we do have Appendix:Religions, which may explain the absence of a category. Andrew Sheedy (talk) 18:20, 17 January 2025 (UTC)Reply

I'm indifferent ultimately, but I will just note that the way that "religion" is used can be ambiguous and there may be complications. For instance, I routinely hear someone saying that (e.g.) Catholicism is a "religion", when I think of Catholicism as a subset of Christianity and Christianity is the "religion". And just like the word "language", there is a language like English and there's the sense of "he uses foul language" or "I don't like your (use of the English) language" and I think the ambiguity is similar here. So, to stop rambling: I agree with the kind of taxonomic approach (where you have examples like "Christianity > Catholicism > Roman Catholicism" etc.), but I can easily imagine a lot of disagreement or misuse of the category tree by putting things in the top level that belong lower in the taxonomy. —Justin (koavf)❤T☮C☺M☯ 18:27, 17 January 2025 (UTC)Reply

Kiautschou German Pidgin

An entry has just been created at Gobenol with no headword and incorrect language codes. The problem is that we don't have a language code for Kiautschou German pidgin, so @WorldPeaceIsNotFarAway "improvised". It also should be moved to lowercase gobenol, if we decide to fix it rather than delete it. Chuck Entz (talk) 16:01, 15 January 2025 (UTC)Reply

@Chuck Entz: Kiautschou German pidgin makes interesting reading. The capitalisation of the noun is original and conforms to German grammar, so I'd say the entry's in the right place. The pidgin has no ISO 639-3 code, so we'll have to make our own; de-kiau, perhaps? 0DF (talk) 01:32, 18 January 2025 (UTC)Reply

Update on enabling dark mode

A while back I made a request to Phabricator (phab:T381058) requesting that dark mode be enabled for logged-out users. Recently, User:Jdlrobson (one of the developers leading the dark mode project) informed me that it would not be enabled until there were fewer "accessibility issues" (i.e. not enough contrast) in dark mode than in light mode. The results of the evaluation are here:

https://night-mode-checker.wmcloud.org/enwiktionary/light.html (1018 errors)
https://night-mode-checker.wmcloud.org/enwiktionary/night.html (2068 errors)

The number in light mode is surprisingly high, but from spot checking a few of them it seems like the tool is very sensitive and catching colour combinations that aren't really hard to distinguish in practice. The dark mode contrast issues are much more severe, so that's what we should prioritize for now.

Therefore I ask everyone reading this: if you edit a language, please make sure that its templates are using the palette colours! If a template is widely used, fixing it could bring down the contrast issues by a significant amount. I haven't had a lot of spare time lately but I will also try to help out over the weekend.

Ioaxxere (talk) 15:03, 16 January 2025 (UTC)Reply

@Ioaxxere: probably best to add something to “Wiktionary:Templates” about this. — Sgconlaw (talk) 18:27, 16 January 2025 (UTC)Reply

As I said at GP recently, there is a very long tail (probably thousands) of non-dark-mode-compliant templates that are used on only a handful of pages. Many of them have other issues besides colour (poor padding/spacing, needless fixed or relative width settings, small text, ...) and I have been converting many of these to {{inflection-table-top}}.

Sometimes this conversion gives rise to objections - and some editors say they would prefer to have been consulted beforehand - but in most cases the change is relatively minor, and if it is not liked by a group of editors, it can easily be reverted and discussed after the fact. Moreover, the conversion as a whole seems quite uncontroversial. I have converted templates in countless languages and I've received very few complaints - concerning only Greek (which was able to be resolved), Old English (still unresolved - although, to be fair, I did take the liberty of rearranging the template at the same time...) and Afar (just today).

@Jdlrobson according to the random sample at [16] we have 27% of pages that are non-compliant. What threshold would you consider acceptable? I would also add that the report doesn't indicate which template has the problem, which is a bit of a time-waster. Surely the name of the template can be extracted from the Parsoid HTML. Is it possible to improve this? This, that and the other (talk) 23:59, 16 January 2025 (UTC)Reply

At mw:Recommendations for night mode compatibility on Wikimedia wikis#Use accessible colors which pass WCAG AA checks there are browser extensions for Chrome and Firefox. I checked 吃瓜 and with such a contrast checker I can see the issue in "trad" and "simp" from {{zh-forms}}. Note that this report is from the 500 most visited pages, and the figures change with weekly updates. The threshold is simply fewer issues in night mode than light mode, supposing that in light mode they are minor issues not reported so far. Vriullop (talk) 07:23, 17 January 2025 (UTC)Reply
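
For editors who want to test a colour pair by hand rather than through a browser extension, here is a minimal Python sketch of the WCAG 2.x contrast-ratio calculation that the AA check is based on. The function names are hypothetical and the code is not taken from the night-mode checker. It assumes six-digit hex colours, and uses the 0.03928 linearisation threshold as printed in the WCAG 2.x definition of relative luminance.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour):
    """Relative luminance of a colour given as '#rrggbb'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground, background):
    """WCAG contrast ratio, from 1.0 (identical colours) up to 21.0 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground, background, large_text=False):
    """AA requires at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

A template's text and background colours would need to pass this check separately against the light-mode and the dark-mode values of whatever palette colours it uses.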

Romance definite article boxes

We need to decide what to do with Romance definite article templates like {{Italian definite articles}}, {{Mirandolese Emilian definite articles}}, ... In my opinion these stick out like a sore thumb even in regular light mode:

in no other situation do we show inflection information in a right-floating box
they are multicoloured for some reason
the grey colour has poor contrast with the text

I'd like to propose moving these under an "Inflection" or "Declension" L4 header within the relevant "Article" POS, which is where most languages place similar tables. Naturally I would also convert them to {{inflection-table-top}}:

LANG definite articles
	singular	plural
masculine	...	...
feminine	...	...

Any comments or objections? This, that and the other (talk) 11:10, 17 January 2025 (UTC)Reply

There being no objections, this particular item is done. This, that and the other (talk) 10:02, 22 January 2025 (UTC)Reply

what counts as a "country"?

I am cleaning up all the list templates and I notice that someone has stuck Do NOT add Kosovo here in a bunch of "countries of Europe" lists. Just 4 days ago, despite this, an IP went ahead and stuck it into Template:list:countries of Europe/en, noting (reasonably IMO) that Template:list:countries of Asia/en includes both Palestine and Taiwan, despite neither having full diplomatic recognition and Palestine being under military occupation. In order to head off controversy, I'd like to get consensus on which countries to include and which ones not to include. I propose:

Taiwan, Palestine and Kosovo all go in these lists (as well as the other obvious candidates listed under Wikipedia's Category:States with limited recognition, which include Armenia, China, Cyprus, Israel, North Korea and South Korea).
Other de-facto-independent countries with wide but partial diplomatic recognition and no military dispute involved also go in these lists (notably, the Cook Islands and Niue, which are technically "self-governing in free association with New Zealand", but there may be other island nations in a similar situation).
"Frozen conflict" areas that have very limited diplomatic recognition *DO NOT* go in these lists. This includes e.g. Transnistria, Abkhazia, South Ossetia, the occupied parts of Ukraine, Northern Cyprus, Somaliland, Puntland, and various more obscure areas listed in Category:States with limited recognition.
Constituent countries probably *DO NOT* go in these lists. (Although there is now support in {{col}} for indented sublists, which will show up as parenthesized sublists in horizontally-laid-out lists once I finish the support for this; so conceivably we could put Greenland and the Faroe Islands indented under Denmark; England, Scotland, Wales and Northern Ireland indented under the United Kingdom; etc.)
Finally, I don't know what to do about the Sahrawi Arab Democratic Republic (which claims Western Sahara but controls only a fraction of it), as I don't know whether it's more controversial to include it or leave it out. Per Wikipedia, as of Sep 2022 it had diplomatic relations with 46 states (but not including the US, Canada, anywhere in Europe, China, Russia, India or Brazil), with recognition frequently given and withdrawn; see International recognition of the Sahrawi Arab Democratic Republic for the gory details.

The intent here is to take the least controversial and least POV positions, as we're a dictionary and not in the business of being politically controversial. (You could maybe argue that it's best to not include any partly-recognized state, but that would entail leaving out not only Taiwan, Palestine and Kosovo but China, North and South Korea, Israel, etc. etc., which feels needlessly POINTy and hardly the least controversial approach.) Benwing2 (talk) 07:00, 17 January 2025 (UTC)Reply

I think we should go by the most common definition and simply list all 193 member states of the United Nations and its two observer states. Anything else would basically (accidentally) be Wiktionary making a statement; this is the least controversial option as it is the most basic—albeit somewhat random and illogical—definition.

The second choice is the above plus the Cook Islands, Kosovo, Niue, SADR and Taiwan. I would argue for the inclusion of SADR because the ~40 countries recognising it are four times the number recognising Taiwan, so listing just Taiwan would seem somewhat strange. The Cook Islands and Niue are very weird because they are essentially the exact same as the United States' “Compact of Free Association” rubbish (Marshall Islands, Federated States of Micronesia and Palau), the only difference being that the American ones are UN members whereas the New Zealand ones are not. However, as UN membership = country is quite the elementary definition, and, seeing as the Cook Islands and Niue have the exact same self-governance as the US's states, they should be kept.

But again, I am in favour of the first option because IMO it should not really be up to Wiktionary to decide what a “country” is here. LunaEatsTuna (talk) 09:21, 17 January 2025 (UTC)Reply

To be honest I think leaving out Taiwan makes us look "less neutral" than including it, simply because most countries acknowledge that it's a de facto independent country (but they can't say they have "diplomatic relations" with it), have some kind of representative office there (but can't call it an "embassy"), and so forth. To me, leaving out Taiwan is adopting the PRC POV.

On the whole I generally prefer Benwing's proposal (Luna's second choice). I don't really have a strong view on how we treat Western Sahara - if we don't include limited-recognition states I'd lean against treating it as a country, as it is effectively a government-in-exile at this point.

Another idea could be to emulate what Wikipedia does - it places limited-recognition states on the same level as "regular" countries. Compare their Template:Economy of Europe for instance. This, that and the other (talk) 10:48, 17 January 2025 (UTC)Reply

The English language definition of a country should guide what political entities are considered countries in Wiktionary entry definitions. United Nations membership is one factor, but it is arbitrary and is not the ultimate authority on which political entities meet the English language definition of country; English speakers' understanding of "country" is the authority. Taiwan meets the country definition; it also meets other definitions. --Geographyinitiative (talk) 11:02, 17 January 2025 (UTC)Reply

I agree with the point about Taiwan and with Benwing's proposal/Luna's 2nd choice being the best one. MedK1 (talk) 17:01, 17 January 2025 (UTC)Reply

@LunaEatsTuna: Note: Cook Islands and Niue are far from the exact same as the US's COFA, as there are notable differences. The two entities do not have their own citizenship laws and rely on NZ citizenship. They are a part of Realm of New Zealand and New Zealand does not consider them sovereign states. As such, this year, Cook Islands PM Mark Brown confirmed that they do not meet the UN criteria for membership, likely due to the relationship with New Zealand. Notably last month, New Zealand also rejected the Cook Islander request for a separate passport. This is in stark contrast to the COFA situation where the United States provides services to the countries, but has no control over their citizenship and foreign affairs. The citizens of the Marshall Islands, FSM, and Palau are citizens of those respective countries and not U.S. citizens, and they each have their own passports. Hence why they have been admitted to the UN as member states, as they are fully sovereign. They do not belong to or make part of the U.S. I just want to make it clear that the two situations are completely different. AG202 (talk) 15:44, 17 January 2025 (UTC)Reply

The Marshalls, Micronesia, and Palau are free to leave their compact of free association with the United States and are not under its sovereignty (tho they are "insular" areas of the United States) and everyone there has citizenship in those three states. No one on Earth has "Cook Islands citizenship": they are all New Zealand citizens as part of the larger Realm of New Zealand, just like those on Tokelau. —Justin (koavf)❤T☮C☺M☯ 16:26, 17 January 2025 (UTC)

@Koavf: While they are insular areas, they are not a part of the United States, so there's no way for them to "leave". AG202 (talk) 17:13, 17 January 2025 (UTC)Reply

Exactly my point: they are not part of the United States. They can leave the compact of free association if they want. —Justin (koavf)❤T☮C☺M☯ 17:14, 17 January 2025 (UTC)Reply

Ahhh got it, the phrasing "are free to leave the United States" made it sound like they were a part of it, but I see what you mean now. AG202 (talk) 17:25, 17 January 2025 (UTC)Reply

Yes, I worded that in such a confusing and misleading way that you were perfectly reasonable in correcting me. —Justin (koavf)❤T☮C☺M☯ 17:27, 17 January 2025 (UTC)Reply

Everything that fulfils the three criteria of statehood – population, territory, and government. International recognition is only declaratory, not constitutive (communis opinio). Wikipedia editors only find it relevant as they rest on tertiary sources rather than judge the available material (which we have to anyway to see the linguistic situation of a country realistically rather than just mentioning it indirectly). Again they made up the distinction between de facto and de jure countries.

Apparently they call the three criteria Montevideo checklist or Montevideo criteria for statehood in English. After the Second World War, when the United Nations were instituted, it seems to have been difficult to swallow that these pillars of international law have been formulated in Germany, by Georg Jellinek, → Drei-Elemente-Lehre. Anyway you have a master’s thesis by Ali Zounouzy Zadeh (2012) International law and the criteria for statehood for 60 pages dissertation where all is related to the English-reading audience, and the keywords and titles for more.

Is least controversial: If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck. Taiwan exists, Transnistria exists, Israel exists, not our business if it has a right to exist. Fay Freak (talk) 12:36, 17 January 2025 (UTC)Reply

@Fay Freak Would that definition of yours exclude or include entities that do not claim statehood but function as independent entities? Examples being Puntland and the Wa State; they are essentially countries (in the most neutral definition) with the only difference being they themselves do not use the label and cannot—in most circumstances—enter into foreign relations with other states. LunaEatsTuna (talk) 17:04, 17 January 2025 (UTC)Reply

@LunaEatsTuna: Puntland seems widely described as and even intend to be a federated state – why is this no entry? –, which would be an exclusion criterion since terminologically, and for the present categorization purpose, in English, we highlight countries from states, but this well-meaning opinion—without me visiting it to inspect its administration and legal system—does not appear to hold water since they are at war with their neighbouring states and entertain foreign relations with ministerial delegations, everything with African characteristics. In view of the government, Puntland is more of a state than Somalia is, unsurprisingly, since the latter is synonymous to anarchy, or at least the most common negative example in discussions of libertarianism or anarchocapitalism, for as long as we know the internet. Fay Freak (talk) 17:40, 17 January 2025 (UTC)Reply

I agree with your proposal and the SADR is a state. —Justin (koavf)❤T☮C☺M☯ 16:24, 17 January 2025 (UTC)Reply

Very argumentative Koavf. — Fenakhay ^{(حيطي · مساهماتي)} 16:51, 17 January 2025 (UTC)Reply

? —Justin (koavf)❤T☮C☺M☯ 16:54, 17 January 2025 (UTC)Reply

To be a state, you need a territory, a permanent population and state institutions; not just tents in a foreign country. — Fenakhay ^{(حيطي · مساهماتي)} 16:57, 17 January 2025 (UTC)Reply

The SADR has all of those things and not "just tents in a foreign country". Please don't spread misinformation. —Justin (koavf)❤T☮C☺M☯ 17:12, 17 January 2025 (UTC)Reply

Hahahahaha. Quite the joke. Hahaha. — Fenakhay ^{(حيطي · مساهماتي)} 17:15, 17 January 2025 (UTC)Reply

Stop posting your lies and stop abusing rollback. —Justin (koavf)❤T☮C☺M☯ 17:20, 17 January 2025 (UTC)Reply

Oh look who's upset for spreading misinformation and suppressing other peoples' comments. Get a grasp of reality. — Fenakhay ^{(حيطي · مساهماتي)} 17:22, 17 January 2025 (UTC)Reply

"An edit should be reverted if it is clearly and irredeemably nonconstructive". Your comments here are that. Stop posting them. —Justin (koavf)❤T☮C☺M☯ 17:26, 17 January 2025 (UTC)Reply

Wow you can partially read. Please read the whole paragraph :) (this is a discussion) . — Fenakhay ^{(حيطي · مساهماتي)} 17:30, 17 January 2025 (UTC)Reply

That is a personal attack and is not relevant to this discussion. I am blocking you from this page for 72 hours. —Justin (koavf)❤T☮C☺M☯ 17:37, 17 January 2025 (UTC)Reply

Ah, misuse of admin powers, classic move. Constructive feedback seems to be in short supply again, doesn’t it? :) — Fenakhay ^{(حيطي · مساهماتي)} 17:44, 17 January 2025 (UTC)Reply

@Benwing, Benwing2: as the person who started this thread and a fellow admin, I would like to request that you review this portion to see if it's constructive. Fenakhay, leave me alone. —Justin (koavf)❤T☮C☺M☯ 18:20, 17 January 2025 (UTC)Reply

Koavf's block against me is problematic. As the founder of the SADR WikiProject, he’s clearly not neutral on this topic. I pointed out issues with his arguments and what seemed like a deliberate misreading of the revert rules. Instead of discussing it, he blocked me. There were repeated shouts of "Stop spreading misinformation" which is ironic, as that is exactly what is happening, given that the reality is far from the fiction being portrayed. — Fenakhay ^{(حيطي · مساهماتي)} 19:13, 17 January 2025 (UTC)Reply

I don't know much of anything about the SADR, so I'm just going on what is going on here: @Koavf made a rather absolute statement and @Fenakhay disputed it in a rather dismissive way. As a native speaker of Moroccan Arabic, Fenakhay might be expected to have strong opinions on the subject, and Koavf apparently does as well. Koavf responded by being more absolute, and Fenakhay responded by being more dismissive, calling Koavf's statement a joke. Koavf removed Fenakhay's remark, which Fenakhay reverted in order to restore that remark. Koavf then accused Fenakhay of lying and abusing rollback, and attempted to block him.

Neither party can be proud of their actions here, but being dismissive/characterizing a statement as a joke is not on the same level as censoring the other person's remark, characterizing the other person's statements as "lies" and using admin powers to attempt to enforce the censorship. One is expressing the person's own opinion, and the other is trying to prevent someone else from expressing their opinion. Fenakhay's use of rollback rather than simply typing his content in by hand is pretty minor. Koavf's blocking on grounds of "Repeated unconstructive edits, personal attacks, and misuse of rollback" is much worse, and the block reason could just as easily be applied to Koavf, with the substitution of "block" for "rollback".

This subject obviously hits a raw nerve for both parties, and I would ask both of them to step back, take a deep breath, and walk away, at least for a day or two. Chuck Entz (talk) 20:16, 17 January 2025 (UTC)

Thanks. This will be my last comment here. —Justin (koavf)❤T☮C☺M☯ 20:19, 17 January 2025 (UTC)Reply

Why the SADR specifically? It doesn't feel country-like to me, although I'm not familiar with African politics. CitationsFreak (talk) 18:23, 17 January 2025 (UTC)

Fenakhay proposed three qualities, which all apply to them: territory (the Free Zone), a population (the Sahrawi refugees), and state apparatuses (a military, ambassadors, membership in international organizations like the African Union), etc. They also fit the Montevideo Convention requirements. —Justin (koavf)❤T☮C☺M☯ 18:31, 17 January 2025 (UTC)Reply

I didn't propose anything. That was Fay Freak. Stop spreading misinformation. — Fenakhay ^{(حيطي · مساهماتي)} 19:15, 17 January 2025 (UTC)Reply

I told you to leave me alone. Leave me alone. Also, stop your lying, as everyone can see that you are lying. —Justin (koavf)❤T☮C☺M☯ 19:56, 17 January 2025 (UTC)Reply

Koavf brought it up specifically, which is why I asked. (Personally, I don't support including it. The UN lists it on the same level as Guam in terms of sovereignty, which we should all agree is not a country.) CitationsFreak (talk) 19:27, 17 January 2025 (UTC)Reply

I presume you mean the list of non-self-governing territories, which is not some kind of index of what is or should be a sovereign state. Also, the UN is not determinative of what is or should be sovereign in the first place. —Justin (koavf)❤T☮C☺M☯ 20:01, 17 January 2025 (UTC)Reply

I agree with Fay Freak and I would support the first choice presented by LunaEatsTuna, to be the least neutral. We could also add a separate list for partially-recognized "countries" to include those that are worth mentioning, like Taiwan and such. — Fenakhay ^{(حيطي · مساهماتي)} 16:53, 17 January 2025 (UTC)

Much as I hate using this term, disputes over what is and isn't a country...ARE encyclopedic. If enough sources refer to something as a country to pass RfV, I reckon we're bound to call something a country. Purplebackpack89 18:47, 17 January 2025 (UTC)Reply

To define a term, we employ language (our working language English) to describe the thing denoted by the term and not just anything someone has called the thing in the context of the term as long as it happened three times. Otherwise we would declare Comirnaty a bioweapon. And the long list of micronations and reichtard countries would also be countries, with few attested translations though they be. Where and when to put the scare quotes? We would find more references contradicting these statements of course, “enough” sources to the contrary, but why? It is encyclopedic. Collating sources to determine what something is referred to as, rather than what a user of a linguistic symbol comprehends, is encyclopedic. They describe opinions about a thing on Wikipedia, we go directly to what is conceptualized for a term. One just has to employ circumspect language. If there is a problem with the State of Palestine and the West Bank meeting the usual idea of a state, we can just tell the issues with some hedges; a who's who is obviously out of our scope and wouldn’t solve the problem, you do that over at Wikipedia, and despite all the job conditioning for disserting sources it would still be a bad job since it would not discuss the issue of perspective, sources having to be relativized for having limited purposes in mind.

We could also dilute our concept of a country given that we just need vocabulary lists across many languages but then we can as well accept that we have a misnomer in calling a country everything someone can travel to and trade with which practically has different administration a polyglot has to know about, and therefore typically memorizes, roughly and for example. These aren’t disputes, these are different definitions. People live with different understandings of things and thus terms matched by them with them, either variously deranged, depending on which functional environment you take as a litmus test. Fay Freak (talk) 20:24, 17 January 2025 (UTC)Reply

@Benwing, your proposal (or Luna's second choice) seems reasonable to me. If we are (and we are!) a descriptivist dictionary and the point of our lists of countries is to help people find words for things which, descriptively, a significant number of speakers think of as countries, I think it makes sense to be relatively inclusive, and certainly include Kosovo and Taiwan (as long as they still have some recognition, as a sanity check). Western Sahara too is on a lot of maps, and lists of countries, and if it is recognized by dozens of countries, then yes, we should include it (but my instinct would be to list it under that name, Western Sahara, if our lists are mainly using countries' common names; only if we're making a list that has French Republic instead of France [etc] would I list it as SADR). - -sche (discuss) 19:23, 17 January 2025 (UTC)Reply

Thanks! This comment is very constructive and helpful. Benwing2 (talk) 22:24, 17 January 2025 (UTC)Reply

I agree with this. I think a certain amount of common sense is important here. If someone asks, "How many countries are there in the world", what are they likely to have in mind? I think Benwing's proposal captures the usual scope of the word fairly well. Andrew Sheedy (talk) 00:28, 18 January 2025 (UTC)Reply

For reference, see: Wikipedia's list of sovereign states. I think I can support Benwing's proposal. I personally would prefer the UN member (or UN observer) criteria and then separate other entities out on their own line, but as mentioned, that would run into issues with ex: Taiwan and would take up space. I'm a bit hesitant on Cook Islands & Niue for the reasons I've mentioned above + their limited international recognition as independent sovereign states. However, I guess since we're using "country" instead of "sovereign state", it doesn't matter that much. AG202 (talk) 01:15, 18 January 2025 (UTC)

Thanks for the comments. I also thought about using separate lines or footnotes but it gets complicated real fast. Benwing2 (talk) 01:48, 18 January 2025 (UTC)Reply

A country can also include places like Scotland, Wales, Greenland, etc but they aren't sovereign states. 115.188.138.105 11:49, 18 January 2025 (UTC)Reply

True. We know Trump wants to get his hands on Greenland. Hopefully he'll fail. DonnanZ (talk) 09:22, 20 January 2025 (UTC)Reply

How about "in a UN agency, observer state of the UN, or recognized as a state by any of the states listed"? This should include every state that should be in the list, along with a few stragglers (including the SADR and Cook Islands). CitationsFreak (talk) 09:33, 20 January 2025 (UTC)

T:User lang-1 through T:User lang-5

Is there a reason there are 6 separate templates? I don't see why these can't just be merged into {{User lang}} and we use a switch for the color/category changes. - saph ^_^^⠀talk⠀ 15:11, 17 January 2025 (UTC)Reply

They can be. It's just easier to make the five templates than the one to the extent that it requires less technical knowledge. —Justin (koavf)❤T☮C☺M☯ 15:34, 17 January 2025 (UTC)Reply

It's not like it would make it any more complicated, though, really just changing the dash to a pipe:

{{User lang-4|en|This user speaks English at a near-native level.}}

→

{{User lang|4|en|This user speaks English at a near-native level.}}

Unless I'm misunderstanding what you mean by "technical knowledge." - saph ^_^^⠀talk⠀ 15:45, 17 January 2025 (UTC)Reply
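As a rough illustration of the "changing the dash to a pipe" point, here is a minimal Python sketch of how existing calls could be rewritten in bulk. The function name `to_merged_call` is hypothetical, and the sketch assumes only the numbered levels 1 to 5 from the thread title need converting; it is not taken from any existing bot.

```python
import re

# Matches the opening of a per-level call such as "{{User lang-4|".
# Assumption: only the numeric levels 1-5 are converted.
_PER_LEVEL = re.compile(r"\{\{\s*User lang-([1-5])\s*\|")


def to_merged_call(wikitext: str) -> str:
    """Rewrite {{User lang-N|...}} as {{User lang|N|...}}; other text is untouched."""
    return _PER_LEVEL.sub(r"{{User lang|\1|", wikitext)


old = "{{User lang-4|en|This user speaks English at a near-native level.}}"
print(to_merged_call(old))
# prints: {{User lang|4|en|This user speaks English at a near-native level.}}
```

Only the template name and first parameter change; the language code and the message text pass through as written.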

See Special:Permalink/83630137 for examples of a merged template I put together. - saph ^_^^⠀talk⠀ 16:07, 17 January 2025 (UTC)Reply

I don't know that you're misunderstanding, I'm just answering the question you asked: it's easier to make a template that always looks one way and harder to make a template that changes how it looks based on your input. It's harder still to make one that changes twice based on two inputs. —Justin (koavf)❤T☮C☺M☯ 16:10, 17 January 2025 (UTC)Reply

It only needs one input for proficiency, the names for the CSS classes and for the categories are the same. - saph ^_^^⠀talk⠀ 16:14, 17 January 2025 (UTC)Reply

One input: language code. Another input: proficiency level. —Justin (koavf)❤T☮C☺M☯ 16:22, 17 January 2025 (UTC)Reply

I agree. Benwing2 (talk) 00:00, 18 January 2025 (UTC)Reply

If someone revamps the language templates, can they please also add support for an "inactive=" parameter (or whatever anyone wants to name it) that a bot could set en masse to take inactive users out of the proficiency categories? See vote and 2023 GP. Currently I hover over usernames in the categories with the old Navigation popups to see when they were last active, to figure out who to ping. - -sche (discuss) 01:20, 18 January 2025 (UTC)Reply

Yup, I'm completely with you on this. The main issue is that a lot of users use the built-in parser functions, which we need to work around/prohibit/whatever. Benwing2 (talk) 01:46, 18 January 2025 (UTC)Reply

Yeah, should be trivial to implement on the template end. I think a lot of our Babel templates already have a |nocat= parameter anyway. - saph ^_^^⠀talk⠀ 05:06, 18 January 2025 (UTC)Reply

Added. - saph ^_^^⠀talk⠀ 06:04, 18 January 2025 (UTC)Reply
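For the bot side of the inactive-user idea raised above, a minimal Python sketch of the text transformation might look like the following. The parameter spelling `inactive=1`, the function name `mark_inactive`, and the assumption that user pages call a merged `{{User lang|...}}` template directly are all illustrative guesses; proficiency boxes produced through built-in parser functions would not be touched by this.

```python
import re

# Hypothetical parameter; the thread leaves the exact name open.
INACTIVE_PARAM = "inactive=1"

# A complete {{User lang|...}} call that contains no nested templates.
_CALL = re.compile(r"\{\{\s*User lang\s*\|[^{}]*\}\}")


def mark_inactive(wikitext: str) -> str:
    """Append the inactive flag to each {{User lang|...}} call that lacks one."""

    def add_flag(match: re.Match) -> str:
        call = match.group(0)
        if re.search(r"\|\s*inactive\s*=", call):
            return call  # already flagged, leave as is
        return call[:-2] + "|" + INACTIVE_PARAM + "}}"

    return _CALL.sub(add_flag, wikitext)


print(mark_inactive("{{User lang|4|en|This user speaks English at a near-native level.}}"))
```

Running the function twice on the same text gives the same result as running it once, so a bot could safely revisit pages.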

Declension tables for Arabic dialects

The Arabic dialects' noun and adjective declension, while limited compared to MSA, is extensive enough that it is not sufficiently represented in the headword. Even if no cases exist, features like feminine constructs and definite article assimilation are simply not represented. So why haven't basic declension tables already been made for them? Verb conjugation tables already exist, and dialectal variation in declension is not great enough to require more than a few tweaks to fit each dialect well. ☆ Vesper (talk) 06:43, 18 January 2025 (UTC)

I'm not sure what you're referring to exactly. It's true that Arabic dialects don't get a lot of love, but neither feminine construct forms nor definite article assimilation is shown in the Arabic script and AFAIK they are quite predictable in most (if not all?) dialects. Benwing2 (talk) 07:54, 18 January 2025 (UTC)Reply

Link to Sanskrit roots on PIE root page?

User @Victar has removed a link to ज्वल् (jval) at the PIE *ǵwelH- page, reasoning that ज्वलति (jvalati) is already mentioned. My thinking is: given that the pages for Sanskrit roots are there (and a page like ज्वल् (jval) should have its derived terms expanded), it seems logical to have a link to them at the PIE level. Thoughts? Exarchus (talk) 13:26, 18 January 2025 (UTC)Reply

Agreed, if the root-to-derived morphology pipeline is still perceived as productive or at least generally seen to exist in Sanskrit (as it seems to be). — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 15:54, 18 January 2025 (UTC)Reply

What supporting data does Sanskrit ज्वल् (jval) add to PIE *ǵwelH- that ज्वलति (jvalati) doesn't? Roots are just linguistic constructs, and it seems silly to add a descents section to all PIE root pages just for Sanskrit. --{{victar|talk}} 02:50, 22 January 2025 (UTC)Reply

After my edit it already adds more: it points to derived terms that shouldn't be projected back to PIE but were (often) created within Sanskrit.

It also seems simply a matter of consistency: if A refers to B as its ancestor, then it makes sense for B to refer to A as its descendant. Exarchus (talk) 09:26, 22 January 2025 (UTC)Reply

All said derivatives can and should already be on ज्वलति (jvalati). I don't understand your argument on consistency. If we had consistency, we would have Proto-Germanic root pages, like *kul-. --{{victar|talk}} 02:45, 23 January 2025 (UTC)Reply

"All said derivatives can and should already be on ज्वलति (jvalati)." That is not how Sanskrit is currently treated on this site, the primary overview of the derivatives is at the root pages. Also because the derivatives often can't be derived from the present verb, or because there are multiple present verbs.

"I don't understand your argument on consistency." The ज्वल् (jval) page gives *ǵwelH- as its ancestor. So would make sense to have ज्वल् (jval) as descendant at the *ǵwelH- page.

"If we had consistency, we would have Proto-Germanic root pages, like *kul-." Would such hypothetical Proto-Germanic roots fit the condition given above by mellohi: "if the root-to-derived morphology pipeline is still perceived as productive or at least generally seen to exist"? Exarchus (talk) 10:06, 23 January 2025 (UTC)Reply

Lashi revamp and on mass-deletions

Hey all. Way back when in 2018-2019, I created a bunch of Lashi (Lacid) lemmas based on {{R:lsi:Luk:2017}} and {{R:lsi:Wannemacher:2011}}. While these are perfectly respectable sources by themselves, they obviously do not capture the entirety of the language, not to mention the fact that they use a scientific, phonological orthography (in the case of Wannemacher, simply IPA) to note down the 30,000-speaker language which actually has both an orthography and a Bible translation in that orthography, both of which seem to be accepted by the speakers.

Now, since I was following the above grammars, we obviously don't follow this orthography, but rather a (~~dogshit~~) semi-phonological transcription. I would like to fix this situation, perhaps by contributing again, now more carefully and following written materials (like the above-described Bible translation, as well as primers and other books written in the language over the last years), but the sheer amount of work needed to convert those lemmata that are based on the linguistic works, and may lack the appropriate information I need to actually write these terms down in the orthography (basically I would need to track down every single word in the non-literal Bible translations and pray to God [pun wholeheartedly intended] that they are attested there) is not only frightening, but simply undoable. It would be much easier to start anew, and work from the written materials, rather than from the scholarly ones.

As such, I propose we delete all existing entries in CAT:Lashi lemmas, and start anew. I could write a pronunciation template based on Wannemacher's phonological description, gather some more resources and start consistently working from the written materials. That way our readers will not get confused by the inconsistency in our entries, and we will not have to go through the herculean task of converting every single entry to a written form that is likely not even attested outside of the linguistic analyses.

I would also like to quickly start a second, related topic, which will inevitably come up if we reach a consensus on the first one. We have a number of languages that have a significant number of entries, which makes cleanup difficult, but that desperately need it nonetheless - to name a few notable cases, we have languages that have been massacred by a certain user that is now blocked, a couple of other languages that have been edited by yours truly in their early and stupid years and those whose cleanup has already been discussed not only in the Beer Parlour, but even on Reddit! So my second question is, in case the creator of the entries was indefinitely blocked for the creation of these entries, or if the editor themself agrees that maybe they should've been (like me :P), would it be acceptable to start a discussion on WT:Language treatment requests about the deletion of all lemmas/all members of a subcategory in the given language? Thanks in advance for your input. Thadh (talk) 22:37, 18 January 2025 (UTC)Reply

IMO yes yes and yes to everything you say, and in general I would advocate nuking languages that are FUBAR (Mon seems an especially flagrant example). I have disagreed with you in the past about the idea that IPA or an IPA-like transcription should never be used to represent a language, but obviously it's preferable to use a proper orthography if it exists and is in use, which it sounds like it is. Benwing2 (talk) 23:02, 18 January 2025 (UTC)Reply

Ongoing vandal

User:68.188.203.200 is creating lots of made-up nouns like insecticidality. 2A00:23C5:FE1C:3701:ACCF:6A12:5948:AB6D 01:34, 19 January 2025 (UTC)Reply

https://duckduckgo.com/?q=%22insecticidality%22&ia=web —Justin (koavf)❤T☮C☺M☯ 01:52, 19 January 2025 (UTC)Reply

Checking a sample of the user's last 50 creations, valueful, skeletality, simpliciality, exclamativity, medicinality, Facharztzentrum, Neueröffnung, エコノミクス, inhumatorio and 검붉다 all seem to be attested, though simpliciality may be a non-native speaker's word (or perhaps the reason it seems to show up mainly in technical texts by non-native speakers is just that it's a technical word). abusivity also seems to be uncommon, with modern uses by NNES, and inhumatorio does not seem to be as common as I would expect (is another word more usual for this?). For groupoidality, however, I only spot a single use on Google Scholar, and nothing on Google Books or Archive.org, so I've RFVed that one. (The fact that the user is quickly creating lots of entries in 5+ different languages does also give me some pause.) - -sche (discuss) 07:26, 19 January 2025 (UTC)Reply

The German ones are actively used in speech; the difference in content between edited languages is indicative of legitimate treatment of the material, since no one masters multiple languages equiformly. For charity I tend to assume that the editor is a polyglot with a biomedicine background or similar. Success in some academic fields depends on your ability to juggle vocabulary more than anything. Fay Freak (talk) 21:24, 19 January 2025 (UTC)

Does Unbinding Disinherit?

Can an unbound form be inherited from a bound form as opposed to merely being derived? The question is applicable to the relationship of the Pali participle kārita to as yet unentered antecedent Sanskrit कारित (kārita) ; Monier Williams only records the latter as a bound form. --RichardW57 (talk) 11:39, 19 January 2025 (UTC)Reply

proposed set categories

WARNING: Long post. See also my above post about Category:Religions and Category:Taxonomic ranks. Pinging @-sche and @Ioaxxere as editors who have contributed intensively to prior discussions on topic categories.

I would like to add the following seemingly obvious gaps:

Category:Units of time: millennium, century, decade, year, month, etc. I also propose to include geological time units in this category: eon, era, period, etc. Note that we currently have Category:Units of measure (and Category:en:Units of measure has 632 elements and really should be split, but that can be done later).
Category:Western zodiac signs: Pisces, Sagittarius, etc. Bizarrely, we have Category:Chinese zodiac signs but no category for Western zodiac signs. We have 8 Template:list:Western astrology signs/CODE lists that can populate these categories.
Category:Religious texts: We have three Template:list:religious texts/CODE lists giving names of religious texts/scriptures of various religions. Probably they should go in their own category.
Category:French Republican calendar months: Yeah these are funny but they exist and we have three Template:list:French Republican Calendar months/CODE lists specifying them for different languages.
Category:Poetic meters: Category:en:Poetry has 447 members, ugh. We have three misnamed lists Template:list:poetic meter/CODE that can populate these categories to begin with. Many terms for poetic meters like iambic pentameter and heroic verse are either only in Category:en:Prosody or nowhere.
Category:Types of electromagnetic radiation: gamma ray, X-ray, visible light, microwaves, radio waves, infrared, ultraviolet, etc. I have been trying to avoid putting the type of topic category into the name of the category but here I don't see any alternative, as Category:Electromagnetic radiation by itself suggests a related-to category. Maybe this should be called Category:Electromagnetic waves but I dunno if this is a term of art in physics. Note that we have three lists Template:list:electromagnetic radiation/CODE listing types of electromagnetic radiation in different languages.

I would like to make the following splits/renames:

Category:Lunar months should be deleted and split into categories for specific calendars:
1. Those in Category:en:Lunar months should almost all be moved to Category:Hindu lunar calendar months, which should be renamed. Discussion in Discord has concluded that it should be either Category:Hindu calendar months or Category:Vikrami calendar months, but it's not clear which is better. The problem with Category:Hindu calendar months is that what is essentially the same calendar and should probably be unified with it is also used by Muslims in Bengal and Punjab, and the problem with Category:Vikrami calendar months is that per Wikipedia, this refers only to a subset of all the Hindu calendar variants (although some editors on Discord say it is also used in a wider sense, incorporating all the Hindu calendar variants). Wikipedia is somewhat schizophrenic; for example, w:Assamese calendar is considered a Hindu calendar, at least by its categorization, while w:Bengali calendar is not, but they are virtually the same. (This is probably because Assam is majority-Hindu while Bengal is majority-Muslim.) I personally think "Hindu calendar" is fine; note by comparison that Western digits are normally called "Arabic numerals" or "Hindu-Arabic numerals" even though the majority of users are neither Arab nor Hindu (and for that matter, most Hindus and many Arabs use different numerals); the term reflects its origin rather than its current use. An alternative, I suppose, is "South Asian calendar", but this term does not appear to be in use in this sense. (IMO what matters most for unification of these different calendar variants is whether the month names are cognate with each other, which they almost always are. Some variants in fact are solar, some are lunar and some are lunisolar, and some have different starting points, but these distinctions are not decisive.)
2. Category:ban:Lunar months should be moved to Category:ban:Balinese calendar months, as the Balinese calendar seems quite different from others (there are actually two Balinese calendars but only one has months, it seems).
3. All the Southeast Asian Buddhist calendars seem to be based on the Hindu calendar and should probably be merged into them. Alternatively, place them in subcategories of Category:Buddhist calendar months.
4. The only remaining language with months in a subcategory of Category:Lunar months is Zulu, which should provisionally get its own Category:zu:Zulu calendar months. There are several lunar/lunisolar calendars used in Africa, and some of them are likely unifiable, but I don't know which ones.
Category:Books of the Bible: Split into Category:Books of the Old Testament and Category:Books of the New Testament. Probably delete Category:Books of the Bible, since AFAIK there are no books of the Bible that can't clearly be categorized into Old or New Testament. Besides the fact that the Old Testament and New Testament are associated with different religions, on a practical level Category:en:Books of the Bible has 106 members and Category:zh:Books of the Bible has 278 members. Note that we also already have 8 Template:list:books of the New Testament/CODE lists, 6 Template:list:books of the Catholic Old Testament/CODE lists and 6 Template:list:books of the Protestant Old Testament/CODE lists. (FYI the latter is essentially the same as the books of the Jewish Tanakh.)
Category:Greek deities: Split into something like Category:Greek mythology Olympian deities, Category:Greek mythology Muses and Category:Greek mythology Titans. Any that don't fit into these subcategories can stay in the supercategory. Category:en:Greek deities has 226 members and we already have 6 Template:list:Greek mythology Olympian gods/CODE lists, 5 Template:list:Greek mythology Muses/CODE lists and 2 Template:list:Greek mythology Titans/CODE lists.
Category:Fingers: Split out Category:Names of fingers (or Category:Types of fingers or just Category:Terms for fingers? Is "ring finger" the name of a finger or a type of finger?). Same issue as Category:Types of electromagnetic radiation but no obvious naming alternative. Category:Fingers is a mixture of terms for names or types of fingers (ring finger, middle finger, thumb, ..., along with more colorful terms like leech-finger, pussy finger and tall man) and terms related to fingers such as polydactyly and fingernail. We have 9 lists of types of fingers in Template:list:fingers/CODE as well as Template:list:fingers-humorous/pt listing humorous names for fingers in Portuguese.
Category:Size: Split out Category:Sizes, ranging from itsy-bitsy and extra-small to humongous and ginormous. Category:en:Size has 163 members.

Benwing2 (talk) 08:48, 20 January 2025 (UTC)

Support, although re books of the Bible we should also consider how to handle books of the Tanakh, which overlap with books of the Old Testament. At present, it seems like we just don't categorize books of the Tanakh at all(?), e.g. נחמיה is categoryless; it seems suboptimal to only categorize books of the Jewish Tanakh as "Old Testament" (since that's the Christian POV naming/framing of the Jews' books/religion as only being the old half of the whole story, though if any of our Jewish editors want to weigh in and say they DGAF, I defer to them), but it could be redundant to double-categorize a lot of books as both OT and Tanakh, but if we only have a "Category:Books of the Tanakh", it misses books which Christians regard as OT but Jews don't regard as Tanakh (the Apocrypha). Maybe we have "Books of the Tanakh", and then double-categorize that category into "Category:Books of the Old Testament" (also putting Christian apocrypha directly into that category) and "Category:Judaism"??
Re Hindu calendar months, if both Hindu and Muslim Bengalis use the same names for months, and those months are originally/mainly from the/a Hindu calendar, then my initial reaction (like yours) is to just call them "Hindu calendar months" and not sweat it—thinking of how e.g. lots of Malaysians, including Muslims and Malays, celebrate Chinese lunar new year, without necessitating calling it Chinese-and-Muslim-and-Malay new year. But if Hindu and Muslim Bengali-speakers use different month names, or just object to the calendar being called a Hindu calendar, is it a problem to have both a "CAT:hi:Hindu calendar months" and a "CAT:bn:Bengali calendar months" as (ultimately) subcategories of "CAT:Months"?
- -sche (discuss) 21:12, 20 January 2025 (UTC)

@-sche Thanks for your comments. I mention above that AFAIK the Protestant Old Testament and Jewish Tanakh have the same books. I agree it may be a bit strange to call נחמיה a "book of the Old Testament" (although that's exactly what our definition says) but I'm not sure we need to double-categorize; maybe we can call it "Books of the Old Testament and Tanakh" or "Books of the Old Testament and/or Tanakh" or something? Granted that the books of the Apocrypha are not in the Tanakh but IMO should still go in that category. There are also things like the Book of Enoch, not considered canonical by most Christians and Jews (but canonical for Ethiopian Jews and Ethiopian and Eritrean Orthodox Christians), which probably should go in the category as well.

As for the month issue, yeah my original thought was also to have a distinct Category:Bengali calendar months. But I ran into the issue of Template:list:Sylheti calendar months/syl, where per Wikipedia there isn't even a distinct Sylheti calendar (or at least, the page on it was deleted as being fictional), and months like ꠎꠂꠑ (zoiṭó) are given simultaneously as months in the Assamese, Bengali and Sylheti Hindu calendar (and where the Assamese calendar seems not to differ from the Bengali calendar). This led me to conclude that if we start classifying each calendar as different, we could end up triple or quadruple classifying a lot of terms, which would just be confusing. Benwing2 (talk) 21:39, 20 January 2025 (UTC)Reply

"AFAIK the Protestant Old Testament and Jewish Tanakh have the same books." Kind of: e.g. Book of Ezra and Book of Nehemiah are a single piece of literature in the Tanakh/Jewish Bible. —Justin (koavf)❤T☮C☺M☯ 22:35, 20 January 2025 (UTC)Reply

This is a lot to respond to, so it should probably be separate threads or proposals, but

Gaps:

Category:Units of time: Agreed
Category:Western zodiac signs: Agreed
Category:Religious texts: Agreed
Category:French Republican calendar months: Agreed
Category:Poetic meters: Agreed
Category:Types of electromagnetic radiation: Call this Category:Electromagnetic spectrum. It's easier, more intuitive, and does not include all extraneous electromagnetic phenomena.

Splits/renames:

Category:Lunar months should be deleted and split into categories for specific calendars: Agreed
Category:Books of the Bible: Split into Category:Books of the Old Testament and Category:Books of the New Testament. Strong disagree. This runs into all kinds of issues with the Jewish Bible structuring, deuterocanon, and Ethiopian/Eritrean extended canon. It's not necessary and more trouble than helpful.
Category:Greek deities: Weak disagree, as 226 is a navigable amount, but I'm open to clearly defined subcategories like the muses.
Category:Fingers: Weak agree
Category:Size: "Split out Category:Sizes, ranging from itsy-bitsy and extra-small to humongous and ginormous. Category:en:Size has 163 members." Okay, but what wouldn't be in the new category? I guess shoe size and maybe a few general terms related to the concept of sizing, but galactic and S and most of the other things in here are actual measures of sizes (formal or informal), so I don't think this would help much and 163 is a perfectly navigable category.

As an aside, what is the value of discussion in Discord? Why is any policy discussion happening off-wiki and not recorded here? —Justin (koavf)❤T☮C☺M☯ 22:34, 20 January 2025 (UTC)Reply

Thanks for your comments. Discussion on Discord can happen in real time, and it is much better suited for conversations with extensive back and forth discussion than Wiktionary's forums are. It is easy to ask a question and get an immediate response, which tends not to happen on Wiktionary itself. The discussion about the Hindu calendar occurred in the #indo-iranian channel on Discord over maybe a total of 30 minutes; nothing like that could happen on Wiktionary. As for the splits and renames:

I'm not sure your issue with splitting "Books of the Bible". You mention issues about canonicality but is there actually an issue with an Old/New split with inclusive policies as to what goes in (basically anything considered canonical by any major denomination)? Are there books where there is a question whether they are considered Old or New? The canonicality issues are the same whether we have a single "Books of the Bible" category (which IMO is very Christian-biased in a way that an Old/New split isn't) or two categories. Also keep in mind that we have lists like Template:list:books of the Protestant Old Testament/en and Template:list:books of the Catholic Old Testament/en that can help with specifying what is considered canonical by which group.
Category:Size has terms like blow up, procerity, grow, pipsqueakery, lobsterling, long drink of water and other randomness that are not a specific size but just have some vague relation to size. It's a related-to category, which allows for random stuff like this, which a set category would not. After splitting out Category:Sizes, we could potentially merge Category:Size and Victar's Category:Quantity, which are not obviously distinct.
As for Category:Greek deities, IMO categories esp. set categories should not have more than 100 or so members unless they're clearly all of the same type.

Benwing2 (talk) 23:21, 20 January 2025 (UTC)

Thanks yourself.

"Are there books where there is a question whether they are considered Old or New?" The extended canon of the Tewahedo churches does not fit this Old/New Testament split.

"The canonicality issues are the same whether we have a single "Books of the Bible" category (which IMO is very Christian-biased in a way that an Old/New split isn't) or two categories." I think it's the other way around: having an "Old Testament/New Testament" divide is a Christian notion, but Jews in common language could use the word "Bible" with others (tho "Tanakh" or "the Law" or something is more common when practitioners of Judaism are discussing scripture among themselves).

"As for Category:Greek deities, IMO categories esp. set categories should not have more than 100 or so members unless they're clearly all of the same type." That's a nice goal maybe, but there are just some sets that have more than 100 things. I don't see that as a problem really. If a clearly defined set that is actually useful for language (e.g. not something like "even numbers" or "objects with mass" or something) has a thousand members, it's still very useful to categorize them together. —Justin (koavf)❤T☮C☺M☯ 00:00, 21 January 2025 (UTC)Reply

I am speaking as someone who has both a Jewish and Christian background, and who considers himself Jewish. The New Testament is a Christian addition to the Jewish scriptures; grouping them together as the "Bible" is a Christian concept that is foreign to Judaism. Imagine if we did not have a "Books of the Bible" category but instead had a category "Books of the Quad Combination" or "Books of the Standard Works" (see w:Standard Works) that indiscriminately grouped together the books of the Christian Bible along with those of the Book of Mormon, the Doctrine and Covenants and the Pearl of Great Price. And someone argued on technical grounds against splitting them into distinct categories? Surely Catholics and Protestants would object? Benwing2 (talk) 00:34, 21 January 2025 (UTC)Reply

Sorry if I'm stupid here, but I'm not following. As I intended to say earlier, the notion of a "New Testament" appended to prior Jewish scripture is a Christian notion, for sure. But what is the point you're making with the Mormon analogy? Calling the Jewish scripture the "Old Testament" is still just using the Christian terms for that tradition's holy literature. —Justin (koavf)❤T☮C☺M☯ 03:25, 21 January 2025 (UTC)Reply

@Koavf Did you see my response to -sche? I proposed a category name "Books of the Old Testament and Tanakh" or "Books of the Old Testament and/or Tanakh" or similar, since the Old Testament is just the Christian interpretation of the Jewish scriptures (modulo some argumentation over what is considered canonical, which led to certain books being excluded from the Tanakh but included in the Catholic Old Testament, among other things). My point is that insisting on combining the Old Testament/Tanakh with the New Testament and calling it the "Bible" is objectionable to Jews in the same way that grouping the Bible with other Mormon scriptures and calling the resulting amalgamation by the Mormon name would be objectionable to non-Mormon Christians. Having separate categories is less biased. We could have two categories "Books of the Tanakh" and "Books of the Bible" but that would lead to double categorizing practically all of the Tanakh books, which I think would be more confusing than anything else. As for the extra Ethiopian Tewahedo "Church Order" books, IMO including them in a "Bible" category could well be considered biased in the same way as including Mormon-specific scripture in a "Bible" category would, since they are clearly not either New or Old Testament and postdate both. Benwing2 (talk) 08:28, 21 January 2025 (UTC)Reply

But it would be biased to not include them in "Books of the Bible", since they are. It's not for us to decide canonicity in the Bible. I can't speak to Jews being offended at the term "Bible" being used to refer to the common scripture tradition, but in my limited experience, it's just a word used for convenience's sake and not offensive. Others' mileage may vary, clearly. —Justin (koavf)❤T☮C☺M☯ 09:01, 21 January 2025 (UTC)Reply

Doing a quick search, I found (e.g.) this:

FWIW, in academic discourse, "Bible" can be understood more restrictively as the "Hebrew Bible" when used by Jewish authors or in Jewish contexts, while it can be understood more expansively as (some form of) the Christian Bible ("Old Testament" + "New Testament") when used by Christian authors or in Christian contexts. Biblical scholars find this fluctuation in usage quite natural.

Also FWIW, the Jewish Study Bible refers to the New Testament without any impulse to put it in scare quotes. The Preface includes some comments that reflect tangentially on OP's question on page x.

So that aligns with what I think is generally true, but I appreciate that this is anecdotal. —Justin (koavf)❤T☮C☺M☯ 09:04, 21 January 2025 (UTC)Reply

I really think this is anecdotal as it does not align at all with my experience as a Jew, and regardless of the term, the category itself is biased as containing all and only what Christians consider canonical (and even then only certain Christians; for some reason I can't understand, you insist on counting Ethiopian Christian-specific scriptures as canonical but not Mormon-specific scriptures). I have to say, having a discussion with you is exhausting and about as pleasant as a root canal, since you keep ignoring the main thrust of my argument and cherry-picking things to respond to. For this reason I am not going to engage any more with you. Your disagreement with splitting is registered, but you don't get a liberum veto in case of consensus in favor of splitting. Benwing2 (talk) 09:17, 21 January 2025 (UTC)Reply

Outside of Christian contexts I tend to refer to the Tanakh as the "Hebrew scriptures", though the presence of Aramaic in two books renders that not completely correct. I've seen "scriptures" used in a number of non-Christian contexts, but I think Christians would recognize it as referring to the Bible. The Apocrypha are kind of weird as part of the Judaic tradition and included in the early Jewish Septuagint translation, but not accepted as canonical by Judaism. The older Christian denominations accepted them but the Protestants didn't. As for anecdotal evidence: there are so many denominations just in Christianity that you can find someone who will agree with just about anything. While that isn't as true in Judaism, I'm pretty sure that the mere name of the "Jewish Study Bible" would be fairly effective at selecting for those who aren't offended by references to "the Bible". I'm sure if there were a publication with "MAGA" in the title, you would find the people there in favor of Trump... Chuck Entz (talk) 14:36, 21 January 2025 (UTC)Reply

I really have no idea what I wrote that was so off-putting, nor did I ever reject your claim about Mormons and Mormonism. —Justin (koavf)❤T☮C☺M☯ 16:06, 21 January 2025 (UTC)Reply

Why not have every book of the Bible that some denomination of Christianity considers canon, including stuff like the Book of Mormon and the Church Order? Still with the Old/New Testament split (along with the others), of course. Would be the most neutral. CitationsFreak (talk) 20:46, 21 January 2025 (UTC)

I think that's the best solution and what I would argue for. I appreciate that Benwing is not interested in discussing and I don't want to speak for him, but I think that's consistent with what he was saying before and I would argue for this solution being consistent with that. —Justin (koavf)❤T☮C☺M☯ 04:20, 23 January 2025 (UTC)Reply

Maybe the most neutral / simplest thing is to give Christianity and Judaism their own (sub)category(s), even if there is overlap, i.e. have both "CAT:Books of the Tanakh" and "CAT:Books of the Old Testament" even if some (but not all!) entries will be in both categories? After all, we correctly have Apollo as both a Greek god and a Roman god, and we regularly write things like {{lb|en|transitive|intransitive}}, putting (a single definition of) a verb into both the "transitive verbs" and "intransitive verbs" category: that there are entries in both categories is OK if both categories apply.
"Books of the Bible" would hold "CAT:Books of the Old Testament", "CAT:Books of the New Testament", and any entries that don't fit into one of those subcategories (if, as mentioned above, any Tewahedo books don't). - -sche (discuss) 22:21, 21 January 2025 (UTC)Reply

Category:Size feels like it should be a thesaurus, unless the idea is for it to have specific size standards like clothing sizes. Ioaxxere (talk) 18:30, 22 January 2025 (UTC)Reply

splitting Category:Music

Category:en:Music has 3,870 terms (?!). We have Category:en:Musical notes, which has things like sixteenth note and quaver (I would have thought F-sharp is a musical note as well, but apparently not), but we need a shitload more set categories. I suggest:

Category:Musical notes: Rename to Category:Musical note values or Category:Musical note durations. This holds terms for note durations such as whole note, half note, quarter note, double whole note and corresponding British terms like quaver, crotchet, breve. (FYI "note value" is Wikipedia's term; I have studied Classical piano for 12+ years and have not obviously encountered this term. I just ambiguously call them "notes" but I didn't really study music theory super formally in school.)
Category:Musical rests: This is the corresponding set of rests to the notes/note values/note durations of the previous category: half rest, quarter rest, British terms like crotchet rest, etc.
Category:Musical tones or repurposed Category:Musical notes or Category:Musical scale notes: F-sharp, D-flat, E, also solfège notes like do/ut, re, mi, fa, sol, la, ti/si.
Category:Musical keys: F-sharp minor, E-flat major, etc.
Category:Musical scales: major/major scale, minor/minor scale, natural minor scale, harmonic minor scale, melodic minor scale, pentatonic/pentatonic scale, whole-tone scale and many others.
Category:Musical modes: Ionian, Dorian, Phrygian, Lydian, Mixolydian, Aeolian, Locrian and many variants like hypolydian, hypomixolydian, etc.
Category:Musical chords: major triad, dominant seventh chord, half-diminished seventh chord, ninth chord, Picardy third, Neapolitan chord (BTW we are missing Neapolitan sixth/Neapolitan sixth chord) etc.
Category:Musical intervals: fourth, fifth, sixth, augmented sixth, diminished sixth, tritone, octave, etc.
Category:Musical clefs: treble/treble clef (aka G clef), bass/bass clef (aka F clef), tenor clef, alto clef (both types of C-clef/C clef), etc.
Category:Musical tempos: largo, lento, adagio, andante, andantino, moderato, allegretto, allegro, vivace, presto, prestissimo, etc.
Category:Musical dynamic values or some related name: pianissimo, piano, mezzo piano, mezzo forte, forte, fortissimo, fortississimo, etc.
Category:Musical articulations: legato, staccato, sforzando, pizzicato, glissando, arpeggiato, etc.
Category:Musical tempo changes: accelerando, ritardando, rallentando, stretto, stringendo, rubato, meno mosso, più mosso, etc.
Category:Musical dynamic changes: crescendo, decrescendo, diminuendo, smorzando, calando, morendo, etc.
Category:Musical vocal ranges (update: we have Category:Musical voices and registers): soprano, alto, tenor, bass, mezzo soprano, baritone, basso profundo, treble, countertenor, etc.
Category:Musical time signatures (update: we have Category:Musical meters): four-four time, three-four time/three-quarter time, cut time/alla breve, common time, six-eight time (which we are missing), twelve-eight time (likewise), etc.
Category:Musical ornaments (maybe there is a better term): trill, shake, mordent, tremolo, vibrato, slide, arpeggio, etc.
Category:Musical mnemonics: FACE, EGBDF (= every good boy does fine, every good boy deserves fudge, and other variants), ACEG (= all cows eat grass, all cars eat gas, etc.).

There are surely others (e.g. we need to split Category:Musical instruments even more than it currently is), but this comes to mind first. Benwing2 (talk) 10:27, 20 January 2025 (UTC)

~~CAT:Musical genres? Vininn126~~ (talk) 10:31, 20 January 2025 (UTC)

We have it, you just misspelled it :) Benwing2 (talk) 10:35, 20 January 2025 (UTC)

I see my coffee and ADHD meds haven't fully woken me up yet... Vininn126 (talk) 10:37, 20 January 2025 (UTC)

lol! Benwing2 (talk) 10:38, 20 January 2025 (UTC)

OK some more:

Category:Musical composition forms (need better name): song, instrumental, rock opera, ... (for popular music); prelude, etude, sonata, symphony, fugue, cantata, toccata, fantasy, mass, requiem, opera and a zillion others for classical music
Category:Musical composition parts (need better name): verse, chorus, bridge, fadeout, ... (for popular music); movement, trio, exposition, recapitulation, coda, etc. for classical music
Category:Musical chord progressions: Axis progression, backdoor progression, circle progression, 50's progression/'50s progression/ice cream changes/doo-wop progression, twelve-bar blues, eight-bar blues, etc.

Benwing2 (talk) 20:24, 20 January 2025 (UTC)

I agree subcategorization is needed! I am hesitant about whether people will grasp and maintain the intended distinctions between all of "tempos, dynamic values, articulations, tempo changes, dynamic changes, ornaments". Do we think the average user adding e.g. an -issimo term a year from now will expect/grasp that it goes in one category if it's more similar to prestissimo "very quickly", but a different category if it's more similar to fortissimo "very loudly", and intuit that a term like allegretto vs one like accelerando, or a term like accelerando vs crescendo, vs vibrato vs staccato, go in five separate categories? I am unsure. Maybe anyone who is dealing with musical terminology does perceive these as fundamentally different categories of direction and will have no problem maintaining the distinctions. But I wonder whether some of these should be consolidated into something like "musical directions" / "musical directives", or "musical changes" grouping tempo and dynamic changes. - -sche (discuss) 20:41, 20 January 2025 (UTC)Reply

Thanks for opening this topic. I generally agree with your decisions. I am fairly knowledgeable in classical music and you're welcome to contact me on Discord with questions. A few naming suggestions: 1. Musical note durations (or values is fine). 3. Musical pitches. 7. Name is okay but a bit awkward, maybe just Chords or Musical chord types. 11. Musical dynamics. Fold 13 and 14 into Musical directives, as outside of the most obvious set, there is a lot of subjectivity involved (some terms may indicate changing both tempo and dynamic at once, or are ambiguous/up to interpretation). I wouldn't mind if a word goes in both Musical directives as well as another category.

Re: items 1 and 2 for the second list, I feel it would be sensible to split classical from popular, partly because their meta-terminology (i.e. what "form", "genre", "part", etc. even mean) is quite different, and because I think users are more likely to want to use a list specific to their domain of interest than a miscellaneous one. If not, then Musical composition types and Musical composition sections. Since the situation is tricky, I don't have fully-formed thoughts right now on split-classical/popular category names beyond Classical music composition types/sections. I'll note that classical "genre" often encompasses many but not all items of 1(c) (whereas popular "genre" encompasses e.g. reggaeton and sludgecore); classical "form" encompasses sonata form, rondo, strophic, binary form, etc., and the two blur and are often confused; and classical composition "parts" without other context often refers to a musician's role e.g. "the violin part" – see first 3 bullets at Part.

I'm not that concerned about people grasping this setup. With a good structure (which Benwing has proposed), I feel confident that average users can either look at similar entries and copy by analogy, or otherwise do whatever and it's not the end of the world for entries to await a bit of cleanup. Hftf (talk) 21:28, 20 January 2025 (UTC)Reply

Thanks! I agree with all your suggestions, and I think splitting classical from popular in the first two items of the second list makes sense. The third item about chord progressions also seems mostly to refer to popular music; classical music seems to speak of cadences, which maybe should be a different category (see w:Cadence for an exhaustive discussion). @-sche I think that in practice, people familiar with music theory will understand the distinctions of the categories, esp. if the descriptions are clear, and they are more likely to be the ones adding music terms. I suspect people not familiar with music theory who add a music term will be more likely to just put it under Category:Music, regardless of whether we have a more ramified category tree or a less ramified one. Having a more ramified tree has the advantage of allowing people to more easily grasp the distinctions between categories if they're not clear about them, and there are more than enough terms to fill even a highly ramified tree. Benwing2 (talk) 22:01, 20 January 2025 (UTC)Reply

Category:Musicians: rename and add Category:Musical artists?

This category is for types of musicians (drummer, pianist, etc.) but manages to also include Beatles. We should maybe rename Category:Musicians -> Category:Types of musicians and create Category:Musical artists for things like Beatles and Rolling Stone (which are very questionable as entries, but ...). Under Category:Musical artists could go Category:Taylor Swift and Category:Justin Bieber (yes these damn categories exist) instead of having them directly under Category:Music. We probably need a Category:Beatles since we have Beatledom, Beatlemania, Beatles-esque, Beatle cut, Beatlehead, and others. Benwing2 (talk) 10:44, 20 January 2025 (UTC)Reply

"Old" and "Orkhon" Turkic, plus some more

Current coverage of Pre-Islamic (+ Karakhanid) Turkic languages is quite shoddy across Wiktionary, here's what I mean:

Academic coverage of 'Old Turkic' spans around 4 centuries [8th-11th] (Orkhon T., Old Uyghur with early Karakhanid texts (Kutadgu Bilig, Divan Lughat at-Turk) marking the literary end for this term^[1]^[2]^[3]), yet in Wiktionary, the 'Old Turkic [otk]' label is used specifically for Orkhon (or Inscriptional) Turkic attested around the 8th and 9th centuries. Which is simply not ideal.
1. This leads many new editors to use Orkhon script for all 'Old Turkic' terms, which is quite misleading (since the terms in "Runic" script constitute the smallest number of examples for this language).
Currently, 'Old Turkic' and Old Uyghur are shown under the Siberian branch, while Karakhanid is shown as a Karluk language. This may not be reflective of how these languages should ideally be listed[17] (according to Tekin). Following that image would also require us to rewrite or agree upon a significantly different family tree than what we have now.
Currently, descendants from Proto-Turkic follow rigid "family branches", like "Karluk", "Kipchak" or "Oghuz". I oppose this classification, which leads to some fringe and uncategorizable cases like Salar, Pecheneg, Western Yugur and (Northern/Southern) Altai, from what I understand.
Currently we only have one language tag for Bulgar [xbo], but we have two different 'versions' of Bulgar, Danube- and Volga-^[4]. Perhaps we can employ [xbo-dnb] and [xbo-vol] for these too, though I am reluctant for this change.

For those I propose this classification:

Proto-Turkic [trk-pro]
- West Old Turkic [*trk-wes] (PROPOSAL) (also listed as 'Proto-Bulgaric')
  - Khazar [zkz] (also listed as 'Kuban Bulgar')
  - Danube Bulgar [xbo], [*xbo-dnb] (PROPOSAL)
  - Volga Bulgar [xbo], [*xbo-vol] (PROPOSAL)
    - (...)
- Common Turkic [trk-cmn]
  - East Old Turkic [*trk-eas] (PROPOSAL)
    - Orkhon/Inscriptional Turkic [otk]
    - Yenisei Kyrgyz [otk-kir]
    - Old Uyghur [oui]
    - Karakhanid [xqa]
    - (...)
      - (...)
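For anyone who wants to experiment with the proposal, the tree above can be written down as plain data. This is only an illustrative Python sketch: `TREE`, `walk` and `proposed_codes` are hypothetical names, not part of any Wiktionary module, the codes marked with `*` are proposals from this thread rather than existing language codes, and the elided "(...)" nodes are left out.

```python
# Hypothetical sketch of the classification proposed above as nested data.
# Each node is (name, code, children). Codes starting with "*" are proposals
# made in this thread, not existing Wiktionary language codes. For the two
# Bulgar varieties only the proposed code is recorded here.
TREE = (
    "Proto-Turkic", "trk-pro", [
        ("West Old Turkic", "*trk-wes", [
            ("Khazar", "zkz", []),
            ("Danube Bulgar", "*xbo-dnb", []),
            ("Volga Bulgar", "*xbo-vol", []),
        ]),
        ("Common Turkic", "trk-cmn", [
            ("East Old Turkic", "*trk-eas", [
                ("Orkhon/Inscriptional Turkic", "otk", []),
                ("Yenisei Kyrgyz", "otk-kir", []),
                ("Old Uyghur", "oui", []),
                ("Karakhanid", "xqa", []),
            ]),
        ]),
    ],
)


def walk(node, depth=0):
    """Yield (depth, name, code) for every node, parents before children."""
    name, code, children = node
    yield depth, name, code
    for child in children:
        yield from walk(child, depth + 1)


def proposed_codes(node):
    """Return the codes that are only proposals (leading '*')."""
    return [code for _, _, code in walk(node) if code.startswith("*")]


if __name__ == "__main__":
    for depth, name, code in walk(TREE):
        print("  " * depth + f"{name} [{code}]")
```

Keeping the tree as data makes it easy to compare competing schemes side by side before anyone edits descendant lists.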

Specifics can be decided on later, but this is the main frame I am going with for Proto-Turkic genealogy. What do you think (pinging users from [18])? @BurakD53 @Allahverdi Verdizade @Yorınçga573 @Blueskies006 @Ardahan Karabağ @Bartanaqa @Samiollah1357 @Zbutie3.14 @Rttle1

^ Erdal, M. (2004). A grammar of Old Turkic. BRILL. pp. 6-22
^ Johanson, L., & Csató, É. Á. (2021). The Turkic languages (2nd ed.). Routledge. p. 132 DOI: 10.4324/9781003243809-8
^ Tekin, T., & Ölmez, M. (2003). Türk Dilleri: Giriş (2nd ed.). Yıldız. pp. 18-28
^ Tekin, T., & Ölmez, M. (2003). Türk Dilleri: Giriş (2nd ed.). Yıldız. pp. 28-31

AmaçsızBirKişi (talk) 11:51, 20 January 2025 (UTC)

1. It is not particularly important whether the Old Turkic language is labeled as "Orkhon Turkic" or "Old Turkic." Ultimately, when we refer to otk, we know what we mean. However, it would be more accurate to call it "Orkhon Turkic" in descendants lists, and the label can be changed there. Calling it "Inscriptional Turkic" would be incorrect, as we list Yenisei Kyrgyz inscriptions separately in the descendants list, and those are also inscriptions.

2. Karakhanid Turkic can indeed be considered a continuation of Old Turkic, but it diverges from it at a certain point. Its grammar does not entirely align with that of Old Turkic. While Old Turkic and Siberian languages fall under the "olur-" group, Karakhanid Turkic remains in the "o(l)tur-" group. I see no issue in classifying it as the initial stage of a separate branch.

3. Salar can historically be traced back to an Oghuz tribe. Salar Turkic also exhibits lexical features characteristic of Oghuz languages. For instance, using sağ instead of oň for the direction "right" is unique to Oghuz languages, and Salar aligns with this. Similarly, using dudak instead of erin for "lip" is unique to Oghuz languages, and Salar aligns here as well. Along with numerous other lexical features not listed here, Salar conforms to Oghuz languages. Morphologically and grammatically, it differs significantly not only from Oghuz languages but also from other Turkic languages. However, it can still be traced back to Proto-Oghuz. Even though it has been heavily influenced phonetically by Chinese and Tibetan, there is no obstacle to considering it part of the Oghuz branch; on the contrary, there is evidence to support this classification.

4. I agree. If otk-kir exists and does not refer to a separate language but directly redirects to otk, then xbo-dnb and xbo-vol can also directly redirect to Bulgar.

For those I propose this classification:

Pre-Turkic/Proto-Turkic [trk-pro]
- Proto-Bulgaric
  - Khazar [zkz]
  - Danube Bulgar [xbo], [*xbo-dnb] (PROPOSAL)
  - Volga Bulgar [xbo], [*xbo-vol] (PROPOSAL)
  - (...)
- Proto-Common-Turkic [trk-cmn]
  - Siberian:
    - Orkhon Turkic [otk-ork] (PROPOSAL)
      - Yenisei Kyrgyz [otk-kir]
      - Old Uyghur [oui]
  - Karluk:
    - Karakhanid [xqa]
      - Khorezmian
        - Chagatai [chg]
        - (...)

(...) BurakD53 (talk) 13:02, 20 January 2025 (UTC)

Your classification of Old Turkic/Karakhanid seems more widely accepted, and I guess we need not do away with family branches completely either. If no one objects, we can implement a version of your scheme in the Descendants section from now on.

Probing this question revealed yet another can of worms, though: what about the rest of the family tree? As @Zbutie3.14 pointed out, many PT pages are plagued with inconsistent and incorrect branch names, and there's no single, fully agreed-upon scheme for all pages to follow. A bot could perhaps be used for this, or we could use {{desctree}} as Indo-European does for standardization (this would require us to create new language headers like "Karluk Turkic", "Proto-Bulgar", etc.).

AmaçsızBirKişi (talk) 07:52, 21 January 2025 (UTC)

I can help with this if (a) everyone agrees on a branching scheme, and (b) someone makes a list of all the mappings from incorrect branches to correct branches, along with how to reorganize the incorrect branches. Benwing2 (talk) 08:03, 21 January 2025 (UTC)

Thank you. We already have a lead at WT:ATRK#Descendants, but that list is going to change soon I believe. It would be nice to use more templates across RC:PT pages to make pages more uniform. AmaçsızBirKişi (talk) 08:36, 21 January 2025 (UTC)

Let me know. I would definitely advise using {{desctree}} when possible, simply to avoid duplication if nothing else. Benwing2 (talk) 09:23, 21 January 2025 (UTC)

Personally I strongly prefer tables over trees; I would like it if we had the look of the current Proto-Turkic pages but with the uniformity of using a template. Zbutie3.14 (talk) 21:13, 21 January 2025 (UTC)

I support having Old Turkic as a larger category above the rest, so it would look like

Old Turkic:
- Orkhon Turkic
- Old Uyghur
- etc

Would we need a bot to go through everything and change it? Also, most Proto-Turkic pages still have north/west/south for Kipchak, which I started a Tea room discussion about earlier in December (and you replied), so I guess we should deal with that too if we're going to go through and change things.


Salar is Oğuz for sure


From what I know, most Turkic languages fit into the current category system very well, so a few languages being hard to fit doesn't mean we need to change the whole system.


I don't know anything about the rest. Zbutie3.14 (talk) 18:20, 20 January 2025 (UTC)

The points you raise are valid, but I agree with @BurakD53’s classification more. See below:

• If we place Karakhanid under Old Uyghur and Old Turkic, won’t this make modern Uzbek and Uyghur inherit from Old Turkic? Not that this hasn’t happened, but I don’t want readers to have the impression that Old Turkic = Proto-Turkic (which many underinformed people already think, just look at Turkish Wiktionary). Most people who use Wiktionary aren’t linguists.

• To what extent are Danube and Volga Bulgar different languages? This isn’t my area of expertise but are they distinct enough to merit separate headers?

• We can (and should) inform readers that classification isn’t linear and strict; we can highlight any inconsistencies in etymology or usage notes

• Agree that Salar is Oghuz. Do we classify it as a third distinct Oghuz branch or put it with Turkmen under East Oghuz?

A few other questions about classification:

• Should we make headers for other unattested proto-languages (Kipchak, other Kipchak branches, Siberian, Oghur, etc.)? Having consistent reconstructions instead of skipping from Proto-Turkic reconstructions to next attested form might make diachronic development easier to follow

• To what extent is Balkan Gagauz Turkish its own language? Between Istanbul Turkish and (Moldovan) Gagauz I’m not sure if the other Balkan Turkish/Gagauz dialects are divergent enough to constitute a different language

• Where do we place Äynu ([aib])? I think it merits inclusion as a descendant of Uyghur or Chagatai. Blueskies006 (talk) 21:03, 20 January 2025 (UTC)

Just adding here that although I don't know much about Old Turkic, I know there have been prior discussions on this topic that AFAIK haven't led anywhere, so I would suggest someone look them up and see what the sticking points were. Benwing2 (talk) 22:03, 20 January 2025 (UTC)

The difference between Volga Bulgar and Danube Bulgar is actually very clear. One is Muslim and, therefore, influenced by Arabic, using the Arabic script. Almost all linguistic data that has survived to the present day comes from this language written in Arabic script. It was also influenced by Volga Turkic and eventually gave way to Volga Turkic over time. Danube Bulgars, on the other hand, were not Muslim, so they were not influenced by Arabic and did not use the Arabic script. Instead, they used the Cyrillic or Greek alphabet. It was spoken earlier than Volga Bulgar. Over time, they became Slavicized and disappeared from history. However, they contributed words to Old Church Slavonic, and even today, it is possible to find a few Old Bulgar words in modern Bulgarian. Of course, there is no influence from Arabic or Old Tatar (Volga Turkic) in their language. We have a calendar consisting of animal names, personal names, and two inscriptions from them. Talat Tekin has two separate books on Danube Bulgar and Volga Bulgar, addressing each language individually. Tekin also mentions Kuban Bulgar, but since there is no linguistic data, he relies only on borrowings. So, there are very significant and distinct points separating these two languages. I am not saying they should be listed as separate languages on the site, just that they should be separated in the descendants section, as is done with otk-kir. This would also ensure that the borrowings are placed in the correct category. BurakD53 (talk) 23:43, 20 January 2025 (UTC)

That makes sense, ty for clarifying. Blueskies006 (talk) 01:16, 21 January 2025 (UTC)

- I am not well-versed enough to comment on Salar, see Burak's reply above, but I don't think we need three branches in Oghuz tree personally (might be wrong, but I envision this:)

Oghuz (the attested language, not the RC "Proto-Oghuz")
- Old Anatolian Turkish
  - (...)
- Ajem Turkic (I am adding this from another BP or LPD chat, tentative)
  - (...)
    - Azerbaijani
    - Qashqai
- Turkmen
- Salar

Again, correct me if I am wrong, but this seems apt for Oghuz branch from what I get.

- Balkan Gagauz and Äynu I cannot tell much other than we should include them in our pages (which we didn't in the past.)

- - But for Äynu I think placing it under Chagatai works, I had done that once for a RC:PT page and it did not get replaced/shifted yet.

I can see a need for a separate, thorough talk on classification. Maybe now, maybe in the future.

AmaçsızBirKişi (talk) 08:03, 21 January 2025 (UTC)

I agree with you. This Oghuz categorization is better. Old Anatolian Turkish should be divided as OAT and Ajem Turkic. Balkan Gagauz language should stay a dialect. Balkan Gagauz as a language is unnecessary.

Support BurakD53 (talk) 13:40, 21 January 2025 (UTC)

Implementing the following changes to WT:ATRK#Descendants, if nobody objects.

Overhauled descendants table (which will be added to every Proto-Turkic entry from now on). Please tell me any further improvements or your thoughts on this:

Proto-Turkic [trk-pro]
- Proto-Bulgaric [*???] (a new language code will be necessary)
  - Khazar [zkz]
  - Danube Bulgar [xbo], [*xbo-dnb] (a new language code will be necessary)
  - Volga Bulgar [xbo], [*xbo-vol] (a new language code will be necessary)
    - Middle Chuvash [cv-mid]
      - Chuvash [cv]
        (...)
- Proto-Common Turkic [trk-cmn]
  - Old Turkic [otk]
    - Orkhon Turkic [otk-ork] (a new language code will be necessary)
      - Yenisei Kyrgyz [otk-kir]
      - Old Uyghur [oui]
        (...)
  - Siberian [trk-sib]
    - North Siberian [trk-nsb]
      - (...)
    - South Siberian [trk-ssb]
      - Yenisei Turkic
        (...)
      - Sayan Turkic
        (...)
      - Northern Altai [atv] (this one may need to be shifted around)
  - Karluk [trk-kar]
    - Ili Turki [ili]
    - Karakhanid [xqa]
      - Khorezmian [zkh]
        - Chagatai [chg]
        (...)
  - Oghuz [trk-ogz] (the language attested in Diwan Lughat at-Turk, not the "reconstructed" one)
    - Old Anatolian Turkic [trk-oat]
      - (...)
    - Ajem-Turkic [*???] (a new language code will be necessary)
      - Classical Azerbaijani [az-cls]
        - Azerbaijani [az]
        - Qashqai [qxq]
    - Turkmen [tk]
    - Salar [slr]
  - Kipchak [trk-kip]
    - (...)

I can see a bot reformatting every single Proto-Turkic entry and fixing the old classifications. Some specifics may be decided on further down the line.

AmaçsızBirKişi (talk) 08:51, 23 January 2025 (UTC)
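
For anyone planning the bot or template side of this, here is a minimal Python sketch of how the scheme above could be held as data and printed back in the asterisk-list layout used earlier in this thread. It is only an illustration: `Branch`, `render` and `proposed_codes` are hypothetical names, this is not how {{desctree}} or any existing module works, and only the Old Turkic excerpt of the proposal is encoded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Branch:
    """One node of the proposed descendants scheme (hypothetical structure)."""
    name: str
    code: Optional[str] = None   # language code, if one is listed in the thread
    proposed: bool = False       # True where the thread says a new code is needed
    children: List["Branch"] = field(default_factory=list)


def render(branch: Branch, depth: int = 1) -> List[str]:
    """Return the tree as asterisk-prefixed lines, one asterisk per level."""
    label = branch.name
    if branch.code:
        label += f" [{branch.code}]"
    if branch.proposed:
        label += " (PROPOSAL)"
    lines = ["*" * depth + " " + label]
    for child in branch.children:
        lines.extend(render(child, depth + 1))
    return lines


def proposed_codes(branch: Branch) -> List[str]:
    """Collect the codes that would still have to be created."""
    found = [branch.code] if branch.proposed and branch.code else []
    for child in branch.children:
        found.extend(proposed_codes(child))
    return found


# Excerpt only: the Old Turkic part of the scheme posted above.
OLD_TURKIC = Branch("Old Turkic", "otk", children=[
    Branch("Orkhon Turkic", "otk-ork", proposed=True, children=[
        Branch("Yenisei Kyrgyz", "otk-kir"),
        Branch("Old Uyghur", "oui"),
    ]),
])

if __name__ == "__main__":
    print("\n".join(render(OLD_TURKIC)))
    print("Codes still to be created:", proposed_codes(OLD_TURKIC))
```

Keeping the scheme in one structure like this would let the same data drive both the per-entry layout and the list of codes that still need to be requested, whichever branching is finally agreed on.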

Types of taxonomic ranks for categories


I think an understanding of how taxonomic ranks work will help in deciding how to subdivide them. I haven't really studied the codes for viruses and prokaryotes, which have their own, separate logic, so I'll stick to the animal and algae/fungi/plant codes. These treat different broad levels differently:

To begin with, the basic, "atomic" unit is the binomen:
1. The genus or generic name, which is a noun
2. The species or specific epithet, which is either:
  1. An adjective which agrees with the generic name in gender and number, or
  2. A noun which is:
    1. "in apposition", and only agrees with itself (in other words, it's its own referent), or
    2. a genitive, and agrees with the referent in gender and number. Of particular interest are names of parasites, which tend to have names that are the genitive of the name of the host- this is sometimes the only way we know the gender of names above the rank of genus.
In fact, the animal code divides everything into:
1. Species group: species, subspecies, etc. (species and below), which behave the same in matters of agreement- a lower rank like a subspecies
2. Genus group: Genus and everything between genus and species. There may be some coordination in gender and number with the genus (I don't remember offhand), but no agreement.
3. Family group: Everything from family down, but also a few ranks above that derive from family, such as superfamily. No agreement.
4. Higher taxa: Everything above the family group. The animal code only really concerns itself with family group and below, except that it specifies standard endings for each rank above the genus group, and the stem generally derives from the genus (usually the genitive, thus Hominidae from Homo) of the type species. The other code forms names the same way, but with different standard endings. An exception, built into the Code, is made for a few very well established old family names that can optionally be used instead of the standard ones, such as the Compositae, Cruciferae and Leguminosae instead of Asteraceae, Brassicaceae and Fabaceae.
The most natural way to split taxonomic names would be family, genus and species groups vs. higher taxa. Either that, or family group and higher.
Ranks vs. clades:
1. Taxonomy was originally based on the belief in divine creation, with classification being a way to discern the order God had set up in the process of creation. Thus, common traits would show they were meant to be a group. The traditional taxonomic ranks were mostly set up with this in mind, and are more about overall organization than about finer details of the structure.
2. Modern taxonomy, on the other hand, is based on the idea that all living things are descended from a single organism or group of organisms, with changes in genetic traits being inherited by descendants, but not by non-descendants. That means that it would theoretically be possible to reconstruct the family tree of a group of organisms by statistically analyzing the distribution of traits among them. The science of doing this is cladistics, and the tree is a phylogeny. Any part of this tree is a clade. Theoretically, members of a clade should all be the descendants of the organism that diverged from its sibling organisms in some way.
  1. If taxonomists do their job right, every taxon should be a clade, but there are more of these divergences than there are taxonomic ranks. A clade can be anywhere on the tree below the very top, so it isn't necessarily the same as a taxonomic rank. There are also organizations such as the Angiosperm Phylogeny Group that are more interested in figuring out the tree than in making all the clades fit into the traditional taxonomic ranks. They come up with informal clades such as the Eudicots and Rosids that are supposed to eventually be mapped onto traditional taxonomic ranks, at which point they will have names formed according to the traditional rules that apply to those ranks. These informal clades are not official taxa, so they don't really have taxonomic ranks: you can tell the ranks above them and below them, but not in between.

I hope this helps in deciding how to set up the categories. Chuck Entz (talk) 01:30, 21 January 2025 (UTC)

Sorry, this is confusing to me. Was this in regard to the short discussion between me and @DCDuring? I was just proposing a single set category 'Taxonomic ranks' to capture terms for specific taxonomic ranks, and maybe another set category for "meta-terms" like taxon, clade and rank. If this is what you are writing about and you have a better proposal, please let me know; otherwise, can you clarify what your intent was? Benwing2 (talk) 08:08, 21 January 2025 (UTC)

To be clear, an organism does not belong to a group called a rank. Rank is a metaterm. Clade can have metaterm meaning, in the sense that any taxonomic name may or may not be a clade, clade being a Good Thing in modern taxonomy. A taxon, a clade, and a group can have members, just like the traditional ranks. The traditional ranks have the advantage of (relative) stability and are suggestive of relative position in the tree of life. OTOH, the traditional ranks are positively confusing in paleobiology, in which, for example, an extinct order can have a modern class as a member.

As to metaterms, there are a huge number of terms used in bionomenclature. (See Terms used in Bionomenclature, GBIF (2010); it is not paginated, but the body of my copy, printed single-sided, is about 3/4" thick and has about 10 terms per page.) It mixes rank names, morphemes, and a variety of other types of terms, some SoP. I don't know how to break this set of terms into useful subcategories. The first thing for us to do might be to make sure that nomenclatural terms are distinguished from other categories of terms used in biology (and virology). DCDuring (talk) 17:01, 21 January 2025 (UTC)

Automatically expand the first section on mobile


Currently, when you go to an entry on mobile, every section is collapsed (example: [19]). This makes for a pretty awful user experience. See phab:T376446 for further discussion. I propose that we add some code according to this rule: when a page loads, if no section is expanded, automatically expand the first section. I already have this in my own common.js and I strongly recommend that we address this as soon as possible. (@Benwing2, Surjection, This, that and the other) Ioaxxere (talk) 18:25, 22 January 2025 (UTC)

I don't think we should expand the first section automatically. If there are multiple non-English sections, opening the first one is completely arbitrary. — SURJECTION / T / C / L / 18:38, 22 January 2025 (UTC)

How about something like this?

If the first section is English (or Translingual?), open it.
Otherwise if there are <= N sections, for some small N (e.g. 3), open them all.
Otherwise, don't open. (It would be great in that case to display a "summarizing table of contents" that lists some short but crucial info about each language, such as the first five words of the first definition.)
If possible, add the ability to specify preferred languages that auto-open; this might be more work, though.

Benwing2 (talk) 20:02, 22 January 2025 (UTC)
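
To make the rule above easier to compare against the alternatives, here is a minimal Python sketch of just the decision logic. It is not the site JavaScript and not anyone's actual common.js; `sections_to_open`, `small_n` and `preferred` are hypothetical names. Two things are assumptions rather than part of the proposal: Translingual is treated like English (the proposal leaves that open with a question mark), and preferred languages, when present on the page, take priority over the other rules.

```python
from typing import List, Sequence


def sections_to_open(
    languages: Sequence[str],
    small_n: int = 3,
    preferred: Sequence[str] = (),
) -> List[int]:
    """Return the indexes of the language sections to expand on page load.

    `languages` is the list of language headers in page order.
    """
    if not languages:
        return []

    # Assumed extension: user-specified preferred languages win if present.
    hits = [i for i, lang in enumerate(languages) if lang in preferred]
    if hits:
        return hits

    # First section is English (or, by assumption, Translingual): open it.
    if languages[0] in ("English", "Translingual"):
        return [0]

    # Few sections: open them all.
    if len(languages) <= small_n:
        return list(range(len(languages)))

    # Otherwise leave everything collapsed.
    return []


if __name__ == "__main__":
    print(sections_to_open(["English", "French", "Spanish", "Welsh"]))
    print(sections_to_open(["Finnish", "Swedish"]))
    print(sections_to_open(["Danish", "Dutch", "Finnish", "German"]))
```

Writing it as a pure function keeps the policy question (which sections open) separate from the performance question raised on Phabricator (when the script runs), so either can be changed without touching the other.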

My opinion is that we should always expand a single section, but no more. I would be somewhat opposed to automatically opening an English section if there are multiple. One thing is clear, though: if the user enters a link through an anchor, then we should not expand any sections if there are multiple, since the anchor will automatically open whatever is needed. — SURJECTION / T / C / L / 12:06, 23 January 2025 (UTC)

I think the automatic opening of sections needs to be as limited as possible, for three reasons:

1. The behaviour needs to be simple enough that it can be understood by regular or frequent visitors to Wiktionary.
2. @Jdlrobson said this at phab:T376446:
   Any code that expands headings in site JavaScript is going to be rough on performance and possibly SEO for all the pages that code runs. Note, JS is not blocking, so that code could end up executing very late
   Expanding where there is only one section seems safe in this context, but expanding a section high on the page after the user has already scrolled to find a section lower on the page would be annoying behaviour.
3. Even if there are only a handful of languages, tapping the language name you want is clearly better than having to scroll and "hunt" through the page for your desired language imho.

So:

I will obviously support auto-opening where there is only a single section on the page (I believe this is the behaviour that the WMF devs have agreed to develop, although I'm open to being corrected).
I could support auto-opening English as well, given this is the English Wiktionary, although this would be a weak support - it could be very annoying for people looking for non-English entries if the JS runs too late.
I'd oppose other changes.

This, that and the other (talk) 22:07, 22 January 2025 (UTC)

The current behavior is completely non-intuitive. Today, I looked up broligarchy on my cellphone, didn't see what looked like an entry, and reported that we didn't have an entry. Admittedly, my age-related cognitive deficits may have played a role, but this just doesn't seem right.

I like @User:Benwing2's proposal or something very similar. (Unsurprisingly, I would like Translingual to open, too, if it is the first L2 [which it always is!!!].) If it were practical for registered users to specify some number (1, 2, ..., n [where n is small]) of preferred L2s, then all of those could be opened, with Benwing2's proposal the default. DCDuring (talk) 23:50, 22 January 2025 (UTC)

[Dya-1] Charles Duroiselle (1921) A Practical Grammar of the Pali Language (overall work in English), Rangoon, section 472

[Bya-2] A. P. Buddhadatta Thera (1956) The New Pali Course: Part II, 4th edition (overall work in English), Colombo, section 144, page 176

[3] Erdal, M. (2004). A grammar of Old Turkic. BRILL. pp. 6-22

[4] Johanson, L., & Csató, É. Á. (2021). The Turkic languages (2nd ed.). Routledge. p. 132 DOI: 10.4324/9781003243809-8

[5] Tekin, T., & Ölmez, M. (2003). Türk Dilleri: Giriş (2nd ed.). Yıldız. pp. 18-28

[6] Tekin, T., & Ölmez, M. (2003). Türk Dilleri: Giriş (2nd ed.). Yıldız. pp. 28-31

