Wiktionary:Beer parlour/2024/December

Use of y instead of ij in Early Modern Dutch

In the modern Dutch alphabet, the digraph ij is used instead of y (although it's often written like a y with an umlaut), but in Early Modern Dutch y was used (up until 1804 apparently). But if you search Wiktionary for any of the y versions, you won't find them. Should we be including the y versions in Wiktionary? And if so, should they be listed under Early Modern Dutch or just Dutch? (Early Modern Dutch is not listed at Wiktionary:List of languages.) Nosferattus (talk) 01:35, 1 December 2024 (UTC)[reply]

Yes, these forms should be included. Wiktionary views any term written after 1500 to be modern Dutch. There are already a few of these y-forms added: zyn, cyfer. As you can see, they use the {{obsolete spelling of}} template. I think their inclusion is limited, not because they shouldn't be included, but because editors mainly focus on adding terms in current use.

It would perhaps be a good idea to create a template similar to {{pt-pre-reform}}, to better organize the obsolete and superseded spellings.

Stujul (talk) 09:38, 1 December 2024 (UTC)[reply]

We still find this in Max Havelaar, written in 1860. For example, on just one page we find myn, my, blyken, hy, tyd, belangryke, twyfel, pryzen, stryden, misdryf and zyn.^[1] The author used his own, somewhat idiosyncratic spelling, though. --Lambiam 10:30, 1 December 2024 (UTC)[reply]

Thanks! I've started to add some of the more common words like hy and my. Nosferattus (talk) 02:19, 4 December 2024 (UTC)[reply]

Regarding Dutch spelling, pannekoek is given as superseded spelling of pannenkoek, and the latter is the official spelling, but not everyone agrees with that, see Witte Boekje. Shouldn't 'pannekoek' be rather a 'non-official spelling'?

More examples here. Exarchus (talk) 19:18, 5 December 2024 (UTC)[reply]

Reveal potentially shocking/NSFW images only upon clicking?

I visited loxoscelism to add a translation and was greeted by a slightly revolting image.

I would be in favor of hiding such images behind a "click to reveal" message so that they aren't shown by default. This should be quite easy to do using JS.

What do others think? — Fytcha〈 T | L | C 〉 13:13, 1 December 2024 (UTC)[reply]

@Fytcha: The image would still be visible in the mobile search: [2]. It would be better to just make that image an external link. Ioaxxere (talk) 15:17, 1 December 2024 (UTC)[reply]

FWIW, this was discussed in 2015 and last year. If we start censoring images, it's a slippery slope: people made headlines the very week we last discussed this for censoring Michelangelo's David. People—you see them on Talk:gay as recently as this week—complain that gay people are pornography / NSFW, and pass laws to that effect. There are people who think, and seek laws saying, trans people are pornography / NSFL. There are people (conservative Jews, Muslims, Christians) who think images of any women are NSFL. People complain about the image at swastika, or issue legal challenges (at least to WP) over maps of countries they'd prefer had different borders. Some people object to the image at penis, or the image at areola, but I think they're worth a thousand words, and don't see why a workplace would be OK with someone looking up penis, and only freak if the dictionary were illustrated—as others said in prior discussions, if one works at such a place, one may need to avoid Wiktionary at work, since images are liable to show up.
With that said, I acknowledge that it's reasonable that we unofficially have some practices, e.g. the entry for mangle doesn't contain an image of a mangled body even though it theoretically could. I don't mind the image at loxoscelism, but I'm not entirely opposed to hiding some images behind a click... I'm just very wary of the slippery slope.
One idea, if this doesn't exist already, is an opt-in gadget which would hide all images and require a click to see them; that avoids the slippery slope by being image-agnostic and opt-in. - -sche (discuss) 16:43, 1 December 2024 (UTC)[reply]

@-sche: Thanks for the reply as well as the links. One of my take-aways is that selectively hiding pictures (behind a "click to show" message) is not at all "politically unviable" on Wiktionary.

As pointed out plentifully, finding a sane demarcation will prove difficult. Reading these discussions, the impression I got was that the wisest strategy would be to start with a very liberal policy (that is however enacted for everyone by default in an opt-out fashion) and then have people incrementally work out amendments in subsequent (BP) discussions. These kinds of demarcations are not found conclusively in a single sweep. My mentioning of "NSFW" above was probably ill-advised, so what I'd suggest now as a starting point for which images to hide by default is (medical) gore, i.e. photos of wounds, deformations, the effects of disease, photos taken during surgeries, etc.

"one may need to avoid Wiktionary at work, since images are liable to show up.": That's true; currently, people cannot access Wiktionary at work (or in similar situations) free of risk. What I would point out is that this is an unusual and thus surprising fact about Wiktionary as a dictionary. Of all the dictionaries I've used, I don't think there has ever been another one where I had to be cautious using it in front of other people. — Fytcha〈 T | L | C 〉 19:07, 1 December 2024 (UTC)[reply]

Whether it is a slippery slope depends on the art of formulating policy, otherwise of course it can be watered down if we are unsure about it. We can distinguish motivations by which people might avoid images. For cases of medical irregularities the hardiness which we expect differs — one may well prefer a certain time of decision and mental preparation to see the image because there is only so much repulsive content any one can consume without his affective wellbeing being called into question – from the responsiveness to the regularly behaving exposed human body. If someone does not suffer locally appropriate coverage of it on كَتَبَ (kataba, “to write”) it is his problem and it is not even easy to have a depiction of an action while on the other hand the majority of the internet is porn anyway, and grounds for much greater dissonances and contradictions to scripture offended readers would have to care about, calling the survival of Islam in the 21st century into doubt, a question of available and appropriate attention we have to put into the balance. We do not have to equate illness, violence, nudity, and making love. There is also a historical depth to the matter: I guess Nazi stuff falls under “violence” but we can expect a distance towards things because of how long ago a thing prevailed, possibly again leaving only a limited number of images.

However yes, I’d rather not burden our editors with dealing with thinking about the general guidelines even, and keep a policy of deliberate ambiguity beyond what we have written. You could try some technical execution anyway of course, just that, unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images by reason that “I have 10,000 edits per year/I am admin and I know well enough which pictures are appropriate in the given context, you however have an ideological agenda, from what I can see.” It would result in templates and/or gadgets which, in effect, new users would be discouraged to use, not to say disallowed. Fay Freak (talk) 17:48, 1 December 2024 (UTC)[reply]

@Fay Freak: "[...] unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images [...]" I think I agree, which is why I'm thinking towards some kind of consistent policy. We're currently in the wild west with respect to images. — Fytcha〈 T | L | C 〉 19:14, 1 December 2024 (UTC)[reply]

FWIW this is being discussed on Wikipedia, too: Wikipedia:Village_pump_(policy)#Can_we_hide_sensitive_graphic_photos?. (On the whole, I find myself in the NOTCENSORED camp.) - -sche (discuss) 01:45, 3 December 2024 (UTC)[reply]

Here's the right link: w:Wikipedia:Village pump (policy)#Can we hide sensitive graphic photos?. FWIW, I think we should hide them, mostly for the low-bandwidth peoples. CitationsFreak (talk) 03:02, 3 December 2024 (UTC)[reply]

Can one use user CSS or similar to suppress all images (to address NSFW problem)? DCDuring (talk) 15:23, 3 December 2024 (UTC)[reply]
Should "File:"/"Image:" be replaced by templates that could carry information to allow selective filtering of images by type? DCDuring (talk) 15:23, 3 December 2024 (UTC)[reply]

Template:defdate and pre-1500 dates

A number of English entries contain things like this (at sky):

(obsolete) A cloud. [13th–16th c.]

However, we take the cutoff between Middle English and Modern English to be around 1500, so it has always struck me as anachronistic to say that the Modern English word arose in the 13th century. That information should be at the Middle English entry.

I'd like to propose moving the origin date of these senses to Middle English, then replacing the Modern English {{defdate}} invocation (some of which can be found using this crude search) with

(obsolete) A cloud. [Middle English–16th century]

or

(obsolete) A cloud. [until the 16th century]

(Also, this is not a paper dictionary, so there's no good reason to abbreviate "century" as "c.")

Thoughts? This, that and the other (talk) 05:39, 2 December 2024 (UTC)[reply]
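The rewrite proposed above can be sketched in Python, as an illustration only. The function below works on the displayed date text (e.g. "13th–16th c."), not on the {{defdate}} template itself. The function name, the `style` parameter and the choice of the 16th century as the first Modern English century are assumptions made for this sketch, not anything decided in the discussion.

```python
import re

# Matches a century range such as "13th–16th c." (en dash or plain hyphen).
CENTURY_RANGE = re.compile(
    r"^(\d{1,2})(?:st|nd|rd|th)\s*[–-]\s*(\d{1,2})(st|nd|rd|th)\s*c\.$"
)

# Assumption: a cutoff "around 1500" makes the 16th century the first
# Modern English century.
MODERN_ENGLISH_FIRST_CENTURY = 16


def rewrite_defdate(text: str, style: str = "label") -> str:
    """Rewrite a century range that starts before the Modern English cutoff.

    style="label" gives "Middle English–16th century";
    style="until" gives "until the 16th century".
    Anything else (unparsed text, ranges entirely on one side of the
    cutoff) is returned unchanged.
    """
    match = CENTURY_RANGE.match(text.strip())
    if match is None:
        return text
    start = int(match.group(1))
    end = int(match.group(2))
    end_ordinal = match.group(2) + match.group(3)
    # Only ranges that straddle the cutoff are rewritten.
    if start >= MODERN_ENGLISH_FIRST_CENTURY or end < MODERN_ENGLISH_FIRST_CENTURY:
        return text
    if style == "until":
        return f"until the {end_ordinal} century"
    return f"Middle English–{end_ordinal} century"


print(rewrite_defdate("13th–16th c."))
print(rewrite_defdate("13th–16th c.", style="until"))
```

A real cleanup would also need to move the original start date to the Middle English entry, which this sketch does not attempt.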

Commenting and subbing as I have been wondering the same for Lechitic lects. I similarly do not use {{etydate}} in Polish if the term was inherited from Old Polish, etc. Vininn126 (talk) 08:31, 2 December 2024 (UTC)[reply]

Support. Tollef Salemann (talk) 08:51, 2 December 2024 (UTC)[reply]

I don’t see a contradiction perforce. You propose to water down, information that could later be used, to edit history. If these datings are credible information at all and not random attestation ages that can happen with the Middle Ages; we still have not solved the problem of regularly labelling “reconstructed” lects, which would allow us to cleanly state things like “probably in the 4th century already, but attested from the 9th”; okay I sometimes use the etymology for this, as on بَال (bāl), if a reconstruction entry is not feasible. How is the move of Byzantine Greek going? 🙄 Fay Freak (talk) 18:05, 3 December 2024 (UTC)[reply]

Would sister languages also be marked as being attested since that time? Would we use Latin to mention since when we see attestation dates of Spanish? Vininn126 (talk) 18:47, 3 December 2024 (UTC)[reply]

Our coverage of Latin is larger. The decision depends somewhat on how secure an individual language’s editors are expected to be with the corpora, and what they can expect to be created any time soon. If we had lots of Greek entries having such phrasing as proposed, the planned reorganization would be considerably more challenging, demanding to revisit the attestation situation in affected cases. Just let the editors—including you—leave as much as they know, in so far as it is not overwhelming to the eye?

To ever halt before your problem, one has to construe one’s task as an editor gigantic enough that one boasts to never leave any gap, inconsistency, or inconsequence, which also does not align with reality, in as much as the presence of a gap, inconsistency, and inconsequence appears to align not with the actual reality of a language. Instead we acknowledge our finite manpower. Not some imaginary limit stemming from language cutoffs, the purpose of which apparently one has to remind editors about once in a while again: inasmuch as they are justified by mutual intelligibility of languages, they do not constitute impermeable walls, though we may remember them as such and speakers constitute their identities by such ideas to some degree; instead the language headers, subheaders and labels are there to communicate something which you otherwise wouldn’t immediately relate to them. Seen in such a way, the defdates to senses are, beyond their situation in time and place—as identified by dialect and chronolect headers—, exactly what the dictionary glosses to senses of a word are supposed to do. What you bring up as a question of logics turns out a question of balance. Fay Freak (talk) 22:27, 3 December 2024 (UTC)[reply]

I'm still overall against including it over our arbitrary boundaries. Vininn126 (talk) 22:33, 3 December 2024 (UTC)[reply]

Adding the information to the Middle English entries definitely seems like a good idea. While I can see the theoretical justification for replacing dates before 1500 with "Middle English", I'm not sure that change is really an improvement: it obviously removes some information, and the periodization convention of distinguishing between Middle English and Modern English is not particularly significant in and of itself.--Urszag (talk) 22:35, 3 December 2024 (UTC)[reply]

I support using dates with definitions, so 13th to 16th century and not from Middle English to the 16th century. The date range is more informative and easier to read. The sometimes arbitrary boundaries between stages of a language can live in the etymology section and the categories generated from it. Vox Sciurorum (talk) 00:30, 4 December 2024 (UTC)[reply]

I would delete that from the list of Modern English senses and move it to the etymology section (‘from Middle English foo “bar”…’) and to the Middle English entry. Nicodene (talk) 21:09, 4 December 2024 (UTC)[reply]

@Nicodene: 16th century is Modern English, hence if attested it should not be removed. J3133 (talk) 07:58, 7 December 2024 (UTC)[reply]

Obviously. I was going by the “until the 16th century”. Nicodene (talk) 09:30, 7 December 2024 (UTC)[reply]

By all means add information to Middle English entries, but I don't see any reason to remove it from English entries. The proposal just makes things vaguer and more imprecise. The distinction between ‘Middle English’ and ‘Modern English’ is just a historical convention anyway; for a linguist, enforcing this distinction in practice is next to impossible if you're working with texts from the 16th century (as I have done here in the past). At least with Old English there is a clear break in the written record which makes the change in grammar and vocabulary pretty sharp. Ƿidsiþ 07:23, 7 December 2024 (UTC)[reply]

I agree. Plus, anyone who cares about the distinction between Middle English and modern English can extrapolate from the dates given. But anyone who has never heard of Middle English (which is probably most people) won't find the information meaningful. Andrew Sheedy (talk) 05:04, 20 December 2024 (UTC)[reply]

I still feel this doesn't hold up for languages like Latin with multiple children and also the fact it's well known. Vininn126 (talk) 08:54, 20 December 2024 (UTC)[reply]

I agree. English is a bit of a special case relative to other major languages, because so much of its vocabulary entered the language late. I wouldn't go past Old English, and maybe not even that far (I would be fine with the defdate template reading [Old English to present], but not [Middle English to present]). Andrew Sheedy (talk) 17:18, 20 December 2024 (UTC)[reply]

So would I, I suppose – especially since the dates of OE texts are often a bit speculative. Ƿidsiþ 06:40, 21 December 2024 (UTC)[reply]

Support, provided the 'removed' information is transferred over to Middle English. I disagree with changing "c." to "century" though. Regardless of whether you're doing things online or on paper, it's generally a good idea to optimize the space used and keep things concise; it just looks prettier that way. MedK1 (talk) 04:05, 26 December 2024 (UTC)[reply]

FYI: December 2024 Unicode update

https://us11.campaign-archive.com/?u=c234d9aba766117eac258004b&id=d533f3804f —Justin (koavf)❤T☮C☺M☯ 23:19, 2 December 2024 (UTC)[reply]

'LANG forms' -> 'LANG spellings'

IMO it is confusing that we use 'forms' to mean 'spellings' in categories like Category:American English forms and Category:European Portuguese verb forms and Category:Brazilian Portuguese forms superseded by AO1990; also for that matter, more generally in CAT:Obsolete forms by language, CAT:Archaic forms by language, etc. Most of the descriptions of these categories make clear that the "forms" referred to are superseded/archaic/obsolete/etc. spellings, not some other kind of form. Even opening up Category:Ukrainian archaic forms produces 5 subcategories whose names all contain "spellings" or "terms spelled with" in them. Unfortunately the term "form" is badly overloaded at Wiktionary; any action to reduce the overloading is welcome in my book. So I propose at first to rename ad-hoc language-specific categories containing 'forms' to 'spellings'; and if there are no objections, rename the more general 'LANG superseded/archaic/obsolete/dated/rare/uncommon/informal/nonstandard forms' -> 'LANG superseded/etc. spellings'. Any terms that are in a 'LANG foo forms' category but aren't mere spelling variations should be moved to the corresponding 'LANG foo terms' category (which exist for all 'foo' except for 'superseded', but 'superseded' seems specifically for spellings, so this is unlikely to be an issue). The only 'foo forms' category I've excluded is 'LANG short forms', which is using "forms" differently, and should be eliminated in favor of either 'LANG ellipses', 'LANG clippings' or 'LANG abbreviations' (depending on what sort of short form is involved), but that's a different can of worms. Benwing2 (talk) 09:08, 7 December 2024 (UTC)[reply]
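The second, general step of the renaming proposed above can be sketched in Python, as an illustration only. The function name is hypothetical, and the list of qualifiers is copied from the proposal. The ad-hoc language-specific categories from the first step (e.g. 'American English forms') are deliberately left alone here, since they would need case-by-case handling.

```python
# Qualifiers listed in the proposal; 'short' is intentionally absent because
# 'LANG short forms' uses "forms" in a different sense.
QUALIFIERS = (
    "superseded", "archaic", "obsolete", "dated",
    "rare", "uncommon", "informal", "nonstandard",
)


def rename_category(name: str) -> str:
    """Map 'LANG <qualifier> forms' to 'LANG <qualifier> spellings'.

    Any category name that does not end in '<qualifier> forms' is
    returned unchanged.
    """
    words = name.split()
    if len(words) >= 3 and words[-1] == "forms" and words[-2] in QUALIFIERS:
        return " ".join(words[:-1] + ["spellings"])
    return name


print(rename_category("Ukrainian archaic forms"))
print(rename_category("English short forms"))
```

This only renames; deciding which entries are mere spelling variations and which belong in 'LANG foo terms' still needs human review.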

@-sche Sorry to ping you directly but surprisingly no one has commented and I figure you might have something to say here. Benwing2 (talk) 09:46, 9 December 2024 (UTC)[reply]

Thanks for the ping. The entries in these categories don't seem to be all of one type: it seems they will need pruning (especially, but not only, if renamed) iff people still want to distinguish spellings from forms. (Or are we abandoning that distinction? I know some later commenters in that discussion argued for that instead, and I'm not sure whether a decision was reached or, if not, which approach would be best.)
For example, I see we have anemia as an American form of anaemia (it should indeed rather be spelling if we're distinguishing those two words), but we also have airfoil in the same "American forms" category but using an "American spelling" label although it differs from aerofoil in more than just spelling. Likewise Abissinia, currently listed as an archaic "form", would be better as a "spelling", but the difference in adipsy and adipsia is not just spelling; abyssus, currently presented as an "archaic form of abyss", also does not seem like a mere archaic "spelling", but perhaps it is also best not as a "form" but as an archaic (or obsolete) synonym of abyss, or as (obsolete) Abyss. So, especially (but not only) if renaming the categories, it seems like we need to decide what we want the scope to be, and whether we want to distinguish "only the spelling is different" and "the pronunciation is also different" or combine those two things...? - -sche (discuss) 17:23, 9 December 2024 (UTC)[reply]

Hmm. In practice I suspect people won't be able to distinguish mere archaic/obsolete/American/British spelling variants from those that also differ in pronunciation (aluminum vs. aluminium). At the same time I think "form" is far too overloaded. Maybe we could say "American English variants" etc.? Also technically the "European Portuguese verb forms" vs. "Brazilian Portuguese verb forms" reflect slight differences in pronunciation; they are mostly in past tense -amos (Brazilian) vs. -ámos (European), which is meant to indicate a difference in vowel quality. Likewise although the majority of "Portuguese forms superseded by AO1990" are just spelling differences, there are a few that are not, e.g. pre-reform abeto Douglas vs. modern abeto-de-douglas (although in that case the definition specifically says "pre-reform spelling of ..." and it seems there was also a pre-reform abeto de Douglas). So maybe we should use the term "variant". As for alt forms vs. alt spellings, I do think we should try to make that distinction since some of the things tagged as "alt forms" differ quite a bit from the form they are said to alternate with. Benwing2 (talk) 20:50, 9 December 2024 (UTC)[reply]

Me and the Portuguese editors I know use {{alt spelling of}} when the difference is in spelling but not in pronunciation, at least “phonemically” — i.e., different spellings between European and Brazilian Portuguese are alternate spellings because the difference comes from each dialect’s pronunciation of phonemes, not just of that particular word. Meanwhile, I use {{alt form of}} when it’s a different pronunciation that doesn’t stem from a systematic difference between dialects.

However, this distinction in template usage is almost entirely moot if the category that gets assigned is the same. I think the most useful decision is to create new categories, 'LANG archaic spellings' etc., as daughters of 'LANG archaic forms' etc. Though this would need us to pay some real attention to replace the category tree definitions as well as the categorizations called by templates. Polomo47 (talk) 01:48, 10 December 2024 (UTC)[reply]

Template:syncopic form / Template:syncopic form of

syncopic seems to be vastly less common than syncopal, which is itself less common than syncopated (see ngrams). Should we rename these templates? P U C – 16:09, 8 December 2024 (UTC)[reply]

Maybe it should just be {{syncope}}/{{syncope of}}, since we already have {{clipping}}/{{clipping of}}, {{ellipsis}}/{{ellipsis of}}, etc.? Benwing2 (talk) 09:53, 9 December 2024 (UTC)[reply]

This sounds better to me. Vininn126 (talk) 10:05, 9 December 2024 (UTC)[reply]

Or just replace it entirely with {{clipping}} (of), since syncope is a type of clipping anyway, and it's not clear why one would want a specialized category for it. Nicodene (talk) 10:23, 9 December 2024 (UTC)[reply]

I actually only had instances of clipping in Polish entries. Syncopy might be seen as more phonological and clipping is often a process in more colloquial things. Not sure. Vininn126 (talk) 10:25, 9 December 2024 (UTC)[reply]

There isn't a difference, as far as I am aware, other than the fact that syncope refers to clipping in medial position. And that it sounds fancier. Nicodene (talk) 10:37, 9 December 2024 (UTC)[reply]

I too was under the impression that syncopy is a purely "mechanical" phonetic process whereas clipping is a deliberate removal of syllables used to coin new words. Not that I have any source to support that interpretation but... P U C – 11:18, 9 December 2024 (UTC)[reply]

I can't find any sign of such a difference outside the realm of (accidental?) Wiktionary convention. Google Books, for instance, brings up a laundry list of sources confirming that these are indeed synonyms. Nicodene (talk) 12:00, 9 December 2024 (UTC)[reply]

I have to add my voice to the chorus of people saying that I find the present distinction to be valuable. I certainly wouldn't insist on the current terms used - and I am increasingly convinced we shouldn't keep using them as we are. But distinguishing between clipping that occurs as part of a gradual phonological process (e.g. Romance syncope, or English fancy) vs deliberate, conscious truncation (e.g. math) seems very valuable. This, that and the other (talk) 00:22, 10 December 2024 (UTC)[reply]

Very well. In that case the issue is finding a pair of terms that can reasonably be specialized in the way that you have described.

We could try elision and shortening respectively. Nicodene (talk) 04:07, 10 December 2024 (UTC)[reply]

I personally am fine with elision and clipping respectively since we're already using clipping in the sense of "conscious truncation". Benwing2 (talk) 07:27, 10 December 2024 (UTC)[reply]

Seconded. Vininn126 (talk) 07:29, 10 December 2024 (UTC)[reply]

I agree with @Nicodene here; I don't see why we need to categorize syncopes, apocopes and aphereses separately from clippings. See Wiktionary:Requests_for_moves,_mergers_and_splits#Template:clipping_of,_Template:aphetic_form_of;_Template:clipping,_Template:aphetic_form (but unfortunately there was pushback for this). Benwing2 (talk) 10:26, 9 December 2024 (UTC)[reply]

Alternatively, if "clipping" seems specific to colloquial language, we could merge syncopes/apocopes/aphereses into "elisions". Benwing2 (talk) 10:29, 9 December 2024 (UTC)[reply]

It could be that these are all part of the same process, I'm not sure I like the usage of ellipsis - differentiating skipping a word versus a syllable (and from there skipping a syllable in other ways) could be useful. Perhaps that could be a separate parameter. Vininn126 (talk) 10:32, 9 December 2024 (UTC)[reply]

@Vininn126 elision, not ellipsis -- they are different processes and would be categorized differently. Benwing2 (talk) 11:07, 9 December 2024 (UTC)[reply]

Ah, yes you're right! So perhaps there's something to that, then. Vininn126 (talk) 11:08, 9 December 2024 (UTC)[reply]

Why is there no quote-thesis template?

We have cite-thesis. I was wanting to add a quote from a thesis to an entry, and the quote, quote-book, and quote-journal templates are not fit for purpose. Is it worth putting it to a vote? I don't know if I can do that as I've only been active on Wiktionary quite recently. Cameron.coombe (talk) 23:04, 9 December 2024 (UTC)[reply]

@Cameron.coombe: I think {{quote-book}} is fine for this purpose, and don’t think a separate template is required. — Sgconlaw (talk) 23:10, 9 December 2024 (UTC)[reply]

@Sgconlaw cheers, I assumed thesis titles needed to be set in quote marks, not italics, but that's Chicago style. Does Wiktionary not require this? Cameron.coombe (talk) 23:15, 9 December 2024 (UTC)[reply]

I don't think we're that fussy 😊 This, that and the other (talk) 23:37, 9 December 2024 (UTC)[reply]

I usually use quote-journal or quote-book and add "|genre=Thesis" as a ham-handed work-around. --Geographyinitiative (talk) 23:45, 9 December 2024 (UTC)[reply]

Thanks all! Will proceed with boldness Cameron.coombe (talk) 00:11, 10 December 2024 (UTC)[reply]

Request AutoWikiBrowser

I've been doing mass-correction of Portuguese pre-reform or otherwise archaic spellings — for reference, see how many pages are listed in WT:RFVI, and how I've cleared out [[Category:Portuguese superseded forms]]. My current project is clearing out the categories dated forms, archaic forms, and, above all, obsolete forms.

I think AutoWikiBrowser might just be able to help me with my antics — they must've helped my friend @MedK1 —, so I'd like to request access. Polomo47 (talk) 01:37, 10 December 2024 (UTC)[reply]

@Polomo47 I gave you access. Benwing2 (talk) 07:26, 10 December 2024 (UTC)[reply]

Use of titles in quotes and citations

I'm just fixing a quote here and noticed the editor put a title (Dr. med.) preceding the author name. Is this established practice here? I don't personally like it because it's clutter and likely not applied consistently. But I couldn't find a policy. Cameron.coombe (talk) 04:45, 10 December 2024 (UTC)[reply]

@Cameron.coombe: I don’t think we have a policy on this yet, but I always remove titles and forms of address unless they are strictly required to identify the author (for example, in some early works, female authors were named after their husbands, as “Mrs. John Smith”). — Sgconlaw (talk) 05:07, 10 December 2024 (UTC)[reply]

@Sgconlaw thank you. Do you think it's worth me drafting a policy proposal? Cameron.coombe (talk) 05:26, 10 December 2024 (UTC)[reply]

Do we not label attributive adjectives?

I noticed that common attributive-only adjectives are not labelled as such:

former
mere
elder (though in usage notes)
principal

Is there a reason behind this? Whether an adjective is general use (so no note), attr only, postpositive only, or pred only is important information, especially for non-native speakers, and it's provided in other dictionaries. There is an attributive label in the lb template, but it links to a gloss of the meaning for nouns, not adjectives, and, at least based on the common examples above, doesn't seem to be in use? Cameron.coombe (talk) 10:53, 11 December 2024 (UTC)[reply]

It is a counsel of perfection that we should properly label every adjective sense that needs such a label. Add the label to the appropriate senses when you find them. I can't think of a practical way to detect all the cases where such labels are missing. A list from some source would be helpful, probably just for the more common cases.

We have labeled as "attributive" (mostly not "attributive only") some 200+ English terms. To label a sense of a polysemic adjective "attributive only" may risk user confusion. DCDuring (talk) 14:38, 11 December 2024 (UTC)[reply]

@DCDuring thank you for the thoughts. I'm quite happy with simple "attributive," which is what I'm familiar with from other dictionaries. "Mostly attributive" can also be helpful if pred. sense is rare or nonstandard but attested. I'm not sure about the label auto-linking here though when I'd use it of adjs. Cameron.coombe (talk) 22:11, 11 December 2024 (UTC)[reply]

If you are saying that we should link the attributive (and postpositive and predicate) labels to something explanatory, I agree, though I would usually settle for our entries for the terms or {{senseid}}ed definitions at the entries. It also might make sense to have categories for the terms that have such labels. Making the changes required is not in my wheelhouse. DCDuring (talk) 22:30, 11 December 2024 (UTC)[reply]

@DCDuring cool, thanks. I wasn't familiar with senseid. I can have a play next time I need to. My only concern now though would be adding a whole lot of attributive labels and then having someone go through and revert them. I've got your support, but I don't know how universal that translates to! Cameron.coombe (talk) 23:15, 11 December 2024 (UTC)[reply]

@Cameron.coombe: I would also support your additions, FWIW. 0DF (talk) 00:02, 12 December 2024 (UTC)[reply]

Maybe, we should give folks a chance to comment.

I'm already disagreeing with myself about my rejection of attributive only as a label rather than attributive. The normal ("unmarked") state of an English adjective is that it is prepositive and usable both attributively and as a predicate. The function of our labels is to mark exceptions to the unmarked state. Bare attributive does not do this, IMO. I don't know that we can be certain that only should follow attributive, because exceptions are likely, if not now, then perhaps in the past, and if not in UK and North America, then in Australia, the Caribbean, or India. Maybe the default for all of these should contain usually, with stronger only reserved for cases where the supporting evidence is strong. DCDuring (talk) 00:32, 12 December 2024 (UTC)[reply]

True, attributive only or usually attributive is more precise. Other dictionaries use simply attributive, but probably because of space restrictions. (I know space isn't a concern for Wiktionary, but is clutter?) For exceptions, I would handle these as subdefs:

former

(attributive) Previous.
1. (nonstandard, India) Predicatively [I can't think of a usage example lol]

I'm sure this isn't used in India, just an example. Cameron.coombe (talk) 00:57, 12 December 2024 (UTC)[reply]

The U4C is ordering an admin to respond to a block appeal

User Ghilt from the Universal Code of Conduct/Coordinating Committee (U4C) has proxy-posted a 3rd block appeal on User_talk:Gapazoid. They have stated that an en-wiktionary admin other than the blocking admin (Surjection) must read and respond to the unblock request. 2603:6011:C801:9FED:B33F:5C52:5400:3763 14:56, 11 December 2024 (UTC)[reply]

Why not just change the blocking reason to "disruptive editing" and have done with it? 0DF (talk) 18:22, 11 December 2024 (UTC)[reply]

I'm fine with changing the block reason, but I maintain that this editor cannot be allowed to edit again. — SURJECTION ^{/ T / C / L /} 18:57, 11 December 2024 (UTC)[reply]

@Surjection: As far as I can make out, your new block reason should satisfy the blocked editor. I think it's reasonable, FWIW. 0DF (talk) 19:23, 11 December 2024 (UTC)[reply]

Are you the IP who said they'd kill themself? Polomo47 (talk) 19:01, 11 December 2024 (UTC)[reply]

I note that User:Gapazoid is (in addition to being locally blocked) globally locked as a "Vandalism-only account", though User:Xaosflux stated that global unlocking might be considered if en.Wiktionary unblocks. Unless Gapazoid has deleted contributions on other wikis that I can't see, I actually find the global lock rationale harder to understand than the local block rationale; the user appears to have edited only en.Wiktionary and en.Wikipedia, and the few edits to en.Wikipedia appear to be mundane copyediting.
AFAICT Gapazoid has made only a single edit to Wiktionary content (?), to MAP; the only other (eight) edits the user has made were to his/her talk page; is this correct? (I see no deleted contributions.) If Special:Contributions/2600:387:0:803:0:0:0:95 (also locally blocked and globally locked) and/or Special:Contributions/2603:6011:C8F0:E4E0::/64 are the same person, their own sole contributions were to threaten, on Gapazoid's talk page, to commit suicide. If the user has made other edits I have missed, either on Wiktionary or elsewhere, I hope someone will bring them to light.
The user's edit to MAP was to change the usage note from commonly interpreted as a sign that the speaker supports (or is sympathetic to) such people to ...a sign that the speaker supports sexual contact between adults and children. That change seems mistaken / incorrect to me, and had I seen it, I would have undone it with an edit summary explaining that the "supports such people" language seems more accurate, but — if the edit had been made with no edit summary, or with a mundane edit summary — I would have taken it to be a mistaken but good-faith edit, not vandalism, and would not personally have issued a block. However, the edit was made with an edit summary which, like the user's posts on his/her talk page, states that he/she is a pedophile but is opposed to child sexual abuse.
I can understand the user's objection to the original block summary saying he/she engaged in "pedophilia advocacy", and I appreciate the improved block summary. I also understand the position that a user openly announcing himself/herself as a pedophile is disruptive, somewhat similar to w:WP:HID; threats of w:WP:SUICIDE also seem disruptive. I'd also note that my spider sense is that people who are blocked for things like this [edited to add for clarity: meaning "borderline disruptive things", not meaning "pedophilia-related things", which are rarer] and then spend this much time trying to get this many wikis / functionaries / organs of the WMF / etc involved in unblocking them . . . in the situations in the past where it's happened, such users have either been felt also by other wikis' admins to be NOTHERE (and so remained blocked not only here but also on other wikis that considered their appeals), or have been unblocked but then proven themselves to indeed be disruptive (NOTHERE, here only to bog people down in debates, etc) and gotten reblocked in time. Considering all of that, I, for my own part, as just one admin here, decline to unblock. If other admins (or other editors) want to weigh in, I encourage them to do so! I pinged Xaosflux above to make him aware of this discussion, and now ping User:Ghilt and will also leave a note on Gapazoid's talk page pointing to this discussion. - -sche (discuss) 00:44, 12 December 2024 (UTC)[reply]

I remember a pedophile commenting on BP, saying that they are a pedophile. It was later hidden within a folder, though. CitationsFreak (talk) 03:53, 12 December 2024 (UTC)[reply]

Thank you for your statement, -sche. And also thanks to Surjection for changing the log entry. This concludes the matter for us. On behalf of the U4C, --Ghilt (talk) 09:27, 12 December 2024 (UTC)[reply]

We can entertain user lock appeals at m:Special:Contact/Stewards, and yes: overcoming community blocks is a good way to help such an appeal be successful. Xaosflux (talk) 12:24, 12 December 2024 (UTC)[reply]

After a private lock appeal I have unlocked the account. To respond to comments here, the lock was implemented after an SRG request due to pedophilia advocacy - similar to why we, for example, lock accounts for uploading CSAM on Commons, even if they only edited that project. With that being said, they have made a reasonable further explanation to me in private and I see it as a sign that this can currently be locally handled. EPIC (talk) 16:20, 19 December 2024 (UTC)[reply]

Protecting pages as "model pages"

Saltmarsh (talk • contribs) has semi-protected a couple dozen Greek entries as "model pages". I don't think this is a good practice, since it deters editors who could materially improve these pages (no dictionary entry is ever complete), and there are much better approaches, e.g. having example entries in a separate namespace. — SURJECTION ^{/ T / C / L /} 18:46, 11 December 2024 (UTC)[reply]

The full list of protected pages appears to be -τερος, Άγγλος, άγγλος, αγγούρι, αγγούρια, ακρογωνιαίος, ανηψιών, ασκί, βαθύς, βρέχει, λύνομαι, λύνω, μεταφρ., περισσότερος. Some of these were originally fully protected (i.e. only admins could edit them). — SURJECTION ^{/ T / C / L /} 18:48, 11 December 2024 (UTC)[reply]

@Surjection, PUC, these pages were not locked; I have edited them often (they used to be protected from anonymous Greek editors who mostly write vandalisms about soccer teams, and silly schooljokes. That is because we -editors of Modern Greek- are not around every single day). The models are in Category:Greek model pages so that we can copypaste from them. All languages should have copypaste models for us: because wikitext is getting harder and harder. Also see a trial at User:Erutuon/Ancient Greek model pages which is even more complicated. I always try to find copypaste patterns from recent edits by administrators; I would have liked to have them in some Cat with their endorsement, rather than going around Histories and their Contributions, hoping to find something similar to my task. If not protected, fine: but someone has to patrol them. Thank you. ‑‑Sarri.greek ^♫ I 10:47, 12 December 2024 (UTC)[reply]

These pages are still semi-protected and many of them were admin-protected. I don't see any "anonymous Greek editors who mostly write vandalisms about soccer teams, and silly schooljokes" in the history of any of these pages, so they cannot simply have been protected to guard them from vandalism.

The idea of model pages on its own appears sound, but it's not a good idea to make the actual mainspace pages the 'model pages' and then protect them because they're 'model pages'. These should be in a separate namespace. — SURJECTION ^{/ T / C / L /} 10:51, 12 December 2024 (UTC)[reply]

No problem: unlock them, M @Surjection. We can make a List and write specific examples -because they cannot be changed without discussion: they are heavily copypasted- at the About Greek page or Help Greek. My administrator @Saltmarsh has done SO much for Modern Greek! I would like to help him a bit. It's just... mmm I need a little help from programmers. For example, a little template for the Orthographic Reform to monotonic of 1982. (cf. Άγγλος.2024, cf. Notes). Little things like that. I could make it myself, but from experience, I see that only interface programmers check Templates and make them in a correct way. ‑‑Sarri.greek ^♫ I 15:52, 12 December 2024 (UTC)[reply]

@Surjection, Sarri.greek As far as I can see these "Model pages" do no harm (kindly point any out if you see one). New editors need help with layout, not always easily extracted from "Help". Protecting them (which again does no harm) ensures that any changes in suggested layout can be discussed. — Salt marsh ^☮ 14:29, 12 December 2024 (UTC)[reply]

Yes, thank you @Saltmarsh. Need to trust some pages; the ones checked by an admin. By the way, I am checking some of the pages. When robots finish their work, we can check again. (... I know only named parameters, cannot remember the sequences of positional params: I hate it). I have to throw away all my cheatsheets. Thank you, dear Salt!! ‑‑Sarri.greek ^♫ I 14:38, 12 December 2024 (UTC)[reply]

The harm they do by being unnecessarily protected is to prevent users from editing them. This goes against the entire idea of having a wiki. — SURJECTION ^{/ T / C / L /} 15:21, 12 December 2024 (UTC)[reply]

I would oppose protecting any page in principal namespace on the grounds that it is a model page. Such model pages might be useful in Wiktionary space. I wonder how that could work in any page with multiple L2 sections.

It might be useful to have templates, possibly located on entry talk pages, that indicate that a given L2 section has achieved some stage of "completion", so that contributors could find such "models". DCDuring (talk) 14:56, 12 December 2024 (UTC)[reply]

Nice idea, thank you M @DCDuring. Something analogous to wikisource's coloured bars. not reviewed / reviewed - see List so and so. A list of 'SOS' pages can be created, especially the ones with 3 Greek L2s, 2 Greek L2s, for every part of speech or inflectional group etc. Usually, I edit Ancient and Modern Greek in unison (lots of pages coincide and Modern refers all the time to previous etymologies and inflections. Especially with Hellenistic Koine -which has many problems and is usually ignored-). I hope robots will normalise the standard templates because it is very difficult to have 2 or 3 ways to write the same thing in the same page. I am also waiting for the pending Medieval Greek gkm. Thank you all for your attention. ‑‑Sarri.greek ^♫ I 15:11, 12 December 2024 (UTC)[reply]

I realized that the situation in Ancient/Modern/Medieval? Greek made the model-page-in-principal-namespace idea practical for those languages, as other languages do not use the same characters. But it wouldn't work so well for CJKV entries where the different L2s often have different levels of development. I would prefer an approach that worked across all kinds of entries with multiple L2s. Maybe it would be useful to see what works for Greek-character entries along the lines that you suggest, without protecting model pages. That might be a 'model' for entries with multiple L2s in other character sets. DCDuring (talk) 15:30, 12 December 2024 (UTC)[reply]

I agree with DCDuring there may be a case to be made for putting such entries in the Wiktionary namespace, but I also agree strongly with Surjection that these protections should be reverted. This is a bad use of the page protection mechanism. — Mnemosientje (t · c) 19:53, 17 December 2024 (UTC)[reply]

Well @Surjection "against the entire idea of having a wiki." Wikipedia has numerous such protected pages. These pages do no harm at all - I suspect that basically you "just don't like them". Well I do!, and these interminable discussions, which some people seem to relish, really piss me off !! — Salt marsh ^☮ 19:58, 12 December 2024 (UTC)[reply]

Pages are protected because of vandalism, because of high use rate (templates, modules, etc.) or because they are non-content pages that should not be edited by anonymous users. None of these applies here. Again, if we want to have "model pages" that are protected, then they need to be copies outside of mainspace. — SURJECTION ^{/ T / C / L /} 20:45, 12 December 2024 (UTC)[reply]

I agree that entries should not be fully protected (admin-only) unless they have been or are likely to become the target of enough vandalism to warrant that (and even then, unless the vandalism has been enduring, protection should generally be temporary, like the protection applied to words that appear on the mainpage). Protecting pages (even at a lower protection level) simply because they are "good" is not the way to go; as recent edits to some of the pages mentioned here have shown, they were far from complete, so preventing some people from improving them is inadvisable. I agree with Surjection that if the goal is to show ideal formatting (or such), it is better to have examples (or even one single example, e.g. a made-up word illustrating all possible things, e.g. how to format an adjective, a verb, a noun, all at once) somewhere in Wiktionary: space like the language's "About" / "Entry guidelines" page.

吃飽

Inspired by the discussion above, I looked at what other pages are indefinitely edit-protected at high levels. 吃飽 has been indefinitely protected, allowing only template editors(!) and admins to edit, since an edit war in 2019; is this still needed? The user who was edit-warring back then seems to have matured. (Even if there is still a problem, we now have the ability to block specific users from editing specific pages, while still allowing them to edit the rest of the site, which seems like it'd be better than protecting the whole page and thus blocking anyone from editing it.) - -sche (discuss) 00:08, 13 December 2024 (UTC)[reply]

Chiming in: Protecting pages for them to be models is obviously unsound. This is platonic idealism, which does not hold water empirically given that there is always room for improvement, you just have not exerted yourself long enough on it. And protecting pages always expresses distrust to users, which needs to have some basis other than the quality of the page. Fay Freak (talk) 00:57, 13 December 2024 (UTC)[reply]

Cantonese, Hainanese, and Hakka lemmata treated as Chinese

I've recently done some work periodically to clear out Wiktionary:Todo/Lists/Derivation category does not match entry language. Currently, there are ten entries in categories entitled Cantonese, Hainanese, or Hakka terms derived from X which are not also part of the corresponding lemmas or non-lemma forms categories. Chinese/Cantonese 0T used to be an eleventh such entry, but then I edited it to resolve that problem with it. However, doing so seemed to cause other problems.
Firstly, the entry is still a member of Category:Chinese lemmas and Category:Chinese nouns despite the lang-code changes. Secondly, whereas {{lb|zh|HKC}} correctly displays (Hong Kong Cantonese) and adds the entry to Category:Hong Kong Cantonese, {{lb|yue|HKC}} just displays (HKC) and does not categorise the entry at all. And thirdly, my changes moved the entry from Category:zh:Beverages (177 members) and Category:zh:Chinese restaurants (48 members) to Category:yue:Beverages (0 member[s]) and Category:yue:Chinese restaurants (0 member[s]), both of the latter of which were red-linked until WingerBot created them.
This seems a manifestly suboptimal way to resolve the language-mismatch issue in the cases of these Cantonese, Hainanese, and Hakka lemmata. What is the proper way to deal with these cases? 0DF (talk) 00:37, 12 December 2024 (UTC)[reply]

Pinging ND381 (who added Cantonese sorry), Wpi (who added Cantonese 0T and the other Latin-script Cantonese terms), Justinrleung (who added Hainanese 弄 and 枚), and Mar vin kaiser (who added Hakka 雪文). 0DF (talk) 03:31, 16 December 2024 (UTC)[reply]

@0DF: Chinese is a special case, because terms are simultaneously Chinese and any of a huge variety of sublects. The writing system has a lot to do with this, since it allows writing things that are basically the same words in writing but completely different when spoken. It's very complicated, with variations in grammar, in pronunciation, and in writing that only partly overlap.

There's a whole universe of Chinese-specific templates and modules that do things in a completely different way from anything else on Wiktionary. When I'm going through the Todo lists, I treat most of the Chinese-related stuff as false positives and leave it alone. In all likelihood, "fixing" things will just cause other problems. The other CJKV languages share some of the same issues and are best left alone, for the most part.

I do fix things like Chinese etymologies that use language codes for Tibetan, and any {{lb|en}} on CJKV definition lines- but I know my limits (I took a year of Beginning Mandarin at UCLA, but that was 38 years ago). Chuck Entz (talk) 04:16, 16 December 2024 (UTC)[reply]

@0DF: This has happened before (see here) and the correct way would be something like Special:Diff/72937108/75521861, and not changing it to Cantonese L2.

There seems to be a bug(?) in mod:zh-pron where it did not add Cantonese lemmas if |c= is empty and | (which adds Cantonese nouns) - I'll look at this later this week. – wpi (talk) 06:43, 16 December 2024 (UTC)[reply]

P.S. I should note that on Wiktionary:Todo/Lists/Derivation category does not match entry language/description#Cleanup instructions it says Occasionally, the L2 language header, etymology template and {{head}} template may disagree on the language of the entry. If you do not speak the language(s) involved, it is best to ask the entry's creator to resolve the issue. (bold text mine) – wpi (talk) 06:47, 16 December 2024 (UTC)[reply]

@Chuck Entz: Yes, I was somewhat aware that Chinese is a unique case: unified in writing, but divided in speaking. Thank you for chiming in.

@Wpi: OK, I'll add {{cln|yue|lemmas}} vel sim. henceforth. That should fix things. Thanks for pointing me to the correct solution, and I hope you're successful in fixing the issue with Module:zh-pron. I'd already clocked that “If you do not speak the language(s) involved,…” caveat, but if I observed that literally, I wouldn't be being nearly as productive or helpful as I would be by being bold in editing. I'd already noticed that my changes to 0T were inadequate, hence my raising the issue in this BP section and then pinging you and the other relevant editors, which has led to the proper resolution, so I think I have my boldness–caution level fairly well calibrated.

0DF (talk) 13:34, 16 December 2024 (UTC)[reply]

@wpi: I've employed your {{cln|…|lemmas|[POS]}} solution. However, I'm apprehensive that that led to the creation of Category:Hainanese verbs and Category:Hainanese classifiers, at least the former of which I would have expected already to have existed (like Category:Hakka nouns already did). Is there some reason why Hainanese terms shouldn't get POS categories? Justinrleung? 0DF (talk) 14:03, 16 December 2024 (UTC)[reply]

Thanks. Regarding Hainanese, I believe it's because {{zh-pron}} does not support Hainanese (yet), so there hadn't been any category infrastructure for it. – wpi (talk) 14:40, 16 December 2024 (UTC)[reply]

@wpi: Ah, OK. In that case, the categories are ready-made for a time when {{zh-pron}} does support Hainanese. Thanks for your help. 0DF (talk) 17:02, 16 December 2024 (UTC)[reply]

Temporary Accounts - introduction to the project

The Wikimedia Foundation is in the process of rolling out temporary accounts for unregistered (logged-out) editors on multiple wikis. The pilot communities have the chance to test and share comments to improve the feature before it is deployed on all wikis in mid-2025.

Temporary accounts will be used to attribute new edits made by logged-out users instead of the IP addresses. It will not be an exact replacement, though. First, temporary users will have access to some functionalities currently inaccessible for logged-out editors (like notifications). Secondly, the Wikimedia projects will continue to use IP addresses of logged-out editors behind the scenes, and experienced community members will be able to access them when necessary. This change is especially relevant to the logged-out editors and anyone who uses IP addresses when blocking users and keeping the wikis safe. Older IP addresses that were recorded before the introduction of temporary accounts on a wiki will not be modified.

We would like to invite you to read the first of a series of posts dedicated to temporary accounts. It gives an overview of the basics of the project, impact on different groups of users, and the plan for introducing the change on all wikis.

We will do our best to inform everyone impacted ahead of time. Information about temporary accounts will be available on Tech News, Diff, other blogs, different wikipages, banners, and other forms. At conferences, we or our colleagues on our behalf are inviting attendees to talk about this project. In addition, we are contacting affiliates running community support programs.

Subscribe to our new newsletter to stay close in touch. To learn more about the project, check out the FAQ and look at the latest updates. Talk to us on our project page or off-wiki. See you! NKohli (WMF) and SGrabarczuk (WMF) (talk) 03:27, 12 December 2024 (UTC)[reply]

Could a user choose to use IPs instead of temp. accounts if they wanted? CitationsFreak (talk) 03:50, 12 December 2024 (UTC)[reply]

No, it will not be possible. The only choice will be between the accounts: logged-out (temp account) or logged-in (regular account). SGrabarczuk (WMF) (talk) 14:08, 12 December 2024 (UTC)[reply]

Great! It'll give me a chance to have even more usernames! P. Sovjunk (talk) 23:44, 24 December 2024 (UTC)[reply]

Banning Proto-North Caucasian and Proto-Northeast Caucasian reconstructions

1. Proto-North Caucasian. In my opinion, there are currently no reconstructions of the Proto-North Caucasian simply on the grounds that there are no reconstructions of the Proto-Northeast Caucasian. Here I would prefer to end any discussion about this superfamily and delete the category itself in order to avoid reconstructions.

2. Proto-Northeast Caucasian. As written above, I believe that there are no reconstructions of Proto-Northeast Caucasian. The so-called reconstructions of Starostin and Nikolaev are actually tentative pseudo-reconstructions. In addition, they do not give reconstructions of Proto-Northeast Caucasian forms anywhere. All their reconstructions in the database are Proto-North Caucasian, which are identical, apparently. Realizing this, Johanna Nichols uses the pound sign (#) for pseudo-reconstructions in her works.

This convention follows Williams 1989, who uses the asterisk for reconstructions based on regular sound correspondences and the # for "[p]seudo-reconstructions based on a quick inspection of a cognate set without working out sound correspondences".

Note the recent case of User:Qmbhiseykwos, who began to add (in addition to pseudo-reconstructions by Nichols and "reconstructions" by Starostin and Nikolaev) "reconstructions" by the Dutch linguist P. Schrijver (2018, 2021, 2024), which should also be considered pseudo-reconstructions. For example, Reconstruction:Proto-Northeast Caucasian/rɔḳʷ(ə).

2.1. Appendix. Since the Wiktionary does not operate with the concept of tentative pseudo-reconstructions, all such "reconstructions" of the Proto-Northeast Caucasian should be indicated only in the appendix. For example, Appendix:Proto-Nakh-Daghestanian reconstructions

2.2. Renaming. I believe that it is necessary to rename the family to the (Proto-)Nakh-Daghestanian one. This must be done, since the name hints at the division of the North Caucasian and South Caucasian languages (Kartvelian), which is unacceptable.

2.2.1. Accordingly, it is necessary to rename the (Proto-)Northwest Caucasian to the (Proto-)Abkhazo-Circassian or (Proto-)Adyghe-Abkhaz, etc.

3. Proto-Daghestanian. It may be necessary to create a category for this family. Regarding this family, there are curious reconstructions by B. Giginejšvili (1977) and E. A. Bokarev (1981). But they don't seem to give any reconstructed forms. It is difficult for me to say anything here, since I have not studied these languages. I'll give you a comment by the American Caucasologist Alice C. Harris (2003: 180):

“It should be noted first that the phonetic reconstructions proposed by Nikolayev and Starostin (1994) and adopted by Alekseev (1985) are not widely accepted. For example, Nichols (1997) and Schulze (1997) show serious problems with the proposals in Nikolayev and Starostin (1994), and Giginejšvili (1977), Schulze (1988), Talibov (1980), provide reconstructions that are in various ways more rational”.

@Vahagn Petrosyan, კვარია ɶLerman (talk) 12:17, 15 December 2024 (UTC)[reply]

Not even Nakh-Daghestani sound correspondences are fully understood, to even consider enrolling Abkhaz-Adyghe here is insanity. Proto-North Caucasian was never _not_ controversial, so I have no clue why it was even added to Wiktionary in the first place. Nuke Proto-North Caucasian. On the other hand, banning Proto-Nakh-Daghestani reconstructions is perhaps too extreme. Imho, there's no great harm in having them exist even if it turns out they're wrong/imprecise. კვარია (talk) 14:39, 15 December 2024 (UTC)[reply]

Agree. Tollef Salemann (talk) 16:14, 15 December 2024 (UTC)[reply]

Nuke North Caucasian both as a group and as a reconstructed language, for Nakh-Daghestani/NE-Caucasian - I'm fine with agreeing not to create reconstructions, but I think having a code may be a good idea nonetheless. Just need to patrol them from time to time. Thadh (talk) 21:57, 15 December 2024 (UTC)[reply]

Nuke North Caucasian yes, already since it's unclear if even the family exists at all. Tentative cognates in NWC could be noted in NEC entries if we end up having/keeping them (same as we do with longer-standing Indo-Uralic, Altaic, etc. etymologies).
Keep Proto-Northeast Caucasian. NCED's (and Schrijver's) reconstructions may have many problems, but they are generally not "pseudo-reconstructions", and there's enough reason to think many of them are at least valid etymological groups. Any etymologies where Nikolayev & Starostin propose NWC reflexes are given in PNC form, but this is mainly because they set up very few changes from there to PNEC. The one I find on a lookover of their preface is *gg(w) > *ddɮ(w). In effect they admit the reconstruction is of PNEC in the first place, but it comes out so complex they end up able to derive (their reconstruction of) PNWC almost directly from it.
I do not follow the argument against "Northwest Caucasian" and "Northeast Caucasian", perfectly illustrative and mainstream names as far as I can tell. What is the "division [that] is unacceptable"? Treating Kartvelian / South Caucasian as an unrelated family? That if anything seems to be much closer to consensus than the question of North Caucasian, and I also do not see how this would be "hinted at" here.
"Daghestanian" as a distinct node is also not consensus, does not have distinct reconstructions for it either, and should not be added (seems IMO like an outdated typological unit against Nakh being more innovative). Probably we should not commit to any NEC grouping scheme beyond the unambiguous base units like Lezgic or Avar-Andic.

--Tropylium (talk) 13:58, 16 December 2024 (UTC)[reply]

I agree, let's ban Proto-North Caucasian. I have no opinion on the rest of the issues. I will only note that there are many weak scholars and outright charlatans dealing with the three Caucasian branches. All of their etymological works should be reviewed by our more intelligent editors. @Qmbhiseykwos, no mindless copying, please. Vahag (talk) 18:03, 16 December 2024 (UTC)[reply]

Perhaps we should have a section in the Appendix namespace for StarLingish. It would fit right in with Klingon, Na'vi and other constructed languages from fictional universes... Chuck Entz (talk) 06:06, 17 December 2024 (UTC)[reply]

I, too, am surprised about the presence of Proto-North Caucasian on Wiktionary. It must have slipped our eyes like the proverbial monkey walking down the street, few would be willing to see, and only been included as a consequence of Wikipedia or another reference not having been unequivocal about its unacceptedness. It has to be removed. Fay Freak (talk) 21:38, 16 December 2024 (UTC)[reply]

Since there is consensus to delete (Proto-)North Caucasian, I'm going to implement that. — SURJECTION ^{/ T / C / L /} 13:29, 17 December 2024 (UTC)[reply]

How many users does it take to make a decision? ɶLerman (talk) 15:45, 17 December 2024 (UTC)[reply]

Adverbs?

I know how much everyone loves a part-of-speech question, so here is another one.

"I was indoors."

"He is upstairs."

"They were outside."

"Look, your keys are there!"

Sometimes I feel that some dictionaries, including Wiktionary, are coy about giving examples of this nature, as if they are unsure of the part of speech of the complements. While these are not "traditional" adverbs, in that they do not modify anything adverbially in a traditional sense, and cannot be removed leaving a relevantly valid sentence, nevertheless they do answer adverbial wh-questions, and do not seem like adjectives. Some people call these "adverbial complements", I think. Are we happy to place these uses under "adverb"? Another possibility for some cases -- e.g. "outside" in these examples -- is "intransitive" preposition, in that "They were outside" implies "They were outside (somewhere/something)", but I'm not sure that this concept is fully mainstream. What do you think? Mihia (talk) 21:52, 17 December 2024 (UTC)[reply]

The 2021 vote against categorizing words as intransitive prepositions is still current, isn't it? "Adverb" seems acceptable to me.--Urszag (talk) 22:04, 17 December 2024 (UTC)[reply]

Gosh, I forgot entirely about that vote. Thanks for reminding me. Can we make any clear distinction between "I was indoors" being an adverb, and "Is Mr. Smith in?" being an adjective, as is currently listed at in — and, indeed, generally between my examples above and various other supposedly adjectival instances of other "short function words", where in some cases the philosophy, quite possibly perpetuated in part by myself, seems to be "if it's the complement of the be-verb then it's an adjective"? Mihia (talk) 22:54, 17 December 2024 (UTC)[reply]

Rethinking Middle Korean verb lemmatization

@AG202 @Solarkoid @Chom.kwoy @Tibidibi

As per Wiktionary:About Korean/Historical forms#Lemmatizations and this discussion in October 2020, we currently lemmatize morphophonemic forms for nouns but allomorphic forms for verbs (faithful to "而餘皆爲入聲之終也然ㄱㆁㄷㄴㅂㅁㅅㄹ八字可足用也").

But I really cannot help but think (as I and others have already stressed) that this is misguided and adds needless confusion.

Etymology sections already use the faithful phonemic form by convention. This creates at best alt hyperlinks/double hyperlinks and at worst redlinks even when we have an entry for the MK verb in question. This is especially problematic because, let's be real, 99% of people ever going to MK entries on here do so through a MoK ety section.
In the discussion linked above, it was said that "by convention" Korean lemmatizes actual inflected forms for verbs.
1. Since when? Even in Modern Korean, -다 (-da) is defined as a "dictionary citation form ending," sufficiently demonstrating that even within our morphological orthographic framework we are specifically citing "dictionary forms," not any real form in use.
2. This should and has carry/ied over to Middle Korean dictionaries. Consider four popular Middle Korean dictionaries—15세기 국어 활용형 사전, 우리말큰사전(옛말과 이두), 고어사전, and 한불자전. Consider now that the former two use the "morphophonemic spelling," and only the latter two use "faithful" spelling as we do now. Consider further that 고어사전 is a) from 1960 and b) also lemmatizes other forms such as the infinitive in some cases, with the express goal of being accessible for learners. We don't do this, we shouldn't do this, etc. 한불자전 is from 1880(!) and was written by a French missioner. Is this really the precedent for us to be following?

I would love to start adding more MK entries but there are a lot of gaps right now in infrastructure(?) that make this difficult. This is IMO the largest blocker; I've brought this up countless times on the Discord, but I'd love to reach an actual BP consensus. Any input appreciated. Lunabunn (talk) 03:48, 18 December 2024 (UTC)[reply]

Agreed. I've already expressed my opinion on this several times before, but I'd rather the forms show the original stem. This will be beneficial for the learners and those curious in the long run, and it will help majorly advance the cause to create an automated conjugation template, most importantly for the header.

Additionally, I myself also want us to reach a consensus fast, as there seems to be confusion over whether the Modern Korean etymology header contains the actual, attested form of the verb (=용언) or the root. The only real caveat is that syllable-final ㅸ looks pretty ugly in syllables: 어드ᇦ다, 셔ᇕ다, ᄠᅥᇕ다, etc. would be some of the roots we would have to add. Other than that, I think this is for the best.

- Solarkoid (talk) 18:02, 18 December 2024 (UTC)[reply]

Support: Matches what we do for other Koreanic lects. AG202 (talk) 18:05, 18 December 2024 (UTC)[reply]

Strong support with a suggestion. This exact issue has been on my mind for the past few years ever since editors, including myself, have begun adding significant numbers of MK verb/adjective entries. I thought I should speak on this matter as someone who has added numerous MK entries throughout the years. Thanks @Lunabunn for finally bringing this up.

I can now see that consistency for consistency's sake is really the only thing going for the current "historically faithful (allophonic)" framework we have. While it is unapologetically uniform in its lemmatization rules, I agree that this leads to needless confusion and comes at the expense of navigability. This is especially true for readers who likely access MK entries through MoK etymology sections (who I assume are the overwhelming--I can't stress this enough--majority, as you have mentioned). As for the "convention" from the previous discussion (I was there), I believe it referred not to dictionaries but to the MK spelling convention, i.e., 표면형(表面形) (phonetic), as using the 기저형(基底形) (morphophonemic) would be anachronistic.

Speaking from personal experience, this has also been quite confusing and time-consuming for even editors who are familiar with MK orthography. For example:

Having to actively think about the proper "historically faithful" lemma when creating wikilinks (see how, in ᄉᆡᆷ, I had to link 기픈 to the "proper" 깁다, ATM a totally unrelated MoK entry, instead of the phonemic 깊다, at least the descendant MoK entry), which isn't intuitive at all to me as a native Korean speaker accustomed to MoK orthography. Although correctly linked according to current conventions, I would imagine this would be utterly baffling to a beginner.
Having to add "phonemically faithful" stubs to make up for this (e.g., see the MK "entry" for 및다, which would become the main MK entry under this proposal), unnecessarily adding workload to the already thin MK editor base.

All in all, it is clear that, in addition to the points Lunabunn has brought up, the positives--if any, really, other than doing it ostensibly for faithfulness' sake--of creating lemmas consistent with historical MK orthography do not outweigh its numerous negatives. Avoiding anachronism is not a good reason to continue this. I am now convinced that this is not the goal for which we should aim, especially as Wiktionary is a word dictionary and not a spelling guide. This is also what modern monolingual dictionaries do, and this is what we should follow, which is more in line with general Wiktionary policy. Moreover, we already don't do this for nouns, so the historicity argument is indeed moot.

However, I do not think an entirely phonemically faithful lemmatization scheme is desirable. As @Solarkoid specified, this would mean we would need to create entries such as 셔ᇕ다, which never appeared in actual MK or MoK texts (it's like an imaginary number) and is, well, yes, "ugly." Aside from looks, which shouldn't be something we consider in a dictionary, the general 표준어대사전 and the academic 15세기 국어 활용형 사전 both actually list the historically faithful 셟다 as their headword, while mentioning the phonemic 셔ᇕ- as a "form" appearing before vowels (which is not wrong). Conversely, both list the phonemically faithful 맞다 rather than the historically correct 맛다 as the headword. Monolingual dictionaries (which conventionally cite verbs/adjectives with the ending -다) seem to treat the lenes ㅸ and ㅿ (distinct phonemes in MK) as exceptions, to align, I am pretty sure, with how MoK treats them. In fact, 15세기 국어 활용형 사전 explicitly states this in its preface. Therefore, you simply are not going to find 셔ᇕ다, particularly as the headword with the ending -다, in any mainstream dictionary or work (except, I suppose, research papers in Korean, which have the liberty of, well, not being a dictionary for learners; they could use forms such as 셔ᇕ다 all they want).

I believe we should not implement an entirely phonemically faithful lemmatization scheme merely for the sake of uniformity, as popular modern monolingual dictionaries do not do this either; we would be the first dictionary to do so, and it is demonstrably not a "convention," any more than the current use of historically accurate forms is. As such, I think creating entries such as 셔ᇕ다, spelled in Hangul, would also be a source of confusion for those expecting the same lemma coming from popular MK dictionaries (as well as deviating from MoK morphophonemic standards [which treat vestigial W [-w-] and z [-∅-] as allophones, calling them "irregular conjugations"] on which they by principle base lemmatization [for a stage of the language when W and z were still phonemes] yet with which most people would be familiar). It's not that using 셔ᇕ다, besides aesthetics (lol), is inherently wrong (it's actually correct); however, 셟다, technically "wrong" (read: an inconsistent treatment), is how contemporary dictionaries have chosen to lemmatize in order to make it easier for modern readers unfamiliar with MK phonology. So it's really nobody's fault; we would just be following precedents--conventions, if you will.

Yet, this is not perfect either, as some words would be lemmatized according to a different principle from the rest (and for something as superficial as their spelling at that). Nevertheless, I propose that we still likewise make exceptions for cases like ㅸ and ㅿ (the only exceptions I could find with a cursory review of dictionaries) for the reason explained above: in Hangul we would use the historically faithful spelling, but in Romanization we would apply an entirely phonemically faithful scheme (containing the root). We can do this as Wiktionary is unique in that it always provides both Hangul and Romanization for MK.

So, for example, in -ᆸ다 where ㅂ represents an underlying /β/ such as in 셟다, we would use the Yale W. Consequently, we would get 셟다 (Yale: syelW-ta), with the historically faithful Hangul spelling as the headword and phonemically faithful spelling as the romanization. We would not be the first to do this, as some English language works on MK, which only use Yale Romanization, do exactly this (see Martin 1992 p. 57, who uses the phonemic stem syelW- to refer to this exact word). In -ᆸ다 where ㅂ represents an overt /p/ such as in 저줍다, we would obviously still use the Yale p, and such verbs/adjectives are not affected by this proposal. Hence, instead of using -ᇦ다 and -ᆸ다, -ᆸ다, as an exception, could have two possible romanizations, -Wta and -pta, depending on the word, but the Hangul spelling won't reflect this. The same goes for -ᆺ다 with -zta and -sta, instead of -ᇫ다 and -ᆺ다 (e.g., ᄃᆞᆺ다 and 벗다). In all other cases, both Hangul and romanization would represent the phonemic spelling as opposed to the historically faithful spelling that we use now, as per the proposal. This compromise would follow the convention found in monolingual dictionaries while still being consistent in providing readers with at least one phonemically faithful representation throughout all MK verbs/adjectives; there is no ambiguity, and the two different phonemes are distinguished.

This seems like a simple enough solution for a problem of an otherwise commonsense change IMO. The only downside I could think of is the need for manual input for transliteration, but MK already has these cases.

For those who might not fully understand or tl;dr, here is essentially what would happen:

Current entries with "historically faithful" spelling must be moved and could be converted to non-lemma entries as an inflected form. 맞다 becomes the lemma while 맛다 is reserved for an entry for "inflected" forms (if they are ever created, though [this should not be the focus]; Middle Korean -다 (-ta) had a more complicated usage compared to its modern descendant, so it wouldn't be a mostly empty, redundant entry. Nonetheless, I think the entry at Middle Korean -다 (-ta) will suffice). -다 (-ta) would serve two functions: form part of the dictionary citation form as per the modern convention (with phonemic spelling) and as part of inflected forms (e.g., declarative mood suffix) (with historical spelling). The second case would only ever be seen in conjugation templates, quotations, or, as mentioned above, non-lemma stubs. This entails that, for example, 맞다 (mac-ta), as the lemma, is the only form you would see in most parts of Wiktionary, while 맛다 (mas-ta), despite being historically accurate, would only be seen in the above-mentioned places. For the ㅸ/ㅿ cases, if accepted, the romanization 셟다 (syelW-ta), the lemma version, would be the one seen in most parts of Wiktionary, whereas 셟다 (syelp-ta), with the same Hangul spelling and "accurate/literal/surface" transcription, would, again, only be seen in the above-mentioned places (telling the reader that it represents a real (attested or possible) form/inflection with -다 (-ta), for disambiguation purposes). And, of course, for anything else, normal romanization rules apply (e.g., 셟고 (syelp-kwo)); only ㅸ/ㅿ headword forms get this special treatment.

Examples of current entries whose main entry would be affected (if we adopt an entirely phonemically faithful lemmatization scheme) are:

더럽다 (telepta) and ᄃᆞᆺ다 (tosta) would be moved to 더러ᇦ다 (teleWta) and ᄃᆞᇫ다 (tozta), respectively. However, if the above-mentioned exception is applied, these would stay at their original locations.
벗다 (pesta) would stay where it is, as its Hangul phonemic and historic spellings are the same; ᄌᆞᆽ다 (cocta) would stay where it is, but ᄌᆞᆺ다 (costa) is correct under the current convention.
여다 (yeta) and 우다 (wuta) would be moved to 열다 (yelta) and 울다 (wulta), respectively.
깃다 (kista, “to rejoice”) would be moved to 기ᇧ다 (kiskta), whereas 깃다 (kista, “to cough”) would be moved to 깇다 (kichta).
됴타 (tywotha) and 나타 (natha) would be moved to 둏다 (tywohta) and 낳다 (nahta), respectively.

-- 123catsank (talk) 01:14, 21 December 2024 (UTC)[reply]
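
For anyone who wants to compare the two options side by side, the moves listed above can be sketched in a few lines of Python. This is only an illustration, not part of any existing Wiktionary module: the names `Example`, `EXAMPLES` and `proposed_lemma` are hypothetical, the rows are copied from the examples in the comment above, and pairing the unchanged Hangul title with the W/z romanization for 더럽다 and ᄃᆞᆺ다 is an assumption extrapolated from the 셟다 (syelW-ta) example.

```python
from typing import NamedTuple, Tuple


class Example(NamedTuple):
    current: str   # page title under the current "historically faithful" scheme
    phonemic: str  # page title under a fully phonemic scheme
    yale: str      # Yale romanization of the phonemic stem + -ta
    lenis: bool    # True when the stem ends in W (ㅸ) or z (ㅿ)


# Rows taken from the examples in the discussion; nothing here is exhaustive.
EXAMPLES = [
    Example("더럽다", "더러ᇦ다", "teleWta", True),
    Example("ᄃᆞᆺ다", "ᄃᆞᇫ다", "tozta", True),
    Example("여다", "열다", "yelta", False),
    Example("우다", "울다", "wulta", False),
    Example("됴타", "둏다", "tywohta", False),
    Example("나타", "낳다", "nahta", False),
]


def proposed_lemma(ex: Example, keep_lenis_exception: bool = True) -> Tuple[str, str]:
    """Return (page title, romanization) for one verb under the proposal.

    With keep_lenis_exception=True, W/z stems keep their current Hangul
    title but are romanized phonemically; every other stem moves to its
    phonemic title. With False, every stem moves to its phonemic title.
    """
    if ex.lenis and keep_lenis_exception:
        return ex.current, ex.yale
    return ex.phonemic, ex.yale


for ex in EXAMPLES:
    title, yale = proposed_lemma(ex)
    print(f"{ex.current} -> {title} ({yale})")
```

Passing `keep_lenis_exception=False` shows the fully phonemic alternative, in which the W/z stems also move to new page titles.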

Thank you for your thorough contribution. I am relieved to hear that my opinions on the matter are shared by other editors (no doubt more experienced than myself). Just one thing I would like to comment on:

In fact, 15세기 국어 활용형 사전 explicitly states this in its preface.

This seems misleading. Yes, that dictionary does indeed state in its preface that W and z stems would be listed with p and s respectively, but it also explicitly states that it is for convenience only, not indicative of an analysis of these stems as p and s stems under any circumstance ("... 이런 어간들의 기본형을 'ㅅ, ㅂ'으로 하겠다는 인식을 반영한 것은 아니고 편의상의 조치임을 밝혀 둔다."). Indeed, modern scholarly practice does not treat W and z stems as irregulars (although 표준국어대사전 does, that's just because it's ass), so we shouldn't either.

Now, if we choose to lemmatize these forms with p and s instead of W and z anyway for convenience, I do not necessarily object. I do, however, find myself wondering what convenience we gain by lemmatizing p and s if that means we have to manually specify the headword for romanization.

If we decide to lemmatize with p and s, I would also like to suggest that we use the W/z form in the hangul headword as well, not just its romanization. This would be aligned with how we don't include diacritics in entry titles but still show them in the headword line.

(p.s. Do you edit on a different account and/or are you in the English Wiktionary Discord?) Lunabunn (talk) 03:14, 27 December 2024 (UTC)[reply]

I agree. Manual transliteration for the same "spelling" displayed would not be ideal. I would support Lunabunn's idea if we do decide to lemmatize with /p/ & /s/. AG202 (talk) 03:40, 27 December 2024 (UTC)[reply]

I support @123catsank's suggestion. Let's do it like this:

놉다 (nwop-ta) shall be 높다 (nwoph-ta), and ᄀᆞᆮ다 (kot-ta, “same”) shall be ᄀᆞᇀ다 (koth-ta).
깃다 (kis-ta, “to cough”) shall be lemmatized as 깇다 (kich-ta).
깃다 (kista, “to rejoice”) shall be lemmatized as 기ᇧ다 (kisk-ta), and 맛다 (mas-ta, “to take responsibility”) shall be lemmatized as 마ᇨ다 (mast-ta).
ᄌᆞᆺ다 (cos-ta, “frequent”) shall be lemmatized as ᄌᆞᆽ다 (coc-ta).
됴타 (tywotha) shall become 둏다 (tywoh-ta), and 여다 (yeta) shall become 열다 (yelta).
더럽다 (telep-ta) and ᄃᆞᆺ다 (tos-ta) shall stay.

-- Chom.kwoy (talk) 07:43, 16 January 2025 (UTC)[reply]

Beekes

Bluntly, Beekes is neo-Vennemann, except for Greek, and without even an actual attested language (/family) from which to derive the substrate.

That may even be too polite. I personally have thought Beekes dubious since my first encounter with him (his grammar of Avestan, in which he identifies numerous Avestan roots without Sanskrit analogues, almost none of which are actually without obvious Sanskrit analogues). That said, I am joined by Meissner, de Decker, Vine, Verhasselt, Beckwith, Nikolaev, Woodhouse, Olson, Miller, Simkin, Colvin, Meester, Garnier, Nardelli, and countless others in my reservations about Beekes as a source in the specific matter of Greek etymology/'Pre-Greek'. Even *within* Leiden, Beekes was considered peculiarly dogmatic, even by very close colleagues (e.g. Lubotsky, Kloekhorst, etc.) - indeed, even van Beek, his prize student, has published numerous papers over the last several years, especially after Beekes' death, rejecting Beekes' particular approach to Pre-Greek. Kroonen's public critique is also worth noting.

I have numerous criticisms of Beekes' approach to Pre-Greek, and am happy to systematically go through them if anyone should wish, but I hardly need to, since the critical scholarly literature is, at this point, voluminous. That said, if anyone is curious, do ask.

I am not going to go so far as to say that Beekes should not be cited at all on matters of etymology, but his views should *always* be tagged as his, as opposed to in the voice of Wiktionary, and preferably with a modifier that makes it clear that his views do not reflect the communis opinio ('Beekes, typically, assigns...' or similar), where applicable, which is frequently the case.

GatlingGunz (talk) 18:11, 18 December 2024 (UTC)[reply]

Beekes's etymologies are all over Wiktionary not because we find him particularly reliable, but because his accessible dictionary is the only one in English, so it was easily copy-pastable into Wiktionary. Frisk is in German, Chantraine is in French. Others' English etymologies are sprinkled across inaccessible articles.

Now good luck finding someone to go over the several thousand pages referencing his dictionary and reviewing his proposals one by one. The damage may be permanent. Vahag (talk) 18:24, 18 December 2024 (UTC)[reply]

By the way, apart from the pre-Greek stuff, Beekes is almost entirely a word for word translation of Frisk. —Caoimhin ceallach (talk) 23:04, 19 December 2024 (UTC)[reply]

Regular Wiktionary editors all have made pertinent observations and more or less openly concluded with remarks encouraging liberal dismissal of Beekes’ etymologies.

It would be more frank to mark etymologies as unknown or uncertain or otherwise speculated upon, while pushing Beekes’ claims off to a mere reference, not worthy of taking up space in a serious etymology, since collectively they have to be regarded as a nuisance.

Of course, the silver bullet for anyone in the know about the particular philology is to cite one or more authors who positively provide a differing opinion. You don’t “need to” but it is a gain for all of humanity and your personal scholarly achievement. There is a mismatch between those who have an intimate familiarity with certain comprehensive university libraries and other historically interested people who attempt to have conceptions of the past, if only because one works on another language touching upon Greek. Our open Hellenic lexicography is seriously underdeveloped, and part of it is uncritical thinking, burdened by Beekes’ dogmatism and lifeless superficiality in place of inviting examples of how language science is actually done. Fay Freak (talk) 16:19, 19 December 2024 (UTC)[reply]

I heartily agree that Beekes' proposals should be differentiated from the 'voice of wiktionary' or 'the fact of the matter,' in some way that is also succinct and clear for people who are (to the point) unused to doing or engaging with etymology/philology themselves, though I think any wording alluding to 'typical' positions or proposals should be charitable. It's obvious to anyone with any exposure to the field that Beekes' opinions would be his alone; the issue is the lack of alternative proposals in entries (something that has to be done by hand) and that subset of people that don't understand that, who would see a reference and an etymology and say 'OK then, checks out,' and go no further.

I also agree that actually going through and addressing every single claim is a monumental task; however, I think there is little for it other than for people with the relevant background to simply chip away at it, little by little.

I (for one) would not support some modification that removes etymologies attributed to Beekes wholesale, or that simply marks them all as contravening common opinion or as 'typical of Beekes' without further comment (the obvious implication of such wording being that Beekes is typically dubious). I think wholesale marking of opinions as typical or against common opinion would be committing more of the same sin that has saddled us with thousands of these entries: a quick convenience but one that is poor practice, even if it is the case (in my opinion or yours) that the wording would end up being accurate in the majority of cases. Instead, criticism or contrasting opinions and claims should be specific and given on a case-by-case basis, just as they would be if you were to publish a paper on the relevant etymologies and reconstructions, i.e. you ought to provide supporting information as to why (via references is fine, since Wiktionary isn't the place for primary research). This is to say, if you are to add such notes on an entry, I think you'd do best to expand on your note with alternative views and why both exist (at least implicitly via references) instead of leaving it at 'this proposal is typical of XYZ.' This can only be done one-by-one or in smaller batches.

I'd also observe that this topic/issue reflects broader issues here on Wiktionary: there are many etymologies that are unjustified and/or unsourced (but clearly copied almost verbatim from some source), or which have been mass-added from some random source, or which use some good-quality source of yesterdecade that is nevertheless outdated and wrong, either on the particular lemma in question or in a more pervasive manner (e.g. modern reconstructions have moved on in some systematic respect). In other cases one of these has happened and someone has later added information (agreeing or disagreeing), perhaps two or three times over, but the referencing work hasn't been done to distinguish which claims are associated with which reference. Sometimes etymology sections cite differing reconstruction schemes in the same paragraph without giving any indication that they're different or why.

There are likewise many cases where supporting argumentation or reasoning for why two proposed etymologies exist is lacking; often they are just mentioned, and when editors come to justify one etymology over another, the best you often get is some comment like 'X is preferred' with no justification or reasoning (which suggests to a reader that the editor doesn't understand why and is taking it on authority, or that they are biased).

All of that is bad practice, though to differing degrees - someone may mass-add from either a good source or a bad source and as long as they include adequate referencing I will say in their defense that at least they have tried to improve coverage in some measure. If someone obfuscates the referencing later by adding additional claims it is hardly their fault. I suspect many will mass-add from some source, let's just take the Geiriadur Prifysgol Cymru as a hypothetical, knowing some subset of given etymologies are wrong/outdated with the intent of updating them with further references and information later, only never to get to it. If someone has used an egregiously awful or biased source, it is best to approach them on it via talk pages, and some cases could warrant mass spoilers/notes or even removal, but I am not sure Beekes or any number of other pet poor-historical-linguistics-opinions scholars (I certainly have a personal list...) fit in that category, even as it is that I may strongly disagree with their associated proposals.

Some languages and language families are a lot better/worse for this stuff than others (I notice a lot of etymologies that were state-of-the-art at one time but that are now out-of-date on Japanese pages particularly, for example.)

In all these cases, my intent is not to flame past contributors' efforts. Instead, I think simply chipping away at the issue on a case-by-case basis, fixing references and providing new ones, and occasionally reaching out via talk pages, is the only realistic solution. Herthaz (talk) 22:08, 7 January 2025 (UTC)[reply]

Being skeptical about Beekes' tendency for pre-Greek origins is all fine, but the idea is not to simply remove this without any understanding of what Beekes is saying. Exarchus (talk) 13:48, 19 January 2025 (UTC)[reply]

A prerequisite for commenting is being able to distinguish between X doesn't understand what Beekes is saying and X is dismissing what Beekes is saying. That you can read through a thread in which I demonstrate an exhaustive command of the literature on Beekes' work alone, including verifiable personal observations on his work on a language that is not Greek, and reply with this kind of contextually deeply stupid and hostile nonsense, marks your deficit of competence, not mine.

You were always, of course, welcome to ask me on my talk page as to why I think or have written <whatever>. But clearly you preferred hostility and condescension, which is both ironic and, as I have remarked elsewhere, very stupid, in context.

Grow up. GatlingGunz (talk) 14:13, 20 January 2025 (UTC)[reply]

I'm not sure you actually read my edit comment when you wrote the above. Your comment on Beekes was: "rm Beekes nonsense ("this is Pre-Greek because there's a nasal, even though the nasal is attested in Latin, Germanic, and Sanskrit as well")"

My point was that the nasal would have disappeared in Greek, so if there's an -ν- it needs to have another explanation than the Latin, Germanic and Sanskrit terms. What that explanation should be, is another discussion.

I think you had a somewhat similar misunderstanding here (the wording was indeed not very clear). Exarchus (talk) 23:34, 20 January 2025 (UTC)[reply]

One case where I'm a bit baffled by Beekes is his statement at θύω (thúō, “to rush in, storm, rage”) that it is unclear what "to shake" (धूनोति (dhūnoti)) has to do with the Greek meaning. The semantic connection doesn't seem far-fetched at all to me (storms can 'shake' things, right?). I doubt sources can be found that follow Beekes in this. Exarchus (talk) 11:20, 22 January 2025 (UTC)[reply]

French Wiktionary Word of the Year

Dear colleagues,

In the French Wiktionary, this year we tried out our first top 10 words of the year!

It started on November 15th with a call to suggest words, without any specific methodology in mind, such as an analysis of reading statistics. About a dozen people suggested about 50 words. Then in December, we had a vote with 30 participants and a simple result as a list. It wasn't perfect but it was not that complicated to do.

To my knowledge, this hasn't been tried yet on the English Wiktionary, has it? If you want to try next year, I suggest you create an ongoing draft to keep track of new words during the year; it would make the selection easier. Also, having a meeting in person with seven Wiktionarians in December helped a lot. Finally, I am not expecting any press coverage this year, but we could work on building something for 2025 and 2026, and I think we could be stronger together if several editions of the Wiktionary project organize a similar initiative in parallel. So I invite you to try it too! Cheers Noé 12:33, 19 December 2024 (UTC)[reply]

@Noé: there is currently an ongoing vote on whether to have a Word of the Year, and what that word should be. Currently, it looks like the vote will fail, as it failed last year. There doesn't seem to be enough support for the proposal at the English Wiktionary. — Sgconlaw (talk) 12:56, 19 December 2024 (UTC)[reply]

Thanks for pointing out this discussion; I missed it in the November Beer parlour, and it was not brought up again in December. It is interesting to read the various opinions on this process and its goals. I did not ask for collective validation at first, I just started it, and I realize now that it would have been nice to open a discussion first. Well, sometimes it is hard to weigh the pros and cons of something completely new without being able to evaluate what words may be in the final list. Having two weeks to collect entries suggested by anyone, followed by a simple vote for a top 5, was, I think, easier to manage than your way of doing it. I am not sure. Well, if someone wants to discuss this idea next year, in October maybe, I would be glad to help with more feedback on our experiment and the media responses.

Noé 13:20, 19 December 2024 (UTC)[reply]

WT:TRANS

I don't understand what kind of situations this sentence refers to: "If there are multiple paraphrases in the target language for an English term but no direct translations, one such paraphrase may be provided after {{no equivalent translation}}." Template:no equivalent translation/documentation isn't helpful either. What is "potentially unidiomatic / sum-of-parts descriptive" supposed to mean? I want to know when this template should be used and when it shouldn't.

More generally, I am often in doubt about what to do when the most direct translation (with the same part of speech) isn't actually the best translation. I know I'm not the only one with this issue, because such tricky translations are currently most often left blank. Can we clarify the relevant sections? —Caoimhin ceallach (talk) 10:08, 20 December 2024 (UTC)[reply]

Dobrujan Tatar language name

Hi there, the Dobrujan Tatar language doesn't have a separate language code. Therefore [crh-ro] is used in Wikis. But in Wiktionary there is only [crh] Crimean Tatar, and when I add a word in Dobrujan Tatar it appears in Crimean Tatar categories. This is a problem, because the languages use different orthographies and are actually not as closely connected as it seems. Also, there is the Category:Dobrujan Crimean Tatar, but this naming is wrong; it's Dobrujan Tatar. Would it be possible to use the code [crh-ro] Dobrujan Tatar, with Dobrujan Tatar categories? Zolgoyo (talk) 10:34, 20 December 2024 (UTC)[reply]

Hello! I personally do not think we need a separate name space for Dobruja Tatar specifically, for the following reasons:

1. It is a dialect of Crimean Tatar (as Ethnologue and Glottolog[3] assume).

2. From what I have seen, Wiktionary does not show dialects in their own name space, to give examples on Turkic languages we have:

Yenisei Kyrgyz (Old Turkic) uses different letters altogether and would be illegible to someone familiar with the Orkhon script. Orthographical differences are not a big deal for inclusion.
Viryal and Anatri Chuvash represented as just 'Chuvash' (except in etymologies)
Various dialects of Turkish and Azerbaijani, all shown with a lb tag.
Kumandy, Kuu-Kizhi and Kyzyl dialects (which can be quite divergent at times) of Northern Altai are under the same name space.

and so on... Dobrujan Tatar would best be shown just by a lb tag, like the rest.

3. There seems to be only one published dictionary[4] for this dialect ('Dobruca Kırım Tatar Ağzı Sözlüğü'), and the vocabulary is clearly reminiscent of the main Crimean Tatar one.

4. If Wiktionary added Dobrujan Tatar, then why shouldn't it add Nogai Tatar also? Spoken 10 km. north of the Dobrujan Tatar speakers with a far divergent lexicon?

However, this is my opinion. We really don't need this name space.

AmaçsızBirKişi (talk) 00:05, 22 December 2024 (UTC)[reply]

We have Nogai as a distinct code, CAT:Nogai lemmas. And we do often have a separate code for varieties traditionally considered 'dialects', if this is found necessary to effectively document the variety. I can't speak for this specific case though. Thadh (talk) 01:03, 22 December 2024 (UTC)[reply]

These Dobrujan Tatar words are from a book which is probably not so good for etymology. Not a bad book, but be careful. Check it out at Tomriga, in the references to Taner Murat. He writes "Tomri - queen of Mesagetes, also known under Persian form Tahm-Rayis, Greek Tomiris.... from her name came the name for Dobruja province, Tomriga". The guy is obviously a Turkic nationalist from a parallel world where Massagetes speak Tatar and establish Dobruja. Tollef Salemann (talk) 01:39, 22 December 2024 (UTC)[reply]

It seems like that. The dictionary there is not quite academic I figured.

Moreover, it seems like Dobrujan Tatar is just a descendant of a (relatively) larger 'Romanian Tatar' family[5]. This article also says how similar the Dobrujan Tatar dialect is to Crimean Tatar, saying how children use Crimean Tatar primers/reading books in schools.

There's also a poem in Dobrujan Tatar; that's the extent of what I could find about this language[6].

AmaçsızBirKişi (talk) 09:32, 22 December 2024 (UTC)[reply]

Note that the Nogai Tatar you speak about is probably quite different from the Nogai lemmas listed in the category which Thadh speaks about. The Nogais of Dobruja are related to the Nogais of the Caucasus, but they split up in the 1850s-60s because of the war with Russia. I mean, they had split even before that, but had some contact until the 1850s. So their language is probably closer to Crimean. Tollef Salemann (talk) 02:00, 22 December 2024 (UTC)[reply]

Limited prior discussion: Wiktionary:Grease_pit/2023/November#Add_Category:Dobrujan_Tatar_language_to_the_relevant_language-related_modules_if_appropriate. Unfortunately, other than your own writings on other wikis, I am having a hard time finding evidence that Dobrujan Tatar is a separate language. I am trying to think if we have any editors who might be able to find relevant resources in other languages (Romanian?). - -sche (discuss) 06:12, 23 December 2024 (UTC)[reply]

WT:TENNIS

It seems curious that tennis player, the archetype of the "Tennis player test", supposedly a test of idiomaticity, is itself listed not on its own merits, but only as a "translation hub". Does this make sense? Mihia (talk) 18:38, 22 December 2024 (UTC)[reply]

Is WT:Idiom supposed to be a policy page? DCDuring (talk) 23:38, 22 December 2024 (UTC)[reply]

No. Svārtava (t ɕ) 09:11, 23 December 2024 (UTC)[reply]

It does, to be fair, say at the top of the page that "Tests can be used as guides during RFD, but they are not hard/fast rules", but, even so, one would expect the guidelines to at least apply to the examples given. Mihia (talk) 09:47, 23 December 2024 (UTC)[reply]

The closing statement from the 2016 RFD is quite interesting. I wonder if there's more to the history of the ‘tennis player test,’ because this alone makes it pretty questionable. Seems THUB was the keep reason all along?

RFD kept as no consensus for deletion: ≥ 12 keep votes. Note that translation target was used often as the keeping rationale, while the "tennis player test" was rejected by multiple participants. Polomo47 (talk) 00:08, 23 December 2024 (UTC)[reply]

The text was added by @Catonif, though I'm not sure why. I personally strongly support WT:TENNIS for its usefulness. AG202 (talk) 06:18, 23 December 2024 (UTC)[reply]

The usefulness is limited by our incomplete coverage of names of professions. Where are emergency services dispatcher,^[7] franchise opening trainer^[8] and heavy equipment operator^[9]? --Lambiam 22:17, 23 December 2024 (UTC)[reply]

Whoops, I wasn't aware of the policy when I did that; given the policy's existence, it would need to be removed. But IMO the policy itself sounds pretty dubious: by its wording it would also allow professions such as turtle feeder or cookie taster. I would personally ditch the policy and keep tennis player for THUB. The test's paragraph itself claims its partial redundancy to THUB anyways. Catonif (talk) 08:51, 23 December 2024 (UTC)[reply]

@Catonif: For the record, unvoted tests given at WT:IDIOM are not binding policy; only WT:COALMINE is since that is voted upon. Svārtava (t ɕ) 09:11, 23 December 2024 (UTC)[reply]

It wouldn't hurt to make the distinction clear, preferably by having WT:COALMINE on a separate page, ONLY including it by reference, and placing a banner at the top of the WT:IDIOM page. DCDuring (talk) 13:30, 23 December 2024 (UTC)[reply]

Noting that "COALMINE" is mentioned individually in the CFI. So is the "fried egg" test, which is also part of "WT:IDIOM", implying that that one is policy too, I suppose? I haven't checked all the others. Mihia (talk) 15:36, 23 December 2024 (UTC)[reply]

I’m not that favorable to the test either. If almost all terms that qualify for it also qualify for THUB, then all it does is prevent us from adding (This sense is a translation hub). Is that desirable? I don’t think so, since the main reason for keeping them appears to be translation. Polomo47 (talk) 15:08, 23 December 2024 (UTC)[reply]

I could be wrong, but I think the Tennis player test predates a consensus on keeping translation hubs. So it may have been a good workaround when it was first proposed, but it seems redundant now. Andrew Sheedy (talk) 16:06, 23 December 2024 (UTC)[reply]

Yeah, exactly. Polomo47 (talk) 17:12, 23 December 2024 (UTC)[reply]

@Catonif, @Andrew Sheedy, @Polomo47: I forgot to mention this here, but WT:IDIOM applies to other languages as well, which can't survive a THUB test. We should really look at the existence of these tests as applying to languages as a whole, rather than just English. AG202 (talk) 08:37, 16 February 2025 (UTC)[reply]

Would there be grounds for keeping SoP profession names in other languages that don't apply in English? Mihia (talk) 11:22, 16 February 2025 (UTC)[reply]

@Mihia: Yes for their usefulness and commonness. I can't see why we don't want to have entries like Spanish actor de voz or French guide touristique; they're common terms for professions that people are likely to encounter in the wild, which I assume is why WT:TENNIS exists in the first place. AG202 (talk) 15:03, 16 February 2025 (UTC)[reply]

I don't totally disagree with you, but qualities of "usefulness" and "commonness" could apply to all manner of SoP phrases, in any language, by no means restricted to non-English professions. It seems to me that allowing terms on these grounds would be a different discussion. Mihia (talk) 15:34, 16 February 2025 (UTC)[reply]

Yes, but now we’re going from allowing those words to possibly disallowing them. Honestly the main thing I’m frustrated with is that this change came from a solely English perspective, not considering the ramifications that it’d have on other languages. That’s partially my own fault for not bringing it up earlier, but it does follow a frustrating trend of policy changes being enacted based on the English editor experience. AG202 (talk) 15:49, 16 February 2025 (UTC)[reply]

I still don't understand why other languages are different from English in this respect. Mihia (talk) 15:51, 16 February 2025 (UTC)[reply]

Other languages do not fall back on THUB, which was one of the main rationales, if not the main rationale for making this change. Ex: as stated: "it seems redundant now", due to quote, "also qualify[ing] for THUB". Not one person openly considered how this would affect other languages at all. AG202 (talk) 16:03, 16 February 2025 (UTC)[reply]

I don't agree that THUB should be used as a "fall back" or backdoor way of keeping SoP entries, either in English or any other language. It exists for one specific purpose, not as a way of getting around SoP rules so as to keep "useful" and "common" entries, whether profession names or anything else.

I should say, though, generally, I do agree with you that we should look at having more leeway to keep "useful" SoP entries, for want of a better way of putting it. I just don't think that allowing limitless totally transparent SoP profession names is the way to approach it. Mihia (talk) 16:38, 16 February 2025 (UTC)[reply]

The tests listed on that page are already quite English-centered, and that page isn’t policy either. My impression is that the test was created with English entries in mind, to begin with, and I don’t intuitively find utility in profession names from other languages — I didn’t make it clear in the replies I wrote earlier, but the situation in other languages was definitely on my mind. And unless someone can really back up the usefulness of such non-English entries, I’m going to vote support on the upcoming vote. Polomo47 (talk) 22:39, 16 February 2025 (UTC)[reply]

Yes, it isn't policy, but by default those tests on WT:IDIOM apply to other languages as well, and have been used in RFDs for other languages. And FWIW the vote has already started, so you can go ahead and vote if you'd like: Wiktionary:Votes/2025-02/Deletion of "Tennis player test". AG202 (talk) 15:53, 17 February 2025 (UTC)[reply]

I created a vote at Wiktionary:Votes/2025-02/Deletion_of_"Tennis_player_test" to resolve what we want to do with this. Mihia (talk) 19:32, 9 February 2025 (UTC)[reply]

Romance languages: reflexive verb forms and enclisis

This discussion is an offshoot from this RFM, which discusses reflexive verbs in Portuguese specifically. Said RFM in turn derives from this RFD discussion.

Currently, some Romance languages have a specific way of making entries for reflexive verbs; others do not have a pattern at all. Per @Benwing2, Spanish and Portuguese currently follow this scheme:

If a verb is only used reflexively
- It is listed at the page with an enclitical -se. See Portuguese automedicar-se and Spanish automedicarse
- The page without -se lists, for Spanish, that the word is only used with a proclitic pronoun; see Spanish automedicar. For Portuguese, the page without se usually does not exist.
If a verb has reflexive senses in addition to non-reflexive ones
- The reflexive verb is listed as a sense under the page without -se. See Portuguese suicidar, despedir; Spanish suicidar, despedir.
- The page with -se lists a stub reflexive of, or lists the entry as a combined form. See Portuguese suicidar-se, despedir-se; Spanish suicidarse, despedirse.

Some Portuguese editors complained about this arrangement a while ago. We proposed a new scheme in an RFM (linked above), but some editors felt the need for consistency with other Romance languages. Thus, this is a proposal on changing/standardizing how it works for most other Romance languages — the use of unhyphenated enclisis (despedirse vs. despedir-se) changes things slightly. For languages that do use a hyphen in their enclises, such as Catalan, a proposal closer to the one for Portuguese is more adequate.

The proposal, for languages with unhyphenated encliticals:

If entries exist for both the forms with -se and without it, they will get merged under the page without -se. The entry at the page with -se will list infinitive of verb combined with se.
If an entry exists only at the page with -se, it will be moved to the page without -se. In its place, the page will list infinitive of verb combined with se.
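As a rough illustration of the two rules above (not part of the proposal itself), the merge-or-move logic could be sketched as follows. The `Entry` class, the `apply_proposal` function, the "(reflexive)" sense label, and the plain-string representation of senses are all hypothetical and do not correspond to any existing Wiktionary module or bot; the sketch also assumes unhyphenated enclisis, so the reflexive page title is simply the bare infinitive plus "se".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """A hypothetical, heavily simplified dictionary page."""
    title: str
    senses: list[str] = field(default_factory=list)


def is_pointer(sense: str) -> bool:
    # Stub senses that only point at another page carry no definition to move.
    return sense.startswith(("reflexive of", "infinitive of"))


def apply_proposal(pages: dict[str, Entry], bare: str, clitic: str = "se") -> None:
    """Apply both rules of the proposal to one verb, modifying `pages` in place."""
    combined = bare + clitic  # unhyphenated enclisis, e.g. "automedicarse"
    old = pages.get(combined)
    if old is None:
        return  # no page with -se, so nothing to merge or move

    # Definitions found on the -se page, labelled as reflexive.
    moved = [
        s if s.startswith("(reflexive)") else f"(reflexive) {s}"
        for s in old.senses
        if not is_pointer(s)
    ]

    # Rule 1 (both pages exist): merge into the existing page without -se.
    # Rule 2 (only the -se page exists): create the page without -se first.
    target = pages.setdefault(bare, Entry(bare))
    for sense in moved:
        if sense not in target.senses:
            target.senses.append(sense)

    # In either case the -se page is reduced to a pointer.
    pages[combined] = Entry(combined, [f"infinitive of {bare} combined with {clitic}"])
```

For a reflexive-only verb such as automedicarse, this would create automedicar with the definition and leave automedicarse as a pointer; for a pair such as despedir/despedirse, where the page with -se is already a stub, only the wording of the stub would change.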

A brief list of applicable reasons. For more detail, please read the Portuguese RFM and RFD discussions (which also include some inapplicable arguments).

It is inconsistent and confusing to list reflexive-only verbs at the page with -se, but list verbs with reflexive senses only at the page without -se.
Listing reflexive-only verbs at their enclitical forms implicitly prescribes the use of enclisis, but proclisis is just as valid and may even be used more often.
- By having the entry under the page with no -se, we could format its headword to include both forms. Like, automedicar-se or se automedicar
Among dictionaries, there is no consensus on what URL reflexive verbs get put under. The only consensus is that the headword includes the enclitical pronoun, which we can do regardless per the above.

Ping, for Italian: @Samubert96, Federico Falleti, Emanuele6, Catonif, Imetsia

Ping, for Spanish: @Ultimateria, AG202, Ser be etre shi, JeffDoozan, Orrigarmi, Brawlio, Jberkel

Ping remaining members from the Galician-Portuguese usergroup: @Davi6596, Faviola7, JnpoJuwan, MedK1, Ortsacordep, Rodrigo5260, Stríðsdrengur, Trooper57

Please ping other editors you know who may be interested in the discussion.

Polomo47 (talk) 00:01, 23 December 2024 (UTC)[reply]

CC: @Benwing2: For Spanish, honestly, I'd match what the RAE does: if the verb is only used pronominally/reflexively, then they put the lemma at the version with -se. Ex: RAE entry for automedicarse. I really don't like the idea of putting "se" in the headword at the lemma without "se", especially when the page with "se" already exists. That seems to add a much higher level of inconsistency.

I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former, as the verb is only used pronominally. What we have now isn't my favorite way to go about things (I'd have the reflexive usages at the entries with -se, regardless of if the non-reflexive version exists), but it's better than having everything at the bare infinitive. There's also precedent, at least with Spanish. AG202 (talk) 03:00, 23 December 2024 (UTC)[reply]

"I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former." How so? My proposal is that we move the definitions over precisely to solve this type of issue.

Also, while the RAE categorizes URLs in that way, the RGL does not, and many Portuguese dictionaries don’t either. I don’t know about Italian, though. Polomo47 (talk) 03:56, 23 December 2024 (UTC)[reply]

@Polomo47: Oops, I meant search for the former and be directed to the latter, sorry. Learners are more likely to search for the forms with "se" is what I wanted to say. AG202 (talk) 06:13, 23 December 2024 (UTC)[reply]

Hm, I’m not confident that’s how people usually search for words. I would expect native speakers (even if we don’t particularly appeal to them) as well as more advanced learners to search without the enclitical. That’s what I do, at least — do others google differently? Polomo47 (talk) 15:04, 23 December 2024 (UTC)[reply]

At least for Spanish, having been studying it since 2013, I've almost always seen learners search with the "se" form once they're aware of it, as that'll give them more direct hits, especially from learning websites. In pretty much every learner's text as well, they'll be listed as the "se" form in any vocabulary section. I personally still search that way as well. For (notably Brazilian) Portuguese, I'd expect the trends to be different, since the se forms aren't used as much. AG202 (talk) 17:38, 23 December 2024 (UTC)[reply]

I actually find this to be very persuasive in Spanish's case. Having had some mild interactions with it over the years, it's very true that Spanish speakers just love their "se" forms; comparatively, "lo" forms go essentially unused by Portuguese speakers around me.

While I'm really starting to think that 'it tracks' that reflexive clitics in Spanish are seen as more integral to the verb — and not necessarily because of the spelling — I can't help but wonder about other forms.

I hope I'm not bringing this up too early when we haven't even truly talked at length about the initial proposal, but do we really need pages for all the forms? This likely enters CFI territory, but I'd like to draw some attention to the non-reflexive forms. In Spanish medicar and mostrar, there's an entire table dedicated to combined forms, and yet I see several that might be missing?

Admittedly, I don't know a lot about Spanish, but one such form would be "medícote" (corresponding to "te medico" in proclisis) or something like "mostrárlela". Perhaps Spanish's rules forbid these pairings (though I did get a hit for the latter), but Standard Galician's don't: you'll find many hits for, say, quérote and mostrarlla online. There's even a TV program named Dígocho Eu.

I guess we could include all of these combinations (every single tense of many, many verbs with nearly every single clitic tacked on afterward: me, te, che, vos, os, o, ma, mo, ta, to, cho, cha, lle, lles, nos, lla, llo, possibly a couple more), but I can't help but think it'd be a more productive use of our time to instead draw a line somewhere. I'm getting some serious COALMINE conversation flashbacks right now. MedK1 (talk) 19:13, 23 December 2024 (UTC)[reply]

@AG202 just in case the mobile reply button didn't actually ping you. MedK1 (talk) 21:28, 24 December 2024 (UTC)[reply]

Sorry for the late reply, but yes, the "se" forms are integral to the verbs. However, forms like "medícote" are no longer standard usage in Spanish. Pronouns can only be attached afterwards to the gerund, infinitive, and imperative forms. AG202 (talk) 03:30, 29 December 2024 (UTC)[reply]

Thanks for the CC.

It occurs to me there are various possibilities for the way reflexives are handled, and this may have some bearing on the ultimate outcome (please expand with other languages):

Reflexives are always enclitic, and written as part of the verb. Examples: East Slavic (Russian, Ukrainian, Belarusian, ...) and North Germanic (Icelandic, Swedish, Danish, Norwegian, Faroese, ...).
Reflexives are normally proclitic, including in particular on the infinitive, and written as a separate word. Examples: German, French, apparently also Romanian. (Clarifications: German reflexives sometimes come after the finite verb, particularly when the verb is in V2 constructions and in imperatives. French reflexives come after imperatives and are joined by a hyphen, and when coming before the verb are joined with an apostrophe if the verb is vowel-initial.)
Reflexives are sometimes proclitic, sometimes enclitic. AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive.
1. When enclitic on the infinitive, the verb + reflexive is written as a single word. Examples: Spanish, Italian, Galician in standard spelling.
2. When enclitic on the infinitive, the reflexive is attached to the verb with a hyphen. Examples: Portuguese, Galician in reintegrationist spelling.
3. When enclitic on the infinitive, the reflexive is written as a separate word. Examples: West Slavic languages (Czech, Polish, ...), South Slavic languages (Bulgarian, Macedonian, ...).

I mention this because there is a lot of inconsistency in how reflexive verbs are lemmatized, and it may partially correlate with the way the reflexive infinitive is written.

Benwing2 (talk) 03:38, 23 December 2024 (UTC)[reply]

"AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive." Is that really how it works? In the case of Portuguese, from what I gather, automedicar-se is no more valid an infinitive than se automedicar; the former is just the preferred form used by dictionaries because (1) it’s a single word, (2) it’s less predictable than proclisis, and (3) it’s something people generally like to prescribe, lol. I’ve yet to find another explanation for the preference for enclisis, but I have no reason to believe it’s because automedicar-se is the only possibility. Polomo47 (talk) 04:06, 23 December 2024 (UTC)[reply]

Sorry, I meant to clarify that "all such languages have the reflexive pronoun enclitic on the infinitive" refers to how dictionaries express the forms. I know that Brazilian Portuguese, for example, leans towards proclisis in all cases and thus says vou me deitar, not #vou deitar-me. West Slavic languages similarly are very flexible in word order and sometimes have the reflexive pronoun before the infinitive and sometimes after, but all dictionaries I've seen lemmatize the reflexive pronoun after. In contrast, French dictionaries always list reflexive infinitives with the reflexive pronoun before, because it never comes after in actual usage. Benwing2 (talk) 05:15, 23 December 2024 (UTC)[reply]

I mentioned above that Galician has a hundred forms (bare minimum) that Spanish completely lacks coverage for at the moment (potentially because they don't exist over there? I wouldn't know); you can mix and match any tense with any clitic for the most part.

It might be worth noting that for Portuguese, these countless forms exist as well, and often with more patterns: Galician roughly shares the European Portuguese rules prioritizing enclisis, while for Portuguese, we have Brazilian Portuguese's preference for proclisis to consider as well.

Since for Portuguese, they're framed as 'regional preferences' rather than the rules actively changing, you get far more possibilities than you would normally, all of them being SOP — you have either a separate word before, a separated suffix or a separated infix according to tense.

With Brazil liking proclises, the lemmatized enclitical can end up being quite rare in comparison to the proclitical ones. "Precisamos parar de automedicar-nos" even sounds weird in comparison to nos automedicar to me. You can have similar sentences for -te and others too. Do note that these are all considered impersonal infinitives (i.e. the ones that get lemmatized in Wiktionary).

For these and many, many other reasons, it stands to reason that one shouldn't include any of those clitic forms as separate pages, for Portuguese at least. This doesn't necessarily mean anything for Spanish; more and more I'm thinking their systems are different beasts altogether and as such should be treated differently.

PS: Priberam at least does express proclitical forms for verbs. MedK1 (talk) 00:06, 24 December 2024 (UTC)[reply]

I'll also add that PT-PT dictionaries seem to prefer the -se forms: entry for "arrepender-se" at O Dicionário da Língua Portuguesa & entry for "arrepender-se" at Infopédia.pt. AG202 (talk) 03:33, 29 December 2024 (UTC)[reply]

Yiddish in Latin characters

Please lift the ban on including Yiddish terms attested in Latin characters. I know that writing it with other scripts is uncommon (except to assist beginners), but there are a few lengthy Yiddish works written mostly or entirely in the Latin script. Examples:

https://books.google.com/books?id=o_P6DQAAQBAJ

https://books.google.com/books?id=nrCYDwAAQBAJ

Probably even more. And in Cyrillic as well, I guess? Also, I remember owning "Di Avantures fun Alis in Vunderland" myself, which has both Hebrew and Latin script in the same book. Tollef Salemann (talk) 02:20, 23 December 2024 (UTC)[reply]

The Brill article confirms that Cyrillic is another script, yes, but it is the rarest of the three (and the only other script in which Yiddish is attested, as far as I'm aware). I welcome lifting the prohibition on that as well. Anyway, cheers for suggesting another source. (((Romanophile))) ♞ (contributions) 04:06, 23 December 2024 (UTC)[reply]

Does Wiktionary's transliteration of Yiddish terms match the spelling used in these books? I tried searching for some random words from the "Di Avantures fun Alis in Vunderland" preview sample and successfully found the relevant Yiddish entries on Wiktionary. Is this not good enough for the end users? --Ssvb (talk) 05:16, 23 December 2024 (UTC)[reply]

Usually they do match, though historically Romanizations of Yiddish have varied in form and consistency. In any case, utility is not the motive here. We already have Romanization entries for Chinese, Japanese, and Serbo-Croatian. I doubt that a proposal to delete them would succeed on grounds that there are already transliterations in the main entries, thereby making the Romanization entries 'redundant'. (((Romanophile))) ♞ (contributions) 06:31, 23 December 2024 (UTC)[reply]

I wouldn't advocate deleting them. It's just that creating additional Latin script entries and keeping them in sync with the Hebrew script entries is an extra maintenance effort. If contributors are ready to spend their time and efforts on that, then it's fine. If the attestable Latin spelling of some terms encountered in real books differs from the transliteration of their corresponding Hebrew script entries, then these can probably be prioritized. --Ssvb (talk) 09:11, 23 December 2024 (UTC)[reply]

Yes. This has been requested at least twice in the past year, once by me at Wiktionary:Beer_parlour/2024/April#Latin-script_Yiddish and once after that by someone else somewhere else... but although there seems to be support for at least allowing Latin-script entries to point to the Hebrew-script entries, like is done for Arabic-script Afrikaans (pointing to Latin-script Afrikaans) (or, in a different vein, for Latin-script Gothic), neither I nor anyone else has gotten around to it yet. Well: unless there are objections, I will finally add "Latn" as another script to yi in, say, a week (ping me if I forget), with the understanding that Hebrew script will continue to be lemmatized at least in most cases. - -sche (discuss) 06:28, 23 December 2024 (UTC)[reply]

I personally favor a treatment like Serbo-Croatian where both scripts are lemmatic, but I won't feel devastated if we treat the Latin script as secondary to the Hebrew one either. You may want to include the Cyrillic script as another option, too, though I don't have examples on hand. (Yiddish's cousin Ladino is more my field of expertise. Or should I say Spanish Yiddish?) (((Romanophile))) ♞ (contributions) 06:42, 23 December 2024 (UTC)[reply]

For Japanese we have such entries as

jiyaku — Rōmaji transcription of じやく

Is there a reason not to use a similar approach for Yiddish? --Lambiam 21:15, 23 December 2024 (UTC)[reply]

My understanding is that this is the intention, yes; in the April discussion, Benwing proposed using {{spelling of}}, which would look like this. @Romanophile, if at some point in the future we have the ability to lemmatize two different scripts/spellings without them falling out of sync (e.g. via them both "transcluding", with "smart" changes, some underlying central backend page), I would support "double-lemmatizing" a great many things, but for now it would just lead to duplication. - -sche (discuss) 16:52, 26 December 2024 (UTC)[reply]

If it's just a stripped down soft redirect entry, then the required maintenance effort is low. BTW, does it need a declension table? And what would be the right place for book quotations in Latin script? I'm interested in this topic, because many of the same guidelines would probably also apply to Belarusian Łacinka, like the horny entry. --Ssvb (talk) 17:33, 26 December 2024 (UTC)[reply]

As Yiddish was a contemporary of Early New High German, there must have been Yiddish text in blackletter, and certain Germanists on the continent regularly deal with these Early Modern equivalents, but from the perspective of Anglos it is a suppressed blind spot: fractura est, non legitur. We have to cover Yiddish in Latin script just as we include Hebrew-script spellings of Arabic as Judeo-Arabic. The current Hebrew-written standard is just a later Ausbausprache like Luxembourgish, but unlike Luxembourgish, which is within the ballpark of another broader dialect (Category:Central Franconian language), Yiddish, due to ethnic and cultural separation, always was a distinct dialect, though the Middle High German beginnings are difficult to survey, of course. So I don’t see how it was ever banned; there was only a skewed perspective. More parsimoniously, one may observe an oversight in the language data, which until now has only listed Hebrew script for Yiddish, which is factually wrong. A few times I also added Serbo-Croatian terms in Arabic script only to be annoying, without any preference for it and without believing it to be prohibited, only that rendering is faster if we only check Latin and Cyrillic script. Fay Freak (talk) 17:16, 26 December 2024 (UTC)[reply]

Did you add "Latn" as another script to yi yet? (((Romanophile))) ♞ (contributions) 18:04, 4 January 2025 (UTC)[reply]

Thanks for the reminder; done. - -sche (discuss) 06:24, 6 January 2025 (UTC)[reply]

Dutch defective verbs

(Notifying Mnemosientje, Lingo Bingo Dingo, Azertus, Alexis Jazz, DrJos): I am working on an update of the Dutch verb conjugation module, and in that I came across the issue of how to handle defective verbs. These are verbs that act like they have a separable part, but are (generally) not actually separable.

I usually use woordenlijst.org for checking Dutch conjugation, and it seems to distinguish two types of defective verbs. The first is verbs like herinvoeren, for which the subordinate clause form is given, but the main clause form is omitted. The second is verbs like zakkenrollen, for which only the infinitive and present participle are given. However, searching online, it seems that in actual usage, the second type is used exactly like the first type (i.e., forms like zakkenrolt and zakkenrolde are attestable). I added the option to specify these types of verbs through a parameter |subonly= (see the bottom of the page at User:Stujul/test-nl-conj).

My main question is about how to categorise these verbs. Currently there are two categories for these verbs: Cat:Dutch defective verbs and Cat:Dutch uninflected verbs. The first is added manually and the second is added by a parameter in the headword template {{nl-verb}}. These should definitely be merged. But should the two types of defective verb I mentioned be categorised separately as different subcategories, because the forms of the second one are nonstandard?

I hope to hear your opinions on this.

PS - sorry if this is not the appropriate place for this discussion.

Stujul (talk) 13:36, 23 December 2024 (UTC)[reply]

If forms of zakkenrollen are missing, might it be the woordenlijst that is defective? In the conjugation table on the Dutch Wiktionary all seem to be present, although the subjunctive currently seems unattestable. Here, for example, is a use of gezakkenrold, and here of finite zakkenrollen in a main clause. Is it not just like stofzuigen (not only semantically, but also grammatically)? --Lambiam 21:07, 23 December 2024 (UTC)[reply]

Maybe zakkenrollen was a bad example. It seems indeed to be used more like stofzuigen. This may have to do with the fact that rollen is a weak verb. For example geboogschiet and gelipleest return far fewer results than respectively booggeschoten and lipgelezen. About the Dutch Wiktionary's approach: I found a list of such verbs and most are listed as fully defective there. liplezen gives the main clause forms in parentheses, and on the main page gives a note that these forms appear sporadically. I also note that some verbs that you may expect to fall into this category are actually given as complete verbs on woordenlijst.org, e.g. hartenjagen.

It may just come down to a case-by-case analysis, but it would be nice to have a standard approach when dealing with such verbs, as we are currently very inconsistent with it.

Stujul (talk) 10:12, 24 December 2024 (UTC)[reply]

Gelipleest is orthographically wrong anyway; /ɣəˈlɪp.leːst/ should be written as gelipleesd. But liplezen is one of the entries on this list of defective verbs.

We are not prescriptive; shouldn’t three properly attestable uses of forms like gelipleesd or lipgelezen trump any lists and suffice for including these forms (with a note warning that they are not generally accepted)? Here are two uses “in the wild” of lipleesde: [10], [11]. --Lambiam 11:26, 24 December 2024 (UTC)[reply]

Sure, we are not prescriptive, and three attestable uses do merit an entry for these forms, I don't disagree with that. But I'm not sure whether we should include these forms in the conjugation table on the lemma entry. You can find many "in the wild" uses of "ik leesde", but we don't include that form in the table at lezen. Of course, in that case, there is a clear "correct" and "incorrect" form, while for liplezen, there isn't a "correct"/"standard" form we can point to (should it be lipleesde, liplas, las lip,...).

Stujul (talk) 11:57, 24 December 2024 (UTC)[reply]

I see that the Dutch Wiktionary happily presents the unsplit conjugated form ik herindeel and the split form ik breng heruit. Both feel wrong to me; are these acceptable? --Lambiam 21:58, 23 December 2024 (UTC)[reply]

The Dutch Wiktionary is again inconsistent in this regard: indeed heruitbrengen is conjugated as a normal separable verb, herinvoeren gives an alternative construction "ik voer opnieuw in", and heruitzenden just leaves the main clause forms empty.

Both these forms that you gave also feel wrong to me.

Stujul (talk) 10:22, 24 December 2024 (UTC)[reply]

I'm amazed that I was completely unaware that these kinds of verbs existed. Thinking about it, I would indeed categorise them as defective, as the woordenlijst does. If you put a gun to my head I might indeed say "ik zakkenrol" or "ik herindeel", like other speakers, but they still don't feel quite right. My intuition is that these forms which can be sporadically attested are ad-hoc formations. Some standard strategy to deal with these in the language may crystallize at some point, but the fact that everyone feels unsure about them shows that it hasn't yet. —Caoimhin ceallach (talk) 18:02, 26 December 2024 (UTC)[reply]

Extended Mover Request: User:AG202

Hi, I'd like to request extended mover rights, mainly to be able to fix issues like tones in entry titles where they're not supposed to be, such as with Igbo ákpị̀, per WT:About Igbo. AG202 (talk) 18:21, 23 December 2024 (UTC)[reply]

@AG202 Done. Benwing2 (talk) 21:35, 23 December 2024 (UTC)[reply]

Thank you!!! AG202 (talk) 23:22, 23 December 2024 (UTC)[reply]

@Benwing2: For the record, the process is WT:WL, see WT:Extended movers. Svārtava (t ɕ) 04:46, 24 December 2024 (UTC)[reply]

Username pronunciations

Hello,

There is a new subpage for username pronunciations called User:Flame, not lame/Username pronunciations.

Thank you Flame, not lame (Don't talk to me.) 19:54, 25 December 2024 (UTC)[reply]

Love the page! Polomo47 (talk) 17:08, 29 December 2024 (UTC)[reply]

jive talk

We should categorise jive talk, like frolic pad. There's probably some good stuff on this website. P. Sovjunk (talk) 23:37, 26 December 2024 (UTC)[reply]

Hebrew transliteration

I'm probably not the first person to ask this, and I likely won't be the last: but what is the reason for Wiktionary to use conventional Israeli romanization (i.e. based on colloquial Israeli Jewish pronunciation) over something more narrow and scholarly like ISO 259? Narrower transliterations have a lot of bells and whistles, sure, but I think they still do a good job at being a compromise between various historical, regional and cultural variants of Hebrew. Why should ⟨צ⟩ be written as "ts" when that's not how Yemenite or Sephardic Jews pronounce it, or how it was historically pronounced during Biblical and Classical times? Why should ⟨ח⟩ and non-geminated ⟨כ⟩ be rendered both as ⟨kh⟩ when this merger pretty much only happens in Israeli Hebrew, while every other dialect still distinguishes the two? Why should ⟨א⟩ and ⟨ע⟩ not be rendered at all when, even inside Israel, some Jews do pronounce them? Even if Israeli Hebrew is the de facto standard dialect these days, the common transliteration isn't even the de jure standard, that would be the Hebrew Academy's, which is slightly different. I understand Hebrew is a living language, but if you're like me, a non-Jewish non-Israeli who has a mostly academic historical linguistic interest in Hebrew, the modern Israeli transliteration is just not very useful. Sure, it's more "phonetically accurate" (as discussed, for a single dialect anyway), but isn't that what the IPA section is for?

Obviously we'd have to agree on the details of the transliteration, and I have my opinions on the specifics, but overall, I think a narrower transliteration would make much more sense. It would also likely allow us to begin some sort of automatic transliteration template that languages like Russian, Arabic and Greek have got going on. Pescavelho (talk) 15:55, 27 December 2024 (UTC)[reply]

No good reason, sure, only catering to cognitive biases of majorities. The thought of continuing to use your English keyboard without any acquired extra characters is just too appealing.

In recent months, I have increasingly succeeded to see through the grievances of the world as being the consequences of neurotypicals splitting up the world, they ever imagine, into social relations: what is relevant in the present context (see it again!), for this reason, is that they fail to imagine capable keyboard layouts or input methods, and rather configure six different keyboard layouts if they know French, Spanish, Romanian, Turkish and German, for instance, in addition to English, rather than to use the international version of any of these layouts, or a Unicode search made accessible on their machine for the very occasional but recurring goal of transcribing certain foreign phonemes faithfully.

Engaging the habit learning circuitry of the brain to switch to a more convenient, even if less intuitive (according to neurotypical cognitive biases), input setup would be easy though: it is just excusable, not defensible, not to switch to us(intl) or de(deadtilde) from us(basic) (in /usr/share/X11/xkb/symbols/), and many neurotypicals editing this dictionary or similar academic works already succumbed to this which is reasonable. I also use the actual Russian layout, with extensions, ru(prxn), for all Cyrillic languages, when my neurotypical bro is ticked off by it because its assignments do not phonetically correspond to the ones on the standard German layout—all being invented by someone around 1900 and hence carried forward, few ever questioning it, the social pressure to type the same layout with “ten fingers” is too high.

One just has to look up which combination can be utilized to get bonus characters, and repeat until one does not need to expend notable brainpower for it. Juggling multiple languages to maintain polyglotism is a context where one needs bonus characters, like it or not (everyone shall like it, following the adapt neuroscientific recipe). Fay Freak (talk) 16:33, 27 December 2024 (UTC)[reply]

Is the point here that it's "too cumbersome to type"? That feels subjective, some people would feel like setting up all the templates an average Wiktionary page uses is rather cumbersome (I've certainly felt so at times). In any given case, I'm hoping the adoption of a narrower transliteration would go hand-in-hand with automated transliteration, so this concern would be null and void. Pescavelho (talk) 21:38, 27 December 2024 (UTC)[reply]

@Fay Freak Talking about /usr/share/X11/xkb/symbols/, I've basically written my own keyboard layouts for a bunch of scripts, with the general idea of a ± correspondence to Azerty (rewriting them for Qwerty would be trivial). I've also added diacritics and IPA symbols to my Latin keyboard. Exarchus (talk) 20:19, 4 January 2025 (UTC)[reply]

@Exarchus: From 2020, Red Hat developer Peter Hutterer enabled the coming decades to have custom layouts right out of the X Input System. I have not tried it. I already just had designed mine with everyone in mind and got them merged; and after weeks of tingling in 2017 I was like: snap, computers are above my paygrade, yet I had to study law. So, after pushing the Ugaritic layout, I still have, by reason that I abhorred to change the merge request, Old South Arabian and Nabataean layouts lying around on Github. Later someone branched the Ugaritic out into a separate file ancient, I see just now, so they still are up to be added there if someone fielding these languages tests them and considers them satisfying – I came to wonder if I had to decide about the designs of keyboard layouts alone, without feedback of people who would use them, since man sorely needed to be autistic for that feat. Indeed for that project I already license them, in case nothing ever happens, though there wasn’t much originality in superimposing other Semitic alphabets on the preexisting Arabic one (which I also designed in its current version so everyone on Linux and BSD got BiDi control characters not only theoretically and all). Somebody is using all this stuff, I see, if only because someone added a QWERTY version to the QWERTZ IPA layout I created in the file trans (/z/ more common than /y/ across languages), but I have no statistics whatsoever due to all the freedom and no tracking on the free desktops. 😂 Fay Freak (talk) 22:08, 4 January 2025 (UTC)[reply]

What I find very useful, definitely for unicameral scripts, is to use Caps as "ISO_Level3_Latch". Exarchus (talk) 23:26, 4 January 2025 (UTC)[reply]

I'm also not very happy about the transliteration situation for Hebrew. I don't edit it enough to have much sway in that sphere, but I would like to see a transliteration system that is actually transliteration and not transcription of a certain dialect that I am only marginally interested in. Andrew Sheedy (talk) 22:30, 27 December 2024 (UTC)[reply]

(Lurker/new Hebrew editor. I've read some of the past discussions on this topic.) I would prefer to see both Israeli and Biblical/liturgical/scholarly transcriptions next to each other (except contexts where one of them is irrelevant, of course), ideally (somewhat) automated by a module. This would satisfy both main Hebrew user bases. It's my understanding that a lot of work has already been done on automatic transliteration; it's about time it should be deployed, so we can iterate and check edge cases. I appreciate those still adding (inconsistent) manual scholarly transliterations in 2024, but think it may be useless once we apply the module. Contra the above replies (and I don't understand/ignore/tldr whatever the fuck fay freak wrote), I am satisfied with the gist of the status quo Israeli transliteration system, and generally am not convinced that one-to-one reversibility is a major virtue (compared to, say, readability and not being laden with diacritics); but I'll sooner take any reasonable automated module finally being made widespread over continued bikeshedding of the exact romanization scheme. Hftf (talk) 11:06, 28 December 2024 (UTC)[reply]

@Pescavelho I agree with you, but Neo-Hebrew editors will never agree. They have no understanding for the perspective and needs of people like you and me, who are only interested in Hebrew from a historical point of view. Unfortunately in my experience, they are incredibly biased and obtuse. Here is a discussion we had in the past:

Hebrew transliteration – time to clear the mess

— Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 19:42, 28 December 2024 (UTC)[reply]

Seems like the main argument is that, if you are someone who is only interested in modern Hebrew, then a narrow transliteration isn't helpful. OK?... What if one isn't just interested in modern Hebrew? Then the modern Hebrew transcription is probably less than useless. I feel like there's a bigger case for having only a narrow transliteration over a conventional transcription, given that modern Hebrew mostly experienced mergers rather than splits compared to Tiberian Hebrew (which is the de facto standard Hebrew orthography), so you can just ignore half of the diacritics and you're basically left with modern Hebrew, but there'd be nothing wrong with having both systems side-by-side. And again, pronunciation is what the IPA section is there for.

Personally, ISO 259, with a few modifications, would be my go-to system. I am willing to provide reasoning for each of the modifications in question, and, if we decide to go ahead with the transliteration system, it will be these modifications we'll spend the most time arguing about. (the biggest issue will undoubtedly be the vowels) Pescavelho (talk) 15:45, 29 December 2024 (UTC)[reply]

Wiktionary's transliteration of Hebrew has been discussed (and disputed) a lot over the years (search the archives of this page for various discussions). One idea which seems to me to have been gaining support is, as mentioned above, to have two transliterations, one scholarly and oriented to representing the distinctions of Hebrew script / Biblical Hebrew, beside the current one that is oriented to representing the modern (Israeli) Hebrew pronunciation. A two-translit approach would also help with certain other languages where some people want a transliteration that reproduces the distinctions of the original script, and other people want a transliteration that hints at the pronunciation in the manner of a simplified version of enPR or IPA. (The second group thinks of the first group: if you want to know the distinctions of the original script, why not just learn the original script? The first group thinks of the second: if you want a pronunciation, why not provide a pronunciation, rather than putting an ambiguous respelling in the transliteration parameter?) Having seen how consistently scholarly/"Biblical" transliteration is something people want, I support adding it. - -sche (discuss) 17:21, 29 December 2024 (UTC)[reply]

Not an editor of Hebrew, but someone with casual interest in the topic/who reads Hebrew entries - I ultimately agree with those who mentioned multiple romanisations/transliterations being given and I would also agree that it would be best for this process to be automated.

There remains the question perhaps of 'which should be the default' if there is a toggle in whatever template is being worked on, though with only two romanisations it seems likely that there is no need for such a toggle.

I could see there being strong feelings and arguments going both ways if a default must be chosen - my preference is for maximum reversibility, but as is evident given the chosen default for Korean, this may not be shared by the majority. (Cf. the chosen scheme for Arabic romanisation.) Herthaz (talk) 20:52, 7 January 2025 (UTC)[reply]

Just one more unofficial vote here for a system that reflects the precise spelling and (therefore, more or less) the Biblical pronunciation. I don't work on Hebrew here, but when I look it up, I want to know how it was around 500 BCE and related to Arabic or Afro-Asiatic, not modern Israeli sieved through Ashkenazic. But this is probably selfish: most people who use this site probably do want Israeli, and those of us with philological interests presumably know enough to work backwards. So not a strong demand or vote, just a voice in favour of using the letter 'q' in words for "kill" because it can't hurt. Hiztegilari (talk) 22:37, 7 January 2025 (UTC)[reply]

Honestly, I don't think we have to include the conventional Israeli romanization, knowing how Israelis pronounce a given word is what the IPA section is there for. That being said, I think the system proposed by @Sartma, which is similar to the one used in pages making use of cuneiform, is a good compromise. Pescavelho (talk) 23:29, 7 January 2025 (UTC)[reply]

@Pescavelho: I also don't think we have to include conventional Israeli romanisation. We could just follow the example of Modern Greek. See for instance οικογένεια (oikogéneia), transliterated oikogéneia but pronounced ikogénia /i.koˈʝe.ni.a/. That being said, I do understand that someone mainly interested in Neo-Hebrew would prefer a romanisation to a transliteration. I repeatedly tried to propose splitting Classical Hebrew from Neo-Hebrew, since in my view it's the only thing that would make Hebrew entries so much neater and less cluttered, but the majority here seems to abhor the idea. If we split the two languages, we could give transcriptions for Classical Hebrew and normalisations for Neo-Hebrew, plus numerous other improvements — from less cluttered headword lines (no need to give alternative forms), to more relevant references, &c.

As for the transliteration system, I obviously have a preference for one of my own systems, but I'd be happy with whatever as long as it is automatised. For reference, here is a summary of transliteration and romanisation proposals by @Erutuon and me: Hebrew transliteration. — Sartma ^{【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】} 22:01, 11 January 2025 (UTC)[reply]

Splitting Classical Hebrew and Modern Hebrew doesn't make much sense because Classical Hebrew and Modern Hebrew are both based on the same written standard (Tiberian Hebrew), just with phonological and grammatical differences. Old Hebrew (i.e. Hebrew written in the Palaeo-Hebrew alphabet) and Samaritan Hebrew make more sense as separate categories, however, since they use different scripts.

For the record I also think that having Arabic dialects separate from each other doesn't make much sense either (I was wondering if there was any (re?)ignition of that discourse ever since North and South Levantine Arabic were merged in ISO).

I also have my own system for transliterating Hebrew, I'd be willing to post when/where appropriate. Pescavelho (talk) 23:09, 11 January 2025 (UTC)[reply]

I would be strongly opposed to splitting Classical Hebrew and modern Hebrew into two L2's, but I am all for having two transliterations as @-sche mentions. There is demand for doing this as well for e.g. Persian and probably several other languages, and it is something I can probably implement the underlying support for without an enormous effort (although I would need to survey the landscape to see what it would actually involve). @Sartma The modern Greek transliteration is quite controversial here at Wiktionary so I would not take that as a good precedent; here too, if people want a scholarly transliteration I would suggest two different transliterations, one scholarly and one pronunciation-based, rather than the current half-ass compromise we have. @Pescavelho There is general agreement that North and South Levantine Arabic need to be merged, but not currently the technical know-how available to help with it; the person who got the two merged in ISO was here for a while and offered to help, but when they realized it is a big task, they bowed out and said they didn't have time. What this and other similar cases show is that splitting is a lot easier than merging, so we should be careful when we propose splits. More generally I agree that we need fewer Arabic L2's, possibly only one. The decision to split them was made a long long time ago without much thought, simply based on the ISO classification, and there has never been the will to re-merge, since it will take significant effort, both politically in terms of getting an agreement (if it's even possible) and technically in terms of actual implementation. Benwing2 (talk) 00:08, 12 January 2025 (UTC)[reply]

Correcting myself a bit here: it's incorrect to say that Classical Hebrew and Modern Hebrew are both based on the same standard. Rather, Classical Hebrew wasn't a standardized language, Tiberian spelling being a post hoc standardization quite a handful of centuries after Hebrew was no longer spoken as a first language, that was nonetheless retroactively applied to texts like the Pentateuch and the Talmud (in fact, "preserving" as much of the "Classical Hebrew pronunciation" as was feasible before it was too late was the entire point of the vocalization). Modern Hebrew "full spelling" (mostly used in dictionaries and learning materials) continues to nonetheless be based on Tiberian vocalization in spite of modern (especially Israeli) Hebrew having gone through quite a few phonological shifts and mergers that make many Tiberian diacritics superfluous (a language that continues to use a spelling based on how it was pronounced centuries ago... could you imagine that?)

In this sense, both the Torah and any given Israeli children's book are both written in the same orthographic standard (with slightly different grammar and vocabulary granted, but that's a different can of worms), so there's no need for splitting the two languages. Classical pronunciation can be marked in the IPA section (potentially alongside Ashkenazi, Yemenite, Sephardic and obviously modern Israeli pronunciations) and archaic grammatical forms can have a little disclaimer in the conjugation tables pointing out these only really appear in the Bible and artsy-fartsy works of modern literature.

Regarding Arabic: at the risk of veering a bit into something that merits its own discussion, I think it'd be much more practical to have it be like Portuguese where the dialects are treated as a single language but there are separate regional IPA pronunciations and semantic definitions. It's kind of baffling having to scroll through an entire page (it's Arabic! It's right near the beginning of any given language list!! I shouldn't have to scroll at all!!!!!) because it turns out a definition of a normal Arabic word used across much of the Arab world for some reason only has a "South Levantine Arabic" entry. I also can't really weigh in on Persian but I always found it weird that at least Tajik wasn't considered its own language in Wiktionary, but that is out of my scope. Pescavelho (talk) 01:12, 12 January 2025 (UTC)[reply]

@Sartma, regarding consonants: I am mostly in favor of the existing systems, with a few personal choices that I believe make the transcription more sightly, such as transcribing ק as ⟨ḳ⟩ rather than ⟨q⟩ (and maybe צ as ⟨ẓ⟩ rather than ⟨ṣ⟩, I honestly haven't made up my mind on that; I also have a soft spot regarding ⟨j⟩ for the /j/ sound, but I understand why people might just prefer ⟨y⟩ instead). Final ה should always be omitted, like how ـة works in Arabic (except for hei-mappik הּ, obviously).

Regarding vowels, that is the more complicated part. I'm strongly against using "special" characters like ⟨ɔ⟩ or ⟨ɛ⟩:

סָ - ⟨ā⟩ (⟨o⟩ sometimes, I haven't really wrapped my head around understanding the rules about when it is pronounced /a/ vs. /o/ and honestly I doubt most Israelis know either. I'm not opposed to just transcribing it as ⟨ā⟩ all the time)
סַ - ⟨a⟩
סֵ - ⟨ē⟩
סֶ - ⟨e⟩
סִ - ⟨i⟩
סֹ - ⟨ō⟩ (if we decide that סָ = ⟨ā⟩ in all situations, I'd rather we go for ⟨o⟩; beware though, Tiberian סֹ corresponds to Biblical /oː/, so it could lead people into error)
סֻ - ⟨u⟩

Matres lectionis:

וֹ - ⟨ô⟩ (or ⟨ō⟩ if סֹ becomes ⟨o⟩ instead, or maybe the other way around since וֹ is often used in loanwords?)
סִי - ⟨î⟩ (or ⟨ī⟩)
וּ - ⟨û⟩ (or ⟨ū⟩)

The following two are written as matres lectionis, but some Israelis pronounce them as diphthongs (this is actually the "historical" pronunciation, but the recent shift appears to be either an influence from the Ashkenazi accent or an example of "pronunciation spelling"). I believe they should be treated as diphthongs, for reasons that will become clear:

סֵי - ⟨ēi⟩ (or ⟨ēy⟩/⟨ēj⟩, I prefer ⟨ēi⟩ since it looks better and because the י is never doubled when writing without nikkud, indicating it is interpreted as a vowel rather than semivowel)
סֶי - ⟨ei⟩ (or ⟨ey⟩/⟨ej⟩, same as above)

I used to think using macrons could be a mistake, because they indicate vowel quality rather than length in Tiberian Hebrew, but they're the best option imo. As stated, the pronunciation should be in the IPA section. Also, goes without saying but if the mater lectionis bears a vowel, it ceases being a mater lectionis and becomes a regular consonant (so ⟨ij⟩, ⟨ej⟩, etc.).
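
For anyone who wants to try the vowel scheme above on real text, here is a minimal Python sketch of it as an ordered lookup. It is only an illustration under stated assumptions: the names `RULES` and `transliterate_vowels` are made up for this example, it is not the code of any existing Wiktionary module, qamats is always rendered as ⟨ā⟩, the circumflex options are used for the matres lectionis, and everything that is not one of the listed vowel patterns (consonants, shva, a stray dagesh) passes through unchanged. It also cannot tell a consonantal vav carrying a holam from the mater lectionis.

```python
# Hypothetical sketch of the vowel table proposed above; not an existing module.
HIRIQ, TSERE, SEGOL, PATAH = "\u05B4", "\u05B5", "\u05B6", "\u05B7"
QAMATS, HOLAM, QUBUTS, DAGESH = "\u05B8", "\u05B9", "\u05BB", "\u05BC"
YOD, VAV = "\u05D9", "\u05D5"

VOWEL_POINTS = {HIRIQ, TSERE, SEGOL, PATAH, QAMATS, HOLAM, QUBUTS}

# Two-character patterns come first so that they win over single points.
RULES = [
    (TSERE + YOD, "ēi"),
    (SEGOL + YOD, "ei"),
    (HIRIQ + YOD, "î"),
    (VAV + HOLAM, "ô"),
    (VAV + DAGESH, "û"),
    (QAMATS, "ā"),
    (PATAH, "a"),
    (TSERE, "ē"),
    (SEGOL, "e"),
    (HIRIQ, "i"),
    (HOLAM, "ō"),
    (QUBUTS, "u"),
]


def transliterate_vowels(text):
    """Replace pointed vowels with Latin letters; leave everything else as is."""
    out = []
    i = 0
    while i < len(text):
        for pattern, latin in RULES:
            if not text.startswith(pattern, i):
                continue
            end = i + len(pattern)
            # A yod or vav that bears its own vowel is a consonant, not a mater,
            # so the two-character rule is skipped in that case.
            if len(pattern) == 2 and end < len(text) and text[end] in VOWEL_POINTS:
                continue
            out.append(latin)
            i = end
            break
        else:
            # No rule matched at this position: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)
```

Swapping ⟨î⟩/⟨û⟩/⟨ô⟩ for the macron variants, or changing how qamats is handled, would only mean editing the rule list, which is the main reason for keeping it as data rather than as branching code.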

The big issue with vowels for me is what to do regarding א as a mater lectionis (in words like פֶּרֶא, רֹאשׁ or אִירָאן). In native and Aramaic loanwords, the aleph is actually etymological and could be written, but in more modern Arabic loanwords, they're actually standing in for a long /aː/ sound. In both cases the aleph is silent (hence why it doesn't bear a vowel itself), but we can't really ignore them, so I propose we omit the aleph but add a small dot underneath the preceding vowel, so רֹאשׁ is /rọš/, פֶּרֶא is /perẹ/ and אִירָאן is /ʼīrạ̄n/. In that case, we'd have to use macrons for all the other matres lectionis (or, we could use a circumflex accent instead of a dot, but then some letters like /ā̂/ would get too crowded, unless we then repurpose the dot for writing vowels without macron, so /perệ/, but that could be confusing)

I have more opinions than these ones, but they are mostly minor. If/when we start fine-tuning the system, I can bring them up. Pescavelho (talk) 21:19, 2 February 2025 (UTC)[reply]

@Sartma Oh, and how could I forget: shva should be ⟨ə⟩, unless it is a shva nah, in which case there should be no vowel. Pescavelho (talk) 13:35, 5 February 2025 (UTC)[reply]

Adjective definitions

E.g.:

Whose first and last vertices are different.
That ends in a vowel.

My feeling is that adjectival definitions of this style seem old-fashioned or cryptic, and are potentially difficult for modern readers to understand. I would change them where I see them to e.g. "Ending in a vowel", but does anyone else have an opinion? Mihia (talk) 20:53, 27 December 2024 (UTC)[reply]

I agree about "that ends in a vowel" and similar; I would definitely change to "ending in a vowel". The first definition "whose first and last vertices are different" seems OK; paraphrasing an "open polyline" as "a polyline whose first and last vertices are different" seems fine to me. You could change it if you want to "having different first and last vertices", which seems about the same in terms of understandability. Benwing2 (talk) 02:20, 29 December 2024 (UTC)[reply]

Stress over hyphens (-́)

The official Spanish orthography mandates stresses over hyphens for compositional elements stressed on the immediate previous syllable, such as -́fobo (-phobe).

However, this rendering is not available in the English wiktionary entry (-́fobo, which does not even mention that it is stress-attracting...). JMGN (talk) 23:15, 30 December 2024 (UTC)[reply]

I see no need to use this in the page title, but it makes sense to me to use it in the entry itself.--Urszag (talk) 23:34, 30 December 2024 (UTC)[reply]
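
To picture the split between page title and displayed headword, here is a minimal Python sketch. It assumes the stress mark is the combining acute accent (U+0301) placed directly after the hyphen, and the function name `split_affix_display` is hypothetical, not something that exists in a Wiktionary module.

```python
COMBINING_ACUTE = "\u0301"


def split_affix_display(display_form):
    """Return (page_title, stress_attracting) for a suffix such as '-́fobo'."""
    marked_hyphen = "-" + COMBINING_ACUTE
    if display_form.startswith(marked_hyphen):
        # Drop only the accent on the hyphen; the rest of the affix is untouched.
        return "-" + display_form[len(marked_hyphen):], True
    return display_form, False
```

The title stays easy to type and search for, while the entry itself can still carry the stress information.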

@Urszag: Should I make a formal proposition for vote? JMGN (talk) 21:23, 3 January 2025 (UTC)[reply]

I think these pages are fine the way they are. I'd say this is an issue of stylization and not "mandates" (we're not beholden to the RAE anyway because we're not a prescriptive dictionary). Affixes are rarely written on their own, so it's hard to say that they "should" be written one way or another. Under the current titles, they can be typed and searched for easily; I think that's the biggest consideration. Ultimateria (talk) 23:33, 8 January 2025 (UTC)[reply]

@Ultimateria: But as @Urszag: pointed out, it makes sense to use them in the entries themselves to provide, succinctly, such important pronunciation info. that entries currently lack. JMGN (talk) 04:22, 9 January 2025 (UTC)[reply]

While I don't find it strictly necessary, I wouldn't object to that. Ultimateria (talk) 17:30, 9 January 2025 (UTC)[reply]

@Ultimateria: How's stress placement of affixes not necessary? smh ... JMGN (talk) 18:29, 9 January 2025 (UTC)[reply]

Standardizing Alternative scripts heading for Pali and Sanskrit

Sanskrit and Pali entries show a word in multiple scripts by using {{sa-alt}} and {{pi-alt}}, which are inconsistently placed, sometimes below the heading ===Alternative scripts=== and sometimes below ===Alternative forms===. I propose standardizing this to ===Alternative scripts===, to be placed above ===Alternative forms===.

It's already used on 6000+ entries
Alternative scripts is much neater especially when there are real variants/alternative forms of the word too (e.g. at लघु (laghu))
Otherwise also it is nice to keep Alternative forms reserved for real variants instead of using it for transliterations of the same word in different scripts.

Per Wiktionary:Entry layout#Flexibility, Wiktionary:Entry layout#List of headings: The list below is not an exclusive list; other headings may be essential in some circumstances, Wiktionary:Entry layout#Variations for languages other than English: Some languages do have characteristics that require variation from the standard format. For links to these variations see Wiktionary:Language considerations. So, I think it is helpful to standardize ===Alternative scripts=== for these languages and add this to WT:About Pali and WT:About Sanskrit. This will result in consistency and the header not being flagged as incorrect/error. Svārtava (t ɕ) 14:46, 31 December 2024 (UTC)[reply]
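
To gauge how inconsistent the placement currently is, one could check which header each {{sa-alt}} or {{pi-alt}} call sits under. The following is a minimal Python sketch of that check, working on the raw wikitext of a single entry. The function name and the regular expressions are assumptions made for illustration, it expects the template call to start on its own line below the header, and it is not an existing bot or module.

```python
import re

# A wikitext header line such as "===Alternative forms===".
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")
# The start of a {{sa-alt}} or {{pi-alt}} call, with or without parameters.
ALT_TEMPLATE_RE = re.compile(r"\{\{\s*(?:sa|pi)-alt\s*[|}]")


def headers_holding_alt_templates(wikitext):
    """Return the nearest preceding header title for each alt-script template."""
    current_header = None
    found = []
    for line in wikitext.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current_header = header.group(2)
        elif ALT_TEMPLATE_RE.search(line):
            found.append(current_header)
    return found


sample = "==Pali==\n===Alternative forms===\n{{pi-alt}}\n===Verb===\n{{pi-verb}}\n"
print(headers_holding_alt_templates(sample))
```

Any result other than "Alternative scripts" would mark an entry that the proposed standardization would touch.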

I thought it was obvious, but evidently not. Using "alternative form" for the same exact word written in a different script makes no sense. That header should be reserved for actual alternative forms/variants. -- 𝘗𝘶𝘭𝘪 𝘮𝘢𝘪𝘺𝘪^{(𝘵𝘢𝘭𝘬)} 02:54, 1 January 2025 (UTC)[reply]

"Alternative forms" is what is prescribed in WT:EL. It also covers the case of alternative writing systems in the same script (most notably Thai, but also the Myanmar, Tai Tham and Lao scripts). Also, how should we handle what are significantly different forms in each writing system? There is a recent example at sakkoti, with 4 different forms in the Roman script, each handled by its own invocation of {{pi-alt}}. There are also cases where the Lao-repertoire Lao writing system doesn't distinguish forms that other writing systems do, particularly cases of variable reflection, as in pāhuṇeyya. And there are occasionally multiple forms only for the Latin script (e.g. kat'añjalin, whose impact is reduced by categorising it as a misspelling). If these were merged into one call, the redesign of the parameter list would need some thinking about. (There's no academic interest in distinctively transliterating the corresponding Thai script form, กัตอัญชะลิน (katañjalin), which is currently not classified as a spelling error.) The banning of certain Roman script writing systems for Pali should also be reconsidered.

If we're going to change the heading, which I oppose, we should consider 'Alternative writing systems'. --RichardW57 (talk) 14:59, 8 January 2025 (UTC)[reply]

As written in the initial post, WT:EL allows some flexibility for languages if there is sufficient need - I can't think of languages worthier than Sanskrit and Pali of being allowed a header for ===Alternative scripts=== for the numerous scripts they are/were written in.
The proposed format for sakkoti would be as follows:

Alternative scripts

sakkoti (Latin script)
𑀲𑀓𑁆𑀓𑁄𑀢𑀺 (Brahmi script)
सक्कोति (Devanagari script)
সক্কোতি (Bengali script)
සක‍්කොති (Sinhalese script)
သက္ကောတိ or သၵ္ၵေႃတိ or သၵ်ၵေႃတိ (Burmese script)
สกฺโกติ or สักโกติ (Thai script)
ᩈᨠ᩠ᨠᩮᩣᨲᩥ (Tai Tham script)
ສກ຺ໂກຕິ or ສັກໂກຕິ (Lao script)
សក្កោតិ (Khmer script)
𑄥𑄇𑄴𑄇𑄮𑄖𑄨 (Chakma script)

Alternative forms

sakkati, sakkuṇoti, sakkuṇāti

I don't see any issue with this as the alternative scripts for the terms listed in ===Alternative forms=== would be findable at their respective entries.

Could you point out specifically what issue the proposed format would cause at pāhuṇeyya or kat'añjalin?
As for Alternative writing systems, that is the same in meaning as Alternative scripts but a longer, three-word, less used alternate of it.

Svārtava (t ɕ) 17:39, 8 January 2025 (UTC)[reply]

@Svartava: At present, we have yet to automate Lao-repertoire Lao script Pali (my name for Pali using only the letters of the original Unicode encoding for Lao). When we do, we would inevitably find ປາຫຸເນຢຢະ (pāhuneyya) listed twice on the page for ປາຫຸເຓຍຍະ (pāhuṇeyya), once under 'alternative scripts' for your labelling preference, and once under alternative forms, as both forms (in your parlance) are internationally well established. I would prefer to bite the bullet and distinguish two flavours for |Latn2=, one for Latin-only forms and another for propagation to the other scripts. Perhaps the latter usage should be |alt2=.

As you can't be bothered to update documentation for templates, e.g. for {{pi-alt}}, when you change them, I may be wrong, but we have no mechanism to record the forms of kat'añjalin in the other scripts. This odd form originates in the Thai script, has been copied into at least one Roman script publication, and seems to have spread no further, so does not have a Sinhalese script form. Its page currently uses

{{pi-alt|Latn=katañjalin|Latn2=kat'añjalin|Thai=กตญฺชลินฺ|Thai2=กัตอัญชะลิน|Thai3=กะตัญชะลิน}}

. |Latn2= is redundant in this entry; it was intended for propagation to other pages before I realised that documented spelling mistakes didn't need such propagation. We would have to extend the template, or more precisely its supporting module, to say that a form didn't occur in a certain script. That would also apply to another well-established Thai spelling mistake, สะวากขาตะ (savākkhāta), if we decided that its Roman script form was a well-established, as opposed to infrequent, spelling mistake.
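
To make the parameter layout of that invocation concrete, here is a minimal Python sketch that groups the numbered parameters by script code. It only illustrates the naming pattern visible in the call quoted above (|Latn=, |Latn2=, |Thai3= and so on); the function name `parse_alt_template` is hypothetical, it does not handle nested templates or links containing pipes, and it does not reproduce what the template's supporting module actually does.

```python
import re
from collections import defaultdict


def parse_alt_template(wikitext):
    """Split a flat template call into its name and {script code: [forms]}."""
    inner = wikitext.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template invocation")
    name, *params = inner[2:-2].split("|")
    forms = defaultdict(list)
    for param in params:
        if "=" not in param:
            continue  # positional parameters are ignored in this sketch
        key, _, value = param.partition("=")
        # "Thai3" -> script "Thai"; the index is dropped, source order is kept.
        match = re.fullmatch(r"([A-Za-z]+)(\d*)", key.strip())
        if match is None:
            continue
        forms[match.group(1)].append(value.strip())
    return name.strip(), dict(forms)


name, forms = parse_alt_template(
    "{{pi-alt|Latn=katañjalin|Latn2=kat'añjalin|Thai=กตญฺชลินฺ|Thai2=กัตอัญชะลิน|Thai3=กะตัญชะลิน}}"
)
print(name, sorted(forms))
```

Seen this way, a marker saying that a form does not occur in a given script would be one more value to attach to a script code, which is the kind of extension discussed here.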

In knowledgeable usage, 'writing system' and 'script' are not equivalent, though both terms suffer the same fuzziness as 'species'. For example, the first two Thai script parameters given above are in different writing systems, and it can be argued that UK and US English use the Latin script with different writing systems. --RichardW57 (talk) 10:34, 9 January 2025 (UTC)[reply]

Actually, we do have a mechanism to prevent {{pi-alt}} producing forms for a script - one just uses something like |deva=-. It just needs to be documented. --RichardW57 (talk) 22:21, 9 January 2025 (UTC)[reply]

@svartava: And are ᨻᩩᨴ᩠ᨵᩮᩣ (buddho) and ᨻᩩᨴ᩠ᨵᩮᩤ (buddho) different forms or different Tai Tham writing systems? I get a hint of regional preferences, but I'm not sure how strong they are. And there are definitely regional preferences between ᨾᩴᩈ (maṃsa) and ᨾᩘᩈ (maṅsa); the latter seems specific to Northern Thai Pali. And how do you propose to handle phonetically inappropriate gemination underneath repha in Sanskrit? That definitely turns up in Khmer (or Pallava), Bengali and what I presume is Brahmi script, even if it is now universally obsolete. Does it also turn up in Devanagari? --RichardW57 (talk) 14:24, 9 January 2025 (UTC)[reply]

@RichardW57: Ideally, in such cases the Thai script entry would have both headers for scripts and forms, and we could suppress the Thai script displaying in alternative scripts table and the variants would just be placed below alternative forms header only. I didn't clearly understand phonetically inappropriate gemination underneath repha in Sanskrit, so could you give an example for understanding it correctly? Svārtava (t ɕ) 14:34, 9 January 2025 (UTC)[reply]

@Svartava: Could you please be a bit clearer about what you're suggesting for the Lanna script? I wasn't even thinking of Northern Thai variants of Thai script Pali, though come to think of it, มงฺส (maṅsa) looks like one. Are you saying that you would tell the Tai Khuen, the Northern Thai or the Lao that their Lanna script spelling of Pali is not the canonical one?

Svartava's changes to {{pi-alt}} starting in October 2024 have trashed manual alternative forms, so some of the examples I've been giving don't work - all the parameters but |Latn= are being ignored. For example, มงฺส (maṅsa) should be listed under the alternative forms of maṃsa, but the relevant parameter to {{pi-alt}} is being ignored. I'll try to fix it this afternoon, but if I run out of time I'll default |Latn= to avutta "not said" so as not to flood cat:E with pages exploiting your page. Did you even read the template's documentation?

For gemination under repha see ស្វគ្គ៌ (svargga). --RichardW57 (talk) 16:26, 9 January 2025 (UTC)[reply]

@Svartava What are you playing at? I said I would fix the module and you went and edited it in parallel, publishing non-working code. I think we now have it working. RichardW57 (talk) 18:31, 9 January 2025 (UTC)[reply]

@RichardW57 That was a mistake, and by the time I fixed it you had already made the appropriate correction and I ran into edit conflict. Svārtava (t ɕ) 18:35, 9 January 2025 (UTC)[reply]

@RichardW57 What do you think of the format for มงฺส (maṅsa) shown here? In complex cases, if there is a need, we could go back at any time to the format which is better, but I still think that for simpler pages keeping them under ===Alternative scripts=== instead of ===Alternative forms=== is a good approach. Svārtava (t ɕ) 19:04, 9 January 2025 (UTC)[reply]

@Svartava: I am reviewing on the assumption that the omission of everything after the definition line is just laziness. Please correct me if I am wrong, for I believe the omitted parts are required. --RichardW57 (talk)

I've now noticed that the header line in มงฺส (maṅsa) was wrong - it should have shown that transliteration, as it differs from the standard Roman script. I've now fixed it. We want the Pali-specific header template {{pi-noun}} so as to:

Nag for gender
Omit transliteration by default, for it is usually given by {{pi-sc}}, though not in this case. @Benwing2 objected to the duplication.
Allow |tr=+ to force automatic transliteration where transliteration is needed and automatic transliteration works.

Searching for the nominative/accusative forms of the various Thai script forms, I get:

มงฺสํ (maṅsaṃ) : 10 irredundant Google hits (iGh)
มํสํ (maṃsaṃ) : 126 iGh
มังสัง (maṃsaṃ) : 143 iGh

Going by the line {{pi-alt|Lana=ᨾᩴᩈ|Lana2=ᨾᩘᩈ|Latn=maṃsa}}, it is clear that มงฺส (maṅsa) is not considered the (Wiktionary-)principal form for its writing system. To me, that implies its definition line should be

{{alternative spelling of|pi|มํส|tr=-}}, {{pi-sc|t|maṃsa}}, yielding

Alternative spelling of มํส, Thai script form of maṃsa

Given the amount of effort involved at this point, we might as well do as @Octahedron80 would prefer and type

{{alternative spelling of|pi|มํส|tr=-}}, {{pi-sc|t0|maṃsa}}, yielding

Alternative spelling of มํส, Thai script (with implicit vowels) form of maṃsa

except that I would prefer more memorable codes than the likes of 't0' and 't1'. Just using {{pi-sc}} led to a massive reduction in errors, though that will still suffice for the Wiktionary-principal forms, the vast majority of entries.

For this word, the 'Alternative forms' section, which contains "มํส (maṃsa), มังสะ (maṃsa)", is unneeded and arguably wrong. These forms are the Wiktionary-principal forms for their writing systems.

In keeping with this style, the page for มํส (maṃsa) does need an 'Alternative forms' section, listing มงฺส (maṅsa), while มังสะ (maṃsa) does not merit an 'Alternative forms' section.

I have been assuming that there is no information specific to the individual Thai-script writing systems for Pali (or Thai-script Pali writing systems in general) that needs to be gathered together under these forms. Pronunciation sections are a possibility. I don't know whether deep regional differences in pronunciation have been stamped out, and if they survive one should check what script the monks are reading. It would be amusing, but complicating for us, if monks in Surin read Thai texts with the Khmer pronunciation. (I still haven't analysed the Mon pronunciations for the Mon variant of the Burmese script for Pali.) Mazard cautions us to expect chaos.

Note that, barring regional accents, all three of these Thai script forms should be pronounced the same.

Is anyone willing to help me plan out the handling of Lao-script spelling differences? I rather fear my research on Lao-script Pali spelling is not yet adequate. A fallback is to assume that different Lao spellings indicate different writing systems.

It is all so much simpler if these are all treated on an even footing as forms of maṃsa. --RichardW57 (talk) 21:26, 9 January 2025 (UTC)[reply]

@Svartava: I've drafted out what I think the non-Roman pages should look like for Wiktionary-script-principal lemma for สกฺโกติ (sakkoti) and the alternative form (in the same writing system) สกฺกติ (sakkati). You can see from the 'transliteration aid' comment in the 'Alternative forms' section that it is now more work to get reliable standard transliteration to Thai. Following the policy of preferring not to create entries without quotations, a lot of the links will be red links and thus with a relatively high risk of being typos. Of course, the Thai-script Pali we have splits up nicely into two writing systems. It's messier with the various spellings for the Tai Tham and Lao scripts. --RichardW57 (talk) 15:38, 10 January 2025 (UTC)[reply]

@Svartava I couldn't find much of a discussion of gemination under repha, but there is some text which just assumes it happens, e.g. the DHARMA project instruction "e.g. normalise varnna to varṇṇa (rather than fully standard varṇa) if the inscription normally doubles nasals after r". Bengali script Sanskrit নির্ব্বাণ (nirbvāṇa) is given as an example in one of Mazard's transcriptions of books about Pali, and Google will find examples of such a word, though probably obsolete Bengali নির্ব্বাণ (nirbban) rather than Sanskrit. --RichardW57 (talk) 21:37, 11 January 2025 (UTC)[reply]

@RichardW57 This is definitely absent in Devanagari. In the scripts like Khmer, are the non-geminated variants also used or is it just the one variant with gemination that is used? Svārtava (t ɕ) 05:28, 12 January 2025 (UTC)[reply]

(Notifying Atitarev, Octahedron80, AryamanA, Pulimaiyi, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, Exarchus): : The Khmer-script examples are from the time of Angkorian Khmer, so ultimately they might get converted to the Pallava script once it's been encoded in Unicode. I couldn't find enough examples to analyse the usage (inscriptions don't easily photograph well), but I suspect that the rule was that everything below a repha must occupy at least two storeys, and that determines whether gemination occurs for a phonetic rC-cluster. I saw no examples of this gemination in the 20th century Chuon Nath dictionary (Khmer-Khmer), the only source of Khmer-script Sanskrit I can view now that support for Adobe Flash is generally gone. --RichardW57 (talk) 12:20, 12 January 2025 (UTC)[reply]

Forms with and without this gemination occur in Bengali words, so I assume there are (at least) two Bengali-script writing systems. That's on top of whatever expedients were used to avoid or limit the collapse of 'r', 'b' and 'v'. Eastern Nagari systems are not consistent amongst themselves in how and to what extent they avoid the collapse; in this latter matter, we've encountered differences in how Pali is handled in the Bengali script. --RichardW57 (talk) 12:20, 12 January 2025 (UTC)[reply]

I think I saw, in a Mon context, an image of a flag saying something like Sanskrit ဓရ်္မ္မ (dharmma), but when I went back to record the URL for it, I couldn't find it. I'm sure of the stem and the (superscript) repha. The word's misrendering for me in the preview, but not in the editing window - the superscript mark should be part of the second akshara. --RichardW57 (talk) 12:20, 12 January 2025 (UTC)[reply]

@Svartava: I've found a plausible hit for that form in what looks like a Burmese blog - https://listed.to/@thanhtunoo/37448/. I'm not sure that it's Sanskrit though. There's also a form သရ်္ဗ္ဗ (sarbba) there that seems to have inherited the b/v confusion of Angkorian Sanskrit spelling, but with v > b replacement as often in Thai, as opposed to the b > v replacement seen in Angkorian inscriptions. Of course, it might just be a Burmese Pali-Sanskrit hybrid - such are fairly common in Thai. However, I can't find that word in the SeaLang Burmese-English dictionary. There's a limit to what I can extract from the blog without someone (e.g.@Hintha) cleaning up the translation.--RichardW57 (talk) 14:27, 12 January 2025 (UTC)[reply]

Notifying other Pali and Sanskrit editors. (Notifying AryamanA, Pulimaiyi, Svartava, JohnC5, Kutchkutch, Getsnoopy, Rishabhbhat, Dragonoid76, Exarchus): , @Octahedron80. RichardW57 (talk) 17:23, 8 January 2025 (UTC)[reply]

I am in favour of an "alternative scripts" heading for the template. For Sanskrit and Pali, "alternative forms" should be for same-script terms that are spelled differently than the headword. Therefore, I think the current organisation at sakkoti is undesirable; the alternative scripts templates for the alternative forms (sakkati etc.) should be at the Latin-script entries for those terms. 4 of those templates on one page is unnecessary bloat. —Aryaman^A ^{(मुझसे बात करें • योगदान)} 22:12, 8 January 2025 (UTC)[reply]

@AryamanA: Are you proposing that the 'alternative scripts' table should be at the principal script's (usually Latin's for Pali, Devanagari's for Sanskrit) page only? This would simplify matters - I've been resorting to subsidiary templates to ensure consistent lists across scripts, mostly as I turn up various Lao script spellings. --RichardW57 (talk) 13:43, 9 January 2025 (UTC)[reply]

@RichardW57 That isn’t what they were proposing. Please re-read the comment. Theknightwho (talk) 13:57, 9 January 2025 (UTC)[reply]

@Theknightwho: I have and it remains unclear. If 'Latin-script' qualified 'alternative forms' and not 'entries' it would indeed have a different meaning. My suggested interpretation is consistent with the popular (but not overwhelmingly popular) idea of minimising entries for alternative forms, and @AryamanA's stated preference for minimising alternative-script entries, which mostly aren't quite alternative forms. --RichardW57 (talk) 16:58, 9 January 2025 (UTC)[reply]

I think you understood right; I'm personally agnostic on whether "alternative scripts" should be at only the main script page or at all script pages. There are good arguments for either way. My only proposed restriction is that "alternative scripts" should only mean terms which are equivalent(ish) across different scripts (as I suggested in the example of sakkoti). —Aryaman^A ^{(मुझसे बात करें • योगदान)} 19:43, 9 January 2025 (UTC)[reply]

Abuse of power by one of admins + the word "ministra" in Polish

I hereby want to report abuse of power by admin Surjection. Said admin twice reverted my edits on ministra then banned me from editing for a week. In their discussion page I pointed out that my change was clearly sourced with the highest authority on the Polish language and that while I could have made a technical mistake - which they pointed out - it's hardly the reason to revert the change or ban someone.

My argument: the word "ministra" in Polish used as the feminine form of "minister" is an error. This was clearly stated by the Polish Language Council (RJP) in its statements from 2012 and 2019. The RJP stated that such a form is atypical for the Polish language and is generally used colloquially. The RJP suggested the conservative approach to the feminine form in this case (i.e. "pani minister") and pointed out that, used correctly, the form "minister" leaves no doubt about the gender of the person it applies to. The RJP pointed out that the correct way to create feminine forms of nouns is by adding -ka, and not -a (for example: doktor -> doktorka and not doktora). Both statements are publicly available. Note that the minister page also states that the correct Polish feminine forms are either minister or ministerka, but not ministra.

In response Surjection used following fallacious arguments:

"you don't like the word"
"you lie"
"you didn't read"
"you don't want to contribute"
"your motive is so obvious"

and so on...

As a dictionary, which is used by huge number of people, Wiktionary should present facts, not opinions. Now the fact that some political factions and media (in comparison to others that oppose it) want to enforce usage of specific word doesn't make this word correct and if the highest authority on the Polish language says "yes, let's make feminine forms of nouns but let's do it correctly and logically" then points out that certain forms, including the one in question, are not correct, not typical for Polish language, may have wrong associations, can be misread and misunderstood and so on, then Wiktionary should include that opinion and clearly mark such forms as incorrect, colloquial etc. And admins shouldn't put themselves above authorities on languages especially when they don't even know those languages.

It is also my opinion that admins should be impartial, should focus on facts and avoid being judgmental, self-righteous, closed-minded, stubborn and so on. Making decisions based on their imaginative assumptions and/or prejudice is simply wrong and against any rules I've read.

Based on quick look on their discussion page it would seem that admin in question had in past made multiple questionable decisions in which they acted on assumptions and prejudice rather than facts. Their discussion page proves they're unable to admit of being wrong and they tend to use fallacious arguments and not factual ones. Therefore I request the revocation of powers of said admin as they clearly violated the Code of Conduct of Wikimedia Foundation. I also ask for correction of the definition of word in question to indicate that said form is both colloquial and incorrect. 89.64.9.29 00:08, 1 January 2025 (UTC)[reply]

Several notes:

not "banned from editing" - blocked from editing one page, i.e. the page in question,
the user first completely deleted the definition and then tried to add a completely subjective "corruption" label; it's obvious they do not like this term,
they seem to like appealing to authority, including dismissing an admin (@Vininn126) that tried to point them to policies,
I asked them multiple times on the thread they started on my talk page to contribute to the dictionary, but they are apparently interested in nothing else than trying to edit this entry to make it seem less favorable to the word in question.
Wiktionary doesn't specify that words are "incorrect", no matter how much you dislike the word.

— SURJECTION ^{/ T / C / L /} 00:12, 1 January 2025 (UTC)[reply]

Obvious agenda-pushing because they don't like the use of the feminine form. No abuse of power. Theknightwho (talk) 00:39, 1 January 2025 (UTC)[reply]

I will happily learn how making changes based on the opinion of the Polish Language Council is "agenda-pushing" LOL. Read again: correct feminine forms of the word minister in Polish are either minister or ministerka. I have nothing against feminine forms. This specific form ministra is simply incorrect. That's a fact, not some "agenda". 89.64.9.29 00:45, 1 January 2025 (UTC)[reply]

To add some context: There's been discussion in Poland regarding feminine forms for years (actually over a hundred years). There's been consensus regarding some forms for years, but in past 15 or so years some ideological powers decided to push for feminine forms for nouns that (for multiple reasons) had feminine form the same as masculine - like said minister. And while that is, in general, considered a good trend by language authorities the problem was that said ideological powers didn't like the correct feminine forms (like said ministerka) finding them derogatory. That's why they started pushing other forms, which in their opinion are less derogatory, like this specific form we're talking about - ministra. And yes - this form is in use by certain political powers and media, especially since 2023 elections in Poland. But that doesn't change the fact that - from the language point of view - this form is incorrect, that language authorities find it as incorrect and that, to the best of my knowledge, it doesn't exist in any serious dictionary. And that is what we should focus on here. I referred to two opinions of Polish Language Council, which clearly point out that such forms are not only incorrect, but also problematic from the language point of view. I don't really care if some people try to push their agenda by using incorrect forms of words. But in my opinion a dictionary shouldn't promote incorrect forms of words or should at least clearly indicate that while they may be in use they are considered colloquial or incorrect. And that is my only goal here. 89.64.9.29 01:27, 1 January 2025 (UTC)[reply]

Your opinion on my label has been noted, but your assumptions here are the very reason you're being reported.
Your continuous attempts to dismiss valid argument by using manipulation or fallacious arguments are also the very reason you're being reported.
My response to clearly disparaging comment of another user on my profile has nothing to do with this case and doesn't matter.
You seem to have trouble understanding what is factual argument and fallacious argument. The fact that I didn't do any other edits (from this IP at least) does not matter at all in this case.
You shouldn't make comments in your own case. Everything here is public. People can check your page, my page, your comments, my comments. You made your point of view clear in your discussion page and this discussion is over. Stop trying to start it again here.

89.64.9.29 00:40, 1 January 2025 (UTC)[reply]

As an uninvolved admin I'll just say that Wiktionary is a descriptive, not prescriptive dictionary. This means we describe actual usage, not usage as some authority says it should be. If a given authority says you shouldn't use a word in a specific way, but the word is nonetheless used that way, we may note this using the term proscribed, but we don't either delete such definitions or use language like "corruption". Benwing2 (talk) 01:20, 1 January 2025 (UTC)[reply]

To explain: I found "corruption" in glossary and tried to use it, but wasn't sure how this works. I assumed someone will probably fix it when they see it. I agree that my action to delete the whole definition wasn't the best one despite my good intentions. I can admit to it. But with my second edit I actually tried to make the right thing, I looked into glossary trying to figure out what I should use to indicate that the form is incorrect - corruption was the closest thing I found. Maybe I made a mistake, but that's hardly a reason to attack me or ban me.

I added a little context for the ongoing linguistic dispute in Poland above.

I'd like to point out that currently there's a conflict between definition of minister which clearly states that feminine forms are minister and ministerka and definition of ministra. And that's one of reasons why, in my opinion, that definition should in some way indicate that the form, while in use, is considered incorrect and that correct forms are minister or ministerka (although to the best of my knowledge currently only minister is in Polish dictionaries).

On the side note in my opinion such behavior from admins is really discouraging. Especially if someone makes valid factual argument and gets nothing but false accusations in response and fallacious arguments in response. In my opinion admins shouldn't assume bad intentions. 89.64.9.29 01:44, 1 January 2025 (UTC)[reply]

Could you please point out to me where the Polish Language Council calls these forms incorrect? I don't see it in their statement here, for instance. Even if we accept your appeal to authority (which isn't really how things work here), you seem to be misrepresenting what has actually been said. Theknightwho (talk) 02:02, 1 January 2025 (UTC)[reply]

It doesn't even matter if they are proscribed, we include proscribed speech. This is either clearly view-pushing or bad-faith, but I see no reason to give this particular person much more attention. Vininn126 (talk) 02:11, 1 January 2025 (UTC)[reply]

Well, the article you linked has a section that translates as: "[T]he creation of feminine names by changing the inflectional endings, e.g. (ta) ministra, […] is not typical for the Polish word-formation system (the words blonda, szczęściara are clearly colloquial), and it is better to use the traditional suffix model, i.e. the creation of names like doktorka[.]" Doesn't say anything on if ministra is wrong, though. CitationsFreak (talk) 02:14, 1 January 2025 (UTC)[reply]

@CitationsFreak Right, exactly. That's very different from what the other user said. I don't think it even supports the claim that they're colloquial either - it simply implies they may have been modelled on colloquial terms. Theknightwho (talk) 02:50, 1 January 2025 (UTC)[reply]

There are two statements and it's not possible to understand second one without knowing the first one.

In statement from 2012 they say that in general we create feminine forms of names by adding -ka in case of noun names, for example nauczyciel changes into nauczycielka, and by adding -a for adjectival names, for example służący changes into służąca. They say that usage of form with -a is untraditional for noun names, but forms like ministra are being created because traditional forms like ministerka can be seen as colloquial and derogative (literally "showing smallness of the person they refer to") - which is part of broader social and ideological ongoing debate in Poland (as I explained above). Then they say that forms with -a also have cons, they also can be seen as derogative and can be ambiguous with honorifics, which creates conflict with intentions of use (and it's important to know that Polish language is full of formalities and usage of honorifics is normal in daily life). After that they talk about traditional use of masculine forms for both genders. Finally they remind that for years only adjectival names had feminine forms. In the end they say that the usage of feminine forms cannot be enforced in any way, especially not by decision of authorities, nor introduction of any laws.

In the second statement, from 2019, they indicate that for certain names, forming the feminine by adding the honorific pani (i.e. pani doktor) became the norm in the second half of the 20th century; since the 1990s, however, the feminine forms have been gaining popularity. They reject some popular arguments against creating feminine forms (which I totally agree with). Then they say that creating forms by "changing inflectional endings" (i.e. ministra) is "not typical for Polish word-formation system" and explain that such forms are generally colloquial forms of feminine forms (like blondyna compared to blondynka). They also say that traditional forms are better (i.e. ministerka) and that the dominant form is made by adding the honorific (i.e. pani minister).

So you're right that they don't directly call that form incorrect. However they say that such forms were created by changing inflectional endings, which they explained more in their first statement from 2012. That means that the ending which is normally used for adjectival names is being used for noun names. They also explain that such change is typically done to create colloquial form of feminine.

And since the use of an incorrect inflectional ending is a grammatical (inflectional) error ([here in Polish about inflectional errors] - the use of an incorrect ending is at the top of the list), the word created with that error is consequently an incorrect form.

To sum up: the masculine word minister normally has the feminine form as minister (optionally with honorific pani). The correct way - according to rules - to create feminine form is by adding the ending -ka to create the word ministerka. There is also a form ministra which is created by using incorrect ending -a, which by rules should be only used to create feminine forms of adjectival names and when it's used to create feminine forms of noun names it's done to create a colloquial form.

On the side note it's worth noting that the word ministra wasn't really in common use until current coalition government took over in 2023 and enforced its usage. To be more precise there was a short time discussion about it in 2012 when first mentioned statement was published and then nothing until the end of 2023. It seems even the second statement from 2019 didn't create any interest in media. 89.64.9.29 04:12, 1 January 2025 (UTC)[reply]

These are not inflections, though; they are derivations. They may well be derived from inflections, and that might not be typical, but derivations are often unexpected or irregular in pretty much all languages, because they are a new word that has been derived from a pre-existing one in some way, and that can happen in many different ways for many different reasons. You definitely cannot conclude they are the "incorrect inflectional ending", which would only make sense if the word minister were being used with the wrong inflectional endings. That's not what's happening here, though: instead, they've taken the genitive/accusative form, reinterpreted it as a feminine noun, and inflect it in an entirely regular way. There's nothing incorrect or nonstandard about that. Theknightwho (talk) 05:50, 1 January 2025 (UTC)[reply]

The alternative understanding is that it could be a back-formation of ministerka, or simply just a change in gender. You see nouns gain or lose vowels when changing genders in dialects, compare jud. Vininn126 (talk) 05:52, 1 January 2025 (UTC)[reply]

I'm just gonna quote the 2012 statement from RJP here: "The argument that such names are regularly created word-formationally in other languages (e.g. German) is not accurate, because each language uses different ways of enriching itself, which depends on the grammatical structure of a given language, the word-formation possibilities and the customs established in it." 89.64.9.29 06:12, 1 January 2025 (UTC)[reply]

That doesn't make it incorrect - it's simply a statement that it isn't typical in Polish, but that isn't relevant. Theknightwho (talk) 07:07, 1 January 2025 (UTC)[reply]

The admins have a natural prejudice against IP users and this is expected, because many of the IP users are vandals. Arguing with admins isn't a good idea either. You are in no position to change the rules, and Wiktionary lists all attested words, even if they look like https://en.wiktionary.org/wiki/Category:English_leet and are promoted or used only by a small fraction of the population.

You could count on the term getting appropriately labelled and/or categorized. And ministra already has had a "neologism" label. It's possible that this is not enough and some further clarifications were possible similar to the Polish Wiktionary entry for "ministra", but you made a mistake with the "corruption" label that got you blocked. Further angry comments didn't do you any favor, but just confirmed the admin's judgement. --Ssvb (talk) 13:04, 1 January 2025 (UTC)[reply]

prejudice is against the rules. Admins should be impartial and should always assume good intentions unless proved otherwise. Presumption of guilt is against basic human rights. In this situation the admin made a presumption "you don't like the word" and then showed a tendency to interpret everything in a way that would support this presumption and made further assumptions to support this claim. The admin clearly showed the symptoms of power intoxication.
I never argued that I didn't make a mistake with corruption. I clearly stated that I found it in glossary and attempted to use it based on the fact it was there. As new editor I don't need to know everything, but it's hardly a reason to attack someone.
In my comments I appealed to logic and reason while the admin continued to use fallacious arguments. This is the kind of behavior that should never be seen from admins. Everyone has the right to defend themselves against false accusations from admins.

89.64.9.29 16:28, 1 January 2025 (UTC)[reply]

You were not attacked. You were called out for problematic behavior. Not everything is unbiased, and it is possible for you to evoke such comments with your behavior. Vininn126 (talk) 17:07, 1 January 2025 (UTC)[reply]

It's customary on Wikimedia projects to add welcome message for new users on their user page. It's not customary to repeat this message or its parts in a way that can be seen as disparaging or derogatory. The welcome message is usually constructed in a way that new user can find all the rules, guidelines and help they need.

Things can be pointed out in various ways, but there's huge difference between pointing out something (ie. "hey I noticed you tried to use label corruption but such label doesn't exist - could you read more so you can avoid such edits? If you need help with editing you can find it here...") and attacking someone (ie. "I subjectively declare that you don't like a word and ban you. I don't care what you have to say"). 89.64.9.29 18:40, 1 January 2025 (UTC)[reply]

Your behavior to such comments was assuming bad faith on Surjection's part, and I was calling that as well. We see people act this way around words all the time, it's not rude, it's direct. Vininn126 (talk) 18:44, 1 January 2025 (UTC)[reply]

That's called confirmation bias.

There's no doubt that Surjection's behavior violates Wikimedia's Universal Code of Conduct.

There's also no doubt that you both suffer from power intoxication.

And there's also no doubt that you will never see that as this condition blinds you from seeing it. But you prove it with every comment you make. 89.64.9.29 19:03, 1 January 2025 (UTC)[reply]

Teamwork doesn't seem to be your strong point. I don't mean it in a negative way, just the end result looks non-productive and this makes me sad. Still, if you happen to be incompatible with the other contributors, then it's better to quit early rather than become invested in the project, torment yourself and become a hindrance for the others. The admins, not going out of their ways to accommodate everyone, effectively happen to filter out people with potentially problematic non-cooperative personalities. --Ssvb (talk) 08:37, 2 January 2025 (UTC)[reply]

@Benwing2: this is a double edged sword, and it plays both ways. The IP user's story sounds like the new term might be artificially promoted (or in other words "prescribed") by a certain group, based on their desire to displace another term ministerka [12] that they perceive as derogatory. And thus the process might be not entirely natural. The term surely deserves an entry, but a "usage notes" section with some explanations would help non-Polish readers understand the situation. Also maybe it would be useful to find more quotations and verify that the sources are truly independent, ruling out the possibility of them colluding with each other? The opinion of native Polish speakers would be interesting to know. Do many or most of them perceive the new term as natural? I presume that the IP user is a native Polish speaker, who is a little bit upset. On the other hand, there's an audio recording of this term from a native Polish speaker [13], who presumably didn't perceive it as incorrect (unless she just recorded it as the genitive/accusative form of minister).

My personal concern is that the WT:CFI policy can be potentially gamed if somebody is up to no good. Merely three authors with publications in durably archived sources are enough to impose arbitrary new words on multi-million nations, to the excitement of non-native wiktionarists. --Ssvb (talk) 10:22, 1 January 2025 (UTC)[reply]

The word ministra first appeared around 2011-2012. It was promoted by one politician, got some media attention and was even commented on by the Polish Language Council (as already mentioned above), then it disappeared from the general public space for years. Back in 2012 the fact that minister Mucha expected to be called ministra was seen by some as one of her many faux pas (media article in Polish). During that time there was a lot of hate towards Donald Tusk's government from football supporters (and others) in Poland, and Joanna Mucha was also widely criticized and put on memes in a negative context. It's worth noting that despite her request, the general population used the form (pani) minister, although at least some more liberal media (like Onet.pl) were sometimes using the form ministra (after a quick Google search it seems the form was used alternately with minister around 2012, and after 2012 only the form minister was used in articles about her). From 2012 until late 2023 the word ministra was practically non-existent. It was reintroduced after the 2023 election by left-wing politicians (at first only two of them: Agnieszka Dziemianowicz-Bąk and Katarzyna Kotula) and started to be widely used in liberal media and widely criticized by the more conservative part of society. For the majority of the population in Poland it's more natural to use pani minister rather than ministra or ministerka. On a side note, some language authorities in Poland, like Jerzy Bralczyk (Polish Wikipedia), pointed out that in Latin ministra means maid or servant and therefore can be seen as a derogatory form. 89.64.9.29 17:49, 1 January 2025 (UTC)[reply]

Claims like "majority prefer" would need sources, not one's own intuition. Feminatives are quickly gaining popularity, as well. But all that aside, it sort of doesn't matter. The only label that really works here is "neologism", the council doesn't proscribe it, and it has seen enough use to be documented. Any other labels would be based on personal opinion. Vininn126 (talk) 17:53, 1 January 2025 (UTC)[reply]

I understand that it's very hard for you to leave your bias out of this discussion, but if you look at the previously mentioned statement by the Polish Language Council (2019), it clearly states that the honorific form pani minister is dominant. The feminine forms like ministra or ministerka are artificial in the Polish language and (again per the same statement) despite the push in the media to use them, there is wide resistance in general society. In daily conversations people would rather say (pani) minister Kotula than ministra Kotula. I wouldn't expect to see a change in that in the majority of the population for at least 10 to 20 years - but that's just my estimate. In most cases usage of those artificial feminine forms is seen as an ideological rather than a lexical change. 89.64.9.29 18:26, 1 January 2025 (UTC)[reply]

The Polish language council is often quite out of date with a lot of things. By source, I mean an actual study published in a journal, not what a closed group of academics say. Please keep the assumptions and bias and assumption of bad-faith to a minimum. Vininn126 (talk) 18:28, 1 January 2025 (UTC)[reply]

I'm sorry, but who are you to question the authority of the Polish Language Council?

I understand that due to power intoxication (which we already discussed elsewhere) you have a hard time understanding how a discussion works. So let me explain it to you. I made an argument and provided a valid, recognized, authoritative source to support this argument. If you disagree with that argument it's your job, not mine, to provide sources that would support your statements or prove my argumentation to be wrong. Your personal opinions, assumptions and biases aren't valid arguments. Now unless you actually have any factual arguments I consider the discussion with you closed. 89.64.9.29 18:55, 1 January 2025 (UTC)[reply]

Being a descriptive dictionary, and also a good scientist, the average person. You should question those authorities, not agree blindly. Vininn126 (talk) 18:59, 1 January 2025 (UTC)[reply]

Since we're making this discussion off-topic I want to congratulate you on achieving C1 proficiency in Polish. That's definitely quite an achievement for an American.

Now during my 40 years of life in Poland I have never met a single person that would say ministra in casual conversation. Everyone I know will say (pani) minister. The only people to use the form ministra are some politicians and some media and, in the past 12 or so months, some more ideologically left-wing commenters on social media.

And yes - this will probably change as it seems at least some part of Polish society has a need to use feminine forms. But for now the form minister is dominant both in casual conversations and in media (e.g. 1, 2, 3, 4, 5, 6, 7 and so on). And while some media generally use the form ministra we can see that even politicians of the ruling coalition use the form pani minister (as seen for example in this article by TVN where the author uses the form ministra but the quoted politician uses the form pani minister). 89.64.9.29 19:39, 1 January 2025 (UTC)[reply]

How is this relevant to the label? Vininn126 (talk) 20:36, 1 January 2025 (UTC)[reply]

It's as much relevant as your continuous pointless comments. 89.64.9.29 20:51, 1 January 2025 (UTC)[reply]

This entire discussion is getting pointless IMO. IP 89, you need to stop with the insults and bad-faith accusations. Snarky statements like "quite an achievement for an American" don't help. The citations prove that this term is used and IMO the label neologism is totally fine. Benwing2 (talk) 21:04, 1 January 2025 (UTC)[reply]

You misunderstood. My congratulations were sincere. It is quite an achievement as Polish is considered one of the hardest languages to learn for an English-speaking person. 89.64.9.29 21:21, 1 January 2025 (UTC)[reply]

Let's look at what actually has happened:

the IP removed the entire Polish entry at ministra with the summary:

"Ministra" is a genitive or accusative singular masculine form of minister. It doesn't exist as feminine form of the word minister despite the fact that some leftist politicians and media try to enforce it. It's considered an error and as such shouldn't be in dictionary. The correct feminine form is "minister".

Surjection undid this with the summary: "don't remove words just because do not like them, this has quotes to attest that it is used"
Vininn126 added the {{welcome}} template to the IP's talk page with the note:

I highly recommend you read our Criteria for Inclusion, and that of other major dictionaries, before pushing ideologies and making a fool of yourself.

The IP added the labels |colloquial|corruption to both senses
The IP replied to Vininn126 on the IP's talk page:

I highly recommend that you stop recommending things to others as you have neither knowledge nor authority to do so. I neither care nor ask for your opinion.

Surjection undid the IP's edit to ministra with the summary '"corruption" is not a valid label', then blocked the IP from editing that page for one week with the stated reason 'Disruptive edits: "I don't like this word" kind of editing that is not in good faith'
The IP posted on Surjection's talk page:

Stop blocking people just because you don't agree with them. Do you think you're higher authority on Polish language than the Council for the Polish Language, which I put as the source?

Surjection replied:

You're trying to mess with an entry because you don't like the word. Go find something better to do, like contributing new entries

This was followed by arguing back and forth, after which the IP came here.

In hindsight, Vininn126's comment about "making a fool of yourself" was not the best way to address someone who obviously takes this very seriously. Surjection let his annoyance at the IP's high-handed rhetoric color his responses. Given the huge amount of sheer garbage he deals with that people are always trying to hide in our entries, I don't feel comfortable passing judgment on him for that.

The IP, on the other hand, has been acting the part of an admin abusing power without the power: indirectly pulling rank at every opportunity, lecturing everyone about every detail of what they've said, and talking like a one week block from a single page accompanied by a few dismissive replies is a Crime Against Humanity. If Surjection was really into abusing his authority, he could have just blocked them sitewide and taken away their talkpage access. No one but other admins would have been aware of it for weeks.

If the IP had just brought their concerns here with a simple statement that they had been blocked and an explanation as to why they think their version should be accepted, they would have had a much better chance of getting what they wanted. As it is, it's very hard to read their comments here without wanting to reject everything they have to say.

I see their recent comments are much less angry and more helpful. I hope we can all take a deep breath, step back, and work this all out. Chuck Entz (talk) 22:50, 1 January 2025 (UTC)[reply]

I think this is a good summary. Thanks Chuck. Vininn126 (talk) 23:02, 1 January 2025 (UTC)[reply]

How about we just add some usage notes explaining that the 'traditional' forms for "female minister" are ministerka and pani minister? If there's some sort of underlying connotation for the use of the term, that should probably be mentioned as well, right?

This convo feels very similar to the presidenta controversy here in Brazil back when Dilma was president. It's not considered wrong per se afaik, but some groups of people definitely poked fun at it. It's seriously very similar to what someone mentioned above, I thought: 'seen by some as one of her many faux pas(ses?), more often spoken by people leaning left than otherwise'. I added "sometimes proscribed" to the word I mentioned here since indeed some people sometimes proscribe it despite not being authorities on the matter or anything. Could that work here? I don't speak Polish, but I'd imagine you'd be able to find a couple blogs/articles of people poking fun at the ministra, yes? MedK1 (talk) 21:03, 1 January 2025 (UTC)[reply]

Another example that might be more relatable to folks here is the singular they. Though it's perfectly acceptable, we still have "occasionally proscribed" as a label since it is indeed a little controversial, even though unlike ministra, it is not a neologism. And indeed, we even mention that part right in the article! It's very informative. MedK1 (talk) 21:09, 1 January 2025 (UTC)[reply]

Some Polish language authorities like professor Bralczyk (Polish Wikipedia) suggested it can be seen as derogatory due to the fact that in Latin ministra means servant - as a result some people made memes about it. And of course a lot of people, especially the more right-leaning, are making fun of it.

On the other hand professor Miodek approves of all three forms - he says that in his youth feminine forms like ministerka, profesorka etc. were normal and only later did masculine forms gain popularity and come to be considered more dignifying for women. 89.64.9.29 22:11, 1 January 2025 (UTC)[reply]

Those are their musings. Not actual usage. Bralczyk also told people not to say pies zmarł when those people were emotional and Miodek is a notorious normativist who prescribes many things that are not used. Their opinion isn't always gospel. Vininn126 (talk) 22:39, 1 January 2025 (UTC)[reply]

Ahh Bralczyk just quoted my opinion from 20 years ago ;) and he wasn't wrong too. But people got all emotional about it.

But it has nothing to do with the current discussion.

I see you have something against linguistic authorities. First you denied the authority of the Polish Language Council, now Bralczyk and Miodek. Do you only accept those linguists who think like you?

Anyway I gave two examples showing that even among linguistic authorities there's no consensus. Bralczyk opts for ministerka, while Miodek says "use whatever form you prefer".

The fact is that there were memes created after Bralczyk's opinion that were making fun of the word ministra and that there are a lot of people who make fun of this word - which was the question here.

Generally speaking this word is used by a very small group of people (some politicians, some journalists), while the majority of the population uses the form minister. 89.64.9.29 00:30, 2 January 2025 (UTC)[reply]

No, I will often cite certain authorities - I have an issue with sticking to prescriptivism for the sake of prescriptivism, as a descriptive dictionary. WSJP is also prescriptivist, but usually much more neutral and also has good sourcing. But my point is, we are here to document how people talk, not how authorities think we should talk. If they line up, great. It's not about whether they agree with me or not, it's about questioning what they are saying and checking how much validity there actually is in these statements, because they, just like other people, make mistakes, and it's our job here on Wikis to cross-verify and try to find what is actually accurate. I have even added several notes from the language in the past where they made sense, but it needs to be done from the perspective of a neutral observer, rather than telling the reader how they should talk. Vininn126 (talk) 00:35, 2 January 2025 (UTC)[reply]

There's obviously a contention and a debate about this term in Poland, as evidenced by various articles on this topic and the fact that the language authorities had been even asked for their opinion. The Appendix:Glossary#proscribed label description says "Some authorities or commentators recommend against or warn against the listed usage", so it's probably applicable here. The label doesn't seem to imply strict ban or prohibition, merely some commentators warning against the term seems to be enough. The other neologisms don't have this kind of contention around them.

I'm also not comfortable with a dismissive statement that disliking the term is bad. If more than one native speaker happens to dislike the term and doesn't feel like the term sounds natural to them from the linguistic perspective (rather than disliking the thing or phenomenon described by the term), then this probably means something. Even one well educated unbiased native speaker, who is immersed in the language environment, normally has the capacity to make relatively accurate judgements on this matter in many cases. However, having the consensus of more than one native speaker is, of course, much better, as age, profession, and other factors can influence opinions. Having "actual study published in a journal" would be perfect, but this shouldn't be a blocker if such study isn't readily available. How many of the Wiktionary editors are native Polish speakers? Would Wiktionary be interested in attracting more native Polish speakers?

I also happen to dislike aggressive activism. For example, I remember how the git version control system users had been forced to change the primary branch name from "master" to "main" [14], which caused some inconveniences and disrupted workflow, but refusing this wasn't an option because of the slavery and racism allegations. I'm not sure if the activists succeeded in eradicating the term from all other spheres of life, such as the term master's degree, etc. This Polish term introduction also reeks of a similar aggressive activism, peddled by a vocal minority in the name of feminism or whatever. And yes, rejecting this probably isn't an option because of the possible bigotry accusations. I'm a bit old and conservative, I dislike changes for the sake of change, which are enforced via blackmail-like methods. But the youngsters can adapt their speech to the new rules much faster. --Ssvb (talk) 07:07, 2 January 2025 (UTC)[reply]

@Ssvb You need to separate your own personal like or dislike of the term from our duty to report usage, which is the only thing that matters here. Personally, I could not give less of a shit whether a git branch is called "main" or "master", and if it makes some people happier to change it then that's great, because I genuinely don't care either way. It's each person's prerogative to care about what they want, but Wiktionary is not your soapbox to air your personal grievances about these kinds of trivialities. Please let's retain some perspective. I do agree that we should label things appropriately, but the IP user clearly wants us to be normative (explicitly or implicitly), which isn't appropriate. We simply describe usage, and sometimes that entails describing how speakers feel about that usage. Theknightwho (talk) 08:46, 2 January 2025 (UTC)[reply]

@Theknightwho: That's precisely what I do and I'm a neutral party here. I merely point out that the Polish Language Council is a prescriptivist authority. But the active promoters of the new terms effectively act as prescriptivist authorities too. Both of these sides need to be represented in Wiktionary because it's your duty to report usage. And I think that the contention between these authorities should be preferably mentioned too, but maybe I'm asking too much.

Some people claimed that the English term "master" is offensive [15] and this did have a real life impact, such as the git branches renaming. Do you feel like it's your duty to document this somehow in the master article? Leaving your emotions aside, what's your opinion with a Wiktionary editor hat on? --Ssvb (talk) 09:44, 2 January 2025 (UTC)[reply]