Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date discussion was initiated). While we make a point to preserve all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

"Note: Some of these forms may be hypothetical. Not every possible mutated form of every word actually occurs."

This label appears at the bottom of some Celtic mutation templates, such as {{ga-mut}}, {{cy-mut}} and {{gd-mut-cons}}:

Mutated forms of Beer parlour
radical      | lenition      | eclipsis
Beer parlour | Bheer parlour | mBeer parlour

Note: Certain mutated forms of some words can never occur in standard Modern Irish.
All possible mutated forms are displayed for convenience.

Mutated forms of Beer parlour
radical      | soft         | nasal        | aspirate
Beer parlour | Feer parlour | Meer parlour | unchanged

Note: Certain mutated forms of some words can never occur in standard Welsh.
All possible mutated forms are displayed for convenience.

Mutation of Beer parlour
radical      | lenition
Beer parlour | Bheer parlour

Note: Certain mutated forms of some words can never occur in standard Scottish Gaelic.
All possible mutated forms are displayed for convenience.

But I've always felt the wording is somewhat ambiguous. (Also, the warning makes the template far wider than it needs to be.)

I assume the intended meaning of this disclaimer is that not all of the mutated forms are necessarily attested, even though, for every listed form, it is possible to construct a valid sentence that uses that form. If this is the intended meaning I don't think the warning label is required at all and it should be removed. We regularly include declension and conjugation tables for rare verbs in German, French, Latin etc. where some inflected forms may be unattested, but we still list them in the declension template with no disclaimer.

I also note that the Breton and Cornish templates don't include this warning:

Any opinions on removing this message? (Notifying Mahagaja, Mellohi!, Silmethule): This, that and the other (talk) 02:28, 1 November 2024 (UTC)Reply

Yep, agree with this. The warning always felt odd, because it makes it sound like we're including grammatically-invalid forms. Theknightwho (talk) 02:33, 1 November 2024 (UTC)Reply
On a separate point, could we possibly unify the layout of these? I like the Welsh one, but the others just look awful. Theknightwho (talk) 02:42, 1 November 2024 (UTC)Reply
Support removing the note; it is unhelpful. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 02:36, 1 November 2024 (UTC)Reply
I agree with TKW, the warning always read to me as suggesting that some of the forms might be wrong: iff it in fact only means they're not attested (but they're perfectly grammatical), then I agree with removing it; as you say, we don't bother with such notes on the declension tables for German adjectives where perhaps the mixed declension neuter genitive singular is not attested, or is only attested twice and not thrice. (If, on the other hand, some forms are actually avoided by speakers, in the same way that we don't list a plural when one simply doesn't occur, then I think we need a way of suppressing the form and/or providing a clearer note, like "for words starting with xyz, speakers avoid using t-prosthesis and instead use [whatever]".) - -sche (discuss) 02:48, 1 November 2024 (UTC)Reply
They're dictated by what comes before, so all mutable forms are possible in theory; arguably, they're sometimes not even phonemic, but that's a whole separate discussion, and doesn't change the fact that they're an established part of the orthography, so therefore deserve entries. Theknightwho (talk) 02:57, 1 November 2024 (UTC)Reply
Support removing the note and also Support using the Welsh layout for all the languages. Would like to hear from Mahagaja, Mellohi! and Silmethule, each of whom has contributed significantly to various Celtic languages. If there is agreement for this, I can make the changes as I've rewritten some of the headword modules in question (esp. the Welsh one). Benwing2 (talk) 05:30, 1 November 2024 (UTC)
I find unifying the layout for the Celtic mutation templates unnecessary. — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 05:44, 1 November 2024 (UTC)Reply
The layout is less of a problem than the colour scheme. The Welsh table matches far more of the recommended accessibility guidelines, and as a result is much easier on the eye. Theknightwho (talk) 05:52, 1 November 2024 (UTC)Reply
@Mellohi! IMO the current styling of the non-Welsh templates looks amateurish, something straight out of early-2000s hardcoded HTML tables. Exact unification isn't necessary but I'd like the overall look to be more similar to the Welsh template: get rid of unnecessary cell borders and shadows, etc. Benwing2 (talk) 06:35, 1 November 2024 (UTC)

I included this note because some mutated forms genuinely don't exist. For example, in Irish, adjectives never undergo eclipsis, so a form like gcairdiúil (eclipsis of cairdiúil) can never appear. (The only exceptions are the handful of adjectives that precede their nouns, like príomh-, whose eclipsed form bpríomh- does appear.) Most finite verb forms never take h-prothesis: I can't think of a context in which the form himíonn (h-prothesis of imíonn) would appear. I'm pretty sure the imperative and the autonomous past indicative are the only verb forms that undergo h-prothesis. In the standard language, only nouns and preposed adjectives like sean undergo the special lenition of s to ts, because it only occurs after the definite article. The {{ga-mut}} template already has |1=msn to restrict t-prothesis to masculine singular nominative nouns (the only context where it occurs), but the {{gd-mut-vowel}} template doesn't, even though t-prothesis in Gaelic is restricted in the same way as it is in Irish. So yes, these templates generate forms that are not grammatically possible, which is why the disclaimer is there. As for its width, I originally put a line break in the text so it would be two lines long and less wide, but someone removed it years ago. {{sga-mutation}} still has the line break. —Mahāgaja · talk 07:28, 1 November 2024 (UTC)

So IMO this sort of disclaimer is kind of a cop-out; instead the templates should be modified to not generate truly impossible forms, and the disclaimer removed. Having this disclaimer there adds no useful information; if a learner of the language doesn't know which forms are impossible and which are possible but rare, they certainly won't learn that (or anything else) from such a disclaimer. But it's important to distinguish between things that are truly impossible and things simply so rare that they are not likely to be found in any corpus. An example is vocatives in Ukrainian, Czech or other Slavic languages that preserve the vocative; it's very rare that someone will use the vocative case when addressing an inanimate object, so for most inanimate objects you won't ever find the vocative in any given corpus, but examples do exist and there's nothing theoretically preventing someone from addressing an inanimate object (esp. in poetry or poetic language). And in general we don't add any disclaimer by Ukrainian or Czech vocatives of inanimate objects stating that they are rare; this is something we assume the reader can figure out. In your example of cairdiúil, is it truly syntactically impossible for it to precede a noun, or merely rare? In the latter case, I would argue we should keep the mutation and remove the disclaimer; in the former case, fix the code to not overgenerate the mutation, and once again remove the disclaimer. Similarly for restricting t-prothesis to the masculine singular nominative forms. Benwing2 (talk) 08:24, 1 November 2024 (UTC)Reply
To the best of my knowledge, it's syntactically impossible (or, in linguistics jargon, ungrammatical) for cairdiúil ever to precede the noun it modifies, but I'm not a native speaker. I also don't think it's possible for it to be substantivized (used like a noun), but again, I'm not a native speaker, and for all I know it's possible in poetry or other exceptional circumstances. Making the templates powerful enough not to generate impossible forms is a great idea in principle, but in practice, going through all existing uses of all templates included in Category:Mutation templates by language and marking them for which mutations are grammatically possible and which are not would be an overwhelming task, and while some of it could possibly be done by bot, I think most of it would have to be done by hand. —Mahāgaja · talk 08:42, 1 November 2024 (UTC)Reply
This should have been done years ago with manual overrides instead of papering over the issue with an unhelpful disclaimer, in my view. Not everything has to be automatable to be implemented. Theknightwho (talk) 13:53, 1 November 2024 (UTC)Reply
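For illustration only, the manual-override idea raised above could look roughly like the following Python sketch. It is not how {{ga-mut}} or any existing module works: the function name `irish_mutations` and the `suppress` parameter are hypothetical, and the rules are deliberately simplified (no h-prothesis, no t-prothesis, and a rough rule for unlenited s).

```python
# Illustrative only: simplified Irish initial mutations with per-entry overrides.
# `irish_mutations` and `suppress` are hypothetical names, not template parameters.

LENITABLE = set("bcdfgmpst")
# Eclipsis is written by prefixing one or more letters to the radical.
ECLIPSIS_PREFIX = {"b": "m", "c": "g", "d": "n", "f": "bh", "g": "n", "p": "b", "t": "d"}
VOWELS = set("aeiouáéíóú")
# Simplifying assumption: s stays unlenited before these consonants.
S_BLOCKERS = set("cfmptv")


def irish_mutations(word, suppress=()):
    """Return a dict of mutation name -> form, omitting any names in `suppress`."""
    forms = {"radical": word}
    first = word[:1].lower()
    second = word[1:2].lower()

    if first in LENITABLE and not (first == "s" and second in S_BLOCKERS):
        # Lenition is spelled by inserting h after the initial consonant.
        forms["lenition"] = word[0] + "h" + word[1:]

    if first in ECLIPSIS_PREFIX:
        forms["eclipsis"] = ECLIPSIS_PREFIX[first] + word
    elif first in VOWELS:
        # n- before a lowercase vowel, bare n before a capital.
        forms["eclipsis"] = ("n" if word[0].isupper() else "n-") + word

    # Manual override: drop the forms an editor has marked as impossible.
    for name in suppress:
        forms.pop(name, None)
    return forms


print(irish_mutations("cairdiúil", suppress={"eclipsis"}))
print(irish_mutations("príomh"))
```

With an override like `suppress={"eclipsis"}` on an individual entry, the table would simply omit that column, which is the case-by-case marking that would have to be done by hand.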
Adjectives used to be eclipsed in the genitive plural, e.g. ar bruach innbhir na n-éigne mbán, an example from a text called Aisling na Binne Buirbe from 1679. I don't know when this stopped being the case and whether this usage justifies our templates showing such forms. —Caoimhin ceallach (talk) 19:32, 1 November 2024 (UTC)Reply
That's true; in Old Irish adjectives were also eclipsed after neuter singular nouns. Both types of adjective eclipsis might well be found in place names and possibly fossilized phrases. In my opinion, this is an argument in favor of overgenerating mutated forms. It's probably better to have the template produce forms that are predicted to be nonexistent but might simply be very rare or archaic or nonstandard (since you never know what might be lurking in the darkest corners of a language) than to tailor it to avoid them. —Mahāgaja · talk 20:06, 1 November 2024 (UTC)Reply
Does this also apply to colloquial mutations? I'm particularly thinking of Welsh tsips → jips, but this can theoretically apply to any term starting with tsi, even though they're rarely written that way as it's not part of the literary language. (t)siec → jec is another common one in speech (or used to be when people still used cheques, anyway). Theknightwho (talk) 00:36, 2 November 2024 (UTC)
I think a case could be made to include ts → j as a colloquial mutation in the table, especially if it's found in writing, but this thread isn't the place for that discussion. —Mahāgaja · talk 07:42, 2 November 2024 (UTC)Reply
@Mahagaja thanks for the insights. I can see I was wrong! It's clear to me that, if nothing else, the message needs to be reworded. Here's my attempt: "Certain mutated forms of some words can never occur in standard Modern Irish. All possible mutated forms are displayed for the convenience of the reader." (broken over two or three lines as needed). This, that and the other (talk) 10:01, 2 November 2024 (UTC)Reply
Well, it would be more honest to say, "All possible mutated forms are displayed because customizing the template to show only the truly extant mutated forms of every single word is beyond our technical capabilities," but your version is more concise. More concise still: "All possible mutated forms are displayed for convenience", without specifying whose convenience. —Mahāgaja · talk 10:44, 2 November 2024 (UTC)Reply
I'm not sure this kind of disclaimer is necessary at all, really. It's up to the reader to determine whether a form can or can't be used in a given situation. Theknightwho (talk) 14:30, 2 November 2024 (UTC)Reply
By the way, in some Munster dialects is used instead of nach and it causes h-prothesis. So you could very well have himíonn sé go moch? “Doesn't he leave early?” —Caoimhin ceallach (talk) 10:49, 2 November 2024 (UTC)Reply
True; thanks for the reminder! —Mahāgaja · talk 13:49, 2 November 2024 (UTC)Reply
Is there anything to be said for setting up the templates so they only generate a link if there is a pre-existing entry for that form? I don't mean the current situation where a link appears in black text - I mean literally not creating one unless an entry exists.

I don't really see the utility of including either an entry for ddyddiau or a link to it, as it doesn't really mean anything in itself. It's a different matter for terms that do often exist in mutated form without a "trigger" (like bob as a form of pob) or are homophonous with another term (like bâr being both its own lemma and the soft mutation of pâr, or foch being the soft mutation of both boch and moch).

The status quo also seems to prompt some users to mass-generate mutated forms of a word, but not quite all of them. Which leads to mutated forms, imo needlessly, filling up Jberkel's "wanted terms" lists. Generally I don't like to use editor convenience as a rationale, but in this case I don't see how having lots of mutated forms helps anyone outside the situations I mentioned. Arafsymudwr (talk) 20:44, 2 November 2024 (UTC)Reply
Having entries for mutated forms is very helpful to learners, especially in a language like Welsh where the radical form is often not easy to recover from the mutated form. Someone just learning Welsh may encounter a word beginning with f and not know if the radical starts with b or m, or a word beginning with l or r and not know if the radical starts with ll/rh or gl/gr, or a word beginning with a vowel and not know if the radical starts with that vowel or with g. In Irish and Scottish Gaelic it's a little easier, since the spelling of the mutated form almost always gives a clue to the spelling of the radical. I would not be happy with a template that doesn't show mutated forms unless a Wiktionary entry exists, since most valid mutated forms do not currently have entries. I don't object to removing the disclaimer, though, if most people feel it does more harm than good. —Mahāgaja · talk 13:22, 3 November 2024 (UTC)Reply
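To make the learner's problem concrete, here is a minimal Python sketch that lists the radicals a Welsh form could go back to. It is an illustration under stated assumptions, not a description of any Wiktionary module: the name `candidate_radicals` and the lookup tables are hypothetical, only the ordinary soft, nasal and aspirate mutations are covered, and h-prothesis and colloquial mutations are ignored.

```python
# Illustrative only: candidate radicals for a possibly mutated Welsh form.
# All names and tables here are hypothetical and intentionally incomplete.

# Multi-letter initials, longest first, so that "dd" is not read as "d".
INITIALS = ["ngh", "ch", "dd", "ff", "ll", "mh", "ng", "nh", "ph", "rh", "th"]

# Mutated initial -> radical initials it may come from.
RADICALS = {
    "b": ["p"], "d": ["t"], "g": ["c"],          # soft mutation of p, t, c
    "f": ["b", "m"], "dd": ["d"],                # soft mutation of b/m, d
    "l": ["ll", "gl"], "r": ["rh", "gr"],        # soft mutation of ll/gl, rh/gr
    "mh": ["p"], "nh": ["t"], "ngh": ["c"],      # nasal mutation of p, t, c
    "m": ["b"], "n": ["d"], "ng": ["g"],         # nasal mutation of b, d, g
    "ph": ["p"], "th": ["t"], "ch": ["c"],       # aspirate mutation of p, t, c
}
VOWELS = set("aeiouwyâêîôûŵŷ")


def candidate_radicals(form):
    """Return the form itself plus every radical it could be a mutation of."""
    word = form.lower()
    initial = next((i for i in INITIALS if word.startswith(i)), word[:1])
    rest = word[len(initial):]
    candidates = [word]  # the form may simply be unmutated
    for radical in RADICALS.get(initial, []):
        candidates.append(radical + rest)
    if initial in VOWELS:
        # Soft mutation deletes g, so a vowel-initial form may hide one.
        candidates.append("g" + word)
    return candidates


print(candidate_radicals("foch"))
print(candidate_radicals("ddyddiau"))
```

A form like foch yields two possible radicals, whereas ddyddiau yields only one, which is the contrast between hard-to-recover and transparent mutations discussed in this thread.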
I can see it for words beginning with f-, l-, r- or a vowel. Less so for words beginning with dd-, nh- and so on where I find it hard to believe anyone interested in Welsh would not recognise these as mutations with an obvious radical form. But I'm getting the feeling I might be alone in thinking this. Arafsymudwr (talk) 16:30, 3 November 2024 (UTC)
I'd be strongly against only showing the mutated forms that we have entries for. That would just lead to entries having a hodge-podge of mutations of no use to anyone, because you wouldn't be able to trust if the table was complete or not. The fact that mutations aren't always regular means that there is value in having these, just as there's value in having all the -s plurals in English. Theknightwho (talk) 16:57, 3 November 2024 (UTC)
@Mahagaja: While I agree in general that the templates do generate some forms impossible in the language, a note regarding adjectives: that’s not true. Even if not according to the caighdeán rules, adjectives definitely get eclipsed in more traditional texts (19th century, early 20th century, and even in modern books when more archaizing style is employed, I know people writing like that sometimes) after genitive plurals (things like na bhfocal ndeacair, na mban bhfionn, etc.), you also get old accusatives like leis an bhFear nDubh in some Peadar Ua Laoghaire’s books (20th century!) – and we do consider those to be very much Modern Irish. But it’s true that some finite verbal forms will never get h-prefix (like regular non-autonomous past verbs), though note the mentioned above Munster that does prefix h- to other forms in Munster. // Silmeth @talk 14:25, 4 November 2024 (UTC)Reply
All the more reason to allow the template to continue to generate all mutated forms, including unexpected/rare/nonstandard ones. But the question at hand is, do we (1) eliminate the disclaimer, (2) rephrase the disclaimer, or (3) keep the disclaimer as is? —Mahāgaja · talk 14:49, 4 November 2024 (UTC)Reply
@This, that and the other These recent changes are a downgrade for the Welsh mutations. Could you please explain why you did this? Theknightwho (talk) 16:21, 11 November 2024 (UTC)Reply
@Theknightwho Here are my motivations and explanations:
  • The previous iteration of {{cy-mut}} occupied a fixed percentage of the page's width, which doesn't make sense. People view Wiktionary pages at many different widths, so a lot of users saw vast amounts of blank space around the mutations, while for others it was cluttered. The table now adapts to the width of its content, no smaller or larger. Of course this could have been fixed by simply changing the existing inline CSS of the previous table design, but see the next point.
  • There was a desire expressed by a few users to create a standard look for inflection tables. In that discussion, there was general agreement that borders should be used to delineate entries in tables like this. Rather than continuing to maintain various pieces of custom CSS in different locations around the wiki, I felt it would make more sense to work from a single basic template, of which I shared a prototype in the discussion above, WT:Beer parlour/2024/October#Towards a Standardization of Inflection Tables.
  • The Celtic mutation templates had wildly different looks despite conveying the same information. Benwing above pointed out that it would make sense to unify the visual appearance - although he did express a preference for the previous design of the Welsh template. But it's impossible to please everyone!
Could you expand on what you mean by "downgrade"? This, that and the other (talk) 22:43, 11 November 2024 (UTC)Reply
What was your reasoning for the choice of appearance for the unified template? IMO it looks worse than before, at least for Welsh, and clashes with the general tendency that tables have been moving towards. Benwing2 (talk) 23:55, 11 November 2024 (UTC)Reply
@This, that and the other Getting rid of fixed width and unifying the appearance are both fine, but at the risk of this being spread across two different threads, my big issue is what @Mellohi! pointed out in Wiktionary:Beer parlour/2024/October#Towards a Standardization of Inflection Tables, which is the intrusive and unnecessary border, as well as what @Benwing2 points out about this clashing with the general appearance templates have been moving towards. I'm also not keen on your approach of using top and bottom templates, when declension templates by their very nature are individual templates that can be wholly encased within another template call, which would give us a lot more control, so I don't really understand why you've taken that approach either. For instance, you've added a provisional |tall=yes parameter, but that's something we should be able to determine automatically. Theknightwho (talk) 00:19, 12 November 2024 (UTC)Reply
@Benwing2 I'm curious about "the general tendency that tables have been moving towards". Could you share some examples? I certainly haven't observed any movement in one direction or another. I do see some pushback regarding the double border, which I can certainly look at suppressing.
@Theknightwho my number one aim is to make this template easy to use, so that those working in minor languages who need to create simple inflection tables can do so without needing to use raw HTML elements like <div> or direct CSS syntax in their wikitext. This is why I made it so that users can use standard wikitable syntax. Of course it does create limitations - the template not knowing how many rows it contains is chief among these. However, if you feed the entire table contents into Lua, you lose the ability to use the standard table syntax without at least some modification (e.g. replacing = with {{=}}). Plus, I don't have the hours to devote to writing such a complex module! This, that and the other (talk) 01:37, 12 November 2024 (UTC)Reply
@This, that and the other If you look at the general tendencies in UI design over the years, there's been a clear trend towards more and more minimalistic design. If you remember or have seen pictures of old Mac OS and Windows UI design, it was filled with borders, shadows etc. to give a more 3-d look. Now everything is flat 2-d and even borders have tended to go away in favor of simple rectangles of different color. (FWIW this mirrors a trend in art from c. 1800-1950, going from the lush Neoclassicism of artists like Ingres to the utter minimalism of Donald Judd. Since then, artistic styles have fragmented, with no single dominant style at least in painting; I wonder if this will eventually happen to UI design as well.) Benwing2 (talk) 04:57, 12 November 2024 (UTC)Reply
@Benwing2 since I started converting the Irish declension templates to the new system just now, I figured I should respond to your points. I thought you were making a specific point about Wiktionary, but I can see your observation was more general - and I definitely see where you're coming from.
I'm not at all wedded to the use of borders in the inflection tables - in fact, one of the great things about standardising inflection tables is that the overall look can be changed in one place. But there are a couple of reasons why borders are in use as of this moment:
  1. In the October BP discussion I presented two options: Style A without borders and Style B with borders (User:This, that and the other/inflection table standardisation). No-one said they liked Style A; those who expressed a view all preferred Style B.
  2. If you don't have borders, you need lots of negative space between the rows and columns to guide the eye. This means your inflection tables start to get bulky very quickly. The Romance verb conjugation templates like {{la-conj}} try to do this, but to be honest, they don't pull it off very well. I find it quite difficult to follow the rows along, without any borders to guide my eye. Sometimes I find myself counting off the rows to work out which one I'm looking at! (I suppose another alternative solution would be row banding or striping - namely, a slightly darker background colour on every second row - but there doesn't seem to be a lot of precedent for using this technique on Wiktionary.)
And even if borders continue to be used, the borders themselves can be tweaked, for example, made paler, if that would be helpful. This, that and the other (talk) 13:04, 21 November 2024 (UTC)Reply

Proposal: Adopting Inflectional Tables Based on Modern Morphological Views for Japanese

Hello, I would like to inquire whether it would be appropriate for Wiktionary to consider adopting the inflectional tables based on morphological views proposed by Russell, Vovin, and others, particularly with regard to both Middle and Old Japanese.

In Russell's work, "A Reconstruction and Morphophonemic Analysis of Proto-Japonic Verbal Morphology," it is stated:

As for morphophonemic analyses, the traditional (kokugogaku) style of analysis tends to be hindered by Japanese orthography, and is not helpful to the present study. The problem that Japanese orthography introduces is that since one kana equals one syllable, and since morpheme boundaries often occur mid-syllable, it is not possible to indicate where morpheme boundaries are.

As far as I know, foreigners learning Japanese generally do not use kokugogaku grammar (whereas native Japanese speakers are taught it). Additionally, there exist works aimed at linguists that introduce the grammar of Japanese (whether modern, middle, or old). However, these works tend to focus primarily on modern linguistic analysis, often addressing kokugogaku analysis only in a supplementary manner.

It is important to clarify that this proposal does not advocate for the complete replacement of existing traditional tables. I believe that the optimal scenario is one of coexistence, where each approach serves its distinct purpose.

Therefore, would you be open to the possibility of incorporating additional table templates?

Thank you for your consideration. Σ>―(〃°ω°〃)♡→L.C.D.-{に〇〇する}-14:48, 2 November 2024 (UTC)Reply

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Fish bowl, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, Shen233, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2, Theknightwho): lattermint (talk) 14:59, 2 November 2024 (UTC)Reply
Support; see also: Wiktionary talk:About Japanese#Conjugation table, Wiktionary talk:About Japanese/Conjugation? —Fish bowl (talk) 22:33, 2 November 2024 (UTC)Reply

Rhymes in Welsh

As with most languages, we are able to add rhyme information to the pronunciation in Welsh.

Currently the policy is to follow Northern Welsh rhymes, as Northern Welsh makes more distinctions - with the exception of following Southern Welsh in contrasting /s/ ≠ /z/ and /ŋ/ ≠ /ŋɡ/.

I would like to propose changing this policy so we also follow Southern Welsh wrt vowel length, as in this respect too, Southern Welsh makes more distinctions than Northern.

E.g. there is no 100% reliable way of knowing if a stressed vowel is long before /l/ and /n/ in Southern Welsh (classic examples are celyn /ˈkeːlɪn/ and calon /ˈkalɔn/).

Pinging @Llusiduonbach and @Linguoboy for their thoughts. Arafsymudwr (talk) 20:22, 2 November 2024 (UTC)Reply

I support the suggestion. When I started the Welsh rhymes pages, I ignored the vowel length distinctions of Southern Welsh because I don't have much info beyond the spelling to go on, but if people who are more familiar with Southern Welsh pronunciation want to introduce the distinction, go for it! —Mahāgaja · talk 13:38, 3 November 2024 (UTC)Reply
Thanks. To be honest the functional load of vowel length is very low even in Southern Welsh, so not much is likely to change anyway! How can I set up a vote on this? Arafsymudwr (talk) 16:25, 3 November 2024 (UTC)Reply
@Arafsymudwr IMO you don't need a vote for this. You just need to get consensus among the Welsh-language editors. Benwing2 (talk) 07:35, 4 November 2024 (UTC)Reply
OK, having got support for this, it's quite a big task to do without a bot, and I don't know how to use bots.
For example, basically all the rhymes followed by a voiced consonant in Category:Welsh_rhymes/a- (other than /m/, and sometimes /l, n, r/) or a fricative (other than /ɬ, s/) would need to be shifted to rhymes in Category:Welsh_rhymes/aː-. Any exceptions (tens of them at most, and mostly very transparent loanwords) could be moved case-by-case back to the short vowel category.
Rinse and repeat for other vowels. Arafsymudwr (talk) 21:31, 7 November 2024 (UTC)Reply
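As a rough illustration of the triage described above, the following Python sketch sorts a rhyme into "shift", "review" or "keep". It is only a reading of the rule as stated in this thread: the function name `triage` and the consonant sets are assumptions that Welsh editors would need to check, and an actual run would go through a bot or a tool such as AWB/JWB rather than this code.

```python
# Illustrative only: triage Welsh rhymes for a possible move from a short-vowel
# category to the matching long-vowel one. The consonant sets are assumptions
# read off the description in this thread, not an agreed specification.

LENGTH_MARK = "ː"
# Voiced stops and fricatives, plus fricatives other than /ɬ, s/.
SHIFT = {"b", "d", "ɡ", "v", "ð", "f", "θ", "χ"}
# /l, n, r/ are "sometimes" exceptions; /ŋ/ is not mentioned in the thread
# and is flagged for review here as a cautious assumption.
REVIEW = {"l", "n", "r", "ŋ"}


def triage(rhyme):
    """Return (action, rhyme), where action is 'shift', 'review' or 'keep'."""
    vowel, following = rhyme[:1], rhyme[1:2]
    if following == LENGTH_MARK:
        return ("keep", rhyme)  # already in a long-vowel category
    if following in SHIFT:
        return ("shift", vowel + LENGTH_MARK + rhyme[1:])
    if following in REVIEW:
        return ("review", rhyme)  # needs a human decision
    return ("keep", rhyme)  # e.g. before /m/, /ɬ/, /s/ or a voiceless stop


for rhyme in ["adɛn", "alɔn", "amɛr"]:
    print(triage(rhyme))
```

Anything returned as "review" corresponds to the cases where vowel length cannot be predicted from the following consonant, and the loanword exceptions would still need moving back by hand.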
@Arafsymudwr Maybe AWB or JWB could help you automate this. Benwing2 (talk) 23:33, 9 November 2024 (UTC)Reply
Seems like a reasonable suggestion to me, but I don't know Southern Welsh length distinctions well enough to contribute. Linguoboy (talk) 16:11, 4 November 2024 (UTC)Reply
Support 0DF (talk) 23:33, 12 November 2024 (UTC)Reply

Presentation of Middle Chinese and Old Chinese readings

At present, the way we present Middle Chinese and Old Chinese transliterations is kind of weird. See, e.g. the etymology at Zen. The "MC"/"OC" being in italics and part of the brackets is confusing - it looks like MC is part of the pronunciation. Do we really need the MC in there at all (can't we get away with just having the "derived from Middle Chinese" label before the character itself?), and if we do can we at least edit Module:ltc-pron and Module:och-pron to make the label clearer?

So instead of:

(MC dzyen)

maybe something like

(MC: dzyen)

I don't know anything about Chinese, so if this is standard formatting for Old Chinese/Middle Chinese transliterations, just ignore this. Smurrayinchester (talk) 17:21, 6 November 2024 (UTC)Reply

Funnily enough I had been thinking about this recently too. The convention in the literature appears to be to write "MC" in roman (not italic) but not to use a colon, like so:
禪 (MC dzyen)
The Chinese workgroup is very large but I will ping it anyway to get further insights: (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏, LittleWhole): This, that and the other (talk) 08:49, 10 November 2024 (UTC)Reply
Agreed that the current format is a bit confusing. I think there's a tooltip for "MC" currently, but that's not very easily accessed. We should format it with "MC/OC" not italicized, and the link definitely would help. — justin(r)leung (t...) | c=› } 06:18, 11 November 2024 (UTC)Reply

Adding (many) new colors to the palette

In Wiktionary:Beer parlour/2024/October#Towards a Standardization of Inflection Tables, one of the concerns I raised (and some other editors agreed with) is that MediaWiki:Gadget-Palette.css is currently too small to be practically used to support e.g. inflection templates.

I've come up with a possible set of colors to add. This is not a small addition; there would be a total of 160 new colors, which can be grouped into 16 base colors and 10 contrast levels for each color. The most recent version of this can be seen in User:Surjection/swatch2.

These new colors are designed with contrast restrictions in mind. All colors with numbers 0 through 4 meet WCAG contrast requirements at an AAA grade, with a contrast ratio of at least 7.5:1 against the default text color in both light and dark modes. All colors with numbers 0 through 6 meet them at an AA grade, with a contrast ratio of at least 4.5:1.
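For readers who want to check a swatch themselves, here is a minimal Python sketch of the WCAG 2.x contrast-ratio calculation. The function names are hypothetical, the two threshold constants are the figures quoted in this post rather than values taken from the palette's code, and any colours passed in are up to the caller; nothing here reads the actual palette.

```python
# Illustrative only: WCAG 2.x contrast ratio between two sRGB hex colours.
# Function names are hypothetical; thresholds are the figures quoted in the post.

POST_AAA_TARGET = 7.5  # ratio the post gives for levels 0 through 4
POST_AA_TARGET = 4.5   # ratio the post gives for levels 0 through 6


def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a colour written as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio (lighter + 0.05) / (darker + 0.05); order does not matter."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


def grade(ratio):
    """Classify a ratio against the two targets quoted in the post."""
    if ratio >= POST_AAA_TARGET:
        return "AAA"
    if ratio >= POST_AA_TARGET:
        return "AA"
    return "fail"


print(grade(contrast_ratio("#000000", "#ffffff")))
```

Each swatch would need checking twice, once against the light-mode text colour and once against the dark-mode one, since the post states the targets for both modes.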

cc @Ioaxxere as the creator and main maintainer of the palette. — SURJECTION / T / C / L / 10:45, 7 November 2024 (UTC)Reply

Thanks for your efforts. Feedback: if I view that page from a mobile device (in dark mode) all of the text is legible; however, when I view that page in either light mode or dark mode (Vector 2010 or Vector 2022) on a computer, the most extreme 1-2 cells are illegible. In the first table (black text on a coloured background), level '9' is illegibly dark (in some colours, such as blue, level '8' is also very hard to read); similarly, in the second table, (coloured text on a white background) level '0' is impossible to read and level '1' is difficult. In the third table (white text on a coloured background), level '9' is hard to read, and in the last table (coloured text on a black background), level '0' is impossible to read (even if I turn my screen's brightness way up), and level '1' is also impossible to read unless I turn my screen's brightness way up. (Level '2' coloured text on either a white or dark background is not very easy to read, either, although I can make it out.) - -sche (discuss) 22:46, 7 November 2024 (UTC)Reply
This is a known issue. Anything past -6 is never really meant to be used as a background color, and anything below maybe -4 or so as a text color (with default text and background colors, respectively, anyway). They are there mostly for completeness' sake. — SURJECTION / T / C / L / 13:09, 8 November 2024 (UTC)Reply
Support Thanks for this. I was thinking of doing something similar, but wasn't sure about the best way to execute it, so I'm glad you did it instead. I do have similar concerns as -sche above. I understand only numbers 0-6 are meant to be contrast compliant, but then I do wonder what the use-case would be for the darker colors? I also wonder what this addition would mean for the colors already present in the palette. Would these colors be added alongside the current ones or in place of them?
Stujul (talk) 10:08, 8 November 2024 (UTC)Reply
The darker colors could be used with inverted text colors, for decorative elements that have no text at all or as border colors. — SURJECTION / T / C / L / 13:08, 8 November 2024 (UTC)Reply
User:Surjection/swatch3 has lighter colors (on light mode, darker on night mode), since it tries to adhere to the stricter APCA contrast. I chose 75 for -4, as it is the minimum requirement for body text according to APCA-RC Bronze Simple Mode. -0 to -2 has 90 which is 'preferred'. — SURJECTION / T / C / L / 12:58, 10 November 2024 (UTC)Reply
Done. See Wiktionary:Palette/numbered. — SURJECTION / T / C / L / 15:10, 10 November 2024 (UTC)Reply
Very much support, thanks for helping expand the palette. Ioaxxere (talk) 03:25, 12 November 2024 (UTC)Reply

Dative reflexive verbs

(Notifying Matthias Buchmeier, -sche, Jberkel, Mahagaja, Fay Freak, Fytcha, Helrasincke): Despite this being a feature (whose examples are few in number but greater in frequency) of (as far as my knowledge goes) German and Romanian, our system has never accommodated these verbs with a category and a label. I think it uncontroversial to create Category:Dative reflexive verbs by language as a subcategory of the one for reflexive verbs (unless a language-specific approach will be preferred). Would a ‘dative reflexive’ label be equally uncontroversial? ―⁠K(ə)tom (talk) 11:57, 9 November 2024 (UTC)Reply

This is not specific to these two languages. French has it too, as well as Polish, and many other European languages as well I'd wager. It depends on whether the base verb is direct transitive or prepositionally transitive (compare se suivre < suivre quelqu’un vs. se succéder < succéder à quelqu’un). PUC 12:11, 9 November 2024 (UTC)Reply
Of course it’s not exclusive. I must say, however, that I fail to see how the French example relates to the phenomenon I have in mind. Just to clarify: to take German vorstellen as an example, regular (accusative) reflexive use would literally be ‘to present oneself = to introduce oneself’, whereas the dative reflexive would be ‘to present to oneself = to imagine’. ―⁠K(ə)tom (talk) 12:55, 9 November 2024 (UTC)Reply
The label category would be useful. The concern of PUC partially applies, as I look at examples of alleged reflexive verbs in dative. sich etwas vorstellen, sich etwas überlegen can hardly be used with another person than the subject as the patient, but sich einen runterholen obviously can and is already labelled as dative reflexive however: Qehath authored it thus comprehensively in 2009 already. The most of the other examples for “reflexive verbs in dative“ in the linked lists and others are presented for didactic rather than lexicographic conclusion, to teach collocations, pragmatics, and fluency, I warn, not being fain to discern merits of separate sense lines in them, for the case that anyone is to step up to gather them, which until this point has been fulfilled accurately in individual cases by virtue of the intuitions of our excellent editors. Fay Freak (talk) 16:43, 9 November 2024 (UTC)Reply
@Ktom: This change would make sense to me.
@Benwing2: Is this something that would be made obsolete by the object template you have in the pipeline? — Fytcha T | L | C 21:00, 20 November 2024 (UTC)Reply
@Fytcha The object template has actually been deployed and is in pretty heavy use, see {{+obj}}. I think they are somewhat orthogonal; you could potentially use {{+obj}} to flag something as having a dative reflexive but it doesn't (currently) categorize. I'd be in favor of adding a category like LANG dative reflexive verbs but we'd have to think of the best way to get such verbs properly categorized; probably the best way is through a label. Many Slavic languages already have labels like reflexive-si for dative reflexive verbs (see e.g. Module:labels/data/lang/cs), which can easily be modified to categorize appropriately. Note also that {{de-verb}} and {{de-conj}} have special support for notating accusative and dative reflexive verbs; you can see an example of the former in the documentation for {{de-verb}} by searching for "accpron" (the example is of sich auf seinen Lorbeeren ausruhen). Benwing2 (talk) 23:07, 20 November 2024 (UTC)Reply

srn-IPA

@Kaartje, Rakso43243, Appolodorus1 Looking for consensus before this gets deployed. Template is at {{Template:User:Saph668/srn-IPA}} right now. -saph668 (usertalkcontribs) 18:30, 9 November 2024 (UTC)Reply

Looks very impressive!
  • To generate IPA for pre-1986 ("Dutch") spellings seems a bit redundant to me as we ideally don't have those spellings as lemma forms.
  • My knowledge of IPA is rudimentary but the way people pronounce dy/dj also often sounds like ɟ to me.
  • How would one generate the geminated consonant in mama, wowoyo, etc.?
  • Would this be added automatically everywhere with a bot? Because there is a question what to do with elisions in phrases and some univerbations like for instance sanede and no kosi kaiman mama fosi yu abra liba --Appolodorus1 (talk) 00:30, 10 November 2024 (UTC)Reply
  1. We already have some existing Dutch entries (at CAT:Sranan Tongo superseded forms) and I'd like to have compatibility for those.
  2. I can add that - it's a little hard to find information online about phonetics outside of WT:ASRN.
  3. Those are inputted as {{srn-IPA|m'ma}}, {{srn-IPA|w'woyo}} which output these:
    • IPA(key): /ˈmːa/, [ˈmːa̠], [ˈmːɑ̟] (though, I just noticed there's an issue adding the stress in the first one there, I'll have to fix that)
    • IPA(key): /ˈwːojo/, [ˈwːʊ̞jʊ̞], [ˈwːɔ̝jɔ̝]
  4. I'm not sure how possible deploying automatically is; from the looks of it there are only 64 total multiword terms and univerbations so someone could just go over all of those manually.
-saph668 (usertalkcontribs) 11:59, 10 November 2024 (UTC)Reply
From this PetScan there are 1220 pages excluding multiword terms + univerbations, then filtering that down to exclude ambiguous syllable boundaries there are 1187 pages which it can be added to automatically (i.e. about 92% of all our entries). -saph668 (usertalkcontribs) 13:34, 10 November 2024 (UTC)Reply
Sounds great!
Re: 2 @Lambiam I see that you transcribed dy as dʑ and dʲ in dyugudyugu, what do you think?
Here is an example of how R. Dobru pronounced anbegi https://youtu.be/7h6FMvuK2a0?feature=shared&t=23
@Lingo Bingo Dingo any thoughts? Appolodorus1 (talk) 14:19, 10 November 2024 (UTC)Reply
Even though we list a pronunciation /ɟo.ɡo/ for dyompo, I don’t think I’ve ever heard anything like that, at least not with a [ɟ] like that in the pronunciation of Turkish gece. (One problem with the IPA symbols as used to represent phonemes in different languages is that the actual phones and their relations are not absolute in some phonetic space but language-dependent.)
I often hear a palatalization or lenition of /k/ and /ɡ/ before /e/ and /i/. For /k/ this can range (next to remaining /k/) from /c/ to /t͡ʃ/. For /ɡ/ we may get to hear [ɟ] to, rarely, [j]. As far as I can tell these are always merely allophones; the degree of lenition may depend on the informality of the register. I’m not at all an expert, though, neither of IPA nor of Sranantongo, and my exposure to spoken Sranan has been limited both in the amount of material and in the range of speakers.  --Lambiam 17:54, 10 November 2024 (UTC)Reply
If there aren't any objections I'll go ahead and start deploying it from the PetScan when I wake up tomorrow. -saph668 (usertalkcontribs) 09:59, 17 November 2024 (UTC)Reply
@Saph668 Thank you for your work on this. I have a few things you might want to tweak:
- The final -i is often elided, but not with all words. So it would be nice to be able to toggle this as (i). For instance in ala piri tifi a no lafu, a more realistic notation would be /ˈala ˈpiɾ(i) ˈtif(i) ˈa ˈno ˈlafu/, /ˈala ˈpiɾ‿ˈtif‿a ˈno ˈlafu/
- Right now there doesn't seem to be a consistent display of the d͡ʒ/ɟ and t͡ʃ/c pairs. For instance, batyaw is definitely also pronounced as /baˈcau̯/
- I don't think ei/ey is ever realised as ɪ̞i̯
- Likewise, e is never realised as ɪ̞
- I think it's safe to say that in compounds, the stress always lands on the last compound word, so for instance /sɾefidensiˈdei̯/ instead of /sɾefidenˈsidei̯/ in the case of Srefidensidei
- Besides /n/, a final -n is always also realised as /ŋ/. I think it's safe to say that this second option is retained in compounds. So Sranan = /sɾaˈnan/, /sɾaˈnaŋ/. alenten = /alenˈten/, /aleŋˈteŋ/
- Republiek Sranan: you'd have to pay attention to the unadapted loanwords from Dutch such as republiek = /ɾeː(i̯)pyˈblik/ Appolodorus1 (talk) 20:25, 5 December 2024 (UTC)Reply
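The final -n point in the list above can be sketched as a small transformation on a phonemic string. This is not how the srn-IPA module works (its Lua source is not shown here). The function name and the vowel inventory are assumptions for illustration, and extending the rule from word-final -n to every syllable-final n is also an assumption, made only so that the compound example comes out as quoted.

```python
# Hypothetical vowel inventory, used only to decide whether an "n" is in a coda.
VOWELS = set("aeiouyɛɔə")
STRESS_MARKS = "ˈˌ"


def velar_nasal_variant(ipa: str) -> str:
    """Return the variant in which every n not followed by a vowel becomes ŋ."""
    out = []
    for i, ch in enumerate(ipa):
        if ch == "n":
            # Find the next real segment, skipping over stress marks.
            j = i + 1
            while j < len(ipa) and ipa[j] in STRESS_MARKS:
                j += 1
            if j >= len(ipa) or ipa[j] not in VOWELS:
                out.append("ŋ")
                continue
        out.append(ch)
    return "".join(out)
```

With the transcriptions quoted above, `velar_nasal_variant("sɾaˈnan")` returns `"sɾaˈnaŋ"` and `velar_nasal_variant("alenˈten")` returns `"aleŋˈteŋ"`. Whether an n before another consonant inside a non-compound word should also alternate is left open here.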
2. Added.
IPA(key): /ˈbat͡ʃau̯/, /ˈbacau̯/, [ˈba̠t͡ʃa̠u̯], [ˈbɑ̟t͡ʃɑ̟u̯]
3. Was this in addition to below or is it separately true?
4. I got it from WT:ASRN, which does have a source for it; the two should probably be consistent.
5. This should be specified like srefidensi_dei.
6. For some reason this stopped working; I had it in there originally. Fixed.
IPA(key): /sɾaˈnaŋ/, /sɾaˈnan/, [sɾa̠ˈna̠ŋ], [sɾɑ̟ˈnɑ̟ŋ]
7. I added an option, ü, for /y/, to be specified in param 1. So republiek would be respelled as rêpü_blik (and then a separate template with rêipü_blik).
IPA(key): /ɾeːpyˈblik/, [ɾɪ̞ːpyˈblik], [ɾe̝ːpyˈblik]
I'll get around to i-elision later. Lua is being a pain and I don't feel like dealing with it right now. Thanks for the feedback. -saph668 (usertalkcontribs) 16:20, 7 December 2024 (UTC)Reply
Again, it's amazing what you do.
3. & 4. I'll retract my comments, I just don't know enough about IPA to say anything with authority on this matter :)
7. Great. Off the top of my head, I guess we'd need an option for the /ə/ and the /ɣ/,/x/, for instance in gebore. I'm not certain how the Dutch /ɣ/ is realised in Surinamese Dutch/Sranan Tongo exactly, it might be as /x/ always.
Some more things:
8. -ow/-aw followed by a vowel used to work well, but now it automatically becomes a diphthong, which is not correct. Like in powisi
9. oi/-oy, ai/ay and ei/ey not followed by a vowel should be a diphthong, like in Sneisi
10. The "-" gets retained in the IPA now which is unnecessary. Appolodorus1 (talk) 12:57, 24 December 2024 (UTC)Reply

Animacy of Slovak nouns

(@Benwing2 @Atitarev @Chihunglu83) I noticed that Slovak masculine nouns are split into personal, animal and inanimate nouns (with an error appearing if one tries to set the animacy to simply animate)... I would like to discuss this, as there are groups of nouns which don't fit into this system. Animal nouns are just one group of nouns that (not even always) have mixed animacy and there are fully animate nouns that are not personal nouns:

  1. pieces of art and scientific works are fully animate (Havran),
  2. names of ships, trains, etc. are animate (Lietajúci Škót),
  3. names of magazines and newspapers are often partially animate (Korzár),
  4. names of hotels, restaurants, etc. named after persons or animals have both animate and inanimate (or mixed) declensions (Jánošík, Jeleň),
  5. names of competitions, prizes, etc. are partially animate and inanimate (Zlatý Slávik),
  6. names of mountains are often animate (Tulák),
  7. names of feasts or seasons named after people are partially animate and inanimate (Ján),
  8. toys are often fully animate (šarkan),
  9. card games, cards, confectionery, plants, etc. are often animate in singular and inanimate in plural (žolík, starček),
  10. chessmen are fully animate, etc (pešiak).

These are often named after people or animals, but still, I can't imagine categorising a mountain, a toy or a card game as a personal or animal noun. I would propose returning to the animate/inanimate system, while adding a new mixed animacy category (if there is only one set of forms from which some are animate and some inanimate). We could possibly keep animal nouns as a separate subtype of the mixed category given their frequency. I would like to know your opinion on this. Should anyone like more details or other examples, I can do that. TomášPolonec (talk) 22:06, 10 November 2024 (UTC)Reply

@TomášPolonec Hi Tomáš. Can you clarify what you mean by "fully animate" and "mixed animate"? Is your objection mostly to the terms "animal" and "personal" or to the categories themselves? These names are meant to be paradigmatic in the same way that "masculine", "feminine" and "neuter" are paradigmatic and do not necessarily refer to actual males, females and objects (and for that matter, "animate" and "inanimate" themselves are paradigmatic, and inanimate objects often have animate declension; i.e. this issue would not go away if we had only a two-way animate/inanimate distinction). I would rather not use a separate set of animacy names for Slovak than for all other Slavic languages. Keep in mind that in Czech, which has only a two-way animate/inanimate distinction, a lot of inanimate objects have animate declension (e.g. mushrooms, chess pieces), sometimes only optionally (e.g. certain types of sausages, etc.). Polish also has similar mismatches where inanimate objects have "personal" or "animal" animacy (cf. @Vininn126). Benwing2 (talk) 00:25, 11 November 2024 (UTC)Reply
@TomášPolonec: I second @Benwing2's question. Anatoli T. (обсудить/вклад) 05:18, 11 November 2024 (UTC)Reply
To answer your question, when I say (fully) animate/inanimate or mixed, what I mean is that these words have forms that are either exclusively taken from the animate declension patterns (e.g. chlap, hrdina), exclusively taken from the inanimate patterns (e.g. dub, stroj) or mixed, meaning that some cases or numbers use forms from one group, others from the other. Usually it shows in these cases: dative/locative singular (anim. -ovi vs. inan. -u/-e/-i), accusative singular (anim. -a vs. inan. -0), nominative plural (anim. -i/-(ov)ia vs. inan. -y/-e) and accusative plural (anim. -ov vs. inan. -y/-e). I think this is the only thing that matters - if all the forms are animate, the word itself is animate, just as you said, we are not talking about whether the object is "animate" itself. There are cases where all the forms are inanimate, but the accusative singular form is animate, or the singular is animate, but the plural is inanimate (but the word doesn't describe an animal).
Even if we say that these terms are paradigmatic, starček (a plant) is not an animal noun and Havran (a poem) is not a personal noun. So I guess, my objection is towards the terminology used. "Personal" and "animal" are unnecessarily specific and not used in this way in the Slovak linguistics (these categories exist, of course, but not as a part of a three-fold personal/animal/inanimate system, the two-fold system is used always). I don't see how using the two-fold system would create a discrepancy between the Slavic languages, I looked through some entries and Czech, Russian and Slovene also have an animate category. So my objection stands and I don't see why we couldn't use the (for Slovak) usual system. TomášPolonec (talk) 06:08, 11 November 2024 (UTC)Reply
You seem to be confounding two issues: (1) the terminology, (2) the linguistic facts. Fundamentally, from what I can tell, Slovak is not like Czech but is like Ukrainian and Polish in having a three-way animacy system. Some nouns (paradigmatically but not exclusively nouns referring to inanimate objects) have acc = nom in both singular and plural in nouns and corresponding adjectives, while other nouns (paradigmatically but not exclusively nouns referring to people) have acc = gen in both singular and plural in nouns and corresponding adjectives, while a third class (paradigmatically but not exclusively nouns referring to animals) has acc = gen in the singular but acc = nom in the plural in nouns and corresponding adjectives. We cannot use a two-way animacy system to describe this. My comment about creating a discrepancy is about using terms like "mixed animate" and "fully animate" in place of "animal" and "personal". Slovak grammars from what I can tell have a strange way of handling this that involves distinguishing between "animacy in the singular" and "animacy in the plural" but fundamentally from what I can tell, the actual situation is not so different from other Slavic languages with a three-way animacy system. If we were to follow the Slovak grammar system we'd have to have two distinct animacy categories, one for the singular and one for the plural, and if you collapse this down to a single animacy category, you simply cannot properly express the facts related to animal nouns and other nouns that inflect and agree in the same fashion.
As for terminology, I'm not sure why you're prepared to accept the usage of "animate" to refer to inanimate objects but simultaneously object to "animal" and "personal" to refer to inanimate objects. We could rename them "animate-paradigm", "animal-paradigm" and "personal-paradigm", which would emphasize that these are merely paradigmatic, but IMO that wouldn't accomplish anything except to make the terminology more verbose. Benwing2 (talk) 06:40, 11 November 2024 (UTC)Reply
Keep in mind also that Wiktionary tries to adopt a cross-linguistic attitude where possible and emphasize the similarity across languages. Sometimes this involves deviating from native grammar traditions if the native grammar tradition does things in an idiosyncratic way that would obscure the similarities with related languages. Benwing2 (talk) 06:44, 11 November 2024 (UTC)Reply
I see where you are coming from, but it's still not that simple. Also, I think you misunderstood my proposal: I don't want to use terms like mixed/fully animate, that was just me describing nouns with fully or partially animate paradigmata. I propose using simply animate/inanimate with animal nouns as a separate subcategory of "mixed" animacy as well.
Other cases of mixed animacy are exactly why the three-way system does not solve anything, as there are e.g. nouns that use inanimate forms everywhere except for accusative singular (e.g. categories 3, 4, 5 and 7 from my first post). In Slovak, specific endings are strongly tied to the concept of animacy, but this connection is stronger with the ending -ovi (dative/locative singular), which is used only with animate nouns with very few exceptions, than with the (typically animate) accusative ending -a, which is used with inanimate nouns more often, but still rarely. This is what you cannot express with the three-way system, it's simply about the (in)animate nature of the paradigmatic endings (from what I understand, at least Polish uses -owi with both animate and inanimate nouns).
The plural endings -i/-ovia and -ov for nominative/accusative plural are also connected with the idea of animacy, which is why animal nouns take these forms when they are used to describe people and nowadays the animal paradigms are generally shifting towards animate (some animal nouns have exclusively animate forms in modern usage with inanimate plural forms becoming almost unacceptable). So the "animal" animacy is slowly disappearing anyway.
As for terminology, I feel like the terms "animate"/"inanimate" are mostly tied to linguistics, whereas "personal" and "animal" are commonly used outside of linguistics as well, which creates more specific connotations. I don't know much about Ukrainian or Polish animacy system, but if "personal"/"animal" works for these languages, I can't object to the usage there. What I do know is that it doesn't feel right for Slovak to me. My main arguments are in the paragraphs above. TomášPolonec (talk) 08:06, 11 November 2024 (UTC)Reply
In Polish, masculine animal nouns are nouns of mixed animacy, showing more animate declension (and agreement, which is the more important element in gender) in the singular and inanimate in the plural. Person/animal are also widely used terms in linguistics for this specific concept. Vininn126 (talk) 08:23, 11 November 2024 (UTC)Reply
@TomášPolonec Again I think you are mixing up two different concepts, which in this case are agreement and declension. Let's take the situation with gender e.g. in Latin. Gender (masculine, feminine, neuter) reflects the agreement pattern with adjectives. Some declension patterns are strongly correlated with gender (e.g. most first-declension nouns in -a are feminine and most second-declension nouns in -us are masculine), but there are exceptions in both directions: e.g. agricola (farmer) is a first-declension masculine, which is shown by the agreement (bonus agricola NOT #bona agricola), and similarly mālus (apple tree) is a second-declension feminine, again shown by agreement (bona mālus NOT #bonus mālus). This means that things like the dative/locative singular ending in -ovi vs. -u/e/i should be ignored for the moment, as they are distracting us from the main issue, which is adjectival agreement patterns. AFAIK, in such agreement patterns there is a clear three-way pattern: (1) acc=nom in sg and pl; (2) acc=gen in sg and pl; (3) acc=gen in sg but acc=nom in pl. Given this, we need to make a three-way distinction in animacy, and making a two-way distinction will just confuse things. Please correct me if I'm wrong about the adjectival agreement facts. Again, the fact that certain noun endings are correlated with animacy is not probative for determining the actual animacy of the noun. The fact that the third class of adjectival agreement may be gradually disappearing is again not relevant, because we reflect the way things are today, and especially in the literary language, which is likely to be more conservative, rather than hypothetically in a future colloquial language. Benwing2 (talk) 08:35, 11 November 2024 (UTC)Reply
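The three-way agreement pattern described above can be written out as a small lookup, purely as an illustration of the argument. The function name and the "nom"/"gen" encoding are made up here; the class labels follow the personal/animal/inanimate terminology under discussion. Whether a given Slovak noun actually shows a given pattern is exactly what is in dispute in this thread, so no real nouns are classified.

```python
def masculine_animacy(acc_sg_matches: str, acc_pl_matches: str) -> str:
    """Classify a masculine noun by its accusative agreement pattern.

    Each argument is "nom" or "gen": the case whose form the accusative
    (of the noun and its agreeing adjective) coincides with.
    """
    patterns = {
        ("nom", "nom"): "inanimate",  # (1) acc = nom in singular and plural
        ("gen", "gen"): "personal",   # (2) acc = gen in singular and plural
        ("gen", "nom"): "animal",     # (3) acc = gen in singular, acc = nom in plural
    }
    try:
        return patterns[(acc_sg_matches, acc_pl_matches)]
    except KeyError:
        raise ValueError(
            f"pattern not covered by the three-way system: "
            f"sg={acc_sg_matches!r}, pl={acc_pl_matches!r}"
        ) from None
```

Under this scheme a noun that keeps only the accusative singular in -a, with an otherwise inanimate paradigm, falls into the same bucket as pattern (3).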
BTW the way to reflect something like a difference in animacy in literary vs. colloquial Slovak is through gender qualifiers, which are supported; you could say a given noun is (literary) m animal or (colloquial) m pers. Benwing2 (talk) 08:39, 11 November 2024 (UTC)Reply
I would disagree that adjective agreement is the main criterion for determining animacy in Slovak. Some otherwise purely inanimate masculine nouns use the animate adjective ending and the ending -a in accusative singular, while in all the other cases they have inanimate forms (the already mentioned categories 3, 4, 5 and 7 from my first post). Since it's the only case of masculine adjective endings differing based on animacy, you would say that it's an animal noun, but the declension doesn't reflect that. The ending -ovi in dative/locative on the other hand is a really strong indicator of animacy, as it's used exclusively by animate nouns and by all of them. If a noun doesn't have it, it's not animate, which means that the abovementioned categories of nouns would count as inanimate with this criterion, which would better convey how the Slovak language uses the phenomenon of animacy.
What I wanted to say with the animal nouns disappearing is that any animal noun also has a full animate paradigm as well for the noun and the agreeing adjectives and not just colloquially. Which would mean long double gender notation with qualifiers for almost every animal noun probably :) TomášPolonec (talk) 09:23, 11 November 2024 (UTC)Reply
w:Grammatical gender "Genders are classes of nouns reflected in the behavior of associated words". It is about agreement. That is not an opinion. Vininn126 (talk) 09:26, 11 November 2024 (UTC)Reply
I agree that's how gender works. But animacy is not really about gender in this case. And since there is only one case where the agreement can be distinguished and some nouns have an exceptional ending right there, which is associated with animacy and therefore uses the corresponding animate adjective ending, there must be another criterion. TomášPolonec (talk) 09:36, 11 November 2024 (UTC)Reply
Animacy is part of gender. Vininn126 (talk) 09:40, 11 November 2024 (UTC)Reply
I have to agree with @Vininn126 here. BTW @TomášPolonec the situation with "any animal noun also has a full animate paradigm as well for the noun and the agreeing adjectives and not just colloquially" is in fact exactly what we now see in Ukrainian. You will find for example that this is explicitly shown in the declension tables of Ukrainian animal nouns such as миш (myš, mouse); see [1]. This means there is no need to double-indicate the gender; it simply is indicated as m-anml, and implicit in this is that declension and adjective agreement in the accusative plural can go with either nom or gen pl, just like in Ukrainian. Benwing2 (talk) 09:51, 11 November 2024 (UTC)Reply
Hmmm, миш (myš) is feminine so maybe not the best example. See also вуж (vuž, grass snake, water snake) and its declension here: [2]. Benwing2 (talk) 09:53, 11 November 2024 (UTC)Reply
Thanks for the information about Ukrainian. In Slovak, you have two sets of plural forms - one for the inanimate plural, one for the animate plural. So for the nom-acc agreement, you would have a different nominative than the one you would use for the animate version. E.g. orol: inanimate plural is orly for both nom and acc, animate plural is orli in nom and orlov in acc. Which gives you two separate plural declensions that are not interchangeable or combinable (you can't use the animate orli nominative to create the inanimate accusative). So if you say that starček (a plant) is an animal noun as well, there would indeed have to be an extra qualifier for each animal noun that describes a real animal, because what I described happens only with these, starček doesn't have an animate plural in this sense. I still think it's simpler to just say that starček is inanimate with an exceptional animate accusative form and animal nouns are those that behave the same way as orol. TomášPolonec (talk) 10:41, 11 November 2024 (UTC)Reply
Sure, I agree, but I still think this is oversimplifying the situation in Slovak. If an inanimate noun keeps the -a ending in accusative from the original animate form, the adjective automatically takes the corresponding animate form as well. This is basically the case with these anomalies that I listed above. Unfortunately, there is no difference between animate and inanimate adjectives in dative or locative, so there is no difference that would confirm for you that a noun is (in)animate based on these cases.
If you just want to put a category on every noun based on some universal criteria, that's fine, but for this specific language, it doesn't really reflect how animacy is perceived and used by native speakers. I think our ultimate goal should be to give users useful information while reflecting the particularities of each language. I believe there should be as few discrepancies between Wiktionary and other dictionaries and grammar/morphology guides as possible, even if we want to keep things as consistent as possible. TomášPolonec (talk) 10:28, 11 November 2024 (UTC)Reply
If it's causing adjectives to take animate declension, then it's not inanimate. Vininn126 (talk) 10:45, 11 November 2024 (UTC)Reply
If the ending stays animate in accusative, the adjective stays also animate, even though every other ending is changed to inanimate. As I've said, the animacy connected with the ending -a is not so strongly perceived as with the ending -ovi, which goes first when the noun is losing its animacy with the change in meaning (things named after someone/something, etc.). Out of 12 forms, 11 are the same as for the normal inanimate nouns. You still haven't convinced me that Ján ("John", meaning the feast of St. John) would be an animal noun, just because it keeps the animate accusative :) At least call it nonpersonal or something like that, that's also the official term used in Slovak. TomášPolonec (talk) 11:08, 11 November 2024 (UTC)Reply

Ongoing wrong audio from Flame, not lame

Bit concerned about this user adding hundreds of audios which seem to be guesswork half the time, or based on highly unreliable auto-generated "how to pronounce" spam sites. See prior discussion at Talk:igasurine. She's still guessing, I think: since then, I've seen hydrogeniferous pronounced like it has "Jennifer" in it (I think these words are always stressed IFerous); and I am doubtful about aminimide (compare imide's IPA). Such additions could do significant damage to the project over time. 2A00:23C5:FE1C:3701:49E0:B9B8:114:7472 12:33, 11 November 2024 (UTC)Reply

@Flame, not lame: courtesy ping. —Justin (koavf)TCM 12:36, 11 November 2024 (UTC)Reply
yes? Flame, not lame (Don't talk to me.) 12:39, 11 November 2024 (UTC)Reply
Just letting you know that others are talking about you. No one else alerted you to this, so I figured it would be appropriate to let you know. —Justin (koavf)TCM 12:47, 11 November 2024 (UTC)Reply
If a native English speaker is given a text to read aloud and the text in question contains some obscure unfamiliar words, then how does this work in practice? Do people never utter anything that isn't a part of their active vocabulary? As for how to deal with that in Wiktionary, would labeling suspicious audio samples as "nonstandard" be a useful solution? The patrollers, who are native English speakers, could keep an eye (or an ear) on the correctness of the audio samples too.
There's a long backlog of words to be recorded, and it won't go away unless more people start contributing more actively: https://lingualibre.org/wiki/List:Eng/Lemmas-without-audio-sorted-by-number-of-wiktionaries --Ssvb (talk) 19:26, 11 November 2024 (UTC)Reply
People often make their best guess if they're reading an unfamiliar word aloud. In many contexts, that's fine, but a "best guess" pronunciation is not good enough for a dictionary. I agree with 2A00[...] that we should discourage adding audio without doing a reasonable amount of work to check that it is correct. There is no urgency to the task of adding audio for obscure words, so opting for speed over accuracy has minimal upside.--Urszag (talk) 20:34, 11 November 2024 (UTC)Reply
I will certainly record audio for familiar terms on the Lingua Libre list. When I am unsure how to pronounce a certain word, first I search for IPA or another recording. AI voices are less efficient. I was focusing heavily on chemistry vocabulary because a considerable quantity of these terms did not have pronunciation examples, and I will slow down as needed. Flame, not lame (Don't talk to me.) 22:24, 11 November 2024 (UTC)Reply
This is exactly the same issue as former (usually neurodivergent) users who have added "translations" by putting individual words into Google Translate, and it's just as pernicious. If the user's goal is to "create lots of entries" and they don't care about whether they are correct or not, this is what happens. It must be stopped at source. I love Flame's honey-smooth voice as much as the rest of you, but any single wrong or guessed audio is serious damage that will multiply (because once it's on "the wiki" it's borrowing our reputation, which those spammy faked machine voices didn't have: this also, for now, has an impact on SEO). Why is this not obvious? 2A00:23C5:FE1C:3701:95E9:81C2:6E59:5EE1 23:23, 11 November 2024 (UTC)Reply
Adding "usually neurodivergent" was totally unnecessary, really. Theknightwho (talk) 23:31, 11 November 2024 (UTC)Reply
Agreed. It is inappropriate to assume neurodivergence.
Either way, I am going to focus on words within my familiarity, and I am using Oxford Languages' human audios as resources. Flame, not lame (Don't talk to me.) 23:32, 11 November 2024 (UTC)Reply
You may ignore the lack of tact of the IP address. But whether I added it or not (and whether it's true or not, which I think we could prove with statistics, but not today), the entries are correct or incorrect. I assume Wiktionary believes there is a difference between correct and incorrect (can we still say that?), in which case you should lean toward the former, and weed out the latter. 2A00:23C5:FE1C:3701:95E9:81C2:6E59:5EE1 23:38, 11 November 2024 (UTC)Reply
I wasn't disputing the underlying point. Theknightwho (talk) 11:38, 12 November 2024 (UTC)Reply
And this is precisely why, folks, intersectionality matters. No wonder the "IP," a known transphobe, is also bigoted against neurodivergent people. 2600:6C5D:6040:67:3D7:FBD8:BA17:E5E4 01:15, 13 November 2024 (UTC)Reply
General comment: As mentioned I’ve brought this issue up many, many times to said user on Discord, but unfortunately the number of audios has ballooned to almost 12,000 audios added by the same user in a very short amount of time. There is simply not enough time nor energy for anyone to patrol the audios added, so at this point I’ve accepted that there will always be incorrect English audios here and there that I’ll simply never run into. (And I remove the ones that I do run into) I suggest that folks at this point do the same, unfortunately. This is why I really emphasized more action and fixing this problem much earlier, but alas, in typical Wiktionary fashion, we let it slide in favor of quantity over quality and lack of direct action. I’m tired. AG202 (talk) 03:06, 12 November 2024 (UTC)Reply
How about this: We shut down audio pronunciations for a little bit, and then people review all of them, make sure they're right and that it's all what people would expect (maybe making a game out of it.) CitationsFreak (talk) 06:58, 12 November 2024 (UTC)Reply
@CitationsFreak: Making a game is a good idea. If the QA process can be automated, then go for it. But shutting down is a bit too extreme without a really solid plan and commitment. --Ssvb (talk) 07:32, 12 November 2024 (UTC)Reply
@AG202: Do you count the total number of recorded audios or only those that are linked from Wiktionary articles? A certain undesirable asymmetry definitely exists, because recording audio is very fast and easy in Lingua Libre. Once a new contributor learns the ropes, they can be incredibly productive. Recording around 100 audios in an hour is perfectly normal. That's not even the peak speed bounded by technical limitations; it also includes the review phase and merciless discarding of bad samples.
Is the IP user's claim about every second @Flame, not lame's audio sample being problematic actually grounded in reality, or was it more of a hyperbole? What is the actual rate of errors? I don't think that it's possible to 100% safeguard against errors even for the native speakers. A clip from How I Met Your Mother illustrates this: https://www.youtube.com/watch?v=-Fy_NYCtSgw (Ted pronounces "Chameleon" to his class). That's why it's up to the patrollers and reviewers to provide the necessary safety net and ensure quality. But the process definitely needs to become much easier for the patrollers. Expecting them to submit requests for audio samples removal on Commons and forcing them to go through various time consuming bureaucratic procedures is very much unreasonable.
I have two possible suggestions to improve the process:
  • Use the |bad= property of the {{audio}} template instead of just commenting it out, removing the audio sample from the article, contacting the uploader via their talk page or requesting file removal at Commons. Because this (a) saves time and (b) puts the bad audio samples into their own wiki category, where they can be tracked and handled.
  • Have one native English speaker do the recording of the audio samples in Lingua Libre. And have another different native English speaker actually adding them to Wiktionary articles. That second person is expected to check its correctness and share the burden of dealing with the angry lynch mob if anything goes wrong.
--Ssvb (talk) 07:13, 12 November 2024 (UTC)Reply
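The first suggestion above (flagging instead of removing) can be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function name `flag_bad_audio`, the file name `Example.wav` and the wording of the reason are hypothetical, and it assumes the audio is invoked as `{{audio|en|<file>|...}}` with `|bad=` taking a free-text explanation.

```python
import re


def flag_bad_audio(wikitext: str, filename: str, reason: str) -> str:
    """Append |bad=<reason> to the {{audio}} call that uses `filename`.

    Hypothetical helper: assumes the file name appears as a parameter of a
    non-nested {{audio|...}} call and that |bad= accepts free text.
    """
    pattern = re.compile(
        r"\{\{audio\|(?P<body>[^{}]*?" + re.escape(filename) + r"[^{}]*?)\}\}"
    )

    def add_flag(match: re.Match) -> str:
        body = match.group("body")
        # Leave the call untouched if it has already been flagged.
        if re.search(r"\|\s*bad\s*=", body):
            return match.group(0)
        return "{{audio|" + body + "|bad=" + reason + "}}"

    return pattern.sub(add_flag, wikitext)


entry = "* {{audio|en|Example.wav}}"
print(flag_bad_audio(entry, "Example.wav", "wrong stress"))
```

Because an already flagged call is left alone, running the helper twice on the same entry does not stack a second `|bad=` parameter.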
I have added a |bad= property to hydrogeniferous, now it shows up in https://en.wiktionary.org/wiki/Special:WhatLinksHere/Wiktionary:Tracking/audio/bad-audio/en
Now @Flame, not lame or any other Lingua Libre contributor of audio samples can monitor this list of known bad audio and do something to address the problem on case by case basis. This process converges, because after a correct sample becomes eventually available in a Wiktionary article, we are done with it. Rinse and repeat to fix similar problems everywhere else.
Compared to this, just removing audios doesn't work, because random bots or editors would add the incorrect pronunciation back and the cycle repeats. Another disadvantage of such an approach is that those who are able to do something to fix the problem are simply unaware of it. --Ssvb (talk) 08:23, 12 November 2024 (UTC)
Also pinging @Aquild as another recent English audio samples contributor, who can possibly help or may be interested in this topic. --Ssvb (talk) 08:39, 12 November 2024 (UTC)Reply
Thanks for the ping, I'll take a look. Aquild (talk) 03:48, 14 November 2024 (UTC)Reply
I will pay attention to that list. Flame, not lame (Don't talk to me.) 10:48, 12 November 2024 (UTC)Reply
The user themselves on Discord stated: "last time I checked, Wonderfool made 18,852 audios, and I made 11,911 audios. this vandal challenges me to beat him. 🔊" That's the number I'm going off of. And yes, from the reviews I've done, the error rate is higher than we'd like. I remember having to comment, along with others, on their additions to words with the super- prefix because most of them had the wrong stress. (I don't know if all of them have been fixed yet and I don't have the time to go through them).
As for bots, there was a previous discussion where there was a consensus to limit allowing bots to add audios without some kind of review process for this exact reason. I can find the discussion later. AG202 (talk) 15:25, 12 November 2024 (UTC)Reply
How do you know I sent that on Discord? Flame, not lame (Don't talk to me.) 17:47, 12 November 2024 (UTC)Reply
@AG202: Let me first explain how Lingua Libre works. A user gets a list of let's say 300 words and starts recording them, spending no more than 5 seconds on each of the words. After the recording phase is finished, all these words are presented in a big list. It's possible to replay words and remove any of them from the list if the quality is undesirable. Doing this, the list of 300 words may shrink to even merely 100 due to various reasons (possible pops, clicks, breathing or slurping noises in the audio, abrupt cutoffs in the beginning or end, if it's too quiet or muffled, if the intonation doesn't feel good, or just because of any other reason). With this extra review phase and discarding bad audios, the average time spent on one audio sample may increase from 5 seconds to maybe even 30 seconds. But in my opinion it's still very fast. Producing 500-1000 audios in one weekend is perfectly doable without any quality sacrifices. So ~18K or ~12K audios created over the span of a few months isn't anything particularly unusual or suspicious. That's just a normal productivity enabled by the Lingua Libre tool.
Now if a Wiktionary patroller spots a bad audio (e.g. a mispronounced "chameleon"), how many seconds have to be spent to resolve the problem? My suggestion is to add the |bad= property and move on. This way only a few seconds are spent, and the ball is on the audio recorder side again, who can re-upload a better audio and remove the |bad= property. Or start a dispute if they believe that everything is already fine as-is.
For comparison, @Derbeth's suggestion and the old bot policy was to make requests for removal of bad audio samples from Commons, which is unworkable because it's too labor-intensive. That's the reason why there was that consensus to suspend the bot. In my opinion, anything longer than 30 seconds per bad audio is too labor-intensive for the patrollers. --Ssvb (talk) 05:11, 13 November 2024 (UTC)
@Ssvb: The problem isn't the amount of time for a single audio, it's the amount of time to do a quality check for thousands of them. Using your estimate, even a hundred audios would take about an hour, and frankly, I've found it to take longer. Up that to the thousand range and it'd take 10 hours for a single person. I personally am not going to dedicate my time like that to reviewing audios, especially when I've already had discussions with the user in the past and when audios continue to be added at a rate faster than I can review them. It's just unrealistic, and my life does not center around this project. I wish we had more patrollers, but even if we had say 10 actively working on this, it'd still take way too much time. That's why I said that I've basically given up. It was more realistic when there were around 1000 audios, but now it's infeasible. AG202 (talk) 05:33, 13 November 2024 (UTC)
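The time estimates traded in the two comments above can be checked with a short back-of-the-envelope calculation. This is a rough sketch that only reuses the figures quoted in the thread (5 seconds per raw recording, about 30 seconds per audio once review is included, a 300-word list, and roughly 12,000 audios). The constant pace per audio is an assumption, and the function name is made up for illustration.

```python
def hours_needed(num_audios: int, seconds_per_audio: float) -> float:
    """Total time in hours to handle num_audios at a fixed pace."""
    return num_audios * seconds_per_audio / 3600


# Per-audio paces quoted in the thread (seconds).
RAW_RECORDING = 5      # recording only
WITH_REVIEW = 30       # including replaying and discarding bad takes

# Recording a 300-word list without any review, in minutes.
print(f"300 raw recordings: {300 * RAW_RECORDING / 60:.0f} min")

# Listening to and checking audios at about 30 seconds each.
for count in (100, 1_000, 12_000):
    hours = hours_needed(count, WITH_REVIEW)
    print(f"{count:>6} audios at {WITH_REVIEW} s each: {hours:.1f} h")
```

At 30 seconds each, 100 audios come to about 50 minutes and 1,000 to a little over 8 hours, which is in line with the "about an hour" and "10 hours" figures above. The full set of roughly 12,000 would take about 100 hours of listening for a single person.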
Maybe just have a pron-checker lab, like Lingua Libre. It would be slow, but hopefully faster than "100 per hour". CitationsFreak (talk) 05:50, 13 November 2024 (UTC)Reply
@CitationsFreak: I don't quite understand what is "pron-checker lab". Could you elaborate? --Ssvb (talk) 06:49, 13 November 2024 (UTC)Reply
A system like Lingua Libre, but for verifying that the pronunciations are accurate. CitationsFreak (talk) 07:18, 13 November 2024 (UTC)Reply
@AG202: The patrollers only need to take action when audio samples are incorrect. And hopefully only a small fraction of these ~12K happens to be incorrect. Yes, you mentioned some of the systematic problems with the "super-" prefix, but hopefully the total number of them was not large enough to significantly contaminate the whole set and everything is not so dramatic. --Ssvb (talk) 06:39, 13 November 2024 (UTC)
@Ssvb: It was significant. I am one of the few who's actually taken the time to listen to a significant chunk of them, by going through the contributions list, and there were many issues. There's no way to know which ones are incorrect without listening to all of them. Calling it "dramatic" when you don't know the amount of time and effort I've put into this the past few months is an easy way to have me check out completely. CC: @Theknightwho AG202 (talk) 14:18, 13 November 2024 (UTC)Reply
@AG202: I understand your frustration. But this topic had been created by a cowardly anonymous IP user, who was up to no good and never intended to suggest anything constructive, thus stirring an unnecessary "drama". It was pretty transparent, especially considering the added sexist insinuations that allegedly only simps would raise objections, etc.
Wiktionary is a collaborative effort. Both the contributors of audio samples and patrollers need to work together. Patrollers can't assume that all audios are going to be perfect. This is normal, especially when they are contributed by a teenager, who naturally has a smaller vocabulary due to younger age. Of course it would be nice to have more contributors from the older age group, but they tend to be more cynical and less enthusiastic. Some of them already had experience with the "dramas" like this and learned the hard way to avoid being in the spotlight. The current situation is unhealthy. It became so ridiculous, that even foreigners are trying to "help" by recording English audios themselves and even synthetic robotic audio samples are seriously suggested in BP discussions from time to time.
It's more productive to have a friendlier environment, which attracts more contributors rather than scaring them away. Good tutorials and sane policies help. Some of the commenters here have the same opinion. I think that Help:Audio can be extended to mention the usefulness of dictionaries, describe best practices for patrollers dealing with problematic audio, document the |bad= property of the {{audio}} template, and maybe cover other things. If we don't do this, then this discussion is yet another nothingburger. Should I post a new topic with concrete proposals and initiate voting?
And again, I understand that you are fed up with this stuff and don't feel like patrolling audios anymore. And maybe I'm acting out of line proposing policy changes in this domain. I think that it's okay for you to maybe take a break until it becomes clear what works and what doesn't. --Ssvb (talk) 02:51, 15 November 2024 (UTC)Reply
Honestly, I do believe that the IP had good intentions in starting this thread, and good things are coming out of this thread (like your suggestions). CitationsFreak (talk) 00:04, 16 November 2024 (UTC)Reply
Wonderfool made a sexual comment about me yesterday. He lost all of my respect. Flame, not lame (Don't talk to me.) 10:29, 13 November 2024 (UTC)Reply
@Flame, not lame: The OED gives the pronunciation of hydrogeniferous so there was no need to guess for that one. Ioaxxere (talk) 03:25, 12 November 2024 (UTC)Reply
It is crucial on my part to further analyze my resources beforehand, and I agree to focus on creating audio for words I know well. Flame, not lame (Don't talk to me.) 03:36, 12 November 2024 (UTC)Reply
@Ioaxxere: Would you (or anyone else) extend the Help:Audio_pronunciations guidelines to add the missing information about the necessity of consulting reputable orthoepic dictionaries prior to uploading audio samples, even for native speakers? And list OED as an example of such a dictionary for English. BTW, is there or isn't there any danger of succumbing to prescriptivism? --Ssvb (talk) 06:02, 12 November 2024 (UTC)
Sadly, even the OED has to be taken with a grain of salt / native-speaker-knowledge these days, because they've embraced "many American /t/s are phonemically /d/" and a few other quirky ideas. (Initially I thought they were only using /d/ in cases where Americans flap /t/, but I came across at least two cases where their IPA had /d/ where Americans pronounce an unflapped [t]. If I relocate them or find more, I'll edit this comment to mention/link them.) No large source is perfect. - -sche (discuss) 21:07, 12 November 2024 (UTC)Reply
The /d/-v.-flapped-/t/ is probably just a difference of opinion between you and the OED regarding when a shift in pronunciation crosses the line from being merely phonetic to being actually phonemic - presumably because they take Americans hearing e.g. shutter and shudder as homophones as evidence that what started out as a phonetic shift has now become phonemic. (As a native speaker of AmE, I should note that this does make rather more sense to me than does the insistence of many users here that e.g. shutter somehow still has /t/ despite being completely homophonous with shudder.) Whoop whoop pull up ♀️ Bitching Betty 🏳️‍⚧️ Averted crashes ⚧️ 03:30, 29 November 2024 (UTC)Reply
@Ssvb: It's probably just a typo, but the right word is orthoepic, not orthopedic. PUC 18:49, 13 November 2024 (UTC)
@PUC: Thank you. It looks hilarious indeed. A spellchecker "corrected" me this way and I didn't notice. --Ssvb (talk) 19:34, 13 November 2024 (UTC)Reply
I agree with others that the recording of pronunciations is important work - I would do it if my voice were more suited to the task, and it's clear that FNL's voice is ideal for the job. But it must be done right. Having looked through a few pages of [3], I found definite errors in electrophoresis and tubercle, and likely errors (or possibly legitimate variant pronunciations?) of emend and precocity. I flagged them all with the |bad= parameter - I think we could definitely improve the way that parameter is handled, perhaps with a call-to-action similar to {{rfp}}. Courtesy ping to Flame, not lame, who might like to look into re-recording these words.
@Ssvb YouTube is a great resource for checking how words are pronounced "in the wild". I was going to criticise the recorded pronunciation of dairyman for wrongly stressing the last syllable, but I searched YouTube and found various Americans discussing "Tevye the Dairyman", all of whom pronounced it the same way as our recording! This, that and the other (talk) 09:32, 12 November 2024 (UTC)Reply
"FNL" is disrespectful, so please address me as "Flame, not lame" or "Flame".
Electrophoresis and tubercle matched IPA, and emend and precocity were meant to align with human recordings from other online dictionaries. Flame, not lame (Don't talk to me.) 10:42, 12 November 2024 (UTC)
FWIW our current dairyman audio does put the stress on the initial syllable. AG202 (talk) 15:25, 12 November 2024 (UTC)Reply
I would guess This, that and the other was commenting not on the position of the primary stress, but on the use of an unreduced vowel in the final syllable versus a schwa.--Urszag (talk) 17:50, 12 November 2024 (UTC)Reply
Quite correct, yes. This, that and the other (talk) 00:43, 13 November 2024 (UTC)Reply
Ahh yes, then in that case, yeah I'd expect a number of Americans to pronounce it the way Flame did, including myself (though I don't use dairyman in everyday speech). AG202 (talk) 02:52, 13 November 2024 (UTC)Reply
AmE speaker here, can confirm. Whoop whoop pull up ♀️ Bitching Betty 🏳️‍⚧️ Averted crashes ⚧️ 03:31, 29 November 2024 (UTC)Reply
Everyone makes mistakes and I've uploaded some audio that was wrong and others corrected me. It's not that big a deal and it's weird and discouraging that there's so much noise about a constructive user who has been an asset to the community. (And as an aside, who has a far more pleasant voice to listen to than mine.) —Justin (koavf)TCM 04:00, 13 November 2024 (UTC)Reply
To be fair, it's that she makes some bad edits in a sea of good edits, in an under-patrolled area. Although I do think she's improving. CitationsFreak (talk) 04:29, 13 November 2024 (UTC)Reply
Several times I informed these people I planned to do my research and redirect to audio for words I know well, so their persistent condemnation brings my hopes down. Flame, not lame (Don't talk to me.) 10:27, 13 November 2024 (UTC)Reply

Automatically generated etymology texts


I've noticed that Category:Entries with etymology texts by language has been organically growing and it doesn't seem to be causing any issues. Maybe we can remove the scary "[EXPERIMENTAL]" label in the documentation and officially allow editors to use the |text= parameter. While setting up the IDs is admittedly a bit of a hassle, it comes with the benefit of guaranteed synchronization between entries, so I hope we can use the template more widely in the future. Ioaxxere (talk) 03:46, 12 November 2024 (UTC)Reply

Questionable use of suffix template without root


"See {{suffix|en||illion}}" → "See +‎ -illion" looks wrong. zillion is not from see +‎ -illion. That would be *seeillion. This problem is with many terms in the entire Category:English terms suffixed with -illion. I changed these to {{m}} but was reverted. Would "See {{af|en|-illion}}" → "See -illion" be better? @Binarystep, Einstein2: courtesy ping.

Should we also make {{suffix}} show an error if the root (second) parameter is empty? Category:English terms suffixed with -illion and Template:prefix/documentation say that {{af}} is preferred anyway. 76.71.3.150 11:24, 12 November 2024 (UTC)Reply

“See -illion” (i.e., See {{af|en|-illion}}) seems better, without the ⟨+⟩ denoting “and”. J3133 (talk) 11:34, 12 November 2024 (UTC)Reply
Just fixed that mistake Davi6596 (talk) 14:40, 12 November 2024 (UTC)Reply
@Davi6596: I restored the category. J3133 (talk) 14:52, 12 November 2024 (UTC)Reply
Thanks, I didn't mean to remove the category. Davi6596 (talk) 23:28, 12 November 2024 (UTC)Reply

Hmmm, should I or should I not quit Wiktionary?


I might give up Wiktionary. Not the best idea. Flame, not lame (Don't talk to me.) 18:34, 12 November 2024 (UTC)Reply

The mods here aren't particularly considerate, many of them are quite confrontational... Purplebackpack89 20:07, 12 November 2024 (UTC)Reply
@Flame, not lame: I hope you won't, despite #Ongoing wrong audio from Flame, not lame. 0DF (talk) 23:24, 12 November 2024 (UTC)Reply
Second. Making mistakes is part of the learning process. Sometimes people are a bit too blunt in their criticism, which isn't good and ought to be called out too. —Caoimhin ceallach (talk) 23:34, 12 November 2024 (UTC)Reply
I second this. Davi6596 (talk) 23:38, 12 November 2024 (UTC)Reply
You should not. —Justin (koavf)TCM 23:30, 12 November 2024 (UTC)Reply
@Flame, not lame Why? You've been contributing a lot. No one is perfect: if there's anything to improve or fix, do so. But, if an audio is right, you shouldn't be afraid of defending yourself. Davi6596 (talk) 23:32, 12 November 2024 (UTC)Reply
From what was brought up in the discussion above, I gather that the dissatisfaction with you is actually due to Wiktionary’s convoluted audio review process. It’s normal to make the mistakes you’ve made, and your readiness to correct them is remarkable!
Hopefully everyone will realize that the path to take is not by attacking you, but by bettering the project’s policies. Polomo47 (talk) 16:54, 13 November 2024 (UTC)Reply
Agreed with the others above. Valuable contributions. And already improving methods to reduce any occasional flaws. Good work! Quercus solaris (talk) 20:34, 14 November 2024 (UTC)Reply
I listened to rainstorm. It is very clear, but as you would expect, it's not the British pronunciation I would use. DonnanZ (talk) 17:38, 15 November 2024 (UTC)Reply
The OED gives /ˈreɪnstɔːm/ as the UK pronunciation (which we would give as /ˈɹeɪnstɔːm/), and I agree. Do you really say /ɹeɪnˈstɔːm/, which is what you added to the entry? That sounds odd to me, and I speak British English. Theknightwho (talk) 13:01, 17 November 2024 (UTC)Reply

Audio whitelist


I propose moving User:Metaknowledge/audiowhitelist to the Wiktionary namespace. Currently it is our only source for bot-imported Lingua Libre audio pronunciations (through User:DerbethBot). LL is now an enormous source of audio files but we are importing them at a trickle. That said, there's a lot of garbage on LL because it's so accessible, and that's why Metaknowledge didn't widely advertise this page. I suggest that we set up a system similar to Wiktionary:Whitelist with nomination by one user and approval by another. Because the stakes are lower than user right status, maybe the nominator could be anyone and the approver must be an admin.

Let's also consider the title of this page bearing in mind Benwing's comment on an above discussion that the terms "whitelist" and "blacklist" are falling out of favor. Ultimateria (talk) 02:39, 13 November 2024 (UTC)Reply

Support —Justin (koavf)TCM 03:57, 13 November 2024 (UTC)

Category: words with different stress patterns


I suggest creating a category for words that show different stress patterns in different dialects, such as ballet (BrE /ˈbæleɪ/ vs AmE /bæˈleɪ/) or peanut butter (BrE late-stressed vs AmE early-stressed). JMGN (talk) 01:36, 14 November 2024 (UTC)

Orthography of Istro-Romanian


Should Zvjezdana Vrzić's or August Kovačec's orthography of Istro-Romanian be used or should both be employed? The former is modeled after Serbo-Croatian, whereas the latter is modeled after Daco-Romanian. Native speakers and heritage speakers only use Vrzić's orthography. Does each variant, based on one of these orthographies, have to be attested in the chosen orthography? HeliosX (talk) 03:32, 15 November 2024 (UTC)Reply

Searching for or comparing attestations in the respective other orthography is too much effort and, more importantly, would make Wiktionary inconsistent in its headwords. So just pick the Croatian orthography and convert Daco-Romanian spellings to it; you will have to cite the applied references to be accurate anyway. Fay Freak (talk) 04:24, 15 November 2024 (UTC)
I think both deserve to be included. The question is whether only one should be lemmatized (with the other altform-ified) or both should be treated as equals (like Latin and Cyrillic spellings of Serbo-Croatian). Does either enjoy any kind of official or “semi-official” support over the other? Nicodene (talk) 20:02, 15 November 2024 (UTC)Reply

Ban of Denazz


A few days ago, User:Koavf banned User:Denazz for ever, ostensibly by request. User:Vininn126 was also involved, but did that user request a ban? Maybe not. The question remains, isn't this ban far too harsh? Should we ban Koavf instead? DonnanZ (talk) 13:01, 15 November 2024 (UTC)Reply

So I was right regarding User talk:Donnanz § Sockpuppet! PUC 13:04, 15 November 2024 (UTC)
No, you were wrong. Two different people. DonnanZ (talk) 13:07, 15 November 2024 (UTC)Reply
"Involved" is a strong word. I commented it looked like the vandalism had been cleaned up. A ban on vandalism seems appropriate. Without any real counterarguments, it sounds like admins should be banned for banning vandals. Vininn126 (talk) 13:07, 15 November 2024 (UTC)Reply
According to the block log, you banned Denazz for 1 day, and again for 1 hour. That's way short of Koavf's lifetime ban. DonnanZ (talk) 13:22, 15 November 2024 (UTC)Reply
I have very little interest in interacting with you on this matter. I rarely find it to be coherent or productive. In this issue I see a similar trend already, you've provided no actual arguments and only seem to wish to stir the pot. Good bye. Vininn126 (talk) 13:26, 15 November 2024 (UTC)Reply
Calm down. I don't consider you as the villain here. DonnanZ (talk) 13:31, 15 November 2024 (UTC)Reply
And who is "the villain"? Is it the person who by his own admission was vandalizing this site for the lulz and wanted to be blocked? If so, we agree. —Justin (koavf)TCM 13:32, 15 November 2024 (UTC)Reply
Yes, the user explicitly requested to be banned. No, this is not too harsh. No, I should not be banned instead. If any other admin thinks that it's a really brilliant idea to let this user be unbanned so he can do things like this again, I'm all ears.
Honestly, why did you start this thread? You think that an admin who bans a user for a rash of vandalism which he himself says is for the explicit purpose of being banned is itself bannable? Is this a joke? —Justin (koavf)TCM 13:29, 15 November 2024 (UTC)Reply
I didn't think of looking there! DonnanZ (talk) 13:36, 15 November 2024 (UTC)Reply
None of that answers my questions. Are you saying that you didn't even think to look on the user's talk page before starting this thread? If so, I'm honestly amazed. —Justin (koavf)TCM 13:37, 15 November 2024 (UTC)Reply
I looked at the user page, it didn't occur to me to look at the talk page. Employing hindsight, the wording should have been "per request by Denazz". That would have avoided the topic being created. DonnanZ (talk) 13:46, 15 November 2024 (UTC)Reply
The facts that you didn't do the mildest due diligence, came here with the preposterous premise that admins should be blocked for blocking vandals, and you seemingly have no serious interest in discussing this are in no way my fault. This thread should not have been started and it shows incredibly poor judgement on your part. At a bare minimum, you could have posted to my talk or asked me via email, but coming here suggesting that I should be banned for blocking a spree vandal who explicitly asked to be blocked is frankly stupid and you should feel bad. I hope that my harsh language is well taken and that you are discouraged from posting such inane wastes of the community's time in the future. As Vininn pointed out (and V, please forgive me if I'm putting words in your mouth), you came here with such a ridiculous pretense that you should have known better. Please be better. —Justin (koavf)TCM 13:51, 15 November 2024 (UTC)Reply
Space has been wasted due to your all-too-economical editing. Let that be a lesson to you. That's the end of the matter. DonnanZ (talk) 14:00, 15 November 2024 (UTC)Reply
kthxbye —Justin (koavf)TCM 14:01, 15 November 2024 (UTC)Reply
  • Denazz should never have been unbanned. Did we forget how much trouble he caused when he was Wonderfool? Or the trouble he caused under his current name? Support ban, regardless of whether requested or not. Oppose any action against Koavf. Purplebackpack89 17:09, 15 November 2024 (UTC)
I imagine Denazz will reincarnate himself. He was presumably fed up with that name. DonnanZ (talk) 17:42, 15 November 2024 (UTC)Reply
Just for the record, banning a WF sock when they do things like this is perfectly appropriate. I've done it myself more than once. Chuck Entz (talk) 17:49, 15 November 2024 (UTC)Reply
For the record, the block from a few days ago was directed at Whalespotcha, who clearly deserved it. Whalespotcha (WS) is WF and she likes WS (Wikisaurus), Denazz (DZ) is WS/WF, and likes Drz (the rapper). Donnanz (Dz) isn't WF but sounds like DZ and dislikes Drz (the rapper). DZ's WT is WF, as is PS (P. Sovjunk). PS, PS is I, WF. OK? P. Sovjunk (talk) 18:03, 15 November 2024 (UTC)Reply
Oh, and remember, FNL doesn't like being called FNL. P. Sovjunk (talk) 18:05, 15 November 2024 (UTC)Reply

Suggestions for improving the Help:Audio pronunciations guideline


I propose to:

  1. Remove the old guide section about the Audacity and uploading OGG files. It only makes the page much bigger and contains labor-intensive obsolete instructions, which are a liability today as they are prone to recording quality and categorization problems.
  2. Structure the page and have two separate parts: (1) for the contributors, who are recording audio and (2) for the patrollers, who are ensuring that only correctly pronounced and cleanly recorded audios end up in Wiktionary entries.
  3. Recommend to consult the reputable dictionaries, such as the OED.
  4. Mention the |bad= parameter of the {{audio}} template, which can be set by the patrollers. And the https://en.wiktionary.org/wiki/Special:WhatLinksHere/Wiktionary:Tracking/audio/bad-audio/en list, which should be preferably prioritized by those, who are recording audio.

As an example, take a look at orange. It's a common word, but it's currently stuck with some low-quality noisy OGG audio samples squatting the space in Wiktionary. Meanwhile, there are 7 audios in https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-eng?from=orange to be evaluated as possible replacements. I used the |bad= parameter to flag the problematic audios and now the problem is visible. Not all of the Lingua Libre audio samples are good, but it's still useful to have a sizeable pool of samples to choose from. It's similar to how there are many images available on Commons, but not all of them are necessarily suitable for use in Wiktionary. --Ssvb (talk) 22:40, 15 November 2024 (UTC)Reply
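For anyone who would rather check the bad-audio list from item 4 programmatically than through Special:WhatLinksHere, the sketch below builds a MediaWiki Action API query URL for it. It assumes that tracking entries are recorded as transclusions of the tracking page, so that `list=embeddedin` returns the same entries. The function name and the default limit are illustrative, and the code only constructs the URL without sending a request.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wiktionary.org/w/api.php"
TRACKING_PAGE = "Wiktionary:Tracking/audio/bad-audio/en"


def bad_audio_query_url(limit: int = 50) -> str:
    """Build an API URL listing entries that transclude the tracking page.

    Assumption: the tracking list is populated through transclusion, so
    list=embeddedin mirrors what Special:WhatLinksHere shows.
    """
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": TRACKING_PAGE,
        "einamespace": 0,   # main namespace, i.e. dictionary entries
        "eilimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(bad_audio_query_url())
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the flagged entries as JSON, which a contributor could then work through when re-recording.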

Support. CitationsFreak (talk) 00:02, 16 November 2024 (UTC)Reply
Support 0DF (talk) 00:18, 16 November 2024 (UTC)Reply
Support Aquild (talk) 05:34, 17 November 2024 (UTC)Reply
Instructions for OGG and Audacity are necessary in case Lingua Libre does not work for some people. Instructions for speakers and patrollers are useful. Suggest resources such as IPA, Oxford Languages, and Merriam-Webster, and add a disclaimer about AI voices. Encourage users to communicate on talk pages. PetScan and Lingua Libre's daily list of requests are worth mentioning, and allow users to remove inappropriate requests here. Flame, not lame (Don't talk to me.) 15:01, 17 November 2024 (UTC)Reply
@Flame, not lame: Thanks for your feedback. More text is not always better than less text. Removing the obsolete information makes instructions much more clear. If Lingua Libre doesn't work for somebody, then they are encouraged to post a comment and ask for help. The existence of the alternative obsolete instructions is highly unlikely to help these people, they will only waste time for nothing instead of asking for help. There's apparently an alternative Spell4Wiki solution, which was used for uploading the audios discussed in Wiktionary:Beer_parlour/2024/September#I’m_not_a_TTS, but this doesn't look like a success story to me and I don't feel like recommending it as a viable alternative to Lingua Libre right now.
Encouraging users to communicate on talk pages about each and every problematic audio is inefficient and scales poorly. People quickly grow tired of that and annoyed, so they just stop paying attention to pronunciation problems in audio altogether. That's the situation we are in right now. And the |bad= parameter is expected to help us avoid exactly that in most cases. That said, communicating on talk pages surely makes sense when there's a dispute to resolve.
About the lists of words to record: a lot of common words, such as orange, candle, chameleon or planet, have recordings with noise or other minor or major defects. --Ssvb (talk) 10:53, 18 November 2024 (UTC)
By "obsolete", do you mean it can't be done, or that it isn't the preferred way to do things? CitationsFreak (talk) 19:51, 18 November 2024 (UTC)
It's a combination of both. The old instructions were written by @Dvortygirl back in 2006. They assume a computer-savvy person who can install and configure the software, record clean audio with proper noise removal, upload the files to Commons, adhere to proper naming conventions, handle categorization, etc. And even if someone has the right computer skills, the procedure is still unnecessarily labor-intensive by modern standards. So yes, it "can't be done" in practice by an average contributor and it "isn't the preferred way" because the results may have substandard quality. Today Lingua Libre automates most of these steps and it's a project under the Wikimedia umbrella. --Ssvb (talk) 23:52, 18 November 2024 (UTC)
I would say have something like "You don't need Lingua Libre to make audios, but it's easier. [explanation of LL] If you do decide to not use it..." and then we list how to make 'em by hand, since it is possible (although it isn't a necessity). CitationsFreak (talk) 05:10, 19 November 2024 (UTC)
I'll put it bluntly: many aspects of the almost two decades old instructions have a real potential of causing harm. For example, the described bulk upload is an enabler for uploading large batches of synthetically generated robotic pronunciations. Whereas LL is not designed for recording non-human voices, and one needs to go out of their way to do that. Finally, the possibility of still "making 'em by hand" without LL is a liability, because this just provides more room for human errors and more annoyed patrollers, who would have to deal with this stuff. --Ssvb (talk) 06:13, 19 November 2024 (UTC)
I am opposed in principle to being dependent on an external project just to be able to add audio pronunciations. — SURJECTION / T / C / L / 15:49, 19 November 2024 (UTC)
First of all, how do you define external project? Is https://commons.wikimedia.org an external project for Wiktionary? And second, if Lingua Libre stops satisfying the requirements of the Wiktionary project, then it can be replaced. --Ssvb (talk) 16:42, 19 November 2024 (UTC)
No, Wikimedia projects are not external, but the argument is inconsequent since we always depend on stuff in some git repo. CJK editors also depend on IMEs to add spellings. It does not matter really once the data is on Wiktionary or another Wikimedia project. If there are methods to internalize data more efficiently, go for it. Fay Freak (talk) 17:41, 19 November 2024 (UTC)
Commons is not external, Lingua Libre is. It's not hosted by the WMF, as far as I know. — SURJECTION / T / C / L / 07:41, 27 November 2024 (UTC)
Lingua Libre is a project of Wikimedia France and they are not complete strangers. Per [4]: The Wikimedia Foundation and Wikimedia France are two separate entities working together to support Wikipedia and other free knowledge sharing projects. The Wikimedia Foundation works at a global level, providing technology support and support to volunteers and projects. As for Wikimedia France, based in Paris, it focuses more particularly on actions in France, content in French and French speaking communities.
Due to trademark reasons, I don't think that Wikimedia France would have been allowed to pick such a name for themselves without the Wikimedia Foundation's official approval.
That said, as a precaution, maybe we can add a recommendation to register a separate account specifically for recording audio? Because Lingua Libre performs uploads to Commons on the user's behalf, facilitated by OAuth. --Ssvb (talk) 08:10, 27 November 2024 (UTC)
@Ssvb we can simply put the Audacity-based instructions in a collapsible box. This, that and the other (talk) 22:51, 18 November 2024 (UTC)
Support. Thank you. AG202 (talk) 01:59, 21 November 2024 (UTC)

Bot flag for User:SaphBot

This would just iterate over all entries with AWB and normalise them with the regexes at User:Erutuon/scripts/cleanup.js#L-95--L-109. The regexes cover:

  • Trailing space, on both empty lines and lines with content (deletes)
  • Extraneous blank lines (deletes)
  • Horizontal tabs (converts to spaces; general norm 3)
  • Missing spaces in list items (e.g. #foo ==> # foo)
  • Extraneous spaces in headers (deletes)

-saph668 (user · talk · contribs) 22:08, 19 November 2024 (UTC)
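
For readers who want a concrete picture of the five normalisations listed above, here is a rough Python sketch. It is not the code at User:Erutuon/scripts/cleanup.js and not what AWB would run; the function name normalise_wikitext and every pattern in it are illustrative assumptions, including the choice to turn each tab into a single space.

```python
import re


def normalise_wikitext(text: str) -> str:
    """Illustrative whitespace cleanup; all patterns here are assumptions."""
    # Horizontal tabs become a single space (assumed replacement width).
    text = text.replace("\t", " ")
    # Delete trailing spaces, on empty lines and on lines with content.
    text = re.sub(r"[ ]+$", "", text, flags=re.MULTILINE)
    # Collapse runs of blank lines down to one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Add the missing space after list markers: "#foo" -> "# foo".
    # Caveat: this would also touch lines such as "#REDIRECT".
    text = re.sub(r"^([#*][#*:]*)(?=[^\s#*:])", r"\1 ", text, flags=re.MULTILINE)
    # Remove spaces just inside header markers: "== English ==" -> "==English==".
    text = re.sub(r"^(=+)[ ]*(.+?)[ ]*(=+)$", r"\1\2\3", text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    sample = "==  English ==\t\n\n\n\n#foo  \n#:\tbar\n"
    print(repr(normalise_wikitext(sample)))
```

A real bot run would need to skip places where whitespace is significant (for example inside pre or nowiki blocks), which this sketch does not attempt.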

Updating COALMINE rule

I've created the vote Wiktionary:Votes/2024-11/Updating COALMINE rule and proposed a few changes to how the COALMINE rule applies. The main rationale is that rare, non-standard, or barely attested closed forms of SOP open compounds should not necessarily force the keeping of the open form, and that the application of the rule should be decided by the RFD discussion for the entry. One concern is that it might lead to us having a rare closed form that passes CFI but not the common/standard open/hyphenated SOP variant of it. That can be dealt with by deciding to keep the open form at an RFD discussion, or alternatively by marking the closed form as rare, nonstandard, etc., such as by writing {{nonstandard form of|en|[[non-]][[Canadian]]}} at the entry nonCanadian. Another concern might be that the passing of the vote could lead to deletion of potentially keepable/useful entries like non-existent. They would still need to go through RFD before getting deleted, though, and the COALMINE test could still be pointed to as an argument for keeping them. Arguably anyone can already nominate such entries and discuss keeping or deleting them, as at Talk:heatresistant and Talk:heat-resistant. These prefixed ones possibly do not even fully fit WT:COALMINE, as they are not the significantly more common forms of attestable single words, yet they have not been considered for deletion, and I expect it to be the same even if WT:COALMINE is made non-binding. (As at Wiktionary:Votes/2019-08/Rescinding the "Coalmine" policy: Metaknowledge: I think it's healthy that we continue to relitigate this as a community every few years, Robbie SWE: it's good to have these types of discussions every now and again, it might be fine to bring this again now to see if the community would accept these changes.) – Svārtava (tɕ) 11:14, 20 November 2024 (UTC)

Your example nonCanadian isn't a good one, because the most common spelling non-Canadian is also not an open compound – it's not spelled *non Canadian. —Mahāgaja · talk 18:24, 20 November 2024 (UTC)
I meant open/hyphenated as both are deletable by the SOP criteria for English. – Svārtava (tɕ) 07:17, 28 November 2024 (UTC)
I don't understand why the coalmine rule exists. What bearing does the spelling have on whether it meets CFI? Whether it's one word or two just depends on how a particular language's orthography decides to handle compounds. It has nothing to do with idiomaticity. —Caoimhin ceallach (talk) 11:41, 21 November 2024 (UTC)
You are absolutely correct. This is a point that I have been banging my head against for years. Whether something is sum-of-parts has ABSOLUTELY NOTHING to do with whether someone happens arbitrarily to write it open, closed, or hyphenated. Unfortunately, translating this observation into a sensible system whereby people can look up what they perceive as "words" proves difficult. Still, we can surely address the most BLATANT cases such as "non", "anti", etc. Mihia (talk) 22:52, 26 November 2024 (UTC)
What effect would this have in cases where the closed form is well-attested but proscribed in favor of the open form, such as trans woman? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 00:56, 28 November 2024 (UTC)
None currently, but it would become possible to RFD it. As the closed form is well-attested and the entry has useful usage notes, I think it would still be kept. Also, COALMINE would remain a valid rationale for voting keep for an entry, but it would no longer have the power to override an RFD that reaches consensus for deletion. – Svārtava (tɕ) 07:17, 28 November 2024 (UTC)

Appealing blocks without userpage or email?

Hi. Can someone tell me how (if at all) Wiktionary handles appeals from users who are blocked without access to email or their user talk page? Thanks, Barkeep49 (talk) 17:12, 20 November 2024 (UTC)

How would any wiki handle this, I wonder? (An e-mail list?) I don't think we have an official method; there are unofficial avenues, like pinging Wiktionarians on other projects (or Discord) the user isn't blocked on so someone can relay the appeal here (an avenue I recall at least one user using, despite his talk page access not being restricted), but in general it seems that if someone has managed to get blocked and have their talk page and e-mail access revoked, they've sufficiently obdurately abused their editing abilities, talk page, and e-mail that they're not going to be unblocked.
Someone who was a vandal ten years ago might be a different person now, but if they haven't acquired a new IP that they can edit or create a new account [to edit or post a block-appeal] from in that time, they may be such an edge case as to be out of official options, leaving them only the unofficial options aforementioned.
On rare occasion, particularly persistent bad users have pestered Wikipedia (or Metawiki, etc) admins to come here and appeal on their behalf (relatively soon after their block), but the two such users I can think of offhand (neither of whom had their talk page + email access restricted AFAICR) respectively had the appeal declined and ended up re-blocked because they continued blockworthy behaviour. - -sche (discuss) 20:01, 20 November 2024 (UTC)
Thanks sche. Your question of "How would any wiki handle this, I wonder?" is a good one. I know some projects allow appeals through their info VRT queue but yes I think many projects this size don't have great answers here. I appreciate the practical options you offered. Thanks, Barkeep49 (talk) 21:28, 20 November 2024 (UTC)

Word of the Year vote

Hello! I have just started a premature vote for the concept of a Word of the Year (WotY) for Wiktionary, which could perhaps be displayed on the main page for a few weeks in late December and January. Would appreciate feedback, of course :) The entries are all obtained from WT:WOTY/2024, where some of us have sporadically added trending words seen through this year. See also the original, unsuccessful 2023 WotY vote which was far too hastily prepared. Thanks, LunaEatsTuna (talk) 11:58, 21 November 2024 (UTC)

Publicizing discussions on watchlist

Similar to the {{smallest discussions}} box that used to be there in the watchlist till the revision Special:Permalink/64505126, I think it would be nice if we had another box appearing in the watchlist below the votes box for important discussions to be publicized to users on their watchlists. This would be very helpful for e.g. sitewide discussions that could get more input such as proposals related to (non-language specific) layout, templates or modules, etc. Discussions could be listed in the box by a page similar to Wiktionary:Votes/Active where users could add them. A notable example of a discussion with sitewide impact that didn't get sufficient input/comments is Category talk:Colloquialisms by language#RFM discussion: July 2021–January 2022, whose outcome had to be overturned later by Wiktionary:Votes/2022-01/Label for lower register. – Svārtava (tɕ) 12:01, 21 November 2024 (UTC)

Sign up for the language community meeting on November 29th, 16:00 UTC

Hello everyone,

The next language community meeting is coming up next week, on November 29th, at 16:00 UTC (Zonestamp! For your timezone <https://zonestamp.toolforge.org/1732896000>). If you're interested in joining, you can sign up on this wiki page: <https://www.mediawiki.org/wiki/Wikimedia_Language_and_Product_Localization/Community_meetings#29_November_2024>.
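
For anyone who prefers to check the time locally, the number at the end of the Zonestamp link above is a Unix timestamp, and it can be converted with a few lines of Python. This is only a sketch: the helper name meeting_time and the fixed-offset approach are assumptions made for illustration, and a fixed offset ignores daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

# Unix timestamp copied from the Zonestamp URL in the announcement.
MEETING_TIMESTAMP = 1732896000


def meeting_time(utc_offset_hours: float = 0) -> datetime:
    """Return the meeting time shifted to a fixed UTC offset (hypothetical helper)."""
    utc_time = datetime.fromtimestamp(MEETING_TIMESTAMP, tz=timezone.utc)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return utc_time.astimezone(local_zone)


if __name__ == "__main__":
    print(meeting_time(0).isoformat())    # UTC
    print(meeting_time(5.5).isoformat())  # e.g. a UTC+05:30 offset
```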

This participant-driven meeting will be organized by the Wikimedia Foundation’s Language Product Localization team and the Language Diversity Hub. There will be presentations on topics like developing language keyboards, the creation of the Moore Wikipedia, and the language support track at Wiki Indaba. We will also have members from the Wayuunaiki community joining us to share their experiences with the Incubator and as a new community within our movement. Spanish interpretation will be available at this meeting.

Looking forward to seeing you at the language community meeting! Cheers, Srishti 19:55, 21 November 2024 (UTC)

Changes at inflection el tables

attn interface admin @This, that and the other and admin. director for Modern Greek @Saltmarsh. Dozens of inflectional tables are altered in some weird way. 2024.11.22 It has something to do with Wiktionary:Beer_parlour/2024/October#Towards_a_Standardization_of_Inflection_Tables, which, I understand, concerns languages with very simple inflections whose administrators have agreed to change. Dear Salt, have you worked with M This, that and the other for such changes? Not only colours look weird, but something is wrong with notes and width. Thank you. ‑‑Sarri.greek  I 08:29, 22 November 2024 (UTC)Reply

As I wrote at User talk:Sarri.greek, I took it upon myself today to spend some time tidying up Greek noun and adjective declension templates, which had a number of visual issues, including
  • lack of support for dark mode
  • overly saturated colours and too little padding between the contents of table cells and the cell border, so that cell contents are hard up against the border (these are both very common issues across all our inflection tables)
  • use of the nonstandard transliteration template {{el-link-ttip}}, which is not usable on mobile and was nominated for deletion (admittedly not the strongest consensus for deletion in the history of the project, but enough to act upon, I think)
I'm happy to reinstate the collapsibility of adjective tables if that is a particular concern.
As for the other changes, I'd invite community members to look at the παλιός entry, which (at the time I write this) contains a template using the new visual style and one using the old visual style, and comment on which one is better - being sure to check in dark mode as well as on your phone. This, that and the other (talk) 08:41, 22 November 2024 (UTC)
@This, that and the other: I've taken a look at παλιός (on my computer; I have no smartphone). I don't have a strong preference regarding the presentation of the tables in light mode, but in dark mode the comparative table's column headers and derivations and notes sections are unreadable, so the “old visual style” needs support for dark mode if it is to be the one to remain. But, other than that, I don't really mind which visual style prevails, as long as the style is consistent (i.e. let's not have a mix of styles). Even before your changes, however, there already existed a mix of styles, with Demotic/SMG being presented in the “old visual style” and with Katharevousa being presented in the same style as Ancient Greek (and, indeed, using that language's templates); this looks really terrible when both are used together, as in the case of Λεϊβνίτιος (Leïvnítios), which has both Demotic and Katharevousa declension. If there is to be a change in the appearance of Modern Greek tables, we should ideally take the opportunity to unify the visual styles of Ancient Greek and Modern Greek. 0DF (talk) 10:32, 22 November 2024 (UTC)Reply
[from User:Sarri.greek] Thank you @This, that and the other for your notes and forgive this long note. The Wiktionary:Beer_parlour/2024/October#Towards_a_Standardization_of_Inflection_Tables is too technical for me to understand. Also ref to el-link-ttip deletion Talk and the nice colours by User:Surjection/swatch2. Speaking of inflectional tables in particular:
Standardisation and unification are two different things. Standardising inflectional tables in particular, with default best view at all browsers, all skins, white, dark, pink modes, for all media, for mobile, for desktop, for hologramme, for whatever, and despite any preferences of a reader, who may not have preconsidered inflectional tables and/or specific languages. Of course, some choices may be offered, but some output principles have to be there, always and everywhere.
These principles may differ from language to language. Assuming that we work in pairs, the ideal editing of templates could be done by a programmer & a language director who has all the previous templates in his/her mind. Which may be the result of laborious discussions, thousands of trials and applications, a procedure which may have lasted more than 12 months (e.g. for el)
_1 dark mode is about the page, I presume. The territory of the inflectional table, is separate and its backgrounds are set. At most, the forms-cell background (default ffffff to work on, with text colour black, or combinations of base+ending, could possibly be offered at dark mode with corresponding text colours/ But I would rather keep this territory as is.)
  • colours for inflectional tables for languages with heavy infl.system need 4 colours (same for maps: Four color theorem), and probably 2 borders: soft inner borders to help the eye and strong separating sections or as outer borders (now, we have only strong borders). E.g. for lang=el then c(olor)_1=#XXXXX darkish, for hypertitles, c_2= for grammatical terms (usually horizontal like genders, aspects, tenses, or important notes, c_3= for grammatical terms (usually at left side like persons or cases) and c_4= for notes (very light). For a dead lang=la, these c_ params could be set somewhere.
Some >>saturated colours<< were chosen as characteristic, I think. e.g. Aegean blue for el
Are they different tones of the same colour? Not always. It becomes too pale and dull.
Some wiktionaries like de.wikt, fr.wikt decided on totally unified infl.table.styles (cf for both ancient and modern: wikt:en:καλός, wikt:de:καλός, wikt:fr:καλός base+ending system, wikt:el:καλός
_2 padding, borders etc (I do not understand very much what you are describing): Could someone design a super-table example of a best-view as help to copypaste from? float= show= paddings=likethat, etc. We need economic spaces; clear and not small font sizes for languages with diacritics.
width. Yes, tables may be large. There is no royal road to that. (wikt:el:συμπεριλαμβανόμενος) Mobile users can scroll, I hope, or use landscape view, which still cannot show all tables in the world.
_3 transliteration at Module:el-translit has to be reviewed (@Saltmarsh). Grek or Polyt fonts, font sizes and transliterations have to be reviewed too, hopefully together with @Erutuon.
The problem started in the old days of wiktionary with this confusion: Yes, we present Ancient Greek grc polytonic, Modern Greek el monotonic. But, all greek may be written (in quotations for example) polytonically in books before 1982. So, a el-translit should include Polyt too.
  • Katharevousa under Modern Greek is polytonic. Ancient Greek tables need a little update for it ( if dial=el-kth, then prosody=notshown (oh, never mind, we can add a note), dual=- ) Λεϊβνίτιος by @0DF is exotic, but serves well as an example of juxtaposing tables face-à-face.
_3b tooltips are a powerful weapon of electronic dictionaries and text presentations. For Modern Greek, I would keep {{el-link-ttip}} (with the tooltip on a dot) and add underneath the ipa which is much more helpful (fr.wiktionary does this a lot). If {{el-link-ttip}} is to be expanded, where is a Template:link vertical with all parameters vertically tr= ts= t= one under the other? I have tried to show outputs at Template:User:Sarri.greek/tlse but I have no help from any programmer who would make it properly. & small ideas: For dead languages, expected, unattested and rare styles could be considered. For contemporary, learned or high register, vernacular and rare.
PS As the technical things become more complicated, the interface programmers and the language editors have been drifting apart.
Language specifics are cut on the procrustean bed of the hegemonic English linguistic nomenclatura, foreign linguists are ignored and small language editors are mocked as ideoleptic (obsessive) maniacs —and by one of your best admins. The case study of Greek is an example. ‑‑Sarri.greek  I 15:14, 22 November 2024 (UTC)Reply
@Sarri.greek: You seem to speak favourably of the unified inflection table styles used by the French, German, and Greek Wiktionaries. Do I interpret you correctly? Does that mean you'd be in favour of harmonising the styles of the Ancient Greek and Modern Greek inflection tables here on the English Wiktionary? Also, I had not heard of the word ideoleptic before, so needed to look it up, but the NED has no entry. It seems to be very rare; I could only find one use of it on Google Books. I have quoted that use at Citations:ideoleptic; is that the sense in which you meant the word? And is there a Greek ἰδεοληπτικός (ideolēptikós) from which you derived it? 0DF (talk) 22:56, 22 November 2024 (UTC)Reply
@0DF:, _sorry for the 'greekism' ideo- + -leptic (obsessive is the word I should have used), (also ideo -lepsy), new compounds, not ancient. ιδεοληπτικός mostly in ModGr. _Standardising or unifying or harmonising... The story of infl.tables: back in 2000s an editor would know how to make easy-tables for his/her language. These were copypasted heavily, here and interwikily. Now, programmers tell us that things have changed. There are 3 things. Updates that are invisible to the reader (the output is the same). Updates that include radical visual changes. Lastly, updates that are difficult for editors to handle. New master.tables should be initially proposed. Then, the ultimate best view for each language, taking into account the past tradition of its tables. But all this needs sessions with the directors of languages (en.wikt does not have directors of specific tasks/langs). The only specialty I know of, is interface administrator. ‑‑Sarri.greek  I 06:58, 23 November 2024 (UTC)
@Sarri.greek: No need to apologise for the “Greekism”; it was interesting. Re harmonising, let me put it this way: How would you feel about Modern Greek tables using the appearance of Ancient Greek tables? Or vice versa? 0DF (talk) 08:57, 24 November 2024 (UTC)Reply
@Sarri.greek thank you for this considered and detailed response. You have made a lot of points here, many good, some I would need to think about for longer. I should say, in particular, that I would like to hope that I have not been seen to offend or mock any individual or group, or the hard work put in by any editor to this wiki. We are all seeking, in our various different ways, to advance the dictionary project.
In response to your point "Could someone design a super-table example of a best-view as help to copypaste from?" This is partly my intention with {{inflection-table-top}}. I say "partly" because it is not a resource intended to be copied and pasted from. However, it is a wrapper template that, when used as part of inflection table templates, is intended to be an example of best practice (although some may dispute that it achieves that intent!), in order that individuals do not need to muck around with the fine points of CSS styling to achieve an acceptable-looking inflection table.
When it comes to colours, Surjection has helped us by providing a range of colours that can be used in both light and dark modes (MediaWiki:Gadget-Palette/table). There is a type of blue offered here. If the particular shade of colour used in the template needs to be fine-tuned, further colours can be added (perhaps to the inflection table CSS, not to the global palette).
As for transliteration, I initially had the same thought as you, namely, that we need a new template that displays a term's transliteration beneath it rather than beside it. Something like {{l-stacked}} or {{l-vertical}} as you say. However, as I considered this some more, it occurred to me that the main, if not the only, place where this style of display is called for is in an inflection table. The {{inflection-table-top}} template is able to "impose" this stacked format on all {{l}} templates contained within it, and this is now in effect in Greek adjective templates such as at κατσαρομάλλης (katsaromállis).
I would like your opinion on something, if I may ask. When there are two possibilities for a single inflection, would it be better to see
κατσαρομάλλα
katsaromálla
κατσαρομαλλούσα
katsaromalloúsa
or
κατσαρομάλλα /
κατσαρομαλλούσα
katsaromálla /
katsaromalloúsa
? Or do you like/dislike each format equally? There is a third form, of course, namely
κατσαρομάλλα / κατσαρομαλλούσα
katsaromálla / katsaromalloúsa
but this cannot really be used, as it would create difficulties with tables becoming too wide.
Thank you, This, that and the other (talk) 12:10, 23 November 2024 (UTC)
@This, that and the other:, you are very polite, and my PS above was about a general situation, in case bureaucrats read, like M Benwing, who follow Greek issues.
But your changes of 2024.11.22 on templates were so sudden, drastic and unexpected. These templates might be updated anyway some day, but differently. The tooltip for Greek transliterations with the template {{el-link-ttip}} discussed here is very helpful. Who wants a tr= for a hundredth time? Tooltips are also used for the grammatical terms. The English term is visible, the tooltip explains if needed, plus gives the term at the target language. e.g. Check the little arrows at παίζω#Conjugation. At el.wikt, such titles are translated, as at wikt:el:Template:ενικός.
table-top {{inflection-table-top}} is for the Hide/Show function, I guess, if the editor does not have one already. The {{inflection-table-bottom}}, which has only the closing |} things, never the param |notes= which mayyy be used already by the infl.template.
Mobile perhaps has to be considered separately: but I do not think that a verb's table can fit in a mobile, whatever way. Best view = desktop view. cf παίζω both langs. The French wiktionary puts them at separated pages. They are SO big. jouer (Vector2022 is not suitable)
Dark mode does not step IN the table at all, unless the creators of the infl templates chose to switch. I hope background=#ffffff protects the territory of infl.tables from switching?.
You ask about applications that are very specific to a language (Modern Greek) and a PoS (adjective) and a lemma κατσαρομάλλης (/⁠katsaromális⁠/) Why? Sorry, I cannot find where el-adj templates begin; I would need a month to discuss adjective templates with my admin. Obviously, preferably every word has tr and ts and t under it. eg ξανθομάλλης (xanthomállis) looks so cluttered! Could you show us both styles, previous and yours? And what about verbs?
Some of my tests when we were wikt:el:Template:table-test/1 choosing among colour.styles wikt:el:Template:table-test. Also, we discarded font variation. Because they come in slightly different sizes. We need good clear fixed sizes! Readers can zoom out if they want. Here at en.wikt, much more complicated with tr and ts at User:Sarri.greek/template4 and base+ending system with tlse & lse An instructions.master.table... a draft needed for editors to copypaste from and create infl templates with a note: what is in Common.css already. Thank you ‑‑Sarri.greek  I 17:31, 23 November 2024 (UTC)Reply
@Sarri.greek To ensure dark mode compatibility, if you set the background, you also have to set the color (text colour). (This is the problem with the untouched, original templates - the text becomes white but the background is also very close to white, rendering them unreadable.) But if this is all you do, the table will appear "bright", even in dark mode, which is as distracting to dark-mode users as it would be if a black-background table appeared in the middle of an entry for a light-mode user. So you then have to use paletted colours, like background: var(--wikt-palette-blue-4) (MediaWiki:Gadget-Palette/table). {{inflection-table-top}} comes with this pre-baked, but of course you can incorporate this styling into pre-existing inflection tables if you wish.
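
To make the rule above easier to check across many templates, one could scan inline style strings for a background that comes without a text colour. The Python sketch below is only an illustration: the helper names are invented here, the parsing is deliberately naive, and the only detail taken from the discussion is the var(--wikt-palette-...) naming of palette colours.

```python
def parse_inline_style(style: str) -> dict:
    """Split an inline style string into a {property: value} dict (naive parser)."""
    props = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        name, value = declaration.split(":", 1)
        props[name.strip().lower()] = value.strip()
    return props


def sets_background_without_colour(style: str) -> bool:
    """True if a background is set but the text colour is left to the skin."""
    props = parse_inline_style(style)
    has_background = "background" in props or "background-color" in props
    return has_background and "color" not in props


def uses_palette_colour(style: str) -> bool:
    """True if any value refers to a palette variable such as --wikt-palette-blue-4."""
    props = parse_inline_style(style)
    return any("var(--wikt-palette-" in value for value in props.values())


if __name__ == "__main__":
    old_cell = "background:#ffffff"
    new_cell = "background: var(--wikt-palette-blue-4)"
    print(sets_background_without_colour(old_cell), uses_palette_colour(new_cell))
```

The first check flags the unreadable white-on-near-white case; the second only tells you whether a palette variable is in use, not whether the chosen shade looks right.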
As for my question to you, yes, I have posed it in the context of a specific example, but consider that it applies to any situation where a particular inflection of an adjective has two or more forms available. I'm still keen to hear which style you prefer.
And I am not touching the verb templates at this time! Too complex. This, that and the other (talk) 07:08, 26 November 2024 (UTC)
I am so sorry. Then, I will not do any tables. Thank you, M @This, that and the other ‑‑Sarri.greek  I 11:21, 26 November 2024 (UTC)
@This, that and the other Since @Sarri.greek hasn't weighed in on what looks best, my preference would be option number 1, where you show the transliteration after each variation. I find it easier to make the comparison that way and find it more visually appealing. But I'm happy to cede to anyone else's preference if they feel differently. Andrew Sheedy (talk) 06:07, 27 November 2024 (UTC)

Formatting of IPA in Pronunciation sections

(This discussion follows an exchange of diffs between me and @Sgconlaw.) I wish to know how IPA transcriptions, and especially English IPA, should be formatted in the "Pronunciation" sections of entries. I have been following what seems most common in and outside English entries, but I want confirmation and consensus from others. I would welcome comments on the points below:

  • broad and narrow transcriptions have syllable breaks between all syllables (disregarding English's weird hyphenation)
    • except before primary and secondary stress markers
  • broad transcription for English follows the IPA key in the appendix and shouldn't contain more information. this includes:
    • tiebars for affricates (in languages where the distinction between affricates and plosive-fricative sequences doesn't exist)
    • rhotacisation using /ɹ/, not /ɚ, ɝ/
  • all stressed words, including monosyllabic ones, should have a stress marker (in languages with lexical stress)
    • (this wasn't in the edits above, but I am throwing it in for discussion)

Juwan (talk) 13:01, 24 November 2024 (UTC)

As far as I am aware:
  • We don't indicate syllable markers unless there is ambiguity or if required to ensure that {{IPA}} correctly indicates the number of syllables in the word.
  • We indicate the consonant sound at the beginning of juice as /d͡ʒ/ and not /dʒ/. This is despite the fact that /d͡ʒ/ is not indicated at "Appendix:English pronunciation" (perhaps this should be changed), or in the titles of rhymes pages.
  • We now use /əɹ/, not /ɚ/; and /ɜɹ/, not /ɝ/, as indicated at "Appendix:English pronunciation".
  • We don't use stress markers for one-syllable entries, but do indicate if one-syllable words are stressed in entries that are multiword terms.
Sgconlaw (talk) 13:11, 24 November 2024 (UTC)
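
As an illustration only, the substitutions described in the list above (tie bars on the affricates, /əɹ/ and /ɜɹ/ in place of the rhotacised vowel symbols) could be applied mechanically along these lines. The function name and the replacement table are assumptions made for this sketch, not an existing module, and whether these conventions have consensus is exactly what is under discussion in this thread.

```python
TIE_BAR = "\u0361"  # combining double inverted breve

# (old, new) pairs reflecting the conventions as described above.
REPLACEMENTS = [
    ("\u025a", "\u0259\u0279"),        # ɚ -> əɹ
    ("\u025d", "\u025c\u0279"),        # ɝ -> ɜɹ
    ("d\u0292", f"d{TIE_BAR}\u0292"),  # dʒ -> d͡ʒ
    ("t\u0283", f"t{TIE_BAR}\u0283"),  # tʃ -> t͡ʃ
]


def normalise_broad_ipa(ipa: str) -> str:
    """Apply the replacements in order (hypothetical helper)."""
    for old, new in REPLACEMENTS:
        ipa = ipa.replace(old, new)
    return ipa


if __name__ == "__main__":
    print(normalise_broad_ipa("/d\u0292u\u02d0s/"))  # juice
```

A blind replacement like this would wrongly tie a plosive and a fricative that merely sit next to each other without a syllable mark between them, so any real use would need human review.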
Wait, when did we stop using /ɚ/? We were using it for years in entries. Was there a formal decision made? I do seem to recall some discussion, I just don't recall a decision. Andrew Sheedy (talk) 03:38, 25 November 2024 (UTC)
@Andrew Sheedy: there was a discussion on the talk page of that appendix, and it appears some consensus was reached among editors who regularly edit that page. (I did not take part in it.) Perhaps there should have been a wider discussion at a forum such as this page. — Sgconlaw (talk) 12:54, 26 November 2024 (UTC)
Oh, OK. I don't know enough about the finer points of phonetics/phonemics to agree or disagree, but I do find it hard to keep track of stuff like this, so hopefully there's some flexibility allowed. I'll try to abide by the decision so long as it stands. Andrew Sheedy (talk) 05:48, 27 November 2024 (UTC)
Inconsistency is basically inevitable anyway, so flexibility is not merely allowed, it's obligatory. —Mahāgaja · talk 09:28, 27 November 2024 (UTC)
For what it's worth, my own understanding is that there's no policy or consensus either mandating or forbidding syllable breaks (is there in fact one somewhere?); some people add them, some people remove them, and it's occasionally contentious. (Any unthinking mass removal would run into problematic edge cases as alluded to above; one class of edge case is New Zealand /i(ː)ə/ ear words vs two-syllable /i(ː).ə/.) I'm not sure there's a clear consensus for /ɚ/-not-/əɹ/ or for /əɹ/-not-/ɚ/, either, though it's been discussed before a few times; the appendix was changed in diff; here is one discussion, and another. Stress markers for one syllable entries have been discussed before too and I don't recall the outcome. - -sche (discuss) 07:11, 25 November 2024 (UTC)Reply

Inflected verb forms for agglutinating languages


Is there any preference for how to define verb forms in agglutinating languages? (Not that I’m planning to create a gazillion, but some of them have idiomatic senses and I think it’s our policy to give the verb-form sense as well in that case.) Currently, a Swahili verb form like lililompeleka would be defined as

  1. ji-ma class subject inflected singular relative m-wa class object inflected past of peleka

I think it’d be more informative to give the breakdown in morphemes:

  1. li- (ji class(V) subject) +‎ -li- (past tense) +‎ -lo- (ji class(V) relative) +‎ -m- (m class(I) object) +‎ -peleka (to send)

What are y’all’s thoughts? @Tbm as probably the most interested. MuDavid 栘𩿠 (talk) 03:02, 26 November 2024 (UTC)Reply

I can't comment on Wiktionary policy but I agree your breakdown is much more informative! tbm (talk) 13:57, 26 November 2024 (UTC)Reply
Yes, the first example falls squarely in the category of "technically correct but practically useless" - or at least, useless to all but a Swahili grammarian.
I'd still prefer to keep the form-of template with its boldface text:
  1. form of peleka: li- (ji class(V) subject) +‎ -li- (past tense) +‎ -lo- (ji class(V) relative) +‎ -m- (m class(I) object) +‎ -peleka (to send)
The other approach, of course, would be to place the breakdown in the Etymology section and keep the long string of grammatical terms on the sense line. This would be more in line with existing Wiktionary norms, but seems like a needlessly redundant way of presenting this information. I'd prefer this new approach. This, that and the other (talk) 23:15, 26 November 2024 (UTC)Reply
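For illustration only, the morpheme-by-morpheme definition line discussed above could be generated from structured data along these lines. This is a minimal sketch, not an existing Wiktionary template or module: the names `Morpheme`, `surface_form` and `format_breakdown` are hypothetical, a plain " + " stands in for the separator used in the examples, and the glosses are copied from the lililompeleka example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    form: str   # affix or stem with its hyphens, e.g. "li-" or "-peleka"
    gloss: str  # short grammatical or lexical gloss


def surface_form(morphemes):
    """Join the morphemes into the written word by dropping the hyphens."""
    return "".join(m.form.strip("-") for m in morphemes)


def format_breakdown(lemma, morphemes):
    """Render a 'form of LEMMA: a (x) + b (y) + ...' definition line."""
    parts = " + ".join(f"{m.form} ({m.gloss})" for m in morphemes)
    return f"form of {lemma}: {parts}"


# Hypothetical data for the example discussed in this thread.
LILILOMPELEKA = [
    Morpheme("li-", "ji class(V) subject"),
    Morpheme("-li-", "past tense"),
    Morpheme("-lo-", "ji class(V) relative"),
    Morpheme("-m-", "m class(I) object"),
    Morpheme("-peleka", "to send"),
]

print(surface_form(LILILOMPELEKA))
print(format_breakdown("peleka", LILILOMPELEKA))
```

Keeping the morphemes as data would let the same list produce either presentation (the boldface form-of line or an Etymology-section breakdown), which is the choice being weighed above.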

Add the exclusion of political party names to CFI (proposal)


I suggest that we formally disallow the names of political parties on Wiktionary. We can add a bullet point noting this exclusion somewhere at WT:NSE.

Context: so, political parties are actually already de facto excluded from Wikt; nearly all recent political party names have failed at RfD, including Democratic Party and Republican Party (2011), Talk:Japan Socialist Party (2020), Talk:Arab Socialist Baath Party (2021), Talk:Transhumanist Party (2015 RfD-kept, 2023 RfD-deleted) and the ongoing RfD discussion for Communist Party of China is evidently leaning delete. A very short discussion for nine parties back in 2009 (see Talk:Liberal Democrats) yielded no consensus, but since then it seems that the majority of the community have decided that we do not need political party names on Wiktionary.

Most arguments against including them here include that they are more Wikipedia material (not dictionary-worthy per se, rather fit for an encyclopaedia) and SOP, though the latter applies only to non-proper nouns so admittedly that argument is somewhat flawed.

Since there has been a general consensus to exclude them for a few years and there are still ~40 such pages on Wikt right now, I think adding a specific policy excluding political party names would be appropriate and necessary so as to stop users from creating new entries for political parties. I mean, it would otherwise be super funny how Republican Party and Communist Party of China (arguably among the most powerful political parties in the world) are not allowed entries on Wiktionary but the small, satirical Official Monster Raving Loony Party de jure is.

This proposal would, of course, not apply to nicknames, clippings or abbreviations/initialisms.

(As a final note, with respect to RfD, I recommend that we do not immediately speedy-delete all political party entries; rather, they can all be nominated at RfD (individually or in bulk) with reference to this discussion and the hypothetical new policy at CFI, should it be implemented.) I would appreciate any arguments in favour of or against this proposal! Kindest regards, LunaEatsTuna (talk) 02:04, 27 November 2024 (UTC)Reply

What is the difference between political parties and business companies? I mean, in terms of inclusion. And what is a political party as opposed to a political organization? For example, a hypothetical National Bolshevik Party of Eastern Udmurtistan with 1.5 members is not really a party, is it? I have never understood what the criteria for inclusion are. Tollef Salemann (talk) 02:18, 27 November 2024 (UTC)
You make a valid point! WT:COMPANY already excludes business companies and, since political parties are not companies per se but most Wiktionarians feel that they should not be included on Wiktionary, I think making a specific exclusion for them too would be fitting. As for other political-ish organisations, like churches, armed forces, NGOs, militant organisations, airlines etc., I do plan on addressing them separately in other discussions here! I am just starting with the easiest (since most people already think parties should be excluded but not the others I mentioned). My plan is that hopefully we will be able to find consensus on why some should be included and others not, and the reasoning(s) why for each. :) LunaEatsTuna (talk) 02:35, 27 November 2024 (UTC)
  • Mostly oppose: Entries that take the form "BLAH Party" probably need to go. Entries that are just "BLAH" without the party, but refer to a political party or members of same, probably need to STAY. Purplebackpack89 04:28, 27 November 2024 (UTC)Reply
    I think the same about the churches as well. There is no reason to include names of all the thousands of American Evangelical churches, Ukrainian Orthodox churches, Siberian Old Rite communities, all the way to Norwegian Pagan school kid groups. Instead it is enough to just have entries about evangelism, orthodoxy, old ritualism, neo-paganism and so on, which we already have, by the way. Tollef Salemann (talk) 12:00, 27 November 2024 (UTC)
I'd support such a change. Political party names don't seem to have any lexical value. I'd be curious to learn the types of entries @Purplebackpack89 is thinking of? Something like Labour or Conservatives? These would seem to me to fall under LunaEatsTuna's proposed exception for clippings and/or nicknames. This, that and the other (talk) 22:39, 27 November 2024 (UTC)Reply
@This, that and the other: Yes, I also think that the longer and more descriptive a name is, the less likely it is to deserve an entry. Ioaxxere (talk) 03:21, 28 November 2024 (UTC)Reply
A good example would be that Green contains the definition "A member of the Green Party" Purplebackpack89 01:41, 29 November 2024 (UTC)Reply
@Purplebackpack89 I think that sense is very safe - this proposal only covers the names of political parties, not terms to describe their members. This, that and the other (talk) 11:03, 29 November 2024 (UTC)Reply
Support. As for Purplebackpack89’s point: there are parties without the word “party” in their names (Groen and Vlaams Belang are two I’m familiar with), and I really don’t see any reason to include those. Clippings and nicknames can stay. MuDavid 栘𩿠 (talk) 09:41, 28 November 2024 (UTC)Reply
Hmm, this makes me wonder about the Greens, the green party in Australia. It would be disappointing to lose this entry, as it's easy to imagine contexts in which it would not be obvious that Greens refers to a political party. All the same, I suppose the entry would technically survive, as the official name of the party in each jurisdiction contains the jurisdiction name (e.g. Australian Greens, Greens NSW), so Greens is just a nickname. This, that and the other (talk) 11:08, 29 November 2024 (UTC)Reply
@This, that and the other The disambiguation page on Wikipedia, The Greens, lists several parties with said name, so I would argue that keeping only the third sense as a synonym for Green Party should be fine (we also have similar types of entries, like Communist Party and Labour Party). Either way, it would technically have been valid as a nickname anyway. And thank you for the feedback! LunaEatsTuna (talk) 00:25, 30 November 2024 (UTC)
@LunaEatsTuna I broadly agree with you, but the term "Green Party" is never used to refer to the Australian party, so the sense would have to be reworded slightly - perhaps {{n-g|A name, either a nickname or an official name, applied to various green political parties.}} or similar. This, that and the other (talk) 00:49, 30 November 2024 (UTC)Reply
Good point! Your alternative wording should be unproblematic. LunaEatsTuna (talk) 01:04, 30 November 2024 (UTC)Reply
Oppose.
  • I don't see how political parties are SOP. There's nothing in the name "Liberal Democrats" that tells you it's a UK party or that it's centrist. This is especially true of names like the Danish "Radikale Venstre", which despite the name is neither radical nor left wing. Obviously Wiktionary is not the place to write expansively about what the party stands for. The most important part of the entry should be the link to Wikipedia.
  • They should be treated as all other words, i.e. do they meet the CFI. Notably, a name of a party should be included if it is actually used to refer to the party. —Caoimhin ceallach (talk) 20:11, 29 November 2024 (UTC)Reply
I think the other big argument is that there are simply a heck of a lot of political parties in the world that Wikt, as a dictionary, really does not need to catalogue. We already exclude business companies per WT:COMPANY, which I think should be extended to parties as well. LunaEatsTuna (talk) 00:25, 30 November 2024 (UTC)
I don't think I understand WT:COMPANY. Why are company names excluded and what exactly constitutes a non-trademark use? I remember looking into this before and not getting it. —Caoimhin ceallach (talk) 10:50, 30 November 2024 (UTC)Reply
@Caoimhin ceallach what do you believe our Labour Party and Liberal Party entries should look like? Should they painstakingly chronicle every party that has ever been known by this name, whether or not it was their legal, official name? This, that and the other (talk) 00:30, 30 November 2024 (UTC) (edited 00:36, 30 November 2024 (UTC))Reply
As I already said, every party of that name which passes the CFI. I may be wrong, but my intuition is that there won't be all that many for which three quotations can be found. —Caoimhin ceallach (talk) 08:56, 30 November 2024 (UTC)Reply
Much as I hate using the E word, extensive explanation of the stances or history of political parties...IS encyclopedic. Most definitions for political parties should be simply "A [nationality] political party", with possibly the first time it was used and the man or men who coined it in the etymology. Purplebackpack89 18:41, 30 November 2024 (UTC)Reply
I'm leaning oppose on this, because I don't really understand why political parties are being singled out. Theknightwho (talk) 10:38, 30 November 2024 (UTC)Reply
@Theknightwho Most Wiktionarians seem to think that they simply do not belong in a dictionary. Political parties could be argued to be similar to other organisations or business corporations, the latter we already formally exclude per WT:COMPANY.
The main arguments, from what I have read, are that political parties offer little lexical value (with many possessing simple names like Democratic Party of X, or just Liberal Party) and tend to have defs more fit for an encyclopaedia (like Wikipedia); people looking for information on a political party are likely to consult an encyclopaedia and not a dictionary.
Another point I have seen raised is that including political parties in general would set the precedent for thousands of other such entries, including for parties which are very obscure yet meet the CFI. Finally, regarding not being “dictionary material” (a phrase trademarked by Svārtava /s), a lot of editors have compared political parties to other multi-term institutions and organisations (which are typically not fit for a dictionary); most political parties follow a similar lexical pattern. See, for instance, the opinions raised at Talk:National Hockey League (2021), Talk:Real Academia Española (2021), Talk:Bank of Canada (2022) and uncontested deletions like Talk:United States Marine Corps (2011) and Talk:South African National Defence Force (2022), which I believe are relevant to this proposal.
I will play d*vil's advocate and refute two very common arguments here that I do not believe are actually relevant: firstly, proper nouns cannot be SOP since they are, well, the names of actual things. However, when editors say this at RfD what they actually mean to say is that the entries with generic capitalised parts (World Chess Federation) offer no lexical value to them. Finally, as I presume you may know, there is no inherent problem with allowing something which would have thousands or even tens of thousands of entries; Wiktionary has the server space.
Personally I do see this logic: I do not think Wiktionary will see any benefit from having hundreds of subsenses at Communist Party or really elongated party names when we already exclude institutions as mentioned above. As for the reason why I am singling them out here: it is to avoid one of those super broad no consensus Wikt debates that do not help us establish any precedent for anything. So, I am doing them one by one to see what editors' positions are on each in order to establish a broad consensus.
As you can tell, we are equally as divided on the names of government agencies, militias, airlines, IGOs and Churches etc. but bundling them all in one proposal would be a clusterheck! Especially since this is my first proposal I would not be able to cope TwT. Sorry for the wall of text, I wanted to be as informative for you as possible, LunaEatsTuna (talk) 20:55, 30 November 2024 (UTC)Reply
Oppose. DonnanZ (talk) 14:16, 30 November 2024 (UTC)Reply
I support this also. Names of organizations like that are supposed to be on Wikipedia. There is nothing interesting about these names’ inclusion in a dictionary, especially (but not exclusively) when they would otherwise be SoP. Let Wikipedia explain the etymology behind whichever ones aren't SoP, and let's just list the nicknames and abbreviations that Wikipedia might not. Polomo47 (talk) 18:41, 11 December 2024 (UTC)
Oppose as a blanket rule. Some party names are simply proper nouns composed of combinations of existing English words (e.g. Democratic Party, Republican Party, Communist Party of China): in these cases, while they are not SoP in terms of their meaning, I can see the argument that any non-SoP information is encyclopedic rather than dictionary material. However, some party names are made of components that do not otherwise exist as English words (e.g. Sinn Féin) and in cases like that, there's no other logical place to put the English pronunciation of these names. For context, it would be helpful if someone could provide an estimate of the total number of political parties in the world (either active today, or that have ever existed). While large, I would imagine that it is smaller than the number of companies.--Urszag (talk) 01:49, 12 December 2024 (UTC)
At least 400, roughly 2 for each country. While not every country has exactly two political parties, there are some countries with over two, so it counts. CitationsFreak (talk) 04:00, 12 December 2024 (UTC)Reply

Enable new-L2 tag for all


The new-L2 tag gets applied to an edit if a new level 2 ==Heading== is added in that edit to an existing page. This is at present disabled for edits from autopatrollers and admins, but I think it would be nice to have it enabled for all as it would be helpful in e.g. checking language specific recent changes, watchlist edits, etc. and maintain a record of new language entries added by a user (like new pages which can easily be viewed on the contributions page). – Svārtava (tɕ) 09:22, 27 November 2024 (UTC)Reply

I support this. It's useful information in general. Theknightwho (talk) 09:31, 2 December 2024 (UTC)Reply
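As a rough illustration of the condition described at the top of this thread (a new level 2 heading added to an existing page), a check on two revisions of a page's wikitext could look like the sketch below. This is not how the actual new-L2 tag is implemented; the function names and the regular expression are assumptions made for the example, and it deliberately ignores who made the edit, which is the part being proposed for change.

```python
import re

# A level-2 heading is a line such as "==English==": exactly two equals
# signs on each side. "===Noun===" (level 3) must not match.
L2_HEADING = re.compile(r"^==\s*([^=].*?)\s*==\s*$", re.MULTILINE)


def l2_headings(wikitext):
    """Return the level-2 heading titles in order of appearance."""
    return [m.group(1) for m in L2_HEADING.finditer(wikitext)]


def adds_new_l2(old_wikitext, new_wikitext):
    """True if the edit adds a level-2 heading to a page that already existed."""
    if not old_wikitext.strip():
        # Page creation: there was no existing page to add a heading to.
        return False
    added = set(l2_headings(new_wikitext)) - set(l2_headings(old_wikitext))
    return bool(added)


old = "==English==\n===Noun===\nfoo\n"
new = old + "\n==French==\n===Noun===\nfoo\n"
print(l2_headings(new))
print(adds_new_l2(old, new))
```

Under this sketch, adding only a level 3 heading such as ===Verb=== to an existing language section would not count.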

Adding pink highlighting for quotations in runic script


I'm requesting that Runic script (.Runr.) be added to the list of scripts at MediaWiki:Gadget-LanguagesAndScripts.css which use a pink highlight for highlighted terms in quotations, instead of normal bold font. The main reason for this is that some fonts which display Runic script make it hard to distinguish between the regular and bold type. Nor does runic have any history of bold type. ᛙᛆᚱᛐᛁᚿᛌᛆᛌProto-NorsingAsk me anything 11:16, 28 November 2024 (UTC)Reply

Done. Fenakhay (حيطي · مساهماتي) 11:28, 28 November 2024 (UTC)
They have. Tollef Salemann (talk) 19:20, 28 November 2024 (UTC)Reply

New Wiktionary layout


If there is a better place to discuss the new layout, please let me know. I hate the fact that you now have to click to get a search box. Obviously, everyone who uses the site is going to use the search box. IMO, the search box should be available without having to go through an extra click. Neelthakrebew (talk) 14:33, 28 November 2024 (UTC)

You can change it back to whichever you prefer in preferences. Vininn126 (talk) 14:35, 28 November 2024 (UTC)Reply
What new layout? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 15:47, 28 November 2024 (UTC)Reply
Programmers at WMF recently forced a change that caused a.... forced change of skins for a lot of users. From the old vector to the new, I think. Vininn126 (talk) 17:01, 28 November 2024 (UTC)Reply
It was introduced in Wikipedia some time ago. I had trouble here at first, but got the hang of it. DonnanZ (talk) 22:21, 28 November 2024 (UTC)Reply
I hate it so much. Change is bad. All that wasted space at the right. The horror this morning: how drunk was I last night? What happened? Then I spotted a bold thingummy on the left saying I could revert to the old. Eventually worked out how to do so. Why oh why does anyone change things that work? Who would perpetrate this monstrosity? Is it so people on little squitty toy 'telephones' can see things? You shouldn't be catering to them, you should be banning them. People shouldn't be allowed to add "dog's bollocks Zionist Kevin is a bum" with the edit summary "fix typo" unless they have a real computer. It's so simple. Did I mention I hate change? Hiztegilari (talk) 22:53, 28 November 2024 (UTC)Reply
People also make real edits with phones. CitationsFreak (talk) 00:06, 29 November 2024 (UTC)Reply
I agree with the "all that wasted space at the right" part. Davi6596 (talk) 00:41, 29 November 2024 (UTC)Reply
I'm surprised they didn't notice (and set Vector 2010 as their global skin) back when enWiki switched over nearly two years ago (presumably the reason I didn't notice anything change today)! Whoop whoop pull up ♀️ Bitching Betty 🏳️‍⚧️ Averted crashes ⚧️ 03:15, 29 November 2024 (UTC)Reply
Ugh, I hate the new layout and switched back immediately. I do not understand what it improves. But I also hate change in general and still manage to get used to it, so whatever. Andrew Sheedy (talk) 05:04, 30 November 2024 (UTC)Reply
Is there any way to disable the new theme while logged out? I can't seem to find anything. Binarystep (talk) 01:33, 2 December 2024 (UTC)Reply
M @Binarystep. If some MediaWiki:Sitenotice placed on top of every page with
‑‑Sarri.greek  I 09:22, 2 December 2024 (UTC)Reply

Use of y instead of ij in Early Modern Dutch


In the modern Dutch alphabet, the digraph ij is used instead of y (although it's often written like a y with an umlaut), but in Early Modern Dutch y was used (up until 1804 apparently). But if you search Wiktionary for any of the y versions, you won't find them. Should we be including the y versions in Wiktionary? And if so, should they be listed under Early Modern Dutch or just Dutch? (Early Modern Dutch is not listed at Wiktionary:List of languages.) Nosferattus (talk) 01:35, 1 December 2024 (UTC)

Yes, these forms should be included. Wiktionary views any term written after 1500 to be modern Dutch. There are already a few of these y-forms added: zyn, cyfer. As you can see, they use the {{obsolete spelling of}} template. I think their inclusion is limited, not because they shouldn't be included, but because editors mainly focus on adding terms in current use.
It would perhaps be a good idea to create a template similar to {{pt-pre-reform}}, to better organize the obsolete and superseded spellings.
Stujul (talk) 09:38, 1 December 2024 (UTC)Reply
We still find this in Max Havelaar, written in 1860. For example, on just one page we find myn, my, blyken, hy, tyd, belangryke, twyfel, pryzen, stryden, misdryf and zyn.[5] The author used his own, somewhat idiosyncratic spelling, though.  --Lambiam 10:30, 1 December 2024 (UTC)Reply
Thanks! I've started to add some of the more common words like hy and my. Nosferattus (talk) 02:19, 4 December 2024 (UTC)Reply
Regarding Dutch spelling, pannekoek is given as superseded spelling of pannenkoek, and the latter is the official spelling, but not everyone agrees with that, see Witte Boekje. Shouldn't 'pannekoek' be rather a 'non-official spelling'?
More examples here. Exarchus (talk) 19:18, 5 December 2024 (UTC)Reply

Reveal potentially shocking/NSFW images only upon clicking?


I visited loxoscelism to add a translation and was greeted by a slightly revolting image.

I would be in favor of hiding such images behind a "click to reveal" message so that they aren't shown by default. This should be quite easy to do using JS.

What do others think? — Fytcha T | L | C 13:13, 1 December 2024 (UTC)Reply

@Fytcha: The image would still be visible in the mobile search: [6]. It would be better to just make that image an external link. Ioaxxere (talk) 15:17, 1 December 2024 (UTC)Reply
FWIW, this was discussed in 2015 and last year. If we start censoring images, it's a slippery slope: people made headlines the very week we last discussed this for censoring Michelangelo's David. People—you see them on Talk:gay as recently as this week—complain that gay people are pornography / NSFW, and pass laws to that effect. There are people who think, and seek laws saying, trans people are pornography / NSFL. There are people (conservative Jews, Muslims, Christians) who think images of any women are NSFL. People complain about the image at swastika, or issue legal challenges (at least to WP) over maps of countries they'd prefer had different borders. Some people object to the image at penis, or the image at areola, but I think they're worth a thousand words, and don't see why a workplace would be OK with someone looking up penis, and only freak if the dictionary were illustrated—as others said in prior discussions, if one works at such a place, one may need to avoid Wiktionary at work, since images are liable to show up.
With that said, I acknowledge that it's reasonable that we unofficially have some practices, e.g. the entry for mangle doesn't contain an image of a mangled body even though it theoretically could. I don't mind the image at loxoscelism, but I'm not entirely opposed to hiding some images behind a click... I'm just very wary of the slippery slope.
One idea, if this doesn't exist already, is an opt-in gadget which would hide all images and require a click to see them; that avoids the slippery slope by being image-agnostic and opt-in. - -sche (discuss) 16:43, 1 December 2024 (UTC)Reply
@-sche: Thanks for the reply as well as the links. One of my take-aways is that selectively hiding pictures (behind a "click to show" message) is not at all "politically unviable" on Wiktionary.
As pointed out plentifully, finding a sane demarcation will prove difficult. Reading these discussions, the impression I got was that the wisest strategy would be to start with a very liberal policy (that is however enacted for everyone by default in an opt-out fashion) and then have people incrementally work out amendments in subsequent (BP) discussions. These kinds of demarcations are not found conclusively in a single sweep. My mentioning of "NSFW" above was probably ill-advised, so what I'd suggest now as a starting point for which images to hide by default is (medical) gore, i.e. photos of wounds, deformations, the effects of disease, photos taken during surgeries, etc.
one may need to avoid Wiktionary at work, since images are liable to show up.: That's true; currently, people cannot access Wiktionary at work (or in similar situations) free of risk. What I would point out is that this is an unusual and thus surprising fact about Wiktionary as a dictionary. Of all the dictionaries I've used, I don't think there has ever been another one where I had to be cautious using it in front of other people. — Fytcha T | L | C 19:07, 1 December 2024 (UTC)Reply
Whether it is a slippery slope depends on the art of formulating policy, otherwise of course it can be watered down if we are unsure about it. We can distinguish motivations by which people might avoid images. For cases of medical irregularities the hardiness which we expect differs — one may well prefer a certain time of decision and mental preparation to see the image because there is only so much repulsive content any one can consume without his affective wellbeing being called into question – from the responsiveness to the regularly behaving exposed human body. If someone does not suffer locally appropriate coverage of it on كَتَبَ (kataba, to write) it is his problem and it is not even easy to have a depiction of an action while on the other hand the majority of the internet is porn anyway, and grounds for much greater dissonances and contradictions to scripture offended readers would have to care about, calling the survival of Islam in the 21st century into doubt, a question of available and appropriate attention we have to put into the balance. We do not have to equate illness, violence, nudity, and making love. There is also a historical depth to the matter: I guess Nazi stuff falls under “violence” but we can expect a distance towards things because of how long ago a thing prevailed, possibly again leaving only a limited number of images.
However yes, I’d rather not burden our editors with dealing with thinking about the general guidelines even, and keep a policy of deliberate ambiguity beyond what we have written. You could try some technical execution anyway of course, just that, unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images by reason that “I have 10,000 edits per year/I am admin and I know well enough which pictures are appropriate in the given context, you however have an ideological agenda, from what I can see.” It would result in templates and/or gadgets which, in effect, new users would be discouraged to use, not to say disallowed. Fay Freak (talk) 17:48, 1 December 2024 (UTC)Reply
@Fay Freak: [...] unless we exert ourselves much to bloat our policy pages, the eventual uncontentiousness of which is doubtful, we won’t deploy it with discernible regularity beyond reverting new users futzing around with images [...] I think I agree, which is why I'm thinking towards some kind of consistent policy. We're currently in the wild west with respect to images. Fytcha T | L | C 19:14, 1 December 2024 (UTC)
FWIW this is being discussed on Wikipedia, too: Wikipedia:Village_pump_(policy)#Can_we_hide_sensitive_graphic_photos?. (On the whole, I find myself in the NOTCENSORED camp.) - -sche (discuss) 01:45, 3 December 2024 (UTC)Reply
Here's the right link: w:Wikipedia:Village pump (policy)#Can we hide sensitive graphic photos?. FWIW, I think we should hide them, mostly for the low-bandwidth peoples. CitationsFreak (talk) 03:02, 3 December 2024 (UTC)Reply

Template:defdate and pre-1500 dates


A number of English entries contain things like this (at sky):

  1. (obsolete) A cloud. [13th–16th c.]

However, we take the cutoff between Middle English and Modern English to be around 1500, so it has always struck me as anachronistic to say that the Modern English word arose in the 13th century. That information should be at the Middle English entry.

I'd like to propose moving the origin date of these senses to Middle English, then replacing the Modern English {{defdate}} invocation (some of which can be found using this crude search) with

  1. (obsolete) A cloud. [Middle English–16th century]

or

  1. (obsolete) A cloud. [until the 16th century]

(Also, this is not a paper dictionary, so there's no good reason to abbreviate "century" as "c.")

Thoughts? This, that and the other (talk) 05:39, 2 December 2024 (UTC)Reply
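To make the proposed replacement concrete, here is a minimal sketch of the first variant ("Middle English–16th century") applied to the bracketed date text. It assumes the text has the exact shape "13th–16th c." with an en dash, takes the 16th century as the first Modern English century (following the cutoff of around 1500 mentioned above), and works on plain strings rather than on any real {{defdate}} parameters; `rewrite_defdate` and `ordinal` are hypothetical names.

```python
import re

# Assumed input shape: "<ordinal>–<ordinal> c." with an en dash, e.g. "13th–16th c."
CENTURY_RANGE = re.compile(r"^(\d+)(?:st|nd|rd|th)–(\d+)(?:st|nd|rd|th) c\.$")


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 16 -> '16th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def rewrite_defdate(text, first_modern_century=16):
    """Replace a pre-cutoff start century with 'Middle English'.

    Text that does not match the assumed shape, or that does not straddle
    the cutoff, is returned unchanged.
    """
    match = CENTURY_RANGE.match(text)
    if match is None:
        return text
    start, end = int(match.group(1)), int(match.group(2))
    if start >= first_modern_century or end < first_modern_century:
        return text
    return f"Middle English–{ordinal(end)} century"


print(rewrite_defdate("13th–16th c."))
print(rewrite_defdate("17th–19th c."))
```

The original start century is discarded by this rewrite, so under the proposal it would have to be recorded at the Middle English entry before the Modern English text is changed.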

Commenting and subbing as I have been wondering the same for Lechitic lects. I similarly do not use {{etydate}} in Polish if the term was inherited from Old Polish, etc. Vininn126 (talk) 08:31, 2 December 2024 (UTC)Reply
Support. Tollef Salemann (talk) 08:51, 2 December 2024 (UTC)Reply
I don’t see a contradiction perforce. You propose to water down, information that could later be used, to edit history. If these datings are credible information at all and not random attestation ages that can happen with the Middle Ages; we still have not solved the problem of regularly labelling “reconstructed” lects, which would allow us to cleanly state things like “probably in the 4th century already, but attested from the 9th”; okay I sometimes use the etymology for this, as on بَال (bāl), if a reconstruction entry is not feasible. How is the move of Byzantine Greek going? 🙄 Fay Freak (talk) 18:05, 3 December 2024 (UTC)Reply
Would sister languages also be marked as being attested since that time? Would we use Latin to mention since when we see attestation dates of Spanish? Vininn126 (talk) 18:47, 3 December 2024 (UTC)Reply
Our coverage of Latin is larger. The decision depends somewhat on how secure an individual language’s editors are expected to be with the corpora, and what they can expect to be created any time soon. If we had lots of Greek entries having such phrasing as proposed, the planned reorganization would be considerably more challenging, demanding to revisit the attestation situation in affected cases. Just let the editors—including you—leave as much as they know, in so far as it is not overwhelming to the eye?
To ever halt before your problem, one has to construe one’s task as an editor gigantic enough that one boasts to never leave any gap, inconsistency, or inconsequence, which also does not align with reality, in as much as the presence of a gap, inconsistency, and inconsequence appears to align not with the actual reality of a language. Instead we acknowledge our finite manpower. Not some imaginary limit stemming from language cutoffs, the purpose of which apparently one has remind editors about once in a while again: inasmuch as they are justified by mutual intelligibility of languages, they do not constitute impermeable walls, though we may remember them as such and speakers constitute their identities by such ideas to some degree; instead the language headers, subheaders and labels are there to communicate something which you otherwise wouldn’t immediately relate to them. Seen in such a way, the defdates to senses are, beyond their situation in time and place—as identified by dialect and chronolect headers—, exactly what the dictionary glosses to senses of a word are supposed to do. What you bring up as a question of logics turns out a question of balance. Fay Freak (talk) 22:27, 3 December 2024 (UTC)Reply
I'm still overall against including it over our arbitrary boundaries. Vininn126 (talk) 22:33, 3 December 2024 (UTC)Reply
Adding the information to the Middle English entries definitely seems like a good idea. While I can see the theoretical justification for replacing dates before 1500 with "Middle English", I'm not sure that change is really an improvement: it obviously removes some information, and the periodization convention of distinguishing between Middle English and Modern English is not particularly significant in and of itself.--Urszag (talk) 22:35, 3 December 2024 (UTC)Reply
I support using dates with definitions, so 13th to 16th century and not from Middle English to the 16th century. The date range is more informative and easier to read. The sometimes arbitrary boundaries between stages of a language can live in the etymology section and the categories generated from it. Vox Sciurorum (talk) 00:30, 4 December 2024 (UTC)Reply
I would delete that from the list of Modern English senses and move it to the etymology section (‘from Middle English foobar…’) and to the Middle English entry. Nicodene (talk) 21:09, 4 December 2024 (UTC)Reply
@Nicodene: 16th century is Modern English, hence if attested it should not be removed. J3133 (talk) 07:58, 7 December 2024 (UTC)Reply
Obviously. I was going by the “until the 16th century”. Nicodene (talk) 09:30, 7 December 2024 (UTC)Reply
By all means add information to Middle English entries, but I don't see any reason to remove it from English entries. The proposal just makes things vaguer and more imprecise. The distinction between ‘Middle English’ and ‘Modern English’ is just a historical convention anyway; for a linguist, enforcing this distinction in practice is next to impossible if you're working with texts from the 16th century (as I have done here in the past). At least with Old English there is a clear break in the written record which makes the change in grammar and vocabulary pretty sharp. Ƿidsiþ 07:23, 7 December 2024 (UTC)Reply
I agree. Plus, anyone who cares about the distinction between Middle English and modern English can extrapolate from the dates given. But anyone who has never heard of Middle English (which is probably most people) won't find the information meaningful. Andrew Sheedy (talk) 05:04, 20 December 2024 (UTC)Reply
I still feel this doesn't hold up for languages like Latin with multiple children and also the fact it's well known. Vininn126 (talk) 08:54, 20 December 2024 (UTC)Reply
I agree. English is a bit of a special case relative to other major languages, because so much of its vocabulary entered the language late. I wouldn't go past Old English, and maybe not even that far (I would be fine with the defdate template reading [Old English to present], but not [Middle English to present]). Andrew Sheedy (talk) 17:18, 20 December 2024 (UTC)Reply
So would I, I suppose – especially since the dates of OE texts are often a bit speculative. Ƿidsiþ 06:40, 21 December 2024 (UTC)Reply
Support, provided the 'removed' information is transferred over to Middle English. I disagree with changing "c." to "century" though. Regardless of whether you're doing things online or on paper, it's generally a good idea to optimize the space used and keep things concise; it just looks prettier that way. MedK1 (talk) 04:05, 26 December 2024 (UTC)Reply

FYI: December 2024 Unicode update


https://us11.campaign-archive.com/?u=c234d9aba766117eac258004b&id=d533f3804f Justin (koavf)TCM 23:19, 2 December 2024 (UTC)Reply

'LANG forms' -> 'LANG spellings'


IMO it is confusing that we use 'forms' to mean 'spellings' in categories like Category:American English forms and Category:European Portuguese verb forms and Category:Brazilian Portuguese forms superseded by AO1990; also for that matter, more generally in CAT:Obsolete forms by language, CAT:Archaic forms by language, etc. Most of the descriptions of these categories make clear that the "forms" referred to are superseded/archaic/obsolete/etc. spellings, not some other kind of form. Even opening up Category:Ukrainian archaic forms produces 5 subcategories whose names all contain "spellings" or "terms spelled with" in them. Unfortunately the term "form" is badly overloaded at Wiktionary; any action to reduce the overloading is welcome in my book. So I propose at first to rename ad-hoc language-specific categories containing 'forms' to 'spellings'; and if there are no objections, rename the more general 'LANG superseded/archaic/obsolete/dated/rare/uncommon/informal/nonstandard forms' -> 'LANG superseded/etc. spellings'. Any terms that are in a 'LANG foo forms' category but aren't mere spelling variations should be moved to the corresponding 'LANG foo terms' category (which exist for all 'foo' except for 'superseded', but 'superseded' seems specifically for spellings, so this is unlikely to be an issue). The only 'foo forms' category I've excluded is 'LANG short forms', which is using "forms" differently, and should be eliminated in favor of either 'LANG ellipses', 'LANG clippings' or 'LANG abbreviations' (depending on what sort of short form is involved), but that's a different can of worms. Benwing2 (talk) 09:08, 7 December 2024 (UTC)Reply
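
For illustration only, the general rename proposed above could be sketched roughly as follows. This is a minimal sketch, not an existing bot or module: the function name `rename_category` and the constant `QUALIFIERS` are hypothetical, the qualifier list is simply copied from the proposal, and 'LANG short forms' is deliberately left untouched because the proposal excludes it. The ad-hoc language-specific categories (e.g. the Portuguese verb-form ones) are out of scope here and would need handling case by case.

```python
import re

# Qualifiers taken from the proposal text; "short" is intentionally absent
# because 'LANG short forms' is excluded from the rename.
QUALIFIERS = ("superseded", "archaic", "obsolete", "dated", "rare",
              "uncommon", "informal", "nonstandard")

# Matches names of the shape '<LANG> <qualifier> forms' (hypothetical helper).
_PATTERN = re.compile(
    r"^(?P<lang>.+) (?P<qual>" + "|".join(QUALIFIERS) + r") forms$"
)

def rename_category(name: str) -> str:
    """Return the proposed new category name, or the name unchanged if out of scope."""
    match = _PATTERN.match(name)
    if match is None:
        return name
    return f"{match['lang']} {match['qual']} spellings"
```

Under these assumptions, `rename_category("Ukrainian archaic forms")` would give "Ukrainian archaic spellings", while a name such as "English short forms" would pass through unchanged.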

@-sche Sorry to ping you directly but surprisingly no one has commented and I figure you might have something to say here. Benwing2 (talk) 09:46, 9 December 2024 (UTC)Reply
Thanks for the ping. The entries in these categories don't seem to be all of one type: it seems they will need pruning (especially, but not only, if renamed) iff people still want to distinguish spellings from forms. (Or are we abandoning that distinction? I know some later commenters in that discussion argued for that instead, and I'm not sure whether a decision was reached or, if not, which approach would be best.)
For example, I see we have anemia as an American form of anaemia (it should indeed rather be spelling if we're distinguishing those two words), but we also have airfoil in the same "American forms" category but using an "American spelling" label although it differs from aerofoil in more than just spelling. Likewise Abissinia, currently listed as an archaic "form", would be better as a "spelling", but the difference in adipsy and adipsia is not just spelling; abyssus, currently presented as an "archaic form of abyss", also does not seem like a mere archaic "spelling", but perhaps it is also best not as a "form" but as an archaic (or obsolete) synonym of abyss, or as (obsolete) Abyss. So, especially (but not only) if renaming the categories, it seems like we need to decide what we want the scope to be, and whether we want to distinguish "only the spelling is different" and "the pronunciation is also different" or combine those two things...? - -sche (discuss) 17:23, 9 December 2024 (UTC)Reply
Hmm. In practice I suspect people won't be able to distinguish mere archaic/obsolete/American/British spelling variants from those that also differ in pronunciation (aluminum vs. aluminium). At the same time I think "form" is far too overloaded. Maybe we could say "American English variants" etc.? Also technically the "European Portuguese verb forms" vs. "Brazilian Portuguese verb forms" reflect slight differences in pronunciation; they are mostly in past tense -amos (Brazilian) vs. -ámos (European), which is meant to indicate a difference in vowel quality. Likewise although the majority of "Portuguese forms superseded by AO1990" are just spelling differences, there are a few that are not, e.g. pre-reform abeto Douglas vs. modern abeto-de-douglas (although in that case the definition specifically says "pre-reform spelling of ..." and it seems there was also a pre-reform abeto de Douglas). So maybe we should use the term "variant". As for alt forms vs. alt spellings, I do think we should try to make that distinction since some of the things tagged as "alt forms" differ quite a bit from the form they are said to alternate with. Benwing2 (talk) 20:50, 9 December 2024 (UTC)Reply
Me and the Portuguese editors I know use {{alt spelling of}} when the difference is in spelling but not in pronunciation, at least “phonemically” — i.e., different spellings between European and Brazilian Portuguese are alternate spellings because the difference comes from each dialect’s pronunciation of phonemes, not just of that particular word. Meanwhile, I use {{alt form of}} when it’s a different pronunciation that doesn’t stem from a systematic difference between dialects.

However, this distinction in template usage is almost entirely moot if the category that gets assigned is the same. I think the most useful decision is to create new categories, 'LANG archaic spellings' etc., as daughters of 'LANG archaic forms' etc. Though this would need us to pay some real attention to replace the category tree definitions as well as the categorizations called by templates. Polomo47 (talk) 01:48, 10 December 2024 (UTC)Reply

Template:syncopic form / Template:syncopic form of


syncopic seems to be vastly less common than syncopal, which is itself less common than syncopated (see ngrams). Should we rename these templates? PUC 16:09, 8 December 2024 (UTC)Reply

Maybe it should just be {{syncope}}/{{syncope of}}, since we already have {{clipping}}/{{clipping of}}, {{ellipsis}}/{{ellipsis of}}, etc.? Benwing2 (talk) 09:53, 9 December 2024 (UTC)Reply
This sounds better to me. Vininn126 (talk) 10:05, 9 December 2024 (UTC)Reply
Or just replace it entirely with {{clipping}} (of), since syncope is a type of clipping anyway, and it's not clear why one would want a specialized category for it. Nicodene (talk) 10:23, 9 December 2024 (UTC)Reply
I actually only had instances of clipping in Polish entries. Syncopy might be seen as more phonological and clipping is often a process in more colloquial things. Not sure. Vininn126 (talk) 10:25, 9 December 2024 (UTC)Reply
There isn't a difference, as far as I am aware, other than the fact that syncope refers to clipping in medial position. And that it sounds fancier. Nicodene (talk) 10:37, 9 December 2024 (UTC)Reply
I too was under the impression that syncopy is a purely "mechanical" phonetic process whereas clipping is a deliberate removal of syllables used to coin new words. Not that I have any source to support that interpretation but... PUC 11:18, 9 December 2024 (UTC)Reply
I can't find any sign of such a difference outside the realm of (accidental?) Wiktionary convention. Google Books, for instance, brings up a laundry list of sources confirming that these are indeed synonyms. Nicodene (talk) 12:00, 9 December 2024 (UTC)Reply
I have to add my voice to the chorus of people saying that I find the present distinction to be valuable. I certainly wouldn't insist on the current terms used - and I am increasingly convinced we shouldn't keep using them as we are. But distinguishing between clipping that occurs as part of a gradual phonological process (e.g. Romance syncope, or English fancy) vs deliberate, conscious truncation (e.g. math) seems very valuable. This, that and the other (talk) 00:22, 10 December 2024 (UTC)Reply
Very well. In that case the issue is finding a pair of terms that can reasonably be specialized in the way that you have described.
We could try elision and shortening respectively. Nicodene (talk) 04:07, 10 December 2024 (UTC)Reply
I personally am fine with elision and clipping respectively since we're already using clipping in the sense of "conscious truncation". Benwing2 (talk) 07:27, 10 December 2024 (UTC)Reply
Seconded. Vininn126 (talk) 07:29, 10 December 2024 (UTC)Reply
I agree with @Nicodene here; I don't see why we need to categorize syncopes, apocopes and aphereses separately from clippings. See Wiktionary:Requests_for_moves,_mergers_and_splits#Template:clipping_of,_Template:aphetic_form_of;_Template:clipping,_Template:aphetic_form (but unfortunately there was pushback for this). Benwing2 (talk) 10:26, 9 December 2024 (UTC)Reply
Alternatively, if "clipping" seems specific to colloquial language, we could merge syncopes/apocopes/aphereses into "elisions". Benwing2 (talk) 10:29, 9 December 2024 (UTC)Reply
It could be that these are all part of the same process, I'm not sure I like the usage of ellipsis - differentiating skipping a word versus a syllable (and from there skipping a syllable in other ways) could be useful. Perhaps that could be a separate parameter. Vininn126 (talk) 10:32, 9 December 2024 (UTC)Reply
@Vininn126 elision, not ellipsis -- they are different processes and would be categorized differently. Benwing2 (talk) 11:07, 9 December 2024 (UTC)Reply
Ah, yes you're right! So perhaps there's something to that, then. Vininn126 (talk) 11:08, 9 December 2024 (UTC)Reply

Why is there no quote-thesis template?


We have cite-thesis. I was wanting to add a quote from a thesis to an entry, and the quote, quote-book, and quote-journal templates are not fit for purpose. Is it worth putting it to a vote? I don't know if I can do that as I've only been active on Wiktionary quite recently. Cameron.coombe (talk) 23:04, 9 December 2024 (UTC)Reply

@Cameron.coombe: I think {{quote-book}} is fine for this purpose, and don’t think a separate template is required. — Sgconlaw (talk) 23:10, 9 December 2024 (UTC)Reply
@Sgconlaw cheers, I assumed thesis titles needed to be set in quote marks, not italics, but that's Chicago style. Does Wiktionary not require this? Cameron.coombe (talk) 23:15, 9 December 2024 (UTC)Reply
I don't think we're that fussy 😊 This, that and the other (talk) 23:37, 9 December 2024 (UTC)Reply
I usually use quote-journal or quote-book and add "|genre=Thesis" as a ham-handed work-around. --Geographyinitiative (talk) 23:45, 9 December 2024 (UTC)Reply
Thanks all! Will proceed with boldness Cameron.coombe (talk) 00:11, 10 December 2024 (UTC)Reply

Request AutoWikiBrowser


I've been doing mass-correction of Portuguese pre-reform or otherwise archaic spellings — for reference, see how many pages are listed in WT:RFVI, and how I've cleared out [[Category:Portuguese superseded forms]]. My current project is clearing out the categories dated forms, archaic forms, and, above all, obsolete forms.

I think AutoWikiBrowser might just be able to help me with my antics — they must've helped my friend @MedK1 —, so I'd like to request access. Polomo47 (talk) 01:37, 10 December 2024 (UTC)Reply

@Polomo47 I gave you access. Benwing2 (talk) 07:26, 10 December 2024 (UTC)Reply

Use of titles in quotes and citations


I'm just fixing a quote here and noticed the editor put a title (Dr. med.) preceding the author name. Is this established practice here? I don't personally like it because it's clutter and likely not applied consistently. But I couldn't find a policy. Cameron.coombe (talk) 04:45, 10 December 2024 (UTC)Reply

@Cameron.coombe: I don’t think we have a policy on this yet, but I always remove titles and forms of address unless they are strictly required to identify the author (for example, in some early works, female authors were named after their husbands, as “Mrs. John Smith”). — Sgconlaw (talk) 05:07, 10 December 2024 (UTC)Reply
@Sgconlaw thank you. Do you think it's worth me drafting a policy proposal? Cameron.coombe (talk) 05:26, 10 December 2024 (UTC)Reply

Do we not label attributive adjectives?


I noticed that common attributive-only adjectives are not labelled as such:

Is there a reason behind this? Whether an adjective is general use (so no note), attr only, postpositive only, or pred only is important information, especially for non-native speakers, and it's provided in other dictionaries. There is an attributive label in the lb template, but it links to a gloss of the meaning for nouns, not adjectives, and, at least based on the common examples above, doesn't seem to be in use? Cameron.coombe (talk) 10:53, 11 December 2024 (UTC)Reply

It is a counsel of perfection that we should properly label every adjective sense that needs such a label. Add the label to the appropriate senses when you find them. I can't think of a practical way to detect all the cases where such labels are missing. A list from some source would be helpful, probably just for the more common cases.
We have labeled as "attributive" (mostly not "attributive only") some 200+ English terms. To label a sense of a polysemic adjective "attributive only" may risk user confusion. DCDuring (talk) 14:38, 11 December 2024 (UTC)Reply
@DCDuring thank you for the thoughts. I'm quite happy with simple "attributive," which is what I'm familiar with from other dictionaries. "Mostly attributive" can also be helpful if pred. sense is rare or nonstandard but attested. I'm not sure about the label auto-linking here though when I'd use it of adjs. Cameron.coombe (talk) 22:11, 11 December 2024 (UTC)Reply
If you are saying that we should link the attributive (and postpositive and predicate) labels to something explanatory, I agree, though I would usually settle for our entries for the terms or {{senseid}}ed definitions at the entries. It also might make sense to have categories for the terms that have such labels. Making the changes required is not in my wheelhouse. DCDuring (talk) 22:30, 11 December 2024 (UTC)Reply
@DCDuring cool, thanks. I wasn't familiar with senseid. I can have a play next time I need to. My only concern now though would be adding a whole lot of attributive labels and then having someone go through and revert them. I've got your support, but I don't know how universal that translates to! Cameron.coombe (talk) 23:15, 11 December 2024 (UTC)Reply
@Cameron.coombe: I would also Support your additions, FWIW. 0DF (talk) 00:02, 12 December 2024 (UTC)Reply
Maybe, we should give folks a chance to comment.
I'm already disagreeing with myself about my rejection of attributive only as a label rather than attributive. The normal ("unmarked") state of an English adjective is that it is prepositive and usable both attributively and as a predicate. The function of our labels is to mark exceptions to the unmarked state. Bare attributive does not do this, IMO. I don't know that we can be certain that only should follow attributive, because exceptions are likely, if not now, then perhaps in the past, and if not in UK and North America, then in Australia, the Caribbean, or India. Maybe the default for all of these should contain usually, with stronger only reserved for cases where the supporting evidence is strong. DCDuring (talk) 00:32, 12 December 2024 (UTC)Reply
True, attributive only or usually attributive is more precise. Other dictionaries use simply attributive, but probably because of space restrictions. (I know space isn't a concern for Wiktionary, but is clutter?) For exceptions, I would handle these as subdefs:
former
  1. (attributive) Previous.
    1. (nonstandard, India) Predicatively [I can't think of a usage example lol]
I'm sure this isn't used in India, just an example. Cameron.coombe (talk) 00:57, 12 December 2024 (UTC)Reply

The U4C is ordering an admin to respond to a block appeal


User Ghilt from the Universal Code of Conduct/Coordinating Committee (U4C) has proxy-posted a 3rd block appeal on User_talk:Gapazoid. They have stated that an en-wiktionary admin other than the blocking admin (Surjection) must read and respond to the unblock request. 2603:6011:C801:9FED:B33F:5C52:5400:3763 14:56, 11 December 2024 (UTC)Reply

Why not just change the blocking reason to "disruptive editing" and have done with it? 0DF (talk) 18:22, 11 December 2024 (UTC)Reply
I'm fine with changing the block reason, but I maintain that this editor cannot be allowed to edit again. — SURJECTION / T / C / L / 18:57, 11 December 2024 (UTC)Reply
@Surjection: As far as I can make out, your new block reason should satisfy the blocked editor. I think it's reasonable, FWIW. 0DF (talk) 19:23, 11 December 2024 (UTC)Reply
Are you the IP who said they'd kill themself? Polomo47 (talk) 19:01, 11 December 2024 (UTC)Reply
I note that User:Gapazoid is (in addition to being locally blocked) globally locked as a "Vandalism-only account", though User:Xaosflux stated that global unlocking might be considered if en.Wiktionary unblocks. Unless Gapazoid has deleted contributions on other wikis that I can't see, I actually find the global lock rationale harder to understand than the local block rationale; the user appears to have edited only en.Wiktionary and en.Wikipedia, and the few edits to en.Wikipedia appear to be mundane copyediting.
AFAICT Gapazoid has made only a single edit to Wiktionary content (?), to MAP; the only other (eight) edits the user has made were to his/her talk page; is this correct? (I see no deleted contributions.) If Special:Contributions/2600:387:0:803:0:0:0:95 (also locally blocked and globally locked) and/or Special:Contributions/2603:6011:C8F0:E4E0::/64 are the same person, their own sole contributions were to threaten, on Gapazoid's talk page, to commit suicide. If the user has made other edits I have missed, either on Wiktionary or elsewhere, I hope someone will bring them to light.
The user's edit to MAP was to change the usage note from commonly interpreted as a sign that the speaker supports (or is sympathetic to) such people to ...a sign that the speaker supports sexual contact between adults and children. That change seems mistaken / incorrect to me, and had I seen it, I would have undone it with an edit summary explaining that the "supports such people" language seems more accurate, but — if the edit had been made with no edit summary, or with a mundane edit summary — I would have taken it to be a mistaken but good-faith edit, not vandalism, and would not personally have issued a block. However, the edit was made with an edit summary which, like the user's posts on his/her talk page, state that he/she is a pedophile but is opposed to child sexual abuse.
I can understand the user's objection to the original block summary saying he/she engaged in "pedophilia advocacy", and I appreciate the improved block summary. I also understand the position that a user openly announcing himself/herself as a pedophile is disruptive, somewhat similar to w:WP:HID; threats of w:WP:SUICIDE also seem disruptive. I'd also note that my spider sense is that people who are blocked for things like this [edited to add for clarity: meaning "borderline disruptive things", not meaning "pedophilia-related things", which are rarer] and then spend this much time trying to get this many wikis / functionaries / organs of the WMF / etc involved in unblocking them . . . in the situations in the past where it's happened, such users have either been felt also by other wikis' admins to be NOTHERE (and so remained blocked not only here but also on other wikis that considered their appeals), or have been unblocked but then proven themselves to indeed be disruptive (NOTHERE, here only to bog people down in debates, etc) and gotten reblocked in time. Considering all of that, I, for my own part, as just one admin here, decline to unblock. If other admins (or other editors) want to weigh in, I encourage them to do so! I pinged Xaosflux above to make him aware of this discussion, and now ping User:Ghilt and will also leave a note on Gapazoid's talk page pointing to this discussion. - -sche (discuss) 00:44, 12 December 2024 (UTC)Reply
I remember a pedophile commenting on BP, saying that they are a pedophile. It was later hidden within a folder, though. CitationsFreak (talk) 03:53, 12 December 2024 (UTC)Reply
Thank you for your statement, -sche. And also thanks to Surjection for changing the log entry. This concludes the matter for us. On behalf of the U4C, --Ghilt (talk) 09:27, 12 December 2024 (UTC)Reply
We can entertain user lock appeals at m:Special:Contact/Stewards, and yes: overcoming community blocks is a good way to help such an appeal be successful. Xaosflux (talk) 12:24, 12 December 2024 (UTC)Reply
After a private lock appeal I have unlocked the account. To respond to comments here, the lock was implemented after an SRG request due to pedophilia advocacy - similar to why we, for example, lock accounts for uploading CSAM on Commons, even if they only edited that project. With that being said, they have made a reasonable further explanation to me in private and I see it as a sign that this can currently be locally handled. EPIC (talk) 16:20, 19 December 2024 (UTC)Reply

Protecting pages as "model pages"


Saltmarsh (talkcontribs) has semi-protected a couple dozen Greek entries as "model pages". I don't think this is a good practice, since it deters editors who could materially improve these pages (no dictionary entry is ever complete), and there are much better approaches, e.g. having example entries in a separate namespace. — SURJECTION / T / C / L / 18:46, 11 December 2024 (UTC)Reply

The full list of protected pages appears to be -τερος, Άγγλος, άγγλος, αγγούρι, αγγούρια, ακρογωνιαίος, ανηψιών, ασκί, βαθύς, βρέχει, λύνομαι, λύνω, μεταφρ., περισσότερος. Some of these were originally fully protected (i.e. only admins could edit them). — SURJECTION / T / C / L / 18:48, 11 December 2024 (UTC)Reply
See also Template talk:el-model-page. PUC 19:23, 11 December 2024 (UTC)Reply
@Surjection, PUC, these pages were not locked; I have edited often (they used to be protected from anonymous Greek editors who mostly write vandalisms about soccer teams, and silly schooljokes. That is because we -editors of modern Greek- are not around every single day). The models are in Category:Greek model pages so that we can copypaste from them. All languages should have copypaste models for us: because wikitext is getting harder and harder. Also see a trial at User:Erutuon/Ancient Greek model pages which is even more complicated. I always try to find copypaste patterns from recent edits by administrators; I would have liked to have them in some Cat with their endorsement, rather than going around Histories and their Contributions, hoping to find something similar to my task. If not protected, fine: but someone has to patrol them. Thank you. ‑‑Sarri.greek  I 10:47, 12 December 2024 (UTC)Reply
These pages are still semi-protected and many of them were admin-protected. I don't see any "anonymous Greek editors who mostly write vandalisms about soccer teams, and silly schooljokes" in the history of any of these pages, so they cannot simply have been protected to guard them from vandalism.
The idea of model pages on its own appears sound, but it's not a good idea to make the actual mainspace pages the 'model pages' and then protect them because they're 'model pages'. These should be in a separate namespace. — SURJECTION / T / C / L / 10:51, 12 December 2024 (UTC)Reply
No problem: unlock them, M @Surjection. We can make a List and write specific examples -because they cannot be changed without discussion: they are heavily copypasted- at the About Greek page or Help Greek. My administrator @Saltmarsh has done SO much for Modern Greek! I would like to help him a bit. It's just... mmm I need a little help from programmers. For example, a little template for the Orthographic Reform to monotonic of 1982. (cf Άγγλος.2024 cf Notes Little things like that. I could make it myself, but from experience, I see that only interface programmers check Templates and make them in a correct way. ‑‑Sarri.greek  I 15:52, 12 December 2024 (UTC)Reply
@Surjection, Sarri.greek As far as I can see these "Model pages" do no harm (kindly point one out if you see any). New editors need help with layout, not always easily extracted from "Help". Protecting them (which again does no harm) ensures that any changes in suggested layout can be discussed.   — Saltmarsh 14:29, 12 December 2024 (UTC)Reply
Yes, thank you @Saltmarsh. Need to trust some pages; the ones checked by an admin. By the way, I am checking some of the pages. When robots finish their work, we can check again. (... I know only named parameters, cannot remember the sequences of positional params: I hate it). I have to throw away all my cheatsheets. Thank you, dear Salt!! ‑‑Sarri.greek  I 14:38, 12 December 2024 (UTC)Reply
The harm they do by being unnecessarily protected is to prevent users from editing them. This goes against the entire idea of having a wiki. — SURJECTION / T / C / L / 15:21, 12 December 2024 (UTC)Reply
I would oppose protecting any page in principal namespace on the grounds that it is a model page. Such model pages might be useful in Wiktionary space. I wonder how that could work in any page with multiple L2 sections.
It might be useful to have templates, possibly located on entry talk pages, that indicate that a given L2 section has achieved some stage of "completion", so that contributors could find such "models". DCDuring (talk) 14:56, 12 December 2024 (UTC)Reply
Nice idea, thank you M @DCDuring. Something analogous to Wikisource's coloured bars. not reviewed / reviewed - see List so and so. A list of 'SOS' pages can be created, especially the ones with 3 Greek L2s, 2 Greek L2s, for every part of speech or inflectional group etc. Usually, I edit Ancient and Modern Greek in unison (lots of pages coincide and Modern refers all the time to previous etymologies and inflections. Especially with Hellenistic Koine -which has many problems and is usually ignored-). I hope robots will normalise the standard templates because it is very difficult to have 2 or 3 ways to write the same thing in the same page. I am also waiting for the pending Medieval Greek gkm. Thank you all for your attention. ‑‑Sarri.greek  I 15:11, 12 December 2024 (UTC)Reply
I realized that the situation in Ancient/Modern/Medieval? Greek made the model-page-in-principal-namespace idea practical for those languages, as other languages do not use the same characters. But it wouldn't work so well for CJKV entries where the different L2s often have different levels of development. I would prefer an approach that worked across all kinds of entries with multiple L2s. Maybe it would be useful to see what works for Greek-character entries along the lines that you suggest, without protecting model pages. That might be a 'model' for entries with multiple L2s in other character sets. DCDuring (talk) 15:30, 12 December 2024 (UTC)Reply
I agree with DCDuring there may be a case to be made for putting such entries in the Wiktionary namespace, but I also agree strongly with Surjection that these protections should be reverted. This is a bad use of the page protection mechanism. — Mnemosientje (t · c) 19:53, 17 December 2024 (UTC)Reply

──────────────────────────────────────────────────────────────────────────────────────────────────── Well @Surjection "against the entire idea of having a wiki." Wikipedia has numerous such protected pages. These pages do no harm at all - I suspect that basically you "just don't like them". Well I do!, and these interminable discussions, which some people seem to relish, really piss me off !!   — Saltmarsh 19:58, 12 December 2024 (UTC)Reply

Pages are protected because of vandalism, because of high use rate (templates, modules, etc.) or because they are non-content pages that should not be edited by anonymous users. None of these applies here. Again, if we want to have "model pages" that are protected, then they need to be copies outside of mainspace. — SURJECTION / T / C / L / 20:45, 12 December 2024 (UTC)Reply
I agree that entries should not be fully protected (admin-only) unless they have been or are likely to become the target of enough vandalism to warrant that (and even then, unless the vandalism has been enduring, protection should generally be temporary, like the protection applied to words that appear on the mainpage). Protecting pages (even at a lower protection level) simply because they are "good" is not the way to go; as recent edits to some of the pages mentioned here have shown, they were far from complete, so preventing some people from improving them is inadvisable. I agree with Surjection that if the goal is to show ideal formatting (or such), it is better to have examples (or even one single example, e.g. made-up word illustrating all possible things, e.g. how to format an adjective, a verb, a noun, all at once) somewhere in Wiktionary: space like the language's "About" / "Entry guidelines" page.

Inspired by the discussion above, I looked at what other pages are indefinitely edit-protected at high levels. 吃飽 has been indefinitely protected, allowing only template editors(!) and admins to edit, since an edit war in 2019; is this still needed? The user who was edit-warring back then seems to have matured. (Even if there is still a problem, we now have the ability to block specific users from editing specific pages, while still allowing them to edit the rest of the site, which seems like it'd be better than protecting the whole page and thus blocking anyone from editing it.) - -sche (discuss) 00:08, 13 December 2024 (UTC)

Chiming in: Protecting pages for them to be models is obviously unsound. This is platonic idealism, which does not hold water empirically, given that there is always room for improvement; you just have not exerted yourself long enough on it. And protecting pages always expresses distrust of users, which needs to have some basis other than the quality of the page. Fay Freak (talk) 00:57, 13 December 2024 (UTC)

Cantonese, Hainanese, and Hakka lemmata treated as Chinese


I've recently done some work periodically to clear out Wiktionary:Todo/Lists/Derivation category does not match entry language. Currently, there are ten entries in categories entitled Cantonese, Hainanese, or Hakka terms derived from X which are not also part of the corresponding lemmas or non-lemma forms categories. Chinese/Cantonese 0T used to be an eleventh such entry, but then I edited it to resolve that problem with it. However, doing so seemed to cause other problems.
Firstly, the entry is still a member of Category:Chinese lemmas and Category:Chinese nouns despite the lang-code changes. Secondly, whereas {{lb|zh|HKC}} correctly displays (Hong Kong Cantonese) and adds the entry to Category:Hong Kong Cantonese, {{lb|yue|HKC}} just displays (HKC) and does not categorise the entry at all. And thirdly, my changes moved the entry from Category:zh:Beverages (172 members) and Category:zh:Chinese restaurants (48 members) to Category:yue:Beverages (0 member[s]) and Category:yue:Chinese restaurants (0 member[s]), both of the latter of which were red-linked until WingerBot created them.
This seems a manifestly suboptimal way to resolve the language-mismatch issue in the cases of these Cantonese, Hainanese, and Hakka lemmata. What is the proper way to deal with these cases? 0DF (talk) 00:37, 12 December 2024 (UTC)

Pinging ND381 (who added Cantonese sorry), Wpi (who added Cantonese 0T and the other Latin-script Cantonese terms), Justinrleung (who added Hainanese and ), and Mar vin kaiser (who added Hakka 雪文). 0DF (talk) 03:31, 16 December 2024 (UTC)Reply

@0DF: Chinese is a special case, because terms are simultaneously Chinese and any of a huge variety of sublects. The writing system has a lot to do with this, since it allows writing things that are basically the same words in writing but completely different when spoken. It's very complicated, with variations in grammar, in pronunciation, and in writing that only partly overlap.
There's a whole universe of Chinese-specific templates and modules that do things in a completely different way from anything else on Wiktionary. When I'm going through the Todo lists, I treat most of the Chinese-related stuff as false positives and leave it alone. In all likelihood, "fixing" things will just cause other problems. The other CJKV languages share some of the same issues and are best left alone, for the most part.
I do fix things like Chinese etymologies that use language codes for Tibetan, and any {{lb|en}} on CJKV definition lines, but I know my limits (I took a year of Beginning Mandarin at UCLA, but that was 38 years ago). Chuck Entz (talk) 04:16, 16 December 2024 (UTC)
@0DF: This has happened before (see here) and the correct way would be something like Special:Diff/72937108/75521861, and not changing it to Cantonese L2.
There seems to be a bug(?) in mod:zh-pron where it did not add Cantonese lemmas if |c= is empty and | (which adds Cantonese nouns) - I'll look at this later this week. – wpi (talk) 06:43, 16 December 2024 (UTC)
P.S. I should note that on Wiktionary:Todo/Lists/Derivation category does not match entry language/description#Cleanup instructions it says: "Occasionally, the L2 language header, etymology template and {{head}} template may disagree on the language of the entry. If you do not speak the language(s) involved, it is best to ask the entry's creator to resolve the issue." (bold text mine) – wpi (talk) 06:47, 16 December 2024 (UTC)
@Chuck Entz: Yes, I was somewhat aware that Chinese is a unique case: unified in writing, but divided in speaking. Thank you for chiming in.
@Wpi: OK, I'll add {{cln|yue|lemmas}} vel sim. henceforth. That should fix things. Thanks for pointing me to the correct solution, and I hope you're successful in fixing the issue with Module:zh-pron. I'd already clocked that “If you do not speak the language(s) involved,…” caveat, but if I observed that literally, I wouldn't be being nearly as productive or helpful as I would be by being bold in editing. I'd already noticed that my changes to 0T were inadequate, hence my raising the issue in this BP section and then pinging you and the other relevant editors, which has led to the proper resolution, so I think I have my boldness–caution level fairly well calibrated.
0DF (talk) 13:34, 16 December 2024 (UTC)
@wpi: I've employed your {{cln|…|lemmas|[POS]}} solution. However, I'm apprehensive that that led to the creation of Category:Hainanese verbs and Category:Hainanese classifiers, at least the former of which I would have expected already to have existed (like Category:Hakka nouns already did). Is there some reason why Hainanese terms shouldn't get POS categories? Justinrleung? 0DF (talk) 14:03, 16 December 2024 (UTC)
Thanks. Regarding Hainanese, I believe it's because {{zh-pron}} does not support Hainanese (yet), so there hadn't been any category infrastructure for it. – wpi (talk) 14:40, 16 December 2024 (UTC)
@wpi: Ah, OK. In that case, the categories are ready-made for a time when {{zh-pron}} does support Hainanese. Thanks for your help. 0DF (talk) 17:02, 16 December 2024 (UTC)

Temporary Accounts - introduction to the project

A temporary account notification after publishing the first edit

The Wikimedia Foundation is in the process of rolling out temporary accounts for unregistered (logged-out) editors on multiple wikis. The pilot communities have the chance to test and share comments to improve the feature before it is deployed on all wikis in mid-2025.

Temporary accounts will be used to attribute new edits made by logged-out users instead of the IP addresses. It will not be an exact replacement, though. First, temporary users will have access to some functionalities currently inaccessible for logged-out editors (like notifications). Secondly, the Wikimedia projects will continue to use IP addresses of logged-out editors behind the scenes, and experienced community members will be able to access them when necessary. This change is especially relevant to the logged-out editors and anyone who uses IP addresses when blocking users and keeping the wikis safe. Older IP addresses that were recorded before the introduction of temporary accounts on a wiki will not be modified.

We would like to invite you to read the first of a series of posts dedicated to temporary accounts. It gives an overview of the basics of the project, impact on different groups of users, and the plan for introducing the change on all wikis.

We will do our best to inform everyone impacted ahead of time. Information about temporary accounts will be available on Tech News, Diff, other blogs, different wikipages, banners, and other forms. At conferences, we or our colleagues on our behalf are inviting attendees to talk about this project. In addition, we are contacting affiliates running community support programs.

Subscribe to our new newsletter to stay close in touch. To learn more about the project, check out the FAQ and look at the latest updates. Talk to us on our project page or off-wiki. See you! NKohli (WMF) and SGrabarczuk (WMF) (talk) 03:27, 12 December 2024 (UTC)

Could a user choose to use IPs instead of temp. accounts if they wanted? CitationsFreak (talk) 03:50, 12 December 2024 (UTC)
No, it will not be possible. The only choice will be between the accounts: logged-out (temp account) or logged-in (regular account). SGrabarczuk (WMF) (talk) 14:08, 12 December 2024 (UTC)

Banning Proto-North Caucasian and Proto-Northeast Caucasian reconstructions


1. Proto-North Caucasian. In my opinion, there are currently no reconstructions of Proto-North Caucasian, simply because there are no reconstructions of Proto-Northeast Caucasian. I would prefer to end any discussion of this superfamily here and delete the category itself in order to avoid reconstructions.

2. Proto-Northeast Caucasian. As written above, I believe that there are no reconstructions of Proto-Northeast Caucasian. The so-called reconstructions of Starostin and Nikolaev are actually tentative pseudo-reconstructions. In addition, they do not give reconstructions of Proto-Northeast Caucasian forms anywhere: all their reconstructions in the database are Proto-North Caucasian, which are apparently identical. Realizing this, Johanna Nichols uses the pound sign (#) for pseudo-reconstructions in her works.

This convention follows Williams 1989, who uses the asterisk for reconstructions based on regular sound correspondences and the # for "[p]seudo-reconstructions based on a quick inspection of a cognate set without working out sound correspondences".

Also worth noting is the recent case of User:Qmbhiseykwos, who began to add (in addition to pseudo-reconstructions by Nichols and "reconstructions" by Starostin and Nikolaev) "reconstructions" by the Dutch linguist P. Schrijver (2018, 2021, 2024), which should also be considered pseudo-reconstructions. For example, Reconstruction:Proto-Northeast Caucasian/rɔḳʷ(ə).

2.1. Appendix. Since Wiktionary does not operate with the concept of tentative pseudo-reconstructions, all such "reconstructions" of Proto-Northeast Caucasian should be listed only in an appendix, for example Appendix:Proto-Nakh-Daghestanian reconstructions.
2.2. Renaming. I believe that it is necessary to rename the family to (Proto-)Nakh-Daghestanian. This must be done because the current name hints at a division between the North Caucasian and South Caucasian (Kartvelian) languages, which is unacceptable.
2.2.1. Accordingly, it is necessary to rename (Proto-)Northwest Caucasian to (Proto-)Abkhazo-Circassian or (Proto-)Adyghe-Abkhaz, etc.

3. Proto-Daghestanian. It may be necessary to create a category for this family. Regarding this family, there are curious reconstructions by B. Giginejšvili (1977) and E. A. Bokarev (1981), but they don't seem to give any reconstructed forms. It is difficult for me to say anything here, since I have not studied these languages. I'll quote a comment by the American Caucasologist Alice C. Harris (2003: 180):

“It should be noted first that the phonetic reconstructions proposed by Nikolayev and Starostin (1994) and adopted by Alekseev (1985) are not widely accepted. For example, Nichols (1997) and Schulze (1997) show serious problems with the proposals in Nikolayev and Starostin (1994), and Giginejšvili (1977), Schulze (1988), Talibov (1980), provide reconstructions that are in various ways more rational”.

@Vahagn Petrosyan, კვარია ɶLerman (talk) 12:17, 15 December 2024 (UTC)

Not even Nakh-Daghestani sound correspondences are fully understood; to even consider enrolling Abkhaz-Adyghe here is insanity. Proto-North Caucasian was never _not_ controversial, so I have no clue why it was even added to Wiktionary in the first place. Nuke Proto-North Caucasian. On the other hand, banning Proto-Nakh-Daghestani reconstructions is perhaps too extreme. Imho, there's no great harm in having them exist even if it turns out they're wrong/imprecise. კვარია (talk) 14:39, 15 December 2024 (UTC)
Agree. Tollef Salemann (talk) 16:14, 15 December 2024 (UTC)
Nuke North Caucasian both as a group and as a reconstructed language, for Nakh-Daghestani/NE-Caucasian - I'm fine with agreeing not to create reconstructions, but I think having a code may be a good idea nonetheless. Just need to patrol them from time to time. Thadh (talk) 21:57, 15 December 2024 (UTC)
  • Nuke North Caucasian yes, already since it's unclear if even the family exists at all. Tentative cognates in NWC could be noted in NEC entries if we end up having/keeping them (same as we do with longer-standing Indo-Uralic, Altaic, etc. etymologies).
  • Keep Proto-Northeast Caucasian. NCED's (and Schrijver's) reconstructions may have many problems, but they are generally not "pseudo-reconstructions", and there's enough reason to think many of them are at least valid etymological groups. Any etymologies where Nikolayev & Starostin propose NWC reflexes are given in PNC form, but this is mainly because they set up very few changes from there to PNEC. The one I find on a lookover of their preface is *gg(w) > *ddɮ(w). In effect they admit the reconstruction is of PNEC in the first place, but it comes out so complex they end up able to derive (their reconstruction of) PNWC almost directly from it.
  • I do not follow the argument against "Northwest Caucasian" and "Northeast Caucasian", perfectly illustrative and mainstream names as far as I can tell. What is the "division [that] is unacceptable"? Treating Kartvelian / South Caucasian as an unrelated family? That if anything seems to be much closer to consensus than the question of North Caucasian, and I also do not see how this would be "hinted at" here.
  • "Daghestanian" as a distinct node is also not consensus, does not have distinct reconstructions for it either, and should not be added (seems IMO like an outdated typological unit against Nakh being more innovative). Probably we should not commit to any NEC grouping scheme beyond the unambiguous base units like Lezgic or Avar-Andic.
--Tropylium (talk) 13:58, 16 December 2024 (UTC)
I agree, let's ban Proto-North Caucasian. I have no opinion on the rest of the issues. I will only note that there are many weak scholars and outright charlatans dealing with the three Caucasian branches. All of their etymological works should be reviewed by our more intelligent editors. @Qmbhiseykwos, no mindless copying, please. Vahag (talk) 18:03, 16 December 2024 (UTC)
Perhaps we should have a section in the Appendix namespace for StarLingish. It would fit right in with Klingon, Na'vi and other constructed languages from fictional universes... Chuck Entz (talk) 06:06, 17 December 2024 (UTC)
I, too, am surprised about the presence of Proto-North Caucasian on Wiktionary. It must have slipped past our eyes, like the proverbial monkey walking down the street that few would be willing to see, and only been included as a consequence of Wikipedia or another reference not having been unequivocal about its unacceptedness. It has to be removed. Fay Freak (talk) 21:38, 16 December 2024 (UTC)
Since there is consensus to delete (Proto-)North Caucasian, I'm going to implement that. — SURJECTION / T / C / L / 13:29, 17 December 2024 (UTC)
How many users does it take to make a decision? ɶLerman (talk) 15:45, 17 December 2024 (UTC)

Adverbs?


I know how much everyone loves a part-of-speech question, so here is another one.

"I was indoors."
"He is upstairs."
"They were outside."
"Look, your keys are there!"

Sometimes I feel that some dictionaries, including Wiktionary, are coy about giving examples of this nature, as if they are unsure of the part of speech of the complements. While these are not "traditional" adverbs, in that they do not modify anything adverbially in a traditional sense, and cannot be removed leaving a relevantly valid sentence, nevertheless they do answer adverbial wh-questions, and do not seem like adjectives. Some people call these "adverbial complements", I think. Are we happy to place these uses under "adverb"? Another possibility for some cases -- e.g. "outside" in these examples -- is "intransitive" preposition, in that "They were outside" implies "They were outside (somewhere/something)", but I'm not sure that this concept is fully mainstream. What do you think? Mihia (talk) 21:52, 17 December 2024 (UTC)Reply

The 2021 vote against categorizing words as intransitive prepositions is still current, isn't it? "Adverb" seems acceptable to me.--Urszag (talk) 22:04, 17 December 2024 (UTC)
Gosh, I forgot entirely about that vote. Thanks for reminding me. Can we make any clear distinction between "I was indoors" being an adverb, and "Is Mr. Smith in?" being an adjective, as is currently listed at in — and, indeed, generally between my examples above and various other supposedly adjectival instances of other "short function words", where in some cases the philosophy, quite possibly perpetuated in part by myself, seems to be "if it's the complement of the be-verb then it's an adjective"? Mihia (talk) 22:54, 17 December 2024 (UTC)Reply

Rethinking Middle Korean verb lemmatization


@AG202 @Solarkoid @Chom.kwoy @Tibidibi

As per Wiktionary:About Korean/Historical forms#Lemmatizations and this discussion in October 2020, we currently lemmatize morphophonemic forms for nouns but allomorphic forms for verbs (faithful to "而餘皆爲入聲之終也 然ㄱㆁㄷㄴㅂㅁㅅㄹ八字可足用也").

But I really cannot help but think (as I and others have already stressed) that this is misguided and adds needless confusion.

  1. Etymology sections already use the faithful phonemic form by convention. This creates at best alt hyperlinks/double hyperlinks and at worst redlinks even when we have an entry for the MK verb in question. This is especially problematic because, let's be real, 99% of people ever going to MK entries on here do so through a MoK ety section.
  2. In the discussion linked above, it was said that "by convention" Korean lemmatizes actual inflected forms for verbs.
    1. Since when? Even in Modern Korean, -다 (-da) is defined as a "dictionary citation form ending," sufficiently demonstrating that even within our morphological orthographic framework we are specifically citing "dictionary forms," not any real form in use.
    2. This should carry over, and has carried over, to Middle Korean dictionaries. Consider four popular Middle Korean dictionaries—15세기 국어 활용형 사전, 우리말큰사전(옛말과 이두), 고어사전, and 한불자전. Consider now that the former two use the "morphophonemic spelling," and only the latter two use "faithful" spelling as we do now. Consider further that 고어사전 is a) from 1960 and b) also lemmatizes other forms such as the infinitive in some cases, with the express goal of being accessible for learners. We don't do this, we shouldn't do this, etc. 한불자전 is from 1880(!) and was written by a French missioner. Is this really the precedent for us to be following?

I would love to start adding more MK entries but there are a lot of gaps right now in infrastructure(?) that make this difficult. This is IMO the largest blocker; I've brought this up countless times on the Discord, but I'd love to reach an actual BP consensus. Any input appreciated. Lunabunn (talk) 03:48, 18 December 2024 (UTC)

Agreed. I've already expressed my opinion on this several times before, but I'd rather the forms show the original stem. This will be beneficial for the learners and those curious in the long run, and it will help majorly advance the cause to create an automated conjugation template, most importantly for the header.
Additionally, I myself also want us to reach a consensus fast, as there seems to be confusion over whether the Modern Korean etymology header contains the actual, attested form of the verb (=용언) or the root. The only real caveat is, syllable-final ㅸ looks pretty ugly in syllables — 어드ᇦ다, 셔ᇕ다, ᄠᅥᇕ다, etc... would be some of the roots we have to add. Other than that, I think this is for the best.
- Solarkoid (talk) 18:02, 18 December 2024 (UTC)
Support: Matches what we do for other Koreanic lects. AG202 (talk) 18:05, 18 December 2024 (UTC)
Strong support with a suggestion. This exact issue has been on my mind for the past few years ever since editors, including myself, have begun adding significant numbers of MK verb/adjective entries. I thought I should speak on this matter as someone who has added numerous MK entries throughout the years. Thanks @Lunabunn for finally bringing this up.
I can now see consistency for consistency's sake is really the only thing going for the current "historically faithful (allophonic)" framework we have. While unapologetically uniform in its lemmatization rules, I agree that this leads to needless confusion and is at the expense of navigability. This is especially true for readers who likely access MK entries through MoK etymology sections (whom I assume are the overwhelming--I can't stress this enough--majority as you have mentioned). As for the "convention" from the previous discussion (I was there), I believe it referred not to dictionaries but the MK spelling convention, i.e., 표면형(表面形) (phonetic), as using the 기저형(基底形) (morphophonemic) would be anachronistic.
Speaking from personal experience, this has also been quite confusing and time-consuming for even editors who are familiar with MK orthography. For example:
  • Having to actively think about the proper "historically faithful" lemma when creating wikilinks (see how, in ᄉᆡᆷ, I had to link 기픈 to the "proper" 깁다, ATM a totally unrelated MoK entry, instead of the phonemic 깊다, at least the descendant MoK entry), which isn't intuitive at all as myself a native Korean speaker accustomed to MoK orthography. Although correctly linked according to current conventions, I would imagine this would be utterly baffling to a beginner.
  • Having to add "phonemically faithful" stubs to make up for this (e.g., see the MK "entry" for 및다, which would become the main MK entry under this proposal), unnecessarily adding workload to the already thin MK editor base.
All in all, it is clear that, in addition to the points Lunabunn has brought up, the positives--if any, really, other than doing it ostensibly for faithfulness' sake--of creating lemmas consistent with historical MK orthography do not outweigh its numerous negatives. Being anachronistic is not a good reason to continue this. I am now convinced that this is not the goal for which we should aim, especially Wiktionary being a word dictionary and not a spelling guide. This is also what modern monolingual dictionaries do, and this is what we should follow, which is more in-line with general Wiktionary policy. Moreover, we already don't do this for nouns, so the historicity argument is indeed moot.
However, I do not think an entirely phonemically faithful lemmatization scheme is desirable. As @Solarkoid specified, this would mean we would need to create entries such as 셔ᇕ다, which never appeared in actual MK or MoK texts (it's like an imaginary number) and is, well, yes, "ugly." Aside from looks, which shouldn't be something we consider in a dictionary, the general 표준어대사전 and the academic 15세기 국어 활용형 사전 both actually list the historically faithful 셟다 as their headword, while mentioning the phonemic 셔ᇕ- as a "form" appearing before vowels (which is not wrong). Conversely, both list the phonemically faithful 맞다 rather than the historically correct 맛다 as the headword. Monolingual dictionaries (and thus conventionally cite verbs/adjectives with the ending -) seem to treat the lenes and (distinct phonemes in MK) as exceptions, to align, I am pretty sure, with how MoK treats them. In fact, 15세기 국어 활용형 사전 explicitly states this in its preface. Therefore, you simply are not going to find 셔ᇕ다, particularly as the headword with the ending -, in any mainstream dictionary or work (except, I suppose, research papers in Korean, which have the liberty of, well, not being a dictionary for learners; they could use forms such as 셔ᇕ다 all they want).
I believe we should not implement an entirely phonemically faithful lemmatization scheme for, again, the sake of uniformity, as neither do popular modern monolingual dictionaries do this; we would be the first dictionary to do this, as it is also demonstrably not a "convention," as with currently using historically accurate forms. As such, I think creating entries such as 셔ᇕ다, spelled in Hangul, would also be a source of confusion for those expecting the same lemma coming from popular MK dictionaries (as well as deviating from MoK morphophonemic standards [which treat vestigial W [-w-] and z [-∅-] as allophones calling it "irregular conjugations"] on which they by principle base lemmatization [for a stage of the language when W and z were still phonemes] yet with which most people would be familiar). It's not that using 셔ᇕ다, besides aesthetics (lol), is inherently wrong (it's actually correct); however, 셟다, technically "wrong" (read: an inconsistent treatment), is how contemporary dictionaries have chosen to lemmatize in order to make it easier for modern readers unfamiliar with MK phonology. So it's really nobody's fault; we would just be following precedents--conventions as you will.
Yet, this is not perfect either, as some words would be lemmatized according to a different principle from the rest (and for something as superficial as their spelling at that). Nevertheless, I propose that we still likewise make exceptions for cases like and (the only exceptions I could find with a cursory review of dictionaries) for the reason explained above, in Hangul, of which we use the historically faithful spelling, but apply an entirely phonemically faithful (containing the root) scheme in Romanization. We can do this as Wiktionary is unique in that it always provides both Hangul and Romanization for MK.
So, for example, in -ᆸ다 where represents an underlying /β/ such as in 셟다, we would use the Yale W. Consequently, we would get 셟다 (Yale: syelW-ta), with the historically faithful Hangul spelling as the headword and phonemically faithful spelling as the romanization. We would not be the first to do this, as some English language works on MK, which only use Yale Romanization, do exactly this (see Martin 1992 p. 57, who uses the phonemic stem syelW- to refer to this exact word). In -ᆸ다 where represents an overt /p/ such as in 저줍다, we would obviously still use the Yale p, and such verbs/adjectives are not affected by this proposal. Hence, instead of using -ᇦ다 and -ᆸ다, -ᆸ다, as an exception, could have two possible romanizations, -Wta and -pta, depending on the word, but the Hangul spelling won't reflect this. The same goes for -ᆺ다 with -zta and -sta, instead of -ᇫ다 and -ᆺ다 (e.g., ᄃᆞᆺ다 and 벗다). In all other cases, both Hangul and romanization would represent the phonemic spelling as opposed to the historically faithful spelling that we use now, as per the proposal. This compromise would follow the convention found in monolingual dictionaries while still being consistent in providing readers with at least one phonemically faithful representation throughout all MK verbs/adjectives; there is no ambiguity, and the two different phonemes are distinguished.
This seems like a simple enough solution for a problem of an otherwise commonsense change IMO. The only downside I could think of is the need for manual input for transliteration, but MK already has these cases.
For those who might not fully understand or tl;dr, here is essentially what would happen:
Current entries with "historically faithful" spelling must be moved and could be converted to non-lemma entries as an inflected form. 맞다 becomes the lemma while 맛다 is reserved for an entry for "inflected" forms (if they are ever created, though [this should not be the focus]; Middle Korean (-ta) had a more complicated usage compared to its modern descendant, so it wouldn't be a mostly empty, redundant entry. Nonetheless, I think the entry at Middle Korean (-ta) will suffice). (-ta) would serve two functions: form part of the dictionary citation form as per the modern convention (with phonemic spelling) and as part of inflected forms (e.g., declarative mood suffix) (with historical spelling). The second case would ever only be seen in conjugation templates, quotations, or, as mentioned above, non-lemma stubs. This entails that, for example, 맞다 (mac-ta), as the lemma, is the only form you would see in most parts of Wiktionary, while 맛다 (mas-ta), despite being historically accurate, would only be seen in the above mentioned places. For the / cases, if accepted, the romanization 셟다 (syelW-ta), the lemma version, would be the one seen in most parts of Wiktionary, whereas 셟다 (syelp-ta), with the same Hangul spelling and "accurate/literal/surface" transcription, would, again, only be seen in the above mentioned places (telling the reader that it represents a real (attested or possible) form/inflection with (-ta), for disambiguation purposes). And, of course, for anything else, normal romanization rules apply (e.g., 셟고 (syelp-kwo)); only / headword forms get this special treatment.
Examples of current entries whose main entry would be affected (if we adopt an entirely phonemically faithful lemmatization scheme) are:
  • 더럽다 (telepta) and ᄃᆞᆺ다 (tosta) would be moved to 더러ᇦ다 (teleWta) and ᄃᆞᇫ다 (tozta), respectively. However, if the above-mentioned exception is applied, these would stay at their original locations.
  • 벗다 (pesta) would stay where it is, as its Hangul phonemic and historic spellings are the same; ᄌᆞᆽ다 (cocta) would stay where it is, but ᄌᆞᆺ다 (costa) is correct under the current convention.
  • 여다 (yeta) and 우다 (wuta) would be moved to 열다 (yelta) and 울다 (wulta), respectively.
  • 깃다 (kista, to rejoice) would be moved to 기ᇧ다 (kiskta), whereas 깃다 (kista, to cough) would be moved to 깇다 (kichta).
  • 됴타 (tyotha) and 나타 (natha) would be moved to 둏다 (tyohta) and 낳다 (nahta), respectively.
-- 123catsank (talk) 01:14, 21 December 2024 (UTC)
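The W/z exception proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `citation_forms` and `SURFACE_FINAL` are hypothetical, stems are given in Yale romanization, and only the W → p and z → s cases from the proposal are modelled (other neutralizations, such as the stem-final c of 맞다 surfacing as s in 맛다, are deliberately left out).

```python
# Hypothetical sketch (not an existing Wiktionary module): stem-final Yale
# letters covered by the proposed exception, mapped to the letter that the
# attested Hangul spelling shows before the ending -ta.
SURFACE_FINAL = {"W": "p", "z": "s"}


def citation_forms(stem: str) -> tuple[str, str]:
    """Return (lemma romanization, surface romanization) for a Yale stem.

    The lemma keeps the phonemic stem; the surface form replaces a final
    W or z with p or s, matching the historically attested spelling.
    """
    if not stem:
        raise ValueError("stem must be non-empty")
    lemma = f"{stem}-ta"
    final = stem[-1]
    # Stems ending in any other letter are returned unchanged here.
    surface_stem = stem[:-1] + SURFACE_FINAL.get(final, final)
    return lemma, f"{surface_stem}-ta"


print(citation_forms("syelW"))  # ('syelW-ta', 'syelp-ta')
print(citation_forms("pes"))    # ('pes-ta', 'pes-ta')
```

Under this sketch the Hangul headword would stay the same in both cases; only the romanization shown alongside it distinguishes the lemma from the attested surface form.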
Thank you for your thorough contribution. I am relieved to hear that my opinions on the matter are shared by other editors (no doubt more experienced than myself). Just one thing I would like to comment on:

In fact, 15세기 국어 활용형 사전 explicitly states this in its preface.

This seems misleading. Yes, that dictionary does indeed state in its preface that W and z stems would be listed with p and s respectively, but it also explicitly states that it is for convenience only, not indicative of an analysis of these stems as p and s stems under any circumstance ("... 이런 어간들의 기본형을 'ㅅ, ㅂ'으로 하겠다는 인식을 반영한 것은 아니고 편의상의 조치임을 밝혀 둔다."). Indeed, modern scholarly practice does not treat W and z stems as irregulars (although 표준국어대사전 does, that's just because it's ass), so we shouldn't either.
Now, if we choose to lemmatize these forms with p and s instead of W and z anyway for convenience, I do not necessarily object. I do, however, find myself wondering what convenience we gain by lemmatizing p and s if that means we have to manually specify the headword for romanization.
If we decide to lemmatize with p and s, I would also like to suggest that we use the W/z form in the hangul headword as well, not just its romanization. This would be aligned with how we don't include diacritics in entry titles but still show it in the headline.
(p.s. Do you edit on a different account and/or are you in the English Wiktionary Discord?) Lunabunn (talk) 03:14, 27 December 2024 (UTC)
I agree. Manual transliteration for the same "spelling" displayed would not be ideal. I would support Lunabunn's idea if we do decide to lemmatize with /p/ & /s/. AG202 (talk) 03:40, 27 December 2024 (UTC)

Beekes


Bluntly, Beekes is neo-Vennemann, except for Greek, and without even an actual attested language (/family) from which to derive the substrate.

That may even be too polite. I personally have thought Beekes dubious since my first encounter with him (his grammar of Avestan, in which he identifies numerous Avestan roots without Sanskrit analogues, almost none of which are actually without obvious Sanskrit analogues). That said, I am joined by Meissner, de Decker, Vine, Verhasselt, Beckwith, Nikolaev, Woodhouse, Olson, Miller, Simkin, Colvin, Meester, Garnier, Nardelli, and countless others in my reservations about Beekes as a source in the specific matter of Greek etymology/'Pre-Greek'. Even *within* Leiden, Beekes was considered peculiarly dogmatic, even by very close colleagues (e.g. Lubotsky, Kloekhorst, etc.) - indeed, even van Beek, his prize student, has published numerous papers over the last several years, especially after Beekes' death, rejecting Beekes' particular approach to Pre-Greek. Kroonen's public critique is also worth noting.

I have numerous criticisms of Beekes' approach to Pre-Greek, and am happy to systematically go through them if anyone should wish, but I hardly need to, since the critical scholarly literature is, at this point, voluminous. That said, if anyone is curious, do ask.

I am not going to go so far as to say that Beekes should not be cited at all on matters of etymology, but his views should *always* be tagged as his, as opposed to in the voice of Wiktionary, and preferably with a modifier that makes it clear that his views do not reflect the communis opinio ('Beekes, typically, assigns...' or similar), where applicable, which is frequently the case.


GatlingGunz (talk) 18:11, 18 December 2024 (UTC)Reply

Beekes's etymologies are all over Wiktionary not because we find him particularly reliable, but because his accessible dictionary is the only one in English, so it was easily copy-pastable into Wiktionary. Frisk is in German, Chantraine is in French. Others' English etymologies are sprinkled across inaccessible articles.
Now good luck finding someone to go over the several thousand pages referencing his dictionary and reviewing his proposals one by one. The damage may be permanent. Vahag (talk) 18:24, 18 December 2024 (UTC)Reply
By the way, apart from the pre-Greek stuff, Beekes is almost entirely a word for word translation of Frisk. —Caoimhin ceallach (talk) 23:04, 19 December 2024 (UTC)Reply
Regular Wiktionary editors all have made pertinent observations and more or less openly concluded with remarks encouraging liberal dismissal of Beekes’ etymologies.
It would be more frank to mark etymologies as unknown or uncertain or otherwise speculated upon, while pushing Beekes’ claims off to a mere reference, not worthy of taking space in serious etymology, since collectively they have to be regarded as a nuisance.
Of course, the silver bullet for anyone in the know about the particular philology is to cite an author or more to positively provide differing opinion. You don’t “need to” but it is a gain for all of humanity and your personal scholarly achievement. There is a mismatch between those who have an intimate familiarity with certain comprehensive university libraries and other historically interested people who attempt to have conceptions of the past, if only because one works on another language touching upon Greek. Our open Hellenic lexicography is seriously underdeveloped, and part of it is uncritical thinking, burdened by Beekes’ dogmatism and lifeless superficiality in place of inviting examples of how language science is actually done. Fay Freak (talk) 16:19, 19 December 2024 (UTC)

French Wiktionary Word of the Year

Dear colleagues,

In the French Wiktionary, we experimented this year with our first top 10 words of the year!

It started on November 15th with a call to suggest words, without any specific methodology in mind, such as an analysis of reading statistics. About a dozen people suggested about 50 words. Then in December, we had a vote with 30 participants and a simple result as a list. It wasn't perfect, but it was not that complicated to do.

To my knowledge, this hasn't been tried yet on the English Wiktionary, has it? If you want to try next year, I suggest you create an ongoing draft to keep track of new words during the year; it would make the selection easier. Also, having an in-person meeting with seven Wiktionarians in December helped a lot. Finally, I am not expecting any echoes in the press this year, but we may work to build something for 2025 and 2026, and I think we could be stronger together if several editions of the Wiktionary project organize a similar initiative in parallel. So I invite you to try it too! Cheers Noé 12:33, 19 December 2024 (UTC)

@Noé: there is currently an ongoing vote on whether to have a Word of the Year, and what that word should be. Currently, it looks like the vote will fail, as it failed last year. There doesn't seem to be enough support for the proposal at the English Wiktionary. — Sgconlaw (talk) 12:56, 19 December 2024 (UTC)Reply
Thanks for pointing out this discussion; I missed it in the November Beer parlour, and it was not brought up again in December. It is interesting to read the various opinions on this process and its goals. I did not ask for collective validation at first; I just started it, and I realize now that it would have been nice to open a discussion first. Well, sometimes it is hard to weigh the pros and cons of something completely new without knowing what words may end up in the final list. Having two weeks to collect entries suggested by anyone and a simple vote with a top 5 was, I think, easier to manage than your way of doing it. I am not sure. Well, if someone wants to discuss this idea next year, in October maybe, I would be glad to help with more feedback on our experiment and the media responses. Noé 13:20, 19 December 2024 (UTC)

WT:TRANS

I don't understand what kind of situations this sentence refers to: "If there are multiple paraphrases in the target language for an English term but no direct translations, one such paraphrase may be provided after {{no equivalent translation}}." Template:no equivalent translation/documentation isn't helpful either. What is "potentially unidiomatic / sum-of-parts descriptive" supposed to mean? I want to know when this template should be used and when it shouldn't.

More generally, I am often in doubt about what to do when the most direct translation (with the same part of speech) isn't actually the best translation. I know I'm not the only one with this issue, because such tricky translations are currently most often left blank. Can we clarify the relevant sections? —Caoimhin ceallach (talk) 10:08, 20 December 2024 (UTC)Reply

Dobrujan Tatar language name

Hi there, the Dobrujan Tatar language doesn't have a separate language code. Therefore [crh-ro] is used in Wikis. But in Wiktionary there is only [crh] Crimean Tatar, and when I add a word in Dobrujan Tatar it appears in Crimean Tatar categories. This is a problem, because the languages use different orthographies and are actually not as closely connected as it seems. Also, there is the Category:Dobrujan Crimean Tatar, but this naming is wrong; it's Dobrujan Tatar. Would it be possible to use the code [crh-ro] Dobrujan Tatar, with Dobrujan Tatar categories? Zolgoyo (talk) 10:34, 20 December 2024 (UTC)

Hello! I personally do not think we need a separate name space for Dobruja Tatar specifically, for the following reasons:
1. It is a dialect of Crimean Tatar (as Ethnologue and Glottolog[7] assume).
2. From what I have seen, Wiktionary does not give dialects their own name space. To give examples from Turkic languages, we have:
  • Yenisei Kyrgyz (Old Turkic), which uses different letters altogether and would be illegible to someone familiar with the Orkhon script. Orthographical differences are not a big deal for inclusion.
  • Viryal and Anatri Chuvash represented as just 'Chuvash' (except in etymologies)
  • Various dialects of Turkish and Azerbaijani, all shown with a lb tag.
  • Kumandy, Kuu-Kizhi and Kyzyl dialects (which can be quite divergent at times) of Northern Altai are under the same name space.
and so on... Dobrujan Tatar would be best shown just by a lb tag, like the rest.
3. There seems to be only one published dictionary[8] for this dialect ('Dobruca Kırım Tatar Ağzı Sözlüğü'), and the vocabulary is clearly reminiscent of the main Crimean Tatar one.
4. If Wiktionary added Dobrujan Tatar, then why shouldn't it add Nogai Tatar also? Spoken 10 km. north of the Dobrujan Tatar speakers with a far divergent lexicon?
However, this is my opinion. We really don't need this name space.
AmaçsızBirKişi (talk) 00:05, 22 December 2024 (UTC)Reply
We have Nogai as a distinct code, CAT:Nogai lemmas. And we do often have a separate code for varieties traditionally considered 'dialects', if this is found necessary to effectively document the variety. I can't speak for this specific case though. Thadh (talk) 01:03, 22 December 2024 (UTC)Reply
These Dobrujan Tatar words are from a book which is probably not so good for etymology. Not a bad book, but be careful. Check out Tomriga, in the references to Taner Murat. He writes "Tomri - queen of Mesagetes, also known under Persian form Tahm-Rayis, Greek Tomiris.... from her name came the name for Dobruja province, Tomriga". The guy is obviously a Turkic nationalist from a parallel world where the Massagetes speak Tatar and establish Dobruja. Tollef Salemann (talk) 01:39, 22 December 2024 (UTC)
It seems like that. The dictionary there is not quite academic I figured.
Moreover, it seems like Dobrujan Tatar is just a descendant of a (relatively) larger 'Romanian Tatar' family[9]. This article also says how similar the Dobrujan Tatar dialect is to Crimean Tatar, saying how children use Crimean Tatar primers/reading books in schools.
There's also a poem, in Dobrujan Tatar, that's the extent I could find about this language[10].
AmaçsızBirKişi (talk) 09:32, 22 December 2024 (UTC)Reply
Note that the Nogai Tatar you speak about is probably quite different from the Nogai lemmas listed in the category that Thadh mentions. The Nogais of Dobruja are related to the Nogais of the Caucasus, but they split up in the 1850s-60s because of the war with Russia. I mean, they had split even before that, but kept some contact until the 1850s. So their language is probably closer to Crimean. Tollef Salemann (talk) 02:00, 22 December 2024 (UTC)
Limited prior discussion: Wiktionary:Grease_pit/2023/November#Add_Category:Dobrujan_Tatar_language_to_the_relevant_language-related_modules_if_appropriate. Unfortunately, other than your own writings on other wikis, I am having a hard time finding evidence that Dobrujan Tatar is a separate language. I am trying to think if we have any editors who might be able to find relevant resources in other languages (Romanian?). - -sche (discuss) 06:12, 23 December 2024 (UTC)Reply

WT:TENNIS

It seems curious that tennis player, the archetype of the "Tennis player test", supposedly a test of idiomaticity, is itself listed not on its own merits, but only as a "translation hub". Does this make sense? Mihia (talk) 18:38, 22 December 2024 (UTC)Reply

Is WT:Idiom supposed to be a policy page? DCDuring (talk) 23:38, 22 December 2024 (UTC)Reply
No. Svārtava (tɕ) 09:11, 23 December 2024 (UTC)Reply
It does, to be fair, say at the top of the page that "Tests can be used as guides during RFD, but they are not hard/fast rules", but, even so, one would expect the guidelines to at least apply to the examples given. Mihia (talk) 09:47, 23 December 2024 (UTC)Reply
The closing statement from the 2016 RFD is quite interesting. I wonder if there's more to the history of the ‘tennis player test,’ because this alone makes it pretty questionable. Seems THUB was the keep reason all along?
RFD kept as no consensus for deletion: ≥ 12 keep votes. Note that translation target was used often as the keeping rationale, while the "tennis player test" was rejected by multiple participants. Polomo47 (talk) 00:08, 23 December 2024 (UTC)Reply
The text was added by @Catonif, though I'm not sure why. I personally strongly support WT:TENNIS for its usefulness. AG202 (talk) 06:18, 23 December 2024 (UTC)Reply
The usefulness is limited by our incomplete coverage of names of professions. Where are emergency services dispatcher,[11] franchise opening trainer[12] and heavy equipment operator[13]?  --Lambiam 22:17, 23 December 2024 (UTC)Reply
Whoops, I wasn't aware of the policy when I did that; given the policy's existence, it would need to be removed. But IMO the policy itself sounds pretty dubious: by its wording it would also allow professions such as turtle feeder or cookie taster. I would personally ditch the policy and keep tennis player for THUB. The test's paragraph itself claims its partial redundancy to THUB anyway. Catonif (talk) 08:51, 23 December 2024 (UTC)
@Catonif: For the record, unvoted tests given at WT:IDIOM are not binding policy; only WT:COALMINE is since that is voted upon. Svārtava (tɕ) 09:11, 23 December 2024 (UTC)Reply
It wouldn't hurt to make the distinction clear, preferably by having WT:COALMINE on a separate page, ONLY including it by reference, and placing a banner at the top of the WT:IDIOM page. DCDuring (talk) 13:30, 23 December 2024 (UTC)Reply
Noting that "COALMINE" is mentioned individually in the CFI. So is the "fried egg" test, which is also part of "WT:IDIOM", implying that that one is policy too, I suppose? I haven't checked all the others. Mihia (talk) 15:36, 23 December 2024 (UTC)Reply
I’m not that favorable to the test either. If almost all terms that qualify for it also qualify for THUB, then all it does is prevent us from adding (This sense is a translation hub). Is that desirable? I don’t think so, since the main reason for keeping them appears to be translation. Polomo47 (talk) 15:08, 23 December 2024 (UTC)Reply
I could be wrong, but I think the Tennis player test predates a consensus on keeping translation hubs. So it may have been a good workaround when it was first proposed, but it seems redundant now. Andrew Sheedy (talk) 16:06, 23 December 2024 (UTC)Reply
Yeah, exactly. Polomo47 (talk) 17:12, 23 December 2024 (UTC)Reply

Romance languages: reflexive verb forms and enclisis

This discussion is an offshoot from this RFM, which discusses reflexive verbs in Portuguese specifically. Said RFM in turn derives from this RFD discussion.

Currently, some Romance languages have a specific way of making entries for reflexive verbs; others do not have a pattern at all. Per @Benwing2, Spanish and Portuguese currently follow this scheme:

  • If a verb is only used reflexively
    • It is listed at the page with an enclitical -se. See Portuguese automedicar-se and Spanish automedicarse
    • The page without -se lists, for Spanish, that the word is only used with a proclitic pronoun; see Spanish automedicar. For Portuguese, the page without se usually does not exist.
  • If a verb has reflexive senses in addition to non-reflexive ones

Some Portuguese editors complained about this arrangement a while ago. We proposed a new scheme in an RFM (linked above), but some editors felt the need for consistency with other Romance languages. Thus, this is a proposal on changing/standardizing how it works for most other Romance languages — the use of unhyphenated enclisis (despedirse vs. despedir-se) changes things slightly. For languages that do use a hyphen in their enclises, such as Catalan, a proposal closer to the one for Portuguese is more adequate.

The proposal, for languages with unhyphenated encliticals:

  • If entries exist for both the forms with -se and without it, they will get merged under the page without -se. The entry at the page with -se will list infinitive of verb combined with se.
  • If an entry exists only at the page with -se, it will be moved to the page without -se. In its place, the page will list infinitive of verb combined with se.
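The two rules above can be sketched as a small planning function. This is only an illustration under stated assumptions: the function name, the action tuples, and the idea of passing in a set of existing page titles are all hypothetical and not part of any existing Wiktionary tooling.

```python
def plan_reflexive_migration(pages, bare, clitic="se"):
    """Plan the actions for one verb under the proposal for unhyphenated enclisis.

    pages  -- hypothetical set of titles that currently hold a lemma entry
    bare   -- infinitive without the clitic, e.g. "automedicar"
    clitic -- the enclitic pronoun written onto the infinitive
    """
    reflexive = bare + clitic
    if reflexive not in pages:
        return []  # no entry at the page with -se, so nothing to migrate

    # Text left behind at the page with -se in both cases.
    stub = f"infinitive of {bare} combined with {clitic}"

    if bare in pages:
        # Rule 1: both pages exist, so merge into the page without -se.
        return [("merge", reflexive, bare), ("replace", reflexive, stub)]
    # Rule 2: only the page with -se exists, so move it.
    return [("move", reflexive, bare), ("replace", reflexive, stub)]


for action in plan_reflexive_migration({"automedicarse"}, "automedicar"):
    print(action)
```

Either way the page with -se ends up as a short pointer, so the only difference between the two rules is whether content is merged into an existing entry or moved wholesale.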

A brief list of applicable reasons. For more detail, please read the Portuguese RFM and RFD discussions (which also include some inapplicable arguments).

  • It is inconsistent and confusing to list reflexive-only verbs at the page with -se, but list verbs with reflexive senses only at the page without -se.
  • Listing reflexive-only verbs at their enclitical forms implicitly prescribes the use of enclisis, but proclisis is just as valid and may even be used more often.
    • By having the entry under the page with no -se, we could format its headword to include both forms. Like, automedicar-se or se automedicar
  • Among dictionaries, there is no consensus on what URL reflexive verbs get put under. The only consensus is that the headword includes the enclitical pronoun, which we can do regardless per the above.


Ping, for Italian: @Samubert96, Federico Falleti, Emanuele6, Catonif, Imetsia
Ping, for Spanish: @Ultimateria, AG202, Ser be etre shi, JeffDoozan, Orrigarmi, Brawlio, Jberkel
Ping remaining members from the Galician-Portuguese usergroup: @Davi6596, Faviola7, JnpoJuwan, MedK1, Ortsacordep, Rodrigo5260, Stríðsdrengur, Trooper57
Please ping other editors you know who may be interested in the discussion.

Polomo47 (talk) 00:01, 23 December 2024 (UTC)Reply

CC: @Benwing2: For Spanish, honestly, I'd match what the RAE does: if the verb is only used pronominally/reflexively, then they put the lemma at the version with -se. Ex: RAE entry for automedicarse. I really don't like the idea of putting "se" in the headword at the lemma without "se", especially when the page with "se" already exists. That seems to add a much higher level of inconsistency.
I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former, as the verb is only used pronominally. What we have now isn't my favorite way to go about things (I'd have the reflexive usages at the entries with -se, regardless of if the non-reflexive version exists), but it's better than having everything at the bare infinitive. There's also precedent, at least with Spanish. AG202 (talk) 03:00, 23 December 2024 (UTC)Reply
"I also don't like the idea of moving entries like automedicarse to automedicar, as a learner familiar with Spanish is going to search for the latter one only to be redirected to the former." How so? My proposal is that we move the definitions over precisely to solve this type of issue.
Also, while the RAE categorizes URLs in that way, the RGL does not, and many Portuguese dictionaries don’t either. I don’t know about Italian, though. Polomo47 (talk) 03:56, 23 December 2024 (UTC)Reply
@Polomo47: Oops, I meant search for the former and be directed to the latter, sorry. Learners are more likely to search for the forms with "se" is what I wanted to say. AG202 (talk) 06:13, 23 December 2024 (UTC)Reply
Hm, I’m not confident that’s how people usually search for words. I would expect native speakers (even if we don’t particularly appeal to them) as well as more advanced learners to search without the enclitical. That’s what I do, at least — do others google differently? Polomo47 (talk) 15:04, 23 December 2024 (UTC)Reply
At least for Spanish, having been studying it since 2013, I've almost always seen learners search with the "se" form once they're aware of it, as that'll give them more direct hits, especially from learning websites. In pretty much every learner's text as well, they'll be listed as the "se" form in any vocabulary section. I personally still search that way as well. For (notably Brazilian) Portuguese, I'd expect the trends to be different, since the se forms aren't used as much. AG202 (talk) 17:38, 23 December 2024 (UTC)Reply
I actually find this to be very persuasive in Spanish's case. Having had some mild interactions with it over the years, it's very true that Spanish speakers just love their "se" forms. — comparatively, "lo" forms go essentially unused by Portuguese speakers around me.
While I'm really starting to think that 'it tracks' that reflexive clitics in Spanish are seen as more integral to the verb — and not necessarily because of the spelling — I can't help but wonder about other forms.
I hope I'm not bringing this up too early when we haven't even truly talked at length about the initial proposal, but do we really need pages for all the forms? This likely enters CFI territory, but I'd like to draw some attention to the non-reflexive forms. In Spanish medicar and mostrar, there's an entire table dedicated to combined forms, and yet I see several that might be missing?
Admittedly, I don't know a lot about Spanish, but one such form would be "medícote" — corresponding to "te medico" in proclisis — or something like "mostrárlela". Perhaps Spanish's rules forbid these pairings (tho I did get a hit for the latter), but Standard Galician's doesn't. — you'll find many hits for, say, quérote and mostrarlla online. There's even a TV program named Dígocho Eu.
I guess we could include all of these combinations (every single tense of many many verbs with nearly every single clitic tacked on afterward — me, te, che, vos, os, o, ma, mo, ta, to, cho, cha, lle, lles, nos, lla, llo, possibly a couple more), but I can't help but think it'd be a more productive use of our time to instead draw a line somewhere.. I'm getting some serious COALMINE conversation flashbacks right now. MedK1 (talk) 19:13, 23 December 2024 (UTC)Reply
@AG202 just in case the mobile reply button didn't actually ping you. MedK1 (talk) 21:28, 24 December 2024 (UTC)Reply
Sorry for the late reply, but yes, the "se" forms are integral to the verbs. However, forms like "medícote" are no longer standard usage in Spanish. Pronouns can only be attached afterwards to the gerund, infinitive, and imperative forms. AG202 (talk) 03:30, 29 December 2024 (UTC)Reply
Thanks for the CC.
It occurs to me there are various possibilities for the way reflexives are handled, and this may have some consideration on the ultimate outcome (please expand with other languages):
  1. Reflexives are always enclitic, and written as part of the verb. Examples: East Slavic (Russian, Ukrainian, Belarusian, ...) and North Germanic (Icelandic, Swedish, Danish, Norwegian, Faroese, ...).
  2. Reflexives are normally proclitic, including in particular on the infinitive, and written as a separate word. Examples: German, French, apparently also Romanian. (Clarifications: German reflexives sometimes come after the finite verb, particularly when the verb is in V2 constructions and in imperatives. French reflexives come after imperatives and are joined by a hyphen, and when coming before the verb are joined with an apostrophe if the verb is vowel-initial.)
  3. Reflexives are sometimes proclitic, sometimes enclitic. AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive.
    1. When enclitic on the infinitive, the verb + reflexive is written as a single word. Examples: Spanish, Italian, Galician in standard spelling.
    2. When enclitic on the infinitive, the reflexive is attached to the verb with a hyphen. Examples: Portuguese, Galician in reintegrationist spelling.
    3. When enclitic on the infinitive, the reflexive is written as a separate word. Examples: West Slavic languages (Czech, Polish, ...), South Slavic languages (Bulgarian, Macedonian, ...).
I mention this because there is a lot of inconsistency in how reflexive verbs are lemmatized, and it may partially correlate with the way the reflexive infinitive is written.
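To make the comparison easier, the typology above can be written down as a small lookup table. This is a rough sketch only: the class, field names, and labels ("fused", "hyphenated", "separate") are made up for illustration, only a few of the languages listed above are included, and Galician is left out because its category depends on the spelling standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReflexivePattern:
    placement: str            # "enclitic", "proclitic", or "mixed"
    infinitive_spelling: str  # "fused", "hyphenated", or "separate"


# Hypothetical encoding of the categories listed above.
PATTERNS = {
    "Russian": ReflexivePattern("enclitic", "fused"),
    "Swedish": ReflexivePattern("enclitic", "fused"),
    "German": ReflexivePattern("proclitic", "separate"),
    "French": ReflexivePattern("proclitic", "separate"),
    "Spanish": ReflexivePattern("mixed", "fused"),
    "Italian": ReflexivePattern("mixed", "fused"),
    "Portuguese": ReflexivePattern("mixed", "hyphenated"),
    "Czech": ReflexivePattern("mixed", "separate"),
    "Polish": ReflexivePattern("mixed", "separate"),
}


def languages_with(placement=None, spelling=None):
    """Return the sorted language names matching the given criteria."""
    return sorted(
        name
        for name, pattern in PATTERNS.items()
        if (placement is None or pattern.placement == placement)
        and (spelling is None or pattern.infinitive_spelling == spelling)
    )


print(languages_with(placement="mixed", spelling="fused"))
```

A table like this would make it possible to check whether current lemmatization practice actually lines up with how the reflexive infinitive is written.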
Benwing2 (talk) 03:38, 23 December 2024 (UTC)Reply
"AFAIK, all such languages have the reflexive pronoun enclitic on the infinitive." Is that really how it works? In the case of Portuguese, from what I gather automedicar-se is no more valid an infinitive than se automedicar; the former is just the preferred form used by dictionaries because (1) it’s a single word (2) it’s less predictable than proclisis (3) it’s something people generally like to prescribe, lol. I’ve yet to find another explanation for the preference for enclisis, but I have no reason to believe it’s because automedicar-se is the only possibility. Polomo47 (talk) 04:06, 23 December 2024 (UTC)
Sorry, I meant to clarify that "all such languages have the reflexive pronoun enclitic on the infinitive" refers to how dictionaries express the forms. I know that Brazilian Portuguese, for example, leans towards proclisis in all cases and thus says vou me deitar, not #vou deitar-me. West Slavic languages similarly are very flexible in word order and sometimes have the reflexive pronoun before the infinitive and sometimes after, but all dictionaries I've seen lemmatize the reflexive pronoun after. In contrast, French dictionaries always list reflexive infinitives with the reflexive pronoun before, because it never comes after in actual usage. Benwing2 (talk) 05:15, 23 December 2024 (UTC)Reply
I mentioned above that Galician has a hundred forms (bare minimum) that Spanish completely lacks coverage for at the moment (potentially because they don't exist over there? I wouldn't know); you can mix and match any tense with any clitic for the most part.
It might be worth noting that for Portuguese, these countless forms exist as well, and often with more patterns — Galician roughly shares the European Portuguese rules prioritizing enclises, while for Portuguese, we have Brazilian Portuguese's proclises preferences to consider as well.
Since for Portuguese, they're framed as 'regional preferences' rather than the rules actively changing, you get far more possibilities than you would normally, all of them being SOP — you have either a separate word before, a separated suffix or a separated infix according to tense.
With Brazil liking proclises, the lemmatized enclitical can end up being quite rare in comparison to the proclitical ones. "Precisamos parar de automedicar-nos" even sounds weird in comparison to nos automedicar to me. You can have similar sentences for -te and others too. Do note that these are all considered impersonal infinitives (i.e. the ones that get lemmatized in Wiktionary).
For these and many, many other reasons, it stands to reason that one shouldn't include any of those clitical forms as separate pages for Portuguese at least. This doesn't necessarily mean anything for Spanish; more and more I'm thinking their systems are different beasts altogether and as such should be treated differently..
PS: Priberam at least does express proclitical forms for verbs. MedK1 (talk) 00:06, 24 December 2024 (UTC)Reply
I'll also add that PT-PT dictionaries seem to prefer the -se forms: entry for "arrepender-se" at O Dicionário da Língua Portuguesa & entry for "arrepender-se" at Infopédia.pt. AG202 (talk) 03:33, 29 December 2024 (UTC)Reply

Yiddish in Latin characters

Please lift the ban on including Yiddish terms attested in Latin characters. I know that writing it with other scripts is uncommon (except to assist beginners), but there are a few lengthy Yiddish works written mostly or entirely in the Latin script. Examples:

https://books.google.com/books?id=o_P6DQAAQBAJ

https://books.google.com/books?id=nrCYDwAAQBAJ

See also: https://brill.com/view/journals/jjl/12/1/article-p27_3.xml (((Romanophile))) (contributions) 00:45, 23 December 2024 (UTC)Reply

Probably even more. And in Cyrillic as well, I guess? Also, I remember owning "Di Avantures fun Alis in Vunderland" myself, which has both Hebrew and Latin script in the same book. Tollef Salemann (talk) 02:20, 23 December 2024 (UTC)
The Brill article confirms that Cyrillic is another script, yes, but it is the rarest of the three (and the only other script in which Yiddish is attested, as far as I'm aware). I welcome lifting the prohibition on that as well. Anyway, cheers for suggesting another source. (((Romanophile))) (contributions) 04:06, 23 December 2024 (UTC)Reply
Does Wiktionary's transliteration of Yiddish terms match the spelling used in these books? I tried searching for some random words from the "Di Avantures fun Alis in Vunderland" preview sample and successfully found the relevant Yiddish entries on Wiktionary. Is this not good enough for the end users? --Ssvb (talk) 05:16, 23 December 2024 (UTC)
Usually they do match, though historically Romanizations of Yiddish have varied in form and consistency. In any case, utility is not the motive here. We already have Romanization entries for Chinese, Japanese, and Serbo-Croatian. I doubt that a proposal to delete them would succeed on grounds that there are already transliterations in the main entries, thereby making the Romanization entries 'redundant'. (((Romanophile))) (contributions) 06:31, 23 December 2024 (UTC)Reply
I wouldn't advocate deleting them. Just creating additional Latin script entries and keeping them in sync with the Hebrew script entries is an extra maintenance effort. If contributors are ready to spend their time and efforts on that, then it's fine. If attestable Latin spelling of some terms encountered in real books differs from the transliteration of their corresponding Hebrew script entries, then these can be probably prioritized. --Ssvb (talk) 09:11, 23 December 2024 (UTC)Reply

Yes. This has been requested at least twice in the past year, once by me at Wiktionary:Beer_parlour/2024/April#Latin-script_Yiddish and once after that by someone else somewhere else... but although there seems to be support for at least allowing Latin-script entries to point to the Hebrew-script entries, like is done for Arabic-script Afrikaans (pointing to Latin-script Afrikaans) (or, in a different vein, for Latin-script Gothic), neither I nor anyone else has gotten around to it yet. Well: unless there are objections, I will finally add "Latn" as another script to yi in, say, a week (ping me if I forget), with the understanding that Hebrew script will continue to be lemmatized at least in most cases. - -sche (discuss) 06:28, 23 December 2024 (UTC)Reply

I personally favor a treatment like Serbo-Croatian where both scripts are lemmatic, but I won't feel devastated if we treat the Latin script as secondary to the Hebrew one either. You may want to include the Cyrillic script as another option, too, though I don't have examples on hand. (Yiddish's cousin Ladino is more my field of expertise. Or should I say Spanish Yiddish?) (((Romanophile))) (contributions) 06:42, 23 December 2024 (UTC)Reply
For Japanese we have such entries as
jiyaku: Rōmaji transcription of じやく
Is there a reason not to use a similar approach for Yiddish?  --Lambiam 21:15, 23 December 2024 (UTC)Reply
My understanding is that this is the intention, yes; in the April discussion, Benwing proposed using {{spelling of}}, which would look like this. @Romanophile, if at some point in the future we have the ability to lemmatize two different scripts/spellings without them falling out of sync (e.g. via them both "transcluding", with "smart" changes, some underlying central backend page), I would support "double-lemmatizing" a great many things, but for now it would just lead to duplication. - -sche (discuss) 16:52, 26 December 2024 (UTC)Reply
If it's just a stripped down soft redirect entry, then the required maintenance effort is low. BTW, does it need a declension table? And what would be the right place for book quotations in Latin script? I'm interested in this topic, because many of the same guidelines would probably also apply to Belarusian Łacinka, like the horny entry. --Ssvb (talk) 17:33, 26 December 2024 (UTC)Reply
As Yiddish has been a contemporary of Early New High German, there needed to be Yiddish text in blackletter, and certain Germanists on the continent regularly deal with these Early Modern equivalents, but from the perspective of Anglos it is a suppressed blind spot: fractura est, non legitur. We have to cover Yiddish in Latin script like we include Hebrew spellings of Arabic language as Judeo-Arabic. The current Hebrew-written standard is just a later Ausbausprache like Luxembourgish, but unlike Luxembourgish, which is within the ballpark of another broader dialect (Category:Central Franconian language), Yiddish, due to ethnic and cultural separation, always was a distinct dialect, though the Middle High German beginnings are difficult to oversee, of course. So I don’t see how it was ever banned, only a skewed perspective; more parsimoniously one may observe an oversight in the language data, which until now only lists Hebrew script for Yiddish, factually wrong. A few times I also added Serbo-Croatian terms in Arabic script only to be annoying, without any preference for it and without believing it to be prohibited, only that rendering is faster if we only check Latin and Cyrillic script. Fay Freak (talk) 17:16, 26 December 2024 (UTC)Reply

Dutch defective verbs

(Notifying Mnemosientje, Lingo Bingo Dingo, Azertus, Alexis Jazz, DrJos): I am working on an update of the Dutch verb conjugation module, and in that I came across the issue of how to handle defective verbs. These are verbs that act like they have a separable part, but are (generally) not actually separable.

I usually use woordenlijst.org for checking Dutch conjugation, and it seems to distinguish two types of defective verbs. The first is verbs like herinvoeren, for which the subordinate clause form is given, but the main clause form is omitted. The second is verbs like zakkenrollen, for which only the infinitive and present participle are given. However, searching online, it seems that in actual usage, the second type is used exactly like the first type (i.e., forms like zakkenrolt and zakkenrolde are attestable). I added the option to specify these types of verbs through a parameter |subonly= (see the bottom of the page at User:Stujul/test-nl-conj).

My main question is about how to categorise these verbs. Currently there are two categories for these verbs: Cat:Dutch defective verbs and Cat:Dutch uninflected verbs. The first is added manually and the second is added by a parameter in the headword template {{nl-verb}}. These should definitely be merged. But should the two types of defective verb I mentioned be categorised separately as different subcategories, because the forms of the second one are nonstandard?

I hope to hear your opinions on this.

PS - sorry if this is not the appropriate place for this discussion.

Stujul (talk) 13:36, 23 December 2024 (UTC)
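
The two-way split described in this post can be sketched as follows. This is only an illustration in Python, not the conjugation module's own code: the class `VerbEntry`, the function `defective_type`, and the labels `"subordinate-only"` and `"infinitive-only"` are hypothetical names invented here, and the flags reflect how woordenlijst.org is described above rather than attested usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerbEntry:
    """Hypothetical record of which form sets a word list gives for a verb."""
    infinitive: str
    has_main_clause_forms: bool    # e.g. finite forms usable in a main clause
    has_subordinate_forms: bool    # e.g. finite forms given for subordinate clauses


def defective_type(verb: VerbEntry) -> Optional[str]:
    """Return a hypothetical subcategory label, or None for a complete verb."""
    if verb.has_main_clause_forms:
        return None                  # nothing missing: not defective
    if verb.has_subordinate_forms:
        return "subordinate-only"    # first type, like herinvoeren
    return "infinitive-only"         # second type, like zakkenrollen


# Flags follow the description of woordenlijst.org given in the post above.
entries = [
    VerbEntry("herinvoeren", False, True),
    VerbEntry("zakkenrollen", False, False),
    VerbEntry("lezen", True, True),
]

for entry in entries:
    print(entry.infinitive, defective_type(entry))
```

Whether the two labels should become separate subcategories or be folded into a single merged category is exactly the open question; the sketch only shows that the distinction is mechanical once the form sets are known.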

If forms of zakkenrollen are missing, might it be the woordenlijst that is defective? In the conjugation table on the Dutch Wiktionary all seem to be present, although the subjunctive currently seems unattestable. Here, for example, is a use of gezakkenrold, and here of finite zakkenrollen in a main clause. Is it not just like stofzuigen (not only semantically, but also grammatically)?  --Lambiam 21:07, 23 December 2024 (UTC)
Maybe zakkenrollen was a bad example. It seems indeed to be used more like stofzuigen. This may have to do with the fact that rollen is a weak verb. For example, geboogschiet and gelipleest return far fewer results than booggeschoten and lipgelezen, respectively. About the Dutch Wiktionary's approach: I found a list of such verbs and most are listed as fully defective there. liplezen gives the main clause forms in parentheses, and on the main page gives a note that these forms appear sporadically. I also note that some verbs that you may expect to fall into this category are actually given as complete verbs on woordenlijst.org, e.g. hartenjagen.
It may just come down to a case-by-case analysis, but it would be nice to have a standard approach when dealing with such verbs, as we are currently very inconsistent with it.
Stujul (talk) 10:12, 24 December 2024 (UTC)
Gelipleest is orthographically wrong anyway; /ɣəˈlɪp.leːst/ should be written as gelipleesd. But liplezen is one of the entries on this list of defective verbs.
We are not prescriptive; shouldn’t three properly attestable uses of forms like gelipleesd or lipgelezen trump any lists and suffice for including these forms (with a note warning that they are not generally accepted)? Here are two uses “in the wild” of lipleesde: [14], [15].  --Lambiam 11:26, 24 December 2024 (UTC)
Sure, we are not prescriptive, and three attestable uses do merit an entry for these forms, I don't disagree with that. But I'm not sure whether we should include these forms in the conjugation table on the lemma entry. You can find many "in the wild" uses of "ik leesde", but we don't include that form in the table at lezen. Of course, in that case, there is a clear "correct" and "incorrect" form, while for liplezen, there isn't a "correct"/"standard" form we can point to (should it be lipleesde, liplas, las lip,...).
Stujul (talk) 11:57, 24 December 2024 (UTC)
I see that the Dutch Wiktionary happily presents the unsplit conjugated form ik herindeel and the split form ik breng heruit. Both feel wrong to me; are these acceptable?  --Lambiam 21:58, 23 December 2024 (UTC)
The Dutch Wiktionary is again inconsistent in this regard: indeed heruitbrengen is conjugated as a normal separable verb, herinvoeren gives an alternative construction "ik voer opnieuw in", and heruitzenden just leaves the main clause forms empty.
Both these forms that you gave also feel wrong to me.
Stujul (talk) 10:22, 24 December 2024 (UTC)
I'm amazed that I was completely unaware that these kinds of verbs existed. Thinking about it, I would indeed categorise them as defective, as the woordenlijst does. If you put a gun to my head I might indeed say "ik zakkenrol" or "ik herindeel", like other speakers, but they still don't feel quite right. My intuition is that these forms, which can be sporadically attested, are ad-hoc formations. Some standard strategy to deal with these in the language may crystallize at some point, but the fact that everyone feels unsure about them shows that it hasn't yet. —Caoimhin ceallach (talk) 18:02, 26 December 2024 (UTC)

Extended Mover Request: User:AG202

Hi, I'd like to request extended mover rights, mainly to be able to fix issues like tones in entry titles where they're not supposed to be, such as with Igbo ákpị̀, per WT:About Igbo. AG202 (talk) 18:21, 23 December 2024 (UTC)

@AG202 Done. Benwing2 (talk) 21:35, 23 December 2024 (UTC)
Thank you!!! AG202 (talk) 23:22, 23 December 2024 (UTC)
@Benwing2: For the record, the process is WT:WL, see WT:Extended movers. Svārtava (tɕ) 04:46, 24 December 2024 (UTC)

Username pronunciations

Hello,

There is a new subpage for username pronunciations called User:Flame, not lame/Username pronunciations.

Thank you Flame, not lame (Don't talk to me.) 19:54, 25 December 2024 (UTC)

Love the page! Polomo47 (talk) 17:08, 29 December 2024 (UTC)

jive talk

We should categorise jive talk, like frolic pad; there's probs some good stuff in this website. P. Sovjunk (talk) 23:37, 26 December 2024 (UTC)

Hebrew transliteration

I'm probably not the first person to ask this, and I likely won't be the last: but what is the reason for Wiktionary to use conventional Israeli romanization (i.e. based on colloquial Israeli Jewish pronunciation) over something more narrow and scholarly like ISO 259? Narrower transliterations have a lot of bells and whistles, sure, but I think they still do a good job at being a compromise between various historical, regional and cultural variants of Hebrew. Why should ⟨צ⟩ be written as "ts" when that's not how Yemenite or Sephardic Jews pronounce it, or how it was historically pronounced during Biblical and Classical times? Why should ⟨ח⟩ and non-geminated ⟨כ⟩ be rendered both as ⟨kh⟩ when this merger pretty much only happens in Israeli Hebrew, while every other dialect still distinguishes the two? Why should ⟨א⟩ and ⟨ע⟩ not be rendered at all when, even inside Israel, some Jews do pronounce them? Even if Israeli Hebrew is the de facto standard dialect these days, the common transliteration isn't even the de jure standard, that would be the Hebrew Academy's, which is slightly different. I understand Hebrew is a living language, but if you're like me, a non-Jewish non-Israeli who has a mostly academic historical linguistic interest in Hebrew, the modern Israeli transliteration is just not very useful. Sure, it's more "phonetically accurate" (as discussed, for a single dialect anyway), but isn't that what the IPA section is for?

Obviously we'd have to agree on the details of the transliteration, and I have my opinions on the specifics, but overall, I think a narrower transliteration would make much more sense. It would also likely allow us to begin some sort of automatic transliteration template that languages like Russian, Arabic and Greek have got going on. Pescavelho (talk) 15:55, 27 December 2024 (UTC)
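
To make the mergers mentioned in this post concrete, here is a small Python sketch. It is an illustration only, not an existing Wiktionary module or template: the table `ROWS` and the function `mergers` are hypothetical names, the "narrow" column uses ISO 259-style symbols as an assumption (the exact scheme is what would need to be agreed on), and the "Israeli" column records only the renderings named above.

```python
from collections import defaultdict

# (Hebrew letter, narrow ISO 259-style symbol, conventional Israeli rendering)
# The narrow symbols are an assumption for illustration; the Israeli values
# are the ones named in the post ("ts", "kh", and nothing at all).
ROWS = [
    ("א", "ʾ", ""),
    ("ע", "ʿ", ""),
    ("ח", "ḥ", "kh"),
    ("כ", "ḵ", "kh"),   # non-geminated kaf
    ("צ", "ṣ", "ts"),
]


def mergers(rows):
    """Group letters by Israeli rendering and keep only groups that collide."""
    groups = defaultdict(list)
    for letter, _narrow, israeli in rows:
        groups[israeli].append(letter)
    return {rendering: letters
            for rendering, letters in groups.items()
            if len(letters) > 1}


# The narrow column keeps every letter distinct ...
assert len({narrow for _letter, narrow, _israeli in ROWS}) == len(ROWS)

# ... while the Israeli column collapses two pairs of letters.
print(mergers(ROWS))
```

On this five-letter sample, the narrow column stays one-to-one while the Israeli column maps two pairs of letters to the same output, which is the loss of information the post is objecting to.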

No good reason, sure, only catering to cognitive biases of majorities. The thought of continuing to use your English keyboard without any acquired extra characters is just too appealing.
In recent months, I have increasingly succeeded to see through the grievances of the world as being the consequences of neurotypicals splitting up the world, they ever imagine, into social relations: what is relevant in the present context (see it again!), for this reason, is that they fail to imagine capable keyboard layouts or input methods, and rather configure six different keyboard layouts if they know French, Spanish, Romanian, Turkish and German, for instance, in addition to English, rather than to use the international version of any of these layouts, or a Unicode search made accessible on their machine for the very occasional but recurring goal of transcribing certain foreign phonemes faithfully.
Engaging the habit learning circuitry of the brain to switch to a more convenient, even if less intuitive (according to neurotypical cognitive biases), input setup would be easy though: it is just excusable, not defensible, not to switch to us(intl) or de(deadtilde) from us(basic) (in /usr/share/X11/xkb/symbols/), and many neurotypicals editing this dictionary or similar academic works already succumbed to this which is reasonable. I also use the actual Russian layout, with extensions, ru(prxn), for all Cyrillic languages, when my neurotypical bro is ticked off by it because its assignments do not phonetically correspond to the ones on the standard German layout—all being invented by someone around 1900 and hence carried forward, few ever questioning it, the social pressure to type the same layout with “ten fingers” is too high.
One just has to look up which combination can be utilized to get bonus characters, and repeat until one does not need to expend notable brainpower for it. Juggling multiple languages to maintain polyglotism is a context where one needs bonus characters, like it or not (everyone shall like it, following the adapt neuroscientific recipe). Fay Freak (talk) 16:33, 27 December 2024 (UTC)Reply
Is the point here that it's "too cumbersome to type"? That feels subjective; some people would feel like setting up all the templates an average Wiktionary page uses is rather cumbersome (I've certainly felt so at times). In any case, I'm hoping the adoption of a narrower transliteration would go hand-in-hand with automated transliteration, so this concern would be null and void. Pescavelho (talk) 21:38, 27 December 2024 (UTC)
I'm also not very happy about the transliteration situation for Hebrew. I don't edit it enough to have much sway in that sphere, but I would like to see a transliteration system that is actually transliteration and not transcription of a certain dialect that I am only marginally interested in. Andrew Sheedy (talk) 22:30, 27 December 2024 (UTC)
(Lurker/new Hebrew editor. I've read some of the past discussions on this topic.) I would prefer to see both Israeli and Biblical/liturgical/scholarly transcriptions next to each other (except contexts where one of them is irrelevant, of course), ideally (somewhat) automated by a module. This would satisfy both main Hebrew user bases. It's my understanding that a lot of work has already been done on automatic transliteration; it's about time it should be deployed, so we can iterate and check edge cases. I appreciate those still adding (inconsistent) manual scholarly transliterations in 2024, but think it may be useless once we apply the module. Contra the above replies (and I don't understand/ignore/tldr whatever the fuck fay freak wrote), I am satisfied with the gist of the status quo Israeli transliteration system, and generally am not convinced that one-to-one reversibility is a major virtue (compared to, say, readability and not being laden with diacritics); but I'll sooner take any reasonable automated module finally being made widespread over continued bikeshedding of the exact romanization scheme. Hftf (talk) 11:06, 28 December 2024 (UTC)Reply
@Pescavelho I agree with you, but Neo-Hebrew editors will never agree. They have no understanding for the perspective and needs of people like you and me, who are only interested in Hebrew from a historical point of view. Unfortunately, in my experience, they are incredibly biased and obtuse. Here is a discussion we had in the past:
Hebrew transliteration – time to clear the mess
Sartma 𒁾𒁉𒊭 𒌑𒊑𒀉𒁲 19:42, 28 December 2024 (UTC)
Seems like the main argument is that, if you are someone who is only interested in modern Hebrew, then a narrow transliteration isn't helpful. OK?... What if one isn't just interested in modern Hebrew? Then the modern Hebrew transcription is probably less than useless. I feel like there's a bigger case for having only a narrow transliteration over a conventional transcription, given that modern Hebrew mostly experienced mergers rather than splits compared to Tiberian Hebrew (which is the de facto standard Hebrew orthography), so you can just ignore half of the diacritics and you're basically left with modern Hebrew, but there'd be nothing wrong with having both systems side-by-side. And again, pronunciation is what the IPA section is there for.
Personally, ISO 259, with a few modifications, would be my go-to system. I am willing to provide reasoning for each of the modifications in question, and, if we decide to go ahead with the transliteration system, it will be these modifications we'll spend the most time arguing about. (The biggest issue will undoubtedly be the vowels.) Pescavelho (talk) 15:45, 29 December 2024 (UTC)
Wiktionary's transliteration of Hebrew has been discussed (and disputed) a lot over the years (search the archives of this page for various discussions). One idea which seems to me to have been gaining support is, as mentioned above, to have two transliterations, one scholarly and oriented to representing the distinctions of Hebrew script / Biblical Hebrew, beside the current one that is oriented to representing the modern (Israeli) Hebrew pronunciation. A two-translit approach would also help with certain other languages where some people want a transliteration that reproduces the distinctions of the original script, and other people want a transliteration that hints at the pronunciation in the manner of a simplified version of enPR or IPA. (The second group thinks of the first group: if you want to know the distinctions of the original script, why not just learn the original script? The first group thinks of the second: if you want a pronunciation, why not provide a pronunciation, rather than putting an ambiguous respelling in the transliteration parameter?) Having seen how consistently scholarly/"Biblical" transliteration is something people want, I support adding it. - -sche (discuss) 17:21, 29 December 2024 (UTC)Reply

Adjective definitions

E.g.:

  • Whose first and last vertices are different.
  • That ends in a vowel.

My feeling is that adjectival definitions of this style seem old-fashioned or cryptic, and are potentially difficult for modern readers to understand. I would change them where I see them to e.g. "Ending in a vowel", but does anyone else have an opinion? Mihia (talk) 20:53, 27 December 2024 (UTC)

I agree about "that ends in a vowel" and similar; I would definitely change to "ending in a vowel". The first definition "whose first and last vertices are different" seems OK; paraphrasing an "open polyline" as "a polyline whose first and last vertices are different" seems fine to me. If you want, you could change it to "having different first and last vertices", which seems about the same in terms of understandability. Benwing2 (talk) 02:20, 29 December 2024 (UTC)