Wiktionary:Beer parlour/2024/June


How to resolve conflicts on Wiktionary


Thou wilt quarrel with a man that hath a hair more, or a hair less, in his beard, than thou hast[.] — William Shakespeare, Romeo and Juliet

Throughout the lifetime of this online dictionary, there have been plenty of conflicts between users. Some of these are unlikely to end any time soon, such as the fight between admins and vandals. That fight has a clear "good guy" and "bad guy", unlike some of the other fights we have had over the years. These morally conflicting fights often turn into virtual bloodbaths, with people hurling vicious insults at each other and at each other's throats over who's harassing whom or whether some person should be banned. While these are important conversations to have, too often the main point is ignored in favor of calling people "idiots". This problem has been pointed out before [cf. Beer Parlour July 2023 § "please reduce the heat"], but nothing seems to have been done about it, and we keep going in circles, never reaching the point where we can have civilized discussions.

I intend to change that. Please leave your thoughts on how we can avoid any future conflicts for good. CitationsFreak (talk) 04:13, 1 June 2024 (UTC)

In principle, there is no way to stop all conflict, of course, but a really good start would be an expectation of civility and admins enforcing that. —Justin (koavf)TCM 06:38, 1 June 2024 (UTC)
Would that just be the way to end all heated conflicts? CitationsFreak (talk) 14:48, 1 June 2024 (UTC)
Clearly not all of us have the same standards for civility and for the need for civility in all interactions with others. Further, some don't seem to care much about feedback from others about their behavior. In some cases, people seem to get very annoyed that others might be potentially causing them to waste their precious time, taking them away from their sacred mission to improve Wiktionary, by their lights. I'm pretty sure that the folks whose behavior I most object to are supremely confident that they are right and that civility is for others who are not on the sacred mission they have defined. DCDuring (talk) 15:04, 1 June 2024 (UTC)[reply]
Pobody's nerfect and no system is perfect, but it's a start. —Justin (koavf)TCM 20:09, 1 June 2024 (UTC)
Conflicts are not always bad if people are arguing about something they both care about. I guess that the biggest problem is when people call each other bad words when arguing about some stupid stuff like the definition of a sand broom, without caring about anyone’s input. It feels like they are either too drunk or not drunk enough. However, luckily, it’s not happening so often. Tollef Salemann (talk) 15:38, 1 June 2024 (UTC)
I think @Theknightwho should try to make an effort to participate in drama less. I think we can be reasonable and agree that nothing good came out of engaging with Wiktionary:Beer parlour/2024/May#Stalking/harassment by User:Theknightwho which was a pretty obvious (yet successful) attempt to fan up drama. Ioaxxere (talk) 18:46, 1 June 2024 (UTC)[reply]
@Ioaxxere Sure, but there needs to be a way to resolve issues that doesn't just amount to ignoring them. Theknightwho (talk) 18:57, 1 June 2024 (UTC)
Yeah, Help:Dispute resolution doesn't offer much guidance except, hilariously, relax and do something more important and assume that they [the other user you're disputing about] are eccentric and will thus never be able to see eye to eye with you. Hardly worthy of a Nobel peace prize. P. Sovjunk (talk) 19:11, 1 June 2024 (UTC)[reply]
It's rather pithy... but I also sometimes feel this way looking at drama from the outside in. Vininn126 (talk) 19:49, 1 June 2024 (UTC)
@Theknightwho Sometimes stepping away from things IS the best course of action. You need to do that more often. Purplebackpack89 06:02, 5 June 2024 (UTC)
@Purplebackpack89 Given the amount of friction you're creating, you actually might want to do the same. Benwing2 (talk) 06:15, 5 June 2024 (UTC)
@WordyAndNerdy: Do you have feedback here? I'd value it. —Justin (koavf)TCM 20:10, 1 June 2024 (UTC)
It isn't feasible to "avoid any future conflicts for good." Such an approach will only worsen conflicts that inevitably emerge. Ignoring problems doesn't resolve them. It's like putting a lid on a pot and expecting it not to make a huge mess when it boils over. If TKW had been given guidance at an early stage, this issue may not have grown to this extent. Now there are at least five productive contributors (me, Huhu9001, LlywelynII, Mahāgaja, Purplebackpack89) who find his admin conduct to be a recurrent issue. Wiktionary desperately needs both formal dispute resolution processes and the willingness to enact them. This isn't about "fan[ning] up drama." Seeing it characterised as such with no pushback (except from TKW, to his credit) does little to reassure me that Wiktionary is interested in having difficult conversations as a community and making necessary systemic changes. I have noted that TKW hasn't been as combative as in past incidents. That gives me hope that there's room for course correction. But my continued participation here is contingent on resolving the current policy vacuum. We cannot have a repeat of an admin (Equinox) functionally being given carte blanche to be as hostile and combative as he pleases for years because he also makes valuable contributions. WordyAndNerdy (talk) 21:54, 1 June 2024 (UTC)
Huhu and Purple have both been criticized for not being productive. Mahāgaja's complaint has been addressed as their behavior was problematic. Please don't ignore these aspects in your diagnosis. Vininn126 (talk) 22:00, 1 June 2024 (UTC)
Also LlywelynII has been heavily criticized for being sloppy. The term "productive" is being used too loosely here. Vininn126 (talk) 22:01, 1 June 2024 (UTC)
Huhu9001 seems to do solid work in the Japanese language area and in template-space. My understanding is that the dispute between TKW and Huhu9001 arose over changes that TKW made to modules that ended up unintentionally breaking things. So, if LlywelynII can be faulted for "being sloppy," so can TKW. Purple has been around as long as I have. Wiktionary has shown habitually sloppy editors (Luciferwildcat) the door before. I wouldn't necessarily number Purple among them. In any case, the common denominator in these disputes is TKW, not anything any editor did to get on his radar. It also needs to be underscored that all of these disputes were unrelated. TKW has found himself at the centre of multiple heated disputes with unconnected editors working in different areas of this project. That isn't a coincidence. It's a sign of a pattern of escalating and personalising conflicts. WordyAndNerdy (talk) 22:19, 1 June 2024 (UTC)[reply]
This only half-addresses the issues I raised with hand-waving. A user frequently (I admit, too frequently) addressing sloppiness in others' edits does not make their edits not sloppy. From Purple I have seen 10x more drama and the issue "is a hot-dog a sandwich", which I'd hardly call productive. Huhu has been criticized by others, as well, and is known to be abrasive in conversations. So no, it's not just knight there, it's also an uncooperative personality. I find your reply to be lacking. Vininn126 (talk) 22:23, 1 June 2024 (UTC)
Isn't this thread now devolving into precisely the kind of escalation that it was designed to stop? Theknightwho (talk) 22:46, 1 June 2024 (UTC)
This seems like a seeing-the-forest-for-the-trees situation at best. Whether a rank-and-file contributor is insisting hot dogs qualify as sandwiches (if subs are sandwiches, so are hot dogs, FWIW) is perpendicular to the issue of problematic admins. Huhu9001 having been "abrasive" at some point doesn't justify an admin becoming hostile in kind. It absolutely did not justify TKW implementing a blatantly retaliatory block against Huhu last year. Admins have more power than rank-and-file editors. They need to be held to a higher standard of conduct accordingly. They absolutely shouldn't take administrative action in disputes in which they are personally involved. Admins aren't frontier sheriffs. They shouldn't be making and enforcing policy at their own discretion. Power necessitates accountability and a certain level of restraint. What has been core policy on every other WMF project for decades shouldn't be weirdly controversial here. We shouldn't have a culture in which everyone nods along as an editor (not TKW, to be clear) with a history of inserting Daily Stormer quotes votes against an anti-harassment proposal with inane blather about "wokery" and the suggestion that PB89 seek "treatment for paranoia." This discussion is doing nothing to relieve my sense that Wiktionary loves being a boys' club. It really does seem that some users will be forgiven any trespass, however severe, while others, no matter how much good work they do, will be summarily dismissed and denigrated and blamed for inviting the hostility to which they've been subjected. WordyAndNerdy (talk) 02:44, 2 June 2024 (UTC)
1) An admin doing their job by addressing sloppy edits is a good thing.
2) Fayfreak’s point about “wokery” seems à propos given your throwing out questionable accusations of misogyny and now, apparently, Nazism. Apparently pointing out that a user is being a bit high maintenance means one must hate women. And pulling a random collection of usage examples from Google that happens to include some kind of far-right tabloid rubbish means you might as well be merrily goose-stepping and Heil Hitlering your way to the Reichstag. Nicodene (talk) 04:43, 2 June 2024 (UTC)[reply]
Ugh, User:Nicodene, you are not doing yourself any favors with this post, and you are aptly illustrating User:WordyAndNerdy's point about Wiktionary being a boys' club. Benwing2 (talk) 04:53, 2 June 2024 (UTC)
Also, IMO Fay Freak is in a class of their own with their weird views (and contorted syntax). They know a ton about obscure languages but tend to go off on bizarre rants/tangents that are best ignored; I would not hold them up as an example to be emulated. Benwing2 (talk) 05:00, 2 June 2024 (UTC)[reply]
@Benwing2 Given that neither I nor WAN are happy about this, it seems fairly clear that the underlying issue is not that this is an old boys' club, but that there is no adequate way to resolve conflicts, because consequences are essentially arbitrary, and there is a culture of admins allowing things to peter out instead of actively drawing things to a close. WAN has concluded it's because of nepotism because she's only considering me (and now, apparently Fay Freak), but doesn't seem to realise that she's got away with quite a lot of disruptive behaviour herself, and it's not like people haven't noticed ([1]). Theknightwho (talk) 05:15, 2 June 2024 (UTC)[reply]
Name one example of "disruptive behaviour" on my part. Since I'm allegedly guilty of so many you ought to be able to name one. Our clashes at shitgibbon and cupsona don't count. Neither of us behaved with the decorum we ought to in both instances. I've never deleted the main page. I don't habitually insert nonsense into entries. I don't add translations for languages I don't know. I think the most dust I've ever kicked up is over a user having a Patreon in 2015 and the weird resistance to accepting online cites in 2020-2022. And in both cases I just voiced my opinions and left for a time. Rather the opposite of "disruption," I'd say. Unless you're insinuating that not contributing is itself a form of disruption. In any case, you're deflecting again. WordyAndNerdy (talk) 05:42, 2 June 2024 (UTC)[reply]
@Theknightwho I agree with all your points about the problems with Wiktionary (and I think Nicodene's comments were inappropriate). I do not think User:WordyAndNerdy's attempt to get you desysopped soon after Huhu9001's attempt was called for, and I said that at the time; but at the same time it's hard not to notice how multiple times, WAN has made a statement about something problematic in Wiktionary, and expressed a fear of getting subjected to denigration and hostility for expressing this, and someone then proceeded to come out and do exactly that.
As for a more systematic way of resolving conflicts, we definitely need that; but at the same time I don't think there's any appetite for a Wikipedia-style legalistic approach. IMO it has to be more mediation-based than arbitration-based, with arbitration-style "let's lay down the law" as a last resort. I think a good start would be maybe something like this: (1) a more clearly expressed code of conduct that clearly prohibits bigoted remarks, and gives examples of reasonable punishments for transgressions that admins (or bureaucrats if an admin is the transgressor) can make; (2) some sort of "appeal" process if one of the two sides (transgressor or transgressee) feels they're not getting fair treatment or their concerns aren't being heard or addressed. My hope is to avoid long, drawn out processes in the vast majority of cases, because IMO people here don't have the time or energy for this. Benwing2 (talk) 05:43, 2 June 2024 (UTC)
I am in full support of your plan. CitationsFreak (talk) 05:47, 2 June 2024 (UTC)
@Benwing2 I'd like to avoid long, drawn out processes as well, but I'd prefer them over long, drawn out threads where everyone gets angry and nothing gets done. Theknightwho (talk) 06:09, 2 June 2024 (UTC)
I'd support this as well. AG202 (talk) 21:06, 2 June 2024 (UTC)
@User:Benwing2 I think that "bigoted remarks", problematic though they are, are not the source of all the bad behavior that policy needs to address. More common are uses of derogatory labeling of people as, e.g., idiots, morons, drama queens, even when cleverly or humorously worded. The emphasis in establishing a behavioral norm like "No personal attacks" has to be on personal. We may need a total ban on personal attacks (including accusations of Naziism, gender bias, etc.). Enforcement of such a ban couldn't be on a hair-trigger, but it would point in the right direction. A single personal attack should require an apology or temporary block; multiple personal attacks, say, over the course of 12 months would earn longer blocks, etc. I'm not sure about how to enforce better behavior by admins and veteran users (and their bots, templates, and modules). DCDuring (talk) 19:42, 2 June 2024 (UTC)
Calling out someone for using racist/misogynistic/etc. language or linking to a neo-Nazi site in an entry isn't a "personal attack." Usually such call-outs are backed up by diffs demonstrating said behaviour. As a community we need to be able to discuss inappropriate conduct in order to effectively mitigate it. You nailed your colours to the mast long ago.[2][3][4] WordyAndNerdy (talk) 20:29, 2 June 2024 (UTC)[reply]
Did I say it was? Whatever evil we attribute to such behavior would not justify attacking the person as a Nazi or an advocate of Nazism. We should be calling out the behavior, not the person, no matter what. I am proud to advocate freedom of expression, toleration, and universal coverage of English expressions in Wiktionary based on uniform standards of attestation and idiomaticity, regardless of the source or meaning. DCDuring (talk) 02:35, 3 June 2024 (UTC)[reply]
You're off-the-mark on this front, I think, but I do respect you haven't been combative about it, and I do get the sense your take is born of principle. It's why I don't consider you a problem admin even if I regard your thinking as totemic of Wiktionary's systemic issues. Some of my reaction here may be that your initial comment was posted in "mostly unproductive." There's agreement that Nicodene's comments toward me in there crossed a line and -sche putting a lid on that is the main reason I've felt comfortable returning to this discussion.
Protecting individual freedom of expression shouldn't be a pressing concern on a crowd-sourced dictionary project. (Government censorship regimes OTOH can make our mission more difficult). No one's legal rights are infringed by a website setting standards on the type of speech permitted on the site itself. People still have a legal right to express their views on other platforms and in other contexts. Wiktionary is functionally a professional setting. Many employers maintain some type of code of conduct. Letting employees freely spout off their opinions will very likely create a hostile work environment. Our unvarnished thoughts aren't always helpful. I'm sure no one here wants to read my random thoughts on tax reform, ongoing military conflicts, etc. But sometimes uncomfortable conversations are necessary for change to occur or for problems to be rectified. We can't discuss individual user conduct issues if we can't name the specific problems some users present. It isn't a personal attack to characterise someone's speech as "racist" etc. If we treat it as such, all we'll be doing is ensuring that marginalised voices go unheard, as the majority is often resistant to putting its own biases under a microscope.
I also haven't advocated disallowing the inclusion of offensive terms. A lot of Category:English 4chan slang and Category:English incel slang is my work. I do think there are middle-ground interpretations of "Wiktionary is not censored." There is a lot of distance between having [slur] as an entry and including a quote featuring [slur] in some random entry like umbrella. The former is objectively documenting language as it exists. The latter is an unnecessary and inflammatory editorial choice. The Daily Stormer quote shoehorned into smash wasn't for a specifically neo-Nazi/white supremacist sense. It was for a sense that was a synonym of hottie (attractive person). This is why I long ago concluded that Fay is an edgelord. Edgelords don't necessarily personally endorse the views they express. For many it's about stirring up trouble for the lulz. But -sche seems to think Fay may be the real deal, and I do trust his expertise in this area. WordyAndNerdy (talk) 04:42, 3 June 2024 (UTC)
Expertise indeed, didn’t he graduate with a PhD in identifying Nazi lexicographers?
Frankly it looks like you’re bullying a person who may not be entirely neurotypical, wielding “they’re an X-ist!” as a cudgel to smash someone you dislike into submission. Nicodene (talk) 11:16, 3 June 2024 (UTC)[reply]
@Nicodene: She also underrates that I am not a native speaker and was triangulating new definitions, that I wouldn’t know how specifically saucy or redpilled it is or not. Lacking intercultural competence in an international dictionary. Where would I have looked, amongst all bilingual and monolingual dictionaries, to find all bymeanings and implications, huh, WordyAndNerdy? All was gained inductively, the gold-standard of documenting language for a dictionary. For these movement-kind of words multiple people had to guess around because they were previously uncovered, cf. slam later written in Feb 2022 by me.
And then rather than trying to be edgy I wasn’t too happy to quote that guy so that’s why I hedged and balanced it with other quotes and Wikipedia links for author and publication where you already read “far-right, conspiracy theorist, neo-Nazi, white supremacist, misogynist, Islamophobic, antisemitic, and Holocaust denial”; that was all I could do, save not including it, which wasn’t compelling, since dictionaries nowadays are notoriously not SFW unless defined otherwise, but the quote read so easy and illustrative! And since I had not studied psychology consciously to guesstimate the tribalization programs hinging on references, it was barely possible to be concerned; I say concerned but not bothered because I have only a cognitive simulation of what happens in others, which I note is a bit extreme in WordyAndNerdy.
This day I read p. 63:
> Carter and her colleagues (2012) were interested in investigating the ability of children with ASD to make judgments about pictured social interactions. The pictured scenarios did not require the use of language and both the children with ASD and those with typical development accurately identified the situations that depicted inappropriate social interactions. However, the children with typical development had robust activation in their language processing network when performing the task; they appeared to be spontaneously verbally encoding the information from the scenes that they were viewing. In contrast, the children with ASD had activation in a network associated with the processing of social information but no significant use of neural resources in the language network. This result suggested that the children with ASD were not spontaneously encoding the information into a verbal form.
Well, when I read this social interaction by political content creators, nothing happens and I don’t connect scenes and don’t encode the author’s or my or my publishing platform’s eventual position. Seems like others do but the automated categorization is still likely to be toned down or correlated with better possibilities by reason, and some wokeness courses do away with this capability again, such that people graduate to see intersectional discrimination structures and victimizations everywhere, trouble as a business sector, for which people privately readapt whole personal identities, as it is defined to operate by means of identification. Fay Freak (talk) 13:48, 3 June 2024 (UTC)[reply]
I won't dignify this with a response except to note that I have decades of first-hand personal experience of ASD and somehow manage not to use it as an excuse for questionable behaviour. WordyAndNerdy (talk) 21:05, 3 June 2024 (UTC)
What is your excuse, then? For things like making a personal attack on the same page where you’d voted to ban personal attacks. Nicodene (talk) 23:14, 3 June 2024 (UTC)[reply]
That's not a "personal attack" by any reasonable standard. It's a thing that actually happened, as is demonstrated by the diff. You need to stop dogging every comment I make. You've already been told that your previous comments toward me in this thread have been out of line. WordyAndNerdy (talk) 01:50, 4 June 2024 (UTC)[reply]
The very first sentence of "No personal attacks" reads "Comment on content, not on the contributor." The exact opposite of what you did.
In the list of examples of what constitutes a personal attack, we specifically find:
  • "Using someone's political affiliations as an ad hominem means of dismissing or discrediting their views."
I should like to add that in this case it's a matter of "imagined political affiliations", since FayFreak has never once that I have ever seen actually expressed the slightest whiff of believing in racial superiority or exterminating undesirables.
I'm curious by what standard you consider anything I have written "out of line" which would not apply just as well to what you have been writing yourself. Nicodene (talk) 02:04, 4 June 2024 (UTC)[reply]
Please, it's not worth your time or effort. AG202 (talk) 02:08, 4 June 2024 (UTC)
It’s you who is dogged. The same point stands, whether framed with this concept or not. You can also consider yourself as one of extreme, insane, unhealthy, not to forget wrong. Hyperfocus on the same fiddlestick for five years. Someone with the same neural preconditions as me I would know to tell to stop being autistic; it appears the same “revisiting past points” happens if you answer trauma. Like to reinstate the DMN they rub themselves off on railway tracks, though it be obviously disadvisable.
Other candidates with ASD seclude themselves, keep their interactions brief, out of concerns or actual anxiety of being blamed or missing to react on social information appropriately, depending on their verbal abilities. Looking at the stats, instead of your first-hand anecdotes we can’t revisit (unlike my story which I retell you as far as I remember), with the must-criteria for this diagnosis of impaired social cognition + repetitive and restricted behaviours, most fall just short of schizoid or obsessive-compulsive personality disorder, so they (and me) need to make an actual effort to sidestep avoidant reaction to social input arising from its restricted interpretation. It is artificial though, rather than people-pleasing, and I had to practice it years to see things differently, which includes discussing and defending controversial viewpoints favourable for certain outcomes even if I don’t feel strongly or anyhow at all about them. I am supposed to be controversial. It is sure there would be issues incomprehensible to you, with your inflexible paradigms, if they engaged in politics, which I now have to do as a daily business since graduating law school, which is an exceptional case which you probably haven’t experienced even from hearsay, and you can’t imagine how edgelordy it had to be, for my life. Stats say hubris is greatest among jurists in Germany as compared to all academics (you take your standards from other fields?), again you miss the language and culture barrier on top of the double empathy problem, by which a lot more questionable things appear anyhow if one speaks across continents, and I couldn’t expect to reap a stalker from reciting the Daily Stormer once: kidding of course, don’t get it twisted, it is what AG202 said, not worth your time or effort, though we interpret the result differently. I know you aren’t trying to stalk or concern-troll, I tried to interpret and put into different perspective, again, and enable you.
Extreme viewpoints which one contradicts are super necessary to set benchmarks, very different they appear in me from how you currently deal with them. Fay Freak (talk) 03:02, 4 June 2024 (UTC)[reply]
@WordyAndNerdy: I wasn't planning to respond but I noticed you quoted me here. The reasons why I characterized User:Purplebackpack89's posts as an attempt to fan up drama:
  • Hyperbolic language ("stalking/harassment") which is frankly disrespectful to actual victims of stalking.
  • Referencing TKW's desysop vote, which has little to do with the current situation (and which also seems to argue against your premise that no action is taken against problematic editors — recall that Dan got indeffed mid-vote) and quoting random comments.
  • Inflammatory language, viz. "his edit was so bad", "it's likely he's following me around BECAUSE it bothers me.", etc., apparently intended to provoke TKW.
Ioaxxere (talk) 05:19, 3 June 2024 (UTC)
"Wikistalking" is old-school wiki-jargon. "Wikihounding" or simply "hounding" has seemingly replaced it. But it needs to be remembered that PB89 has been around since 2010. It's not unexpected for a veteran editor to sometimes use older jargon. Wikistalking/hounding has never been regarded as a one-for-one equivalent of real-life stalking or even cyberstalking. It's exactly as PB89 has expressed it: combing through someone's edit history, systematically undoing their edits, inserting yourself into unrelated disputes, etc. It might not be intended as antagonistic, but it understandably comes across as such. TKW should ideally seek to moderate his tone and conduct if he wishes to avoid finding himself at the centre of conflicts (he has improved since last year). And mentioning the desysop votes was absolutely relevant. This is not an isolated incident. It's a pattern of conduct. Ignoring past incidents won't do us any favours. WordyAndNerdy (talk) 05:44, 3 June 2024 (UTC)[reply]
@WordyAndNerdy PB89 made an absolutely unfounded accusation of harassment towards me for a single RfD of one of his terms combined with a single comment I made about him in another RfD (which is located directly above the RfD I added), in response to a comment of his. This is not the first time he has made unfounded accusations of harassment (and not towards TKW; I reserve judgment on this matter as I haven't looked at it in detail to see what the circumstances were). PB89 seems to think he can shut down criticism of his (IMO often sloppy or ill-considered) edits with such accusations. I should also add, from statements made on his user page, he rejects some core Wiktionary principles such as SOP, and seems to have difficulty understanding why Wiktionary isn't just Wikipedia-lite; so it's not surprising to me that several users feel his edits deserve extra scrutiny. Benwing2 (talk) 05:57, 3 June 2024 (UTC)[reply]
I'm going to quote myself from ten years ago (June 2014) because I found this while digging up the roadworn diff and it seems just as relevant today:
Expressing minority viewpoints or being the lone dissenter in an RfD discussion does not constitute disruption. The fact that we have a formal discussion process at all means that the deletion of entries isn't an open-and-shut policy-enforcement matter completely up to the discretion of administrators. It means that RfD is an open forum where people may put forward serious arguments for or against the inclusion of terms and have these arguments weighed on their merits. Sometimes arguments put forward will not align with majority opinion. Sometimes they'll challenge the soundness of our policies. That's good! We need more of that, not less. The exchange of ideas is what discussion is all about. On the issue of "drama," as an outsider who's watched these incidents transpire from the sidelines, I'm not going to disagree that PBP's behaviour has been problematic, or that it needs to change. But the passive tolerance of incivility on Wiktionary is the proverbial elephant in the room here. We don't have formal dispute resolution or mediation processes like Wikipedia, and when incivility occurs and someone gets upset, the general response, in my own experience, is getting told that occasional rudeness and hostility is par for the course and one should learn to deal with it. This is unacceptable. So if PBP has developed a flair for the dramatic, perhaps it's because Wiktionary, lacking any means for addressing civility concerns in a reasonable and orderly fashion, has left PBP no recourse but dramatics. PBP isn't the problem; PBP is a symptom of the problem. Is it really fair to punish someone for a problem that Wiktionary as a whole has helped to create?
WordyAndNerdy (talk) 06:07, 3 June 2024 (UTC)
@User:WordyandNerdy How would you categorize the types of incivility that are not personal attacks? Or are all types of incivility personal attacks, possibly veiled? I am wondering how to give shape to a civility policy beyond the most obvious. Attacking people's unstated (and possibly imagined) values, attitudes, beliefs, or motives is an example of problematic behavior, IMO. On occasion I have resorted to this, but I believe it to be undesirable in a wiki, as well as in many other environments. DCDuring (talk) 12:40, 3 June 2024 (UTC)
@DCDuring Two examples of this are rudeness and passive aggression. We all engage in them sometimes, but they can easily have a chilling effect on productive discourse. I'm not saying we should ban them (which would probably have a much bigger chilling effect), but I do think any civility policy needs to be more nuanced than simply banning overt personal attacks and leaving it at that. Theknightwho (talk) 13:57, 3 June 2024 (UTC)[reply]
I am looking for categories of items that are relatively easy to characterize and which have a high likelihood of triggering escalation. Such categories can form the core of undesirable behavior which can be controlled. There are lots of types of incivility that are undesirable, but are hard to police. I think 'rudeness' and 'passive-aggression' are hard to define operationally. We can't start with them or let their existence prevent action on what might be relatively easy to control. My hope is that the basic lessons of the psychology of interpersonal relations can be productively applied here. DCDuring (talk) 14:07, 3 June 2024 (UTC)
Targeting a large number of edits by the same editor all at once is likely to make that editor feel targeted. You talk of basic psychology...basic psychology would suggest that, if a large number of edits (made in some cases over a period of years) are all targeted at once, that I would feel targeted! Anyone probably would! What's the solution here? Spread it out! Instead of targeting all my edits in a period of a few days, maybe take a couple months. Purplebackpack89 14:40, 3 June 2024 (UTC)[reply]
I would argue that this discussion should not be a forum for airing personal grievances and settling scores. User talk pages are better for those purposes. When they fail, a mediator's assistance might be warranted. We could at least try to generalize to the matter of how, in an environment of volunteers, a patroller should select entries and edits for revision and how the patroller and 'targeted' patrollee should interact. DCDuring (talk) 15:03, 3 June 2024 (UTC)[reply]
"When it rains, it pours." I'd say the airing of personal grievances here was inevitable. When there's been no remedy for problematic user conduct – and when discussions on the subject have fizzled out – the result is feeling unseen, unheard, and unvalued. Such an experience can naturally leave one with a sense of injustice. But I hope that everyone's gotten things out of their system now and we can focus on finding solutions.
I'd say that the question of how to categorise "types of incivility that are not personal attacks" depends on tone, context, and several other factors. If someone is generally in the right but is unnecessarily hasty or severe about it, I'd characterise that as "rude," "short," or "brusque." An example might be someone reverting a poorly-formatted but well-meaning edit by a newbie with "learn correct mark-up!" If they're unnecessarily harsh ("f***king learn mark-up!"), I'd describe that as "abrasive," "hostile," etc. If they assert their own superiority ("learning mark-up isn't hard!"), I'd call that "snide," "condescending," etc.
None of these present major issues in isolation. We're all human and we all err from time to time. It becomes a community problem when it's a pattern of behaviour. That said I don't think it will be necessary for any user conduct policy we create to classify types of incivility with this level of granularity. All of this could be covered by a general advisement to "please try to keep a cool head and remain respectful in discussions".
Where I think we would need to get into specifics is with statements that express antipathy toward characteristics typically covered by human-rights legislation. There's no reason for invective statements to target someone's race, religion, national origin, disability, sex, gender identity, sexual orientation, etc. Of course there'll be disagreement on what crosses the line. Calling someone a "dumb American" would be taking an unambiguous potshot at their nationality. But "British people don't say 'elevator.' Why are so many Americans ignorant?" is arguably just an unhelpfully cranky statement of the fact that English varies between countries. More nuanced incidents will warrant consideration on a case-by-case basis. But a blanket rule against what is generally deemed hate speech is necessary for a communal project like Wiktionary to function (and thrive!). WordyAndNerdy (talk) 22:41, 3 June 2024 (UTC)[reply]
@Vininn126 it is irrelevant whether a particular editor is perceived to be "productive" or "sloppy". That shouldn't be an excuse to be combative with them, or escalate things Purplebackpack89 15:51, 2 June 2024 (UTC)[reply]
I've had limited but productive interaction with both TKW and WAN. I respect them both as editors and hope they can both find a way to continue editing. Contributing to Wiktionary is a particularly thankless endeavor and I imagine that, like many editors, each has received much less praise than they deserve for their efforts while being on the receiving end of a disproportionate amount of criticism. They, and other editors, have good reason to feel aggrieved and I think that we, as a community, could do a better job of shutting down bad behavior earlier and providing a forum to air grievances where the involved parties could get some perspective from uninvolved editors instead of feeling like they have to personally defend themselves against attacks. I would hope such a forum could provide actionable support for legitimate grievances, perspective for editors who feel slighted by innocuous remarks or edits, and a quick boot for anyone using it in bad faith. JeffDoozan (talk) 00:16, 2 June 2024 (UTC)[reply]
I'll be honest. I think too many people here have stopped actually building a dictionary. I don't like that. So I'll be absolutely clear as to my position once, and I sincerely hope that at least some of the people here that are trying to figure out how to emit as much aggression as possible onto unknowns on the internet will find a better hobby.
  1. I didn't become an admin to enforce any rules on "civility" or the like. I simply don't care. I should probably start helping out with closing RFDs and RFVs more often (I have been pretty busy with real-life things, but right now I have a bit more time), but other than that I am a volunteer as much as anyone else on this website, and I don't come here to do busywork I wouldn't even do if I were paid.
    • So, basically: If you need a nanny, this isn't a website for you.
    • If you get called an idiot, or stupid: Tough luck, you making a BP post on that only proves this statement.
    • Actual slurs are a different matter, and we shouldn't tolerate those in any shape or form. Use your head.
  2. We aim to be a full dictionary. We are also a politically neutral dictionary.
    • Yes, that means we have entries for slurs, Neonazi slang, communistic formation and whatnot.
    • We shouldn't use politically loaded quotes unless necessary, but sometimes they are: 99% of literature written on the territory of modern Russia in languages other than Russian will be loaded with communistic messages, that doesn't mean we shouldn't quote them.
    • If anything, quoting anything that shows a capitalistic or religious view (including the Bible even!) should be as problematic as neonazism or communism.
    • If you can't handle us hosting such quotes when they are necessary, maybe lexicography isn't your thing.
    • If you find something you think wasn't necessary, remember: assume good faith. That's like page one of our whole dictionary. I feel this rule that should be plastered all over the website is forgotten too easily in the last few years. The person adding a neonazistic quote isn't necessarily a neonazi themselves, they may just be lazy and have found this quote before any others. That's why I add communistic quotes for Ingrian, because that's most of the literature, and it's easier for me to just take a book and add quotes word for word than look through the entire corpus hoping to find a sentence where the word "religion" isn't followed by "is complete bollocks".
  3. The recent amount of technical "fixes" has grown out of control.
    • Entries go first, templates go second, and markup goes last.
    • Going out to change any technical feature of a language you aren't personally in the process of adding entries for should be done only at the request/agreement of the ones that do edit it. In the best case, you will have to re-do these changes later on when an active editor appears, and in the worst case you will lose every single editor that is invested in working on this dictionary at all.
  4. In the end, seriously, I would rather have an editor do constructive work and be a little rude than have an editor do nothing and be the nicest person in the world.
    • I'd say 99% (yes, I like that number) of the languages in our dictionary are grossly underrepresented. To give an example: Just today Ingrian (which has an estimated 20 native speakers) surpassed the closely-related Estonian (which has an estimated 1.2 million native speakers) in terms of number of lemmas, and the situation in Africa and Southeast Asia is even worse.
    • If an admin is monitoring your edits, it's because you apparently did something wrong. Doesn't mean you're a bad editor, just means you have room to grow. See what was changed and try applying that in the future.
    • Now, if you continue to make the same mistakes over and over again, then you'll at some point get the message "Please stop, and if you don't you'll get a block", and at that point you should really stop. We cannot keep fixing your mistakes for you.
    • To the admins monitoring: If you tell the people why you're going to monitor their edits, that will probably be more effective than just acting like you're not doing that, or only explaining it after they have completely freaked out.
Maybe let's stop trying to figure out who's right and wrong and start actually working on the dictionary? Does that sound like a plan? In that case, we don't need any conflict resolution, because nobody will offend anyone and nobody will get offended. Sounds like a win-win to me.
Because seriously, what in the world is keeping you from editing so much that you absolutely need me and a few dozen other editors to write this type of enormous text just to solve it? Thadh (talk) 15:59, 2 June 2024 (UTC)[reply]
I'd have to strongly agree with a lot here. Maybe not everything, but a lot. I'd like to emphasize that the people who stir up the most mud also seem to do the least editing. Vininn126 (talk) 16:01, 2 June 2024 (UTC)[reply]

Here's my 2c, most of which has been said by me or others elsewhere.

  1. Wiktionary tends to be dominated by a relatively small group of "guardians", such as Knightwho, Equinox and Fay
  2. Some of those guardians (again, Knightwho, Equinox and Fay) have problems getting along with non-guardians
  3. The guardians aren't that interested in holding each other accountable
  4. Some of the guardians are OK with driving non-guardians from the project. At least one of them (rather foolishly) stated that publicly.
  5. This is in conflict with one of the base principles of all Wikimedia projects: that anyone can edit them
  6. With great power comes great responsibilities. In exchange for being awarded the blocking tool, admins should be expected to be held to a higher standard than non-admins
  7. There is no deadline. Except for obvious vandalism, there's no need for minor tweaks to be done immediately, nor is there any need for them to be done by any one editor in particular
  8. It's been pointed out quite a few times, by several different editors, that Knightwho has a problem with conflict and escalation (one example being that, when I felt harassed, he just went further and further back into my edits, rather than stepping away)
  9. Remedies have been offered to KnightWho on how to avoid conflict, and he's ignored them

What does this mean in real terms?

  1. De-escalation is a good and necessary thing
  2. If the parties are unwilling to de-escalate, remedies like two-way interaction bans need to be available.

Purplebackpack89 15:44, 2 June 2024 (UTC)[reply]

I am going to be perfectly frank. Someone shouldn't be an admin if they aren't willing to enforce user conduct standards. Civility is one of the five pillars on Wikipedia. There is no reason for a load-bearing policy to be entirely absent on Wiktionary except to preserve and enable a toxic culture. Any rank-and-file editor could theoretically do menial maintenance tasks such as closing RfVs. I had a short stint running Word of the Day back in 2012 and I was (and remain) a non-admin. The necessity of admins is not in doing maintenance tasks but in keeping the peace. With the ability to block disruptive users, they might be thought of as a wiki's police. Ideally, blocking shouldn't be the first line of defence. Problem users can be dealt with through guidance, de-escalation, interaction bans, mediation (if such a process existed here). When one of the few woman editors sticks her head above the parapet to speak on her negative experiences, she shouldn't receive gaslighting, condescension, and a stunningly weird and deeply discomfiting jeremiad about how men are too horny to work with women in response. It's impossible to have a serious conversation when this type of rank nonsense is tacitly allowed. Was this thread started to have a discussion about how Wiktionary can create dispute resolution processes? Or is it an exercise in hand-waving and navel-gazing ("Why can't everyone just get along?") without any actual commitment to examining Wiktionary's systemic issues and implementing badly-needed changes? The fact that a civility policy seems slated to be rejected by a landslide beggars belief. I honestly don't think anything is going to change without WMF intervention. The rot has spread too deep for Wiktionary to keep its own house. WordyAndNerdy (talk) 17:17, 2 June 2024 (UTC)[reply]
I did hope that it would lead to the former, myself. (In fact, I hoped that we would make a dispute resolution process.) CitationsFreak (talk) 18:08, 2 June 2024 (UTC)[reply]
It won't. I went into this discussion skeptical, and it's affirmed every misgiving I had. Even the level heads in the room seem to be taking a hands-off approach. No one wants to be the one to button down and call for change. Tall poppies are smacked down; squeaky wheels are dismantled. Doesn't matter if they've got 14 years of solid work behind them. Preserving a cootie-free space for the boys' club is apparently more important than building a dictionary. Heaven forbid anyone be required to exercise personal restraint in what is functionally a professional setting. That's woke pinko free-speech suppression or something. WordyAndNerdy (talk) 18:47, 2 June 2024 (UTC)[reply]
@WordyAndNerdy I made a (bare-bones) proposal above, do you have any thoughts about that? User:CitationsFreak and User:Theknightwho are the only ones who made any comments about it so far. I am trying to find something that will both have some substance in it and work in practice (two aims that aren't easy to reconcile). Benwing2 (talk) 18:59, 2 June 2024 (UTC)[reply]
Do you mean this? I'd considered the possibility of a semi-formal mediation process myself. But such a scheme would be just as easy to game as a more legalistic one. Too often subjective judgments inform individual perceptions of a situation. The scale will always be weighted in favour of those with power and the right connections. People are more willing to assume good faith of people they admire and/or consider friends. Which is why I believe an intermediate stage in the dispute-resolution process would be necessary. Problem users (including admins) could be restricted to 1RR and required to bring concerns to the BP to ensure uninvolved eyes assess the situation. We'd need to be comfortable with enforcement being applied asymmetrically in some cases. Sometimes both "sides" in a conflict aren't equally guilty of bad behaviour. An admin who is habitually hostile/antagonising isn't the same as a rank-and-file editor who reacts poorly in an isolated instance. That's a level of nuance more legalistic approaches are generally better at handling. WordyAndNerdy (talk) 19:39, 2 June 2024 (UTC)[reply]
@WordyAndNerdy Thank you for your response. I think in general, edit wars should quickly be brought to the Beer parlour; if you get to the point that you've done 3 reverts (or even two), you should stop and bring the discussion to the BP. At least, this is what I've done and I have seen others do the same. We are generally less tolerant of edit warring than Wikipedia is. Maybe something like this can be put into a formal policy. I do agree that sometimes one person will be right and the other wrong, although it's not always apparent to outside admins. As an example, there was a dispute a few years ago between User:Saranamd (aka Tibidibi/Karaeng Matoaya) and B2V22BHARAT. Both users asserted the other was wrong and was edit warring; eventually it was clear that the latter user was in the wrong and was blocked for a week (causing them to leave), but it took a while to sort this out, esp. since there was no admin dedicated to the dispute. I agree in general that any process can be gamed, but having the process is better than not having one at all, and I think maybe a mediation process with a single uninvolved admin could be an intermediate step required before a full legalistic panel. I have read through such panels in Wikipedia, and they're exhausting just to read (much less to participate in, I'm sure). Such panels may be necessary in Wikipedia because they are often caused by underlying real-world political disputes (abortion and other US political issues; the Israeli-Palestinian conflict; a whole host of Eastern European conflicts; etc.). But in my experience these disputes are thankfully less relevant in Wiktionary, where the disputes instead are more on the personal level. I invite others to contribute suggestions regarding what should be considered actionable, what the steps are in the process, etc. Benwing2 (talk) 21:49, 2 June 2024 (UTC)[reply]
You're correct that people are often unaware of points of contention outside their own personal experience and knowledge base. That's why it seems integral for project Wiktionary to strive to both invite and sustain a diverse editor base in order to help counteract systemic bias. While I'd personally prefer a more structured ("legalistic") approach, any dispute-resolution process would be a vast improvement on none. WordyAndNerdy (talk) 22:59, 2 June 2024 (UTC)[reply]
I am cautiously more hopeful; I read support on the vote page, even from oppose voters, for having a thought-out civility policy; the thing which the vote looks set to defeat is one editor's attempt to win a personal dispute by pushing through a page from 2006 seemingly without even reading/comprehending it enough to notice it still said one of the processes involved notifying Jimbo. I'd like to hope a guideline that doesn't posit "Head Boy of the boy's club should be notified", a modern civility policy written in 2024, is attainable. (I also think ensuring the policy / community has mechanisms for dealing with gaming is a valid and serious concern; on Wikipedia, my anecdotal count is that it seems like about half the trans editors who've dared edit trans topics there have gotten baited/gamed and censured/censored/banned; I think we do need to think about how to write a civility policy that doesn't empower the one or two people taking the stance that someone calling out / disliking Nazism is the one in the wrong.) - -sche (discuss) 18:33, 2 June 2024 (UTC)[reply]
I'm also not aware of any openly trans or non-binary Wiktionarians. I'm sure there's a couple but how many want to hang around with all the trans-antagonistic soapboxing that goes on here? Our collection of trans-related terms has seemingly been built primarily by cis people. Imagine if all entries for a language were created exclusively by non-native speakers. How would that shape Wiktionary's coverage of that language in subtle ways? I mean, the general lack of AFAB editors on here is of genuine lexicographical concern. WordyAndNerdy (talk) 20:52, 2 June 2024 (UTC)[reply]
While not the same issue, I feel the same way about racial issues. I've been called epithets by users/IPs and had to go on resource dives for showing that the most basic terms are actually offensive, see the history of all lives matter, specifically this edit, for an example. However, one thing I do think I've learned here, for better or worse, is that it's not worth it to get into spats even if you're in the right. It just bogs you down and puts a negative light on you. For myself, I just keep mental track of folks I've interacted with and act accordingly, such as with Equinox. Not worth it to argue anymore. That obviously doesn't work for everyone, and it's not easy, but it keeps me sane on this project, especially after 2022 with the discussions leading up to the creation of WT:DEROGATORY. I just hope that one day this project will be welcoming enough to where we can get actual coverage done for the languages that really need it. AG202 (talk) 21:05, 2 June 2024 (UTC)[reply]
Same here. CitationsFreak (talk) 21:12, 2 June 2024 (UTC)[reply]
I'm not sure if I'd personally label all lives matter as "offensive." That phrase seems to be employed more as a silencing tactic than a provocation. One might argue it's the racial analogue of not all men. That kind of complexity can be difficult to condense into a context label. I might've offloaded it onto a usage note as happened at TERF. But I'm willing to accept that I've got a large blind spot here. It's definitely good to have a diverse editor pool for this reason. Not everyone is going to catch errors that result from their own limited experience and/or biases. As for continuing to edit despite it all, I'm not sure that's feasible for me, given it's clear I'm unwelcome here. There was a time when it took me more than a year to point out that an editor (not Equinox, to be clear) was habitually inserting inflammatory quotes from manosphere blogs into random entries. I don't have the patience for tying myself in knots trying to explain why that's a bad thing without referencing systemic oppression and prejudice anymore. WordyAndNerdy (talk) 21:54, 2 June 2024 (UTC)[reply]
@WordyAndNerdy I'd like to clarify that you are definitely not generally unwelcome. Yes, some contributors have essentially told you to fuck off, but I for one appreciate your contributions. E.g. you have added a lot of info about fandom ships, something I know next to nothing about; from reviewing your contributions, I also see stuff related to non-binary and other gender-non-conforming communities (if that is the right term), social-media memes and trends, and other stuff that's important for keeping Wiktionary up-to-date and representative of all (sub)cultures, not just the dominant one. Benwing2 (talk) 05:13, 3 June 2024 (UTC)[reply]
Thank you for the kind words. One of the most gratifying things was randomly seeing "WIKTIONARY HAS SHIP NAMES???" in a tweet. Knowing my work is being referenced by people outside the fandom sphere is cool. WordyAndNerdy (talk) 06:13, 3 June 2024 (UTC)[reply]
I can think of at least two who have openly identified themselves; I'm sure -sche knows of more. I'm not sure however if either of the people I'm thinking of have contributed to trans-related entries. One used to be one of the most active contributors, esp. for bot-related work, but left for reasons that (I think) are at least partly unrelated to their trans status. The other is still active but has stayed away from this discussion. Benwing2 (talk) 21:20, 2 June 2024 (UTC)[reply]
Nor are they required to engage in this discussion in an "any marginalised individual in a group is required to serve as a spokesperson" kind of way. I just think it would be nice to have more LGBT editors onboard to help counteract systemic bias. As rewarding as it has been documenting trans-related coinages on Wiktionary, it can feel like talking over actual trans people or treating them as anthropological curiosities at times. WordyAndNerdy (talk) 22:14, 2 June 2024 (UTC)[reply]
If Wiktionary really is a "boys' club", may I suggest you take the first step to improve this state of affairs by de-sysopping yourself, having been one of the boys in charge for years now? "Walk the walk", as they say.
For the record I don't buy it. A perennially catty user (Equinox) being catty to yet another person is not because they're a woman, it's because they're just another person. FayFreak is not a Nazi whatsoever, he's a "free speech" champion. You disagree with him, I disagree with him as well – the difference is you see burning malice where I see a kind of optimistic naïveté. Nicodene (talk) 22:15, 2 June 2024 (UTC)[reply]
What part of "I was (and remain) a non-admin" do you not understand? Would be really nice if you actually followed this discussion instead of shadowboxing against things that no one said. WordyAndNerdy (talk) 23:11, 2 June 2024 (UTC)[reply]
What part of my replying to -sche, not you, do you not understand? Nicodene (talk) 23:19, 2 June 2024 (UTC)[reply]
Then use @ to make it clear to whom you're speaking because this thread is playing fast and loose with indentation. WordyAndNerdy (talk) 23:23, 2 June 2024 (UTC)[reply]
Your hostile remarks toward -sche are also completely unwarranted. Maybe sit this one out if you're just gonna throw peanuts from the gallery. WordyAndNerdy (talk) 23:26, 2 June 2024 (UTC)[reply]
Basic reading comprehension on your part is not my responsibility. How "[you've been] one of the boys in charge for years now" could possibly be construed as being about you is beyond me.
I don't think what I've said (and I stand by it) comes anywhere near frivolously accusing someone of Nazism. If you'd like to apply your own apparent standards for hostility to yourself and "sit this one out", I'll be happy to follow suit. Nicodene (talk) 23:34, 2 June 2024 (UTC)[reply]
Can we please de-escalate here? —Justin (koavf)TCM 23:42, 2 June 2024 (UTC)[reply]
Feel free to start a de-sysop vote for me, but something tells me your idea of what an admin should or shouldn't want or have to do is not the community consensus. Thadh (talk) 18:36, 2 June 2024 (UTC)[reply]
  • If we subtracted all of the statements in this discussion that themselves were about individual persons' values, attitudes, and beliefs, including defensive reactions, we would have a very short discussion indeed. I don't see that most of the discussion here is contributing to the topic-creator's concerns or even to an improvement of that statement of concerns. DCDuring (talk) 12:40, 3 June 2024 (UTC)[reply]
    I completely agree with you. Theknightwho (talk) 13:59, 3 June 2024 (UTC)[reply]

how to identify locations in audio snippets of minority languages?


I am cleaning up the captions of audio snippets, and I've come across an issue that needs discussion. Sometimes if the audio file refers to the location where the language in question is a minority language, the file identifies the location using the minority language's preferred name instead of the common English name (which is usually based on the majority language). Examples:

  • There are 1,179 snippets for Palestinian Arabic as spoken in Lod, Israel, which identify it using the Arabic name al-Lidd.
  • The audio for the Northern Kurdish term emerîkî comes from Van in Turkey but originally identified it using the Kurdish name Wan. (In this case I changed it to Van before the wider issue became apparent.)
  • There are 5-6 Northern Kurdish terms from Diyarbakır that identify the location as Diyarbakir (note the two i's in the spelling), using the Kurdish form of the same name, and one that identifies it as Amed, using the normal Kurdish name. (Note, in this case, the form Diyarbakır is a Turkified name adopted in 1937; the older form in Turkish was Diyarbekir, from Arabic.)

I'm sure there are others, but these are the most politically fraught ones I've come across. The questions are:

  1. Should we use the common name, as Wikipedia does (the above cities are found under Lod, Diyarbakır and Van, Turkey) or defer to the minority language's name?
  2. If we defer to the minority language's name, do we do this only in certain cases (e.g. ones that are politically fraught)? (I bring this up because e.g. Navajo names of places tend to be radically different from the corresponding English ones, cf. Window Rock, Arizona vs. Navajo Tségháhoodzání and I think it would be confusing to use the Navajo names.)
  3. What about accent marks not typically found in the common English name? E.g. there are hundreds of Vietnamese audio snippets that currently use the spellings Hà Nội and Hồ Chí Minh City, which I've changed to Hanoi and Ho Chi Minh City in accordance with the common English names.

Benwing2 (talk) 04:22, 2 June 2024 (UTC)[reply]

This is the sort of thing that AI should be good at doing. —Justin (koavf)TCM 04:29, 2 June 2024 (UTC)[reply]
@Koavf I don't get what you're saying at all. Maybe you're misunderstanding my questions? Benwing2 (talk) 04:49, 2 June 2024 (UTC)[reply]
I can be ignored here. Sorry. —Justin (koavf)TCM 04:58, 2 June 2024 (UTC)[reply]
For Navajo and other Native languages, my gut reaction is: if the entries currently use Navajo names, then either just continue to use the Native name, or list both ("Tségháhoodzání / Window Rock" or vice versa). Perhaps not in that specific case, but in the case of some other Native placenames, the nearest semi-applicable English name may have different scope/boundaries (or it may be unclear where the Native placename was, although this is probably not going to be a problem with audio files), so retaining the Native name seems useful. Slashing both would be a lot to type, but this might be mitigated if the template/module drew on T:a-et-al and so e.g. "Tségháhoodzání", "Window Rock", and optionally some even shorter name like ~"nv-TG", could all be aliases...? Pinging User:Eirikr for your thoughts.
For Palestinian Arabic, renaming cities to Israeli names indeed feels way too loaded, and for my part I would not support it. (If we have audio samples from "Bakhmut, Ukraine", does there come a point at which it's been occupied long enough that we change them to "Artyomovsk, Russia"? Ehhh...) For diacritic differences like the Vietnamese examples, I'd be inclined to use the common English form; that seems like another place where it could be useful if the template/module could know Hà Nội was an alias of Hanoi and display "Hanoi" when given the input "Hà Nội". - -sche (discuss) 06:06, 2 June 2024 (UTC)[reply]
@-sche The template does use {{a}} for this purpose so a lot could be done with aliases, although I'm not sure it would make sense to have slashed names in most circumstances. (The Navajo example I brought up is theoretical in any case; AFAICT none of the Navajo audio files identify any place name at all, although many say "Audio (NV)", which I am tempted to delete because it seems to convey no useful info. Similar issues occur with "Audio (AF)" for Afrikaans, "Audio (CS)" for Czech, "Audio (KN)" for Khiamniungan Naga [KN is the country code for St. Kitts and Nevis, which is nowhere near India :) ...], and "Audio (BCL)" for Bikol Central = lang code bcl.) The issue with Lod, as with all Israeli/Palestinian issues, is very complex and fraught; the reason I brought up this example in particular is that Lod is not internationally considered occupied and AFAICT the term "Lod" does not have the sort of political baggage associated e.g. with terms like Judea and Samaria, so it may not be parallel with the case of Bakhmut or with cities in Gaza and the West Bank, which unquestionably should use Arabic language names. Maybe a more parallel example is Lviv, formerly a Polish city known as Lvov; if we somehow had Polish audio from this city, it might make sense to use a slashed form Lviv/Lvov, and similarly here maybe Lod/al-Lidd? Same thing might apply to Jerusalem/al-Quds? (The status of this city is even more convoluted and intractable but since the common name in English is "Jerusalem" and most readers won't be familiar with "al-Quds", I think it would be confusing to only say "al-Quds".) For that matter, maybe this approach is tenable also for the Northern Kurdish terms I mention above. Benwing2 (talk) 06:39, 2 June 2024 (UTC)[reply]
OK, I seem to have reversed myself from what I said at top. Benwing2 (talk) 06:40, 2 June 2024 (UTC)[reply]
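The alias idea floated above (the template accepting a native-language spelling and displaying an agreed caption) could be sketched roughly as follows. This is only an illustration in Python, not how the actual audio template or its module works: the table contents, the function name `display_location`, and the choice of which display form each key maps to are all hypothetical, and the slashed forms are just one of the options raised in this thread, not a decision.

```python
import unicodedata

# Hypothetical alias table (illustrative only, not Wiktionary's real data).
# Keys are spellings an editor might type; values are the caption to show.
_RAW_ALIASES = {
    "Hà Nội": "Hanoi",                       # diacritics dropped for the common English form
    "Hồ Chí Minh City": "Ho Chi Minh City",
    "al-Lidd": "Lod/al-Lidd",                # slashed form: one option discussed, not settled
    "Lod": "Lod/al-Lidd",
}


def _norm(name: str) -> str:
    """Trim whitespace and normalise to NFC so composed and decomposed input match."""
    return unicodedata.normalize("NFC", name.strip())


ALIASES = {_norm(key): value for key, value in _RAW_ALIASES.items()}


def display_location(name: str) -> str:
    """Return the caption form for a location; unknown names pass through unchanged."""
    key = _norm(name)
    return ALIASES.get(key, key)


if __name__ == "__main__":
    for raw in ("Hà Nội", "al-Lidd", "Moscow"):
        print(raw, "->", display_location(raw))
```

Keeping the mapping in one table means the politically fraught choices (common name, native name, or both) are made once per place and can be changed later without touching individual entries.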
Lvov is the Russian name. The Polish name is Lwów. There is surely some English dialectological study of Palestinian Arabic, where the Jerusalem dialect has some name. If it is called Al-Quds, I will rather go for using Al-Quds, because it is how this dialect is known in the English books about the Palestinian dialects. But nobody's gonna refer to the Moscow dialect of Russian as "Moskva", cause the English books on Russian dialectology are surely using "Moscow" as the name of this dialect. On Diyarbakir, we should check how some English books on Kurdish dialects refer to this dialect. Tollef Salemann (talk) 19:33, 2 June 2024 (UTC)[reply]
@Tollef Salemann Thanks, my mistake. If you know of any books dedicated to Palestinian or Kurdish dialects, feel free to list them. I would guess that the more well-known a place is, the more likely the common name will be used (as you note with Moscow vs. Moskva, etc.). Benwing2 (talk) 21:51, 2 June 2024 (UTC)[reply]
Thanks for the ping, but I'm not sure I have any useful input here. Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 22:08, 5 June 2024 (UTC)[reply]
We should use whatever the literature does, which will probably be the language's own name. Thadh (talk) 07:32, 2 June 2024 (UTC)[reply]
@Thadh I actually suspect it will vary greatly depending on the individual author. It's hard for me to believe there will be any discernible standard here. But I may be wrong. Benwing2 (talk) 08:28, 2 June 2024 (UTC)[reply]
Of course it will vary, but there will probably be an overall tendency to prefer native words over local words or the other way around. Thadh (talk) 08:32, 2 June 2024 (UTC)[reply]
Agree with Thadh. Now we need to find all the English books about Palestinian and Kurdish dialects. Tollef Salemann (talk) 19:36, 2 June 2024 (UTC)[reply]

Dealing with controversial quotes

In a bid to end the discord concerning the addition of quotes that disseminate objectionable political etc. views, I would like to draw everyone’s attention to a recent discussion in which User:Geographyinitiative said he favors adding controversial quotations in the Citations namespace, which he deems a safe haven for such quotes which may not be suitable for adding in the dictionary entry. I, on the other hand, held the opinion that we could consider adding a note of disclaimer stating that Wiktionary does not endorse any of the views expressed in any quotes and they are for educational purposes alone (in this case however there’s the problem of cluttering up the dictionary page, so the note probably could be put on the main page?). Alternatively, as a marriage of the twain ideas, we could as well resort to adding every controversial or inappropriate quote soever in the Citations namespace along with the said note of disclaimer put at the top of the Citations page using a template.

I think any of these ideas will be an attractive option if some people get so triggered by quotes bearing controversial POVs. Just my tuppence, thank you. Inqilābī 22:05, 2 June 2024 (UTC)[reply]

I thank Inqilābī for the above comment, and I will say that I do not anticipate there is any negative outcome from this discussion from my view. I am fine with any note of disclaimer as proposed. Even if every Citations page I have worked on were deleted, I'm still okay. However, one among many uses for the Citations page seems to be to catalogue "fringe" material in a way that people can see it without it being right on the entry. There are other reasons for a Citations page. But I consider it one of the uses. For instance, the users here like to analyze some wild racist words from dangerous evil blogs. That material seems so vile and repulsive to me that no note of disclaimer could fix it. But there should be some venue for the material given the "descriptivist" stance of the dictionary, so Wiktionary "throws it in the hole" (the Citations page) so you can consult that if needed. There are numerous other uses for Citations pages including: a place for inter-sense citations or citations of uncertain sense (the 1966 and 1975 citations for Citations:transgender), a place for re-organizing senses or analyzing contexts, a place for cites of little importance or value for the entry proper, a place for words with only two acceptable cites so far (Citations:intercessionate), a staging area for a potential future entry if conditions permit (Citations:Pinghai), etc etc. The Citations page doesn't have to meet the standards of the entry proper, and stuff is less likely to be deleted there. But I tell you, some of the soul-scarring shit I've seen on the Citations page could NOT be solved by any note of disclaimer. It would HAVE to be deleted from the entry proper, regardless of anything, IMO. The Citations pages create distance between some of the most evil authors' evilest sentences I've ever seen and Wiktionary's entries, while simultaneously remaining true to the purist descriptivist mission.
Wiktionary will not be allowed to exist if it puts those sentences on the entry proper. --Geographyinitiative (talk) 22:18, 2 June 2024 (UTC)[reply]
Thank you for the elaborate reply Geographyinitiative. Just for the record, the main reason I wrote this post is due to disputes involving other editors, and not because of my RFD nomination that day. I would also like to maintain that I do not advocate deleting every Citation page, I understand your reasoning. Now if other editors overwhelmingly agree that such quotes can be thrown and kept secure in the Citations bin, then my suggestion of a disclaimer can be ignored. Inqilābī 22:34, 2 June 2024 (UTC)[reply]
The "citations page containment zone" idea was floated back in 2022 and was not well-received for all its merits. WordyAndNerdy (talk) 23:22, 2 June 2024 (UTC)[reply]
IMO, quotes espousing controversial or bigoted viewpoints should be limited to terms that are themselves associated with such viewpoints. If we stick to this, it shouldn't be necessary to have a cordon sanitaire like putting them in the Citations page, because the terms themselves will normally have (or certainly should have) labels indicating that they are controversial, offensive, etc., which clues the reader into the fact that the quotes (which are hidden by default) may express such viewpoints. Benwing2 (talk) 23:44, 2 June 2024 (UTC)[reply]
Does this include things like using a quote from a racist speech on a word that is related to racism? CitationsFreak (talk) 23:53, 2 June 2024 (UTC)[reply]
I would think so in general. What is the example you're thinking of? What to me isn't appropriate is e.g. User:WordyAndNerdy's example of an incel-type quote added to the word roadworn, since there's nothing about this term that relates specifically to the incel community or any other controversial viewpoints. Benwing2 (talk) 23:59, 2 June 2024 (UTC)[reply]
I do not have a good grasp of policy. I'm just trying to 1) protect Wiktionary while 2) allowing the purist descriptivist mission to flourish. So my view does create a cowardly "semi-censored" and "self-censored" aspect to the project. It's not a good solution. But we exist in a society, and I guarantee Wiktionary could be snapped like a twig if it crossed the wrong lines. One device we can use to assuage people is to say "hey, it's not on the entry". Basically this applies to "fringe" content, so you just have to judge it for yourself. --Geographyinitiative (talk) 00:14, 3 June 2024 (UTC)[reply]

Fundamentally, it boils down to "don't use a controversial quote unless you absolutely have to"

  1. Don't use controversial quotes to talk about editors or about real people
  2. If a word has three non-controversial quotes, use those three

Purplebackpack89 01:00, 3 June 2024 (UTC)[reply]

No, it doesn’t. We do things that we don’t have to because the result comes out optimized or more illustrative, rather than because it is absolutely necessary. I don’t only do things I “need to”; for example, I don’t need creatine monohydrate but probably still benefit from it. And you brush aside the issue of how controversiality is inferred and portrayed; in isolation, the Daily Stormer quote wasn’t the same as the site in general, but someone pushes a stance about the whole resource. Fay Freak (talk) 01:20, 3 June 2024 (UTC)[reply]
Like the lock in {{R:OED Online}} “paid subscription required” (which I wanted elsewhere, for legal databases I quoted from) we could have a symbol and tooltip warning about “low factuality”. As on Ground News but less regularly. Nothing too regular since we generally shan’t consider any sources controversial inasmuch they are used for their language (which in rare cases itself is trolling), we already have a contradiction here and a lot of cognitive capacity is wasted for evaluating sources. “Incel-type quotes”? Am I supposed to waste my energy to say anything about these people?
Yet still Geographyinitiative does not recognize content generated via AI by the Chinese propaganda department snuck in as quotes, about which in some cases I have insider knowledge. Academic databases are littered with automated language, and in the former cases “publishing” takes place via PEMT. I invite everyone to search "gullible Bayes".
At least for random Neo-Nazis we know they are real people putting in the effort, and back then I also reasoned that this human language has durability, since Mr. Anglin has still not been downed from the internet despite all the efforts. AI imitates Mr. Average, and avoids controversial statements, think about it. Fay Freak (talk) 01:20, 3 June 2024 (UTC)[reply]
@Fay Freak: AI as a technology is perfectly capable of generating hate speech and Nazi propaganda [5][6]. It's just that the big players in the AI industry are making efforts to suppress this in their own products. But the technology can't be stopped and it is available to anyone. There may already be a lot of AI-generated Neo-Nazi content on the net. So I wouldn't just blindly assume that all Neo-Nazi content is human-generated and thus has some kind of linguistic relevance. --Ssvb (talk) 16:04, 3 June 2024 (UTC)[reply]
@Ssvb: I don’t blindly assume it, but there are a number of reasons against the existence of formally plausible versions of it, apart from the circumstance that I have not stumbled upon it despite searches of the most heterodox things and following the upcoming trends in politics, which are warily tracked by hostile journalism more than anything if coming from this end. AI-generated article images of white families or the like appear, but we mean the texts. Currently everyone suppresses it, the hard cores of Neo-Nazis are too dumb or ideologically averse for targeted computer-generated content, and manual labour is too cheap and worth it for them: Like Kremlebots are real people sitting at a known address in Saint Petersburg. And it does not work: As the neural networks are trained on some old averages, even if it be biased content, and then have so-called model decay, they don’t hit humans where it hurts, they would have to have intricate understanding of current connotations of ideological concepts in order to reframe personal identities of people. You don’t change people’s worldviews with AI, though you can promote specific assumptions.
It’s a general problem in education, too. AI programs very much but teaches programming very little and human teachers will always exist and be preferred by totalitarian systems as well, and our dictionary be human-made because we explain politics, philosophy, psychology etc. Fay Freak (talk) 16:45, 3 June 2024 (UTC)[reply]
@Fay Freak: "the hard cores of Neo-Nazis are too dumb" - this is a very questionable claim and I wouldn't count on that. Additionally, dumb people tend to make grammar and spelling mistakes, so this reduces the value of their content for Wiktionary. And some of them are not even native speakers. For example, I wouldn't consider Anders Breivik's manifesto to be a valuable example of written English. --Ssvb (talk) 19:07, 3 June 2024 (UTC)[reply]

My approach, as I said in the 2022 discussion linked above, is: "we can (and do) already move un-illustrative, including unnecessarily offensive, quotes to Citations: pages if they're needed for WT:ATTEST. (If they're not, like someone is adding racist screeds as cites of and, just replace them with normal cites and block the user if needed.) This does lack a reader-facing warning [...] but eh, that probably reduces the amount of bad-faith or even good-faith debates over whether a quote is "really offensive" that a content warning would attract." We already see trolling about "they're not white supremacists, they're white racialists / race realists" etc etc: any "this quote is offensive"/"we don't agree with this quote" notice would just be a magnet for endless disputes. And do we apply "this quote doesn't represent our views" to quotes that express e.g. old or modern flat-earth or geocentric views, i.e. views that aren't really offensive but which nonetheless aren't Wiktionary's views? It's a morass we needn't create. Indeed, I'm not sure there's actually a problem here in the first place? AFAICT what I outline is what is broadly already done; is anyone actually going around and adding citations of Mein Kampf to und and der (and not immediately being reverted), is there an actual issue happening...? - -sche (discuss) 01:26, 3 June 2024 (UTC)[reply]

Maybe the age of a quotation also plays a big role? Being old gives it at least a historical value. So the ancient "flat-earth" theories are okay, but modern "flat-earth" theories - not so much. The former are likely to be honest mistakes, while the latter are likely to be the work of nutcases. Also, if the readers see that a quotation is older than maybe 1950, then they can figure out for themselves that it's unlikely to present relevant, up-to-date scientific information even without any extra disclaimers. For example, I added this quotation recently, which states something that is possibly not true nowadays (and was possibly even debatable back in 1916). But does anyone really care? --Ssvb (talk) 18:49, 3 June 2024 (UTC)[reply]
I pretty much agree with -sche here. I am not sure if this problem really merits a whole policy to tackle it, it's really a problem of common sense.
If an offensive quote does not add lexicographical value compared to a non-offensive quote, don't use it, or feel free to replace it with a more neutral quote (even if only because it is a waste of everyone's energy building this communal project to be bogged down in disputes over offensive quotes, or what constitutes offensiveness).
If it does add value (such as in illustrating firsthand the usage of offensive words, or of offensive senses of otherwise unoffensive words) or there are no good unoffensive candidate citations available in durably archived sources, feel free to use an offensive quote within the limits of reason. The guidelines that apply at WT:USEX ("Be friendly", particularly) already codify this for usage examples, fwiw. If we really want, we could expand WT:Quotations#Choosing quotations with a few (permissive) lines to the same effect, I wouldn't be opposed to that.
(I would on the other hand be opposed to disclaimers in mainspace indicating that a quote may be considered offensive, and I do not think that quarantining potentially offensive quotes in the Citations namespace is necessary as long as the principle of least offensiveness is followed wherever offensive quotes do not add any lexicographical value.) — Mnemosientje (t · c) 14:43, 5 June 2024 (UTC)[reply]

Announcing the first Universal Code of Conduct Coordinating Committee

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello,

The scrutineers have finished reviewing the vote results. We are following up with the results of the first Universal Code of Conduct Coordinating Committee (U4C) election.

We are pleased to announce the following individuals as regional members of the U4C, who will fulfill a two-year term:

  • North America (USA and Canada)
  • Northern and Western Europe
  • Latin America and Caribbean
  • Central and East Europe (CEE)
  • Sub-Saharan Africa
  • Middle East and North Africa
  • East, South East Asia and Pacific (ESEAP)
  • South Asia

The following individuals are elected to be community-at-large members of the U4C, fulfilling a one-year term:

Thank you again to everyone who participated in this process and much appreciation to the candidates for your leadership and dedication to the Wikimedia movement and community.

Over the next few weeks, the U4C will begin meeting and planning the 2024-25 year in supporting the implementation and review of the UCoC and Enforcement Guidelines. Follow their work on Meta-wiki.

On behalf of the UCoC project team,

RamzyM (WMF) 08:15, 3 June 2024 (UTC)[reply]

"ux" template

I now religiously (well, most times) use the "ux" template for usage examples, since it is what I see others have done, but since this is no easier (in fact actually more to type) than not using it, I wonder whether anyone could explain what the actual advantage is, if any? Mihia (talk) 17:41, 3 June 2024 (UTC)[reply]

As opposed to plain wikitext? Category. Same with {{co}}. Vininn126 (talk) 17:46, 3 June 2024 (UTC)[reply]
By "category", do you mean that it puts the article in the category "English terms with usage examples"? Not that I am really complaining about typing a couple more characters to use "ux", it's not a big deal, but out of curiosity I wonder what use to anyone or anything is such a category? (A category for articles without usage examples I could understand.*) Mihia (talk) 17:57, 3 June 2024 (UTC) -- (* or, actually, a category for definitions without usage examples would be more useful, since an entry could have ten definitions, only one of which had a usage example, yet still, as far as I gather, show up in "terms with usage examples")[reply]
A big underappreciated advantage is that the "ux" and "quote-book" templates are machine readable. This makes various kinds of automatic processing easy. Yes, it's possible to find terms with missing usage examples if you are interested in that. --Ssvb (talk) 18:06, 3 June 2024 (UTC)[reply]
I use it quite often to see what entries still need a usex in the languages I edit. I think others do, too. It's similar to the "English terms with quotations". Thadh (talk) 18:22, 3 June 2024 (UTC)[reply]
How do you use category "terms with usage examples" to find entries that don't have usage examples? Mihia (talk) 18:33, 3 June 2024 (UTC)[reply]
I compare it to the other category. Thadh (talk) 20:55, 3 June 2024 (UTC)[reply]
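The comparison described above amounts to a set difference between two category listings. A minimal sketch follows, assuming the member titles of both categories have already been fetched into plain lists; the function name and the sample titles are hypothetical, and "the other category" is assumed here to be a listing of all lemmas in the language.

```python
def entries_missing_usex(all_lemmas, with_usex):
    """Return the titles that are in the lemma listing but not in the
    'terms with usage examples' listing, sorted for stable output."""
    return sorted(set(all_lemmas) - set(with_usex))


# Hypothetical category contents, for illustration only.
lemmas = ["apple", "book", "cat", "dog"]
terms_with_usex = ["book", "dog"]

missing = entries_missing_usex(lemmas, terms_with_usex)
```

As noted earlier in the thread, this works per entry, not per definition: an entry with ten senses and a single usage example would not show up as missing.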
For non-English languages, {{ux}} is required for the text to be tagged in the correct language for e.g. screen readers or other automated software. — SURJECTION / T / C / L / 19:15, 3 June 2024 (UTC)[reply]
Not to mention script and font. Thadh (talk) 20:56, 3 June 2024 (UTC)[reply]

I wasn't quite aware of the intended scope of {{etymon}}. Apparently it's to be an all-in-one etymology template, subsuming the functions of {{affix}}, {{inherited}}, {{etymid}}, etc.

Its current syntax strikes me as more than a bit unintuitive, and I'd like to propose a somewhat more user-friendly way of going about it:

For categorization purposes, the default assumptions would be as follows.

  • If all the language codes match (i.e. it's a language-internal formation): compounding, suffixation, prefixation, or confixation. That can be automatically determined by hyphens: yass + -ify is suffixation, neuro- + -genic is confixation, etc. Other types of derivation can be specified with an additional parameter like |blend=1 or |deverbal=1.
  • If the language codes do not all match: "English terms derived from Dutch", etc. For mixed cases like the aforementioned монтировать, nonsensical categories like "Russian terms derived from Russian" would of course be disabled. More specific types of relation can be expressed with an additional parameter like |bor=1, |inh=1, |calque=1, |conflation=1, and so on.
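
The two default rules above could be sketched roughly as follows. This is only an illustration in Python, not how any Wiktionary module works: the function names, the small language-name table, the bare labels returned for language-internal formations, and the "affixation" fallback for mixed cases are all assumptions made for the example.

```python
# Hypothetical lookup table; a real implementation would use the full language data.
LANGUAGE_NAMES = {"en": "English", "nl": "Dutch", "ru": "Russian", "de": "German"}


def classify_formation(parts):
    """Guess the formation type from where the hyphens sit in each part."""
    prefixes = [p for p in parts if p.endswith("-") and not p.startswith("-")]
    suffixes = [p for p in parts if p.startswith("-") and not p.endswith("-")]
    bases = [p for p in parts if p not in prefixes and p not in suffixes]
    if prefixes and suffixes and not bases:
        return "confixation"
    if suffixes and not prefixes:
        return "suffixation"
    if prefixes and not suffixes:
        return "prefixation"
    if not prefixes and not suffixes:
        return "compounding"
    return "affixation"  # assumed fallback for prefix + base + suffix


def default_categories(term_lang, etyma):
    """etyma is a list of (language code, term) pairs."""
    if all(lang == term_lang for lang, _ in etyma):
        # Language-internal formation: decide by hyphens.
        return [classify_formation([term for _, term in etyma])]
    categories = []
    for lang, _ in etyma:
        # Skip nonsensical results such as "Russian terms derived from Russian".
        if lang == term_lang:
            continue
        name = f"{LANGUAGE_NAMES[term_lang]} terms derived from {LANGUAGE_NAMES[lang]}"
        if name not in categories:
            categories.append(name)
    return categories
```

One limitation of a purely hyphen-based guess: a root cited with a trailing hyphen, such as a reconstructed stem, would be counted as a prefix unless an explicit parameter overrides the default.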

This strikes me as a reasonably straightforward way to handle things.

Thoughts, objections, or alternative suggestions?

Paging @Ioaxxere as the person who made the template and @Vininn126, @Rex Aurorum, @Qwertygiy, @Akaibu, @Biolongvistul, @Protegmatic as people who have used it. Nicodene (talk) 21:48, 3 June 2024 (UTC)[reply]

Condensing the language and the ID parameters is very agreeable. As for the reshuffling in the etymon slots, it disrupts the ascending hierarchy of specificity and would not prove any easier to internalise to me.
The semantic austerity of the af keyword is, I dare to assure, a temporary solution. We don’t even have categorisation implemented yet. ―⁠Biolongvistul (talk) 22:19, 3 June 2024 (UTC)[reply]
Could you explain what you mean by ‘ascending hierarchy of specificity’? Nicodene (talk) 23:25, 3 June 2024 (UTC)[reply]
Broadest first, most specific last, as in taxonomy for species. I believe it's mostly a happy coincidence that it's implied with the current syntax using the "greater than" symbol. Language > term > sense.
The rest of the proposition I don't believe I quite understand. Syntax like "bor|fr>unité>to unite|af|en>-ed>past participle" for "Borrowed from French unité (to unite) and suffixed with -ed (past participle)" feels intuitive enough to me. Qwertygiy (talk) 23:41, 3 June 2024 (UTC)[reply]
@Ioaxxere I have been meaning to respond to another thread about adding manual transliteration into {{etymon}}. The obvious way to do that is through inline modifiers; in that respect, the choice of > as a separator is singularly unfortunate as it prevents use of inline modifiers with the normal <...> syntax. I would recommend changing this to something else; for example, the {{given name}} template uses < to indicate inheritance, but requires that spaces be put around the < sign, which allows concurrent use with inline modifiers. You could also use ^, @, etc. Benwing2 (talk) 00:11, 4 June 2024 (UTC)[reply]
BTW if you need help changing this, I can do this easily by bot. Benwing2 (talk) 00:12, 4 June 2024 (UTC)[reply]
@Benwing2 I don't think it does prevent the use of < >, as it's not actually ambiguous, but I could see it being confusing (though no more than template syntax). Theknightwho (talk) 00:37, 4 June 2024 (UTC)[reply]
I suppose you may be right, I need to think if there are any edge cases that will be problematic, although without spaces it will be very hard to read, e.g. фоо<tr:foo>>бар<tr:bar>>баз<tr:baz> is well-nigh unreadable. Benwing2 (talk) 00:43, 4 June 2024 (UTC)[reply]
@Benwing2 It's not great, I agree. My suggestion is foo:bar<id:baz>, which probably maximises consistency with other templates. Theknightwho (talk) 00:46, 4 June 2024 (UTC)[reply]
@Theknightwho I agree with this. Benwing2 (talk) 00:49, 4 June 2024 (UTC)[reply]
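The point that > as a separator need not be ambiguous alongside <...> inline modifiers can be illustrated with a small sketch: treat a > as closing a modifier whenever a < is still open, and as a separator otherwise. This is an assumption about how such parsing could be done, written in Python for illustration; it is not taken from the actual module, and the function name is hypothetical.

```python
def split_top_level(spec):
    """Split on '>' separators, leaving '<...>' inline modifiers intact."""
    segments, current, depth = [], [], 0
    for ch in spec:
        if ch == "<":
            depth += 1           # an inline modifier opens
            current.append(ch)
        elif ch == ">" and depth > 0:
            depth -= 1           # this '>' closes the open modifier
            current.append(ch)
        elif ch == ">":
            segments.append("".join(current))  # top-level separator
            current = []
        else:
            current.append(ch)
    segments.append("".join(current))
    return segments


parts = split_top_level("фоо<tr:foo>>бар<tr:bar>>баз<tr:baz>")
```

A machine can separate the segments this way, but the readability problem raised above remains for human editors.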
I tried it that way to have the same adjacent order of language and ID, as in {{ety|en:ID|charitee:enm:ID}} "From Middle English charitee". But I don't have any issue with {{ety|en:ID|enm:charitee:ID}}.
As for the use of ">", in addition to the issue that Benwing mentions, I found it unintuitive. The code on state for example currently contains "enm>stat>condition". Reading this according to the standard meaning of ">" in linguistics results in "condition is from stat, which is from enm".
As for united, as it happens I don't agree with the given etymology, since French unité is a noun meaning "unity", not a past participle comparable to united. The latter is just unite + -ed. But if I were to agree with the given etymology, my proposal would result in {{ety|en:ID|fr:unité:ID|en:-ed:ID}} "From French unité + English -ed." Which seems a good deal simpler. Nicodene (talk) 00:50, 4 June 2024 (UTC)[reply]
There are a lot of suggestions in here so I'll just dump a few opinions:
  • Neutral on changing the etymon parameter format. However, I oppose any scheme where > is used both as a separator and for inline modifiers for the reasons pointed out by Benwing. Out of the options discussed here I would take foo:bar<id:baz> (I assume foo is the language code).
  • Weak oppose having |1 be in the format lang:ID as I find this very unintuitive, although it does admittedly save keystrokes.
  • Oppose changing anything about the keyword parameters for now until the requirements are more established. I feel like @Nicodene is putting the cart before the horse in discussing categorization when it's not even clear how this should work. In particular, I'd like to eventually deprecate the existing "X terms derived from Y" system in favour of something more fine-grained (although this will be tough to implement in the short term).
Ioaxxere (talk) 04:15, 4 June 2024 (UTC)[reply]
In something like foo:bar, foo should definitely be the lang code, otherwise it will be too confusing. In foo:bar:baz:bat, I would assume foo is a lang code and the others are terms. If the lang code is optional, we'll need a different separator for the terms. Benwing2 (talk) 04:28, 4 June 2024 (UTC)[reply]
@Benwing2: Currently, with the > separator, the lang code is optional. Hence you can do something like {{etymon|ine-pro|id=father|af|unc|*peh₂->protect|*-tḗr>agent noun}} (the ine-pro> part is implied). Part of the reason I like the current system is that it's optimized for keystrokes, e.g. *peh₂->protect has 14 characters, whereas ine-pro:*peh₂-<id:protect> has 26 characters. But I think that it should be possible in the new system to omit the lang code in the same manner as long as : characters are escaped everywhere else. Ioaxxere (talk) 04:42, 4 June 2024 (UTC)[reply]
@Ioaxxere I am not saying you need to use inline modifiers for things like IDs that occur frequently. You will find, for example, in {{it-conj}} that there are various delimiters used, e.g. {{it-conj}} for riempire might look like {{it-conj|a/riémpio,riempìi,riempìto:riempiùto}}; here the a/ at the beginning indicates the auxiliary verb avere; following are three principal parts, comma-separated, and alternatives for principal parts are colon separated. Some verbs need four principal parts and use ^ to separate the fourth principal part, e.g. venire, whose full spec looks like {{it-conj|e/vèngo^viène:viéne,vénni:vènni,venùto.fut:verrò.presp:veniènte}}. To help unpack this, the format for principal parts is PRES1S,PHIS1S,PP in most verbs (specifying the 1sg pres indic, the 1sg past historic, and the past participle), but PRES1S^PRES3S,PHIS1S,PP in verbs where the 3sg pres indic is also irreg. In addition, . separates distinct specs, where the main principal parts are collectively a single spec, and fut:verrò is another spec indicating the future principal part, and presp:veniènte is yet another spec indicating the present participle. I could have used the format of fut:verrò for all principal parts, which would look like {{it-conj|e/pres:vèngo.pres3s:viène:viéne.phis:vénni:vènni.pp:venùto.fut:verrò.presp:veniènte}} (BTW you can put spaces and newlines next to any delimiter to make it easier to read), but that's a lot more keystrokes. Benwing2 (talk) 05:07, 4 June 2024 (UTC)[reply]
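The layered delimiters described above (/ after the auxiliary, . between specs, , between principal parts, ^ before the 3sg present, : between alternatives) can be unpacked with a short sketch. This is a simplified Python illustration based only on the description in this comment; it is not the real {{it-conj}} implementation, the function name and the output keys are hypothetical, and it does not handle the optional whitespace mentioned above.

```python
def parse_it_conj(spec):
    """Unpack a compact spec such as 'a/riémpio,riempìi,riempìto:riempiùto'."""
    def alternatives(text):
        # Colon-separated alternatives; an absent part yields an empty list.
        return text.split(":") if text else []

    aux, _, rest = spec.partition("/")       # auxiliary comes before '/'
    main, *extras = rest.split(".")          # '.' separates distinct specs
    pres, phis, pp = main.split(",")         # three comma-separated parts
    pres1s, _, pres3s = pres.partition("^")  # optional 3sg present after '^'

    result = {
        "aux": aux,
        "pres1s": alternatives(pres1s),
        "pres3s": alternatives(pres3s),
        "phis1s": alternatives(phis),
        "pp": alternatives(pp),
    }
    for extra in extras:                     # e.g. 'fut:verrò'
        key, *values = extra.split(":")
        result[key] = values
    return result


venire = parse_it_conj("e/vèngo^viène:viéne,vénni:vènni,venùto.fut:verrò.presp:veniènte")
riempire = parse_it_conj("a/riémpio,riempìi,riempìto:riempiùto")
```

The design trade-off is the one made in the comment: positional parts keep the common case short, while keyed specs such as fut: are reserved for the rarer irregular forms.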
Is handling language and ID the same way throughout, as in
{{ety|en:polity|stat:enm:condition|inh=1}}
less intuitive than handling them in different ways like this?
{{etymon|en|id=polity|inh|enm>stat>condition|tree=1}} ed: nevermind; see below
I wasn't aware you're considering getting rid of "X terms derived from Y" categories. Is the problem the name (as it happens I'd been thinking of suggesting "X terms of Y origin") or is it the problem that such categories exist at all? Nicodene (talk) 04:58, 4 June 2024 (UTC)[reply]
@Biolongvistul, Qwertygiy, Ioaxxere, Theknightwho, Benwing2:
Adjusting for your comments, we get something like:
{{ety|en<id:X>|en:clever<id:Y>|en:-ly<id:Z>}} "From clever + -ly".
Does that syntax satisfy everyone?
If so perhaps we can get to discussing Ioaxxere's proposed changes to categories. Nicodene (talk) 09:16, 4 June 2024 (UTC)[reply]
I like this, to be honest. Vininn126 (talk) 09:18, 4 June 2024 (UTC)[reply]
I'd prefer something like
{{ety|en|clever#Y|-ly#Z}}
That way you minimize typing. Benwing2 (talk) 09:21, 4 June 2024 (UTC)[reply]
Happy to go for #X instead of <id:X> if people like it.
It looks like you favour setting the default assumption for language codes to “same as the first one mentioned, unless otherwise specified”? So in this case, given the {{ety|en…}}, the following clever and -ly are assumed to be English.
I suppose in that case the syntax for Russian монтировать (montirovatʹ) would read {{ety|ru#X|de:montieren#Y|-овать#Z}} “From German montieren + Russian -овать (-ovatʹ)” or similar. Nicodene (talk) 10:47, 4 June 2024 (UTC)[reply]
This raises the issue of adapted borrowings anyway. I suppose for the tree you'd have a fork either way, but the question is whether to print "bor" in the tree or not. I have a slight preference for <id:X>. Vininn126 (talk) 10:51, 4 June 2024 (UTC)[reply]
Can we move forward with one of these syntaxes? Vininn126 (talk) 09:47, 6 June 2024 (UTC)[reply]
@Vininn126 languagecode:lemma<id:X> appears to be the most accepted. Perhaps space-saving features can be added down the line, like the aforementioned #ID or having language codes default to the first one mentioned.
@Ioaxxere wants to make major changes to the category system. From what I gather we’ve a long ways to go before reaching that: we’ve yet to hash out any details, and then there’s community consensus to reckon with.
On the other hand we have, if I’m not mistaken, agreed on a new syntax for {{etymon}}. So I also think we might as well implement it now, unless someone has further modifications to suggest. It shouldn’t make adapting to future category changes any easier or more difficult than it would be currently.
I’ve volunteered to manually clean up the existing transclusions of {{etymon}} and update the documentation. Nicodene (talk) 10:22, 6 June 2024 (UTC)[reply]
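For reference, the languagecode:lemma<id:X> form discussed above, together with the suggested default of falling back to the first language code, might be parsed along these lines. This is a hedged Python sketch, not the template's actual code: the function name and output shape are hypothetical, and it assumes that lemmas contain no unescaped : or < characters, as was noted earlier in the thread.

```python
import re

# First argument: language code with an optional <id:...> modifier.
HEAD = re.compile(r"^(?P<lang>[^<:]+)(?:<id:(?P<id>[^>]*)>)?$")
# Later arguments: optional 'lang:' prefix, the lemma, optional <id:...>.
ETYMON = re.compile(r"^(?:(?P<lang>[^<:]+):)?(?P<term>[^<:]+)(?:<id:(?P<id>[^>]*)>)?$")


def parse_ety(args):
    """Parse e.g. ['en<id:X>', 'en:clever<id:Y>', 'en:-ly<id:Z>']."""
    head = HEAD.match(args[0])
    if head is None:
        raise ValueError(f"bad first argument: {args[0]!r}")
    etyma = []
    for arg in args[1:]:
        m = ETYMON.match(arg)
        if m is None:
            raise ValueError(f"bad etymon argument: {arg!r}")
        # A missing language code defaults to the entry's own language.
        etyma.append((m["lang"] or head["lang"], m["term"], m["id"]))
    return {"lang": head["lang"], "id": head["id"], "etyma": etyma}


cleverly = parse_ety(["en<id:X>", "en:clever<id:Y>", "en:-ly<id:Z>"])
montirovat = parse_ety(["ru", "de:montieren", "-овать"])
```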
Yes, I think adding categories would be great; I also don't think it's necessary for updating the syntax? I could be wrong. If not, then I think we can move forward. Vininn126 (talk) 10:25, 6 June 2024 (UTC)[reply]
I have no strong feelings about the exact markup, I can adjust. Vininn126 (talk) 08:47, 4 June 2024 (UTC)[reply]
It was suggested that I bring up the fact that I've been setting the trees below the etymology as opposed to the "current practice" of putting them above, as to me, the trees are not the focus of the etymology section, or at least they shouldn't be considered as such, as your average Joe will probably not care that creepypasta's lineage contains the doublets pasta and paste, they'll just be interested that it came from the /x/ board. Akaibu (talk) 06:31, 5 June 2024 (UTC)[reply]
Personally I'd prefer them above. Vininn126 (talk) 06:33, 5 June 2024 (UTC)[reply]
I prefer above, it just looks much better. Plus it's collapsed by default, so I definitely think people will notice the etymology first. — SAMEER (؂؄؏) 07:31, 5 June 2024 (UTC)[reply]
@Babr re diff: not currently. But you may be interested in this discussion. Ioaxxere (talk) 20:21, 15 June 2024 (UTC)[reply]

Rethinking confidence parameters

Currently, to indicate uncertainty, you might do something like {{etymon|ine-pro|id=father|af|unc|*peh₂->protect|*-tḗr>agent noun}}. As pointed out by @Fenakhay, this is a bit unintuitive due to the fact that there are two "layers" of keywords present (both etymons are associated with both af and unc). As an alternative, I support being able to write {{etymon|ine-pro|id=father|af|*peh₂->protect?|*-tḗr>agent noun?}}. This is intuitive and also saves two characters. We would just have to make sure that there are no IDs ending in a question mark.

Also, I'm personally not a fan of using # to show IDs, since it could be confused with the actual fragment. In Benwing's example, {{ety|en|clever#Y|-ly#Z}} would link to clever#English:_Y. Ioaxxere (talk) 19:45, 4 June 2024 (UTC)[reply]
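
The trailing ? proposal above could be handled as in the following sketch, which uses the current > separator with an optional leading language code. It is a Python illustration under stated assumptions, not the module's code: the function name and the returned keys are hypothetical, and it relies on the condition stated above that no ID ends in a question mark.

```python
def parse_etymon_arg(arg, default_lang):
    """Parse 'lang>term>id', 'term>id' or 'term', with an optional trailing '?'."""
    uncertain = arg.endswith("?")
    if uncertain:
        arg = arg[:-1]  # strip the per-etymon uncertainty marker
    parts = arg.split(">")
    if len(parts) == 3:
        lang, term, ident = parts
    elif len(parts) == 2:
        lang = default_lang  # the 'lang>' part is implied
        term, ident = parts
    elif len(parts) == 1:
        lang, term, ident = default_lang, parts[0], None
    else:
        raise ValueError(f"too many '>' separators: {arg!r}")
    return {"lang": lang, "term": term, "id": ident, "uncertain": uncertain}


root = parse_etymon_arg("*peh₂->protect?", "ine-pro")
suffix = parse_etymon_arg("*-tḗr>agent noun", "ine-pro")
```

Because the marker is attached to each etymon, one part of an affixation can be flagged as uncertain while the other is not, which the shared unc keyword cannot express.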

If you like <id:X>, perhaps another inline modifier like <unc:1>? Nicodene (talk) 21:24, 4 June 2024 (UTC)[reply]
I think using ? to indicate uncertainty is fine. I'm not sure about what > and -> mean here; I need to read the docs, but they maybe could be replaced with something more intuitive. Benwing2 (talk) 04:36, 5 June 2024 (UTC)[reply]
> precedes an ID, and the hyphen is just part of the PIE lemma *peh₂-. Nicodene (talk) 04:41, 5 June 2024 (UTC)[reply]
I see. In that case maybe use @ or ^ to separate the ID from the lemma. Benwing2 (talk) 04:51, 5 June 2024 (UTC)[reply]

Classical Attic audio files

Umm ... I have come across several of these. Do we really want them? E.g. λέγω, where on top of everything else, the pronunciation is completely wrong; the speaker says /leːɡuː/ when the reconstructed pronunciation should be /lɛɡɔː/. Some others (which I have not checked yet): καί, , ψυχή, φύσις, αὐτός, εἰμί, χείρ, οὗτος, χθών, τίς, φθόγγος. Benwing2 (talk) 00:48, 4 June 2024 (UTC)[reply]

I believe there was consensus to remove the audio files for Classical Latin, so this should be no different. Andrew Sheedy (talk) 01:10, 4 June 2024 (UTC)[reply]
I don't want them either personally. At the very least they should be labelled with a disclaimer like ‘modern attempt to approximate Attic’ to convey some idea of the uncertainties involved in attempting a phonetic rendition of a pronunciation predating Christ. Nicodene (talk) 01:23, 4 June 2024 (UTC)[reply]
The ones you have not checked, I am surprised how well they match. Such small details could make readers fond, in their grim and despondent struggles to master Greek. Can’t withsay them in the interest of the art and science. Fay Freak (talk) 01:42, 4 June 2024 (UTC)[reply]
Audio for reconstructed pronunciations is extremely unscholarly. It's practically conlanging. Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]

Use of etymology trees made with Template:etymon in the entries for multi-word terms

Hello, following the passage of Wiktionary:Votes/2024-04/Allowing etymology trees on entries last week, etymology trees generated by {{etymon}} have been added to a number of entries. Earlier today, there was some discussion on the Discord server about the inclusion of etymology trees in the "Etymology" sections of multi-word entries like United States of America (added here) and Abkhaz Autonomous Soviet Socialist Republic (not added as of writing). Some supported etymology trees on such entries while others opposed their inclusion. The discussion became detailed enough, and drew enough attention, that I've decided to move it here, on-site, so that it is more "official" and can have more organization and visibility. Pinging those who expressed views on Discord: @Qwertygiy, Vininn126, Lattermint, Ioaxxere, Akaibu, Soap, Saph668, AG202, Theknightwho. —The Editor's Apprentice (talk) 02:08, 4 June 2024 (UTC)[reply]

Replying to say that I don't think it's best to have etymology trees on multiword terms like United States of America. It starts to get unwieldy, and while it looks "cool", we should be aiming for information presented in a concise and helpful way, not the pseudo-gamification that I've started to see. AG202 (talk) 02:15, 4 June 2024 (UTC)[reply]
Completely agreed. In general the etymology of a multiword term should indicate the way the term was constructed in the same language, and that's it, unless the term was calqued from some other language. Benwing2 (talk) 03:20, 4 June 2024 (UTC)[reply]
In the same vein, the discussion around adding a tree to Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch on Discord shows me the gamification that I'm talking about. Even after it was pointed out to them that they shouldn't work with languages that they don't know, the tree was still added. I assume because it's a long word and they explicitly stated that they couldn't edit pneumonoultramicroscopicsilicovolcanoconiosis (locked to auto-patrollers and up). I'd also like to remind editors of the statement from the vote:

This vote does not:

Allow or encourage editors to mass-add etymology trees across the site. As stated above, each language community will decide if or when they are appropriate.

AG202 (talk) 00:37, 5 June 2024 (UTC)[reply]
Weak support etymology trees on multi-word terms. I don't see the harm considering they're collapsed and don't take a lot of effort to create. However, I admit that the tree on United States of America is virtually unusable simply due to how wide it is. I think the best course of action is to have trees of a certain width display in a horizontal format as seen in Wiktionary:Beer parlour/2024/May#Descendant tree design. Ioaxxere (talk) 04:22, 4 June 2024 (UTC)[reply]
@AG202: I would like some clarity on what you're actually aiming for. Are you saying that no etymology tree should be added to terms with a space? What about a term like chow mein, which was directly borrowed from a single word? Ioaxxere (talk) 04:28, 7 June 2024 (UTC)[reply]
@Ioaxxere: No, I said words like United States of America, where it’d be a clear SOP term if not for the fact that it’s a proper noun. When we start debating whether or not to add the tree for of in a multiword term, it’s getting out of hand. AG202 (talk) 04:41, 7 June 2024 (UTC)[reply]
Oppose per AG202. —Caoimhin ceallach (talk) 00:36, 7 June 2024 (UTC)[reply]
Oppose per AG202. DCDuring (talk) 15:02, 8 June 2024 (UTC)[reply]
As much as I support the template in general, Oppose the generation of trees on multiword entries. Of course having it for an ID and such is still useful. Vininn126 (talk) 15:09, 8 June 2024 (UTC)[reply]
Oppose per AG202. — Fenakhay (حيطي · مساهماتي) 17:01, 8 June 2024 (UTC)[reply]
@AG202, Caoimhin ceallach, DCDuring, Vininn126, Fenakhay: I've tweaked the CSS so that the tree on United States of America is less "unwieldy". Does this have an impact on your opinion? Ioaxxere (talk) 19:23, 4 July 2024 (UTC)[reply]
Mildly, I'm still not sure open compounds are the best candidate. Vininn126 (talk) 19:25, 4 July 2024 (UTC)[reply]
@Ioaxxere What were the tweaks: was it adjusting the horizontal vs. vertical space? The tree now shown at united doesn't look optimized in this regard to me: viewing it on a computer screen, there's a lot of wasted horizontal space and there's unnecessary and awkward hyphenation of language names on the rightmost branch, like "Proto-Ger- manic" and "Old Eng- lish". If there's some way for the tree to adjust a bit based on what space is available, that seems like it would be good.--Urszag (talk) 19:57, 4 July 2024 (UTC)[reply]
@Urszag: Language names shouldn't be getting cut up like that—what browser are you on? But I've changed it so united still uses the wide format on large screens. Ioaxxere (talk) 20:13, 4 July 2024 (UTC)[reply]
This is on Safari. I still see those breaks in the language names on [7], although it's not as egregious since they aren't at the right side of the tree anymore. I don't see them on Firefox or Chrome, so it does vary by browser apparently.--Urszag (talk) 20:19, 4 July 2024 (UTC)[reply]
No. The default should be no etymology-tree display for open compounds. The few warranted exceptions would be marked by tree=1 provided no trace whatsoever appeared in the default. If some etymology fans wanted custom CSS to display hidden-by-default, non-performance-impairing etymology trees, so be it. DCDuring (talk) 20:22, 4 July 2024 (UTC)[reply]
While they don't particularly interest me, I have to say that I'm baffled at this kind of staunch opposition given that it's automatically folded into a dropdown. If you don't care about it, can't you just ignore them? Theknightwho (talk) 20:26, 4 July 2024 (UTC)[reply]
I'm appalled at the single-minded obsession with creating complete sets of things, whether they add value to most or not. I don't like the waste of dropdown bars or the distraction of the dropdown-opening tool. DCDuring (talk) 02:33, 5 July 2024 (UTC)[reply]
@DCDuring There are no "obsessions" here. I wish you'd stop misrepresenting people you disagree with as caricatures. Theknightwho (talk) 23:37, 5 July 2024 (UTC)[reply]
Honestly I find the vertical bars worse. I really just do not think that there's any significant benefit for having trees on multiword terms like this. And clearly consensus agrees with that right now. AG202 (talk) 07:57, 5 July 2024 (UTC)[reply]
Yes indeed, I agree. Benwing2 (talk) 08:20, 5 July 2024 (UTC)[reply]
I agree with AG202. I would like to expand the tree ban to compounds whose etymology is identical to their surface etymology, in other words recent compounds, e.g. homework. It's cool that you can make trees do all of this, but that isn't a good reason to include them. I think the standard should be that they really add something, i.e. display something notable that you can't easily get from a text etymology. —Caoimhin ceallach (talk) 13:09, 7 July 2024 (UTC)[reply]
I can't really agree with this; I don't think the presence of the tree is harmful in any way, and banning it on all but root words (which is what this implies) wouldn't achieve anything, in my view. Theknightwho (talk) 16:24, 7 July 2024 (UTC)[reply]
agreed here Vininn126 (talk) 17:18, 7 July 2024 (UTC)[reply]
I think similar principles to what I said here about the {{root}} template can be applied to etymology trees:

This user is making an awful lot of noise for very little signal, and judging by their mainspace-to-talkpages edit ratio, they don't seem particularly interested in actually building a dictionary.

Purplebackpack will probably argue that they're not making as many mainspace edits as they'd like because other people are constantly putting spokes in his/her wheel. They apparently don't like their work being reviewed and quality-controlled, or their edit history being looked at, and will readily dismiss criticism as "harassment", an accusation they've levelled at no less than four different people in the course of a single week (diff, diff, diff, diff).

While we should look to see if there isn't some truth there (I think we could have done without WF's trolling, at least), and make sure that there isn't a systemic problem of people feeling pressured (a topic which has recently been brought up), I would argue that rapid-fire accusations from a single editor make it harder to think clearly on such an issue.

And the fact that the same person has levelled similar accusations at an entirely different set of editors many years ago (diff, diff, diff, diff) certainly doesn't help in taking their claims seriously now.

They seem to take particular exception to people challenging them on their votes (see this discussion); notice the similarity between this and the accusation of harassment thrown at Benwing2 after his comment (on Purplebackpack regularly failing to provide a rationale for his/her votes).

I'd also like to mention that, while complaining of other people's behaviour towards them, they seem unbothered (diff, diff) by the idea that their own attitude might have played a role in the abrupt decision of a fellow editor to leave; note the striking temporal proximity between the aforementioned discussion and that editor's departure.

If Purplebackpack perceives any kind of scrutiny as harassment, I would say Wiktionary simply isn't the right place for them. Everyone on this project must be ready to face criticism - sometimes repeatedly.

I personally am loath to imagine not being able to go through a user's contributions and express earnest concern about the quality of their interventions (in the main space or elsewhere) without being labelled as a "harasser".

Therefore, for the good of the project, I would like to propose that this user be prevented from further editing. This is not meant as a punitive measure (I'm not "out to get him/her"), but as a way of putting an end to highly toxic and massively detrimental behaviour, thereby preserving an atmosphere more conducive to serene dialogue and productive work. PUC 23:07, 4 June 2024 (UTC)[reply]

PBP's false harassment accusations have gotten to the point of trolling. I view such unwarranted accusations, esp. a pattern of them, as a blockable offense, and I think if PBP makes any more such accusations that aren't clearly warranted, they should be blocked, maybe on the schedule of one week, then one month, then permanently if they keep it up. PBP reminds me of Dan Polansky; a ton of heat, little light, and a strong increase in the toxicity of the atmosphere as a result of them. In Dan Polansky's case, I finally permablocked him for outright racism on top of everything else. I suspect PBP is smart enough not to engage in outright racism, but IMO that should not prevent a warranted block. Benwing2 (talk) 04:33, 5 June 2024 (UTC)[reply]
^ I agree with Ben's suggestion of issuing increasing blocks. PBP's recent behavior has been really inappropriate and rude, but I'm not sure if a permaban is the best immediate action. But I definitely think we should not tolerate disruptions to the project. — SAMEER (؂؄؏) 07:24, 5 June 2024 (UTC)[reply]
I agree. Theknightwho (talk) 12:56, 5 June 2024 (UTC)[reply]
I have not had a single productive encounter with this user. Vininn126 (talk) 05:53, 5 June 2024 (UTC)[reply]
Same - just a lot of vitriol and repeated sniping. Theknightwho (talk) 09:15, 5 June 2024 (UTC)[reply]
Many can and have characterized your interpersonal relations the same way, @Theknightwho Purplebackpack89 12:31, 5 June 2024 (UTC)[reply]
"no u". thread's not about them, bro, it's about you. Vininn126 (talk) 14:23, 5 June 2024 (UTC)[reply]
Have you heard what Wordy and I have been saying? There are greater systemic concerns here and it's wrong to single out one editor. Purplebackpack89 12:16, 6 June 2024 (UTC)[reply]
@Purplebackpack89 There may be larger systemic concerns here, *AND* this does not absolve you from behaving in a civil fashion at all times. Imperfections in the system don't give you a free pass to run rampant and blame your bad behavior on "the system". Everyone (even Wordy) has tried to make that point in one way or another, but IMO you don't want to listen. Benwing2 (talk) 07:24, 7 June 2024 (UTC)[reply]

At some point I may prepare a longer response, but I gotta interject this right now: I'm CLEARLY HERE to build a Wiktionary, as I've created 636 entries. Purplebackpack89 05:21, 5 June 2024 (UTC)[reply]

I was going to say:
On balance, I'm inclined (perhaps naively) to think PBP is not trolling but sincere, that he really regards people as harassing him, and is really freaked out about being blocked... in part because I think a troll would know that being so over-the-top — accusing so many different users of harassment (some on very flimsy grounds); and when blocked, sending lots of pings on his talk page, sending me an e-mail and contacting me on Wikipedia asking to be unblocked; and holding up creating ~636 entries since 2009 as an accomplishment — is counter-persuasive. Sincerity doesn't ameliorate the extent to which many of the accusations are unwarranted; indeed, sincerely perceiving most disagreement as harassment is a problem. PBP, when you're complaining to multiple different users about (for example) the fact that they RFDed an entry you made, but then the community discusses the entries at RFD and determines they indeed aren't the sort of thing we want to include, it would be prudent to reflect that the RFDer was not harassing you but correctly perceiving that the entry didn't meet commonly-accepted criteria for inclusion.
However, before I could post that, I see his lack of any indication of awareness of irony in telling other users to walk away while himself continuing to poke at them🙄 which... well, whether it's trolling or sincere, it's ill-advised either way. - -sche (discuss) 06:25, 5 June 2024 (UTC)[reply]

@-sche I think the idea that I "perceive most disagreement as harassment" is exaggerated. Below I am going to explain how I came to the conclusion that I am being harassed. Purplebackpack89 12:36, 5 June 2024 (UTC)[reply]
I agree with -sche's views here. Rarely have I seen so histrionic a user, who demands so much attention from his fellow editors and politicks so energetically on the discussion pages, while contributing so little and showing so few signs of introspection. — Mnemosientje (t · c) 15:42, 5 June 2024 (UTC)[reply]

Not gonna weigh in on the question of whether PB89's contributions have been constructive on balance. I do find there seems to be a lot of selectivity in which editors are deemed intolerably disruptive. WordyAndNerdy (talk) 07:52, 5 June 2024 (UTC)[reply]

Oh, totally agree. There are the guardians here and there are the peons. The behavior of the guardians is no better than that of the peons, but no peon can ever tell a GUARDIAN that he's wrong
And some of the people who are commenting on this are people who, in undoing or modifying my edits, have made questionable edits themselves. For example, Theknightwho stumbled into the hot-dog-is-a-sandwich debate being too hasty about reverting me. Benwing nominated dont tread on me for deletion...and quickly five votes that he was wrong showed up. Instead of owning up to their screw-ups, they're here. Purplebackpack89 12:29, 5 June 2024 (UTC)[reply]
To be clear the "intolerably disruptive" remark was intended to reference generally trollish editors that Wiktionary has collectively chosen to tolerate/ignore for some reason. Problem admins ("guardians") are definitely an issue as well – and my experience is also that Wiktionary typically circles the wagons around them – but that's separate from a wiki keeping pet trolls. I'd also urge you to consider the possibility that Benwing RfD'ing dont tread on me was independent of TKW leaping into the fray at hot dog. You didn't include an etymology explaining that this is the precise text on the Gadsden flag. It's possible Benwing saw the entry without being familiar with that history and concluded it was simply an unlikely misspelling. WordyAndNerdy (talk) 15:47, 5 June 2024 (UTC)[reply]
I am willing to concede that they are possibly unrelated...but they still happened in the same window of a few days, which again brings us to the problem of a whole lot happening to me at once and that (understandably!) making me frustrated. If we're talking in hypotheticals, it's also possible Benwing could've acknowledged there was information he didn't know and admitted he erred. HE DIDN'T (He's a guardian...why would he?). If we're talking hypotheticals, it's also possible that Knight or Ben could've noticed "hey, Purplebackpack89 feels stressed out and put upon! Maybe I should leave him alone for awhile, and if there's problems that need fixing, I'll get to them at a later date!". THEY DIDN'T. Purplebackpack89 16:22, 5 June 2024 (UTC)[reply]
I think this is the stage at which we need to shift from narrowly focusing on individual incidents to discussing remedies to overarching systemic issues. WordyAndNerdy (talk) 16:47, 5 June 2024 (UTC)[reply]
@User:WordyAndNerdy You could take the lead on that. My proposals haven't gained any traction:
  1. to forbid any mention of any username (pings and signatures naturally excepted, probably also sayonaras and welcome-backs) in principal namespaces, Wiktionary space and their talk spaces, excepting the page required for the following proposals and enforcement thereof.
    1. this would be enforced by increasing blocks and/or removal of admin powers. Formal public apologies on offended user's talk pages or in BP might mitigate the blocks or removals.
  2. to have a request for mediation process, page, and template. Requests for interaction bans could be handled there as well.
Only the request for mediation addresses 'hounding' or 'abuse of administrative powers', including unjustified blocking, 'passive aggressive behavior', etc, or, possibly, the consequences of the 'gender-related, structural' composition of our veteran contributors, admins, and discussion participants. DCDuring (talk) 21:48, 6 June 2024 (UTC)[reply]
I think there is abundant evidence, even on this subpage, that mentioning individual users on core community discussion pages too often rapidly leads to defensiveness and a total loss of focus on substantive, principled discussion, even discussion of how to limit (interpersonal) conflict. We should not want to have our conflict-suppression mechanisms be targeted against individuals, as has been suggested here. DCDuring (talk) 22:15, 6 June 2024 (UTC)[reply]
Point one as I'm reading it strikes me as unworkable. Some discussions will inevitably centre on a specific user or group of users. Sometimes these discussions will be of a positive or neutral nature. Sometimes they'll involve navigating more difficult territory. But implementing formal mediation as a frontline remedy to interpersonal concerns doesn't seem like a viable plan. Some people won't look at the process as mediation. They'll see it as arbitration – being put on wiki-trial. Starting out on what some will find to be an adversarial footing doesn't seem like it would be conducive toward conflict suppression to me. It seems more likely to put people in a siege mentality and escalate matters that might otherwise be resolved without much fuss. I do think limiting the number of active BP discussions concerning a specific user to one at a time might be a step in the right direction. We do need a formal mediation process. I just think less-formal discussion might be ideal as a frontline approach. Why require a mediation process by default when it won't be necessary to resolve every disagreement that arises? WordyAndNerdy (talk) 06:52, 7 June 2024 (UTC)[reply]
I'd be happy to hear about other proposals that have a better chance of success.
As a starting point, it is basic practical psychology (followed in business, law, government, and sometimes, even politics) to frame issues as about substance and not persons, even personal actions, let alone invisible attributes, like motivations, values, attitudes, beliefs, intelligence or energy levels, etc. To the extent our users aren't doing that, they would benefit from learning to do so. The first locus after edit wars is the talk pages for entries; next are user talk pages. Right now people chime in (or pile on) on talk pages they are watching. At some point the discussion may fail to resolve the issue. This is where things go wrong if the issues are framed as personal and not substantive.
As soon as issues of personal behavior come up, especially in a public forum, we see: defensiveness, score-settling, etc. This can be worse than a real trial, it can be mobocracy. Were interpersonal conflicts diverted to a mediation, as there are necessarily two parties in an interpersonal conflict, neither party need be on trial. I would suggest that we may need the mediation page to be basically private, invisible to the community at large, except possibly in the event of failure of the process, after a waiting period. The role of a mediator is probably first to sort out substantive issues (for the appropriate forums) to the extent the users have failed to do so. Then behavioral issues can be sorted. Keeping attributions out of the discussion at all stages is critical. DCDuring (talk) 23:53, 7 June 2024 (UTC)[reply]
@WordyAndNerdy I agree with you. In particular I think, as you do, that the mediation process should start only when an informal BP discussion fails to resolve the issue. I also think there's no way that it's workable to forbid mentioning specific users in Wiktionary-space, talk spaces, etc. Most of these mentions as they currently occur are not intended to single out a user for opprobrium or anything but for any of a number of other reasons, e.g. to agree with someone, to mention their theory or proposal on something, etc. I think your suggestion of limiting BP discussions concerning a particular user to one at a time should be enforceable; if there are multiple simultaneous concerns about a particular user they're likely to be related and should be merged. If in some weird circumstance we really need to have two unrelated simultaneous discussions about a given user and one can't wait for the other to finish, that should require prior explicitly discussed consensus. Benwing2 (talk) 07:18, 7 June 2024 (UTC)[reply]
The occasional efforts of some of our wiser experienced users to mediate discussions in public forums often seem to simply lead to the interpersonal conflict threatening to involve them.
Direct person-to-person contact on user talk page is the first-line location for discussions. If an issue arises from a substantive matter, then the substantive matter should be discussed in the appropriate forum: BP, TR, GP. It should not be hard to refer to edits by diffs without mentioning the editor by name. I don't think that we have a very good record of resolving interpersonal conflicts in group forums, unless we count driving contributors of all kinds away or into virtual hiding (changing username, narrow range of edits) as success. It is very easy to exclude personal mentions: policies, warnings, escalating blocks. DCDuring (talk) 23:53, 7 June 2024 (UTC)[reply]

Purplebackpack on feelings of harassment

Were people actually harassing me? Maybe, maybe not. May I explain why I felt harassed?

  1. A large portion of my edits have been scrutinized in a very short amount of time. People have taken literally years of work, some of which hadn't bothered anybody for years, and tried to change or delete a lot of it in just two weeks. Had the scrutiny occurred more slowly, I would not have felt as put upon.
  2. Editors have given the appearance of assuming bad faith and focusing on the editor, not the content. There have been several nominations or comments on the lines of "oh, well, this is a Purplebackpack89 edit". That's not supposed to matter.
  3. Editors made no good-faith effort to deescalate and continued making the edits even though it was clearly bothering me. No deadline...could just wait until I was less stressed out.
  4. Some of the attempts to modify my edits ended up being questionable themselves. For example, Theknightwho stumbled into the hot-dog-is-a-sandwich debate being too hasty about reverting me. Benwing nominated dont tread on me for deletion...and quickly five votes that he was wrong showed up. Denazz piled on by trolling left and right.

Given those four things happening, basic psychology would suggest that I would be frustrated. And naturally, a questionable 31-hour block and this thread would also put me on edge! It would probably put anybody on edge! Was I over the top? Maybe, but I feel that where my feelings of harassment came from is understandable. Saying that this is entirely my fault, that I'm never entitled to feel frustrated, and that nobody else did anything questionable is...just wrong. Fundamentally, this thread COULD end up having a chilling effect on speaking up if you feel put upon and...we don't want that either. Purplebackpack89 12:49, 5 June 2024 (UTC)[reply]

Regarding your comments about "focusing on the editor": I think it's to be expected that if a user has a history of questionable edits/entries, their activity will get more scrutiny. For better or worse, there's an (unwritten) reputation system here, and it does matter who created an entry. Pretending otherwise is a fantasy. Jberkel 13:17, 5 June 2024 (UTC)[reply]
Of late, people have been exaggerating the questionability of my edits though, @Jberkel. Above, people are essentially claiming that I never did anything productive at all and that is inaccurate. Purplebackpack89 13:31, 5 June 2024 (UTC)[reply]
I'd say you're particularly sensitive to corrections, from what I've seen. I'd love to have people scrutinize my work. Vininn126 (talk) 14:24, 5 June 2024 (UTC)[reply]
I think you'd love it to a point and not be comfortable with it beyond that point, @Vininn126 (And I believe that is true for most editors). If people scrutinized you in the manner I outlined, I think you (or anyone) would be somewhat bothered. Purplebackpack89 15:19, 5 June 2024 (UTC)[reply]
I was heavily scrutinized when I first started editing. Even berated. I don't see similar berating towards you. I see corrections that I personally would welcome. Vininn126 (talk) 15:28, 5 June 2024 (UTC)[reply]
I'd argue there's a significant difference between receiving heightened scrutiny as an actual wiki-newbie and receiving it as a veteran editor. At a certain point it's only natural for a veteran to start feeling that they're being subjected to disproportionate scrutiny and opposition. Especially when this community creates a special policy carveout for a habitually trollish editor. It's almost as if provocation is treated as excusable while being especially provokable is not. WordyAndNerdy (talk) 16:12, 5 June 2024 (UTC)[reply]
I really feel like you didn't read my messages. I'd say I'm a veteran editor at this point and that I'd love more scrutiny. Vininn126 (talk) 16:13, 5 June 2024 (UTC)[reply]
Also I really don't see why editing for a long time gives you this freedom. Let's say someone took a long break, or has just always been problematic. Vininn126 (talk) 16:21, 5 June 2024 (UTC)[reply]
That's you. Other editors will respond differently. Yes, this is a wiki. Every edit comes with the caveat it might be objected to or undone. But it's not unexpected for someone to start feeling like a pariah or whipping kid if they routinely encounter intense opposition. That feeling doesn't come from nowhere. This wiki definitely plays favourites at times. WordyAndNerdy (talk) 16:26, 5 June 2024 (UTC)[reply]
Your assumption that it can come from nowhere, in my opinion, greatly misrepresents PB's reaction. I'm not trying to invalidate anyone's emotions, but that also doesn't mean someone's reaction can't be over-the-top or unproductive. If we never address that behavior, things get bad very quickly. Vininn126 (talk) 16:28, 5 June 2024 (UTC)[reply]
It doesn't matter, @Vininn126. You still have to assume good faith about their edits (and the "reputation system" mentioned above flies in the face of that btw). And if an editor feels put upon, it seems like a good idea to lay off him for a bit unless there's something serious like vandalism that has-to, has-to, has-to be dealt with right away.
I don't think Wordy is saying my reaction came from NOWHERE, I think he's saying that there is a SOMEWHERE, AND that that needs to be addressed rather than singling me out alone. Purplebackpack89 16:32, 5 June 2024 (UTC)[reply]
At no point did I assume bad faith on your part. Having good faith doesn't absolve you from any bad behavior. Vininn126 (talk) 16:33, 5 June 2024 (UTC)[reply]
My point is that it's a double standard to treat having a dramatic reaction to provocation as requiring community action while giving a pass toward actual provocation (see the second link in my above comment). WordyAndNerdy (talk) 16:40, 5 June 2024 (UTC)[reply]
I saw, and see my comment that this thread is about this user in question. If it's about being "hounded", I don't think that those claims are founded. If it's about other actions, I'd prefer they stay in that thread. Please don't muddy the waters on the conversation to make a point that's tangentially related. Vininn126 (talk) 16:43, 5 June 2024 (UTC)[reply]
It isn't "muddy[ing] the waters." It's providing relevant context. Nothing happens in a vacuum. My thoughts on this haven't changed in ten years. WordyAndNerdy (talk) 16:53, 5 June 2024 (UTC)[reply]
You are muddying the waters - it's just a boatload of whataboutism. Theknightwho (talk) 00:03, 6 June 2024 (UTC)[reply]
There was no heightened scrutiny, and even if there had been it would have been justified given the number of mistakes I (and others) have found in PB89's edits. Saying that PB89 "routinely encounter[ed] intense opposition" is simply a complete fiction. Theknightwho (talk) 00:16, 6 June 2024 (UTC)[reply]
Knight, you bear some responsibility for this situation. There was nothing you were doing vis-a-vis me that had to be handled immediately. You could have noticed that I was frustrated by the way you were handling things and proceeded more slowly and cautiously. You didn't, in fact, you literally did the exact opposite.
And on top of this, you yourself made mistakes while hastily trying to undo my mistakes. And you never owned up.
There are several threads that have expressed concern about your confrontationalism and this one should echo those concerns. Purplebackpack89 12:14, 6 June 2024 (UTC)[reply]
I wasn’t confrontational with you at all, and making some minor changes to entries you’d edited and then posted about on high-traffic pages didn’t have anything to do with you specifically. I tagged you in one edit as a form of guidance, and your disproportionate feelings of negativity are not a reasonable response, as numerous people have said by now. It is not my responsibility to manage your emotions; you are an adult, and you do not get a free pass on mistreating other editors simply because you feel upset. Theknightwho (talk) 16:25, 6 June 2024 (UTC)[reply]
PBP, you need to understand that this is a collaborative project and the other users here are not your enemies. You brought up Assume good faith but you yourself have never assumed good faith in anyone this whole time. In the many disputes you've had you always assumed the other person had a motive against you, which is extremely rude and disrespectful. Which begs the question: Why are you the only one who deserves the assumption of good faith? Why do you never afford others the same assumption you mention?

You should not assume people who look through your contributions are doing so out of spite, but rather because everyone makes mistakes and it's honestly for the best that everyone's edits get reviewed at least occasionally. Otherwise, mistakes could go unnoticed for decades! If you've ever done entry maintenance, you would know that yourself. Hell, I actually used to get into disputes with Fenakhay when I first joined the project for the same reason. But he basically taught me how to format entries and now he's the main person I ask when I have a question about entry formatting.

Additionally, your recent repeated provocations of editors are completely inappropriate. What good reason is there to tag TKW 5 days after things calmed down to tell him to walk away? Especially since he DID walk away, 5 days ago! It was you who didn't! What reason is there to send aggressive messages to Ben telling him that he "better rescind" his RfD? After Ben told you to calm down because you were being aggressive, what was the point in continuing to double down rather than walk away and discuss the issue at RfD?? And after being blocked for "intimidation and bullying", what is the reason to try to pick an argument with the blocker (who has not participated in any conversations with/about you) rather than just walking away (as you yourself suggested)? You preach values you don't even follow and regularly throw stones in a glass house. You started a whole post about how TKW checking your edits was harassment, yet you're somehow incapable of seeing how your actions towards Ben could be perceived as bullying by a third party. — SAMEER (؂؄؏) 18:55, 5 June 2024 (UTC)[reply]
Benwing made a bad RfD. It was so bad that five people almost instantly voted keep. But...the problem is me telling him it's a bad RfD, not him creating one?
Benwing and Knight and Denazz bear some responsibility for this situation. They made the situation worse with trolling in Denazz's case and questionable edits in the other two. Why do they get free passes and I don't? Is it because they're GUARDIANS and I'm a peon? Purplebackpack89 20:06, 5 June 2024 (UTC)[reply]
You're right, that RfD may have been a "mistake" (as in, it seems people disagree with him), but you had no right to be rude about it. When we see RfD's we disagree with, we discuss at the RfD why we think it's a bad idea and let other people compare the reasoning provided. We do not hound the people who made the RfD demanding they withdraw it (that's not even the process for resolving RfD's), and mock them for incorrectly putting up a term for RfD. That is bullying.

What do you mean Ben got a free pass just because he's a "Guardian"?? Do you mean that everyone just listened to Ben cuz he's an admin? Cuz if so, that literally didn't happen. In fact, you said yourself that most users in the RfD read your reasoning and agreed. Nobody voted against you just because you're a "peon" (whatever that means), so I have no idea what you are talking about.

Also, looking solely at the interactions that you specifically have had with Ben and TKW, it seems like they were just doing entry maintenance. And looking at your recent interactions with Ben, it genuinely appears to me as though you are being a bully. — SAMEER (؂؄؏) 20:38, 5 June 2024 (UTC)[reply]
Making an RfD that doesn't end up passing is, in fact, not a problem. That's perfectly ordinary. Making an RfD into a pissing contest, on the other hand... Nicodene (talk) 21:36, 5 June 2024 (UTC)[reply]
Yeah agreed. It’s ok if you’ve ‘mistakenly’ rfd-ed an entry convinced that the entry will fail.
Purple is bereft of maturity and is sorely inexperienced. He isn’t necessarily acting in bad faith, he takes it for granted that he is always right in any untoward issues involving him but… evil dictators also know they are doing the correct deed, oh well. Thus it’s just a matter of interpretation whether Purple’s bearing is going to result in other editors getting psychologically harassed and being coerced into quitting Wiktionary—just as the populace of a brutal dictatorship are forced to flee their country or face persecution in their homeland—or Purple is actually an innocent victim of harsh law enforcement here. Inqilābī 22:09, 5 June 2024 (UTC)[reply]
Yeah, this is basically my reading of it: PB89 isn't acting in bad faith. They just lack social awareness and think they're axiomatically correct about everything, so they conclude the only possible reason anyone disagrees with them must be because they're out to get them. Frankly, I don't care whether it's down to incompetence or maliciousness, but either way it's having a very negative effect. Theknightwho (talk) 00:10, 6 June 2024 (UTC)[reply]
I tell you that something bothers me, Knight. Your response is to do that thing that much more, and to do it so hastily you make mistakes while doing so. It's not surprising that anybody would feel attacked under that circumstance. Your critique comes off as hypocritical because embedded in it is that your edits and conduct towards me are "axiomatically correct". Also, can everybody cut out playing amateur psychologist? You ain't Joyce Brothers. Purplebackpack89 15:17, 7 June 2024 (UTC)[reply]
It is unreasonable to demand that an admin stop doing their job simply because of your personal feelings. Nicodene (talk) 07:02, 9 June 2024 (UTC)[reply]
You keep saying I make lots of mistakes, but what you actually mean is that after I reverted your change to the definition of hot dog from “sandwich” to “entree”, Equinox then changed it to “snack”. The fact that you keep focusing on this doesn’t make any sense to me, as it clearly misrepresents what happened, and I don’t see how it’s supposed to be hypocritical anyway. Theknightwho (talk) 10:41, 9 June 2024 (UTC)[reply]
  • I would suggest that, even if there is consensus to block Purplebackpack89, it would make more sense to just block him from discussion pages— while still letting him contribute to the dictionary proper, cause his lexicographical additions, with fixes and corrections by other editors, are still substantial. Inqilābī 19:50, 6 June 2024 (UTC)[reply]

Constructed languages in the mainspace

(Notifying -sche, The Editor's Apprentice, Mahagaja): I recently created this vote (start date TBD): Wiktionary:Votes/2024-06/CFI for mainspace constructed languages, in hopes of coming to a consensus on which conlangs should be included in the mainspace and why. Since its creation, nonetheless, I've come to realize that we currently include possibly two conlangs in the mainspace outside of the ones listed at WT:CFI, and I'm not sure what to do about them. These include:

If we consider N'Ko a conlang, should it be included in our permitted mainspace list? I would think so, but I also don't feel like it's an actual conlang. I don't know anything about Eskayan to comment on it.

On the same note, I'd like to bring up the case of palawa kani, created by the Tasmanian Aboriginal Centre as it seems closer to the revival of an indigenous language instead of a language like Volapük, at least based on my surface-level research of it. It looks to be taught to children, is used in place names, is used in official dubbing, has a growing oral tradition and more. I cannot yet verify if it has native speakers, but I wouldn't be surprised if it does, if not for a lack of direct access to the language (the merits of which I won't comment on). If so, I'd like to see what the consensus is about adding it to Proposal 1 of the above vote. AG202 (talk) 23:57, 4 June 2024 (UTC)[reply]

Pinging @Mar vin kaiser since you seem to be the most active editor of Eskayan & @Thadh since you mentioned it on Discord. AG202 (talk) 00:22, 5 June 2024 (UTC)[reply]
This looks well thought out, nicely done. Thanks. Vininn126 (talk) 07:15, 5 June 2024 (UTC)[reply]
In a previous discussion (before the Interslavic discussion), someone said it felt like the divide we make between mainspace conlangs and appendix-space ones was that the handful of long-used conlangs are in mainspace, and new ones are in appendix space... and they said that thinking it was a bad thing (arbitrary), but I think it's been a reasonable approach. Having a fair number of native speakers and/or works in the language could be another decent rule of thumb. As regards Eskayan, I note how many aspects of our attitudes to / treatment of artificial languages seem to have been developed with Western conlangs in mind (often created recently and for certain reasons, attempting and failing to be world languages or new nations, or for fiction), to the extent that the existence of old non-Western artificial languages like Eskayan (created for different reasons and used in rather different ways, in Eskayan's case as a language of the Eskaya people, taught in several schools) seems to have slipped the minds of the people devising the original conlang policies, and flown under the radar. All things considered, that (fact that Eskayan is currently included) seems OK to me. I'm not wedded to it being in mainspace if people want it moved to appendix-space, but it does seem to be in a different boat from various Western conlangs that have been suggested for inclusion. - -sche (discuss) 00:53, 5 June 2024 (UTC)[reply]
I have no strong opinions either way on Eskayan; it feels different in some way from run-of-the-mill conlangs but I don't know if that's just a bias based on its non-Western origin. Benwing2 (talk) 04:06, 5 June 2024 (UTC)[reply]
BTW as for N'Ko, from reading the Wikipedia entry it sounds more like Standard Basque, Standard Moroccan Amazigh, Rumantsch Grischun or Unified Kichwa, which I do not consider conlangs so much as intentionally created koines. These are on the same spectrum as Modern Hebrew, standard German and standard Italian, all of which are partly planned languages but none of which are reasonably considered conlangs IMO. Benwing2 (talk) 04:17, 5 June 2024 (UTC)[reply]
Thanks! Yeah I won't worry about it then. AG202 (talk) 05:06, 5 June 2024 (UTC)[reply]
I've since done a skim of relevant chapters of The Last Language on Earth: Linguistic Utopianism in the Philippines by Dr. Piers Kelly (Dec 2021), which focuses on Eskayan, and based on what I've read, it seems like it has a strong rationale for inclusion. It's taught in schools to children, used in praying, singing, speechmaking, excluding overhearers, and common phrases, and there's an extensive literary history. "In effect, Eskayan appears to have supplanted the special authoritative role of English." They estimate that there are between 500 and 550 speakers of Eskayan, with several speakers with a high degree of linguistic competence in speaking, reading, and writing the language.
The only issue I'm seeing is that it's technically not a mother tongue: "Unlike Boholano-Visayan, which is acquired as a mother tongue, knowledge of Eskayan is learned through voluntary attendance at traditional Eskaya schools, and mastery of the language is considered a prerequisite for becoming truly Eskaya." Thus, there technically aren't any L1 speakers from birth, but seeing as there are children taught it from a fairly young age, would that not qualify as a pseudo-native language? It's definitely different from the typical conlang, and has fully-fledged educational aspects, including arithmetic & equations being taught and performed in schools. The author starts out the final section, stating:

The immediate future for Eskayan as a viable language is reasonably assured. Competent speakers have status within the communities; in Biabas and Taytay the language is being actively learned by children, and plans are well under way to construct an Eskaya school in Cadapdapan. Recent government recognition, through the Indigenous Peoples Rights Act, provides additional legitimacy to an already valued language.

This makes it clear to me that it holds legitimacy and should be included in the mainspace. AG202 (talk) 05:06, 5 June 2024 (UTC)[reply]
@AG202 Sounds good to me; I would amend your proposals to include it along with Esperanto. Benwing2 (talk) 05:17, 5 June 2024 (UTC)[reply]
@AG202, Benwing2: I seem to be late in entering this discussion but yeah, I agree that Eskayan is quite different from other conlangs. I would describe Eskayan as already part of the indigenous culture of that region of Bohol. So it should be part of the mainspace. --Mar vin kaiser (talk) 07:10, 5 June 2024 (UTC)[reply]
Actually the more I'm thinking about it, the more I feel it might be possible to include it as a "jargon" of Cebuano. I guess it doesn't really give a complete picture, but it's basically Cebuano with an almost complete substitution of words, which is very similar to what we usually consider a jargon, rather than an independent language. Thadh (talk) 07:11, 5 June 2024 (UTC)[reply]
On a similar note, Eskayan is currently considered an LDL; I assume that that should stay the same? I hate to continue having separate threads on this topic, but I want to make sure that everything is addressed before setting the start time & date for the vote. AG202 (talk) 22:39, 6 June 2024 (UTC)[reply]
It's been brought to my attention that Ido is said to have 26 native speakers in Finland per the Ido language Wikipedia page. However, I'm not sure if it's been independently verified and would like more input before I make any changes in either direction about it. Surjection previously told me on Discord that the Finnish website does not provide any additional information. CC: @Benwing2, @Thadh, @-sche, @Vininn126 AG202 (talk) 19:55, 6 June 2024 (UTC)[reply]
I think that if we can't verify it, we shouldn't consider it. Vininn126 (talk) 20:08, 6 June 2024 (UTC)[reply]
I mean, we can verify it, namely by looking at the reference. Tilastokeskus isn't an organisation to just invent 26 native speakers, is it? Thadh (talk) 20:19, 6 June 2024 (UTC)[reply]
I suspect that it can't be ruled out that these were just some pranksters fooling around with their own self reported information submitted to the population census database (if such choice had been presented in the questionnaire form). I doubt that the Finnish statisticians actually made any effort to verify the actual Ido language proficiency of these people. And if native speakers actually exist, then it should be possible to confirm this information from the other sources. --Ssvb (talk) 21:10, 6 June 2024 (UTC)[reply]
I was very astonished when I saw it for the first time, and frankly, I find it extremely hard to believe. I mean, there are reports about native speakers of Volapük, but that was when Volapük was still a huge movement. Ido has never been a huge movement. In fact, I think these 26 native speakers appearing out of the blue are possible only if the entire community of Ido users decided to move to Finland within a short period of time, started multiplying themselves and teaching it to their newborns. But on the other hand, the source doesn't seem to be unreliable, although in this case I wonder if it couldn't simply be a mistake. IJzeren Jan (talk) 21:28, 6 June 2024 (UTC) N.B. I wouldn't put my money on pranksters either, as that would require some pretty good organization; and why would they pick Ido, of all possibilities? Putting myself in their shoes, I'd rather have chosen Klingon, Na'vi, Huttese or something similar. IJzeren Jan (talk) 21:39, 6 June 2024 (UTC)[reply]
@Ssvb: From what I know, the Finnish statistical database, just like the Dutch one, is based on population data obtained at birth/subsequent corrections during the person's lifetime. Unlike the census, this is personal information the government has on you, so the chance people would play with that is a lot lower. But it's always possible that I misunderstood this? Thadh (talk) 21:41, 6 June 2024 (UTC)[reply]
But on the other hand, does Finland collect data on one's native language, too? Because I'm sure as hell that the Netherlands don't! IJzeren Jan (talk) 21:59, 6 June 2024 (UTC)[reply]
That's a good point. I don't know, but I'm sure that that information can be found somewhere. Thadh (talk) 22:05, 6 June 2024 (UTC)[reply]
@Thadh: Yes, it's the personal information, but it seems like Finnish residents can just log in using their online banking credentials here and update various details, including their "native language". I doubt that some oddly selected native language can possibly affect anything in everyday life. And I think that having a few dozen conlanger weirdos in the whole of Finland isn't statistically improbable. For example, there were some nutcases in Taiwan, who even changed their names just to get a discount. If there's a loophole in the system AND a real incentive to abuse it, then it will be abused. --Ssvb (talk) 01:01, 7 June 2024 (UTC)[reply]
@Ssvb: More realistically tax advisors now suggest to change genders according to the new self-determination act in the FRG, because you get different capitalisation factors for the assessment of the value of a land encumbrance, remaining at the owner after donating a property and hence reducing gift tax, depending on legal gender. I can imagine legal advantages to slip in for someone determining his native language, as it is also an idpol kind of thing, or even purposefully teach a child an artificial language as a second native language just for benefits introduced somewhere. Wiktionary alone though is just not important enough to be gamed this way. Fay Freak (talk) 14:13, 7 June 2024 (UTC)[reply]
The borderline transphobia aside (was that really a meaningful addition to the discussion?), I don't see how Ido, a constructed language, would give anyone benefits, so I am highly sceptical anyone would change their native languages for that reason, even more so than for the reason of trying to be funny. Thadh (talk) 16:37, 7 June 2024 (UTC)[reply]
Strange individuals exist in every society. You can't expect everyone to be sane and reasonable. --Ssvb (talk) 18:06, 7 June 2024 (UTC)[reply]
Maaaybe self-promotion, as you could brag about how your hip new language has "26 recorded native speakers in Denmark", even if it isn't technically true. CitationsFreak (talk) 08:32, 8 June 2024 (UTC)[reply]
Personally I think we should ignore this data point about Ido, because it seems a priori unlikely, as others have pointed out. Claims about native and total speakers are habitually inflated, e.g. someone insists on putting back into Wikipedia the claim that there are over 200 million total speakers of Swahili, based on a single questionable reference and in contradiction to all other references; I have deleted this info several times but it keeps getting put back, and I don't have the energy to fight this. Benwing2 (talk) 22:33, 6 June 2024 (UTC)[reply]
True that. Same goes for Esperanto, by the way: the ridiculously high number of 2 million speakers (sometimes even 10 million) keeps popping up regularly, even though it was refuted already a long time ago. Today we know that even a number of 100,000 is probably way too optimistic. Same goes for those one or two thousand so-called native speakers. Usually, such figures come from sources with an interest in inflating them. However, that cannot be said of those figures from Finland. Instead of drawing conclusions based on suppositions, shouldn't we at least ask them where those 26 native speakers come from? IJzeren Jan (talk) 23:25, 6 June 2024 (UTC)[reply]

synthesized audio files

Do we have a policy on this? I have encountered some, e.g. at inconsequential the audio is explicitly labeled "CA synth", which I take to mean synthesized Canadian. Although it's now possible to synthesize realistic sounding text-to-speech audio, this particular audio sounds very artificial to me, and I think it doesn't belong. Even for realistic-sounding audio, I'm skeptical. Here are some other words with audio labeled "CA synth": extraterrestrial, catamaran, angst, centralization, depolarization, disorganization, amnesia, counterfactual, homily, atherosclerosis, icicle, ecclesiastical, enclose, intruder, gasp, entitle, grievance, goose flesh, biodiversity, lethargic, hyperventilation, coliseum, macrobiotics, impracticality, autobiographical, disputant. An additional file labeled as ca-synth in the filename but not the caption occurs in isolationism. Some of these have additional non-synthesized audio files, some don't. Benwing2 (talk) 04:04, 5 June 2024 (UTC)[reply]

Our general rule (not sure how much of a policy it is) is that audio pronunciations ought to be recorded by native speakers of the languages—a machine is admittedly not a native speaker of any language. Some time ago I came across a bunch of synthesized audio files on English entries (around 20–30 IIRC) that were all created by a single Commons user years ago and then a few years later were automatically added by a bot (User:DerbethBot) that was adding missing Commons audio recordings not on the entries. The quality of the audios was really poor and many weren't even correct, so I went ahead and removed them. I admit perhaps I should’ve brought up the matter here, but it seemed pretty clear-cut to me that they had to go as, with an audio recording, one would expect the voice of an actual native human speaker. Even with better quality recordings and as voice synthesis technology gets better and better (particularly with the AI stuff), I think we should still try to supply authentic human recordings as any voice synthesis services will be available to the readers elsewhere. lattermint (talk) 05:05, 5 June 2024 (UTC)[reply]
I support removing these and sticking a note somewhere that people shouldn't add synthesized audios as pronunciations. (OTOH, if someone could add a synthesized audio to e.g. voice synthesis as a T:examples type thing, that could actually be appropriate use of synthesized audio, ha.) I suspect these will become an increasingly common issue unfortunately; Commons is similarly dealing with low-quality AI art being added. - -sche (discuss) 05:33, 5 June 2024 (UTC)[reply]
Agree with lattermint and -sche. Vininn126 (talk) 05:46, 5 June 2024 (UTC)[reply]
I’ve encountered these before. Not a fan. Among other things a formal policy of ‘recordings should be of native speakers’ would help by automatically disqualifying this sort of thing. Nicodene (talk) 12:16, 5 June 2024 (UTC)[reply]
We can formulate it this way, “recordings within pronunciation sections must be of native speakers”; but what about allowed conlangs without native speakers? For dead languages I would formulate “recordings for extinct languages are discouraged”, to leave room for interpretation, mine being that if a recording would pass as native were we ignorant of the language being extinct, then it is tolerated for practical purposes. We will easily find agreement on a statement about recordings of natural languages that have native speakers. Fay Freak (talk) 13:19, 5 June 2024 (UTC)[reply]
Perhaps it could be phrased as 'If a language has a large body of native speakers, any recording should be of a native speaker. If a language is extinct or constructed but has a large body of non-native speakers, any recording should be of a proficient speaker using one of the conventional pronunciations’. Nicodene (talk) 20:09, 5 June 2024 (UTC)[reply]
it reminds me of the radio recordings of traffic conditions i used to hear on the radio, where both the tone and the speed were out of step. outside of acute distress, we just dont speak that way. basically it sounds like someone who was abducted and is being forced to read a letter saying "no im fine please dont look for me im definitely okay i promise" Soap 12:50, 5 June 2024 (UTC)[reply]
Phonology is also out of step; from native speakers announcing the stops in the tram I hear local toponyms being pronounced dodgily, since these speakers rarely get IPA transcriptions; even the pronunciations of municipality-level names from TV presenters are unreliable. It is impossible to guess and almost impossible to look up that Baumheide, Altenhagen, Hiddenhausen, Oerlinghausen but not Bad Oeynhausen, Ubbedissen and Asemissen are stressed at the second stem, and for Hövelhof I still don’t know whether ⟨v⟩ is /v/ or /f/ – IP corrected it to /v/, from a local public broadcaster’s newsreader I remembered it /f/. There is a lot of confirmation bias and no source, let alone a reliable one, for de.Wikipedia’s claim that Hiddenhausen and Oerlinghausen are stressed at the onset. Fay Freak (talk) 13:19, 5 June 2024 (UTC)[reply]
Can this policy be generalized to dealing with just any low quality audio? E.g. anything with disruptive background noise too. Also rather than deleting, maybe it's more productive to aggressively categorize and label them as something that needs replacement? The deleted low quality audio samples won't just disappear from the net and may be re-added again by the less attentive editors or bots. It would probably help if https://lingualibre.org/wiki/User:Olafbot could prioritize replacement of such known low quality recordings over adding the totally new audio samples. Pinging @Olaf in case he might be interested in this discussion.
As for the possible replacement of the current artificially synthesized isolationism audio, the https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-eng?from=isolationism link lists one human-recorded sample. But it's a sample recorded by a native Mandarin Chinese speaker and, despite that, it's used by fr.wiktionary.org and pl.wiktionary.org. This is also not ideal in my opinion. --Ssvb (talk) 15:02, 5 June 2024 (UTC)[reply]
And to give an example, just listen to the click in the iridium audio sample. I encountered a lot of samples with similar or even worse defects, but can't easily find them offhand right now.
What is a Wiktionary editor supposed to do upon spotting such audio? Just simply let it be because it had been recorded by a native speaker? Remove it? Label it somehow? --Ssvb (talk) 18:03, 5 June 2024 (UTC)[reply]
If the audio is particularly bad, please do bring it up for discussion (compare e.g. Wiktionary:Tea_room/2013/February#Dutch_enig.2C_Audio_file_file:Nl-enig.ogg; note that the current audio file is different than the one that was there when that discussion happened); if it's bad, people will probably agree on removing it; if a lot of files by a particular speaker have problems, we may want to remove them all systematically and try to 'blacklist' the user's files (Metaknowledge was spearheading a project to do this, but has been inactive). I like the idea of listing known bad recordings somewhere after removing them from entries, so they can hopefully be replaced (either outright overwritten with a better file, or someone just records and uploads a separate file). - -sche (discuss) 18:19, 5 June 2024 (UTC)[reply]
@-sche: I understand and fully agree with starting a public discussion when bad faith is suspected, such as covert vandalism or if a person with an obvious accent has the audacity to pretend to be a native speaker.
However, this is simply not workable in all other cases due to the excessive bureaucracy involved. You already set the bar at "particularly bad", possibly to limit the scope and the amount of paperwork. But even "moderately bad" or "slightly bad" audio samples shouldn't be normally desirable. Using Lingua Libre, it's possible to easily record more than 100 audio samples in less than one hour if one is up to it. Another factor is that beginners are likely to systematically record and upload a certain percentage of low quality audio samples simply due to lack of experience. Jumping the gun to harass or blacklist the users for this is also counterproductive, because this is a sure way to lose a potentially valuable contributor.
My suggestion is to simply add a new parameter to Template:audio for flagging low quality audio samples. The parameter value can be a short text description: "wrong accent", "noise", "clipped", "muffled", "synthetic", etc. So that when I encounter a bad quality audio sample, I can just spend a few seconds on a quick edit to flag it. When this process is established, the problematic words can be automatically added to a Lingua Libre list, similar to this one: https://lingualibre.org/wiki/List:Eng/Lemmas-without-audio-sorted-by-number-of-wiktionaries (so that the Lingua Libre contributors know what to prioritize when recording their audio samples).
Please look at the same iridium audio sample again. Is it so bad that it needs an urgent removal or a public discussion? Maybe not. Would it be a good idea to eventually replace it? Yes, of course. --Ssvb (talk) 17:14, 6 June 2024 (UTC)[reply]
@-sche: And here's another example: the Ukrainian хлор (xlor) sounds almost indistinguishable from хор (xor) because the sound "л" is missing.
It's interesting that another audio sample also recorded by @Tohaomg drops a different sound ("р" instead of "л") in хлорметан. And I can actually hear a click in place of the missing sound. After searching a bit, I found a known problem https://lingualibre.org/wiki/LinguaLibre:Technical_board/Audio_click_bug#HIGH_PRIORITY:_Audio_recordings_have_dust_and_clicks, which was likely fixed only in 2023.
Anyway, even though the audio samples likely got corrupted because of a bug in the recorder application, I see this primarily as a QA issue. Corrupted audios shouldn't normally be uploaded to commons.wikimedia.org by the person who recorded them. --Ssvb (talk) 02:47, 9 June 2024 (UTC)[reply]
@Benwing2: What's your opinion? Should we label problematic audio samples as |a=synthetic or |a=defective? Or is a better solution needed? --Ssvb (talk) 03:01, 9 June 2024 (UTC)[reply]
@Ssvb I added support for a |bad= parameter for labeling bad audio recordings with arbitrary text. You can see it in action in User:Benwing2/test-audio (specifically, the last example under the "Production" section). The "bad recording" note should appear boldfaced in red, but it may take 5-10 minutes for it to appear this way as I just added the appropriate specs to MediaWiki:Common.css for this and it takes a few minutes after doing so for the changes to propagate. Let me know if this is helpful or if you want some other param. Benwing2 (talk) 03:16, 9 June 2024 (UTC)[reply]
Also, uses of this param are currently tracked using the WT:Tracking mechanism, by visiting Special:WhatLinksHere/Wiktionary:Tracking/audio/bad-audio and Special:WhatLinksHere/Wiktionary:Tracking/audio/bad-audio/LANG for a specific lang code. Maybe this should be made into a category. Benwing2 (talk) 03:19, 9 June 2024 (UTC)[reply]
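For editors who want to find candidates for the |bad= flag described above, the following is a minimal Python sketch, not the tracking mechanism itself. It scans a chunk of wikitext for simple {{audio}} calls and lists files that are already flagged with |bad= or whose filename or caption mentions "synth". The filenames in the sample are hypothetical, the regex assumes no nested templates inside the call, and the "synth" heuristic is only an assumption based on the "CA synth" captions mentioned at the top of this thread.

```python
import re

# Matches a simple {{audio|...}} call. Assumption: no nested templates inside it.
AUDIO_RE = re.compile(r"\{\{audio\|([^{}]*)\}\}")


def parse_audio(wikitext):
    """Yield (positional_params, named_params) for each simple {{audio}} call."""
    for match in AUDIO_RE.finditer(wikitext):
        positional, named = [], {}
        for part in match.group(1).split("|"):
            key, sep, value = part.partition("=")
            if sep:
                named[key.strip()] = value.strip()
            else:
                positional.append(part.strip())
        # Expect at least a language code and a filename.
        if len(positional) >= 2:
            yield positional, named


def needs_replacement(wikitext):
    """Return filenames flagged with |bad= or mentioning 'synth' in name or caption."""
    flagged = []
    for positional, named in parse_audio(wikitext):
        filename = positional[1]
        # Captions may be given positionally or via named parameters such as a=.
        labels = " ".join(positional[2:] + list(named.values())).lower()
        if "bad" in named or "synth" in labels or "synth" in filename.lower():
            flagged.append(filename)
    return flagged


# Hypothetical sample wikitext; none of these filenames are real.
sample = """
* {{audio|en|En-ca-synth-example.ogg}}
* {{audio|en|En-example-human.ogg|a=US}}
* {{audio|en|En-example-click.ogg|bad=audible click}}
* {{audio|en|En-example-caption.ogg|Audio (CA synth)}}
"""

print(needs_replacement(sample))
```

A list produced this way would still need a human to listen to each file before anything is flagged or removed; the sketch only narrows down where to look.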
I wouldn't be a priori opposed to artificially produced audios if native speakers can vouch for their sounding natural.  --Lambiam 14:28, 6 June 2024 (UTC)[reply]
@Lambiam: This is a slippery slope and it's not always easy to tell the difference between natural and non-natural. Audios may be just slightly unnatural and people would hesitate to discard them. That said, I don't mind having synthetic audios as a temporary placeholder, but only if they are always clearly labelled as such. And only if they are added to a publicly visible to-be-replaced list. --Ssvb (talk) 17:25, 6 June 2024 (UTC)[reply]
I have no strong feelings about this, but note that occasionally an audio file presumably produced by a flesh-and-blood native speaker may sound off as well (and sometimes even plainly wrong). In the end, whatever the means of production, the quality of the result needs to be assessed and ensured by native speakers.  --Lambiam 17:34, 6 June 2024 (UTC)[reply]
Sure, not everyone is a professional voice actor. But a synthetic audio is like a synthetic flower. Some aspects of it are as good or even better than the real thing. Yet the other aspects are different, possibly in a subtle way. --Ssvb (talk) 19:10, 6 June 2024 (UTC)[reply]
Hard against. Vininn126 (talk) 17:30, 6 June 2024 (UTC)[reply]
I don't think it's a good idea that AI will learn humans how to speak "right". We've already got humans who are doing it wrong. Tollef Salemann (talk) 13:29, 23 June 2024 (UTC)[reply]
@Tollef Salemann: Did you mean "AI will teach humans"? I don't like this idea either. --Ssvb (talk) 14:29, 23 June 2024 (UTC)[reply]
Support removing synthesized audio. — SAMEER (؂؄؏) 04:21, 9 June 2024 (UTC)[reply]
Support synthesized/AI audio on the condition that it's indistinguishable from a natural human voice. The ones listed above are extremely robotic-sounding which I dislike. Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]
@Ioaxxere: Who will decide that it's indistinguishable? Even native speakers sometimes can't notice a foreign or a regional accent unless they pay close attention to very specific subtle details. --Ssvb (talk) 17:08, 23 June 2024 (UTC)[reply]
Oppose synthesized/AI audio. IPA by non-natives depends on the expertise and ear of the person who originally recorded the information; sometimes it's very good. At any rate, one doesn't have to speak a language to record what has been heard. Audio by non-natives is a lie: sometimes a harmless white lie, but always a lie. Chuck Entz (talk) 15:23, 23 June 2024 (UTC)[reply]
Oppose synthesized/AI audio. Such synthesized audio would be squatting on Wiktionary pages, effectively preventing audio samples recorded by humans from finding their way there. --Ssvb (talk) 17:24, 23 June 2024 (UTC)[reply]

Anti-intensifiers and the epidemic of British meiosis

At the moment our entry for maybe lists the following sense:

  1. (UK, meiosis) Certainly

Similarly, we find the following under a bit:

  1. (UK, meiosis) Very.
  2. (UK, meiosis) A lot.

and the following under somewhat:

  1. (UK, meiosis) Very

The problem is that, as far as I am aware, every single word or phrase that carries a sense of moderation is fair game for ‘meiosis’. In no particular order I cite slight, modest, mild, decent, small, minor, light (adj.); relatively, perhaps, to some extent, fairly, a little, possibly, not exactly; might, could, seems; scuffle, tiff, misunderstanding.

The flipside of this is that people can and (especially in the UK) do assign sarcastic senses to any word that denotes a positive quality: genius, fantastic, brave, brilliant, revolutionary, creative, and so on.

Both meiosis and sarcasm are, I think, cultural/metalinguistic and as such beyond the purview of a dictionary. Nicodene (talk) 00:24, 6 June 2024 (UTC)[reply]

Are there examples where words actually acquired new meanings through meiosis? What exactly distinguishes it from understatement? —Caoimhin ceallach (talk) 00:46, 7 June 2024 (UTC)[reply]
Not that I'm aware of, and I don't believe there is a difference other than meiosis coming off as ‘a bit’ affected. Nicodene (talk) 00:53, 7 June 2024 (UTC)[reply]
I do think there is a place for meiosis in Wiktionary as it can contribute to etymology. As mentioned in the Wikipedia article on meiosis, the Australian 'outback' is one example where the word did acquire a new meaning through meiosis. It was originally used as a meiotic comparison to the back yard of a house, but is now commonly used without that comparison in mind. That said, I would agree that meiosis should not be included as an additional sense in each of the entries you referenced. If meiotic or sarcastic senses are to be included at all, I suggest it be in usage notes, as in nice. Pangur Bán & I (talk) 06:21, 9 June 2024 (UTC)[reply]
On balance I think you are right, this is not worth a separate sense line, at least in those entries. (As Pangur says, there are cases where meiosis seems to become lexical, like pond.) I suppose the (small-c) conservative thing to do would be to conserve the information in a usex like the one showing sarcastic use at Sherlock, or move the quotes under the 'regular' sense. Indeed, it is nonobvious to me how one discerns that the "somewhat weatherbeaten" quote at somewhat is meiosis, anyway; do I need to have knowledge of the real condition of the train in question to know the writer is understating its weatherbeatenness, and out of meiotic intent rather than misassessment? - -sche (discuss) 16:31, 18 June 2024 (UTC)[reply]

Kyakhta Russian–Chinese Pidgin

I suggest adding Kyakhta Russian–Chinese Pidgin in addition to the existing pidgins based on Russian: Mednyj Aleut (mud), Russenorsk (crp-rsn), Solombala English (crp-slb), Taimyr Pidgin Russian (crp-tpr). AshFox (talk) 08:18, 6 June 2024 (UTC)[reply]

Support Protegmatic (talk) 18:13, 8 June 2024 (UTC)[reply]
Do you have any plan for how to make the entries for Kyakhtian? The only clear feature it has is the suffix -la, and you can't even always use it. There are no clear grammar, spelling, or pronunciation records. Also, I remember rumors that there are some Chinese records of this pidgin, and that there were some problems with them as well. I have thought about this pidgin for a long time and just gave up, because I'm not sure how to make it structured enough for Wiktionary. Anyway, good luck with this work if you decide to do it. The pidgin is a mess, but it has many cool words worth mentioning on Wiktionary. Tollef Salemann (talk) 22:02, 8 June 2024 (UTC)[reply]
But as for adding its own language code, I fully support it. Tollef Salemann (talk) 22:04, 8 June 2024 (UTC)[reply]
Oh, yeah, there is also -shek-/-nek- instead of Russian -shk-/-nk- but I guess that it is just the Chinese pronunciation, and not really a pidgin grammar feature. Tollef Salemann (talk) 22:07, 8 June 2024 (UTC)[reply]

Full stops after templates like {{synonym of}}

Should automatic full stops be added after templates used in definitions like {{clipping of}}, {{short for}}, and {{synonym of}} (with the option to turn it off)? @Sgconlaw suggested starting this discussion after I manually added one to sacrifice. J3133 (talk) 13:20, 6 June 2024 (UTC)[reply]

Support (for English; separate discussion for other languages desirable). In fact, {{clipping of}} and {{short for}} already automatically add a full stop at the end. I think it makes sense to have a full stop automatically added for other templates like {{synonym of}} (with the option to turn it off in appropriate cases) for consistency with the earlier-mentioned templates, and because we treat our definitions for English entries like sentences, starting them with a capital letter and ending them with a full stop. — Sgconlaw (talk) 14:25, 6 June 2024 (UTC)[reply]
Def templates usually benefit from having full stops. Support for English, Oppose for other languages in light of Ben's argument. Vininn126 (talk) 09:30, 7 June 2024 (UTC)[reply]
Support (edited to add: for English only, in agreement with Benwing below); IMO, the templates should use the langcode to format themselves the way definitions are formatted, capital + period for English, lowercase + no period for other languages. That might need separate/more discussion, because some people disagree and want capital + period for all languages/definitions, or want lowercase + no period for all langs/definitions, but in any event having some [but only some] templates do "capital but no period" is not consistent with anything. Let's check for cases where these are followed by (a) a manual period a bot should remove if we have the template start supplying them, or (b) something else, such as another template or gloss, which we / a bot might solve by adding nodot=; I sometimes see (or do!) things like "{{altcase|en|fooh}}: {{altform|en|foo}}". - -sche (discuss) 17:54, 6 June 2024 (UTC)[reply]
Oppose. It's trivial to add punctuation: it's one keystroke. It's comparatively cumbersome to use parameters to disable unwanted punctuation: |nodot=1. Not automatically outputting punctuation is a more flexible design, more user-friendly, and less obtuse.
There have been cases in the past where someone has gone in and added auto-punctuation to long-standing templates, requiring lots of manual editing to fix existing wikicode where the templates were used mid-sentence.
The time gains from auto-punctuation are trivial. The time losses are substantially larger. ‑‑ Eiríkr Útlendi │Tala við mig 20:33, 6 June 2024 (UTC)[reply]
Hmm, this is a testable argument: T:alternative form of (for example) is used on 174,660 pages, so the cost of adding the missing period to them all would in fact not be 1 keystroke, but 174,660 keystrokes. [If we only add dots for English, the number will be lower but we also won't have to worry about adding nodot= to non-English entries... in any event, by ratio,] it seems like somewhere in the vicinity of 1/8th of affected entries would have to be putting other text (besides a period) after the template in order for the keystroke argument to support defaulting to no dot, rather than supporting defaulting to dots. Can anyone check what's the case? I suspect the number which are putting other text after the template is in fact far, far smaller than 1/8th, but I could be proven wrong! (In most cases, I think whatever display would be correct in the majority of cases should be the default display.) If we do decide the default display should be no dot, I hope someone will write a bot to add dots where they're missing, since at present this is not done and entries just sit around with their normal definitions and these templates looking inconsistent. - -sche (discuss) 21:29, 6 June 2024 (UTC)[reply]
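One rough way to frame the keystroke comparison above, as a sketch only. The cost model is an assumption for illustration: typing a missing period costs 1 keystroke, typing `|nodot=1` costs 8, and a fraction `p_no_dot` of template uses need no final dot. Only the 174,660 page count comes from the comment above; the function names are made up.

```python
from fractions import Fraction

DOT_COST = 1                   # keystrokes to type "." by hand (assumed)
NODOT_COST = len("|nodot=1")   # keystrokes to type the opt-out parameter: 8


def total_keystrokes(uses, p_no_dot, default_dot):
    """Keystrokes spent correcting the default across all uses.

    If the template adds a dot by default, only the uses that need no
    dot cost anything (typing |nodot=1). If it adds no dot by default,
    only the uses that need a dot cost anything (typing ".").
    """
    if default_dot:
        return uses * p_no_dot * NODOT_COST
    return uses * (1 - p_no_dot) * DOT_COST


def break_even_fraction():
    """Fraction of no-dot uses at which both defaults cost the same.

    Solves (1 - p) * DOT_COST == p * NODOT_COST for p.
    """
    return Fraction(DOT_COST, DOT_COST + NODOT_COST)
```

Under these assumptions the break-even point is 1/9 of uses, in the same vicinity as the 1/8 estimate above. If fewer uses than that are followed by other text, defaulting to a dot costs fewer keystrokes overall; the real proportion would still need to be measured.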
On Translingual entries, I have been forced to add "nodot=1" for synonyms templates used within {{taxon}} (which has a default period) and for instances of "See {{specieslite}} and {{pedia}} for other species". I'd be happy to forgo the default period in {{taxon}} for the benefit of consistency in the need to consider the punctuation needs of the entry. DCDuring (talk) 15:12, 7 June 2024 (UTC)[reply]
Oppose per Eirikr. If this is an issue, I can start adding full stops to the entries I create, or someone can make a bot do that; automatically printing a full stop always creates a whole headache when someone needs to remove it. Thadh (talk) 21:12, 6 June 2024 (UTC)[reply]
Support for English, Strong oppose for other languages. I believe as a general rule that all form-of templates should auto-generate capital letters and final periods (full stops) only for English (if that), and should default to lowercase and no periods for all other languages. I have wanted to implement that for all form-of templates instead of the morass of randomness we currently have, but need to get consensus for it. Benwing2 (talk) 22:26, 6 June 2024 (UTC)[reply]
OK, I see User:-sche agrees with me, but has phrased it using "support". I think we should have a separate poll to implement this option. Benwing2 (talk) 22:28, 6 June 2024 (UTC)[reply]
@Benwing2: mmm, isn't your view in support of J3133's proposal, at least where English is concerned? I also have no objection if it is felt that for non-English languages there should not be an initial capital letter or terminal full stop (though I'm unclear why). — Sgconlaw (talk) 22:29, 6 June 2024 (UTC)[reply]
@Sgconlaw User:J3133 did not qualify their proposal with a restriction to English; I'm strongly opposed to making this a blanket addition to all languages, which is what the proposal suggests on its face value. Benwing2 (talk) 22:37, 6 June 2024 (UTC)[reply]
@Benwing2: We (Sgconlaw and I) were discussing English entries, but I forgot to mention it. J3133 (talk) 06:05, 7 June 2024 (UTC)[reply]
Abstain The current state of capitalisation and full stops in definitions is pretty chaotic. I think it doesn't make much sense to change some templates one way or the other before reaching consensus on punctuation for each type of definition (English and non-English, lemma and non-lemma, gloss and non-gloss). After deciding on that, the inclusion of automatic full stops in form-of templates may be worth another discussion. Personally, I think most (or all) non-gloss definitions, including those which use form-of templates, should have capital letters and full stops in both English and non-English entries. Einstein2 (talk) 23:07, 6 June 2024 (UTC)[reply]
Support for English, Oppose for other languages, in strong agreement with Benwing above. — Vorziblix (talk · contribs) 00:30, 7 June 2024 (UTC)[reply]
Support Per User:Benwing2 and User:-sche. Ioaxxere (talk) 04:28, 7 June 2024 (UTC)[reply]
Support for English, Oppose for other languages. Same thing with capitalization (capitals for English, lowercase for other languages). However, there should be a "nodot" parameter on all of these. Sometimes it's useful to add information (that should be part of the same sentence) after the template. Andrew Sheedy (talk) 05:02, 7 June 2024 (UTC)[reply]
@Andrew Sheedy Agreed. Whenever a template auto-capitalizes or auto-adds a final period, there should be (and usually are) |nocap=1 and |nodot=1 params to disable the capitalization and auto-period. Benwing2 (talk) 06:23, 7 June 2024 (UTC)[reply]
Support for English, Oppose for other languages (as this proposal’s initiator). J3133 (talk) 06:28, 7 June 2024 (UTC)[reply]
Abstain for English, given current practice which I would not have begun but now we have it, Oppose for other languages and strongly support the opposite. Fay Freak (talk) 09:53, 7 June 2024 (UTC)[reply]
Oppose for English and Translingual. Abstain for other languages. The nodot=1 option required to maintain flexibility in the use of the template is an annoyance. Prohibiting use of such templates except in prescribed cases and in prescribed manner needs some kind of justification that I haven't seen here. I hope we aren't going in the direction of "Everything that is not mandatory is forbidden." DCDuring (talk) 14:38, 7 June 2024 (UTC)[reply]
Somewhat oppose for all languages, strong oppose having English as a special case apart from other languages. It not only might confuse editors, it will definitely confuse editors. — SURJECTION / T / C / L / 21:56, 7 June 2024 (UTC)[reply]
Oppose -- Sokkjō 05:51, 9 June 2024 (UTC)[reply]

Comment - it appears we have consensus to not include final full stops/periods (and probably not initial capitalization either) in non-English form-of templates, but no obvious consensus for English form-of templates. It appears there are two options, either include them by default with English or don't include them. The latter makes English consistent with non-English, but the former is closer to existing practice (in many cases, at least). Some thoughts:

  1. Would it make a difference in your voting if there were a one-character way of turning off initial-caps and/or final period/full-stop? For example, a symbol like ^ or > (just brainstorming here, maybe there are better symbols, and it doesn't have to be the same symbol at the beginning and the end) could be placed at the beginning to suppress the initial caps and at the end to suppress the final period.
  2. How strongly do you feel about the inclusion or non-inclusion of initial caps and final periods for English? (E.g. for me, I could go either way with English; what I feel strongly about is that initial caps and final periods should *NOT* be present for non-English.)

Benwing2 (talk) 09:18, 9 June 2024 (UTC)[reply]

Assuming we are still talking about "templates like {{synonym of}}", I am considering abandoning their use embedded in {{taxon}} because {{syn of}} doesn't accept nocap=1. If the final period is mandated, then there would also be an extra period. DCDuring (talk) 19:25, 9 June 2024 (UTC)[reply]
@DCDuring Are you sure that {{syn of}} doesn't accept |nocap=1? It's documented to accept it and internally it sets |withcap=1, which simultaneously turns on initial capitalization and adds a |nocap= option to turn it off. Also as mentioned above, I am thinking of adding a feature to make it easier (fewer keystrokes) to turn off the initial caps/final period. Benwing2 (talk) 19:39, 9 June 2024 (UTC)[reply]
I'll try again. See Geomalia for the look at present. DCDuring (talk) 20:42, 9 June 2024 (UTC)[reply]
@DCDuring Looks good to me. Benwing2 (talk) 20:51, 9 June 2024 (UTC)[reply]
That's after I corrected my error. Previously: [8]. I still wish that the italics could be removed (optionally: noi=1), so italicized taxa could appear italicized in Translingual/taxonomic definitions that use {{syn of}} within {{taxon}}. DCDuring (talk) 21:34, 9 June 2024 (UTC)[reply]
@DCDuring I could implement that, although at that point I wonder if it wouldn't be better just to manually write out "synonym of"; the template doesn't categorize so there seems little point in using it if you have to add a bunch of flags to get non-default behavior. Benwing2 (talk) 21:54, 9 June 2024 (UTC)[reply]
It might be better for me to fork out {{taxonsyn}} for the increasing number of cases where I embed a synonymous taxon in a definition within {{taxon}}. Those 'lesser' taxon definitions merit less detail than 'real' taxon definitions, so excluding them from searches for 'incomplete'/improvable entries is desirable. The formatting peculiarities of taxa derived from italicization (and perhaps the meaning of synonym) may justify otherwise undesirable forking. DCDuring (talk) 22:12, 9 June 2024 (UTC)[reply]
I am not familiar with the ins and outs of taxon formatting, but in general if you are doing something repeatedly, it makes sense to have a dedicated template or template parameter for it. Benwing2 (talk) 22:35, 9 June 2024 (UTC)[reply]
Support for English, Oppose for other languages. @Benwing2 I note many (not all) of the full oppose votes are from editors who don’t edit English, who may have not realised the growing consensus for separate options. Theknightwho (talk) 11:18, 9 June 2024 (UTC)[reply]
Oppose for English, Support for other languages. PUC 13:20, 9 June 2024 (UTC)[reply]
@PUC: But we (including you) do not use full stops for non-English definitions. J3133 (talk) 13:33, 9 June 2024 (UTC)[reply]
He's goofin. Vininn126 (talk) 13:38, 9 June 2024 (UTC)[reply]
I feel strongly about having all English definitions start in a capital letter and end in a period, because my initial reason for becoming a Wiktionary editor was to rectify inconsistencies like that. But I would also like it to be based on consensus and not be forced on people who feel strongly the other way. Andrew Sheedy (talk) 18:16, 9 June 2024 (UTC)[reply]
Support. Imetsia (talk (more)) 22:22, 9 June 2024 (UTC)[reply]
@Imetsia Can you clarify? What do you support exactly, and for English or non-English? Benwing2 (talk) 22:35, 9 June 2024 (UTC)[reply]
I support automatically adding full stops in templates like {{synonym of}}, for both English and non-English. I've wanted this for Italian entries for quite a while by now. Imetsia (talk (more)) 22:49, 9 June 2024 (UTC)[reply]
Support, with the automation that nodot be enabled for non-English by default. While at it, I also support nocap be enabled for non-English by default but I have not seen editors follow that convention as rigorously (I do, though, since I was told it is prescribed to do so). Svartava (talk) 07:11, 12 June 2024 (UTC)[reply]
Oppose. Full stops are a nuisance where they are included in the template, especially when adding text after it, and a comma is needed. DonnanZ (talk) 23:47, 12 June 2024 (UTC)[reply]

OK, from what I can tell, consensus is in favor of initial caps for English and no initial caps otherwise. There seems to be a general consensus in favor of final period for English and no final period otherwise (cf. around 10-6) so I'm going to proceed with this. To address the concern about the annoyance of turning off the initial caps and final period, there will be a single-character way of turning them off for English. I'm leaning towards using a ~ char (meaning "switch"), which if it comes before the language code will turn off initial caps and if it comes after the language code will turn off final period, hence:

{{alt form|en|foobar}} -> Alternative form of foobar.
{{alt form|en~|foobar}} -> Alternative form of foobar
{{alt form|~en|foobar}} -> alternative form of foobar.
{{alt form|~en~|foobar}} -> alternative form of foobar

The alternative is to put one or both of the switch characters before or after the lemma rather than the language code. I'm planning on doing this gradually, starting with a single lesser-used form-of template. There are 142 form-of templates in Category:Form-of templates so this won't happen overnight. Benwing2 (talk) 00:15, 20 July 2024 (UTC)[reply]
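To make the proposed switch concrete, here is a minimal Python sketch of how the first argument could be interpreted. It is only an illustration of the behaviour described above, not the actual module code: the function names and the default label are hypothetical, and the choice to ignore `~` for non-English codes (which get no caps and no period anyway) is an assumption.

```python
def parse_lang_switch(arg):
    """Split an argument such as '~en~' into (lang, cap, dot).

    A leading '~' turns off the initial capital; a trailing '~'
    turns off the final period.
    """
    cap = not arg.startswith("~")
    dot = not arg.endswith("~")
    lang = arg.strip("~")
    return lang, cap, dot


def render_form_of(arg, lemma, label="alternative form of"):
    """Render the definition text for a form-of template call."""
    lang, cap, dot = parse_lang_switch(arg)
    # Per the consensus summarised above, only English gets an
    # initial capital and a final period by default.
    if lang != "en":
        cap = dot = False
    text = f"{label} {lemma}"
    if cap:
        text = text[0].upper() + text[1:]
    if dot:
        text += "."
    return text
```

With this sketch, `render_form_of("en~", "foobar")` gives "Alternative form of foobar" and `render_form_of("~en", "foobar")` gives "alternative form of foobar.", matching the four examples listed above.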

@Benwing2: how's this coming along? Just wondering, as this came up (and see also "Talk:fireball"). — Sgconlaw (talk) 13:30, 13 August 2024 (UTC)[reply]
I will get to it at some point but I have a lot of things on my plate, as always ... Benwing2 (talk) 08:27, 15 August 2024 (UTC)[reply]
@Benwing2: indeed! Thanks. — Sgconlaw (talk) 10:28, 15 August 2024 (UTC)[reply]

Old Franconian

Old Franconian is a language variety derived from Frankish, and has many languages within West Central German like Luxembourgish, Rhine Franconian, East Franconian, and Central Franconian. See this. That Northern Irish Historian (talk) 03:49, 7 June 2024 (UTC)[reply]

Collapsible lists within definitions

I propose that for cases in which definitions include lists (especially long ones), it be adopted as best practice to make said lists collapsible with a template such as those existing for quotations and semantic relationships (or one based on a code I cobbled together to attempt this for the list of place names in Eden). I believe this would be worthwhile to help streamline some unwieldy pages, prioritizing definitions and relationships. @Soap @Ioaxxere Pangur Bán & I (talk) 21:41, 7 June 2024 (UTC)[reply]

I support this. Hopefully if we approve this we can base it on existing code like that of {{collapse}} or {{collapse-top}} (neither of which will work inside a list as of yet) so that it can be guaranteed to work on all browsers. Soap 21:52, 7 June 2024 (UTC)[reply]
@Soap, you said before that you would be willing to assist me in drafting this proposal. What are your thoughts on this in light of DCDuring's opposition? Pangur Bán & I (talk) 22:19, 17 June 2024 (UTC)[reply]
I only meant I could help start the post since you're a new user and I felt you might be too shy to come here outright. But you have a good understanding of the issue and how to express yourself, so right now I don't have anything else to add. Soap 17:44, 18 June 2024 (UTC)[reply]
I gotcha. Thanks anyway! Pangur Bán & I (talk) 17:50, 18 June 2024 (UTC)[reply]
Support although there aren't that many pages where collapsed definitions are worth using. Maybe Mandarin màn (consider someone searching for the Vietnamese entry)? Ioaxxere (talk) 22:48, 7 June 2024 (UTC)[reply]
Thank you for your support. To your point, as noted by DCDuring, this does seem to be primarily an issue with toponyms. See entries like Chester, Richmond, Franklin, and Weston for a few examples of bloated pages where I think collapsible lists of subsenses would be worth using. Pangur Bán & I (talk) 15:54, 17 June 2024 (UTC)[reply]
Support but I don't want it to make a box around the sub definitions when you expand it, cuz I think that's kinda ugly. That is, if it's even possible to do that. — SAMEER (؂؄؏) 04:25, 8 June 2024 (UTC)[reply]
That is absolutely possible. I gave my makeshift collapsible list a border just to make it visually distinct, but in hindsight I think it would make more sense for something like this to more closely follow the style of the semantic relations and quote templates, just in a bulleted or numbered list format unlike those. Pangur Bán & I (talk) 16:20, 17 June 2024 (UTC)[reply]
Oppose. Not a complete proposal. It's just based on the Eden anecdote. By my lights it would have to be restricted to definitions formatted as subsenses. As nobody seems to have analyzed the cases, perhaps we should wait to see how it would be applied to toponyms for now. DCDuring (talk) 15:12, 8 June 2024 (UTC)[reply]
@User:Geographyinitiative Any thoughts? DCDuring (talk) 20:58, 17 June 2024 (UTC)[reply]
I really have no opinion on the proposal. I will be fine with it if you do it. Please compare Washington County on Wiktionary with Washington County on Wikipedia and Category:Washington County on Wikimedia Commons. The solution sounds like an innovation beyond Wikipedia and Commons. I would want to find out if this has been discussed in Wikipedia, etc. Geographyinitiative (talk) 23:04, 17 June 2024 (UTC)[reply]
But what's the benefit in the case of Washington County? There is only one screenful of total content in the entry. DCDuring (talk) 16:50, 18 June 2024 (UTC)[reply]
Apologies for my perhaps poor phrasing. I would be absolutely fine with amending this proposition to be restricted to definitions formatted as subsenses and I would even support having a toponym-specific template. Though, I would still be in favor of having one for subsenses more generally as I think that would allow some editor freedom without any cost that I can see. Any thoughts? Pangur Bán & I (talk) 15:51, 17 June 2024 (UTC)[reply]
How many non-toponym entries would benefit from this? What criteria are to be applied, eg, number of subsenses, total number of definitions in PoS section, nature of supersense definition (Some are purely hypothetical for purpose of grouping. @User:-sche)? Others may have more questions and issues. I feel this might need a formal vote, not just a straw poll on this page. DCDuring (talk) 16:47, 17 June 2024 (UTC)[reply]
Your concerns about a general subsenses template are absolutely worth discussing, but before we move on to that, would you definitely support a toponym-specific collapsed-list template in the vein of the formatting of in-line collapsed quotations, and hypernyms, meronyms, etc. (but formatted as a bulleted or numbered list)?
Once the details are more hammered-out, a formal vote sounds like a great idea. My main trouble is that I don't have the coding knowhow to do a good job writing the template I'm envisioning. I don't know how I would go about producing a comprehensive count of how many entries would benefit, but block, cross, finger, head, stand, slash, and band are just a few non-toponyms I've found that I think could potentially use collapsible subsenses. As for requisite criteria for use, if you have any specific suggestions I'd genuinely love the help in fleshing out this proposal. The existence of two or more items seems to be the only hard criterion for quotations formatting and semantic relations templates, which seem fine models for something like this, but I'm happy to consider alternatives. Based on this poll, it would certainly seem that there is some interest in this functionality, and if it does reach the point of a formal vote, different options for potential criteria could easily be offered. Pangur Bán & I (talk) 18:51, 17 June 2024 (UTC)[reply]
If we don't begin to address the issues now, then it will not be possible to draft a meaningful proposal. At head we have two levels of subsenses. The first definition is "The part of the body of an animal or human which contains the brain, mouth and main sense organs.". Under this definition, the first subsense layer consists of two non-definitions: "(people) To do with heads." and "(animal) To do with heads." Would that first layer be visible or not under a yet-to-be specified proposal? DCDuring (talk) 20:58, 17 June 2024 (UTC)[reply]
I'm not set on anything and am entirely willing to continue workshopping this proposal. In pages like head, perhaps subsenses hosting a second layer of subsenses should not be collapsible under this prospective template. I see no problem with that if that's what you're suggesting. Pangur Bán & I (talk) 22:19, 17 June 2024 (UTC)[reply]
Neither of the two members of the first subsense layer at the first definition of head is a real definition. The first definition itself does not necessarily suggest the range of definitions at the second layer of subsenses. To me this is a specific sign that hiding subsenses can make it harder for less experienced users to find less common definitions. DCDuring (talk) 16:47, 18 June 2024 (UTC)[reply]
Support. Imetsia (talk (more)) 22:23, 9 June 2024 (UTC)[reply]
I am ambivalent about the idea of doing this to placenames, or long lists of Chinese "romanization of"s as also suggested above; I would not support collapsing 'real' definitions e.g. at take, even if there are very many. It seems like the number of placename entries which really have so many senses as to merit collapsing is small, and it seems like the sort of person who'd go to màn#Mandarin is someone interested in learning what it's a romanization of: why else wouldn't they go to or click through in the TOC to màn#Vietnamese? so collapsing just adds an extra step for them. In general it does not seem like that much of a hassle to scroll past placenames one is uninterested in. Whereas, collapsed content is easily missed, even by veteran editors who know to look for it (I myself often missed the existence of various inline -nyms under definitions back when they were autocollapsed, and have seen other veteran users miss collapsed etymology content), let alone new users. So I am ambivalent, leaning against it. - -sche (discuss) 16:15, 18 June 2024 (UTC)[reply]
Thank you for your consideration and your well articulated concerns. I have no opinion on the "romanization of" example as that's not something with which I have any experience myself, and in hindsight I do think my original proposal here is likely too broad. I don't really want every list of subsenses to be collapsed, but rather for this to be available as a tool in situations where it may be truly helpful, its usage being determined via consensus for edge cases.
For 'real' definitions, I would agree that genuinely distinct subsenses such as those in take probably shouldn't be collapsed, at least not by default (I think making them open by default but with the ability to collapse them could still be useful). I really had in mind entries wherein the "subsenses" are really just examples, which can be seen in some of the examples I cited (block, head#Noun sense 2, etc.).
My issue is less that it is inconvenient to scroll past them per se, but rather that lists of examples subordinate to the most common senses are effectively privileged over secondary senses that can be more prevalent/noteworthy than items in those lists. This, I think, is not conducive to efficiently absorbing the information, and rather counter to the purpose of ordering senses in the first place. Pangur Bán & I (talk) 17:30, 18 June 2024 (UTC)[reply]
We only have opinions, not facts, about the relative frequency of use of different definitions, the relative frequency of requests for different definitions, even of the time-period of use of definitions. I have trouble justifying the privileging of some contributor(s) opinions about what is to be listed first and what de-privileged by being rendered into subsenses. I also have trouble understanding why we discuss this in terms of the rights and privileges of definitions. Our concern is merely with users and their ability to navigate an essentially linear presentation of data, in which some data necessarily precedes other data. I'm afraid that tradeoffs are inevitable and that we have little reasoned basis to make them in general. DCDuring (talk) 20:47, 18 June 2024 (UTC)[reply]
I suppose 'privilege' was ill-chosen here. I meant 'prioritize', in the sense of placing one thing before another in sequence. As for privileging contributors' opinions about what is listed first, every Wiktionary entry that includes multiple senses already does that, in accordance with WT:SG#Definition sequence, with the frequency-based order determined via consensus, exactly as I'm proposing the usage of this template be. My concern is also with users and their ability to navigate the information, which is precisely why I am proposing this. Pangur Bán & I (talk) 21:41, 18 June 2024 (UTC)[reply]

I invite you to check out this new glossary format. Using JavaScript (User:Ioaxxere/auto-glossary.js), it automatically scrapes every entry in a certain category and finds definitions containing a certain label. To see the output, you will have to add the line importScript("User:Ioaxxere/auto-glossary.js"); into your common.js page. Here's what the output looks like: https://imgur.com/a/kKQLGSG.

I propose that we create more of these automatically-generated glossaries in Appendix space, as I think that they are very useful for keeping track of a certain category. Ioaxxere (talk) 05:45, 9 June 2024 (UTC)[reply]

Yeah, they should be efficient search engine spam, people land on when searching slang words. Fay Freak (talk) 08:20, 9 June 2024 (UTC)[reply]
I wonder if search engines will manage to index them, given that they are dynamically loaded. In general I'd think the outcome Ioaxxere is aiming for is better accomplished by frequently updating the actual wikitext of the page using a bot or script. This, that and the other (talk) 00:18, 24 June 2024 (UTC)[reply]
@This, that and the other: I'm not sure if this is possible given the relatively strict pagesize limits, but it's definitely worth trying. Who would be available to run the bot? (note that we may end up with hundreds of these glossaries) Ioaxxere (talk) 05:07, 24 June 2024 (UTC)[reply]
@Ioaxxere Good point about page size limits. It just doesn't "feel" like the right thing to do using client-side JS... but maybe that's just the old-school web developer in me talking. Is there at least some pagination or limiting in the event the number of glossary entries exceeds a certain number?
Having said that, the concept of auto-generated glossaries, however implemented, is undeniably good for Wiktionary, so if no-one else objects (or offers to implement it) let me know and I'll put your JS into place as you request, and we can review it later if a better solution appears. This, that and the other (talk) 12:15, 24 June 2024 (UTC)[reply]
@This, that and the other Before you do that, I should mention: since the template makes up virtually all of the content on the pages on which it is used, it would make sense for the script to run before everything else — maybe even at the very top of MediaWiki:Common.js. At the very least we should optimize away User:Ioaxxere/auto-glossary.js#L-143. Ioaxxere (talk) 04:51, 27 June 2024 (UTC)[reply]
@This, that and the other: It looks like no one has offered any comments or suggestions in the past two weeks. Ioaxxere (talk) 21:24, 8 July 2024 (UTC)[reply]
Neat. Vininn126 (talk) 18:38, 9 June 2024 (UTC)[reply]
Should this gadget be enabled by default? Notifying a few active interface administrators: @-sche, Benwing2, Surjection, This, that and the other. Ioaxxere (talk) 08:08, 23 June 2024 (UTC)[reply]

standardizing the form of phrase lemmas

This is based on a discussion in WT:RFM originally concerning tail wagging the dog, which someone proposed moving to the tail wags the dog. User:Theknightwho asked about general conventions, and I suggested the following:

  1. try to avoid "one" or "someone" in a lemma unless it's unavoidable, e.g. it's in the possessive; so kiss goodbye not kiss one goodbye or kiss someone goodbye;
  2. if "one" or "someone" needs to be expressed, use "one" if it is the same as the subject, "someone" otherwise; hence kiss one's ass goodbye is correct, not kiss someone's ass goodbye; take someone's word for it is correct, not take one's word for it (which is correctly a redirect); but someone's ass off should be one's ass off (the latter is incorrectly a redirect to the former);
  3. use the infinitive for verbs occurring at the beginning of an expression (in a verb-object phrase), but the simple present for verbs occurring with a subject (hence the tail wags the dog not the tail wagging the dog; time stands still not time stood still, time standing still, time stand still, etc.);
  4. there should be something about whether to include the word "the", e.g. in tail wags the dog or the tail wags the dog.

User:DCDuring asked:

Those seem like good rules to me. There is an interaction with what I think is our preference not to have headwords with leading the. Also, to clarify, when you say infinitive you mean the 'bare infinitive', not the 'to infinitive'. When should something be used instead of someone? (Does it depend on the relative frequency of use of the expression with non-gendered things? Threshold?) Are there circumstances in which we would go with a different lemma headword? Should we have alt form entries for some of the inflected and other variant forms or just hard redirects? I don't know how complete we should try to be. Too much detail might delay implementation and course correction. DCDuring (talk) 01:53, 7 June 2024 (UTC)[reply]

To which I replied:

These are good questions. You are right that I mean "bare infinitive" rather than "to-infinitive". As for something vs. someone, I think if it can reasonably occur with both, one should be a soft redirect to the other. Generally I prefer soft redirects over hard redirects, although I understand that hard redirects are easier to enter. Another issue is, what's the inanimate equivalent of one's? Is it its? I will bring these rules to the BP and see what people say. Benwing2 (talk) 03:01, 7 June 2024 (UTC)[reply]

The suggestion is to put these in the WT:Style guide rather than WT:Entry layout (which requires a vote to make any substantive changes). Does anyone have any thoughts or additional suggestions for standardization rules? Benwing2 (talk) 09:06, 9 June 2024 (UTC)[reply]

Definitely agree on point 2, which is WT:CFI#Pronouns already. Re point 4, Wiktionary:Tea room/2023/December#Proverb_entries_starting_with_"the" suggested more people want to include the in proverbs than don't (obviously only for phrases that can include the; nobody is moving →*the Rome wasn't built in a day), hopefully a wider discussion finds a wider consensus. (Maybe we can even determine whether to standardize the situation with short the X phrases / nouns: we have the bomb, but (after some TR discussion) the talk is a redirect to talk; the Netherlands redirects to Netherlands, but the Rock is an entry, and I don't think anyone would dream of moving The Hague.) I advocate redirects from whichever form we don't lemmatize to whichever we do. Point 3 seems reasonable; there too I advocate redirects from other common forms (e.g. the tail is wagging the dog). If we remove the object from the entry title (point 1), I hope we strongly encourage people to add usexes or citations showing where in the phrase the object goes, because sometimes it's [verb] [other word] someone and sometimes it's [verb] someone [other word] and sometimes it's other possibilities. - -sche (discuss) 17:29, 9 June 2024 (UTC)[reply]
@-sche Thanks, and I completely agree with your idea of strongly encouraging the inclusion of usexes showing where the object goes. Sometimes even a single expression can go both ways; my canonical example for this is see through. For this example, we do include usexes for each sense, along with a usage note indicating that some senses take the object before through, some after. Maybe there is a way to standardize this? Benwing2 (talk) 18:40, 9 June 2024 (UTC)[reply]
I don't have a strong opinion on how we lemmatise (though I see the merits of the cut down) but I agree that pronouns (and other arguments) are extremely important (and in the case of phrasal/particle verbs, also their relative positions), especially for learners, and would support a policy which requires mandatory marking of the arguments which a word/phrase takes, at least in the entry (via usex or similar), if not also in the headword. By way of illustration, compare the variants of the lemma turn on:
  • turn something on ("activate, start"; also possible in the order: "turn on something") (as far as I can tell, this is the only construction from these examples which can display ergativity, and thus can occur in the bare form, apparently without an object: "the coffee machine turned on [by itself] in the middle of the night"),
  • turn someone on ("excite, esp. sexually"),
  • turn on something ("revolve around, centre on"; also: "activate, start"),
  • turn on someone ("unexpectedly attack or betray"; but IMO this order is rather awkward in the meaning "excite sexually").
In practice, this information may not be as obvious/readily accessible to editors as we might hope, since although I've probably used all the above examples before, the third and fourth examples only occurred to me after consulting a dictionary.
Edit: it occurs to me that the preferred order may also vary depending on whether the object is a pronoun or a noun.
Helrasincke (talk) 07:31, 19 June 2024 (UTC)[reply]
Support rules 1–3. As for using "the", I think it should be avoided unless the entry title is a fixed phrase, like a proverb, or would sound extremely unnatural without it (admittedly subjective). Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]
Generally, I am also in Support of the first three rules. I agree with Helrasincke in that we are mostly lacking guidance about the correct (or most common) usage of pronouns for different senses of English phrasal verbs, which should probably also be standardized in the future. However, agreement on page titles is a good first step towards normalization. Einstein2 (talk) 11:54, 5 July 2024 (UTC)[reply]

The right to bear ewes

A usual way of qualifying the restricted applicability of a verb sense is to have a label saying, of a .... For example, for the verb proceed:

6. (intransitive, of a rule) To be applicable or effective; to be valid.

Since the verb is intransitive, this can only refer to the subject of the verb. For transitive verbs, there is an ambiguity: does the restriction apply to the subject or the object of the verb?

Here is an example. At bear, Etymology 2, we see both

1.2. (transitive, of garments, pieces of jewellery, etc.) To wear.

and

1.3. (transitive, rarely intransitive, of a woman or female animal) To carry (offspring in the womb), to be pregnant (with).

Common sense tells us that the first sense does not mean to refer to diamonds wearing a smile and the second sense not to being pregnant with a ewe. But common sense may not be good enough in cases where both interpretations make sense.

Is there a way to disambiguate this that does not depend on common sense?  --Lambiam 16:31, 9 June 2024 (UTC)[reply]

Granting that this doesn't help someone who is unfamiliar with our subtle norms (and doesn't help if the norms aren't followed): in theory I think the nature of the restriction is supposed to be clarified by the form and placement of the restriction: "of..." labels precede the definition and restrict the subject, whereas restrictions on the object are supposed(?) to occur within the definition itself, not as a label, and not normally with "of" (although clearly this is not always followed, and maybe my sense of this is wrong!). Hence "To carry (offspring)" uses "(offspring)" to indicate that the thing in the womb is normally restricted to being offspring, and that if a surgeon left a surgical implement in a woman's womb after surgery she wouldn't normally be described as bearing it in this sense. (However, there was a discussion recently where a set of "of..." labels were moved—because they had been using {{a}} or {{q}} or manual formatting—from being in front of the definition, to being qualifiers after the definition, which made things [even] less standardized/predictable in this respect.) In theory we could make this explicit by saying things like "(SUBJECT is a pregnant person)", "+ OBJECT (offspring)" or something modelled on however we express objects being in the accusative-vs-dative (etc) already. - -sche (discuss) 17:45, 9 June 2024 (UTC)[reply]
Agree with User:-sche here about using of (before or after the definition) to indicate subject restrictions, and parens after the definition without of to indicate object restrictions. Preposition restrictions should use {{+preo}}. Other sorts of predicate restrictions should use {{+obj}}. (I have a sandbox version of {{+obj}} that reworks it to support prepositions and such much better than {{+preo}} currently does; you can see examples at User:Benwing2/test-obj. At some point I will finish this and deploy it.) Benwing2 (talk) 18:45, 9 June 2024 (UTC)[reply]
If I understand this correctly, a more appropriate way to express sense 1.2 of bear above is
1.2. (transitive) To wear (garments, pieces of jewellery, etc.).
It would be nice if this was documented in some form of guidance to creating good definitions.
But note that there is a slight problem in applying this to sense 1.1. We get
1.1 (transitive) To carry (weapons, flags or symbols of rank, office, etc.) upon one's person, especially visibly; to be equipped with (weapons, flags or symbols of rank, office, etc.).
(although I can't immediately think of a use covered by the second part not already covered by the first part).  --Lambiam 19:09, 9 June 2024 (UTC)[reply]

Batch editing Wiktionary with AWB

As discussed in February, there are cases where for both US and UK Englishes, the voiced alveolar approximant /ɹ/ is transcribed as the trill /r/. Our team at CUNY (myself and @Yaejunmyung) would like to use the AWB tool to do a batch edit, mapping all instances of the trilled /r/ to /ɹ/ for both US and UK Englishes. Please let us know if you see any issues with this batch edit. If it sounds okay to you, could you please add me and @Yaejunmyung to the enabled user list? Thank you! Cpeng2 (talk) 19:31, 9 June 2024 (UTC)[reply]

This seems reasonable. Indeed, it's possible that the replacement could be fully automated (for specific accents where it's known that trilled /r/ is not phonemic and thus that it can be replaced systematically). I will wait to see if any bot-maintainer wants to run it as a bot task, or if anyone has objections; if not, I can add you to the AWB list after ~a week, or someone else can feel free to do that sooner. (For other people, let me provide a link to the February discussion; this seems like a more limited and safer proposed change than the changes to parenthetical (ɹ).) - -sche (discuss) 20:00, 9 June 2024 (UTC)[reply]
OK, based on Surjection's comment it looks like it would be better for this standardization to be done by someone more familiar with Wiktionary, so we can be sure it's done correctly. (I do think that now that accents have been incorporated into T:IPA, it would be possible for a bot operated by one of en.Wiktionary's competent bot operators to do this if they are reading this and have time; indeed, it might even be possible for the T:IPA template to know that if the input is /r/ + an accent that doesn't have trilled /r/, it should simply correct the displayed output to /ɹ/ and/or add a cleanup category, the last of which is possibly the safest option.) - -sche (discuss) 17:06, 12 June 2024 (UTC)[reply]
Yes, we'd appreciate it if it could be done by a bot operated by one of en.Wiktionary's competent bot operators. Is there a way that we can reach out to them to coordinate this? Cpeng2 (talk) 17:33, 25 June 2024 (UTC)[reply]
Pinging @JeffDoozan, Surjection, Erutuon, Benwing2, as operators of bots: How feasible do you think it would be to find and replace instances of /r/ (with →/ɹ/) in dialects of English that don't have trilled /r/? Or does it seem worthwhile (balancing how much effort it'd take to do vs benefit) to simply have {{IPA}}, whenever the input is /r/ but the accent is tagged as GA, RP, etc, simply output/display /ɹ/? Or do you think it's better to wait for a pronunciation module? - -sche (discuss) 21:29, 25 June 2024 (UTC)[reply]
@-sche This should be possible. Erutuon already set up a tracking category here Special:WhatLinksHere/Wiktionary:Tracking/IPA/en/plain r for tracking uses of r in English pronunciations. There are 1,183 pages linked so this would have to be done by a bot. I don't know which accents legitimately allow /r/ but if you have an idea I can generate a list of all template invocations using r in English and scan through them manually to see if any are tagged with the relevant accents. Benwing2 (talk) 21:49, 25 June 2024 (UTC)[reply]
/r/ exists in some Scottish dialects and, if Wikipedia is to be believed, some Welsh, South African(?), and Indian dialects. One idea would be, as a first step, to change any /r/ which was either tagged as GA/US/UK/RP or presented as pan-dialectal, and then review what's left. - -sche (discuss) 22:15, 25 June 2024 (UTC)[reply]
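The first-pass filter described above (change /r/ only where the pronunciation is tagged GA/US/UK/RP or carries no accent tag at all) could be sketched roughly as follows. This is only an illustration, not an actual bot: the `a=` accent parameter and the comma-separated accent list are assumptions about how {{IPA|en}} calls are written, and the function names are hypothetical. Any output would still need manual review.

```python
import re

# Accents assumed (per the discussion) not to have a trilled /r/.
NON_TRILLED = {"GA", "US", "UK", "RP"}

# Matches a simple, non-nested {{IPA|en|...}} call.
IPA_CALL = re.compile(r"\{\{IPA\|en\|([^{}]*)\}\}")


def _fix_call(match):
    params = match.group(1).split("|")
    accents = set()
    for p in params:
        # Assumption: accents are given as a=GA or a=GA,RP
        if p.startswith("a="):
            accents.update(a.strip() for a in p[2:].split(","))
    # Leave the call alone if any accent outside the safe set is tagged.
    if accents and not accents <= NON_TRILLED:
        return match.group(0)
    # Replace plain r only in positional (transcription) parameters.
    fixed = [p if "=" in p else p.replace("r", "ɹ") for p in params]
    return "{{IPA|en|" + "|".join(fixed) + "}}"


def replace_plain_r(wikitext):
    """Return wikitext with /r/ changed to /ɹ/ in eligible IPA calls."""
    return IPA_CALL.sub(_fix_call, wikitext)
```

Calls tagged with any other accent (for example a Scottish one) are left untouched, which corresponds to the "review what's left" step.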
@-sche I generated a list of all the English pronunciations with plain /r/ but I don't think they are amenable to simply replacing /r/ -> /ɹ/ by bot because many or most of them have all sorts of other issues in them; the /r/ is often the canary in the coal mine indicating that the creator didn't really know what they were doing. The list of just the instances of {{IPA|en}} with /r/ in them is here: User:Benwing2/IPA-en-plain-r; but what might be more useful is the list of all the English pronunciations on each page with /r/ in any English pronunciation, which is here: User:Benwing2/IPA-en-plain-r-all. The latter is only about 25% larger than the former. In both cases I took out page 679, Appendix:Protologisms/Long words/Titin, and put it here: User:Benwing2/IPA-en-plain-r-titin because it alone blows up the total list size by about 4x bytes. If you want to go about cleaning up these entries, you can do it directly in any of the userspace lists I linked, and I will then run a bot script to push all the changes to the respective pages, as long as you follow these rules:
  1. Don't change the <begin> or <end> markers.
  2. If you (or someone else) edits any of the listed pages directly, that's fine; I have kept a copy of the unchanged versions of the above lists, and the bot script compares the unchanged version against what's currently present and won't make any changes if there's a mismatch.
  3. If you want to delete a line (e.g. because it's correct or because you made a change to the page directly), that's fine as well, but in that case it's best to delete all the lines associated with a given page.
Benwing2 (talk) 05:36, 26 June 2024 (UTC)[reply]
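The safety check in rule 2 (compare the saved, unchanged list against what is currently on the page, and skip on mismatch) might look something like the sketch below. It is not the actual bot script; the function name and the dictionary-of-lines data layout are hypothetical and chosen only for illustration.

```python
def plan_updates(snapshot, edited, live):
    """Decide which pages are safe to update.

    Each argument maps a page title to its list of pronunciation lines:
    snapshot = the list as originally generated,
    edited   = the list after manual cleanup in userspace,
    live     = what is currently on the actual page.
    """
    to_push = {}
    skipped = []
    for title, new_lines in edited.items():
        old_lines = snapshot.get(title)
        if old_lines is None or new_lines == old_lines:
            continue  # unknown page, or nothing was changed for it
        if live.get(title) != old_lines:
            skipped.append(title)  # page was edited directly: do not touch
            continue
        to_push[title] = new_lines
    return to_push, skipped
```

Pages whose lines were deleted from the edited list (rule 3) simply never appear in the loop, so nothing is pushed for them.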
Thanks! Some of those (e.g. hour) are probably fine; others (world) look wrong. I think the ones that give /r/ as Australian are also wrong. I will try to edit the list (more) later. - -sche (discuss) 17:14, 26 June 2024 (UTC)[reply]
Oppose granting AWB. I have had to block Yaejunmyung twice for bot-like edits so careless that they did not even check which language they were editing (exhibit A, exhibit B, exhibit C, exhibit D). Some other edits are also inexplicable. This level of editing is simply not acceptable, and if this is what we can expect, we absolutely should not be making it any easier. — SURJECTION / T / C / L / 09:17, 11 June 2024 (UTC)[reply]

The final text of the Wikimedia Movement Charter is now on Meta

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hi everyone,

The final text of the Wikimedia Movement Charter is now up on Meta in more than 20 languages for your reading.

What is the Wikimedia Movement Charter?

The Wikimedia Movement Charter is a proposed document to define roles and responsibilities for all the members and entities of the Wikimedia movement, including the creation of a new body – the Global Council – for movement governance.

Join the Wikimedia Movement Charter “Launch Party”

Join the “Launch Party” on June 20, 2024 at 14.00-15.00 UTC (your local time). During this call, we will celebrate the release of the final Charter and present the content of the Charter. Join and learn about the Charter before casting your vote.

Movement Charter ratification vote

Voting will commence on SecurePoll on June 25, 2024 at 00:01 UTC and will conclude on July 9, 2024 at 23:59 UTC. You can read more about the voting process, eligibility criteria, and other details on Meta.

If you have any questions, please leave a comment on the Meta talk page or email the MCDC at mcdc@wikimedia.org.

On behalf of the MCDC,

RamzyM (WMF) 08:45, 11 June 2024 (UTC)[reply]

Proposal for a Turkish conjugation module

(Notifying İtidal, Fytcha, Vox Sciurorum, Lambiam, Whitekiko, Ardahan Karabağ, Orexan, Moonpulsar, Lagrium):

I've noticed that the current conjugation tables for Turkish verbs are incomplete, sometimes wrong (korkmak has korkmış as its inferential past 3rd person singular form, according to the table) and different from one another, albeit for minor things (etmek and gitmek seem to be, together with their derivatives, the only verbs that show the polite imperative forms in their table). These reasons, together with the fact that as of now there are way too many templates (Template:tr-conj, Template:tr-conj-v, Template:tr-demek-yemek, Template:tr-conj-*tmek) that require way too many parameters (tr-conj requires the verb's stem, the last vowel in the verb's stem, the stem with the aorist suffix, the last vowel when the aorist suffix is attached and a t/d to know which consonant to use in the suffix -dI) to conjugate Turkish verbs, have made me decide to work on a module that could summarize every possible Turkish verb's conjugation, adding more forms, requiring parameters only if strictly necessary (i.e. if the verb's aorist suffix is unpredictable or if it ends in a t which turns into d before vowels) and making the default table smaller too by setting some forms as collapsible, and I'd like to propose that we switch to this module (here are some sample verbs to display the table).

Trimpulot (talk) 12:18, 12 June 2024 (UTC)[reply]
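To illustrate why parameters such as "the last vowel in the stem" and "t/d" can be computed rather than supplied by hand, here is a minimal sketch of four-way vowel harmony and consonant assimilation. It is written in Python purely for illustration and is not the proposed module; all names are hypothetical, it only handles lowercase stems, and it ignores irregular stems and the unpredictable aorist.

```python
VOICELESS = set("çfhkpsşt")

# Four-way harmony: the stem's last vowel selects the suffix's high vowel.
HIGH_VOWEL = {
    "a": "ı", "ı": "ı",
    "e": "i", "i": "i",
    "o": "u", "u": "u",
    "ö": "ü", "ü": "ü",
}


def harmonic_high_vowel(stem):
    """Return ı, i, u or ü according to the last vowel of the stem."""
    for ch in reversed(stem):
        if ch in HIGH_VOWEL:
            return HIGH_VOWEL[ch]
    raise ValueError("stem contains no vowel: " + stem)


def definite_past(stem):
    """Stem + -dI (3rd person singular); d becomes t after a voiceless consonant."""
    consonant = "t" if stem[-1] in VOICELESS else "d"
    return stem + consonant + harmonic_high_vowel(stem)


def inferential_past(stem):
    """Stem + -mIş (3rd person singular)."""
    return stem + "m" + harmonic_high_vowel(stem) + "ş"
```

Under these rules the stem kork- yields korkmuş rather than korkmış, which is the error in the current table mentioned above.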

The module is very impressive. I would totally support switching to the module version. Lagrium (talk) 12:49, 12 June 2024 (UTC)[reply]
It looks like a huge improvement.  --Lambiam 15:06, 12 June 2024 (UTC)[reply]
In many ways this looks like a huge improvement over what we have now. Before we ship it could you squeeze in some more info? Like the formal imperative forms, maybe? And you added the verbal noun but the -iş form is not there. Maybe these two should be listed on the same row to save space horizontally. Rn -me form is there in its own mansion of a box. Same things with adverbial forms. You listed 2 but many are missing. Like -ince, -ip, -e -e, -dikçe, -eli, -esiye and maybe a few more if I'm forgetting any. Whitekiko (talk) 15:58, 12 June 2024 (UTC)[reply]
@Whitekiko: The formal imperative forms (as well as -sene and -senize labeled as informal imperatives since I didn't know how else to name them for the time being), -ince, -ip and -e -e are already on the table but aren't shown as a default, mostly because I thought it would overcrowd the table. As for the other forms you mentioned I did miss some of them but I'm not sure adding -iş is really necessary since as far as I can tell it's more of a derivational suffix more akin to -im or -i, whereas -me has actual grammatical functions.
Trimpulot (talk) 16:52, 12 June 2024 (UTC)[reply]
There is enough space for adding the -ince forms:
    temporal adverbs           açınca, açarken          
However, speaking in general, tables for Turkish forms will never be complete. For example, the verbal nouns are declined like all nouns, including case forms of possessive forms. Under ekmek we give the form ekmeğime, so shouldn’t we also, for the sake of completeness, give the form ememememe (as used in ememememe bakmayın! – “don’t mind my inability to suck!”) under the impotential verbal noun emememe? What about the passive, causative and reciprocal forms? And the causatives of reciprocals, like uyuşturmak, or the causatives of causatives, like öldürtmek? The just-do-it suffix -(y)iver? Maybe, one day, we’ll have a module for analyzing Turkish forms, but attempts to be complete in tables are doomed to fail.  --Lambiam 20:03, 12 June 2024 (UTC)[reply]
@Lambiam: Of course we can't include the entire noun-like declension of the verbal noun nor do we need to as it is implicit in the fact that it is a verbal noun. As for the adverbial forms though, -ince is actually included, but it only appears after toggling the "Show complex tenses" switch for no reason in particular other than if all the hidden adverb and participle forms were visible by default they would overcrowd the table in my opinion, as they would outnumber the finite TAMs. Also I don't think listing them all on the same line would work because that way they wouldn't get any description of their usage or function at all, however small it may be: if -esiye and -eli were in the same box separated by a mere comma how is one supposed to understand that they are pretty much polar opposites in meaning?
Trimpulot (talk) 20:20, 12 June 2024 (UTC)[reply]
Could you also add -er -mez ("as soon as") as a temporal adverb? I forgot to mention that. As with -iş... Our current template has it and I think that's for a reason. -im comes only after a finite number of verbs to derive nouns and these nouns always appear on dictionaries. On the other hand every verb has an -iş form and it always means "the way someone does x". It'll help users that are beginners in Turkish find the infinitive of the verb. -iş has a weird status. There was a debate around it, idk how it ended. We weren't sure what to call it, if it should be a lemma or a non lemma, if the pages should be created. Whitekiko (talk) 08:26, 14 June 2024 (UTC)[reply]
@Whitekiko: I don't think that -iş always having the same meaning and being able to be applied to any verb is enough of a reason to include it in the conjugation table, since that argument could also be made for -ici and similar suffixes in other languages as well, like -tio and -tor in Latin, but those are left out. Of course the line between what counts as conjugation and what doesn't isn't precise but it has to be drawn somewhere and I think that semantically heavier suffixes with little grammatical or syntactical meaning should be left out.
As for -er -mez, I would like to add it but I still don't understand if it works with polarities other than positive, and if so how? If you can help me figure it out I'll see it added.
Trimpulot (talk) 11:38, 14 June 2024 (UTC)[reply]
It's just that the first part takes the aorist and the 2nd part takes negative aorist. I've added the def and an example under -er some time ago, rather than creating -er -mez and such. Not sure which one's the right thing to do. Putting these 2 suffixes together will create 6? combinations because first part can go through vowel changes.
Maybe we're of different opinions but I'd like to see -iş and -ici forms too somewhere on the table. I don't think adverbial forms are considered conjugations either but I loved to see them. I don't know the technicalities behind this but it would be revolutionary if we could add "ghost texts" to the templates. Yalayış and Yalayıcı, for example should pull yalamak as a result. In case users run into it in the wild, and they surely will. Whitekiko (talk) 12:46, 14 June 2024 (UTC)[reply]
Proposed module looks great. I had noticed the irregular behaviour with certain above mentioned verbs, but unfortunately I'm module illiterate. And I've always thought the current template gives terrifyingly too much info to an absolute beginner checking one of the simplest conjugations, so the drop-down menus are smart. The details can be discussed and smoothed out, but I definitely support this improvement.
By the way, there are a few more active native editors of Turkish, who might have something of their own to say about this; @Hswehli, Blueskies006, Kakaeater, Science boy 30. Orexan (talk) 20:39, 12 June 2024 (UTC)[reply]
@Trimpulot: That looks really good! I have a few suggestions:
  • Make the colours a bit more muted (see {{es-conj}} {{la-conj}} for good examples).
  • Make sure each link has #Turkish.
  • The "show complex tenses" button should be inside the table itself to make it more clear that it expands the table rather than showing a new table.
  • I feel like the infinitive, being the lemma form, should be at the top.
  • You may want to use to indicate an "impossible" conjugation (although leaving the cell empty works as well).
  • You may want to have a slightly different colour scheme for each table.
  • You could add a disclaimer explaining which forms aren't included.
Ioaxxere (talk) 02:35, 13 June 2024 (UTC)[reply]
@İtidal, Fytcha, Vox Sciurorum, Lambiam, Whitekiko, Ardahan Karabağ, Orexan, Moonpulsar, Lagrium, Hswehli, Blueskies006, Kakaeater, Science boy 30.
I have updated the module with some minor changes (fixed the links, moved the "Show complex tenses" button inside the table and added some missing adverbial forms) however I would like to ask if you think it makes sense to have those "complex tenses" hidden at all. At first it was meant to make the table more readable by hiding all of the forms that employ more than one suffix but I noticed that even without hiding them it is still relatively small and readable. Also let me know if there is a better way to label the forms in -eli, -esiye and -dikçe since I really don't like using a translation as a label but I also don't know how else to call them.
Trimpulot (talk) 13:19, 13 June 2024 (UTC)[reply]
My personal opinion of the complex tenses drop down menu is not only should it stay but it should also have a high-vis warning that says something like "Attention! May cause shock, anxiety, dizziness, despair in beginner learners. Abandon all hope, ye who click here!" Maybe that's a little much, but it should definitely stay.
I would like to float the idea of adding "-cesine" meaning something like "as if ...", forming adverbials. It is productive with a myriad of suffix combinations, see here ("-ercesine", "-mişçesine" "-yorcasına" "-ecekçesine" etc.) as well as "noun + -cesine" though unrelated. It's a common enough usage to encounter. I guess you would only display the "Simple" aorist form like you did with "-ken" (which is also productive as "-erken", "-yorken", "-mişken", "-ecekken" etc.).
Also, as a native and someone who's got above average grasp on English but isn't a grammar expert, the labels don't mean anything to me, even some of the conjugated forms don't mean anything unless I see it in an example sentence. I literally had to google "-esiye" to see what the hell it was used for. I assume it would be similar with other natives or learners, so I think coming up with labels is kind of an exercise in futility. I get that each form will point to a page of their own, where a text like temporal adverb "until" inflection of açmak would look strange. I'm not saying the module should include example sentences for each and every usage, but translations actually make it easier to have some idea about how or when something is used. At the end of the day, nothing short of turning the table into a full scale grammar book will go very far in the way of helping someone understand the contexts in which these conjugations are used, at least beyond the most basic ones like "açarım, açıyorum, açtım" etc. — Orexan (talk) 14:54, 13 June 2024 (UTC)[reply]
@Orexan: That makes sense. As for the various forms suffixed with -ken and -cesine, I've been thinking about putting them all on the same line like so:
    temporal adverb        simple           açarken, açıyorken, açmışken, açacakken          
    modal adverb        "as if"           açarcasına, açıyorcasına, açmışçasına, açacakçasına          
Or alternatively we could just display the aorist form as you said and add a note of some kind to explain that those suffixes are actually way more productive.
Also for the drop down menu, do you think it should stay even for those participle and adverb forms, and the formal and "informal" imperative forms, even though they don't employ more than one suffix?
Trimpulot (talk) 15:32, 13 June 2024 (UTC)[reply]
I'm not sure if the line is long enough to contain all combinations of some suffixes, especially with verbs with longer roots than two letters like "aç-". For reference, the paper I linked lists "açarcasına, açıyorcasına, açmışçasına, açacakçasına, açarmışçasına, açmazcasına, açmazmışcasına, açıyormuşçasına, açacakmışçasına, açmacasına, açmamacasına" but I highly doubt any mortal could possibly identify all possible combinations. Displaying the aorist form only, with a note indicating their productivity, and maybe a link to that suffix's lemma page, which hopefully one day comes to be and shows at least a good portion of these combinations and the meanings they convey and, if one can be so bold to ask, one or two example sentences while one's at it, would maintain the table's structural integrity and still be helpful, even if the suffix pages that don't exist yet aren't made in the near future. A suffix page of this comprehensivity is a ton of work, though. I tried to put something together for "-sa" a while back, which is in dire need of an update and some cleanup. That was painful.
The participle and adverbials could be outside the drop down, yeah. But the alternative forms of the imperative are good within, in my opinion. — Orexan (talk) 16:20, 13 June 2024 (UTC)[reply]
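To illustrate why these combinations are predictable enough to be covered by a note rather than listed exhaustively, here is a minimal sketch in Python (not the module's actual Lua code) that attaches -cesine/-casına to a stem that already carries its tense suffix. The function name `add_cesine` is hypothetical, and the sketch assumes lowercase input with precomposed letters and handles only two-way vowel harmony plus the c/ç alternation after voiceless consonants.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOICELESS = set("fstkçşhp")  # consonants after which c surfaces as ç


def add_cesine(stem: str) -> str:
    """Attach -cesine/-casına to a stem that already has its tense suffix.

    Hypothetical helper for illustration only; assumes lowercase input.
    """
    vowels = BACK_VOWELS | FRONT_VOWELS
    # The last vowel of the stem decides front/back harmony.
    last_vowel = next((ch for ch in reversed(stem) if ch in vowels), None)
    if last_vowel is None:
        raise ValueError(f"no vowel found in {stem!r}")
    suffix = "casına" if last_vowel in BACK_VOWELS else "cesine"
    # c becomes ç when the stem ends in a voiceless consonant.
    if stem[-1] in VOICELESS:
        suffix = "ç" + suffix[1:]
    return stem + suffix


for stem in ["açar", "açıyor", "açmış", "açacak"]:
    print(add_cesine(stem))
```

Run on the four stems above, this should reproduce the forms quoted in the thread (açarcasına, açıyorcasına, açmışçasına, açacakçasına).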
@Orexan: I see and I agree with you. I have updated the module as well.
Trimpulot (talk) 18:54, 13 June 2024 (UTC)[reply]
First thoughts, good. It's very big, though, like Swahili. There are some parts that are grammatically correct that might be omitted to save space. I suggest putting some boolean constants near the top of the module to control behaviors we might change our minds about.
  • the -abilmek forms are not necessary (and the -ivermek forms should not be added)
  • omit passive imperative (probably requires a template argument), potential imperative, and maybe even formal and informal imperative
  • what about the -iş verbal noun form?
  • I suggest packing the impersonal participle and gerund/adverb forms into as few lines as possible even if that means omitting the less common forms or losing some of the labels ("impersonal participles | açan, açmış, açacak")
A minor coding style issue, the initializers for local variables lv and hv should have line breaks in the same places. Vox Sciurorum (talk) 16:49, 14 June 2024 (UTC)[reply]
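The suggestion about boolean constants could look roughly like the sketch below. It is written in Python rather than the module's Lua, and every name in it (`SHOW_ABILITY_FORMS`, `SHOW_PASSIVE_IMPERATIVE`, `PACK_IMPERSONAL_PARTICIPLES`, `build_rows`) is made up for illustration, not taken from the actual module. The default values are arbitrary and do not settle which forms should be shown.

```python
# Hypothetical switches kept at the top of the module so that contested
# design decisions can be flipped in one place.
SHOW_ABILITY_FORMS = True            # the -abilmek forms
SHOW_PASSIVE_IMPERATIVE = False      # passive imperative row
PACK_IMPERSONAL_PARTICIPLES = True   # one packed row instead of one row per form


def build_rows(forms: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Turn {label: [forms]} into (label, cell text) rows, honouring the switches."""
    rows = []
    for label, values in forms.items():
        if label == "ability" and not SHOW_ABILITY_FORMS:
            continue
        if label == "passive imperative" and not SHOW_PASSIVE_IMPERATIVE:
            continue
        if label == "impersonal participles" and not PACK_IMPERSONAL_PARTICIPLES:
            # Unpacked layout: one row per participle.
            rows.extend((label, value) for value in values)
            continue
        rows.append((label, ", ".join(values)))
    return rows


# Sample data for illustration only.
sample = {
    "impersonal participles": ["açan", "açmış", "açacak"],
    "ability": ["açabilmek"],
    "passive imperative": ["açılsın"],
}
for label, cell in build_rows(sample):
    print(f"{label} | {cell}")
```

With the defaults shown, the passive imperative row is dropped and the participles share one line; changing a constant changes the table without touching the row-building logic.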
  • I agree that -ivermek shouldn't be added, but omitting -ebilmek while -ememek is included is just asymmetrical
  • I see why you say to omit the potential imperative (as well as the impotential, I assume), but why the others as well, especially the informal and even more so the formal imperatives?
  • as I said before, I think -iş is past the boundary of what counts as conjugation and what doesn't
  • cramming all of the participle or adverb forms on the same line without any hint as to what distinguishes them from one another wouldn't really be helpful in my opinion
As for the line breaks, I just put all the vowel inputs that return the same vowel on the same line, that's the only reason for it being the way it is.
Trimpulot (talk) 17:34, 14 June 2024 (UTC)[reply]
@Vox Sciurorum: I have an idea on the last point: how about placing all impersonal participles or adverbs on the same line by default but separating them when the table is expanded? You can see the table like that here as the conjugation for açmak
Trimpulot (talk) 09:09, 15 June 2024 (UTC)[reply]
I guess this makes sense, as per an earlier comment of mine, the labels and even the conjugated forms don't mean much in a vacuum like this, so this setup makes sense at least from a design point of view. Orexan (talk) 15:10, 15 June 2024 (UTC)[reply]

Names of people


[Thread moved from Tea Room]

Van Gogh

  1. Vincent van Gogh, Dutch draughtsman and painter.

Monet

  1. Claude Monet, French painter.

Picasso

  1. Pablo Picasso (1881–1973), Spanish painter, best known as a founder of the Cubist movement.

Can anyone clarify upon what basis we have these entries and others similar? Mihia (talk) 19:18, 11 June 2024 (UTC)[reply]

There are others, like Einstein. This issue has come up before. Personally, I do not think they comply with WT:CFI, and thus should not be in the dictionary. I think it is worth having a formal vote to clarify the wording in the CFI. — Sgconlaw (talk) 20:00, 11 June 2024 (UTC)[reply]
I agree. CFI says about names "No individual person should be listed as a sense in any entry whose page title includes both a given name or diminutive and a family name or patronymic. For instance, Walter Elias Disney, the film producer and voice of Mickey Mouse, is not allowed a definition line at Walt Disney."
However, it says nothing about entries for individual persons under their family name only (or given name only, for that matter). This seems to be an omission, perhaps because there is no agreement.
(By the way, I also think that the "Walter Elias Disney" example introduces an unnecessary complication/distraction, being different from "Walt Disney". I think it would be clearer to use an example such as, let's say, Pete Tong, which does not have this complication.) Mihia (talk) 20:15, 11 June 2024 (UTC)[reply]
@Mihia: I don't have strong feelings about the "Walt Disney" example, but have no objection if the example is changed as you suggest. — Sgconlaw (talk) 22:41, 11 June 2024 (UTC)[reply]
I think it may also be worth taking this opportunity to clarify the following, which have also come up before:
  • Whether terms which are a combination of an honorific or title and a name are permitted, e.g., King Charles (meaning Charles III) and Queen Mum (meaning Queen Elizabeth The Queen Mother). I'm generally of the view that we shouldn't allow such terms, because we may then get entries like King Louis (Louis I, Louis II, Louis III, etc.) and Pope Leo (meaning Leo I, Leo II, Leo III, etc.). The relationship between such terms and nicknames (e.g., Brangelina), which I believe are generally thought acceptable, needs to be considered. Perhaps the rule should be that a term which is a combination of an honorific or title and a name is generally not permitted unless it is a widely used nickname.
  • Whether senses which mean "a work by a person with the surname X" are allowed, e.g., Picasso (meaning "an artwork by Picasso") and Roy (meaning "a book by Arundhati Roy"). Again, I am not in favour of such senses because any surname can be used in this way.
Sgconlaw (talk) 22:33, 11 June 2024 (UTC)[reply]
Yes, based on the consensus in the spate of RFDs now at Talk:Michelangelo, uses of NAME to mean "a work by NAME" should not be included; if we can formalize this somewhere, all the better. Regarding the Walt Disney example, I would only add Pete Tong, but not remove Walt Disney: having Walt Disney as an example is useful for showing that you can't defend having someone's name just because you didn't enter their full name. - -sche (discuss) 00:49, 12 June 2024 (UTC)[reply]
Right, I see what you mean. Mihia (talk) 09:25, 12 June 2024 (UTC)[reply]
@Mihia, -sche: actually I realized that Pete Tong might be better as an example since we don't have Walt Disney as an entry at all. — Sgconlaw (talk) 12:50, 13 June 2024 (UTC)[reply]
I do see -sche's point, though, that the Disney example does illustrate how the policy applies even if the article title is not the exact full name. Mihia (talk) 14:36, 13 June 2024 (UTC)[reply]

Draft proposal


For discussion purposes, I've taken the liberty of drafting a proposed amendment to be inserted under "Wiktionary:Criteria for inclusion#Names of specific entities". — Sgconlaw (talk) 12:34, 13 June 2024 (UTC)[reply]


Original text

However, policies exist for names of certain kinds of entities. In particular:

  • No individual person should be listed as a sense in any entry whose page title includes both a given name or diminutive and a family name or patronymic. For instance, Walter Elias Disney, the film producer and voice of Mickey Mouse, is not allowed a definition line at Walt Disney.

Proposed amended text

However, policies exist for names of certain kinds of entities. In particular:

  • Names of people are subject to the "People's names" section of this page.

People's names

  • In an entry consisting of both a given name or diminutive and a family name or patronymic, including a pseudonym, no individual person should be listed as a sense. For instance, at the entry Pete Tong, the following sense is not allowed: "Peter Michael Tong (born 1960), the English disc jockey." The entry Mark Twain is not allowed if its only sense is "The pen name of Samuel Langhorne Clemens (1835–1910), the American author". However, any figurative sense is allowed.
  • In a forename or surname entry:
    • No individual person should be listed as someone having that forename or surname. For instance, at the entry Mariah the sense "Mariah Carey (born 1969), American singer" is not allowed, and at the entry Van Gogh the sense "Vincent van Gogh (1853–1890), Dutch draughtsman and painter" is not allowed.
    • As a corollary, a sense meaning "a work by a person with the surname" is not allowed. For instance, at the entry Picasso, the following sense is not allowed: "An artwork by the Spanish artist Pablo Picasso (1881–1973)."
  • A nickname for a person, or two or more persons collectively, which is not their legal name, is allowed. For example, the entry Brangelina (defined as "The couple consisting of celebrities Brad Pitt and Angelina Jolie, together from 2005 to 2016") is allowed. Ye defined as "Kanye West, American rapper, songwriter, record producer, and fashion designer" is allowed, because it was a nickname before West legally adopted it as his name in 2021.
  • An entry consisting of an honorific or title and a name is not allowed unless it qualifies as a nickname as described above or has a figurative sense. For instance, Lord Byron (defined as "George Gordon Byron, 6th Baron Byron (1788–1824), the English poet") and Prince William (defined as "William, Prince of Wales (born 1982)") are not allowed. Prince Albert, meaning (among other things) a Prince Albert coat, is allowed.

@Sgconlaw: Would Jack the Ripper be deleted as a pseudonym? J3133 (talk) 13:02, 13 June 2024 (UTC)[reply]
@J3133: my initial impression is no, because it does not consist of "both a given name or diminutive and a family name or patronymic". — Sgconlaw (talk) 13:06, 13 June 2024 (UTC)[reply]
@Sgconlaw: I assume you do not mean we could have anyone’s pseudonym as long as there is no family name included. J3133 (talk) 13:10, 13 June 2024 (UTC)[reply]
@J3133: Yes in general, but I haven't given full thought to this point. I think we would want to extend the general forename + surname rule to pseudonyms (perhaps including names like Cardi B and Malcolm X which are in the same format), but if a pseudonym is only a single word it comes close to becoming (or may be indistinguishable from) a nickname, in which case there may be consensus for including such names. — Sgconlaw (talk) 13:34, 13 June 2024 (UTC)[reply]
I suggest the nickname portion of the proposal be amended to explicitly disallow stage names and assumed names. As it stands, the proposal as written would technically allow entries for Malcolm X, The Rock, Grimes, Pink, etc. None of those monikers are legal names. But they go beyond being simply nicknames. They're how those individuals identify and are identified publicly. The nickname policy was designed to allow for informal/colloquial nicknames for people. E.g. King of Pop for Michael Jackson, RPattz for Robert Pattinson, Elongated Muskrat for Elon Musk, or Maggie for Margaret Thatcher. Those entries have lexical value that entries based on stage names don't. Someone seeing a celeb news headline like "RPattz to play Dark Knight" might not think to punch "RPattz" into Wikipedia. Whether readers can easily connect a nickname to its bearer via WP depends on whether there's a redirect or disambiguation page. Alternatively, someone is unlikely to encounter a headline like "Elon Musk and Claire Boucher split." Everyone knows her as "Grimes." It's what her Wikipedia entry is titled. There'd be no benefit in having a definition for her at Grimes. WordyAndNerdy (talk) 21:08, 13 June 2024 (UTC)[reply]
Hmm... to me, someone seeing "RPattz to play Dark Knight" (or seeing "RPattz" anywhere else) and thinking "I should look that up in a dictionary" seems even less plausible, vs. them thinking to google it or thinking Wikipedia might have a redirect from that to the article on whoever it is. No? I mean, if I'm not going to find out what "Slipknot to appear in new John Wick" or "Grimes and Pink to appear in Barbie sequel" means from a dictionary, and I'm not going to find out what/who Margot Robbie ("Margot Robbie to reprise Barbie role", etc.) is from a dictionary, why would I expect to find out about RPattz from a dictionary? What is the rationale for having RPattz in a dictionary, and not having Grimes, Pink, Slipknot and Margot Robbie? - -sche (discuss) 21:37, 13 June 2024 (UTC)[reply]
@-sche You're right, but people do tend to click on things that pop up on Google search results, which is why Urban Dictionary is so successful. Theknightwho (talk) 22:34, 13 June 2024 (UTC)[reply]
RPattz is a proper noun that's used exclusively as informal slang. Margot Robbie, Slipknot, and Grimes are the "official" names (legal and self-styled) of various entities. Slang is something a descriptive dictionary should aim to document. Proper nouns like Margot Robbie, Slipknot, etc. are best left to the encyclopedia side, where they can be covered with the depth and detail afforded by biographical articles. The purpose of the RPattz entry is to tell readers this term means "Robert Pattinson, British actor," while the goal of RPattz's Wikipedia entry is to tell you where he was born, how many siblings he has, his first acting job, etc. WordyAndNerdy (talk) 23:03, 13 June 2024 (UTC)[reply]
I have never heard of either "RPattz" or "Grimes". I would have no reason to imagine that I could look up the former in Wiktionary but not the latter. Mihia (talk) 23:31, 13 June 2024 (UTC)[reply]
This is one of those areas where someone's individual knowledge base seems likely to inform their perspective in nuanced and hard-to-pin-down ways. Regional variations in English, differences between native speakers vs. proficient secondary speakers, generational differences, differences in interests and subcultures (follows celeb news vs. doesn't). I don't think there's a "right" or "wrong" answer to some of the questions being raised in this thread. I just think some approaches are generally more workable than others. More conducive toward hitting the sweet spot of a dictionary that's more inclusive and up-to-date than Oxford but vastly more serious and reliable than UD. WordyAndNerdy (talk) 23:57, 13 June 2024 (UTC)[reply]
We certainly do not want to emulate the sea of crap that is UD. However, although it somewhat goes against my personal instincts, I do think it is at least worth considering allowing ALL proper names that meet some reasonable requirement of widespread mention sufficient to prevent a tidal wave of trivia. In this way we would avoid the need to make fine policy distinctions that might make sense to us at the time but are probably lost on ordinary users, such as listing "RPattz" but not "Grimes", or listing "Mona Lisa" because we can find references to "a Mona Lisa smile" but not "Barbra Streisand" because we can find references to "a Barbra Streisand nose", or whatever it might be -- and also avoid the need to be perpetually debating these distinctions. If you asked me, or had asked me, I would say that every single tiny place name definitely was not dictionary material, and yet that policy was agreed. If we can have every tiny place name, then why not also "Grimes", "Monet" and the rest of them? What is the difference, essentially? They are no more or less encyclopedic than the place names, in my opinion. Mihia (talk) 11:50, 14 June 2024 (UTC)[reply]
@Sgconlaw: I have a couple of comments on your proposed text.
I wonder whether allowing nicknames, with no further restriction, could open the door to some potentially unwanted entries. Strictly speaking, as the text stands, there seems nothing to prevent me from adding an entry for my mate nicknamed "Bagger". I wonder whether we want unrestricted coverage even for well-known people. I was going to give the example "Giggsy", which is a fairly trivial nickname for a footballer called Ryan Giggs, as something that we wouldn't want to include, but now I see that we actually already DO have this entry! I guess someone thought it was suitable for inclusion.
To be doubly clear, I wonder whether we could explicitly mention that stage names are excluded as pseudonyms.
You mention the exclusion of "a work by a person with the surname"; I wonder if at the same time we should consider making some exclusions as to what does not count as "figurative" use. In my opinion, the following are all candidates for exclusion (in fact, these apply to other proper nouns as well as to people). It seems to be possible to find examples of these for almost anyone/anything that one has heard of, or certainly anyone famous.
  • "like X", referring to some characteristic of X.
  • "the X of Y", e.g. "The Ronald Reagan of liberalism".
  • "do a X", "pull a X", referring to some behaviour associated with X, e.g. "do a Ronald Reagan".
  • "an X moment", e.g. "a Ronald Reagan moment".
By the way, would it be appropriate to move this discussion to the Beer Parlour, as it concerns general policy? Mihia (talk) 17:59, 13 June 2024 (UTC)[reply]
@Mihia: yes, by all means relocate the discussion to the Beer Parlour, and we can continue it there. — Sgconlaw (talk) 18:05, 13 June 2024 (UTC)[reply]
Sorry, just one other point that occurred to me. Would it be simpler/shorter to specify under what circumstances definitions that consist ONLY of a real person actually ARE allowed, rather than listing the exclusions, which seem to cover most cases? Mihia (talk) 18:37, 13 June 2024 (UTC)[reply]
@Mihia:
  1. I seem to recall from previous discussions that there seems to be a consensus that nicknames are generally allowed, though it seems that this isn't reflected in the CFI. I don't think entries for nicknames of people's random friends will be an issue—it's almost certain that such entries won't pass the verifiability standard.
  2. I'm not clear what you mean by "whether we could explicitly mention that stage names are excluded as pseudonyms". Are you suggesting that stage names should or should not be allowed as entries? (I assume the latter?)
  3. Yes, I think it is a good idea to clarify what counts as a figurative use. Feel free to work that into the draft.
  4. Personally, I think it is clearer to specify in the policy both what is allowed and what isn't, otherwise later on we may be in a difficult position of trying to discern what the applicable rule is from the silence of the text. But maybe it would be clearer to specify what is allowed first, followed by what is therefore not allowed.
Sgconlaw (talk) 18:45, 13 June 2024 (UTC)[reply]
Yeah, for better or worse, even obvious abbreviations of first+last names have tended to be included (Talk:RPattz, Talk:JBiebs), although if enough people comment here we might get a sense of whether there's appetite to reconsider that. I think our CFI are bizarre when it comes to what names we do vs don't include. Why is it considered that I need to know which specific person Giggsy is, but not which specific person Dua is? Last I checked, I was only able to find 2-3 people with the name Dua (and our current presentation of it as an Albanian female given name fails to reflect that two of the 3 bearers got it from Arabic, and one is nonbinary) but perhaps more works have been digitized and the name is better attestable now. Do we include band names, e.g. Slipknot, Rammstein, Einstürzende Neubauten? It seems we do not, and that seems reasonable to me... but then why is Slipknot referring to a set of individuals not included, but Brangelina referring to a set of individuals is? (This is only the tip of the iceberg, consider e.g. fictional places' names.) For Prince William et al., cf. Talk:George VI.
I will opine that if a nickname is used for multiple individuals, and especially if it's productively applicable to e.g. everyone with the surname Giggs, it is probably better defined as "a nickname for people with the surname Giggs" [etc] rather than as "the nickname of [specific person], [specific other person], [specific third person], [specific fourth person], [specific fifth person], ...", similar to how we treat Ed. - -sche (discuss) 19:21, 13 June 2024 (UTC)[reply]
Things would be a lot simpler if we decided either "No definitions at all are allowed that simply describe a proper noun -- go and look at Wikipedia for that" OR "Every proper noun (attestable to some minimum level) is allowed"! Mihia (talk) 19:32, 13 June 2024 (UTC)[reply]
@Mihia That seems unhelpful at best: we document terms, whereas Wikipedia documents the referents for those terms. Excluding terms that describe a certain class of referent because people might be looking for information about that referent would lead to us excluding English tree because the Wikipedia article Tree exists. Obviously that's a silly example, but it underlines the point that it's not sound logic to be basing policy on. If you don't care about proper nouns that's fine, but quite clearly many users and editors do. Theknightwho (talk) 22:26, 13 June 2024 (UTC)[reply]
On the contrary, I believe that it is an EXCELLENT idea to choose one or the other. Judging by your last sentence, you seem to have missed my second option. Mihia (talk) 22:53, 13 June 2024 (UTC)[reply]
@Mihia It's only an excellent idea if you prioritise swift policy decisions over anything else, but I don't think it would even achieve that: any set of rules always raises questions about what does and does not qualify, unless it is infinitely permissive or restrictive, but neither of those stances would improve the dictionary, in my view. Theknightwho (talk) 16:10, 14 June 2024 (UTC)[reply]
The problem with this is that not every language/variety has the privilege of having a Wikipedia. Even if they do have a Wikipedia, that Wikipedia might be prescriptive. For example, the official Persian word for Malaysia is مالزی (malezi) in Iran and مالیزیا (malīziyā) in Afghanistan (those are the respective terms used by news agencies in both countries). However, Persian Wikipedia is extremely prescriptive and considers standard Iranian Persian "correct" and standard Dari "wrong". Mentioning that the country is called مالیزیا (malīziyā) in Afghanistan is actually not even allowed and would be reverted. So it's not as though we can implement a hard rule that says "go look to Wikipedia for proper nouns" because in some cases, the only place it can be documented is on Wiktionary!! — BABR (talk) 02:44, 14 June 2024 (UTC)[reply]
Brangelina is an informal nickname for a celebrity couple used in the media and colloquial speech. Celebrity couples generally don't present themselves to the public by such monikers in the same way bands collectively identify as Radiohead or Slipknot. Official band names can be treated like stage names. They're names that individuals have chosen for themselves and thus seemingly fall outside our scope. Whereas informal/colloquial nicknames fall under the umbrella of documenting language as it exists. We have Fab Four (informal nickname), but Beatles should probably only exist as a plural of Beatle. This is a complex and somewhat subjective line to draw. Which is why I think CFI should ideally leave room for case-by-case considerations. Nailing down hard and detailed rules about what is and isn't inclusion-worthy in this area might create more headaches than it resolves. WordyAndNerdy (talk) 21:59, 13 June 2024 (UTC)[reply]
Nevertheless, the present situation is a mess, whereby there are perpetual case-by-case arguments. Mihia (talk) 22:20, 13 June 2024 (UTC)[reply]
I generally agree that having clear and consistent policy is preferable to vague (and often unwritten) rules. But in this particular case I'm not sure that exhaustively itemizing what's includable would be an improvement. Would the clarity make for swifter resolution to discussions, or would it create new opportunities for bickering? I just don't see heated disagreements erupting over whether "a Monet" used in reference to an individual work of art is sufficiently figurative to warrant inclusion (it is, IMO) outside a call to explicitly disallow such terms. People often remain indifferent to policy considerations until their hard work is on the chopping block. Which is the main reason I've tried to take an inclusionist approach. People gauge the relevance of language to Wiktionary's mission differently. I've never seen the relevance of taxonomical names. But clearly a number of Wiktionarians do and have put in good work in that area. WordyAndNerdy (talk) 23:32, 13 June 2024 (UTC)[reply]
1. Yes, good point about verification.
2. I think it would be helpful to mention that "pseudonyms" includes stage names, if that is indeed the intention (or not, if that is the case, I guess). I mention this because "pseudonyms" can sound more "literary".
4. It seems to me that the silence of the text is more likely to be an issue IF we try to explain both what is allowed and what is not, since "almost inevitably" some case will later arise that is not mentioned at all. If we were to say "these are the only cases when people are allowed as definitions, and everything else is excluded" there can't be any room for doubt. Of course, anything can be challenged later if it transpires that something important has been overlooked. Mihia (talk) 19:27, 13 June 2024 (UTC)[reply]
What about foreign renderings of names? Such as 忽必烈 (Hūbìliè) and Hốt Tất Liệt for Kublai Khan. MuDavid 栘𩿠 (talk) 01:36, 14 June 2024 (UTC)[reply]
@Sgconlaw Re: the draft proposal above and adding to what MuDavid brought up here - the proposal as it stands seems to fall short in the case of borrowed names of specific individuals in corpus languages. For example, we have zero evidence that 𐌰𐌻𐌰𐌹𐌺𐍃𐌰𐌽𐌳𐍂𐌿𐍃 (alaiksandrus) was a given name in Gothic; it is "encyclopedic" content in that sense, it seems to just refer to an individual. Yet it is valuable to include, because names such as this constitute valuable linguistic and onomastic evidence in otherwise poorly attested languages. Another similar case is Old High German Ōtacher, which is very valuable evidence from a philological standpoint, but which refers again to a specific individual only without attested use as a given name afaik. — Mnemosientje (t · c) 12:23, 14 June 2024 (UTC)[reply]
I don't think it is the function of principal namespace to be a repository of unattested terms whose only justification is their possible value to linguists. DCDuring (talk) 12:54, 14 June 2024 (UTC)[reply]
These are not unattested. As neither of these is "an entry consisting of both a given name or diminutive and a family name or patronymic", I don't see how they would fall under the exclusions outlined in the proposal. Including such terms in the case of extinct languages with a relatively closed corpus seems clearly preferable to me.--Urszag (talk) 13:03, 14 June 2024 (UTC)[reply]
You are, of course, as right about attestation as I was wrong. I really don't think that principal namespace should have entries for terms whose main justification is the convenience of linguistic researchers, that doesn't meet our standards for inclusion for all languages. I believe that there is nothing that prevents the use of names in etymologies. Whether we would want to have Appendices of such items is a separate question. DCDuring (talk) 14:33, 14 June 2024 (UTC)[reply]
Gothic and Old High German are extinct and are Limited Documentation Languages, so unless otherwise excluded, terms meet criteria for inclusion if they are attested by one use in a contemporaneous source or one mention in a source accepted by the community of editors for that language. I don't see a reason to have a stricter policy for proper names of the type Mnemosientje mentioned than for other terms.--Urszag (talk) 14:43, 14 June 2024 (UTC)[reply]
@DCDuring: "whose main justification is the convenience of linguistic researchers" Who else do you think is interested in entries on Gothic or Old High German at all? Theknightwho (talk) 22:07, 14 June 2024 (UTC)[reply]
Support. I don't feel strongly about the details but I think reducing ambiguity is always a good idea. Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]

Jatki and Western Punjabi


First off, I believe Jatki (i.e. the Lahnda dialects of Jhangli, Shahpuri and Dhanni) need to be given their own language code under Lahnda. Currently Jatki entries have to be put as dialectal Punjabi, which doesn't make sense as all the other Lahnda dialects (Saraiki, Pahari-Potwari, Northern and Southern Hindko) get their own language codes.

Secondly, there is an issue where Punjabi (the Wiktionary sense) is not exactly Punjabi anymore. Because although half or more of Lahnda speakers (of Jatki and Pothwari particularly, up to 50 million people) call their language Punjabi, Punjabi of the Wiktionary sense only includes the Eastern dialects (Majhi, Doabi, Malwai, Puadhi).

So I have a wild suggestion; rename Punjabi as it is now to Eastern Punjabi. (I know this would have a tonnn of complications, just a suggestion :D)

Assuming this did happen, it kind of brings up another problem, because "Eastern Punjabi" does not correlate with "Eastern Punjab" (the Punjab state of India), which could cause confusion. Majhi (the taken standard and central dialect of Punjabi) is an eastern dialect and shares its grammar with other eastern dialects. However, the majority of its speakers are from Western Punjab (Pakistan).

Thoughts? OblivionKhorasan (talk) 14:21, 14 June 2024 (UTC)[reply]

English pronunciation module


I am soliciting comments for a possible English pronunciation module. I originally thought of doing this using English-style respelling but it occurs to me it may be too complicated to do it this way. For comparison, I wrote a German pronunciation module that uses respelling based on standard German spelling conventions and is mostly finished; it runs to 2400+ lines and supports only a single dialect (the prescribed one with /ɛ:/ for long ä). You can see testcases (lots and lots of them) here, here and here. So I'm thinking of reusing something similar to enPR notation, i.e. something that can map fairly directly onto phonemes but abstracts out the dialectal differences as much as possible. It would be pan-dialectal as much as possible, at least across conservative GenAm (i.e. without the cot-caught and merry-marry-Mary mergers) and RP, so that if a distinction is made in either dialect it needs its own symbol. But it would also support giving separate per-dialect respellings to handle one-off differences like in controversy and advertisement. Does this make sense to people? What do people think of "augmented enPR" as a notation?

BTW by "augmented enPR" I mean enPR with some additional symbols. For example, enPR calls for writing short o in cot as ŏ and au in caught as ô, but cases where short o is pronounced like au in GenAm (the lot-cloth split, as in dog, long, moth, coffee, chocolate, etc.) would need an additional symbol, maybe ŏ*. Similarly for the RP trap-bath split, where affected words (class but not crass, path but not math etc.) would need an additional symbol, maybe ă*. And probably similarly for the weak vowel merger, because (I think) some unstressed /ɪ/ vowels do not turn into a schwa in GenAm (although I can't say which ones other than bring up the canonical minimal pair Rosa's ~ roses). Ideally in this augmented enPR notation people would write hw in words like which and whale, and ōr in words like hoarse and borne that are distinct from horse and born in accents without the horse-hoarse merger; although in practice the latter might be hard to get right as I'm not sure which dictionaries still notate the distinction. (Update: The Longman Pronunciation Dictionary does indicate this distinction for GenAm, as a secondary pronunciation in the cases where the hoarse vowel can exist. For example, force writes the RP pronunciation as only /fɔːs/ but the GenAm pronunciation as primary /fɔːrs/, secondary /foʊrs/.)
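As a rough illustration of the "augmented enPR" idea above, here is a minimal Python sketch in which each pan-dialectal symbol maps to one IPA value per accent. The table name, the function name and the use of Python are assumptions made purely for illustration. The starred symbols are the tentative ones floated above, and the IPA values are the usual textbook ones for RP and conservative GenAm, not anything the module is committed to.

```python
# Hypothetical symbol table: one augmented-enPR symbol -> IPA per accent.
# "GenAm" here means conservative GenAm (no cot-caught merger).
VOWEL_TABLE = {
    "ŏ":  {"RP": "ɒ",  "GenAm": "ɑ"},   # cot
    "ô":  {"RP": "ɔː", "GenAm": "ɔ"},   # caught
    "ŏ*": {"RP": "ɒ",  "GenAm": "ɔ"},   # dog, long (lot-cloth split)
    "ă":  {"RP": "æ",  "GenAm": "æ"},   # crass, math
    "ă*": {"RP": "ɑː", "GenAm": "æ"},   # class, path (trap-bath split)
}


def vowel_ipa(symbol, accent):
    """Look up the IPA value of an augmented-enPR vowel symbol in one accent."""
    return VOWEL_TABLE[symbol][accent]
```

The point of the extra starred symbols is that any distinction made in either accent gets its own input symbol, so a single respelling can yield both outputs.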

Probably we'd have to manually put spaces or hyphens at all syllable boundaries as this is hard to do automatically, although possibly there could be defaults.

There would have to be parameters for the supported dialects so you can specify different pronuns for each (or some subset) as needed, but it might also make sense to have a way of adding pronunciations with arbitrary accent labels.

I might ditch the standard primary and secondary stress symbols that go after the syllable in question (rather than before as in IPA), or at least let you also use IPA-style symbols that go before, as well as probably acute and grave accents that go on the stressed vowel. (The latter would result in lots of double-accented vowels but most modern fonts support them reasonably well, and at least on my Mac using the ABC-Extended keyboard layout, it's easy to type acute and grave accents but harder to enter IPA or enPR stress marks.)

Thoughts? Benwing2 (talk) 03:46, 16 June 2024 (UTC)[reply]

An excellent starting point is the diaphonemes listed here. One will need a distinct way to represent each of them. Nicodene (talk) 07:16, 16 June 2024 (UTC)[reply]
@Nicodene That is quite a table. I won't be starting off with anywhere near the coverage of dialects given here; probably just traditional GenAm (w/o cot-caught and Mary-marry-merry), "new" GenAm (w/cot-caught and Mary-marry-merry), and RP, maybe also GenAus. Benwing2 (talk) 08:16, 16 June 2024 (UTC)[reply]
Still usable for reference, whatever dialects one chooses to include. Nicodene (talk) 07:10, 17 June 2024 (UTC)[reply]
Support. My only request is that you consider generating reconstructed Early Modern English pronunciations (see w:Shakespeare in Original Pronunciation). For example, the Oxford Dictionary of Original Shakespearean Pronunciation glosses knight as /(k)nǝɪt/, although I would prefer /(k)nǝɪ(x)t/ to reflect the fact that it existed in other EME dialects (and in some cases unexpectedly shifted to /f/, e.g. thruff) [9]. Simon Roper's videos are also an invaluable resource. By the way, I thought @Theknightwho was working on this too? Ioaxxere (talk) 16:30, 16 June 2024 (UTC)[reply]
@Ioaxxere I definitely don't want to step on User:Theknightwho's toes but they said they wouldn't be getting to this for a while. The approach in their prototype was quite different, using English-based respelling and a whole bunch of rules taken I think from a Git package for text-to-speech (which were maybe RP-specific?) to convert to IPA. Benwing2 (talk) 19:02, 16 June 2024 (UTC)[reply]
I'm of two minds about how to handle widespread mergers (especially the horse-hoarse merger): on one hand I support notating what the pre-horse-hoarse merger pronunciation was (and very much support notating what the non-wine-whine and non-Mary-etc merger pronunciations are), and indeed I like the idea of making it easier to notate the full phonological history, mentioning what the Early Modern pronunciation was, what the pre-pane-pain merger sound was, etc. On the other hand, if no or few people make a particular distinction anymore... and we expect the single required 'main' input to make that distinction... people won't make the distinction correctly. Realistically, they'll be notating a hoarse word and it'll be 50/50 chance whether they look at and copy the notation of a hoarse vs a horse word, because they don't realize there's any difference between those (because there isn't any difference, for any of the major modern national standards, nor most of the subnational dialects, AFAICT). So, it might be safer to have the 'main' input be horse-hoarse-merging, and require the horse-hoarse-distinction sound to be input as a separate value? This means the extra horse-hoarse-distinguishing line will be missing most of the time (like at present), but perhaps that's better than it being wrong much of the time(?). But I concede that there's only so far we could go in that direction: if a speaker doesn't make the Mary-marry-merry or wine-whine distinction or the trap-bath split, they'll likewise just use whatever sounds right in their dialect without realizing they were supposed to make a distinction for the sake of some other dialect, and yet I understand the desire to have the 'main' input to make the distinction... and for things like cot-caught I fully agree the main input should make the distinction (since it's still the norm AFAICT) even though this does mean people who merge the sounds will indeed sometimes notate the wrong sound (e.g. [10]).
Other than that, I'll just observe that if one input generates multiple outputs, e.g. both US and UK, then an American adding an American pronunciation may not realize if/that the auto-generated British pronunciation is wrong, and vice versa; maybe we could provide a parameter so that at least conscientious users (if not blithe ones) could add "foobar|USonly=1" (or whatever) so only the US pronunciation they could vouch for was generated, and then the entry went into some maintenance category so a Briton could check whether "foobar" also generated the correct British pronunciation and then remove the "USonly=1"...? IDK. PS I hope there's a key mapping the notation to IPA; I have to look at a key whenever I need to figure out what some enPR is intended to be.😅😂 - -sche (discuss) 22:14, 16 June 2024 (UTC)[reply]
@-sche What you say makes sense and I was thinking of adding parameters to allow arbitrary pronunciations to be input (either using enPR or whatever respelling or direct IPA) with an accent qualifier added to indicate which accent would be involved; so possibly the horse-hoarse distinction could be handled that way. Take a look at {{pt-IPA}}; the way it handles multiple accents is similar to what I was thinking of doing here (except it doesn't provide parameters to input arbitrary accents). Basically, if you put a pronunciation in |1=, it applies everywhere unless you override a particular accent using e.g. |us=, |uk=, |rp=, etc.; but if you just put a pronunciation in e.g. |us=, it applies only to that accent or set of accents (depending on the parameter), and all the others are considered unspecified and don't display. As for enPR, we could have people input some diaphonemic version of IPA like is used in Wikipedia, but I would be concerned that people would have difficulty using it correctly and would tend to input whatever IPA they felt like inputting, leading to an inconsistent mess just like we have now. The advantage of enPR or English respelling is that it is a clear abstraction layer separate from IPA and doesn't allow as much flexibility, reducing the likelihood of inconsistency. And yes I'd definitely provide a key indicating how the enPR symbols map to IPA in different accents.
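The parameter behaviour described above (a value in |1= applies everywhere unless a specific accent is overridden, while a value given only for one accent leaves the others unspecified) could be sketched roughly as follows. This is a Python illustration only; the accent keys and the function name are hypothetical, and neither {{pt-IPA}} nor any planned English module is claimed to work exactly this way.

```python
# Hypothetical accent keys; the real set of supported accents is undecided.
ACCENTS = ("rp", "us", "au")


def resolve_pronunciations(params):
    """Return {accent: respelling} for every accent that should be displayed.

    params is a dict of template arguments: "1" holds the default
    respelling, and an accent key such as "us" overrides it for that accent.
    """
    default = params.get("1")
    resolved = {}
    for accent in ACCENTS:
        value = params.get(accent, default)
        if value is not None:  # unspecified accents are simply not shown
            resolved[accent] = value
    return resolved
```

With this shape, a contributor who can only vouch for one accent fills in just that parameter, and nothing is auto-generated for the others.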
Another possibility of dealing with the horse-hoarse issue is to provide different notations to indicate "the horse sound", "the hoarse sound" and "the merged horse-hoarse sound". For example, hōrs "hoarse" vs. hôrs "horse" vs. hors (merged horse-hoarse). That way someone who doesn't know the difference could at least avoid being wrong, and in that case the module would only generate the merged version and not the unmerged version. (Maybe the same thing could be done with the cot-caught distinction, which is very unpredictable for words spelled with o. I don't know.)
I also think we might have to have flapping indicated explicitly, or at least have symbols to override whatever the default rules are for deciding whether a given t is flapped. There's no way, for example, the module could automatically know that capitalistic has a flapped t but militaristic doesn't. (Unless maybe it goes by whether the t is placed in the preceding or following syllable? Hence kằp-ĭt-əl-ĭ́st-ĭk vs. mĭ̀l-ĭ-tər-ĭ́st-ĭk?) Another similar case is with so-called "Canadian raising" of /aɪ/, which IMO should definitely be shown (since it's probably by now the majority pronunciation in the US?) and which has unpredictable exceptions, like spider and tiger (at least for me, where tiger has "Canadian raising" but taiga doesn't). Benwing2 (talk) 22:42, 16 June 2024 (UTC)[reply]
@-sche Please take a look at User:Benwing2/enPR-table. This is my attempt so far at coming up with a list of enPR-style symbols for vowels and their mapping in three accents: RP, "traditional" GenAm and "merged" GenAm. It's not complete (but getting there), and there are certainly mistakes in the table as well as places needing further discussion. There's a column for GenAus but it's so far not filled in. Note that in some cases there are two possible symbols, particularly before r that is not followed by a vowel: a more expressed symbol (i.e. with more diacritics) and a less expressed one, corresponding to the fact that in this context there are a reduced set of possibilities. The two symbols would be equivalent. Benwing2 (talk) 05:30, 17 June 2024 (UTC)[reply]
Re having three symbols, for "horse", "hoarse", and "merged horse-hoarse", I suppose the usefulness of that depends on whether we think the average person adding a pronunciation is more likely to look up the documentation page where we can spell out "if you have the same sound in horse and hoarse and don't know which of those originally-distinct classes a word is from, just use notation X; if you do know which class the word is from (consult Longman's, Dictionary.com, the old 1930s OED, [etc other references]), use Y for horse or Z for hoarse" — in which case doing so is sensible — or if they are more likely to just mimic what they see in other entries, e.g. if I know court sounds like horse or hoarse to me (just with h->k and s->t), maybe I just go to [flip a coin: one or the other of those entries] and copy what's there, changing h->k and s->t, in which case it's a coin flip as to whether I've used the right notation.
It also occurs to me that another thing people might do if we use enPR-like notation is just copy the enPR-like notation of the AHD, MW, Dictionary.com, old gazetteers, etc (and if they don't know IPA and the pronunciations used in all national dialects we're outputting, never notice if that causes wrong IPA to be output) . . . but there may be no intelligible notation system which would avoid that problem, since using IPA we equally have people who copy IPA from places without understanding whether it makes sense, e.g. blithely putting length marks and /ɒ/ in GenAm, using /r/, etc.
If we deploy the template semi-manually, not just bot-converting IPA to it, I suppose we could aspire to manually check and correctly input the horse-vs-hoarse class of words as we went along (and the wine vs whine class, etc) and then just ... maybe try to track new additions with an edit filter or something to ensure they were right? And in that case, just having the one main input make the horse-hoarse and wine-whine etc distinctions would indeed be less effort than having a separate w=wh / w=w or hh=hors vs hh=hoars (or whatever).
Regarding Canadian raising and /ʌɪ/: is this phonemically contrastive with /aɪ/? (AFAIK the contrast between writer and rider is viewed as being phonemically /t/ vs /d/?) If it's not contrastive, I would suggest leaving it as a [narrow bracket] thing (and might not consider it important to require the 'main' template input to distinguish it, though if displaying the [phonetic] difference can be done automatically and/or with simple added symbols like your +/- idea, great). Likewise, I would consider not requiring flapping to be indicated in the input (if it's not phonemically contrastive and isn't present at all in one of the major dialects our inputters will be coming from), but if it too can be accomplished by add-ons like you suggest, great. I will note that using hyphen-minus, while it has the appealing advantage of being intuitive, has the disadvantage that it'll cause unexpected behavior/interpretations when people retain orthographic hyphens when inputting the pronunciation of e.g. sky-high and hit-and-run, if the template takes ī- / t- to be signalling something about raising or flapping but in fact the inputter just meant it to signal "there's a hyphen here". (OTOH, if what the template displays in response to that hyphen is nonetheless correct — if sky-high-type words indeed don't raise and hit-and-run-type words don't flap — then I suppose it doesn't matter, hah). - -sche (discuss) 20:16, 17 June 2024 (UTC)[reply]
@-sche Hmm, your point about hyphens is a possible issue, as I was thinking of using hyphens to separate syllables. Maybe instead I will use dot (.), which is also intuitive. As for whether /ʌɪ/ is phonemically contrastive, aside from cases like writer vs. rider, there are near-minimal pairs at least in my dialect of spider /ʌɪ/ vs. spied-her /aɪ/, tiger /ʌɪ/ vs. taiga /aɪ/, high school "secondary school" /ʌɪ/ vs. high school "a school that is high (e.g. in elevation)" /aɪ/, etc. I don't know if those pairs are universal, but I think at least the spider and high school exceptions are pretty standard. My thought was that the template would have a default rule "use /ʌɪ/ before an unvoiced sound, /aɪ/ otherwise" that would work in the large majority of cases, so the cases needing a specific ī+ or ī- override would be fairly rare. Similarly for flapping, the rule might be something like "syllable ends in vowel + t or rt and the next syllable is unstressed and begins with a vowel", which should work in the majority of the cases provided people put the t in the right place (which of course isn't guaranteed, but as you've shown, it's difficult to make something foolproof).
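The two default rules just described can be sketched in Python as below. Everything here is an assumption for illustration: the function names, the voiceless-consonant inventory, the reading of ī+ as "force the raised vowel" and ī- as "force the unraised one", and the representation of a syllable as a plain string with stress passed in separately.

```python
import unicodedata

# Assumed inventory of voiceless consonants for the raising rule.
VOICELESS = {"p", "t", "k", "f", "s", "θ", "ʃ", "tʃ", "h"}
# Base letters treated as vowels once diacritics are stripped.
VOWEL_BASES = set("aeiouə")


def strip_marks(text):
    """Remove combining diacritics so that e.g. 'ĭ́' compares as plain 'i'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def price_vowel(next_sound, override=None):
    """Default: raised before a voiceless sound; '+' or '-' overrides the default."""
    if override == "+":
        return "ʌɪ"
    if override == "-":
        return "aɪ"
    return "ʌɪ" if next_sound in VOICELESS else "aɪ"


def is_flapped(syllable, next_syllable, next_stressed):
    """Flap if the syllable ends in vowel + t or vowel + rt and the next
    syllable is unstressed and begins with a vowel."""
    cur = strip_marks(syllable).lower()
    nxt = strip_marks(next_syllable).lower()
    if next_stressed or not nxt or nxt[0] not in VOWEL_BASES:
        return False
    if cur.endswith("rt"):
        stem = cur[:-2]
    elif cur.endswith("t"):
        stem = cur[:-1]
    else:
        return False
    return bool(stem) and stem[-1] in VOWEL_BASES
```

For example, is_flapped("ĭt", "əl", False) is true for the capitalistic syllabification given earlier, whereas is_flapped("ĭ", "tər", False) is false for militaristic, because the t sits in the following syllable.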
As for trying to catch people misusing the template, I think that IPA is very easy to misuse (as you've given examples of) and hopefully the use of enPR will be a little less so; at least, I was thinking of having the code check for erroneous usages and throw errors in those cases to make it more likely they get fixed. Examples of erroneous usages would be omitting syllable breaks (there should never be more than one vowel in a syllable), using an unmarked vowel other than in the particular cases where it's allowed, putting two of the same consonant in a row, etc.
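Two of the checks mentioned above might look something like this in Python. This is a sketch under stated assumptions: dot is taken as the syllable separator, a run of adjacent vowel letters (as in digraph vowels) counts as a single nucleus, the doubled-consonant check is applied only within a syllable, and the function name and error wording are invented for illustration.

```python
import re
import unicodedata

VOWELS = "aeiouə"  # base vowel letters after diacritics are stripped


def strip_marks(text):
    """Remove combining diacritics so marked vowels compare as base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def validate(respelling):
    """Return a list of error messages for a dot-separated respelling."""
    errors = []
    for syllable in respelling.split("."):
        plain = strip_marks(syllable).lower()
        # A missing syllable break shows up as more than one vowel nucleus.
        nuclei = re.findall(f"[{VOWELS}]+", plain)
        if len(nuclei) != 1:
            errors.append(f"{syllable!r}: expected 1 vowel nucleus, found {len(nuclei)}")
        # The same consonant letter written twice in a row.
        if re.search(rf"([^{VOWELS}\W\d_])\1", plain):
            errors.append(f"{syllable!r}: doubled consonant")
    return errors
```

Returning a list instead of raising on the first problem is just a convenience for the sketch; a real module would presumably throw an error as described above.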
As for whether it makes sense to have a symbol for cases of mergers, I'm not sure what the right answer is here. If we do have such symbols, we can have cleanup categories for their use. If we don't, we can use the WT:Tracking mechanism to track cases where e.g. the horse and hoarse symbols are used, but I'm not sure how to "mark off" the ones we've checked other than e.g. to have a page somewhere containing a whitelist of terms that have been checked. Can you elaborate on how you think an edit filter would work? Benwing2 (talk) 20:53, 17 June 2024 (UTC)[reply]
Dot for syllable breaks is intuitive. I've been worried people would use hyphen for things like hĭt-bī-pĭtch.ĭs — we were discussing a while ago the various unstandardized ways people indicate various kinds of word breaks in the hyphenation template — so I was thinking if we used a different symbol than - for indicating flapping / (non)raising (e.g. use t^ or something), then if people do use - to mean "there's a hyphen here", the template can easily flag it as something to clean up, whereas if the template expects t- as valid input, I was worrying it'd be harder for it to know whether a given instance is right, but your proposed checks against two vowels etc sound like they'd catch any problems. (I may be wrong to think people will input hĭt-bī-pĭtch.ĭs, anyway.)
Re raising, hopefully more people can weigh in; for my part I would rather be conservative and wait until I see more literature referring to it as phonemic rather than allophonic (AFAICT it is near-uniformly referred to as allophonic), before moving it out of [brackets] and into /phonemic/ status. (The various near-minimal pairs I see mentioned all seem to be said to exist for only some speakers and dialects; besides spider and tiger I also see hire vs higher mentioned as a pair some people distinguish, but not others.) BTW, on the subject of ʌɪ, the OED gives the British pronunciation of all these words, tiger, taiga, rider, writer, etc, as /ʌɪ/ (and the American pronunciation of them all as /aɪ/), although I suppose that's not RP.
Re an edit filter, I meant a filter could tag all new additions of the pronunciation template with a horse/hoarse vowel in the input, so people could manually review those additions to see if they were correct; this would be labor intensive / inefficient. I like your idea of having the third / merged symbol add a cleanup category; that's probably the best approach, though I think it only helps with aware/conscientious users (who know they're supposed to make the distinction, and can use the "merged" symbol if they're unsure), whereas I'm thinking about the users who don't realize they're supposed to make a distinction (but perhaps nothing can be done about them). - -sche (discuss) 04:04, 19 June 2024 (UTC)[reply]
What you're saying about phonemic vs. allophonic of Canadian raising and flapping makes sense; I'll have them indicated as allophonic (but still provide symbols for inputting them if needed, since it's hard for the module to always get it right). BTW one possibility for notating the horse/hoarse distinction is to use similar +/- or whatever symbols rather than ôr vs. ōr, so that e.g. you'd have merged hors vs. something like ho-rs "horse" and ho+rs "hoarse" (or maybe some other special characters); maybe that will make people more likely to look up the documentation and see that it's OK to write hors if you're not sure. (The idea is that the + and - additions will always indicate finer distinctions that can be left out in cases of doubt.) Dunno though. Also, I've been looking at sample words to come up with how I would structure the arguments to {{en-IPA}} or {{en-pr}}; the first two words I picked were tree and three and both of them have weirdly narrow IPA transcriptions added. IMO these *really* should not be there; e.g. I don't see how [t̠͡ɹ̠̊˔ʷɪi̯] possibly helps anyone. Benwing2 (talk) 04:46, 19 June 2024 (UTC)[reply]
@Benwing2, FWIW, Canadian raising is at least borderline phonemic for me (writer and rider are a minimal pair in my idiolect, as are house [noun] and house [verb] (or house verb and how's)), but I pronounce spider and spied her identically and I don't raise the vowel in tiger. Andrew Sheedy (talk) 23:36, 25 June 2024 (UTC)[reply]
@Andrew Sheedy Thanks! The writer ~ rider difference is widespread but often analyzed as underlyingly /ɹaɪtəɹ/ vs. /ɹaɪdəɹ/, where Canadian raising applies before voiceless sounds earlier than flapping applies. Similarly house (noun) would be /haʊs/ and house (verb) would be /haʊz/ (do you pronounce the final consonant differently in these two words?). Also there may be a difference between "Canadian raising" in Canada vs. "Canadian raising" in the US; certainly, there is no raising of /aʊ/ in the US, and I think the raising of spider is fairly widespread in the US although maybe not universal. (Strangely I do have raising of /aɹ/ in my speech, making Carter and carder sound different.) Benwing2 (talk) 23:54, 25 June 2024 (UTC)[reply]
@Benwing2: Ah, yes, I remember hearing writer/rider analyzed that way. I do find the sounds quite distinct though—ice cream without the raising is one that always stands out to me as distinctly "American". As for house, I pronounce the noun /hʌʊs/ and the verb /hʌʊz/. So the verb house forms a minimal pair with how's, which I pronounce /haʊz/. (Likewise, houses is /hʌʊzəz/ and espouse is /ɛˈspʌʊz/. I didn't make the connection between louse and lousy until I was in my 20s, because to me the words had completely different vowels (/lʌʊs/ vs. /laʊzi/).) I think my having phonemic Canadian raising is fairly idiosyncratic. In a word like houses most people where I live either retain /s/ or lower the vowel to [aʊ]. I'm surprised to see our entry for spider list [ˈspʌɪ̯ɾə(ɹ)] as the Canadian pronunciation. I'll have to pay more attention, but I don't think that pronunciation is widespread in the Prairies. Andrew Sheedy (talk) 18:33, 26 June 2024 (UTC)[reply]
@Andrew Sheedy Interesting. The Wikipedia article on Canadian raising talks about idiosyncratic raised [ʌɪ] in the words "tiny, spider, cider, tiger, dinosaur, cyber-, beside, idle (but sometimes not idol), and fire" as at least possible in certain East Coast and Midwest US accents. For me, tiny, cider and idle/idol can go both ways whereas spider and tiger are usually raised and the remainder not. It definitely seems like this is an idiosyncratic phonemic split in the process of happening. Benwing2 (talk) 18:53, 26 June 2024 (UTC)[reply]
@Benwing2: It's interesting that I don't have that split. In my case, regular Canadian raising of /aʊ/ has simply been cemented as phonemic (I don't think that applies to /aɪ/ and /ʌɪ/ for me, though I perceive them as quite distinct in certain environments (but not others)). As far as I can tell, it only affects words that would have a different realization of /aʊ/ according to the form of the word. It's worth noting that our entry for dinosaur lists that pronunciation as following the "idle-idol split" (also mentioned in idle), which I've never heard of before. Andrew Sheedy (talk) 19:04, 26 June 2024 (UTC)[reply]
@Andrew Sheedy Hmm, Googling for "idle-idol split" only brings up a few Wiktionary links and a couple of Reddit topics e.g. [11]. Sounds like something that a Wiktionary contributor might have made up. Benwing2 (talk) 19:24, 26 June 2024 (UTC)[reply]

Numerals

[edit]

Aside from the obvious numerical definitions, the pages on numbers like 3 and 4 include defs for one topic: indicating phonological tones in tonal languages. Is that a rule? Why include that and not other non-numerical things representable by numbers? Dewey decimal numbers, Hornbostel-Sachs numbers, Fujita scale, etc. Or are those things includable? I don't see anything relevant in WT:CFI and I have no interest in adding such defs, I was just wondering if this has been discussed. What's special about tonal markers? Mazzlebury (talk)

Probably because they are used in the transcription of words, e.g. in Jyutping. Voltaigne (talk) 13:57, 16 June 2024 (UTC)[reply]
Oh ok, that makes sense, it's more like a character, I get that. Mazzlebury (talk)

AWB request (Brainulator9)

[edit]

I would like access to AutoWikiBrowser, for use in helping with doing things such as diffusing categories like Category:English terms prefixed with un-. I already have been approved for this tool on English Wikipedia and Wikimedia Commons and have used them with little issue. -BRAINULATOR9 (TALK) 23:42, 16 June 2024 (UTC)[reply]

Can you be more specific about what you plan to do? Generally categories like Category:English terms prefixed with un- are not supposed to be added manually. Benwing2 (talk) 05:32, 17 June 2024 (UTC)[reply]
In this case, I would be adding |idN= parameters to the {{suffix}} templates, putting them in categories like Category:English terms prefixed with un- (negative). I'm not sure if every little task needs to be brought up here first, but that's the specifics for the task I mentioned. -BRAINULATOR9 (TALK) 14:16, 17 June 2024 (UTC)[reply]
OK that sounds fine, I just want to make sure you have some idea what you're doing :) ... if no one objects in a couple of days, I'll add you to the list. Benwing2 (talk) 18:54, 17 June 2024 (UTC)[reply]
Thank you! Hopefully, what I've done so far isn't cause for concern (I stopped making one type of change partway in case there wasn't consensus to do it, especially en masse). -BRAINULATOR9 (TALK) 04:15, 27 June 2024 (UTC)[reply]

Japanese historical kana transliteration

[edit]

Hello, is there maybe a problem with how historical kana spellings are currently transliterated? I've had to change 柔和's "にうわ" to "にう.わ", because, for some reason, the former was producing "niwa" instead of "niuwa". I just now went on 飢える and find that "うゑる" is transliterated as "weru"... is there some sort of mistake here? Why would this be the default behavior and require a "." to fix it? Kiril kovachev (talkcontribs) 01:23, 18 June 2024 (UTC)[reply]

This looks like a problem with how the template handles labialized consonants that were present in middle Japanese. IIRC only k and g could be labialized, and the spelling for these would use く or ぐ before a わ行 sound, e.g. くわ for kwa, ぐゑ for gwe, etc. (sounds other than くわ and ぐわ specifically may be counted as ancient readings instead?). It looks like this logic is being extended to all う段 sounds though, not just く and ぐ, resulting in things like うわ > wa, すゑ > swe, etc. I've already fixed this on three articles, but no doubt others are affected. No idea how to fix the template so you don't have to manually insert a period though. Any ideas? @Eirikr Horse Battery (talk) 20:29, 26 June 2024 (UTC)[reply]
Thank you for the heads-up, but sadly I am not of much use when it comes to our module infrastructure. Agreed that this romanization behavior as a marker for labialization should only apply to the "w" kana when immediately following either く (ku) or ぐ (gu). Even there, we need a means of indicating when this should not happen, in those rare cases such as words like 久遠 (modern kuon, classical kuwon; "boundless time, eternity; far in the past or future"). I think the current practice of adding a period should work just fine for these corner cases. ‑‑ Eiríkr Útlendi │Tala við mig 21:57, 26 June 2024 (UTC)[reply]
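The behaviour agreed on above (labialize only when a "w" kana immediately follows く or ぐ, with a period to block it) can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the kana table covers just the handful of kana used in this thread, and it is not a description of how the existing module is written.

```python
# Deliberately tiny romanization table, enough for the examples in this thread.
ROMAJI = {
    "う": "u", "く": "ku", "ぐ": "gu", "す": "su", "に": "ni", "る": "ru",
    "ん": "n", "わ": "wa", "ゐ": "wi", "ゑ": "we", "を": "wo",
}
LABIALIZABLE = {"く": "k", "ぐ": "g"}  # only these may fuse with a w-row kana
W_ROW = {"わ", "ゐ", "ゑ", "を"}


def romanize_historical(kana):
    """Romanize historical kana; '.' marks a break that blocks labialization."""
    out = []
    i = 0
    while i < len(kana):
        ch = kana[i]
        if ch == ".":  # explicit break: emit nothing, just stop any fusion
            i += 1
            continue
        nxt = kana[i + 1] if i + 1 < len(kana) else ""
        if ch in LABIALIZABLE and nxt in W_ROW:
            out.append(LABIALIZABLE[ch] + ROMAJI[nxt])  # くわ -> kwa
            i += 2
        else:
            out.append(ROMAJI[ch])
            i += 1
    return "".join(out)
```

Under this rule にうわ comes out as "niuwa" without needing a period, while く.をん still needs one to give "kuwon".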
@Eirikr @Horse Battery @Kiril kovachev I am working on Japanese modules at the moment, so I'll try to make time to look at this. That being said, I haven't ever touched anything to do with historical kana transliteration, so I'll need to get up to speed with it first. It's on the to-do list, at least. Theknightwho (talk) 20:33, 27 June 2024 (UTC)[reply]
@Eirikr @Horse Battery We get the same problem with いゆ (iyu) becoming "yu". Unlike labialisation, it looks like palatalisation is the correct behaviour for most consonants, but there are still exceptions like this.
The way the module's been implemented makes this a little tricky to fix, but it should be doable. Theknightwho (talk) 17:07, 1 July 2024 (UTC)[reply]
@Eirikr Sorry for the double ping - should palatalisation be applied in cases like 消ゆ (kiyu), where ゆ (-yu) is a Classical Japanese suffix? The output is currently "kyu". I assume not, but just want to double check. I suppose my broader question is whether we always want a morphemic break between furigana and okurigana in the kanji readings section, as it seems odd for 消 to have "kyu" as a historical reading as it does right now. Theknightwho (talk) 17:58, 1 July 2024 (UTC)[reply]
Ugh, ya, the ゆ in 消ゆ is a suffix, and suffixes are not valid cases for palatalization. Diachronically, sure, that happens, but then it's not a suffix anymore and instead a fused morpheme.
There will be a few words like this, which might affect your implementation. 見ゆ (miyu), 聞こゆ (kikoyu), 覚ゆ (oboyu, omoyu), 冷ゆ (hiyu), 煮ゆ (niyu), among others.
There are cases that aren't suffixing, and that would also be affected by this, such as きやきや (kiyakiya), にやにや (niyaniya), ちちよちちよ (chichiyochichiyo), as a few examples. That said, I am uncertain if there are enough non-suffixing exceptions to warrant explicitly coding for these. It might be enough to have editors use the medial-period workaround to force the template(s) to treat these as separate morae. ‑‑ Eiríkr Útlendi │Tala við mig 18:43, 1 July 2024 (UTC)[reply]
@Eirikr As a first step, I'll change {{ja-readings}} so that it always puts a morpheme boundary between the furigana and okurigana. For some reason, it's already doing that for modern readings, but not historical or "ancient" ones, so I assume there was an oversight at some point.
You can see this in action already in the newly-revamped kanji category descriptions (e.g. Category:Japanese kanji with historical kun reading き・ゆ), since it has to work from a parallel implementation. (As a side point, I have replaced the hyphen with the middle dot, as that's what Daijisen use, among others, and it's a lot more legible; this doesn't affect user input - only the names of the automatically-generated categories.) Theknightwho (talk) 00:55, 2 July 2024 (UTC)[reply]
(Edit: forgot to ping @Eirikr - see below. Theknightwho (talk) 19:15, 17 July 2024 (UTC))[reply]
One issue that I've noticed is that we are extremely inconsistent with small kana usage in historical spellings (e.g. see Category:Japanese terms historically spelled with ゎ). However, I do think I have a solution to this:
  1. For input, small kana should be used, just like with modern spelling rules.
  2. For output, small kana will not be displayed or linked to, but will be accounted for in the transliteration.
This has three advantages:
  1. It guarantees consistency in our historical kana entries. Even if someone uses full-size kana, the worst that will happen is the transliteration will be wrong; the link will still be to the correct entry, since historical kana entries should never use small kana in the title.
  2. It reduces the need for manual overrides in transliteration. Plus, being able to break things down by mora allows for more sophisticated transliteration (which is the approach I'm taking in the rewrite of the transliteration module that I'm currently working on).
  3. It's intuitive for users who are familiar with Japanese, but who aren't very experienced with wikitext/our templates, which keeps the barrier to entry low.
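A minimal Python sketch of the input/output split proposed above: small kana are kept in the input so that morae can be identified for transliteration, but are converted to full size for display and linking. The names are hypothetical, and the sketch covers only ゃ, ゅ, ょ and ゎ; the sokuon is deliberately left out as an assumption of the sketch.

```python
SMALL_KANA = "ゃゅょゎ"
# Translation table mapping each small kana to its full-size counterpart.
SMALL_TO_FULL = str.maketrans(SMALL_KANA, "やゆよわ")


def display_form(user_input):
    """Form to display and link to: historical spellings use full-size kana."""
    return user_input.translate(SMALL_TO_FULL)


def split_morae(user_input):
    """Group each small kana with the preceding kana so that it forms one mora."""
    morae = []
    for ch in user_input:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae
```

For instance, きゅ and きゆ both display as きゆ, but split into one mora and two morae respectively, which is the information a transliterator would need to choose between "kyu" and "kiyu".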
What do you think? Also pinging @Fish bowl.
Theknightwho (talk) 19:13, 17 July 2024 (UTC)[reply]
Using small kana is a good idea IMO. However, I still have worries about the "historical romanization" system itself being poorly defined, and also wonder if it could be unified with a general system for romanizing quotes in pre-war orthography in general. But here it gets quite thorny because those quotes have been romanized according to how people would pronounce them today, not in any half-assed "historical romanization". Providing "Historical romanization" is also misleading for more modern terms that can easily be written in the orthography (as it is just an orthography, not to be confused with historical attestation, although it is in multiple areas *based on* historical attestation). Frankly I would like to see "historical romanization" retracted from ja-headword, and further deliberation on the treatment of middle Japanese vs. Meiji-era pre-war Japanese. —Fish bowl (talk) 20:24, 20 July 2024 (UTC)[reply]
@Fish bowl Yeah, I agree with all of that. This also affects the sokuon as well, which we don't really account for at all at the moment: e.g. we've got historical spellings like をつとつせい (wotutotusei), which should probably be wottossei, and it gets worse if the pre-reform orthography uses something other than つ, like がくかう (gakukau), which would be better as gakkau. Not sure if it's possible to solve that with small kana, either, as I don't think the etymological mora would be predictable from an input like がっかう (though if it is, that's great).
In terms of your last point, I think it's high time we split Middle Japanese out as its own L2, because blending everything from the end of Old Japanese in c. 800 to the orthographical reform in 1946 into a "historical kana orthography" is over-simplified. Category:Middle Japanese currently only contains derived terms, so we're basically pretending it doesn't exist outside of etymology sections at the moment. Yes, there are 190 terms in Category:Classical Japanese, but that's not the same thing as the vernacular language, which we simply label "obsolete". Theknightwho (talk) 01:32, 21 July 2024 (UTC)[reply]
idea: abandon any thoughts of "historical romanization" or any attempt to code a pre-war orthography romanizer, and instead force editors to provide the modernized reading for texts, and retool ruby code to be able to handle this (actually i forgot about the obvious need to provide historically accurate furigana; or should we actually also abandon editorializing furigana for pre-modern texts, considering the large variation/表記ゆれ before [and even into] the standardized Meiji era? which i actually have worried about before, come to think about it). this would extend to the headword of words in pre-modern orthography: しづけし {{ja-adj|しずけし|infl=shiku}} (cf. デジタル大辞泉 しず‐け・し〔しづ‐〕) —Fish bowl (talk) 04:49, 4 August 2024 (UTC)[reply]
Personally I think we have the larger problem that "historical kana transliteration" is invented from whole cloth and undocumented. If I recall correctly, people instructed me to create something that "seemed appropriate" for the sake of {{ja-readings}} (which also has support for a badly-defined "ancient kana" feature), and it was later unilaterally added to Module:Jpan-headword by User:Huhu9001, despite issues such as morpheme boundaries for words like きやきや as stated above. —Fish bowl (talk) 22:05, 1 July 2024 (UTC)[reply]
Based on this discussion, I feel like "historical kana transliteration" seems to be being used as a confusing blend of two separate things: representation of historical spellings, and representation of historical pronunciations. Those are far from the same thing, since historical Japanese spelling was not perfectly phonetic, so I guess I would support removing it and replacing it with something better (which might be separate entries with their own pronunciation sections). For example, from my perspective I don't see why "wotutotusei" should be romanized as "wottossei": to my mind, transliteration is better in this context than transcription, since the Japanese text をつとつせい is an orthographic form, not a phonemic transcription. Likewise, to me it seems better to just use kiya, kuwa, etc. regardless of whether the pronunciation is syllabic or non-syllabic, since that is how the historical spelling system worked.--Urszag (talk) 05:30, 4 August 2024 (UTC)[reply]
@Urszag I agree with you when it comes to gemination, but disagree when it comes to digraphs, since the number of morae is a pretty fundamental aspect of Japanese. Theknightwho (talk) 06:08, 4 August 2024 (UTC)[reply]
My point is, however important the number of morae is to a phonological description of Japanese pronunciation (current and historical), it isn't connected to the topic of historical kana spelling. The ambiguity of historical kana spelling in certain contexts is a fact about that spelling system that gets obscured if we silently adjust the romanization to add extra information to resolve those ambiguities. While there are cases where we romanize words in a way that adds information not in the native spelling system, usually that's in the context of referring to lexemes as a whole, but "historical kana spellings" as currently presented are not treated as full lexemes.--Urszag (talk) 06:26, 4 August 2024 (UTC)[reply]
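The distinction drawn in this exchange, between transliterating the historical spelling and transcribing the pronunciation, can be sketched in a few lines of Python. This is only an illustration: the kana table covers just the two examples from the thread, and the function names and the `geminate_at` parameter are hypothetical, not part of any existing module. The extra argument is the point: the geminated reading cannot be derived from the spelling alone, so it has to be supplied by the editor.

```python
# Nihon-shiki-style values for only the kana used in the thread's examples.
KANA = {"を": "wo", "つ": "tu", "と": "to", "せ": "se", "い": "i",
        "が": "ga", "く": "ku", "か": "ka", "う": "u"}


def transliterate(kana: str) -> str:
    """Romanize the spelling kana by kana, adding no extra information."""
    return "".join(KANA[ch] for ch in kana)


def transcribe(kana: str, geminate_at: frozenset = frozenset()) -> str:
    """Romanize, reading the kana at the given indices as a sokuon.

    A kana marked as a sokuon is replaced by the first consonant of the
    following kana; the indices must be supplied by hand (assumption:
    they are not predictable from the spelling itself).
    """
    out = []
    for i, ch in enumerate(kana):
        if i in geminate_at and i + 1 < len(kana):
            out.append(KANA[kana[i + 1]][0])  # double the next consonant
        else:
            out.append(KANA[ch])
    return "".join(out)


print(transliterate("をつとつせい"))                     # wotutotusei
print(transcribe("をつとつせい", frozenset({1, 3})))     # wottossei
print(transliterate("がくかう"))                         # gakukau
print(transcribe("がくかう", frozenset({1})))            # gakkau
```

Under this sketch, the transliteration is a pure function of the spelling, while the transcription needs per-entry annotation, which is roughly the trade-off being argued over.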

Updates to WT:AINE

[edit]

It was suggested that I update WT:AINE to better reflect common convention, so I went ahead and did so. Probably the biggest change is deleting some of the sort rules, which were a bit complicated and therefore mostly ignored. @Mahagaja, Rua, This, that and the other, Nicodene, Benwing2 --{{victar|talk}} 05:41, 18 June 2024 (UTC)[reply]

@Victar Seems reasonable to me from looking over the changes. Benwing2 (talk) 05:46, 18 June 2024 (UTC)[reply]

AWB request (Babr)

[edit]

Was originally gonna wait a few weeks after the first AWB request cuz I didn't want to ask too soon after someone else, but then someone else asked again so I guess I can't control how close my request is to someone else's.

Anyway, I will be using it to clean up Tajik entries. For an example of what I am changing, compare what this entry looked like before I cleaned it up to what it looks like after I cleaned it up. I've already cleaned up about 400 entries, so I will just continue what I'm already doing at a faster pace.

BTW I am User:Sameerhameedy, I just changed my username a few days ago. — BABR (talk) 08:13, 18 June 2024 (UTC)[reply]

@Babr Hi! I added you to Wiktionary:AutoWikiBrowser/CheckPageJSON. Please let me know if this works; some users have said that this page doesn't work and you have to be added to Wiktionary:AutoWikiBrowser/CheckPage despite that page saying it's superseded. Benwing2 (talk) 21:08, 18 June 2024 (UTC)[reply]
Unfortunately it would probably be disruptive and a bad idea to test this (if some users' AWB use in fact depends on the non-JSON CheckPage existing), but . . . iff Wiktionary:AutoWikiBrowser/CheckPage has in fact been superseded, I wonder if the issue might be that the page nonetheless still exists (with names on it and everything), so perhaps AWB first looks there, sees it exists, assumes it's operating on an old wiki that still uses the old name for the page, and looks for names there and doesn't find them: I wonder if not having a page with that title would force it to look for the new JSON page. - -sche (discuss) 02:09, 19 June 2024 (UTC)[reply]
Hmmm, that is an interesting hypothesis. I wonder if we can check this in some other fashion, maybe by looking through the AWB docs or asking one of the AWB developers (wherever they hang out). Benwing2 (talk) 02:17, 19 June 2024 (UTC)[reply]
BTW on Wikipedia, their CheckPage is a hard redirect to CheckPageJSON. Maybe that would work for us? Benwing2 (talk) 02:25, 19 June 2024 (UTC)[reply]
I guess we could try it and revert if it turns out to cause problems. (Might as well move the useful text to WT:AWB while we're at it.) - -sche (discuss) 04:06, 19 June 2024 (UTC)[reply]
We'd need the cooperation of someone who has AWB access as well as AWB installed so they could try things out to see if anything breaks. (I don't have AWB installed because (a) I'm on a Mac and (b) I have bot scripts for downloading sets of pages, editing them offline and pushing them in a batch; this is the source of those (manually assisted) notations in my bot changes.) Benwing2 (talk) 04:50, 19 June 2024 (UTC)[reply]
I have AWB and can test whether it still works if the page is redirected. (If only some users find that being added to the JSON page isn't enough, that isn't foolproof; it might be better to get one of the people who found that merely being added to the JSON page wasn't enough for them.) - -sche (discuss) 05:12, 19 June 2024 (UTC)[reply]
So far I can still edit, but next I'll try closing and restarting AWB, as I suspect it performs its check on startup. - -sche (discuss) 05:41, 19 June 2024 (UTC)[reply]
I closed AWB and started it afresh, and: "Logged in, user and software enabled", it says. - -sche (discuss) 05:43, 19 June 2024 (UTC)[reply]
OK, hmmm. Let me try redirecting the page then. Benwing2 (talk) 05:59, 19 June 2024 (UTC)[reply]
@-sche OK, I merged the two pages, copying the non-user text to WT:AutoWikiBrowser, and redirected Wiktionary:AutoWikiBrowser/CheckPage to Wiktionary:AutoWikiBrowser/CheckPageJSON. Let me know if it still works after closing, logging out explicitly (if possible), logging in and seeing if you can make an edit. Benwing2 (talk) 06:12, 19 June 2024 (UTC)[reply]
OK, I logged out in my other browsers, opened AWB, logged in, explicitly logged out in AWB, closed it, reopened it, logged back in (in AWB), hit the "refresh status" option (I figured if anything would "refresh my 'has-AWB-rights' vs 'doesn't' status", that seemed like a likely candidate, although I think it in fact refreshes some list of typos somewhere), logged out and back in again for good measure, and it still says I'm approved. - -sche (discuss) 06:32, 19 June 2024 (UTC)[reply]
OK great! So hopefully everything is sorted now. Benwing2 (talk) 06:34, 19 June 2024 (UTC)[reply]
Didn't get a chance to test it until now but it works just fine! — BABR (talk) 07:19, 20 June 2024 (UTC)[reply]

Could we change the behaviour of this template please? Currently it's a mere copy of {{affix}} / {{compound}} with a specific categorisation. But univerbations are a specific type of compound: they are originally entire phrases/syntagms which came to be joined together.

For example, French aujourd’hui is not the mere sum of au + jour + de + hui: it's the whole phrase au jour(-)d’hui (now obsolete) rewritten and felt as one word.

Imo the template shouldn't take parameters (so we'd write {{univerbation|fr|[[au]] [[jour]] [[de|d']][[hui]]}} instead of {{univerbation|fr|au|jour|de|hui}}), and it certainly should not output a "+" between the components. PUC 17:43, 18 June 2024 (UTC)[reply]

@PUC This is a pretty major change. If we were to implement this we'd need to figure out a strategy for migrating the 2,000 or so pages that currently use the old format to the new one. Benwing2 (talk) 02:26, 19 June 2024 (UTC)[reply]
I feel like knowwhaddamean is a good candidate for this template. So I Support the formatting you have in mind. Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]
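As a rough idea of what the migration mentioned above might involve, here is a minimal Python sketch that rewrites the old multi-parameter call into the proposed single-phrase form. It is an assumption-laden illustration, not an existing bot script: the function name is hypothetical, only the literal template name `univerbation` is matched, and any call that uses named parameters or already contains links is left untouched for manual review. It also cannot know about elisions such as d' in aujourd'hui, so its output would still need human checking.

```python
import re

# Matches {{univerbation|LANG|...}} with no nested templates inside.
OLD_FORMAT = re.compile(r"\{\{univerbation\|([^|{}]+)\|([^{}]+)\}\}")


def migrate(wikitext: str) -> str:
    """Rewrite old-format calls as a single linked phrase (naive sketch)."""
    def convert(match):
        lang, rest = match.group(1), match.group(2)
        parts = rest.split("|")
        # Skip anything we cannot convert safely: a single positional
        # parameter, named parameters (t1=, alt1=, ...), or existing links.
        if len(parts) < 2 or any("=" in p or "[[" in p for p in parts):
            return match.group(0)
        phrase = " ".join(f"[[{p}]]" for p in parts)
        return "{{univerbation|" + lang + "|" + phrase + "}}"

    return OLD_FORMAT.sub(convert, wikitext)


print(migrate("{{univerbation|fr|au|jour|de|hui}}"))
# {{univerbation|fr|[[au]] [[jour]] [[de]] [[hui]]}}
```

Note that the result for aujourd'hui is [[de]] [[hui]] rather than the desired [[de|d']][[hui]], which shows why a fully automatic conversion is unlikely to be enough.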

CFI for translations?

[edit]

Is there some bare level of attestability needed for translations? I ask because of the translations of Mummerset, a word that will be used vanishingly rarely - if ever - in other languages. We have Finnish and Russian Macedonian translations, but it doesn't look like these words have ever been used in those languages (and it's arguable whether they are right: Mummerset isn't actually a dialect, it's just a stage accent). Smurrayinchester (talk) 13:38, 19 June 2024 (UTC)[reply]

Some people insist the same CFI applies to translations as does for entries. I cannot find anything in the policy to support it personally. At the same time, I don't know what people expect to happen when translation requests are added indiscriminately, without paying any attention to how likely it is for the term in question to ever be used in the target language, if outside English at all. — SURJECTION / T / C / L / 15:44, 19 June 2024 (UTC)[reply]
I would support that CFI for translations be the same as other entries, and I share your confusion. Vininn126 (talk) 20:07, 19 June 2024 (UTC)[reply]
In theory, because CFI applies to entries, what you're supposed to do if you think a translation (or redlinked Derived term, etc) is wrong, is: create an entry for it. Then you RFV it and it gets removed as both an entry and a translation if it fails RFV. Because this is rather ... faffy ... there are people, as Surjection mentions, who prefer to just apply CFI directly to the translation and remove it if it fails ATTEST without creating an entry for it first, but this gets a surprising amount of pushback, so... if you think something is wrong and doesn't exist, you can always fall back on creating an entry for it. - -sche (discuss) 16:09, 19 June 2024 (UTC)[reply]
If so, there has to be a solution to removing the only translation from an entry and people simply requesting a translation to be added later "because it's missing". — SURJECTION / T / C / L / 19:43, 19 June 2024 (UTC)[reply]
Also using {{not used}} and {{no equivalent translation}} seem inappropriate sometimes. Vininn126 (talk) 20:12, 19 June 2024 (UTC)[reply]
I think there are still a lot of cases where the language may have an equivalent translation, but it's not attested/attestable. Like pretty much any place name in pretty much any LDL. Thadh (talk) 23:12, 19 June 2024 (UTC)[reply]
This also happens with some multiword terms, or something similar, where the given English word is idiomatic, a language would translate it the same way, but that exact phrase isn't attested. Vininn126 (talk) 23:14, 19 June 2024 (UTC)[reply]
@Thadh @Vininn126 This is why {{no attested translation}} exists, but it's hardly used. Theknightwho (talk) 23:16, 19 June 2024 (UTC)[reply]
@Theknightwho, Vininn126: When the topic becomes technical, it necessarily occurs: I remember adding a natural SOP translation in my native German which should have such a term but has not dropped it anywhere yet … so I just starred it preceded by “suggestion”: attrition bias. There is one for non-response bias or participation bias, so here we see the effect of another bias by which scientists select the keywords they write, publication bias or filter bubbles, ironically. Fay Freak (talk) 22:44, 23 June 2024 (UTC)[reply]
Yes, clearly CFI applies to translations as well. Otherwise I would be able to add ⡰⪘◰ⵗ⥙⽟⪢ⷼⲋⴑ⎽⪬⣏⫙ as the Arabic translation of neurohistopathologist and no one could stop me. I assume that any challenged translation which is a redlink can be removed on the spot since there isn't even an entry to RFV. Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]
This is correct. So there should be criteria for inventions, in the case translations are needed irrespectively of attestation. Due to the rarity of the problem, it does not need to be formulated just yet however, since rules are gameable. Fay Freak (talk) 22:44, 23 June 2024 (UTC)[reply]

Entries by Geshiza

[edit]

I'm following up on this BP discussion back in March, after which I notified Geshiza that they'd be given time to fix their entries before they were moved out of the main space. It's been 3 months since then and Geshiza has been completely inactive and hasn't requested a language code for Eastern Geshiza, so we cannot even fix the entries they made ourselves if we wanted to. I think the best solution now is to move the entries they made to their user space and notify them that their entries have been moved there and can be fixed and readded (if and when they return). I'd be happy to move the entries myself if there is consensus to do it, but someone would still need to delete every redirect page so I suppose it's better someone else does it.

Notifying Chuck Entz, Benwing2 and User:Theknightwho, who were involved in the BP discussion from March — BABR (talk) 07:14, 20 June 2024 (UTC)[reply]

@Babr Let's go for it. Can you identify a list of pages to be moved and deleted? Benwing2 (talk) 07:50, 20 June 2024 (UTC)[reply]
@Benwing2 pretty much all the entries they've made need to be moved. Luckily, it seems they've added all their entries to Category:Eastern Geshiza nouns. Though, since they tagged the category manually, it's possible they missed some (though I didn't notice any missing entries when comparing the category to their contributions).
BTW I was planning on notifying them of where their entries went, so if you plan on moving the entries yourself then please let me know when they have been moved (unless you were planning on notifying them). If not, I could move the entries myself, but I would need extended mover in order to do so. I'm fine either way, so I'll leave that up to your discretion. — BABR (talk) 08:17, 20 June 2024 (UTC)[reply]
@Babr Apologies for the delay. I have given you the extended mover right. Let me know if you need help moving or deleting any pages. Benwing2 (talk) 19:38, 23 June 2024 (UTC)[reply]

Htoklibang Pwo

[edit]

See this paper. Htoklibang Pwo is apparently a Pwo lect that is not mutually intelligible with any of the other Pwo languages, and is not culturally more related to any Pwo group over another. It seems neither Glottolog nor Wikipedia have entries or even descriptions of this lect, and according to this, this lect was only first identified in 2008! Should we make a distinct code for this? I have currently kept the one entry as Eastern Pwo, but that doesn't seem like a good solution. Thadh (talk) 10:10, 20 June 2024 (UTC)[reply]

Support Theknightwho (talk) 03:49, 21 June 2024 (UTC)[reply]

West-Central Thailand Pwo Karen

[edit]

Another Pwo lect that has escaped addition as a code. This paper makes clear that this is a group of Pwo lects that aren't well intelligible with any other Pwo group, and forms a sociolinguistic group on its own. The request for an addition of a code was rejected by SIL, with an argument that, in my opinion, is absolute hogwash, namely that SIL "found no evidence that [...] the [...] West-Central Thailand variety of Pwo Karen was not intelligible with the Eastern Pwo Karen", even though the paper I linked above just says this outright in the first line... Anyway, we're not bound by SIL's decisions, let's just add this code as well so we can document these languages properly. Thadh (talk) 22:20, 21 June 2024 (UTC)[reply]

P. S. Perhaps better to call this group by a different name, perhaps either "West-Central Thailand Pwo" or "Southern Pwo". Thadh (talk) 22:21, 21 June 2024 (UTC)[reply]
I've made User:Thadh/Pwo for comparison. We might want to even split it three-way, and handle Southern separate from WCT. Thadh (talk) 16:02, 23 June 2024 (UTC)[reply]
I know little about these lects but my general experience is that splits are easier to implement than merges, so I would favor a more conservative approach when splitting (in this case, a two-way rather than a three-way split). FWIW the cognate sets in the table you've created don't look so different to me (except for /sʷɛ˥˥/ and /θʷa˩˩/ from /mɛ˥˥/ — how does that work?) but this doesn't say all that much, as this is a very small sample. Benwing2 (talk) 19:49, 23 June 2024 (UTC)[reply]
@Benwing2: Do take into account that the first three lects are written in the Thai script, while the last three are written in Burmese. Also, tonal contrast is quite a distinction, and so are the vowels - what I have given there are all vowels that are phonemic for all lects afaik. Also, I have gone ahead and added cognates, but of course there will also be tons of differences lexically and usage-wise. Finally, the paper discussing WCT Pwo specifically distinguishes it from Southern Pwo, both of which are still in Thailand. Thadh (talk) 20:00, 23 June 2024 (UTC)[reply]

Decluttering the altform mess

[edit]

(Previous discussion.)

At the moment part-of-speech categories are practically unusable for languages with numerous altforms. For instance iluec and its 270 variants account for nearly half (!) of all entries in Category:Old French adverbs.

This state of affairs would be greatly improved by adding an optional parameter to {{head}} which disables the normal categorizations handled by that template and instead puts entries in categories named '[language name] alternative forms'.

Thoughts? Nicodene (talk) 03:48, 21 June 2024 (UTC)[reply]

i agree. it's worth noting that iluec is an extreme outlier, but even so, if there are a lot of examples where the same word shows up several times, a reader would waste time trying to guess which was the correct one. Soap 09:36, 21 June 2024 (UTC)[reply]
Clearly something needs to be done about this situation. A lot of scripts, bots and tools (like OrangeLinks) depend on every entry being categorised into either a "LANG lemmas" or "LANG non-lemma forms" category (or a couple of other special categories for Japanese entries iirc).
Is your proposal specifically that if a POS header consists solely of one or more alternative form sense lines and no other senses, it should be categorised into "LANG alternative forms" instead of "LANG lemmas"? Sounds like a maintenance headache, if I'm honest. I don't think it's possible for the {{head}} template to detect this automatically, so it would need to be done manually or by bot. Wondering if @Benwing2, Theknightwho have any input on this.
If your concern is that these entries are cluttering the POS categories specifically, let's keep them in "LANG lemmas" and substitute the POS category with the "LANG alternative forms" category: {{head|fro|alternative form}} This, that and the other (talk) 01:23, 22 June 2024 (UTC)[reply]
I don't actually think it's such a big deal to change things like the OrangeLinks gadget to know about alternative forms as an alternative (so to speak) to lemmas or non-lemma forms. It's usually just a one-line change and I don't think there are that many tools or scripts that would need changing. Your suggestion of using alternative form (probably with a shorter alias altform provided) as the POS is a good one, I think, although if we want to use lang-specific templates to provide inflections of these alt forms, we'd probably need to add a parameter |altform=1 or similar to the lang-specific templates. Whether we actually want alt forms to be in lemmas might depend on the particular language. In particular, non-standardized languages like Old French and Middle English are IMO materially different from (semi-)standardized languages like English or Portuguese that have more than one spelling convention. In the former case, it makes sense IMO to put the canonical lemma at a standardized spelling and make all the other spellings be alt forms that don't appear in LANG lemmas, but in the latter case, we can't reasonably privilege one standard spelling over another (although we could do the Old French thing for obsolete and superseded spellings). Benwing2 (talk) 01:42, 22 June 2024 (UTC)[reply]
@This, that and the other: Another possibility would be to have subcategories with "alternative" prefixed: Category:Middle English lemmas would have a subcategory "Category:Middle English alternative lemmas", for instance. This could be triggered by something like |isalt=1 in the headword template. There may be some cases where the same headword has both altform and mainform senses, though. We would also have to consider whether there might be some category names that would be just too long.
The advantage of this is it would preserve the lemma/nonlemma noun/noun form, etc. distinctions and require less modification of the code. It would also mean that all of the forms we have together now would still be findable from the old category name, but the alternative forms would be out of the parent category. Chuck Entz (talk) 01:53, 22 June 2024 (UTC)[reply]
This seems like a good idea to me. Benwing2 (talk) 19:50, 23 June 2024 (UTC)[reply]
I don’t have any particular objection against there being categories like ‘Old French plurals of alternative noun lemmas’ but it isn’t clear to me how someone could find them useful.
By the way, regarding cases like British vs American English, I think a language having more than one official standard isn’t really a matter of ‘altforms’. In an ideal world I think we’d simply have a template that binds for example center to centre, such that they both automatically share the same content (definitions, etymology, and so forth). Someone for instance flips a coin and decides that American English should be the default on Wiktionary. Then the entry for centre is reduced to just {{mirror|en|center|UK}} which displays a copy of the corresponding American English entry. (With the headword spelling adjusted of course.) How feasible that would be I don’t know. Nicodene (talk) 21:55, 23 June 2024 (UTC)[reply]
This is definitely feasible. In fact we've discussed doing exactly this for Serbo-Croatian and Punjabi, where there are multiple scripts for the same language and we don't want to prioritize one over another. It would probably be an extension of the existing {{tcl}} ("transclude") template, which transcludes individual meanings from one entry to another, or something else similar in spirit. It can get tricky in English because there are some complex cases where e.g. only spelling A is used in American English but spellings A and B are both used in British English with different meanings (I can't think of an example but I know they exist). There's also the issue of what to do with quotes and usexes; e.g. presumably quotes should maintain the original spelling but do we want different quotations illustrating the respective spellings, or does it not matter? Should usexes have the spelling automatically adjusted, and if so what about other terms needing spelling adjustments? But these sorts of issues should be solvable, one way or another. Benwing2 (talk) 22:49, 23 June 2024 (UTC)[reply]
(@Benwing2: what about license which is used as both a noun and verb in American English, but only as a verb in British English (the noun being spelled licence)?) — Sgconlaw (talk) 22:54, 23 June 2024 (UTC)[reply]
@Sgconlaw Yes, this is one. The one I was thinking of was draft, which is split draft ~ draught in British English in a very complex fashion. There's also program in American English vs. programme ~ program in British English, and disk ~ disc in both varieties with different preferred usages. Benwing2 (talk) 23:06, 23 June 2024 (UTC)[reply]
I don’t think quotations will be an issue. At present the lemma usually has quotations with all alternative forms, while alternative form entries only have quotations with that specific form. If there’s some sort of transclusion, then we just won’t need to list quotations with alternative form spellings separately at the alt entries. Regarding usage examples, maybe the solution is just to give several examples using the different forms and add qualifiers stating “American spelling”, “British spelling”, etc. A more difficult issue is what spelling to use in definitions. I’m not sure how that should be dealt with. — Sgconlaw (talk) 23:11, 23 June 2024 (UTC)[reply]
I suppose one way is to use a combination of automatic conversions with manual overrides as necessary. For example, the automatic conversions could be a combination of pattern matches (for ise ~ ize words, with overrides as needed) and individual entries, and manual overrides specified in the Wikicode maybe as {{~|disc|disk}} for words or grammatical differences that can't be handled automatically. In cases like draft vs. draft ~ draught, ideally the Wikicode would have the British spelling, because in this case British -> American can be done automatically but the other way can't. Similarly for license vs. license ~ licence and program vs. program ~ programme. (Or we could just punt the whole issue and let the spelling be whatever, although I don't consider that ideal.) Benwing2 (talk) 23:22, 23 June 2024 (UTC)[reply]
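The combination described in this comment (pattern matches, individual word entries and manual overrides) could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the word list and exception list are tiny samples, the function names are made up, and the reading of {{~|disc|disk}} as "first parameter is the spelling as written, second is the American spelling" is a guess at what the suggested markup would mean.

```python
import re

# Individual entries: British spelling -> American spelling (sample only).
WORD_MAP = {"draught": "draft", "programme": "program", "licence": "license"}
# Words ending in -ise that the pattern rule must not touch (sample only).
ISE_EXCEPTIONS = {"advise", "exercise", "promise", "rise", "surprise", "wise"}

OVERRIDE = re.compile(r"\{\{~\|([^|{}]*)\|([^|{}]*)\}\}")  # hypothetical markup
WORD = re.compile(r"[A-Za-z]+")


def convert_word(word: str) -> str:
    """Convert one word, keeping an initial capital if there was one."""
    lower = word.lower()
    if lower in WORD_MAP:
        new = WORD_MAP[lower]
    elif lower.endswith("ise") and lower not in ISE_EXCEPTIONS:
        new = lower[:-3] + "ize"  # pattern match for ise ~ ize words
    else:
        return word
    return new.capitalize() if word[0].isupper() else new


def _convert_plain(text: str) -> str:
    return WORD.sub(lambda m: convert_word(m.group(0)), text)


def to_american(text: str) -> str:
    """Apply automatic conversions, letting manual overrides win."""
    out, pos = [], 0
    for m in OVERRIDE.finditer(text):
        out.append(_convert_plain(text[pos:m.start()]))
        out.append(m.group(2))  # take the hand-specified American form as is
        pos = m.end()
    out.append(_convert_plain(text[pos:]))
    return "".join(out)


print(to_american("Organise a draught of the programme on {{~|disc|disk}}."))
# Organize a draft of the program on disk.
```

Writing the source text in British spelling fits this direction of conversion, since draught and draft both map to draft, whereas going from draft back to draft ~ draught would need a decision per sense.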
We could save ourselves the trouble and leave the quotes as-is. We already include quotes with altforms on lemma entries - why not quotes with other official spellings?
For cases like draft~draught, I think it’d be easiest to lemmatize forms found in both British and American English (draft, program, license, and arbitrarily either disc or disk), altformify the others, and then explain regional differences in a usage note. For license the usage note would mention “in British English the noun is spelt licence”. And the entry for licence would just have “noun, British spelling, alternative form of license”.
I can’t think of splits that go in the other direction (distinct American spellings that are homographs in British English) so this may be a good reason to treat American spelling as a general default. I say this despite personally using British (Oxford) spelling. Nicodene (talk) 00:08, 24 June 2024 (UTC)[reply]
Support Ioaxxere (talk) 08:31, 23 June 2024 (UTC)[reply]
The current situation where some Franco-Provençal entries are categorised into dialect categories only (e.g. garda), continues to cause great headaches for the todo lists project.
I'd like to propose the following way forward:
Experimentally, and limited for the time being to Franco-Provençal, Old French, and any other minor Romance languages Nicodene is working on, we create Cat:LANG alternative forms and place entries that
  • (a) consist only of form-of sense lines, and
  • (b) would normally be categorised into Cat:LANG lemmas
in Cat:LANG alternative forms instead of Cat:LANG lemmas.
If this works well, we can think about rolling this out more widely. Pinging @Nicodene, Benwing2, Ioaxxere, Chuck Entz, Soap to see if there are any objections. This, that and the other (talk) 06:15, 16 October 2024 (UTC)[reply]
Sounds good to me. Nicodene (talk) 06:26, 16 October 2024 (UTC)[reply]
There being no objections (noting that all the people I pinged have been active since the ping, so I take it they have seen this), I created Category:Franco-Provençal alternative forms and added an |altform=1 parameter to {{head}}. (It would have been possible to achieve the same outcome using {{head|...|noposcat=1|cat2=alternative forms}}, but that feels too long-winded for "everyday" use, and it also doesn't work if we want to add per-POS alternative form categories like "LANG noun alternative forms" in the future.) This, that and the other (talk) 10:10, 18 October 2024 (UTC)[reply]
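For reference, the categorisation rule proposed in this thread can be written down as a short Python sketch. This is not the actual {{head}} module code (which is not shown here); the function name and its arguments are hypothetical, and the sketch assumes that something else has already determined whether each sense line is a form-of line.

```python
def top_level_category(lang: str, senses_are_form_of: list, is_lemma: bool) -> str:
    """Pick the top-level category for one part-of-speech section.

    lang               -- language name, e.g. "Old French"
    senses_are_form_of -- one boolean per sense line: True if that line is
                          only a form-of line (e.g. "alternative form of")
    is_lemma           -- whether the entry would normally go in "LANG lemmas"
    """
    if not is_lemma:
        return f"{lang} non-lemma forms"
    # Condition (a): there is at least one sense, and every sense is form-of.
    if senses_are_form_of and all(senses_are_form_of):
        return f"{lang} alternative forms"
    return f"{lang} lemmas"


print(top_level_category("Old French", [True, True], True))
# Old French alternative forms
print(top_level_category("Old French", [True, False], True))
# Old French lemmas
```

An entry that mixes altform and mainform senses stays in the lemma category under this rule, which matches the concern raised earlier about headwords that have both kinds of sense.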

Hyphenation: syllabi(fi)cation in writing

[edit]

What is the algorithm followed in entries such as man·u·script? JMGN (talk) 15:37, 22 June 2024 (UTC)[reply]

@JMGN: usually the etymology of the term. — Sgconlaw (talk) 15:45, 22 June 2024 (UTC)[reply]
If a word is a compound, it can be split before the second component; e.g. knights-wort. The (phonetic) onset of a syllable after a hyphen must be phonotactically possible in English. Because [-ptə(ɹ)] is impossible, helico-pter is not an acceptable hyphenation, pace its etymology. The quality of the vowel of the syllable preceding the hyphen plays a role: compare po-stern and pos-ture. But it is both disci-ple and disci-pline, in spite of the difference in quality, so there may not be a straightforward rule based on the pronunciation. See further the article Syllabification on Wikipedia.  --Lambiam 20:35, 24 June 2024 (UTC)[reply]
@Lambiam: For some consistency, I wonder if it is worth coming up with a set of guidelines and seeing if we can find consensus for them, then updating "Wiktionary:Pronunciation#Hyphenation" (a draft proposal) to specify them. — Sgconlaw (talk) 21:22, 24 June 2024 (UTC)[reply]
The only guideline I can think of is to consult a major dictionary that indicates how English words are hy‧phen‧at‧ed,[12] such as The American Heritage Dictionary of the English Language. This will not reveal a different British English hyphenation, as of the word knowledge. I am not aware of online sources for British English hyphenation.  --Lambiam 07:42, 25 June 2024 (UTC)[reply]
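One of the constraints mentioned above, that the part after a hyphen must begin with a possible English onset, can be approximated in Python as below. This is a deliberately crude sketch: it checks spelling rather than pronunciation, the onset list is a small hand-picked sample rather than a complete inventory, and the function names are hypothetical. It only tests one necessary condition and does not decide between otherwise acceptable break points.

```python
VOWELS = set("aeiouy")

# Sample of spelled onsets treated as possible in English (not exhaustive).
ONSETS = {
    "", "b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q",
    "r", "s", "t", "v", "w", "z",
    "bl", "br", "ch", "cl", "cr", "dr", "fl", "fr", "gl", "gr", "ph", "pl",
    "pr", "sc", "sh", "sk", "sl", "sm", "sn", "sp", "st", "sw", "th", "tr",
    "tw", "wh", "wr", "scr", "shr", "spl", "spr", "str", "thr",
}


def spelled_onset(part: str) -> str:
    """Return the consonant letters before the first vowel letter."""
    onset = ""
    for ch in part.lower():
        if ch in VOWELS:
            break
        onset += ch
    return onset


def breaks_are_possible(hyphenated: str) -> bool:
    """Check that every part after a hyphen starts with a listed onset."""
    parts = hyphenated.split("-")
    return all(spelled_onset(p) in ONSETS for p in parts[1:])


for word in ["helico-pter", "po-stern", "pos-ture", "disci-ple", "knights-wort"]:
    print(word, breaks_are_possible(word))
```

With this sample list, helico-pter is rejected because "pt" is not a listed onset, while the other four examples from the comment pass.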

Could the revision history of this page be restored? Ioaxxere (talk) 08:09, 23 June 2024 (UTC)[reply]

It's not what he would've wanted. Denazz (talk) 21:22, 23 June 2024 (UTC)[reply]
Never change, WF. Nicodene (talk) 21:41, 23 June 2024 (UTC)[reply]
WP has a rule against deleting user talk pages (w:WP:DELTALK), on the grounds that "they are usually needed for reference by other users". But it seems we don't follow that here. See the deletion histories of some prominent contributors' user talk pages.
Personally I think it is a poor look to delete your own talk page, and it shouldn't be done. I wouldn't ever honor a {{d}} request of this kind (unless the user was a non-contributor of course). But there are clearly several admins who are comfortable with the practice of deleting a user's talk page when they ask. I'm interested to hear what others think here. This, that and the other (talk) 12:57, 24 June 2024 (UTC)[reply]
I would support a rule against allowing the deletion of a user talk page. Honestly, the fact that it's done currently in cases of controversial users or criticism just shows to me that it's done to hide critiques, rather than protecting the user from anything. AG202 (talk) 14:56, 24 June 2024 (UTC)[reply]
I am inclined to agree. Benwing2 (talk) 19:41, 24 June 2024 (UTC)[reply]
Me too. If the user wants to blank the page that's up to them, but unless there's a very good reason (e.g., it contains someone's private information) administrators shouldn't delete the page. — Sgconlaw (talk) 21:24, 24 June 2024 (UTC)[reply]
I too am inclined to agree. There might be edge cases where it'd make sense to just delete a talk page, like if (nearly) every revision contained personal information and so deleting it was simpler than revdelling every revision (maybe if a user in good standing edited under their full name and then decided to be globally renamed to "Renamed user 2345675434" to vanish, but their talk page would be full of them commenting under their old username?), but in general, it seems like we can revdel specific revisions without needing to delete a whole page / revision history. Seeing multiple users in good standing asking for this page to be restored, I'll restore it now, whether we adopt a general policy or not. I agree that a user merely blanking their page is a different matter. - -sche (discuss) 22:48, 24 June 2024 (UTC)[reply]
Due to the number of revisions involved, my first attempt got: "To avoid creating high replication lag, this transaction was aborted because the write duration (6.0507435798645) exceeded the 3 second limit. If you are changing many items at once, try doing multiple smaller operations instead. [...] Fatal exception of type "Wikimedia\Rdbms\DBTransactionSizeError". Restored it on the second try. - -sche (discuss) 22:54, 24 June 2024 (UTC)[reply]
I'm adding to the chorus to agree. I don't see the actual justification for deleting and the purported reason in the log is "tl dr", which is not helpful. —Justin (koavf)TCM 22:55, 24 June 2024 (UTC)[reply]
I agree generally, but maybe allow deleting in cases related to harassment? CitationsFreak (talk) 23:07, 24 June 2024 (UTC)[reply]
I would say revert or revdel as needed and then protect the page, but wholesale deletion is using a rocket launcher when you need a flyswatter. —Justin (koavf)TCM 23:17, 24 June 2024 (UTC)[reply]
Speaking of which, I note that the old history of @Victar was hidden in the same way, and think it should be restored into an archive. Happy to make a separate thread to request this, if necessary. Theknightwho (talk) 01:23, 25 June 2024 (UTC)[reply]
@Theknightwho inspection of the complete deletion log shows that in fact the page was restored in 2023 on Victar's own request. There are no deleted revisions. This, that and the other (talk) 05:32, 25 June 2024 (UTC)[reply]
Alright. Theknightwho (talk) 05:37, 25 June 2024 (UTC)[reply]
In case of privacy violation, it can and will be hidden either way, and we even comply with GDPR requests, isn’t it. So I don’t see how talk pages should be deleted only because they are in the user-space, which in reality is only a topic-space with varyingly loose relations to a user personally. For the sake of the scientific argument, or also good coding practice, the default assumption should be that a user page is kept.
There are of course low-relevance cases of drive-by IPs only receiving a welcome message and some random notes, but we would needs withstand deletion if, say, one of Theknightwho or Benwing2 becomes loony and deletes reasonings given on his talk page for module implementations, likewise the contributions of many users to philologic argument are too high for one to suffer suppression of the tracks of the scientific discourse well. It would be as if we worked on something irrelevant. So I figure how Theknightwho has the general impression of deleting a talk-page not being right; the personal preference has an objective basis here. Fay Freak (talk) 04:41, 25 June 2024 (UTC)[reply]

Voting to ratify the Wikimedia Movement Charter is now open – cast your vote

You can find this message translated into additional languages on Meta-wiki. Please help translate to your language

Hello everyone,

The voting to ratify the Wikimedia Movement Charter is now open. The Wikimedia Movement Charter is a document to define roles and responsibilities for all the members and entities of the Wikimedia movement, including the creation of a new body – the Global Council – for movement governance.

The final version of the Wikimedia Movement Charter is available on Meta in different languages and attached here in PDF format for your reading.

Voting commenced on SecurePoll on June 25, 2024 at 00:01 UTC and will conclude on July 9, 2024 at 23:59 UTC. Please read more on the voter information and eligibility details.

After reading the Charter, please vote here and share this note further.

If you have any questions about the ratification vote, please contact the Charter Electoral Commission at cec@wikimedia.org.

On behalf of the CEC,

RamzyM (WMF) 10:52, 25 June 2024 (UTC)[reply]

The slashes in the transcription (ts=) parameter


Could we please change the (ts=) parameter so that it uses something else instead of slashes to delimit transcriptions? I understand that transcriptions are intended to accompany transliterations for certain languages, where it's useful to have a literal transliteration followed by a transcription that shows how it was actually read, but the problem with slashes is that they make it look like the transcription is meant to be a phonemic IPA pronunciation, which isn't what's intended most of the time. Indeed, commenters in the original discussion explicitly didn't want people to use ts= for pronunciations; however, 6 years on, and taking a completely random selection of uses, I can sort them into three buckets:

  • Very likely intended as IPA:
    • Korean (/⁠ʌ⁠/) - transliteration suppressed and transcription added; clearly referring to pronunciation
    • Chagatai قزاق (qazāq /⁠qazaq⁠/) - seems to be a slightly broader version of the pronunciation on the entry (/qɑ.zɑq/), intended to show the lack of length distinction
  • Resemble IPA on the surface, but the nature of the language means they must be deciphered/reconstructed readings:
  • Definitely not IPA:
    • Old Persian 𐎭𐎠𐎼𐎹𐎺𐎢𐏁 (d-a-r-y-v-u-š /⁠Dārayauš⁠⁠/) - capitalised proper noun
    • Old Turkic 𐰖𐰉𐰕 (y¹b¹z /⁠yabïz⁠/) - interpreting this as IPA /ï/ would be hopeless (is it even allowed?), but ï is a common Turkic transcription of /ɯ/
    • Phoenician 𐤏𐤋𐤉𐤑 𐤏𐤁𐤀 (ʿlyṣ ʿbʾ /⁠ʿaliṣ-ʿuboʾ⁠/) - ʿ and ʾ are not part of IPA

The ones that really concern me are those in the second group, because without any indication it's invalid IPA, it runs the risk of misleading even experienced users who may not be familiar with the language in question; especially when we do give pronunciations for some languages from hundreds/thousands of years ago. Hell, at Middle Persian 𐭬𐭤 (mh /⁠čē⁠/) we even have a pronunciation section with /tʃeː/ and the transcription /čē/ on the headword line, which is very silly. Obviously the transcription is useful to have, but we shouldn't be using slashes for two different things within the same entry, especially when we don't give readers any clue that that's what's happening, so a naive reader may assume one of them is simply a mistake.

From reading the discussion linked above, the basis for using slashes seems to have been that (a) one user started using them because that's how A Dictionary of Manichaean Middle Persian and Parthian uses them, and (b) someone else mentioned that Russian dictionaries sometimes include Cyrillic transcriptions in square brackets (which is maybe sort of the same thing if you squint really, really hard). However, it's all very well for a publication to use slashes to mean something other than IPA, so long as it's consistent, but the big difference between us and that dictionary is that we also use slashes to refer to IPA, and indeed some editors have used ts= for genuine IPA, as you can see above.

Is there anything else we could use instead? Theknightwho (talk) 14:46, 25 June 2024 (UTC)[reply]

I completely agree. I don't know what would be best though. Either we could use some other sort of delimiters (but which ones?), some sort of font or color indication (but how?), or some abbreviation like ts., appropriately linked so readers will have some idea what it means. Benwing2 (talk) 20:41, 25 June 2024 (UTC)[reply]
@Benwing2 I'd prefer delimiters, as anything involving colours runs into accessibility issues with colour-blindness etc that I don't want to figure out. We should pick something that doesn't have another meaning, which rules out most of the common delimiters, but I think these three probably work okay, with the first being my preference:
Theknightwho (talk) 19:36, 26 June 2024 (UTC)[reply]
@Benwing2, Theknightwho: just wanted to highlight that the second example above is showing up for me as two rectangles when viewed on a mobile device. I assume that isn’t the desired output. — Sgconlaw (talk) 22:37, 26 June 2024 (UTC)[reply]
Can you paste a screenshot to imgur.com? For me the second example contains two half-brackets, one in the top left and one in the top right. Benwing2 (talk) 22:39, 26 June 2024 (UTC)[reply]
@Benwing2 I have the same issue when viewing it on an iPhone, so I assume the boxes are replacement characters. Theknightwho (talk) 22:40, 26 June 2024 (UTC)[reply]
Interesting, I see the same thing with the chars ⸢⸥, while the overly tall 「」 look totally fine (they look like how the ⸢⸥ chars look on my desktop). This can be fixed with CSS if necessary. Benwing2 (talk) 22:45, 26 June 2024 (UTC)[reply]
@Benwing2 Great - if we can use CSS to fix any size issues then 「」 sounds like the best option. Theknightwho (talk) 22:59, 26 June 2024 (UTC)[reply]
@Theknightwho Not sure it's possible to use CSS to fix size issues like this (the issue is rather that the characters 「」 on the desktop extend to the full height of the bounding box rather than going halfway down) but I think if the characters are appropriately tagged with a CSS class, you can e.g. use ⸢⸥ and make the surrounding CSS class on mobile have "display: none;" (to not display the character) and use the ::before selector to insert a different character before, something like this:
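/* Note: "display: none" on the element would also suppress its ::before
   pseudo-element, so this sketch collapses the original glyph with
   font-size: 0 instead. */
.ts-left {
    font-size: 0;
}
.ts-left::before {
    content: "「";
    font-size: 1rem; /* restore a visible size for the substitute glyph */
}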
and similarly for the bottom-right half-bracket. I would prefer we use ⸢⸥ as the actual chars and the above hack on mobile only (maybe iPhone only if it's possible to have a selector for that), since it seems to be a bug in the iPhone's handling of the chars, and since the ⸢⸥ chars seem to be preferred over the 「」 chars (which are in the U+FFxx compatibility area). Benwing2 (talk) 23:17, 26 June 2024 (UTC)[reply]
I'm having the same issue, and I'm using the latest iPhone too! If my default font doesn't support it then I think there's a good chance that a lot of mobile devices don't. I think that means ⸢⸥ are out of the question. (edited cuz I didn't read the thread before commenting) — BABRtalk 00:28, 27 June 2024 (UTC)[reply]
My preferences are ‹qazaq› followed maybe by 「qazaq」 and then ⸢qazaq⸣, although I think the latter two both look somewhat strange. If you want to use box corners (or whatever you call them), maybe ⸢qazaq⸥ would be better; this uses top-left and bottom-right "half brackets" rather than the taller versions in 「qazaq」, which appear to be called "halfwidth left corner bracket" and "halfwidth right corner bracket". Benwing2 (talk) 19:56, 26 June 2024 (UTC)[reply]
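For reference, the Unicode names of the bracket characters being compared can be looked up with Python's standard unicodedata module. This is only a sketch: which exact code points each poster typed is an assumption, so both the CJK and the halfwidth corner brackets are listed alongside the half brackets and the single angle quotation marks.

```python
import unicodedata

# Candidate delimiters mentioned in the thread. The exact code points the
# posters typed are an assumption, so both corner-bracket variants appear.
CANDIDATES = [
    "\u2039", "\u203a",  # single angle quotation marks
    "\u2e22", "\u2e23",  # top half brackets
    "\u2e25",            # bottom-right half bracket
    "\u300c", "\u300d",  # CJK corner brackets
    "\uff62", "\uff63",  # halfwidth corner brackets
]


def describe(char: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for char in CANDIDATES:
    print(describe(char))
```

Printing the names this way makes it easier to check which block a given bracket actually comes from before settling on one.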
Would absolutely not support this change for the reasons found in the original discussion. --{{victar|talk}} 20:48, 26 June 2024 (UTC)[reply]
The original reason was that you (and you alone) were used to slashes because of a single dictionary, which isn't a reason at all really. They're misleading, and need to be changed to something else. Theknightwho (talk) 21:11, 26 June 2024 (UTC)[reply]
@Benwing2 I find ‹qazaq› slightly hard to make out, but ⸢qazaq⸥ works well. Theknightwho (talk) 21:16, 26 June 2024 (UTC)[reply]
⸢qazaq⸥ and 「qazaq」 look good to me, ‹qazaq› is too similar to ⟨⟩ employed for representations of writing forms. Fay Freak (talk) 21:23, 26 June 2024 (UTC)[reply]
Show me dictionaries or academic papers that use ⸢⸥ or 「」. --{{victar|talk}} 21:50, 26 June 2024 (UTC)[reply]
Are you going to address the point that using / / to mean two different things is a problem? Show me a single academic dictionary or paper that does that. To be honest, I'm curious to know if there are any sources outside of that one dictionary that use / / for transcriptions. Theknightwho (talk) 22:30, 26 June 2024 (UTC)[reply]
Yep, {{R:ira:Novak:2013}} and Basharin (2013) also use slashes for transcripts. Reusing symbols happens everywhere. If people are not reading the documentation and erroneously putting IPA characters in the |ts= field, there should be better errors to stop them. --{{victar|talk}} 00:03, 27 June 2024 (UTC)[reply]
@Victar Where do they do that in the second one? They're just using single slashes as "X/Y" to mean "X or Y", which is completely different. Even if they were using slashes in the way you've chosen to use them, I don't see any instances of IPA in that paper, so it doesn't count, because the whole point is that we're being inconsistent. Try again.
It's not possible to throw errors for this, because symbols used in transcription frequently make for legal IPA, as I've shown above. This is why it's a problem, as I have already pointed out. Theknightwho (talk) 00:15, 27 June 2024 (UTC)[reply]
Basharin (2013) p 114: HḄWṢYNʾ hlbyck /xarbīčak, xarbūčak/ ‘water-melon’; Novak (2013) p91: wšw /ʷəxšú, ᵊxʷəšú/.
There are certain characters which are only used in IPA, like ˈ and ː, but what we can also do is let languages that set manual transcriptions specify which characters are acceptable and which aren't. --{{victar|talk}} 00:34, 27 June 2024 (UTC)[reply]
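A rough sketch of what such a per-language character check might look like follows. The names IPA_ONLY and ipa_only_chars are hypothetical, and the banned set contains only the two marks named in this thread; nothing here reflects how the actual modules work.

```python
# Hypothetical rule: characters taken to signal IPA rather than a scholarly
# transcription. Only the two marks named in the thread are included.
IPA_ONLY = {
    "\u02c8": "primary stress mark",
    "\u02d0": "length mark",
}


def ipa_only_chars(transcription, banned=IPA_ONLY):
    """Return banned characters found in a ts= value, in order of first appearance."""
    found = []
    for char in transcription:
        if char in banned and char not in found:
            found.append(char)
    return found


# A value written with the IPA length mark is flagged ...
print(ipa_only_chars("t\u0283e\u02d0"))
# ... but a plain-letter transcription such as "qazaq" passes untouched,
# even though a reader could still take it for IPA.
print(ipa_only_chars("qazaq"))
```

The second call illustrates the objection raised earlier in the thread: a check of this kind catches only characters exclusive to IPA, not transcriptions that happen to be valid IPA strings.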
Took me no time to find, actually the first modern thing in my Semitic folder: Peter Stein: Lehrbuch der Sabaischen Sprache 2 vols. 2012–2013 uses transcriptions between slashes.
As Victar implies, this has permeated the philologies of the relevant languages, so I felt surprised and gaslighted by the offence in it. I only reviewed the visual merits of various options notwithstanding usage.
Turns also out that ⸢qazaq⸣ cannot be used because it is already used for some conjecture kind of thing, e.g. in Akkadian Written by Egyptian Scribes in the 14th and 13th Centuries BCE in Proceedings of the 53th Rencontre Assyriologique Internationale Vol. 1 page 805 (2010) they cite from the cuneiform version of the treaty between Ramses II and Ḫattusilli III, without explanation, so it is known in the field:
ul-tù ⸢dá⸣-ri-ti ilu(DINGIR-LIM) ú-ul i-na-an-⸢din⸣ a-na e-pé-ši nukurti (LÚ.KÚR) i-na be-ri-šu-nu/ [i-na ri-ki-il-ti a-d]i da-a-ri-ti
‘from the beginning the god did not ever permit the making of hostilities between them [by means of a treaty for]ever’ (lines 10–11; Edel 1997:6, 18). Fay Freak (talk) 00:09, 27 June 2024 (UTC)[reply]
Wiktionary is the best glossary and has entry on ⸢ ⸣ , too, which Theknightwho has edited two times. Fay Freak (talk) 00:13, 27 June 2024 (UTC)[reply]
I have no particular attachment to ⸢ ⸣. I just want us to use something that isn't going to mislead users by standing for two different things. Theknightwho (talk) 00:18, 27 June 2024 (UTC)[reply]
I agree it's a bit problematic to use /.../ for multiple different types of transcriptions, but I'm somewhat conflicted by the idea of using something other than slashes, since slashes are pretty ubiquitous with transcription. Like, just spit-balling here, but what if we did something like:
قزاق (qazāq, trans.?/qazaq/) or
قزاق (qazāq, /qazaq/?)
with the ? taking you to a section of the documentation page that explains not to use it for IPA?? Idk, just an idea.
BABRtalk 00:50, 27 June 2024 (UTC)[reply]
The usage of the proposed new delimiters for transcriptions seems unprecedented, while / / has a long-standing tradition behind it. To my eyes there is no inconsistency in how we are using them: / / always means "transcription", and when it is an IPA transcription in particular there is usually a blue "IPA (key)" text before it. In any case, by this logic we would have to strip away [ ] from its usage for normalisations, brackets=on, corrupted portions of quotations, translation/transliteration of book titles, etc. only because it is used already for phonetic transcriptions. Catonif (talk) 00:57, 27 June 2024 (UTC)[reply]

(vnv) meaning


The pronunciation of several French words is shown that way, e.g. "crevé":

What does (vnv) mean here?

I searched in the help pages to no avail. Jlliagre (talk) 00:15, 26 June 2024 (UTC)[reply]

That seems to have been added by WingerBot. I also noticed the |pos=v examples at {{fr-IPA}} don’t give what I’d expect. @Benwing2, you seem to be the boss there. MuDavid 栘𩿠 (talk) 01:14, 26 June 2024 (UTC)[reply]
@MuDavid Oops, I standardized the handling of various parameters and I forgot that |pos= had a special meaning for {{fr-IPA}}. Will fix. Benwing2 (talk) 01:25, 26 June 2024 (UTC)[reply]
Fixed. I'm not sure what pos=vnv was intended to mean; it has no effect now. Benwing2 (talk) 01:33, 26 June 2024 (UTC)[reply]
Thanks! Jlliagre (talk) 01:41, 26 June 2024 (UTC)[reply]
In October 2019, when WingerBot added vnv to the entry, the module was being rewritten to cover various things like "ihV". But even at the end of that round of edits, I don't see vnv in the module. If no-one knows what pos=vnv was intended to do, should we just remove it...? - -sche (discuss) 02:09, 26 June 2024 (UTC)[reply]
Yeah I agree. I looked back to the introduction of pos= in 2016 and it always only had the value of "v". Benwing2 (talk) 03:33, 26 June 2024 (UTC)[reply]
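If the leftover parameter were to be cleaned up by bot, the text transformation could be sketched roughly as below. The function name strip_vnv and the assumed call shape ({{fr-IPA|...|pos=vnv}} with no nested templates) are illustrative assumptions, not a description of any existing bot.

```python
import re

# Assumption: fr-IPA calls contain no nested templates, so a simple
# brace-free match is enough for this illustration.
FR_IPA_RE = re.compile(r"\{\{fr-IPA([^{}]*)\}\}")


def strip_vnv(wikitext):
    """Drop the pos=vnv parameter from fr-IPA calls, leaving other parameters alone."""
    def repl(match):
        params = match.group(1).split("|")
        kept = [p for p in params if p.strip() != "pos=vnv"]
        return "{{fr-IPA" + "|".join(kept) + "}}"

    return FR_IPA_RE.sub(repl, wikitext)


print(strip_vnv("* {{fr-IPA|pos=vnv}}"))
print(strip_vnv("* {{fr-IPA|pos=v}}"))
```

Calls using pos=v, the only value that ever had an effect, are left unchanged.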

Headword word IDs in multiword entries


We need some way to add sense IDs when linking to words in the headword line. Having to do e.g. |head=[[olla]] [[kukko#Finnish: animal|kukko]]na [[tunkio]]lla is not very practical. My first idea was to support some kind of new parameter for {{head}} that supports e.g. inline modifiers after links, like |head2=[[olla]] [[kukko]]<alt:kukkona><id:animal> [[tunkio]]lla. — SURJECTION / T / C / L / 12:28, 26 June 2024 (UTC)[reply]
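To make the proposal concrete, here is a minimal sketch of how inline modifiers after a link might be expanded into an ordinary piped link. The modifier names alt and id come from the example above; the function expand_modifiers, the regular expressions, and the anchor format "page#Language: id" (copied from the manual link shown above) are assumptions for illustration only.

```python
import re

# Hypothetical syntax from the proposal: [[page]]<alt:...><id:...>
LINK_RE = re.compile(r"\[\[([^\[\]|]+)\]\]((?:<\w+:[^<>]*>)+)")
MOD_RE = re.compile(r"<(\w+):([^<>]*)>")


def expand_modifiers(head, lang_name):
    """Rewrite links that carry inline modifiers into plain piped wikilinks."""
    def repl(match):
        page = match.group(1)
        mods = dict(MOD_RE.findall(match.group(2)))
        target = page
        if "id" in mods:
            # Mirrors the manual form "kukko#Finnish: animal" used above.
            target = f"{page}#{lang_name}: {mods['id']}"
        display = mods.get("alt", page)
        return f"[[{target}|{display}]]"

    return LINK_RE.sub(repl, head)


example = "[[olla]] [[kukko]]<alt:kukkona><id:animal> [[tunkio]]lla"
print(expand_modifiers(example, "Finnish"))
```

Links without modifiers, such as [[olla]] and [[tunkio]]lla, pass through untouched, so existing heads would keep working under this reading.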

I would support something like this. Note that I've implemented a special syntax for some languages (so far, English and various Romance languages) to make it easier to correctly handle multiword linking in long terms. It's documented under Module:en-headword#Link modifications and is enabled when you use a value for |head=, |head2=, etc. that begins with ~. I have thought of extending this to all languages but haven't done it yet. Inline modifiers could be added to this syntax and/or to the plain |head= syntax, as in your example. Benwing2 (talk) 19:00, 26 June 2024 (UTC)[reply]
I'm not sure I understand the benefit of the ~ syntax. In any case, modifier support could be added to |head= too, but one'd have to avoid breaking existing uses. — SURJECTION / T / C / L / 08:36, 4 July 2024 (UTC)[reply]
I think that could be done. The reason I introduced the ~ syntax was to avoid you having to repeat the whole head e.g. in a 5+-word lemma when e.g. you need to modify the default linking of one or two words. It doesn't save you effort in a case like |head=[[olla]] [[kukko]]na [[tunkio]]lla when you have only 3 words and two have to modify the default linking, but it helps in a case like English admiral of the Swiss Navy, where the default linking is [[admiral]] [[of]] [[the]] [[Swiss]] [[Navy]] and you want to change the linking of Navy to [[navy|Navy]] while keeping the remainder unchanged. So you'd write ...|head=~[N:n]avy instead of ...|head=[[admiral]] [[of]] [[the]] [[Swiss]] [[navy|Navy]], a savings of 49 - 11 = 38 characters of typing.
I think modifier support could be added without breaking existing uses; a simple solution, which I already have implemented, is to not parse modifiers if HTML is detected at top level. See Module:parse utilities#L-209. This allows HTML inside of qualifiers and such but if you e.g. use {{l|foo|bar}} at top level inside of |head= (which some people do), or <sup>...</sup>, it won't trip up the parser. We could make the parser correctly handle inline modifiers in the presence of <sup>...</sup> at the cost of a bit more complexity in the parsing code. Benwing2 (talk) 09:07, 4 July 2024 (UTC)[reply]
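The link-override idea can be sketched as follows, reading a spec such as [N:n]avy as "display N, link to n". This is a simplified guess at the behaviour described above, not the actual Module:en-headword logic; default_links and apply_override are hypothetical names.

```python
import re

# Hypothetical reading of "[N:n]avy": show "N", link to "n".
SPEC_RE = re.compile(r"\[([^\[\]:]+):([^\[\]:]+)\]")


def default_links(title):
    """Link every space-separated word of a title on its own."""
    return " ".join(f"[[{word}]]" for word in title.split(" "))


def apply_override(title, spec):
    """Apply one override spec to the default linking of a title."""
    display = SPEC_RE.sub(lambda m: m.group(1), spec)  # e.g. "Navy"
    target = SPEC_RE.sub(lambda m: m.group(2), spec)   # e.g. "navy"
    linked = []
    for word in title.split(" "):
        if word == display:
            linked.append(f"[[{target}|{display}]]")
        else:
            linked.append(f"[[{word}]]")
    return " ".join(linked)


print(default_links("admiral of the Swiss Navy"))
print(apply_override("admiral of the Swiss Navy", "[N:n]avy"))
```

Only the word matching the spec changes; the rest of the lemma keeps its default linking, which is the point of not having to retype the whole head.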

Words of uncertain reading, etc.


I have a bit of a conundrum. Many Old Polish words have uncertain readings, for which we have Appendix:Old Polish terms of uncertain reading. However, I discovered that some also have uncertain parts of speech, like Appendix:Old Polish terms of uncertain reading#baze. What would be the best way to handle this? Vininn126 (talk) 18:35, 26 June 2024 (UTC)[reply]

@Vininn126: maybe put the word under the most plausible part of speech, and add a usage note explaining that there’s uncertainty about this. — Sgconlaw (talk) 22:40, 26 June 2024 (UTC)[reply]
that's the thing. It sort of looks like a preposition but it could be a noun. I'm not sure there is a most plausible here. Vininn126 (talk) 22:52, 26 June 2024 (UTC)[reply]
@Vininn126: in that case it probably doesn’t matter which one you pick. — Sgconlaw (talk) 22:56, 26 June 2024 (UTC)[reply]
Sgconlaw's idea sounds reasonable, but to spitball some other ideas in case any are more appealing: I notice assalay just doesn't put a part of speech at all. I have seen "particle" used as a catchall / wastebasket for anything that doesn't clearly fit somewhere else. I have seen Chinese entries just use "Definitions" as the L3 / POS header (not without considerable controversy). - -sche (discuss) 23:01, 26 June 2024 (UTC)[reply]
@-sche That's an interesting idea. Vininn126 (talk) 05:42, 27 June 2024 (UTC)[reply]
We also have kelaunikui under the POS heading "Word"... This, that and the other (talk) 09:12, 27 June 2024 (UTC)[reply]
Hm, I suppose since appendices often are the wild west, this could work, too. This might be my favorite option so far. Vininn126 (talk) 09:14, 27 June 2024 (UTC)[reply]
Is |3=lemma documented in {{head}}? Vininn126 (talk) 09:15, 27 June 2024 (UTC)[reply]
I don’t think new part of speech headings not sanctioned by Wiktionary:Entry layout should be used without consensus. — Sgconlaw (talk) 10:46, 27 June 2024 (UTC)[reply]

Do we have any sitewide guidelines or policies or policy-adjacent recommendations regarding links within glosses? By that I mean for example eau (water) versus eau (water), where one gloss is linked and the other is not. I have long been under the impression that links within glosses were discouraged, but I have seen other editors not just use links in glosses when writing new text, but actively add links to glosses in existing text. Is this a "do whatever you like" situation or is there actual guidance somewhere? I couldn't find anything at WT:ELE or in the documentation for {{l}}. —Mahāgaja · talk 08:55, 27 June 2024 (UTC)[reply]

I'm interested to know what others say as well. I personally don't have strong opinions either way, so I'm willing to adapt to what people think the practice should be. Vininn126 (talk) 09:00, 27 June 2024 (UTC)[reply]
I was once advised not to link in etymology sections, but I do it in definitions since we usually link key words in definitions anyway. — Sgconlaw (talk) 10:48, 27 June 2024 (UTC)[reply]
It would make sense to ensure that words used in glosses didn't require links to be understood. If an obscure, technical, or highly polysemic word is used in a gloss, then a link is warranted and even essential. DCDuring (talk) 13:57, 27 June 2024 (UTC)[reply]
@DCDuring I treat glosses of English and non-English terms differently (and I don't think I'm the only one): for non-English terms, I prefer to give a precise English equivalent, if possible (i.e. something you'd use if you were translating it into English as part of a larger piece of text). For English terms, glosses are purely explanatory. In the latter case, I agree that they should use straightforward language that most people would understand without needing a link, but in the first case that's not always possible (or even appropriate, if the term is technical or esoteric), so a link makes sense. If I see a non-English term with an explanatory gloss, that indicates to me that there isn't an English equivalent (or the editor didn't know what it was). Theknightwho (talk) 16:00, 27 June 2024 (UTC)[reply]
Yes. If anything, we need more explanatory glosses in non-English L2s. It is a very lazy approach that simply inserts a polysemic English term as a gloss. In those cases, in particular, a link to a senseid'ed definition would be a good alternative to an unlinked explanation. DCDuring (talk) 16:10, 27 June 2024 (UTC)[reply]
@DCDuring Yeah, if the best English equivalent is a polysemic term, then I'd definitely opt for the double-approach (with maybe an abridged explanation, depending on context): 德國人德国人 (Déguórén, German; person from Germany).
Otherwise, we risk more "vessel" incidents (Rajkiandris added a stub with the definition "vessel", and linked it as a translation under "ship", but the actual meaning was "drinking vessel"). Theknightwho (talk) 20:17, 27 June 2024 (UTC)[reply]
Yeah, I think this is more or less my approach as well. Vininn126 (talk) 16:40, 27 June 2024 (UTC)[reply]
That's my approach. I avoid linking glosses in etymologies and much prefer them not to be linked there. I link only difficult terms in glosses in FL entries, though I don't usually remove links that are already there. Andrew Sheedy (talk) 17:30, 27 June 2024 (UTC)[reply]
I agree with the themes herein. Thus: Try to use simple words in glosses, which don't need links; but when a nonsimple word is apt (and circumlocuting around it is counterproductive), just ensure that it is linked so that any user who wants to find out what it means can easily click/tap. Corollary: in this context, take a moment to bother to send them straight to a POS anchor or ID anchor. They don't want to land at the top of a big-ass entry and then hunt for what was meant in the context that they came from. Quercus solaris (talk) 18:34, 27 June 2024 (UTC)[reply]
It used to be done a lot back in the old days -- not so much anymore. I generally remove them when I see them, so I would be in support of hardcoding it into the rules. --{{victar|talk}} 18:56, 27 June 2024 (UTC)[reply]
I think User:Theknightwho summed up the principles well for how to write good definitions. I would just add/clarify (in the following I'm thinking specifically of non-English terms):
  1. Don't use obsolete, dated or archaic terms in definitions, even if the term or sense itself is obsolete. E.g. I just came across مَنَّ (manna) with the definition #6 "to reproach, to upbraid, to exprobrate". I don't know what "exprobrate" means and it adds negative value to the definition; "reproach" and "upbraid" are enough ("scold" would be even better) and including an archaic term just confuses things. Similarly, definition #4 says "# (obsolete) to jade, to tire". Even though the meaning itself is obsolete, you should not use obsolete or obscure terms like jade in definitions.
  2. Don't give more than 3 synonyms. E.g. Stephen Brown (RIP?) and certain other contributors would sometimes list 10 or more synonyms in a definition. This is IMO unhelpful esp. as most of the time the different synonyms all have different shades of meaning in English, so it's not clear which ones are the best translations.
  3. If a term used in a definition has more than one possible meaning, you should include context to clarify the meaning; either a label, or a synonym, or an explanatory qualifier, etc.
  4. When giving the definition of a non-English term, you should strive very hard to find the equivalent English term; give a multiword definition only if there is no equivalent English term or you really can't find it (e.g. for some technical terms in foreign languages, it can be extremely hard to figure out the equivalent in English unless you're an expert in the field in question). You want to think in terms of translation equivalence as much as possible. At the same time, some foreign terms have extra shades of meaning that aren't conveyed by the closest English term; those shades should be conveyed using qualifiers.
Benwing2 (talk) 06:54, 28 June 2024 (UTC)[reply]
@Benwing2 One last thing I'd add is not to shy away from using the term itself as the first word in the gloss if English has borrowed it in an unadapted form. e.g. 호떡 (hotteok, hotteok; a type of filled pancake popular as street food in South Korea) might look a bit silly, but it tells the reader that English does have a term for it, and it just-so-happens to be a direct borrowing. You sometimes see a similar phenomenon in non-English entries where the term is the same as English (e.g. manga), where editors don't bother linking to the English entry and instead write an explanatory definition (or copy the English one), not realising that that implies English has no equivalent term. The fact it's the same in both languages shouldn't make any difference, really, since a naive reader isn't going to know that until you tell them. Theknightwho (talk) 19:16, 29 June 2024 (UTC)[reply]
Totally agreed. Benwing2 (talk) 19:57, 29 June 2024 (UTC)[reply]

Request for Template/module editing permissions


I've been working on templates/modules for the last few days (mostly clearing out Category:Categories_that_are_not_defined_in_the_category_tree), and these permissions would allow me to keep doing so. I've also been wanting to work on updating Template Data on various templates. I was told this was the right place to make such a request. Akaibu (talk) 16:40, 27 June 2024 (UTC)[reply]

@Akaibu Can you be more specific as to what you are interested in working on and what permissions you're looking for? Are you referring to Wiktionary:Template editors? Generally we don't give template editor permissions unless there's a very good reason to do so and for a long-established user with a solid record of work on modules, because template editor permissions give you the ability to make changes to core modules that can massively mess things up if not done carefully. If there is a specific module you want to work on that is template-editor-protected, it might be better to downgrade the permissions on the module (depending on which module it is). Also, when you say "Template Data" are you referring to the TemplateData documentation stuff at the bottom of documentation pages such as the one for Template:mention? You don't need template editor rights to edit doc pages. Benwing2 (talk) 21:29, 27 June 2024 (UTC)[reply]
I suppose the various sub pages of Module:category_tree/poscatboiler/data is what i'm intending to work on? idk I've just encountered a number of cases of needing to ask someone else to modify a data module in the course of trying to clear that maintenance. some of the subpages i am able to edit but it's kinda arbitrary from what i understand. Akaibu (talk) 02:11, 28 June 2024 (UTC)[reply]
@Akaibu Yeah we need to fix the permissions of any of these submodules that aren't set to "autoconfirmed". Let me know if you find any. Benwing2 (talk) 18:32, 28 June 2024 (UTC)[reply]
hi, sorry for late reply, now that i'm looking into it, for examples i can't add anything to Module:category_tree/topic_cat/data/Places currently Akaibu (talk) 20:47, 1 July 2024 (UTC)[reply]

new version of Template:+obj


We currently have three very underpowered templates {{+obj}}, {{+preo}} and {{+posto}} for indicating governance of verbs, nouns, adjectives and the like. Back in Jan 2021 I created a better and much more powerful replacement, but I wasn't satisfied with the formatting so it's sat on the back burner. Today I made a bunch of formatting changes that IMO make it look significantly nicer, along with better support for qualifiers and support for alternants etc. in the case associated with a given adposition. I am thinking it's ready for deployment, but I'm soliciting some further comments esp. on the formatting. See User:Benwing2/test-obj for a bunch of examples. Some comments:

  1. I'm not wedded to the square brackets surrounding the whole governance structure. This is consistent with how {{+obj}} etc. currently work, but maybe we should just switch to parens.
  2. I'm also not totally wedded to the glosses written small in single quotes like this: ‘towards’. I did it this way because I felt that regular-sized glosses with double quotes distracted from the overall structure.
  3. The conjunctions "along with" and "and" mean the same thing; "along with" is used when joining arguments that contain alternants, as in [with accusative ‘whom’, along with genitive or an ‘which matter’]; here, if you take out the "or an", it would change to [with accusative ‘whom’ and genitive ‘which matter’]. The reason for using "along with" is to avoid the ambiguity that would result from having "foo and bar or baz".

My plan is to rename {{+obj}} to {{+obj/old}}, deploy the new {{+obj}} (which subsumes {{+preo}} and {{+posto}}), with proper documentation, and convert all the old uses to the new syntax. This should be mostly doable by bot, along with some manual cleanup to handle cases where the underpowered nature of the current templates results in people using weird hacked-up notation to express situations that aren't handlable in the current syntax. Benwing2 (talk) 02:03, 28 June 2024 (UTC)[reply]
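As a rough illustration of the conjunction rule in point 3 above, here is a minimal Python sketch. It is not the template's actual implementation: the function name `join_arguments` and the data layout (each argument as a list of alternants plus a gloss) are assumptions made for this example, and it only covers the two-argument case shown in the post.

```python
def join_arguments(arguments):
    """Render governance arguments inside square brackets.

    `arguments` is a list of (alternants, gloss) pairs, where `alternants`
    is a list of case/adposition names. This layout is hypothetical and
    chosen only to illustrate the "and" vs. "along with" rule.
    """
    # Alternants within one argument are joined with "or" and share one gloss.
    rendered = [
        " or ".join(alternants) + f" ‘{gloss}’" for alternants, gloss in arguments
    ]
    # "along with" replaces "and" as soon as any argument contains alternants,
    # which avoids the ambiguous reading of "foo and bar or baz".
    has_alternants = any(len(alternants) > 1 for alternants, _ in arguments)
    conjunction = ", along with " if has_alternants else " and "
    return "[with " + conjunction.join(rendered) + "]"


print(join_arguments([(["accusative"], "whom"), (["genitive", "an"], "which matter")]))
print(join_arguments([(["accusative"], "whom"), (["genitive"], "which matter")]))
```

The two calls reproduce the two bracketed strings quoted in point 3: the first uses "along with" because the second argument has two alternants, and the second falls back to plain "and".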

I personally actually like the formatting. I think brackets are better than parentheses. I guess making the object of the preposition not small would be marginally better. We may want to establish whether "something/someone" would be better as the input or "what/whom". I'm supposing most people would prefer the indefinite pronoun. I think I have a slight preference for "whom/what".
Being able to list multiple prepositions/cases that have the same meaning (e.g. let's say you have Polish przy czymś (loc)/nad czymś (ins), or maybe even "co/nad czymś" or what-have-you) would be nice (unless I am missing something). Vininn126 (talk) 18:36, 28 June 2024 (UTC)[reply]
@Vininn126 You can in fact list multiple prepositions/cases with the same meaning. I put an example of that at the very bottom with czekać. Internally they are slash-separated but currently it displays using with foo + case or bar + other-case. Benwing2 (talk) 20:35, 28 June 2024 (UTC)[reply]
Great! Vininn126 (talk) 20:37, 28 June 2024 (UTC)[reply]
I am in complete assent and approval of the initiative. I especially like the way this allows me to add register qualifiers like ‘(formal) with preposition or (informal) with accusative’—to say nothing of the liberty to combine prepositional and non-prepositional objects. ―⁠Biolongvistul (talk) 19:10, 28 June 2024 (UTC)[reply]
Looks and performs better, so Support. Fay Freak (talk) 19:29, 28 June 2024 (UTC)[reply]
Does it have all the functionalities of {{indtr}}? PUC 14:30, 29 June 2024 (UTC)[reply]
@PUC Overall it is much more powerful than {{indtr}} but also fundamentally different as it is intended to go after the definition rather than mixed in with the labels. As a result, for example, it doesn't have support for including arbitrary labels; the intention is that you use {{lb}} for that. Benwing2 (talk) 19:59, 29 June 2024 (UTC)[reply]
BTW it is intended to replace {{indtr}} by using {{lb}} for labels and {{+obj}} for governance. Benwing2 (talk) 20:01, 29 June 2024 (UTC)[reply]

This looks good :) for the kinds of languages which currently use those other templates, languages in which verbs etc. govern cases. I take it that it is to be used for those languages (not English, which has basically no cases to speak of)? The only English uses I spotted (so far) are the ones raised here, back out, back where it seems more confusing to me (with the equals sign and the invention of an accusative case) than just using parentheses to indicate the object, like other English verbs do. (A comment there prompted me to copy this sentiment over to here.) - -sche (discuss) 00:45, 30 June 2024 (UTC)[reply]

Yes, I agree we should just put regular objects in parens. Right now I'm in the process of offline-converting uses of {{+obj}} in the old syntax to the new one, and in cases of a simple object I've changed them to use parens instead of {{+obj}}. Occasionally I've found it necessary to explicitly indicate transitivity using {{+obj|&transitive}}, e.g. in Catalan reparar (to notice, to pay attention to), where I've written {{+obj|ca|&transitive/:en<someone/something>}} which looks like [transitive or with en ‘someone/something’] (where the initial & suppresses the with that would normally precede the word "transitive"); this is to indicate that you can say either "reparar something" or "reparar en something". Benwing2 (talk) 02:14, 30 June 2024 (UTC)[reply]
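To make the Catalan example above concrete, here is a small Python sketch of how such a spec could be turned into display text. It is only an approximation inferred from that single example, not the real module's code: the helper names are hypothetical, the reading of the leading ":" as an adposition marker is an assumption, and other features of the new syntax (cases, qualifiers, multiple arguments) are not covered.

```python
def split_top_level(spec, sep="/"):
    """Split on `sep`, ignoring separators inside <...> (e.g. a gloss)."""
    parts, depth, current = [], 0, ""
    for ch in spec:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return parts


def render_alternant(alt):
    """Render one alternant such as '&transitive' or ':en<someone/something>'."""
    gloss = ""
    if alt.endswith(">") and "<" in alt:
        alt, gloss_text = alt[:-1].split("<", 1)
        gloss = f" ‘{gloss_text}’"
    if alt.startswith("&"):
        # A leading "&" suppresses the "with" that would otherwise be added.
        return alt[1:] + gloss
    if alt.startswith(":"):
        # Assumption: a leading ":" marks an adposition; it is simply stripped here.
        alt = alt[1:]
    return "with " + alt + gloss


def render_governance(spec):
    """Join slash-separated alternants with "or" inside square brackets."""
    return "[" + " or ".join(render_alternant(a) for a in split_top_level(spec)) + "]"


print(render_governance("&transitive/:en<someone/something>"))
```

The slash inside the angle brackets is why the split has to track bracket depth: a naive split on "/" would cut the gloss "someone/something" in half.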

Category for subjunctives

Should one be created to accompany Category:English imperative sentences? J3133 (talk) 02:40, 28 June 2024 (UTC)[reply]

Would that it were. DCDuring (talk) 21:19, 28 June 2024 (UTC)[reply]
@DCDuring: I have created Category:English subjunctive expressions. J3133 (talk) 17:30, 3 July 2024 (UTC)[reply]
If I knew whether the category should include conditionals or redirects that contain would to entries that do not, I would increase the population of the category substantially. DCDuring (talk) 18:46, 3 July 2024 (UTC)[reply]

As with Korean, which is similar. Here we see Category:Korean ideophones, which encompasses both sound symbolism and expressives (emotions, sights, and so on), as the umbrella category, and then Category:Korean onomatopoeias within it for the sound symbolism only.

As the situation seems similar in Japanese, I think we should move the present onomatopoeia category to ideophones and then create a new category within it for the sound-based words.

For example, there is a word どぎまぎ (dogimagi), which seems like a loose parallel, both in meaning and in rhythm, for English heebie-jeebies, but we don't call heebie-jeebies onomatopoeia.

Also, is there a reason why Category:Japanese onomatopoeias is not a subcategory of Category:Japanese lemmas? It took me quite a while just to find it, because I was expecting it to be categorized like that of Korean and other languages. Thanks, Soap 19:19, 28 June 2024 (UTC)[reply]

I just noticed the existing category Category:Gitaigo, which contains terms that I would consider to be expressives, which is to say, they're the non-sound-based ideophones, which I believe are the counterpart to Korean's 의태어. There are certainly more than 3 of these words in Japanese, so I propose that this category be expanded and perhaps renamed so that it will be easier to find. Soap 19:29, 28 June 2024 (UTC)[reply]
It may be that one reason why the Japanese categories seem so under-documented is that we don't usually have an etymology for these words, and that is where the {{onom}} template goes in the other languages' words. Soap 19:30, 28 June 2024 (UTC)[reply]
  • I would be supportive of recategorizing such terms as "ideophones". It's awful difficult to think of terms like シーン (shīn) as "onomatopoeia", when it's supposed to be the "sound effect" of no sound at all — silence. Or things like ぴりぴり (piripiri), a "pins and needles" feeling or possibly the feeling of spicy food on the tongue. There's no sound to that either. The "ideophone" label would seem to cover these, as well as actual imitative adverbs like ぞっと (zotto, with a shudder) or ばんばん (banban, literally bang bang).
‑‑ Eiríkr Útlendi │Tala við mig 01:05, 29 June 2024 (UTC)[reply]
I think there's also quite a distinctive difference between adverbial ideophones, such as the ones you describe, and basic onomatopoeic interjections like カー (kā, caw, sound of a crow) or SFX sound effects in manga like チャッ (cha', shing, a sharp metallic sound).
I think a lot of the issue is caused by the fact that it's often difficult to translate ideophones idiomatically into standard English, so they get lumped in with sound effects and/or translated in ways that don't convey the ideophonic effect, but the same phenomenon is actually pretty common in colloquial English - especially when spoken: "my chest was thump thump thump-ing as I..." (どきどき (dokidoki)); "I crept forward tip-toe-tip-toe and then..." (ちょこちょこ (chokochoko)) etc. etc. They're verbal or (quasi-)adverbial terms that evoke some kind of vibe, not literal sound effects, and are often completely ad hoc. Theknightwho (talk) 22:41, 29 June 2024 (UTC)[reply]
Sounds good to me. If some ideophones right now are classed as onomatopoeic despite not really representing a real-world sound, it may be sort of misleading to keep it as it is. Kiril kovachev (talk・contribs) 21:21, 9 July 2024 (UTC)[reply]

Requesting template editor right

Requesting template editor right so I can edit the topical label data modules and the like. I just added a new category for Brazilian politics but cannot add related label data now (I previously added new label data for similar regional politics categories, such as for Philippine politics and Palestinian politics but the level of protection has been increased lately due to disruptive additions from other users). TagaSanPedroAko (talk) 21:25, 30 June 2024 (UTC)[reply]

@TagaSanPedroAko I lowered the protection one notch to "autopatrollers" so you can edit this. Let me know if you run into issues with any other modules. Benwing2 (talk) 03:41, 1 July 2024 (UTC)[reply]