Wiktionary:Grease pit/2022/January

Happy New Year everyone!

{{metathesis}} was created by an IP editor a month or so ago. The cap=1 parameter has been implemented using MediaWiki template logic rather than Lua, so it throws an ugly error when used: Metathesis. Could someone with the necessary rights go in and check it out? Thanks! This, that and the other (talk) 03:21, 1 January 2022 (UTC)

@This, that and the other: After looking at the Lua code, I noticed it had an "ignore-params" parameter for parameters that aren't used by the Lua code, so I added |ignore-params=cap to the template, and that seems to have fixed it. Chuck Entz (talk) 04:09, 1 January 2022 (UTC)[reply]
Thanks Chuck! This, that and the other (talk) 04:55, 1 January 2022 (UTC)[reply]

automated creation of several hundred entries

I have an open-source project I'm working on, https://github.com/bcrowell/ransom/tree/master/glosses , as part of which I'm in the process of compiling a complete set of English definitions for all the Greek words appearing in the Iliad. The project is under the same license as Wiktionary, and in fact many of my entries are paraphrases of Wiktionary's or even verbatim copies. However, quite a few are based on public-domain sources, especially a 1924 dictionary by Cunliffe. Roughly a third of them are for words that presently have no entries in Wiktionary, totaling at present about 600. It would be quite trivial for me to write a script that would generate basic Wiktionary entries for these.

However, I want to make sure that I do this in a way that is helpful to Wiktionary and doesn't inadvertently create the need for a lot of extra work by folks here. I'm thinking I would probably do a dozen words or something at first, ask for comments, and then do more. Or I could generate an online file of all the entries, which folks could then examine and comment on before I upload any at all. If I have the stamina to complete the project, then I would be doing many more of these on an ongoing basis, probably ultimately amounting to a few thousand new entries.

Technical things I could use help with: (1) a script to upload an entry (something that runs on linux); (2) thoughts on how to avoid duplication. The main thing I'm concerned about in terms of duplication is that there can be Homeric forms that are just respellings or contractions of Attic forms that already have entries in Wiktionary. To some extent I have this covered already because I can easily look up what is Project Perseus's lemma for a given word. Usually this is the Attic form. So for example, Homer has ξεῖνος, whereas the Attic form is ξένος, but I can easily detect this on an automated basis and avoid creating a redundant entry for ξεῖνος. I also have data on frequencies of words, so another pretty straightforward precaution would be not to upload any new entry for a word whose frequency is above some cut-off -- such a word would be likely already to have a Wiktionary entry.--Fashionslide (talk) 16:34, 1 January 2022 (UTC)[reply]
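The two precautions described above (skip a Homeric form whose Perseus lemma already has an entry, and skip any word above a frequency cut-off) could be sketched roughly as follows. This is only an illustration in Python: the function name, the data structures and the cut-off value are assumptions, not part of the ransom project or of any Wiktionary tooling, and the frequency figures in the example are made up.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


def select_new_entries(candidates, existing_titles, lemma_of, frequency, max_frequency):
    """Return the candidate words that look safe to create as new entries.

    candidates      -- iterable of words found in the text
    existing_titles -- set of page titles that already exist
    lemma_of        -- dict mapping a surface form to its dictionary lemma
    frequency       -- dict mapping a word to its corpus frequency
    max_frequency   -- words more frequent than this are assumed to exist already
    """
    existing = {nfc(title) for title in existing_titles}
    lemmas = {nfc(form): nfc(lemma) for form, lemma in lemma_of.items()}
    counts = {nfc(word): count for word, count in frequency.items()}

    selected = []
    for word in candidates:
        word = nfc(word)
        lemma = lemmas.get(word, word)
        if word in existing or lemma in existing:
            continue  # the word itself, or its lemma, already has an entry
        if counts.get(word, 0) > max_frequency:
            continue  # very common words are likely covered already
        selected.append(word)
    return selected


if __name__ == "__main__":
    print(select_new_entries(
        ["ξεῖνος", "παλινάγρετος", "καί"],
        existing_titles={"ξένος"},
        lemma_of={"ξεῖνος": "ξένος"},
        frequency={"καί": 5000, "παλινάγρετος": 1},  # made-up counts
        max_frequency=100,
    ))
```

Normalizing to NFC first is a precaution against the same polytonic word being stored with different combinations of precomposed and combining characters in different data sources.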

This is very exciting! From a non-technical standpoint I fully support you in this endeavour. Our coverage of Ancient Greek has many gaps and could really use some kind of corpus-based or dictionary-based import. I'm coincidentally in the middle of a similar project to import a bunch of missing Latin entries.
It sounds like you've given this a good deal of thought. Based on my experience so far with the Latin entries, it's important to try and prevent mistakes from creeping in, as Ancient Greek entries don't always get a lot of attention from critical eyes. Perhaps @Mahagaja might have some thoughts. This, that and the other (talk) 01:20, 2 January 2022 (UTC)[reply]
Very cool to see what you're doing -- obviously great minds think alike :-) I'm surprised that you're excluding hapaxes, but maybe I need to think about that more. I would think that the Homeric hapaxes would be some of the most straightforward words to include. Often they're just straightforward compounds like παλινάγρετος. And I certainly wouldn't want to miss πτύω, to spit. Have you gotten to the point of looking at scripting the actual creation of the entries?--Fashionslide (talk) 01:41, 2 January 2022 (UTC)[reply]
There are a few reasons why I'm not doing this particular import fully automatically. The scans of the dictionaries have been automatically parsed (by Perseus) based on OCR detection of bold and italic formatting in the originals, and the results for Lewis and Short are so patchy that a fully automated import would be a disaster. Elementary Lewis came out better, but it lacks some senses and grammatical info. Plus, I enjoy researching and writing etymologies, so I'm taking the time to add those manually. That's just a choice I made; the entries would be totally fine without etymological information and someone else would eventually come along and add it.
As for hapaxes, I'm skipping them because I want to use my limited time in the most optimal way possible. If I could do the import fully automatically I would not be so worried about that. (Spurious forms would still need to be removed though. Sometimes L&S crossreference A to B, then at the entry B it says that it is "not spelled A".)
It's definitely possible to fully script the creation of entries if your data is cleaner than mine - perhaps someone who has done a similar project before might be able to comment here. This, that and the other (talk) 05:36, 2 January 2022 (UTC)[reply]
I see, thanks for explaining. Like you, I'm writing the definitions myself. I guess the difference is just that I've already got them compiled for a separate project. All I want to automate is the final step of putting them on Wiktionary.--Fashionslide (talk) 13:32, 2 January 2022 (UTC)[reply]

It looks like the standard tool for making bots for WP is pywikibot, and WP has an elaborate set of policies for proposing, testing, and approving bots. Does anyone know whether pywikibot works for wiktionary, and whether wiktionary has a similar formal process?--Fashionslide (talk) 15:52, 2 January 2022 (UTC)[reply]

I am willing to be corrected by others, but I would say that the automated creation of a finite set of entries that you've prepared yourself isn't truly a "bot" in the scope of WT:Bots. A bot is a script that goes ahead under its own steam and edits pages without supervision for an indefinite period of time. The Ancient Greek creation could very well take place under your main user account using pywikibot (albeit with a time lag between edits to avoid flooding Special:RecentChanges). This, that and the other (talk) 00:53, 3 January 2022 (UTC)[reply]
@Fashionslide, This, that and the other pywikibot works for Wiktionary. I use it for all the bot work I do, along with mwparserfromhell, which works well for parsing MediaWiki templates if you want to write a script to modify existing entries. I have written scripts to generate Russian, Bulgarian and Ukrainian entries from manual specifications, and scripts to push entries to Wiktionary, and I have added a lot of entries in this fashion (esp. Russian entries). So yes, this is definitely possible. My scripts are written in Python and run on MacOS, so they should work in Linux with few if any changes. As for ξεῖνος vs. ξένος, it is useful to have both, of course without duplication; one should simply point to the other. As for bots, Wiktionary does have a formal process for getting a bot account. Running a bot using your own account rather than a bot account is possible but normally not a good idea. In this case it might be acceptable in the short run (while waiting to get a bot account approved), but if you plan to stick around a bit it would be a good idea to start the process to get a bot account. It's not a huge pain to do so, but it does require a vote, which usually lasts two weeks. Benwing2 (talk) 03:53, 3 January 2022 (UTC)[reply]
@This, that and the other, Benwing2 Thanks, Benwing2, that's very helpful. I will try cautiously getting started with pywikibot. If I get something working that seems OK, I will ask for feedback and initiate the process of requesting a separate bot account.--Fashionslide (talk) 16:44, 6 January 2022 (UTC)[reply]
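A cautious first attempt along the lines discussed above might look like the following sketch. It is an assumption-laden illustration, not anyone's actual script: build_entry and upload_entries are hypothetical names, the entry layout is heavily simplified (real Ancient Greek entries normally use language-specific headword templates and more sections), and the delay value is arbitrary. The pywikibot calls used (pywikibot.Site, pywikibot.Page, page.exists(), page.text, page.save(summary=...)) are only reached when dry_run is False, which requires pywikibot to be installed and configured.

```python
import time


def build_entry(pos, gloss=None, alt_of=None):
    """Return minimal wikitext for an Ancient Greek entry (simplified layout)."""
    if alt_of:
        # Point a variant such as a Homeric form at the main entry.
        definition = "# {{alternative form of|grc|" + alt_of + "}}"
    elif gloss:
        definition = "# " + gloss
    else:
        raise ValueError("either gloss or alt_of is required")
    lines = [
        "==Ancient Greek==",
        "",
        "===" + pos.capitalize() + "===",
        "{{head|grc|" + pos + "}}",
        "",
        definition,
    ]
    return "\n".join(lines) + "\n"


def upload_entries(entries, dry_run=True, delay_seconds=10):
    """Create pages from a dict of title -> wikitext.

    Returns the titles that were created (or would be, in a dry run).
    Existing pages are never overwritten.
    """
    created = []
    site = None
    if not dry_run:
        import pywikibot  # third-party; needs a configured user account
        site = pywikibot.Site("en", "wiktionary")
    for title, text in entries.items():
        if dry_run:
            created.append(title)
            continue
        page = pywikibot.Page(site, title)
        if page.exists():
            continue  # leave existing entries for manual review
        page.text = text
        page.save(summary="Create Ancient Greek entry (semi-automated)")
        created.append(title)
        time.sleep(delay_seconds)  # time lag to avoid flooding recent changes
    return created


if __name__ == "__main__":
    entry = build_entry("noun", alt_of="ξένος")
    print(entry)
    print(upload_entries({"ξεῖνος": entry}))  # dry run: nothing is saved
```

Keeping the wikitext generation separate from the upload step means the generated text can be posted for comment before anything is saved.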

You folks have been super nice, but the unrelenting hostility and dysfunction of Wikipedia has prompted me to mung the password on my Fashionslide account on both WP and Wiktionary and to stop contributing to both projects. If anyone is interested in continuing this project, here is the software I wrote to generate wiktionary files https://github.com/bcrowell/ransom/tree/master/bot , and here is its current output: http://lightandmatter.com/wiktionary_greek_entries.txt (In my browser, the Greek characters in the file show up munged, but if I actually download the file and look at it, it's fine.)--Fashionslide (talk) 17:24, 8 January 2022 (UTC)[reply]

@Fashionslide Very sorry to hear that. What happened on the Wikipedia side? In many ways they are quite separate from Wiktionary; same underlying MediaWiki software but otherwise one is independent of the other. Wikipedia policies do not apply here (and vice versa). Benwing2 (talk) 20:56, 8 January 2022 (UTC)[reply]

References: bullets and numbers

According to Wiktionary:References, references should be preceded by a bullet point, as shown in the example provided. However, if I have inline references, e.g. in an etymological section, which need to be displayed as numbers, I have to use <references/> beneath the reference header. This tag prepends numbers to references, rather than bullet points, leading to inconsistencies between entries which have references only at the bottom and those which also have inline references; cf. Сараево (Saraevo) and течнокристален екран (tečnokristalen ekran). How should this be avoided? Martin123xyz (talk) 15:33, 2 January 2022 (UTC)[reply]

The references section in Macedonian Сараево (Saraevo) is actually misused (should be "Further reading"). See Wiktionary:Votes/2016-12/"References"_and_"External_sources" and Wiktionary:Votes/2017-03/"External_sources",_"External_links",_"Further_information"_or_"Further_reading". Fytcha (talk) 15:42, 2 January 2022 (UTC)[reply]
@Fytcha The reference section at Macedonian Сараево (Saraevo) may be misused, but it is in keeping with Wiktionary:References, which says that "references referring to an entry as a whole, or many parts of an entry should be listed directly under the ===References=== header, usually preceded by a bullet (*). However, there is no formal policy on when to use the <ref></ref> syntax and when to use bulleted lists." Daniel Carrero has written that he "edited WT:EL to conform with the results of this vote", but WT:EL still contains a link to Wiktionary:References, where the example for "water" is formatted like the references at Macedonian Сараево (Saraevo). This contradicts point 3 of the 2016 vote, which passed except for point 4, so it seems that the results of the vote were not implemented on all pages where they should have been. This is a problem because users looking for information regarding the proper way to format entries are much more likely to look up policy pages than vote pages, which they may not even know how to find. When I need information regarding references, I naturally type "wiktionary" and "references" into Google and then read Wiktionary:References. If this page had been updated, perhaps there would not have been so many Macedonian pages with a reference section corresponding to what should be a further reading section, as I am just finding out. Martin123xyz (talk) 16:22, 2 January 2022 (UTC)
I also cannot reliably discern the difference between the two headers, either from the policy pages or from the votes, and the issue with numbered references seems to have been completely ignored ever since the regulation that “the reference section requires using footnotes marking the specific statements […]” failed to pass. Fay Freak (talk) 16:32, 2 January 2022 (UTC)

alternative spelling vs alternative form

Is foo-bar an alternative form of foobar or an alternative spelling? How about foo bar? General Vicinity (talk) 19:24, 2 January 2022 (UTC)[reply]

@General Vicinity I would say alternative spelling. Alternative spellings differ only in spelling, while alternative forms usually differ in some other property as well (pronunciation, ending, inflection, etc.). Benwing2 (talk) 03:55, 3 January 2022 (UTC)[reply]
I'd say foobar is an entirely different word? Foobar > (clipping) foo, (clipping) bar; > (composition) foo-bar? I'll owe you the quotations, because it's predominantly used in introductory texts and discussions about programming, which are not easy to quote as far as mentions of the terms used in code are concerned. The word that could be used in arbitrary sentences is a different matter, but the spelling is thus explained. 2A00:20:6001:6615:9FAD:C63B:A1AA:C581 15:03, 23 June 2022 (UTC)

Words used solely by non-native speakers

Moved to Wiktionary:Tea room/2022/January#Words used solely by non-native speakers DCDuring (talk) 16:22, 3 January 2022 (UTC)[reply]

Link changed from BP to TR as the Tea Room is where it was actually moved. - -sche (discuss) 23:37, 4 January 2022 (UTC)[reply]

Bot needed? (review template wikipedia links)

I looked at entry disuse which has had a {{wikipedia}} link in it since the entry was created in *2006*. There is certainly no such article at Wikipedia now. And I doubt it was deleted yesterday because I can't find any such tracks.

Experimenting, I checked a few searches with "insource:/[{][{]wikipedia[}][}]/" and relatively quickly found another mis-linked example, Malukus. There isn't such an article at Wikipedia, though there is a disamb page w:maluku, and w:Maluku Islands, w:Maluku (province), and a few others.

I'm thinking a bot that reviews entries here and notes all the {{wikipedia}} mis-links somewhere would be a nice little project, and then that generated list could drive a cleanup effort. Simply deleting the mis-links would be inappropriate given the example above Malukus, but then that example also points out that entries here can be less than precise (see that definition) and need repair.

Not knowing where to mention this request I mentioned it at WT:ID and they suggested coming down this avenue. As noted there I can't implement this myself now. Could someone consider this project? Shenme (talk) 00:43, 3 January 2022 (UTC)[reply]

If someone makes the list, I'll give it a run-through. bd2412 T 01:32, 3 January 2022 (UTC)[reply]
I generated a list at User:This, that and the other/broken Wikipedia links/2022-01-01. As you can see, the vast majority are links to articles ending in "language" or "phonology" which need to be created as redirects on Wikipedia and/or fixed in our Lua modules. The list also includes links to non-main-namespace Wikipedia pages; for example, our entry Cyclorrhapha links to w:Talk:Brachycera. This talk page does exist, but I don't think it makes sense for our entry to be directing the reader to what is an internal Wikipedia work page. This, that and the other (talk) 03:06, 3 January 2022 (UTC)[reply]
Okay I see now that list isn't super useful, and it wasn't quite what you were asking for either, as it included all links to Wikipedia, however constructed. How about User:This, that and the other/broken Wikipedia links/2022-01-01/only via wikipedia template? This, that and the other (talk) 03:33, 3 January 2022 (UTC)[reply]
There are many more Wikipedia templates and redirects, such as {{wp}}, {{Wikipedia}}, {{wiki}}, {{pedia}}, {{slim-wikipedia}}, {{slim-wp}}, {{swp}}, {{in wikipedia}} and probably more. DTLHS (talk) 03:44, 3 January 2022 (UTC)[reply]
My list accounts for {{wikipedia}}, {{wiki}} and {{wp}} only for now. This, that and the other (talk) 03:57, 3 January 2022 (UTC)[reply]
I added {{slim-wikipedia}}, {{pedia}} and their redirects to the list as well. Happy cleaning! This, that and the other (talk) 05:00, 3 January 2022 (UTC)[reply]
I've been working on the Translingual entries with these problems, which are a large share of the total. In the future, if it isn't too much trouble, could you segregate these? That would make both the Translingual entries and the others easier to deal with. Also dividing the list into sections of 20, 50, or even 100 would make striking or deleting the ones that have been corrected much easier. DCDuring (talk) 14:36, 5 January 2022 (UTC)[reply]
The Translingual entries, especially, would also benefit from the same kind of lists for Commons and Wikispecies. For all of these, once the backlog is cleaned up an annual run of each would be helpful because Wikipedia articles are deleted or moved, often without due consideration of the need of other wikis for redirects to the new title. Oh, yea, THANKS. DCDuring (talk) 14:41, 5 January 2022 (UTC)[reply]
Thanks for the feedback - I'll regenerate the list from the next dump with your suggestions in mind. This, that and the other (talk) 01:26, 6 January 2022 (UTC)[reply]
@Shenme, BD2412, DTLHS, DCDuring, Jberkel, Fytcha I've regenerated the lists of broken interwiki links to English Wikipedia and Wikispecies (not Commons yet) at User:This, that and the other/broken interwiki links/2022-01-20. I'll keep generating these lists as subpages of User:This, that and the other/broken interwiki links for as long as people are helping to clear the backlog. Happy cleaning! This, that and the other (talk) 03:54, 23 January 2022 (UTC)[reply]
I see it now. bd2412 T 04:00, 23 January 2022 (UTC)[reply]
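The kind of check discussed in this thread could be sketched as below, working from page text rather than a live wiki. This is a rough Python illustration under several assumptions: the function names are hypothetical, the template names are the ones mentioned in the replies above, the first unnamed parameter is assumed to be the target article (defaulting to the page title), named parameters such as a language code are ignored, and the set of existing Wikipedia titles is assumed to come from somewhere else, such as a dump. It is not the method used to generate the lists linked above.

```python
import re

# Template names mentioned in the discussion; the real set of redirects may be larger.
WIKIPEDIA_TEMPLATES = (
    "wikipedia", "wp", "wiki", "pedia",
    "slim-wikipedia", "slim-wp", "swp", "in wikipedia",
)

# Longest names first so that "wikipedia" is tried before "wiki".
_NAMES = "|".join(
    re.escape(name) for name in sorted(WIKIPEDIA_TEMPLATES, key=len, reverse=True)
)
TEMPLATE_RE = re.compile(
    r"\{\{\s*(" + _NAMES + r")\s*((?:\|[^{}]*)?)\}\}", re.IGNORECASE
)


def wikipedia_targets(page_title, wikitext):
    """Return the Wikipedia article titles that the page's templates point to."""
    targets = []
    for match in TEMPLATE_RE.finditer(wikitext):
        params = [p.strip() for p in match.group(2).split("|")[1:]]
        positional = [p for p in params if "=" not in p]
        if positional and positional[0]:
            targets.append(positional[0])
        else:
            targets.append(page_title)  # no explicit target: same title as the entry
    return targets


def find_mislinks(pages, wikipedia_titles):
    """Return (entry, target) pairs whose target is not a known Wikipedia title.

    pages            -- dict of entry title -> wikitext
    wikipedia_titles -- set of existing Wikipedia page titles
    """
    report = []
    for title, text in pages.items():
        for target in wikipedia_targets(title, text):
            normalized = target.replace("_", " ")
            normalized = normalized[:1].upper() + normalized[1:]
            if normalized not in wikipedia_titles:
                report.append((title, target))
    return report


if __name__ == "__main__":
    sample = {
        "disuse": "==English==\n{{wikipedia}}\n",
        "Malukus": "==English==\n{{wp}}\n",
        "water": "==English==\n{{wikipedia|Water}}\n",
    }
    print(find_mislinks(sample, {"Water", "Maluku Islands"}))
```

The output is a list to review rather than a list of links to delete, since a missing target may mean that the entry should point to a differently named article.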

Taser

Why is my edit on taser being reverted? It is a constructive edit. Vandalism would be writing “COUNTRYBOY603 WAS HERE” in all capitals. --75.166.166.170 03:18, 3 January 2022 (UTC)[reply]

Vandalism isn't just the addition of bad content (analogy: spraypainting graffiti), it's also wanton removal of valuable content for no reason (analogy: destruction of property). You removed:
  • a relevant link from the etymology section,
  • the number of syllables from pronunciation,
  • a valid anagram,
  • a recording of the pronunciation in Dutch
As for the substantive revision, I'm not sure I agree with that change either, but at least it's not clearly vandalism. You changed the definition so that tasering must result in unconsciousness, and the target must be a person. However, we can find or imagine uses of the verb "taser" where neither of those applies, e.g., "police tasered the dog to no effect". 70.172.194.25 03:45, 3 January 2022 (UTC)[reply]

Reducing Lua memory errors

@This, that and the other, Eruton I see that User:This, that and the other created {{inh-lite}}, {{m-lite}} and friends to reduce memory usage. I'm thinking of another approach, which is to do the equivalent of {{multitrans}} for large chunks of a page. {{multitrans}} is used to wrap translation tables and ensures that the code to implement {{t}}, {{t+}} and friends is loaded only once. Essentially, you wrap the whole translation table in {{multitrans}} and replace all occurrences of {{t}} with {{tt}} and {{t+}} with {{tt+}}. These latter templates are just pass-throughs, i.e. they do nothing but generate special text, which is interpreted by {{multitrans}} to make calls to the translation module. On pages with large translation tables, it makes a massive difference: I've seen it reduce the memory from over 52M (the limit) to around 25M. This was motivated by a change I made to Module:table, where I added three new small functions shallowContains(), shallowTableContains() and shallowInsertIfNot(). By itself this increased the memory usage of some pages by over 3MB, leading to memory errors. Deleting these three functions in [1] resolved the issue; but it shows that even small code changes made to commonly used modules can have huge effects when the modules are loaded over and over and over. Loading the modules just once can save a huge amount of memory. The new template might be called {{reduce-memory}} or something and might handle something like {{m}}, {{l}}, {{inh}}, {{bor}}, {{cog}}, {{der}}, {{ux}}, {{uxi}}, {{lb}} and {{q}}/{{i}}/{{qualifier}}/{{qual}}. Each handled template would need a pass-through equivalent; not sure what to name the pass-throughs, but it should be easy to type; maybe {{*m}}, {{*l}}, {{*inh}}, etc.? The advantage over {{l-lite}}, {{inh-lite}} etc. is that the new pass-through templates use exactly the same syntax as the regular templates and handle most or all of their features, rather than having only a limited subset as the "lite" templates currently do. Thoughts? Benwing2 (talk) 08:29, 4 January 2022 (UTC)
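To make the pass-through idea concrete, here is a toy model of it in Python rather than Lua or wikitext. Everything in it is an assumption made for illustration: the marker characters, the function names and the handler table are hypothetical, and the real {{multitrans}} module may work differently in detail. The point is only that the inner templates emit inert marker text, and a single wrapper call then expands all the markers using code that has been loaded once.

```python
import re

# Hypothetical marker syntax for the text a pass-through template would emit.
OPEN, SEP, CLOSE = "⦃⦃", "¦", "⦄⦄"
MARKER_RE = re.compile(re.escape(OPEN) + r"(.*?)" + re.escape(CLOSE))


def passthrough(name, *args):
    """Model of a pass-through template: no real work, just marker text."""
    return OPEN + SEP.join((name,) + args) + CLOSE


def render_block(text, handlers):
    """Model of the wrapper: expand every marker with already-loaded handlers.

    handlers -- dict of template name -> function taking the template's arguments
    """
    def expand(match):
        name, *args = match.group(1).split(SEP)
        return handlers[name](*args)

    return MARKER_RE.sub(expand, text)


if __name__ == "__main__":
    # Stand-in for the translation code, which would be loaded a single time.
    def format_translation(lang, term):
        return "[[" + term + "#" + lang + "|" + term + "]]"

    table = "\n".join([
        "* French: " + passthrough("t+", "fr", "mot"),
        "* German: " + passthrough("t", "de", "Wort"),
    ])
    print(render_block(table, {"t": format_translation, "t+": format_translation}))
```

In this model the cost of setting up the handlers is paid once per wrapped block, however many pass-through calls the block contains.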

@Erutuon Sorry, typo :( ... Benwing2 (talk) 08:29, 4 January 2022 (UTC)[reply]
Also pinging @Surjection, who has done significant Lua hacking, and @Rua, who came up with the original idea for {{multitrans}}. Benwing2 (talk) 08:30, 4 January 2022 (UTC)[reply]
I was thinking about this over the New Year and I am beginning to believe that, to a certain extent, we are barking up the wrong tree on this issue. The single vowel pages in particular have got so long that even on a powerful computer, they are difficult for a reader to interact with. The problem is not so much the Lua memory limit but the fact that the pages are simply too long, Lua or no Lua. From this standpoint, the only solution is to split them, perhaps along the lines of User:This, that and the other/a, with appropriate code added to {{l}} etc so that when a "split entry" is linked to, the link goes to the appropriate subpage.
In general though, I'd definitely be in favour of anything that works to reduce memory usage and allows {{l-lite}} etc to be deleted! The generic {{head}} template and the {{g}} template are two more that could be considered for your {{reduce memory}} concept. This, that and the other (talk) 08:50, 4 January 2022 (UTC)[reply]
The correct solution for letter entries is Wiktionary:Votes/2020-07/Removing letter entries except Translingual, but unfortunately it failed to pass. — SURJECTION / T / C / L / 15:44, 4 January 2022 (UTC)[reply]
@Surjection: I think we have a good shot at passing that if we reproposed it as applying only to letters used (natively) in more than N (≈10) languages. Fytcha (talk) 16:05, 4 January 2022 (UTC)[reply]
@Surjection, This, that and the other, se isn't a letter and doesn't host translations, yet it's still running out of memory. I do think we should run Fytcha's suggested modification of the letter vote, but it's not a solution. The only long-term solution that has been proposed is WT:Per-language pages proposal. —Μετάknowledgediscuss/deeds 18:54, 4 January 2022 (UTC)[reply]
I don't think removing letter entries is likely to pass in any guise. Plus, (a) the opposers made some cogent arguments that I'm inclined to agree with, and (b) merging the letter entries was never going to solve the Lua memory errors in any case. If the {{reduce memory}} idea doesn't succeed on these very long entries, it would be worthwhile having a discussion or vote on splitting these entries. This, that and the other (talk) 00:49, 5 January 2022 (UTC)[reply]
Reading the per-language proposal again, I see that one hurdle it highlights is just how much time and effort would be needed to split all of our millions of pages, handle cases where a word has a slash in it already, etc, which brings to my mind something I think I've opined about before, which is that we don't need to split all of our millions of pages, because even if we eventually have thousands 😱 of entries that have memory errors, that's ... a ten-thousandth of a percent of our total number of entries. That tiny tail needn't wag the whole dog; we could just per-language split only those few pages which actually have memory issues. Or do a coarser split like This,that mocked up. I'd support either of those; memory errors are a severe problem in the few entries they affect, so even drastic changes to those entries should be on the table... - -sche (discuss) 01:24, 5 January 2022 (UTC)[reply]
I guess it's doable. I created {{q-lite}} as an experiment and it was actually possible to completely implement that template without using any Lua, albeit without support for arbitrarily many qualifiers. — SURJECTION / T / C / L / 12:45, 4 January 2022 (UTC)[reply]
I started testing something like this on Module:User:Surjection/invoker (Example wrapper module). It doesn't handle nested templates correctly yet, but I'm working on that. — SURJECTION / T / C / L / 17:15, 4 January 2022 (UTC)[reply]
Now it does: Special:Diff/65183310 SURJECTION / T / C / L / 17:25, 4 January 2022 (UTC)
I also created Module:User:Surjection/wrapper which makes it easier to implement the double-brace templates, and hopefully it has a small enough footprint to be usable at least for some templates where hardcoding them (as {{tt}} does) would not be practical. — SURJECTION / T / C / L / 17:50, 4 January 2022 (UTC)[reply]
@Surjection Thanks. When would Module:User:Surjection/wrapper be needed? E.g. in {{col}} or high-numbered {{q}} params? In such case I could imagine writing template code to check for arguments likely requiring special handling and fall back to pure template code otherwise; e.g. {{#if:{{{4|}}}{{{5|}}}{{{6|}}}|<invoke wrapper>|<do pure template code>}}. Benwing2 (talk) 02:44, 5 January 2022 (UTC)[reply]
Yes, those are possible use cases. Having a pure template code fallback is also a good idea and would further reduce the memory footprint. I was more thinking about templates like {{compound}} that can have arbitrarily many elements, each of which can have their own glosses, sense IDs, etc. — SURJECTION / T / C / L / 12:10, 5 January 2022 (UTC)[reply]
Just had an idea to make {{multitrans}} and the like more efficient: enclose the thing in nowiki tags to prevent the inside from being interpreted as wikitext and then use mw.text.unstripNoWiki on it in the Lua module that implements the {{multitrans}}-like thing.
I've done this before on some of my list pages, but never thought about using it for {{multitrans}}.
Then we could use the original template names like {{t}} and {{t+}} instead of {{tt}} and {{tt+}} because they would be handled by the Lua module rather than the wikitext parser: {{multitrans-with-nowiki|<nowiki>* French: {{t+|fr|mot}}</nowiki>}} instead of {{multitrans|* French: {{tt+|fr|mot}} }}. This would probably speed up page parsing because the server would no longer have to reformat the templates with ⦃¦⦄. It would disable various features that assume that what's in nowiki tags isn't wikitext, like wikitext syntax highlighting, if you have that turned on in the editor. — Eru·tuon 23:20, 6 January 2022 (UTC)[reply]
@Erutuon How would we handle other templates inside of {{multitrans}}, or even in arguments to {{t}}/{{t+}}? Benwing2 (talk) 01:53, 7 January 2022 (UTC)[reply]
@Benwing2: Good question. That's the trouble with the idea. I'm thinking maybe we could expand embedded templates by recursively matching %b{} and translating common templates to module functions to reduce overhead and handling others with frame:preprocess(). It would be hard, but at least with translation sections the syntax is relatively restricted, so it might not be impossible. — Eru·tuon 20:35, 8 January 2022 (UTC)[reply]
A demonstration of the idea at Module:User:Erutuon/multitrans using Module:templateparser. Even though it still calls module functions to expand {{t}} and {{t+}}, it manages to go from 24 MB to 12 MB of Lua memory. — Eru·tuon 21:16, 8 January 2022 (UTC)[reply]
@Erutuon Very cool. Can you try this on some existing pages that use {{multitrans}} to see how much memory reduction there is in some real-world cases? It may depend a lot on whether and how much other languages are mentioned outside of the {{multitrans}} block(s). Some examples: red, wolf, grass, flower, four, etc. Benwing2 (talk) 02:26, 9 January 2022 (UTC)[reply]
@Benwing2: Unfortunately the module isn't quite ready for entries because Module:templateparser doesn't parse piped links. I got a pretty confusing error message for {{t|ja|{{...}}から[[見る|見]]た|...}} about an invalid gender because the pipe in the link was parsed as a parameter separator. There might be no piped links in some of those translation sections, but someone could add one at any time. So User:Surjection or I will have to implement piped link parsing in Module:templateparser before we can test Module:User:Erutuon/multitrans further. — Eru·tuon 18:55, 9 January 2022 (UTC)[reply]
Fixed: Special:Diff/65254978 SURJECTION / T / C / L / 19:05, 9 January 2022 (UTC)
@Surjection: Thanks! @Benwing: I tried replacing {{multitrans}} with Module:User:Erutuon/multitrans in red (~30 MB -> ~26 MB in the English section) and wolf (~28 MB -> ~25 MB in the English section), grass (~30 MB -> ~27 MB over the whole page), flower (~29 MB -> ~25 MB in the English section), four (~29 MB -> ~28 MB). Pretty good results! I wasn't expecting it to be slightly better than the current {{multitrans}}. — Eru·tuon 20:33, 9 January 2022 (UTC)[reply]
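The piped-link problem described above comes down to splitting a template's contents on pipes only when they are not inside a link or a nested template. A minimal Python sketch of that rule follows. It is not the code in Module:templateparser; split_params is a hypothetical name, and the sketch ignores complications such as triple-brace parameters, nowiki tags and HTML comments.

```python
def split_params(inner):
    """Split the text between {{ and }} on top-level '|' characters only.

    Pipes inside [[...]] links or nested {{...}} templates are kept as part
    of the parameter they belong to.
    """
    parts, current = [], []
    link_depth = template_depth = 0
    i = 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair == "[[":
            link_depth += 1
        elif pair == "]]" and link_depth > 0:
            link_depth -= 1
        elif pair == "{{":
            template_depth += 1
        elif pair == "}}" and template_depth > 0:
            template_depth -= 1
        elif inner[i] == "|" and link_depth == 0 and template_depth == 0:
            # Top-level pipe: the current parameter ends here.
            parts.append("".join(current))
            current = []
            i += 1
            continue
        else:
            current.append(inner[i])
            i += 1
            continue
        # A two-character delimiter was consumed; keep it in the parameter.
        current.append(pair)
        i += 2
    parts.append("".join(current))
    return parts


if __name__ == "__main__":
    print(split_params("t|ja|から[[見る|見]]た"))
    print(split_params("t+|fr|{{l|fr|mot}}|m"))
```

With a naive split on every pipe, the first example would yield an extra parameter, which is consistent with the confusing "invalid gender" error reported above.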

Update on nowiki multitrans: it uses Module:translations/multi-nowiki and {{multitrans-nowiki}} now. The translation adder gadget apparently sort of works with nowiki multitrans, but it needs to be changed to not insert stuff like {{subst:#invoke:languages/templates|getByCode|csb|getCanonicalName}} (diff) into the nowiki tag. See also the discussion on my talk page. — Eru·tuon 22:57, 11 March 2022 (UTC)[reply]

Nupe templates

I've just started adding Nupe entries, but I wasn't sure how to do the headword-line templates (verb and noun) as there're a couple of considerations to make:

  • Having an optional plural parameter for nouns.
  • Having the tone marks of every lemma in the heading and not in the title, so tone marked links would lead to the untonemarked page. (The way it works for Yorùbá and Hausa)

@Metaknowledge - Would you be able to help? — This unsigned comment was added by Oníhùmọ̀ (talk • contribs) at 17:29, 5 January 2022 (UTC).

@Oníhùmọ̀: I'm excited to see some work on Nupe. I'll add plurals to {{nup-noun}} for you, but it looks like you already got {{nup-verb}}, and {{nup-pos}} to work on your own. What source are you using? I have Blench's dictionary, but that's it. By the way, you need to leave your signature in the same edit as a ping for it to actually work. —Μετάknowledgediscuss/deeds 07:06, 6 January 2022 (UTC)[reply]
Mi jin yèbo sánrányí (Thanks), I'm using Blench's dictionary too (as well as the one on plants), but unfortunately it doesn't conform to the proposed orthography conventions by I.S.G. Madugu. I also use a blog called edukonupe and I use audio as there're a few Nupe channels on YouTube and I live with a native speaker, so I'm able to confirm the tones. Then for grammar I mainly use Kandybowicz's work. Oníhùmọ̀ (talk) 09:38, 6 January 2022 (UTC)[reply]
@Oníhùmọ̀: Could you please add WT:About Nupe and detail the orthographic conventions that we should use? You can use WT:About Yoruba as a model, and let me know if you need anything. —Μετάknowledgediscuss/deeds 04:08, 7 January 2022 (UTC)[reply]
I've made WT:About Nupe now. There's a minor issue with linking, links with tone-marked syllabic nasals aren't leading to the right page, ǹná (mother) for example is leading to ǹna and not nna. Oníhùmọ̀ (talk) 00:30, 8 January 2022 (UTC)[reply]
@Oníhùmọ̀, Metaknowledge I tried to fix the handling of Nupe diacritics. It now unilaterally removes all acutes, graves, circumflexes, carons and macrons. Let me know if this is incorrect and I can go back to the old way of specifying per-character. Benwing2 (talk) 19:50, 8 January 2022 (UTC)[reply]
@Benwing2 @Metaknowledge, While we're at it, could we add these changes for Edo (bin), Igala (igl), & Nupe?
  • Edo (taken from the Ẹdo standard orthography and other sources):
    • Sort order: a, b, d, e, ẹ, f, g, gb, gh, h, i, k, kh, kp, l, m, mw, n, nw, ny, o, ọ, p, r, rh, rr, s, t, u, v, vb, w, y, z.
    • Diacritics: ◌́ (acute accent used for high tone), ◌̀ (grave accent used for low tone), ◌̄ (macron accent used for downstepped high tone), ◌̏ (double grave accent used for downstepped low tone)
  • Igala:
    • Sort order: a, b, ch, d, e, ẹ, f, g, gb, gw, h, i, j, k, kp, kw, l, m, n, ny, ñ, ñm, ñw, o, ọ, p, r, t, u, w, y
    • Diacritics: ◌́ (acute accent used for high tone), ◌̀ (grave accent used for low tone), ◌̄ (macron accent sometimes used for mid-tone or mid-high tone), ◌̇ (dot above used on ṅ to show an extra-high tone), ◌̍ (vertical line above used on n̍ as an alternative way of spelling ṅ as suggested by the standard orthography)
  • Nupe:
    • Sort order: a, b, c, d, dz, e, f, g, gb, h, i, j, k, l, m, n, o, p, r, s, sh, t, ts, u, v, w, y, z, zh.
For other Nigerian languages, I'll be back later with a more comprehensive list once I get through my backlog. Thank you! AG202 (talk) 03:30, 15 January 2022 (UTC)[reply]
@AG202 By sort order do you actually mean that e.g. for Edo, the order should be ga ... gz, gb, gh? Similarly that ẹa sorts after ez? For diacritics you mean that these should be stripped when creating page names? Benwing2 (talk) 03:53, 15 January 2022 (UTC)[reply]
@Benwing2 Re: sort order, yes, similar to how Yorùbá currently has ẹ̀bà is after ewé and agbábọ́ọ̀lù after Àgùàlà. Re: diacritics, yes as well, those are the diacritics that should be stripped from page names. Apologies for the confusion there. AG202 (talk) 04:18, 15 January 2022 (UTC)[reply]
@AG202 I made changes for those three languages above and pinged you on the changes. When you have a chance, please verify that they work correctly. Thanks! Benwing2 (talk) 04:55, 15 January 2022 (UTC)[reply]
@Benwing2 It all looks good! Thank you so so much once again! AG202 (talk) 05:17, 15 January 2022 (UTC)[reply]
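For readers wondering what "gb sorts after gz" means in practice, here is a minimal JavaScript sketch of a sort key built from the Edo alphabet listed above. It is not the code Benwing2 deployed. The function names are hypothetical, and it assumes that digraphs are always matched greedily (longest letter first), which may not hold for every word.

```javascript
// Edo alphabet in the order given above; digraphs count as single letters.
const EDO_ALPHABET = [
  "a", "b", "d", "e", "ẹ", "f", "g", "gb", "gh", "h", "i", "k", "kh", "kp",
  "l", "m", "mw", "n", "nw", "ny", "o", "ọ", "p", "r", "rh", "rr", "s", "t",
  "u", "v", "vb", "w", "y", "z",
].map((letter) => letter.normalize("NFC"));

// Tone marks listed for Edo: grave, acute, macron, double grave.
const EDO_TONES = /[\u0300\u0301\u0304\u030F]/g;

// Hypothetical helper: map a word to a list of alphabet positions.
function sortKey(word, alphabet = EDO_ALPHABET) {
  const text = word.toLowerCase().normalize("NFD").replace(EDO_TONES, "").normalize("NFC");
  const longestFirst = [...alphabet].sort((a, b) => b.length - a.length);
  const key = [];
  let i = 0;
  while (i < text.length) {
    const letter = longestFirst.find((l) => text.startsWith(l, i));
    if (letter) {
      key.push(alphabet.indexOf(letter));
      i += letter.length;
    } else {
      // Characters outside the alphabet sort after every known letter.
      key.push(alphabet.length + text.charCodeAt(i));
      i += 1;
    }
  }
  return key;
}

// Compare two words position by position; shorter keys win ties.
function compareWords(a, b) {
  const ka = sortKey(a);
  const kb = sortKey(b);
  for (let i = 0; i < Math.min(ka.length, kb.length); i++) {
    if (ka[i] !== kb[i]) return ka[i] - kb[i];
  }
  return ka.length - kb.length;
}

console.log(["gba", "gza", "ga"].sort(compareWords)); // order: ga, gza, gba
```

The strings "gba" and "gza" are made-up inputs chosen to show the ordering, not real Edo words.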

Making Template:quote-book and Template:quote-hansard compatible with older LCCNs

Both Template:quote-book and Template:quote-hansard have parameters set out to accept LCCNs which are used to create permalinks of the style https://lccn.loc.gov/##########, but what an LCCN is has changed over time. Currently it is defined to mean "Library of Congress Control Number", but it has previously been used to mean "Library of Congress Catalogue Card Number". Under the catalogue card scheme there was a wider variety of styles for LCCNs, with assigned numbers including things like 99-1 and gm 71-2450. In general, LCCNs in the catalogue card scheme included hyphens, sometimes letter prefixes, and were of variable length, though they always had fewer than eight digits. When the switch to newer control numbers happened, many were assigned new, standardized LCCNs in the Library of Congress' database. The newly assigned control numbers were formed by replacing the hyphen with the number of zeros necessary to bring the total number of digits to eight. Working with the aforementioned examples, 99-1 was standardized as 99000001 and gm 71-2450 as gm 71002450 (note that the space is removed when creating the permalink). Would it be possible to modify the templates so that when provided with older catalogue card numbers the templates use the standardization process to generate valid permalinks? Thanks to Library of Congress reference librarian Elizabeth L. Brown and this page on numbers found in LC catalog records for helping me understand the standardization scheme. Thanks for any help and take care. —The Editor's Apprentice (talk) 23:10, 6 January 2022 (UTC)[reply]

@The Editor's Apprentice This can be done but I need an exact description of how to standardize older LCCN's. Benwing2 (talk) 01:52, 7 January 2022 (UTC)[reply]
@Benwing2: I am unsure how much more exact I can get, but I'll try. I'll focus on how the inputted older LCCNs should be transformed so that a valid permalink can be made. To begin, an older LCCN can have up to three parts, one optional and two mandatory. By optional I mean "does not exist in all older LCCNs" and by mandatory I mean "exists in all older LCCNs". I'm not sure if there is better terminology for those ideas. The first part is an optional prefix of one or two letters. The second part is required, is the first of two sets of digits, and is always two digits long. The third part is required, is the second of the two sets of digits, and is one to six digits long. If there is a letter prefix, it is separated from the rest of the LCCN by a space. The first set of digits of the LCCN is separated from the second set of digits by a hyphen. Using the example of "99-1", this old LCCN has no prefix, its first set of digits is "99" and its second "set" is "1". For "gm 71-2450", the prefix is "gm", the first set of digits is "71", and the second set of digits is "2450". To create a valid permalink, start by removing any spaces that might be in the supplied LCCN. Next, check if there is a hyphen; if there is, then the LCCN is older. If there is no hyphen, the LCCN is newer: simply prepend https://lccn.loc.gov/, then you're done. Given that the LCCN is older, count the number of digits in the given LCCN. Next, replace the hyphen in the LCCN with a number of zeros equal to 8 minus the number of digits already in the LCCN. Next, prepend https://lccn.loc.gov/. The result should be a valid permalink. Using "99-1" as an example, no spaces are removed, a hyphen is found confirming it's old, and three digits already exist. The hyphen is then replaced with five zeroes resulting in "99000001" and then the URL part is prepended resulting in the valid permalink https://lccn.loc.gov/99000001 .
Using "gm 71-2450" as an example, a space is removed resulting in "gm71-2450", a hyphen is found confirming it's old, and six digits already exist. The hyphen is then replaced with two zeroes resulting in "gm71002450" and then the URL part is prepended resulting in the valid permalink https://lccn.loc.gov/gm71002450 . Hope this helps. —The Editor's Apprentice (talk) 03:07, 7 January 2022 (UTC)[reply]
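The steps above can be sketched in JavaScript as follows. This is a rough illustration of the described procedure, not the code used in the quotation templates. The function name `lccnPermalink` is hypothetical, and the guard against inputs that already have eight or more digits is an added assumption.

```javascript
// Hypothetical helper following the described steps:
// remove spaces, detect a hyphen, pad with zeros to eight digits, prepend the URL.
function lccnPermalink(lccn) {
  const base = "https://lccn.loc.gov/";
  const compact = lccn.replace(/\s+/g, ""); // "gm 71-2450" becomes "gm71-2450"
  if (!compact.includes("-")) {
    return base + compact; // no hyphen: treat as a newer control number
  }
  const digitCount = (compact.match(/\d/g) || []).length;
  // Assumption: never pad with a negative count if the input is malformed.
  const padding = "0".repeat(Math.max(0, 8 - digitCount));
  return base + compact.replace("-", padding);
}

console.log(lccnPermalink("99-1"));       // https://lccn.loc.gov/99000001
console.log(lccnPermalink("gm 71-2450")); // https://lccn.loc.gov/gm71002450
```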
@The Editor's Apprentice Try it now. Benwing2 (talk) 04:58, 7 January 2022 (UTC)[reply]
It has created the correct permalinks in all of my tests. The one change that I would make is having the display for the link still be the LCCN in the old format as it was inputted. Doing so just generally feels more honest to me, might give the reader a bit more information, and might be useful in case the Library of Congress ever changes the format again. Thanks for the good work. —The Editor's Apprentice (talk) 05:30, 7 January 2022 (UTC)[reply]
@The Editor's Apprentice Done. Benwing2 (talk) 06:59, 7 January 2022 (UTC)[reply]

This page (only) is generating a RDBMS error including when accessing any diffs in its history. Maybe temporary? -- GreenC (talk) 04:08, 7 January 2022 (UTC)[reply]

"A database query error: [9a1fa30c-f477-4f10-a7b2-a7a961442a01] 2022-01-07 04:09:16: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"
See [2]. Now causing database errors due to the use of DynamicPageList on the documentation page ([3]). DTLHS (talk) 04:15, 7 January 2022 (UTC)[reply]

Google Books quotation template generator

Hello,

The following script, when added to Greasemonkey on Firefox, will add a button on Google Books Search results page for easy quotation: [4]. You should still check that the info provided is valid; notably, it won't pick up chapter-specific titles and authors in books where that's relevant, and I've sometimes noticed Google Books provides incorrect page numbers (but usually they are right).

I haven't tested it with Tampermonkey on Chrome, but replacing GM.xmlHttpRequest with GM_xmlhttpRequest may help.
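One way to avoid editing the script by hand for each browser is to pick whichever request function is present at run time. The sketch below is only an illustration and is not taken from the posted script. The helper name `pickRequestFunction` is hypothetical, and a real userscript would presumably also need matching `@grant` lines for both names.

```javascript
// Hypothetical helper: return whichever cross-origin request function exists.
// `scope` is the object holding the userscript API (the global scope in a real script).
function pickRequestFunction(scope) {
  if (scope.GM && typeof scope.GM.xmlHttpRequest === "function") {
    return scope.GM.xmlHttpRequest.bind(scope.GM); // promise-style GM.* API
  }
  if (typeof scope.GM_xmlhttpRequest === "function") {
    return scope.GM_xmlhttpRequest; // older GM_* style API
  }
  return null; // neither API is available
}
```

In a userscript this might be called as `pickRequestFunction(globalThis)`, though whether the API objects are visible on the global object depends on the script manager.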

If any of you use Firefox with Greasemonkey and want to test it, feel free to report back with your opinions. 70.172.194.25 07:45, 7 January 2022 (UTC)[reply]

Screenshot to entice you: [5]. 70.172.194.25 07:49, 7 January 2022 (UTC)[reply]
Can confirm it works with Tampermonkey on Chrome after making the change I mentioned. 70.172.194.25 15:04, 7 January 2022 (UTC)[reply]
Thanks, 70.172.194.25. I tried it on Chrome, and it works quite well. It will certainly be quite useful. —Svārtava [tcur] 04:22, 8 January 2022 (UTC)[reply]
Works great thus far, thank you so much for this. This has always been my least favorite part about Wiktionary but this will now change. — Fytcha T | L | C 04:29, 8 January 2022 (UTC)[reply]
I'm glad it was useful. Let me know if you have any suggestions and I can try to implement them. :)
Might spend a day this week making one for Google Scholar since that's also in frequent usage.
Minor thing, I noticed that it might be a good idea to have this @include line in addition to the one there:
// @include https://www.google.com/search?tbm=bks&*
That's because some of the templates around here, at least {{googles}}, use links that start with ?tbm=bks& instead of having &tbm=bks& in the middle like normal (although you can also just press the search button again in such cases). 70.172.194.25 04:34, 8 January 2022 (UTC)[reply]
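The two URL shapes differ only in where tbm=bks sits in the query string. As an illustration of a check that does not depend on parameter order, here is a small JavaScript sketch. The function `isBooksSearch` is hypothetical and is not part of the posted script, which relies on @include patterns instead.

```javascript
// Hypothetical helper: true when the URL is a Google search with tbm=bks,
// wherever that parameter appears in the query string.
function isBooksSearch(href) {
  const url = new URL(href);
  return (
    url.hostname === "www.google.com" &&
    url.pathname === "/search" &&
    url.searchParams.get("tbm") === "bks"
  );
}

console.log(isBooksSearch("https://www.google.com/search?tbm=bks&q=example"));       // true
console.log(isBooksSearch("https://www.google.com/search?q=example&tbm=bks&hl=en")); // true
```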
It doesn't work here (no buttons): [6] I was unfortunately unable to debug the problem. — Fytcha T | L | C 04:37, 8 January 2022 (UTC)[reply]
That is exactly the problem I mentioned in my last post. Adding the extra @include line should help. :) 70.172.194.25 04:38, 8 January 2022 (UTC)[reply]
That fixed it. Thanks a lot again! — Fytcha T | L | C 04:41, 8 January 2022 (UTC)[reply]
Would it be possible to also add such a button to the in-book view? Even if querying and adding the passage automatically is not possible, the other parameters would still be worth it. Be aware that there are two different kinds of in-book views: Full preview and minimal preview. — Fytcha T | L | C 16:53, 9 January 2022 (UTC)[reply]
Minor bug report: The button does nothing here for the result "Top banana - 3 Dec 1951 - Page 75". — Fytcha T | L | C 16:57, 9 January 2022 (UTC)[reply]
The bug report is easily dealt with. New code, now with ISSN detection (you have to do the minor change for Chrome again): [7]. I'll respond to the feature request later. 70.172.194.25 18:19, 9 January 2022 (UTC)[reply]
Major update

I have added more features, including an ambitious interpretation of the one Fytcha requested above: it should work in search result mode, page preview/reader mode, and book information screen mode, on both the old and new versions of Google Books. Additional features include (inconsistent, based on Google's database) OCLC/LCCN detection, volume and issue numbers, and series titles. The code is here: [8]. This same code should work on both Firefox and Chrome. Because the code is more complex now, there may be bugs/undesirable behavior. Feel free to report anything. 70.172.194.25 04:54, 10 January 2022 (UTC)[reply]

Minor bugfix to the above. Sorry, I had to fix it since I noticed it so quickly (page numbers were not working for in-book view on the new version of Google Books). [9]. 70.172.194.25 06:01, 10 January 2022 (UTC)[reply]
Works great! Thank you again so much for this! — Fytcha T | L | C 01:48, 11 January 2022 (UTC)[reply]

Google Scholar version

For Greasemonkey on Firefox: [10]. For Tampermonkey on Chrome: [11].

You have to click the normal "Cite" button, which now will pop up with the Wiktionary quotation format on top. Similar caveats apply as above. I also added a button to run a (couple second) longer check to see if a journal article is on CrossRef, which sometimes (but not always) gives the DOI and language, although you can also obtain that information elsewhere or omit it if deemed not worth it.

Let me know what you think. 70.172.194.25 10:47, 9 January 2022 (UTC)[reply]

Wow, nice, it works great. Thanks a million! Also, I think you should definitely create an account. —Svārtava [tcur] 13:04, 9 January 2022 (UTC)[reply]

Archive.org book version

Link: https://pastebin.com/TrUyug8X

Let me know if there are bugs. 70.172.194.25 20:44, 5 February 2022 (UTC)[reply]

Google Groups version

Link: https://pastebin.com/LuUCAukE

Again, let me know if there are bugs. It also includes an option to hide all non-Usenet results from searches. 70.172.194.25 01:19, 6 February 2022 (UTC)[reply]

putting back no entries

It wouldn't let me revert calculusless to "no entry" because it stripped l3s. General Vicinity (talk) 00:44, 10 January 2022 (UTC)[reply]

Save the Date: Coolest Tool Award 2021: this Friday, 17:00 UTC

Hello all,

The ceremony of the 2021 Wikimedia Coolest Tool Award will take place virtually on Friday 14 January 2022, 17:00 UTC.

This award is highlighting software tools that have been nominated by contributors to the Wikimedia projects. The ceremony will be a nice moment to show appreciation to our tool developers and maybe discover new tools!

Read more about the livestream and the discussion channels.

Thanks for joining! andre (talk) -08:02, 6 January 2022 (UTC)[reply]

Feasibility of a bot to line up columns in translations with the trans-mid template?

I just edited welcome because the list of translations for the interjections included far more lines in one column than another. It seems trivial to me as someone who couldn't make a bot to save his life to make a bot that would insert {{trans-mid}} in the middle of the list of translations, plus or minus a few entries. Is this something that someone can do and feels like is worth doing? Thanks. —Justin (koavf)TCM 09:42, 12 January 2022 (UTC)[reply]

@Koavf I'm pretty sure there used to be a bot that did exactly that, maybe it was run by User:Ruakh? Benwing2 (talk) 03:36, 13 January 2022 (UTC)[reply]
@Koavf I found it. See e.g. [12] on fraught, by User:NadandoBot, run by User:DTLHS. Benwing2 (talk) 07:00, 13 January 2022 (UTC)[reply]
Thanks! @DTLHS:, looks like it doesn't run anymore. Can it do this task? —Justin (koavf)TCM 07:05, 13 January 2022 (UTC)[reply]
I'm not interested in running it at this time. DTLHS (talk) 16:24, 13 January 2022 (UTC)[reply]
Me, either. It's just too hard to decide where to put the {{trans-mid}}, partly because of difficulty translating the wikitext into dimensions (different characters take up different amounts of space, to say nothing of templates and so on), and partly because line-wrapping depends on screen size. I prefer bot tasks where I can feel reasonably confident that they're making an improvement. Hopefully CSS support for balancing columns will become widely-enough available someday that we can get rid of {{trans-mid}} and have the browser handle this for us. (Though even in the meantime, maybe we can use some JS to achieve it? Not sure.) —RuakhTALK 02:32, 15 January 2022 (UTC)[reply]
@Ruakh what is the problem with current browser functionality re column balancing? According to caniuse.com, 97.9% of desktop users can reap the benefits of the column-fill CSS property, but in my tests, this isn't even needed - just applying column-count: 2 does the trick. What am I missing? This, that and the other (talk) 06:03, 15 January 2022 (UTC)[reply]
You may not be missing anything. I had Googled `css balanced columns` and found https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Columns/Spanning_Columns, which gave me the impression that it wasn't well-supported. But I suppose that page might be old, or too conservative, or something. —RuakhTALK 06:56, 15 January 2022 (UTC)[reply]
I made a mockup with translation tables divided into columns using pure CSS at User:This, that and the other/subject - what do you all think? This uses pure CSS to achieve the column effect, with the added bonus that it adapts the number of columns to what can comfortably be displayed on the user's screen. Most users will continue to see 2, while mobile users will see 1 and users on very large screens will see 3 columns. (The translation <table> element now only has one cell and is not needed, but I preserved it in the mockup so the translation-adder gadget didn't break.) This, that and the other (talk) 08:08, 15 January 2022 (UTC)[reply]
Hmm, I'm seeing only one column in Firefox 95 (even if I zoom out so much that there's room for six to eight columns). It works well in Chrome, though. If we can get it working as well for the other major browsers, it will definitely be an improvement over the status quo. :-) —RuakhTALK 09:07, 15 January 2022 (UTC)[reply]
I might have fixed it, but I would need an administrator to change the content model of "User:This,_that_and_the_other/subject/styles.css" to "Sanitized CSS" using Special:ChangeContentModel.... This, that and the other (talk) 09:29, 15 January 2022 (UTC)[reply]
Sorry, no dice: the system won't let me change the content model of a page in your user-space. Furthermore, when I tried copying the CSS to a page in my user-space and changing its content model to 'Sanitized CSS', it wouldn't let me do so because the hyphen-prefixed CSS properties aren't recognized.
So, I think there are two options here:
  1. If you're pretty confident that this is the desired CSS (like, there won't be several more rounds of testing and updates), then I can add it to MediaWiki:Common.css, just with the selector changed to .this-that-and-the-other .translations ul so that you can use it from your page without affecting existing pages until we're ready.
  2. For purposes of your own testing, you can use the HTML syntax for an unordered list instead of the wikitext syntax, and then put the CSS inside <ul style="...">. (I've just tested, and these properties are let through in that context. Dunno why 'Sanitized CSS' is so strict if the same properties are allowed in inline CSS, but whatever.)
RuakhTALK 11:13, 16 January 2022 (UTC)[reply]
Ah, you probably need to be an interface-admin to change the content model of someone else's user pages. I didn't think of that.
The problem with the raw <ul> tag is that MediaWiki automatically creates a new <ul> element to wrap the * list items. Using <li> tags instead of * would probably prevent this, but I'd rather not muck around with all the nested lists to change them to use <li> tags - that also runs the risk of the mockup diverging from real-world practice. The CSS is pretty simple and I don't anticipate any issues, so I'd be grateful if you could put the code in common.css so the mockup page can be tested by more users. Thanks @Ruakh! This, that and the other (talk) 11:38, 16 January 2022 (UTC)[reply]
Done; see MediaWiki:Common.css?diff=65358919 for details. —RuakhTALK 19:56, 16 January 2022 (UTC)[reply]

Plopping this down here because the thread is getting long. It looks like column-width has been supported for almost 6 years by all the major browsers (the latecomer being Firefox and Firefox for Android on 2016-11-15), and the vendor-prefixed versions have been supported for even longer. That probably falls within our compatibility obligations (mw:Compatibility#Browsers), and we've already been using the similar column-count properties in list templates like {{col3}} anyway. (For some reason the un-vendor-prefixed column-count was supported later — 2017-03-07 in Firefox — but again there were vendor-prefixed versions available much earlier.)

column-width might be better than column-count because it's less likely to squeeze the words in the columns if the viewport is narrow. But we should check how column-width treats translation terms that are too long; maybe we want to somehow select a column width that's wider than the longest translation? Might require translation box classes with several different widths where you can choose one and put it in {{trans-top}} or {{trans-top-see}} or {{checktrans-top}}. — Eru·tuon 21:59, 16 January 2022 (UTC)[reply]

I think column-width makes more sense than column-count for this, and indeed for all our columnar templates. I've never understood why we have {{col2}}, {{col3}} and {{col4}} as separate templates used without rhyme or reason - it would make far more sense (to me) to have a single {{col-box}} that sets a column width and, in doing so, adapts to the space available on the user's screen.
For the translations table at User:This, that and the other/subject I picked a column width of 35em, which seems adequate for typical translation tables. If translation lines are long, they will wrap - that is the current behaviour, and there is no risk of ambiguity thanks to the bullet at the start of every translation line. Moreover, bear in mind that 35em is a minimum. Hardly any users will ever see a column that is exactly 35em wide.
The only place I can think of that might require a wider column width is phrasebook entries, perhaps via a {{trans-top-phrasebook}} template. This, that and the other (talk) 00:27, 17 January 2022 (UTC)[reply]
@Ruakh, @Erutuon: I found this discussion from 2019 where more or less the same conclusion was reached. Rather than wait another 3 years, why don't we go ahead and get it done. Here's the plan:
If there are issues, the changes can be fixed or reverted. Later, we can set about getting rid of the useless one-cell table (and updating the translation adder gadget accordingly). What do you think of this plan? This, that and the other (talk) 09:17, 25 January 2022 (UTC)[reply]
I agree in principle — and thank you for doing this — but it will probably be a week or so before I have a chance to take a deep enough look to feel comfortable proceeding with it. (Of course, if someone else with the requisite permissions has already taken a deep look and is comfortable proceeding, (s)he should feel free to do so.) —RuakhTALK 09:04, 26 January 2022 (UTC)[reply]
Actually, wait a sec, I see that User:This, that and the other/subject still shows only one column in Firefox. Are you sure this is quite ready? —RuakhTALK 09:06, 26 January 2022 (UTC)[reply]
@Ruakh It works fine for me in Firefox versions 52 and 96 on Windows. Have you done a Ctrl+F5 to refresh the CSS? Also if you have a narrow screen you'll only see one column instead of two cramped ones - maybe try zooming out to see the proper effect. This, that and the other (talk) 10:27, 26 January 2022 (UTC)[reply]
I don't consider my laptop screen "narrow" — my browser viewport is about 10.8" wide (and about 1825px), and when there's only one column I get a lot of empty space — but I can confirm that if I zoom out a bit then I do get two columns now. So I guess it's not a technical issue, just a disagreement about how wide the columns need to be. (Or, something a bit more subtle. In Chrome I obviously have the same browser viewport size, but I do get two columns. It looks like the real issue is that in Firefox I've set up CSS to make the text a bit bigger so it's easier to read, and since the column-widths are specified in 'em' that effectively makes the viewport more narrow. But either way, the cutoff seems too high to me.) —RuakhTALK 00:20, 28 January 2022 (UTC)[reply]
I also don't see two columns normally (also in Firefox 96), but it's apparently because of the width of the content area, because when I go into the developer tools and change the column width to 10em, there are columns. [Edit: Yeah, the content area is 60em wide; when I widen it to 70em, the translations show with columns.] — Eru·tuon 22:24, 26 January 2022 (UTC)[reply]
@Ruakh, @Erutuon: yes I see that now, when I turn off "Use Legacy Vector" in my preferences. Let's make the column width 30em instead, then. This, that and the other (talk) 00:53, 28 January 2022 (UTC)[reply]
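As background for the width discussion above: when only column-width is set, the CSS multi-column specification derives the number of columns from the available width, the requested column width and the column gap. The JavaScript sketch below reproduces that calculation. The function name is hypothetical, the 1em gap is an assumed default, and because em units depend on the font size of the element being measured, the numbers need not match what anyone saw in their developer tools.

```javascript
// Hypothetical helper mirroring the multi-column rule for column-width with
// an automatic column count: N = max(1, floor((available + gap) / (width + gap))).
// All arguments are in the same unit (em here).
function columnsThatFit(availableWidth, columnWidth, columnGap = 1) {
  return Math.max(1, Math.floor((availableWidth + columnGap) / (columnWidth + columnGap)));
}

console.log(columnsThatFit(60, 35)); // 1
console.log(columnsThatFit(72, 35)); // 2
console.log(columnsThatFit(61, 30)); // 2
```

This is why column-width acts as a minimum: the browser fits as many columns of at least that width as it can and then stretches them to fill the row.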

Translation column width implementation

Input needed
This discussion needs further input in order to be successfully closed. Please take a look!

@Ruakh, @Erutuon: I'm still keen to push this into use.

The implementation required is:

It's entirely possible there will be complaints or bugs from certain quarters, but I am confident they will be minor. Any chance that one of you could have a go at this? This, that and the other (talk) 05:33, 25 February 2022 (UTC)[reply]

Also ping @Fytcha who is more active. This, that and the other (talk) 05:36, 25 February 2022 (UTC)[reply]
@This, that and the other: I'm not an interface admin so I can't change MediaWiki:Common.css and neither do I know my way around more advanced CSS. I think I'm not of any help here, sorry! — Fytcha T | L | C 08:24, 25 February 2022 (UTC)[reply]
I'm sorry; I definitely support the concept, but I haven't had a chance to validate the details and apply the change, and I definitely won't be able to do so for at least the next week and a half. :-/   Hopefully someone else will handle it during that time, but if not, I'll take a look when I can. —RuakhTALK 08:42, 25 February 2022 (UTC)[reply]
Thanks both for your quick input, and sorry to bother you. I'm keen to get this done before it scrolls off the GP page - I'll ping Benwing2 for another pair of eyes. This, that and the other (talk) 00:51, 26 February 2022 (UTC)[reply]
Two questions: does the CSS work with the old versions of the translation box templates? Does the same column width work well for all translation boxes?
Templates take longer to update than CSS so if the CSS causes the old versions of the translation boxes to behave weirdly, people will see it for a while, possibly for a week or even a month depending on the whims of the servers.
Also curious if there are translation boxes with particularly long or short words so that we will want wider or narrower columns. 30em is pretty wide, so there probably aren't many words longer. I have a translation database (used by the enwikt-translations website) so I can look for the longest word myself. — Eru·tuon 20:02, 26 February 2022 (UTC)[reply]
@Erutuon Thanks for taking a look at this. Good point about CSS cross-compatibility. But in this case it's actually not a big deal; while new-CSS with old-template could technically result in people seeing 4 columns of translations, this would only apply to people with extremely wide screens, who will end up seeing 4 columns of translations with new-CSS + new-template anyway! I just tested it out and it does indeed work this way.
As for column width, I was mainly wanting to mimic existing behaviour where the columns are quite wide. It could definitely be narrower, but perhaps that can be adjusted once it is in use? This, that and the other (talk) 21:56, 26 February 2022 (UTC)[reply]
Okay, here are the longest words in translation terms. The longest ones are on Tau­ma­ta­wha­ka­ta­ngi­ha­nga­koauauo­ta­ma­tea­tu­ri­pu­ka­ka­pi­ki­mau­nga­ho­ro­nu­ku­po­kai­whe­nua­ki­ta­na­ta­hu. The very longest translation there is 100 letters and it wants 52em width on that website at least. There isn't a way to set arbitrary column widths in the translation template, so we'll need to select a default width and then some smaller and larger widths, and add a class for each to a stylesheet that's loaded when the translation box templates are used, and add a parameter that selects a non-default class. The minimum width needs to be at least 12em, because that's about the length of the longest language name, Communicationssprache. (That language doesn't have any translations, but most of almost-as-long language names seem to be Australian languages, like Pitjantjatjara and Pallanganmiddang, some of which do have a good number of translations.) I don't know if language names and translation terms contain the longest words in translation sections, but they're the easiest for me to check. — Eru·tuon 02:28, 27 February 2022 (UTC)[reply]
I'd suggest that extremely long translations are very rare, but nonetheless, perhaps we need a {{trans-top-wide}} that specifies 50em or 55em column width. How many entries have translations containing single words with more than, say, 45 characters? This, that and the other (talk) 03:01, 27 February 2022 (UTC)[reply]
In the list I linked, it's just translations of Tau­ma­ta­wha­ka­ta­ngi­ha­nga­koauauo­ta­ma­tea­tu­ri­pu­ka­ka­pi­ki­mau­nga­ho­ro­nu­ku­po­kai­whe­nua­ki­ta­na­ta­hu and pneumono­ultra­microscopic­silico­volcano­coniosis that are longer than 45 characters. Some of the scripts in that list end up being longer than 30em, even with fewer characters, like Japanese and Korean, but they're all on those two pages. 30em ends up being quite wide enough for normal words.
I'd like it if there were narrower columns for basic concepts like water ("clear liquid H₂O") that have lots of languages and mostly short words. There's a lot of empty horizontal space there at the moment with approximately 30em columns, with the vector skin at least.
Maybe a separate template if we only want two widths, but a parameter that only accepts a few values would be more extensible. Like {{trans-top|...|column-width=wide}} for those two long words, {{trans-top|id=Q283|clear liquid H₂O|column-width=narrow}} for water/translations, otherwise a default. And the parameter could make the template add classes that would set particular column widths. The parameter would error for unexpected things without classes like |column-width=insert vandalism here. If someone didn't like how narrow the H₂O columns were, they could use their personal CSS to widen the narrow column class, or get rid of columns altogether. Narrow columns aren't crucial, so they could be figured out. 30em seems a reasonable width to start with. — Eru·tuon 08:28, 27 February 2022 (UTC)[reply]
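The parameter handling proposed above could look roughly like this. The real template would be written in wikitext or Lua, so this JavaScript version is only a sketch of the logic, and the class names are hypothetical placeholders, not classes that exist in any stylesheet.

```javascript
// Hypothetical mapping from |column-width= values to CSS class names.
const COLUMN_WIDTH_CLASSES = new Map([
  ["narrow", "translations-narrow"],
  ["wide", "translations-wide"],
]);
const DEFAULT_COLUMN_CLASS = "translations-default"; // hypothetical default class

// Return the class for a parameter value; reject anything unexpected.
function columnWidthClass(value) {
  if (value === undefined || value === "") {
    return DEFAULT_COLUMN_CLASS; // parameter omitted: use the default width
  }
  if (COLUMN_WIDTH_CLASSES.has(value)) {
    return COLUMN_WIDTH_CLASSES.get(value);
  }
  throw new Error("Unrecognized column-width value: " + value);
}

console.log(columnWidthClass("wide")); // translations-wide
console.log(columnWidthClass());       // translations-default
```

Restricting the parameter to a fixed set of values keeps the widths in one stylesheet, so users could still override a class in their personal CSS.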
@Erutuon I made the necessary change to User:This, that and the other/trans-top-css-columns so that it is now ready to copy over to {{trans-top}}. Obviously the corresponding CSS classes would need to be added, and I endorse your proposal for the widths. This, that and the other (talk) 10:43, 2 March 2022 (UTC)[reply]
@This, that and the other I know this is from a while ago, but have you looked at the CSS grid property? Specifically, grid-auto-flow? That might be a more elegant way to do this that obviates the need for different templates if there's a particularly long translation. Theknightwho (talk) 13:55, 28 April 2022 (UTC)[reply]

falloir

fallant is given as present participle, but does not exist — This unsigned comment was added by JohnWheater (talkcontribs) at 08:08, 13 January 2022 (UTC).[reply]

@JohnWheater fallant is rare and perhaps obsolete (indeed, Molière used it) but it does seem to exist. To dispute the existence of a word at Wiktionary, you may follow the process outlined at Wiktionary:Requests for verification/Non-English. This, that and the other (talk) 08:51, 13 January 2022 (UTC)[reply]

Category or some such for Language translations

It'd be nice to have a page where I could check English words with a given language's translations. People add some weird translations that are incorrect and it's hard to check all of them; if they were centralized it would be easier. It could be like xlanguage with translations. Vininn126 (talk) 18:06, 13 January 2022 (UTC)[reply]

Addendum: perhaps this could be a full-blown search engine, but that might require much more work. Users could look up content in glosses, or by gender, aspect, etc. (really anything in the translation box), or for transliterations etc., if that sort of thing is possible. Vininn126 (talk) 18:23, 13 January 2022 (UTC)[reply]
For the time being, there's this: [13] — Fytcha T | L | C 18:25, 13 January 2022 (UTC)[reply]
@Fytcha Translation subpages (created when the sheer number of translations causes "out of memory" errors) aren't in Category:English lemmas. Aside from that, it's very useful, and I now have the version without "incategory" bookmarked. So far I've found someone who was using {{t}} in etymology sections, and some other problems I wouldn't have found otherwise. Chuck Entz (talk) 03:58, 15 January 2022 (UTC)[reply]
Thanks! Vininn126 (talk) 20:25, 14 January 2022 (UTC)[reply]
Several people have requested a translation search engine and it seems kind of fun, so I'm working on a translation database to start with. At this point it can extract the glosses from {{trans-top}} and the translation information from {{t}} and {{t+}} and the rest. Hopefully someday I'll actually make a Toolforge site that uses it. — Eru·tuon 03:35, 15 January 2022 (UTC)[reply]
Forgot to mention. The translation searching website (enwikt-translations) is up now. — Eru·tuon 08:30, 27 February 2022 (UTC)[reply]
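As a rough illustration of the extraction step described above (glosses from {{trans-top}}, entries from {{t}} and {{t+}}), here is a minimal Python sketch. It is not the code behind enwikt-translations; the function name `extract_translations`, the regular expressions and the output shape are assumptions made for this example, and nested templates are not handled.

```python
import re

# Matches {{trans-top|gloss}}; the gloss is treated as optional in this sketch.
TRANS_TOP = re.compile(r"\{\{trans-top(?:\|([^{}|]*))?[^{}]*\}\}")
# Matches {{t|lang|term|...}} and {{t+|lang|term|...}} without nested templates.
T_ENTRY = re.compile(r"\{\{(t\+?)\|([^{}|]+)\|([^{}|]+)((?:\|[^{}|]*)*)\}\}")


def extract_translations(wikitext):
    """Return one dict per {{t}}/{{t+}} call, tagged with the current gloss."""
    rows = []
    gloss = None
    for line in wikitext.splitlines():
        top = TRANS_TOP.search(line)
        if top:
            # A new translation box starts; remember its gloss.
            gloss = top.group(1) or None
            continue
        for m in T_ENTRY.finditer(line):
            # Remaining parameters (gender, tr=, etc.) are kept as raw strings.
            extras = [p for p in m.group(4).split("|") if p]
            rows.append({
                "gloss": gloss,
                "lang": m.group(2),
                "term": m.group(3),
                "extras": extras,
            })
    return rows
```

Filtering the returned rows by `lang` would give the per-language list asked for at the start of this thread.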

I tried to add a quotation but it would not let me save. 70.172.194.25 05:17, 14 January 2022 (UTC)[reply]

For this reason (among a ton of other reasons) you should consider creating an account. —Svārtava [tcur] 05:33, 14 January 2022 (UTC)[reply]
Sorry for the inconvenience. The abuse filter entry is Special:AbuseLog/1254533; an unfortunate false positive. Autoconfirmed users are exempt from this filter. — Fytcha T | L | C 11:47, 14 January 2022 (UTC)[reply]
Could you add it to the page? 70.172.194.25 16:32, 14 January 2022 (UTC)[reply]
Why don't you create an account? It's very easy and has a lot of advantages for regular contributors. —Svārtava [tcur] 16:38, 14 January 2022 (UTC)[reply]
Done. — Fytcha T | L | C 17:37, 14 January 2022 (UTC)[reply]

Escaping spaces in url= in {{quote-journal}}

I tried both with a space and with %20. Anyone know the right way to do this? [14] 70.172.194.25 20:49, 15 January 2022 (UTC)[reply]

Never mind, there was another space I forgot to escape. I'm not sure why the template can't just do this automatically, though. 70.172.194.25 20:51, 15 January 2022 (UTC)[reply]
What is the issue exactly? Do you want it to automatically URL-encode the param in |url=? I think that would break existing URLs in this param, although it could potentially be hacked to do this only for spaces. OTOH it could be argued that the value of |url= should be a valid URL already. Benwing2 (talk) 04:30, 17 January 2022 (UTC)
It should at least emit an error if there is a non-trailing/leading space, instead of silently failing. The current behavior is to use the first part before the space as the URL and the rest as the link text, as in [https://example.org/Some/Unescaped Path/Here] => Path/Here. 70.172.194.25 05:02, 17 January 2022 (UTC)[reply]
I implemented this; hopefully it won't break anything. Benwing2 (talk) 06:18, 17 January 2022 (UTC)[reply]
Cool, much appreciated! Perhaps you could add a tracking category to find such errors (and see if it broke anything). 70.172.194.25 06:19, 17 January 2022 (UTC)[reply]
No need for a tracking category, they show up in CAT:E. I already fixed a few of them. Benwing2 (talk) 06:33, 17 January 2022 (UTC)[reply]
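The failure mode and the two remedies discussed in this thread can be sketched in Python. This is only an illustration, not the change that was made to the quotation module; the function names are hypothetical.

```python
def split_like_external_link(url):
    """Mimic how [url text] is read: text after the first space becomes link text."""
    target, _, text = url.partition(" ")
    return target, text


def check_url(url):
    """Return the trimmed URL, or raise if a space remains inside it."""
    trimmed = url.strip()
    if " " in trimmed:
        raise ValueError(f"unescaped space in URL: {trimmed!r}")
    return trimmed


def escape_spaces(url):
    """Escape only spaces, so existing %-escapes are not double-encoded."""
    return url.strip().replace(" ", "%20")
```

`check_url` corresponds to the "emit an error" option, and `escape_spaces` to the narrower "do this only for spaces" option mentioned earlier in the thread.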

Flamenco tag

Hi. I tagged falseta as being a flamenco term, yet it doesn't show up in Category:es:Flamenco. Can someone please add flamenco as a valid tag? Br00pVain (talk) 14:20, 17 January 2022 (UTC)[reply]

@Br00pVain Try now. Benwing2 (talk) 18:54, 17 January 2022 (UTC)[reply]

I've made this change which can already be seen to be working in Urlaubssemester. Unfortunately, I didn't quite figure out how to make use of the preexisting infrastructure in Module:de-noun#L-45; could someone help me do that? Also, it would be really nice if we could use the same parameters for {{de-noun}} and {{de-decl-noun-f}} (etc.) such that editors can just copy down the headword template, switch the template name and everything works. I believe some bot intervention will be necessary for that as the modules have different defaults, see the plural parameters in Urlaubssemester. — Fytcha T | L | C 21:07, 17 January 2022 (UTC)[reply]

@Benwing2: Thanks for adding this to your list. To expand slightly: I think backward-compatibility has to go out the window for this one unfortunately. The templates just do fundamentally different things, most notably with empty arguments: The headword tries to do clever guesswork based on the gender with no plural given whereas the declension template just adds nothing. Additionally, the headword supports "-" for no plural but this was deprecated for the declension template (for some reason). If I could make a wishlist, I'd want: 1. that the two templates' arguments be identical for most cases 2. that an empty argument default to adding nothing; I don't want to repeat the page name to denote null affixation all the time. This combination maximizes predictability (I never rely on the default forms; I just write them myself; saves a preview) and minimizes typing. It also reduces the chance of errors greatly as the arguments of the two templates can be compared by just looking at the code (and copying always produces the right result; not the case currently). — Fytcha T | L | C 05:22, 23 January 2022 (UTC)[reply]
@Fytcha I completely agree that the templates should take the same params. This is how e.g. the Latin and Russian noun headword and declension templates work as well. I'm not sure at this point what the right param format would be, but it's highly likely I'll create a new unified declension template, something like {{de-ndecl}}, to replace all the existing templates in Category:German noun inflection-table templates. This template will probably work something like the existing {{la-ndecl}} or {{uk-ndecl}} templates (and for that matter like the existing {{de-conj}} template), where you specify the declension using <...> after the noun or adjective in question. That allows you to handle arbitrary multiword combinations, instead of having specialized templates for every combination like adjective+feminine-noun etc. So for example you might write something like (for schwarzes Loch)
{{de-ndecl|schwarzes<+> Loch<N.gens:es.pl^er>}}

where in this case, N means "neuter", gens:es means the genitive takes either -s or -es and pl^er means the plural takes -er with umlaut. This might alternatively be written something like

{{de-ndecl|schwarzes<+> Loch<N.gen:Lochs:Loches.pl:Löcher>}}

Here, the colon directly following the case specifiers gen and pl means what follows is a full word rather than an ending.

Note, this is off the top of my head, based on how I handled the manifold complexities of Ukrainian noun declension. I'd need to study the range of possibilities for German before settling on the final syntax. Whatever syntax gets adopted for {{de-ndecl}}, the same syntax will be used for {{de-noun}} so you can just copy from one to the other. Benwing2 (talk) 06:05, 23 January 2022 (UTC)[reply]

@Benwing2: Sounds great thus far but I think we must distinguish between (e)s and es genitive words. Some words like Beschluss only have the -es genitive, though it may be predictable (with only those ending in -s, -ß, -z and perhaps -x not allowing the -s genitive (this is however of course a phonological feature so the written forms of the words could be deceptive in some edge cases)). I hope however that for the simple Loch, making it possible to just write (e)s|^er instead of Loch<N.gens:es.pl^er> is also on your agenda. Also, maybe the adjective should be passed in the lemma form; makes stuff much easier and prevents some edge cases where the code lemmatizes incorrectly. Thank you so much again for helping with this! — Fytcha T | L | C 15:09, 23 January 2022 (UTC)[reply]
@Fytcha Can you clarify the various proposals you described above with examples? I'm afraid I don't completely understand what you mean e.g. by "we must distinguish between (e)s and es genitive words". As for something like (e)s|^er, I can't use the pipe symbol so easily in the format I'm proposing, but what I can do is (a) make the angle brackets optional in the case that the lemma being qualified is the same as the pagetitle, (b) support parenthesized (e)s as a shortcut for s,es, and (c) simplify the format so you don't have to explicitly specify gen and pl, something like this for Loch:
{{de-ndecl|N.(e)s,^er}}
which is equivalent to
{{de-ndecl|Loch<N.(e)s,^er>}}
That is similar to what you've proposed, except you need to specify the gender (I'm not sure how I can infer this, unless it's safe to assume plurals in -er are neuter by default), and I've used a comma in place of the pipe symbol. Benwing2 (talk) 18:08, 23 January 2022 (UTC)[reply]
@Benwing2 The gender is best not inferred to be honest (it's one of those features where natives would probably always enter the gender anyway just to be sure because it takes no time, whereas non-natives might over-rely on the automatic deduction instead of consulting a dictionary). It's probably also easier to leave it lower-case (less keystrokes, less chance of a typo).
What I said regarding the genitive was targeted at this statement of yours: "gens:es means the genitive takes either -s or -es". I think it's probably better to have s, es and (e)s as we do currently. It sounded like you wanted to make -es emit either ["-s","-es"] or ["-es"] (based on the word's ending). Hope this clears it up.
I strongly agree with (a)-(c) and I think the templates should be made such that by default the users don't have to specify the arguments' names. I assume however that using , between genitive and plural in your last example was a mistake and that you wanted to use . again, right? For a difficult case (such as Mann), it would probably ideally look like {{de-noun|m.(e)s.^er,,en.^chen,^lein.f=^in}}, but I think it should optionally be possible to specify the full forms too (helps for reading and c-F'ing the source): {{de-noun|m.(e)s.Männer,Mann,Mannen.^chen,^lein.f=Männin}}
Pinging also (Notifying Matthias Buchmeier, -sche, Atitarev, Jberkel, Mahagaja, Fay Freak): to have more input. — Fytcha T | L | C 18:58, 23 January 2022 (UTC)[reply]
Also, if we're revamping the headword template completely anyway, I think having a syntax for "usually uncountable" and perhaps "unattested plural" and "countable and uncountable" (though I was never a fan of the latter; better to have the labels show this) would be helpful, though there is currently a lingering debate spread on several discussion pages about whether "countable" and "permitting a pluralized form" are the same thing. See for instance Umdreherin: The plural form is unattested but it's the only sensible plural form that can exist and there's no reason why the word couldn't be countable. Not sure what to do with something like that. If this is easily added later on then just forget it for the time being. — Fytcha T | L | C 19:11, 23 January 2022 (UTC)[reply]
@Benwing2 What does the + mean in schwarzes<+>? To indicate the umlaut, the most obvious character to use would be ¨, but that might be tricky to type for most people. And if we have named parameters, do they have to be part of the format string? For example, could it be {{de-noun|m.(e)s.^er,,en.^chen,^lein|f=^in}}? I'm a bit worried that the syntax will get overly complex and deter people from making edits. – Jberkel 20:00, 23 January 2022 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Jberkel, Fytcha The + means it's an adjective. This is the same syntax used in {{uk-ndecl}}, {{be-ndecl}}, {{hi-ndecl}} and several other places. As for |f=, I agree it should be a separate named parameter, probably same with |dim=. This is also consistent with how {{ru-noun+}}, {{uk-noun}} and others work. So I would recommend something like this for Mann:

{{de-noun|m,(e)s,^er::en|dim=^chen,^lein|f=^in}}

The general idea is that a dot (period) separates what I call "indicators":

  • Gender + genitive + plural together constitute an indicator (with commas separating gender, genitive and plural, and colons separating variants for genitive or plural). The reason for combining them like this is to make it possible to omit the genitive and/or plural when it makes sense to do so, as with weak nouns.
  • plonly is a possible indicator, specifying a plural-only noun.
  • sgonly is a possible indicator, specifying a singular-only noun. (If these nouns are common enough, we could use a shorter code, such as sg.)
  • + is a possible indicator, specifying an adjectivally-declined noun.
  • weak is a possible indicator, specifying a weak noun, which has -n in all but the nominative singular (or the nom/acc singular for neuter nouns) if the lemma ends in an -e, otherwise -en.
  • Individual case overrides can be possible indicators, such as datpl:Foon if there is a need to override the dative plural individually (or datpl:Foon:Fooen to indicate two possible variants for the dative plural).

In general, indicators can come in any order, but the gender + genitive/plural must be first. So for Präsident, we might have:

{{de-noun|m.weak|f=~in}}

Here, with the weak indicator, there is no need to give the genitive or plural (although they could be supplied if desired). The ~ means to substitute the lemma in place of the ~. Similarly, with Name:

{{de-noun|m,ns.weak}}

Here, the plural defaults to -n, as with all weak nouns, but the genitive is irregular, so we need to specify it. For Abgeordneter, we might have:

{{de-noun|m.+|f=+}}

This is an adjectivally declined noun. + as an indicator specifies adjectival declension, and + in place of the value of a param such as |f=, |m= or |dim= requests the "default" value; in this case, Abgeordnete. An alternative could be something like |f=~~e, where ~~ means to substitute the stem of the lemma, minus any inflectional ending (in this case, Abgeordnet). Benwing2 (talk) 21:21, 23 January 2022 (UTC)[reply]
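To make the proposed format concrete, here is a small Python sketch of how such a specification might be split into its parts. It follows only the description above (dots between indicators; commas between gender, genitive and plural; colons between variants) and is not the module's actual parser. The function name `parse_spec` and the output layout are assumptions, and full-word forms containing a dot are not handled.

```python
def parse_spec(spec):
    """Split a spec such as 'm,(e)s,^er::en' or 'm,ns.weak' into its parts."""
    indicators = spec.split(".")
    # The first indicator bundles gender, genitive and plural.
    first = indicators[0].split(",")
    result = {
        "gender": first[0],
        # Colons separate variants; an empty variant means "no ending".
        "genitive": first[1].split(":") if len(first) > 1 else None,
        "plural": first[2].split(":") if len(first) > 2 else None,
        "flags": [],       # e.g. weak, plonly, sgonly, +
        "overrides": {},   # e.g. datpl -> ["Foon", "Fooen"]
    }
    for indicator in indicators[1:]:
        if ":" in indicator:
            # Case override: slot name followed by one or more full forms.
            slot, *forms = indicator.split(":")
            result["overrides"][slot] = forms
        else:
            result["flags"].append(indicator)
    return result
```

Under this reading, the Mann example yields the plural variants `["^er", "", "en"]`, the empty string standing for the unchanged plural Mann.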

I was pinged and though I don't have time to parse the technical side of the discussion right now, I had just recently been thinking these templates could stand to be made smarter in various ways, including that (similar to how {{en-verb}} was recently made smarter) it could stop defaulting to adding another e when guessing the plural of something that ends in e (e.g. for Zischägge its guess is Zischäggeen, wrong). I'm a little concerned by how technical some of the proposed inputs above look, but I suppose it's not much less human-reading than some of our existing input formats and it's just a matter of getting used to them. - -sche (discuss) 07:02, 24 January 2022 (UTC)[reply]
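The rule behind the Zischägge complaint can be stated in a couple of lines of Python: add -n after a final -e, and -en otherwise. This is only a sketch of that one rule, with a hypothetical function name; the real default-plural logic has more cases.

```python
def add_en(lemma):
    """Append -n after a final -e, otherwise -en."""
    return lemma + ("n" if lemma.endswith("e") else "en")
```

With this rule the guess for Zischägge becomes Zischäggen rather than Zischäggeen.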
@-sche I think it's maybe more technical-looking than it actually is. I will give a think to how to make it clearer. Benwing2 (talk) 07:34, 24 January 2022 (UTC)[reply]
@Benwing2 I am glad that the German headwords are getting improved. I have also missed a few technical details here, but I'm sure they can be understood from real examples. Admittedly, the wikicode may be difficult for everyone to follow, and I haven't followed each step as closely as I did with, for example, the Bulgarian, Ukrainian and Belarusian headwords, so I can't yet appreciate all the benefits. It might be worth supplying a test page with different cases, as you did with the above languages and others, to sell it better :) - only if you feel it will help you get more people to support this, of course. --Anatoli T. (обсудить/вклад) 08:35, 29 January 2022 (UTC)
@Fytcha See User:Benwing2/test-de-noun and User:Benwing2/test-de-ndecl. It is very close. The main missing functionality is the support for adjectival nouns like Erwachsener and adjective-noun combinations like schwarzes Loch, because the adjective module isn't yet written. Benwing2 (talk) 05:01, 5 February 2022 (UTC)[reply]
(Notifying Matthias Buchmeier, -sche, Atitarev, Jberkel, Mahagaja, Fay Freak, Fytcha): I am writing a script to convert German headwords and declensions to the new format. I have a question, though. The current {{de-noun}} and {{de-decl-noun-m}}/{{de-decl-noun-n}} support the notation (e)s for the genitive being either -es or -s, and display them in that order. Similarly, the notation (s) for the genitive displays as -s or no ending, in that order, and (for {{de-noun}} only) the notation (es) for the genitive displays as -es or no ending, in that order. These codes are widely used in the declension templates but not the headword templates, because support for them was only recently added to {{de-noun}}. Instead, {{de-noun}} entries list both genitives explicitly in either order, with no apparent rhyme or reason as to which one goes first. Should we preserve this order as is, or is it OK to convert to the (e)s / (s) / (es) codes and standardize the order? My instinct is to standardize the order, as I think the existing order is probably random and doesn't actually indicate anything in most cases; but I want to check with some others. Benwing2 (talk) 01:42, 6 February 2022 (UTC)
@Benwing2: I'd say for most (e)s words, -es is more common but from a quick glance at ngram there do seem to be exceptions: Wörterbuchs>Wörterbuches. I would be fine with standardizing the order in the case of (e)s as I assume that nobody checks ngram before adding the genitive forms anyway. The same argument applies for the two other cases (es)/(s) but there I'm less sure which genitive form is more common, probably the marked one as opposed to the unmarked one. — Fytcha T | L | C 07:28, 6 February 2022 (UTC)[reply]
Regarding order of genitive (e)s, it seems to depend on the context/sentence rhythm etc. Duden, dewikt list es first, then s, so maybe we could just follow that. –Jberkel 18:24, 6 February 2022 (UTC)[reply]
@Benwing2: Looks great from what I can see so far. I've added two failing tests at the bottom of the former page: Umlauting a double vowel should just give one (Boot ^chen -> Bötchen). Only occurs with oo and aa from what I can tell. — Fytcha T | L | C 07:14, 6 February 2022 (UTC)[reply]
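The double-vowel rule from the failing tests can be sketched in Python as follows. This is an illustration only: the choice to umlaut the last a/o/u/au in the lemma is a simplifying assumption, not a description of the module, and the names `umlaut` and `apply_ending` are hypothetical.

```python
import re

_UMLAUT = {"au": "äu", "aa": "ä", "oo": "ö", "a": "ä", "o": "ö", "u": "ü"}
# Digraphs come first so that "au", "aa" and "oo" are matched as units;
# a "u" that belongs to "eu" or "äu" is skipped.
_VOWEL = re.compile(r"au|aa|oo|a|o|(?<![eä])u", re.IGNORECASE)


def umlaut(stem):
    """Umlaut the last umlautable vowel; 'aa' and 'oo' collapse to one letter."""
    matches = list(_VOWEL.finditer(stem))
    if not matches:
        raise ValueError(f"no vowel to umlaut in {stem!r}")
    last = matches[-1]
    replacement = _UMLAUT[last.group().lower()]
    if last.group()[0].isupper():
        # Preserve capitalisation, e.g. Apfel -> Äpfel.
        replacement = replacement[0].upper() + replacement[1:]
    return stem[:last.start()] + replacement + stem[last.end():]


def apply_ending(lemma, ending):
    """Apply an ending; a leading ^ requests umlaut of the stem."""
    if ending.startswith("^"):
        return umlaut(lemma) + ending[1:]
    return lemma + ending
```

For example, `apply_ending("Boot", "^chen")` gives Bötchen, and `apply_ending("Loch", "^er")` gives Löcher.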
@Benwing2: Is it possible to have deprecated support for the old {{de-noun}} syntax? This has the benefit that we could keep using {{de-noun}} instead of having to create {{de-noun+}} for the new syntax (I really, really don't like the extra plus) without sacrificing readability of old revisions. — Fytcha T | L | C 07:31, 6 February 2022 (UTC)[reply]
@Fytcha Hmm. I was thinking of renaming the existing {{de-noun}} to {{de-noun-old}}, installing the new stuff under {{de-noun}}, and converting the old to the new. I spent much of today writing a conversion script; out of 44,188 entries, it was able to convert all but around 2,000. (These remaining entries are either adjectival nouns or have errors of various sorts, e.g. mismatches between the headword and declension that can't automatically be worked around. The adjectival nouns, about 500 or so, will get done once I implement that support, but the others will have to be done by hand.) In terms of supporting the old format without renaming, that should be mostly possible because the old format usually has |2= or |3=, which won't be present in the new format. There are occasional cases like Abbreviation that don't use |2= or |3= currently, but for the most part they don't need any parameter changes. There are very occasional cases where this doesn't work, like Alternativmedizin, where the old template form uses {{de-noun|f}} and the new one uses {{de-noun|f,en}} (the default plural here according to the new algorithm would wrongly be #Alternativmedizinnen). There are a very few other cases like Aristokratin and Abdampfrate where the current template uses {{de-noun|f}}, but wrongly so, as an incorrect plural is generated; in these cases, the new default plural algorithm is correct. If I were to maintain this sort of compatibility, it would add a certain extra burden of code maintenance, so I'd like to rip out the stuff you recently added that supports endings like en or (e)s rather than keeping it around forever; I imagine there are few current entries using it, so the compatibility gain in terms of making old revision histories legible will be minimal. Benwing2 (talk) 07:51, 6 February 2022 (UTC)
So the way to do this conversion is approximately as follows:
  1. Add a tracking category for any existing entries that use {{de-noun}} without |2=, |3= or |4= or |old=. This will gradually fill up, but if anyone creates a new such entry or modifies an existing entry, it is guaranteed to get added to the tracking category.
  2. Add |old=1 to all existing uses of {{de-noun}} that don't use |2=, |3= or |4=.
  3. Install the new code, which conditionalizes on the presence of |2=, |3=, |4= and |old=, calling the old code if any of these params are present, otherwise calling the new code.
  4. Run the conversion script to convert as much as possible to the new format, removing |old= whenever an entry is converted.
  5. Run the conversion script on the tracking category. (Its purpose is to catch anyone modifying entries during step 2.)
  6. Remove the tracking category code.
  7. (Gradually) fix the remaining errors.
I have done this type of process before for English headword templates and it works well.
Benwing2 (talk) 08:03, 6 February 2022 (UTC)[reply]
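Steps 1 and 3 above come down to one test on the parameters of each call. A minimal Python sketch of that test follows, with the template parameters modelled as a dict keyed by parameter name; the function names and the dict representation are assumptions for illustration, not the module's code.

```python
# Parameters whose presence marks a call as using the old syntax.
OLD_MARKERS = ("2", "3", "4", "old")


def uses_old_syntax(params):
    """True if any of |2=, |3=, |4= or |old= is present."""
    return any(key in params for key in OLD_MARKERS)


def needs_tracking(params):
    """Step 1: calls with none of the markers go into the tracking category."""
    return not uses_old_syntax(params)


def dispatch(params, old_impl, new_impl):
    """Step 3: route the call to the old or the new implementation."""
    return old_impl(params) if uses_old_syntax(params) else new_impl(params)
```

After step 2 has tagged the untagged calls with |old=1, anything that still lacks all four markers can be assumed to be in the new format.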
@Benwing2: That would be fine with me too. Anything but having to type an extra char for life :) A small point regarding your conversion script: I've noticed in the past that there are some articles where either the headword or the declension template has only one of two possible genitives/plurals (particularly in the case of (e)s). In the simple cases (e.g. where one of the two has -s and the other one has -es/-s), it is fine to merge them, i.e. to use the union of the forms for the new template. Hopefully this brings down the number some more. — Fytcha T | L | C 08:07, 6 February 2022 (UTC)
@Fytcha Thanks. I have been allowing this sort of mismatch in one direction: If there are extra genitives or plurals in the headword, they take precedence, but if there are extra in the declension, or some other sort of mismatch, a warning is output and nothing is done. This is because there are very often errors in one or the other. Benwing2 (talk) 08:28, 6 February 2022 (UTC)[reply]
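The one-directional rule just described can be sketched in Python like this. It is an illustration of the stated policy only, not the conversion script; the function name `reconcile` and the return convention are assumptions.

```python
def reconcile(headword_forms, declension_forms):
    """Return (forms, warning): headword forms win if they cover the declension's.

    Extra forms in the headword are accepted; any form found only in the
    declension table produces a warning and no conversion.
    """
    head = list(headword_forms)
    decl = list(declension_forms)
    if set(decl) <= set(head):
        return head, None
    return None, f"mismatch: headword {head} vs declension {decl}"
```

For instance, a headword with Loches and Lochs and a declension with only Lochs is accepted, while the reverse situation is flagged for manual review.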
@Benwing2: You can maybe also make use of the fact that compounds always inflect identically to, and have the same gender as, their last component. In case of a conflict, a compound's parts are more likely to be correct, I'd say, because shorter words are more common, and more common words are looked at more often and have a lower chance of being incorrect. — Fytcha T | L | C 20:51, 6 February 2022 (UTC)
@Fytcha I have pushed the code and I am converting as much as is possible to the new format. Documentation is present for {{de-noun}} and {{de-ndecl}}. There are still 2,051 warnings, here: User:Benwing2/convert-de-noun-warnings. This was reduced from about 4,000 originally (out of about 44,000 nouns), but I've hit the point of diminishing returns in trying to further handle them automatically, except for adjectival nouns and adjective-noun combinations, which I still need to finish the support for. If you want to help fix some of them, that would be much appreciated. Benwing2 (talk) 04:34, 7 February 2022 (UTC)[reply]
@Benwing2: Started working on it, will fix more of them in a couple of days as I'm currently a bit busy. Many seem to be nominalized adjectives which I hope we can remove from the list once the templates support them because they make going through the list a lot more cumbersome. A last point: For words where there's only one etymology but two disagreeing genders with the same inflection parameters (like Bezirk), can we please tie-break the correct gender by bot using de.wikt as a reference? — Fytcha T | L | C 10:58, 7 February 2022 (UTC)[reply]
@Benwing2: Can we please get accelerated creation support for diminutives in the headword again? — Fytcha T | L | C 11:05, 7 February 2022 (UTC)[reply]

pl-pronunciation


Recently we've been updating our templates from {{pl-IPA}} to {{pl-pronunciation}} manually. Would it be possible to get a bot to do this? If a page has pl-IPA at all it's safe to replace with pl-p. Second, we've been having a problem, discussed at Module talk:pl-IPA, with nki and ngi in final position, and my Lua skills are too poor to understand what the problem is. Third, I was pestering @Derbeth on his talk page about whether we could get a bot to automatically add audio from Commons to pages without audio. I dunno if anyone else would be able to help. If not, I can be patient there. Finally, would it be possible for me to get a list of Polish pages WITHOUT IPA, so that I could clean that up? Vininn126 (talk) 16:39, 18 January 2022 (UTC)

The list of Polish lemmas without IPA information can be found here: [15] If you want to get those that have IPA but not one of the automated templates, simply remove the minus sign in front of the last "insource".
I can do the bot thing if you give me a couple of weeks, I wanted to submit a bot account for some German cleanup soon anyway. — Fytcha T | L | C 17:00, 18 January 2022 (UTC)[reply]
Awesome, thanks a ton. Vininn126 (talk) 17:31, 18 January 2022 (UTC)[reply]
@Vininn126 Just saw this. If it's a simple matter of replacing {{pl-IPA}} with {{pl-p}}, that can easily be done by bot. However I'd like to hear from others, e.g. User:Surjection, if there are additional complexities. When I created {{it-pr}} I bot-converted all uses of {{it-IPA}} to use {{it-pr}}, but it was more than just replacing the template; I had to incorporate existing uses of {{rhymes}}, {{hyph}} and {{audio}} into the template and verify that there weren't inconsistencies between the manually-specified rhymes/hyphenation and the new auto-generated ones. Benwing2 (talk) 04:22, 22 January 2022 (UTC)[reply]
@Vininn126 has corresponded with me suggesting that the new pronunciation template should always generate the correct hyphenation and rhymes so that no extra checking should be necessary, but I'm not entirely sure (nor was he as far as I could tell) if that applies to all entries. Audio templates would need to be integrated, though. — SURJECTION / T / C / L / 10:21, 22 January 2022 (UTC)[reply]
@Benwing2 @Surjection I wrote on Fytcha's userpage explaining what things would need to be done, namely the bot should be able to check for respellings, homonyms, and audio, but NOT rhymes, as before we implemented {{pl-p}}, we were using a different method of notation for that, and using a bot would actually increase consistency. As for hyphenations - they should be covered by any respellings incorporated in the {{pl-IPA}} as it is. Vininn126 (talk) 11:43, 22 January 2022 (UTC)
@Vininn126 I read the text on Fytcha's userpage and what you wrote above and I'm a bit confused. Can you clarify, ideally by pointing to some existing pages, what "The bot should be able to include respellings, but should probably disclude hyphenations and rhymes." means? For example, if the bot encounters a Pronunciation section that includes both a call to {{pl-IPA}} and a call to {{hyph}} or {{rhyme}}, what should it do? Convert the {{pl-IPA}} and leave the others alone, convert and summarily delete the other template calls, convert and delete the other template calls after verifying that the auto-generated and manually specified hyphenation and/or rhymes match, ... ? Presumably the verifying step is not needed based on what User:Surjection said, but otherwise I'm not sure of the correct action. What if there are multiple calls to {{pl-IPA}}, or a call to {{pl-IPA}} with a qualifier of some sort added manually (using {{a}}, {{i}}, {{q}}, etc. or just some random text without such a template)? Thanks for any specifics you can give. Benwing2 (talk) 23:15, 22 January 2022 (UTC)
@Benwing2 What pages have multiple pl-IPA? I'd honestly be surprised if that were a thing. And if it were to encounter pl-ipa with a respelling in it, it should include that, if there's also hyph and rhymes, then yes, it should delete those. There will be some pages, like matematyka with two spellings, and one of them will have a q2=, that should easily be able to be added. Vininn126 (talk) 23:22, 22 January 2022 (UTC)[reply]
@Vininn126 These are hypothetical situations that I am guessing might occur. I don't know anything much about {{pl-IPA}} or {{pl-p}}, and in general, bot requests should be specific about what needs to be done, as the bot code needs to be able to do something on any page it visits (even if that something is to just flag the page as unfixable). From experience, pretty much anything you can imagine that might occur will occur somewhere. Benwing2 (talk) 23:26, 22 January 2022 (UTC)[reply]
Also, if you can give me more examples like matematyka that contain complicated situations (which can include pages you already converted manually), it would make my life a lot easier. Benwing2 (talk) 23:28, 22 January 2022 (UTC)[reply]
Fair enough. So, with that, I'll provide a few more details on this. Most pages will have just pl-ipa and nothing more; those can simply be replaced with pl-p. Some will have audio, which should be absorbed into pl-p as a=. Exceedingly few will have two audios, or perhaps more. Those should accordingly be added through a1=, a2=, and labelled either Audio 1/Audio 2 etc., or with whatever other qualifier was added. Some will have a respelling in the IPA template, and some might have two with one or more qualifiers, each having q1 or q2; those should be absorbed as well, with the same parameters. If there are existing hyphenation and rhymes, those should be deleted. If one or more homophones are listed, they should be absorbed with hh=, separated by commas. Vininn126 (talk) 23:32, 22 January 2022 (UTC)
P.S. in reference to the two audios, adaptować is an example of a page (already converted, but) with two audios and separate text. A page like Riedel might have a respelling, which should be incorporated. Vininn126 (talk) 23:36, 22 January 2022 (UTC)[reply]
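The absorption rules listed above can be summarised as a small Python sketch that assembles a {{pl-p}} call from already-extracted pieces. It is not the bot's code; the function name `build_pl_p` and its arguments are hypothetical, the parameter names (a=, a1=, a2=, q1=, q2=, hh=) are taken from the description in this thread, and audio labels such as "Audio 1" are left out.

```python
def build_pl_p(respellings=(), audios=(), qualifiers=None, homophones=()):
    """Assemble a {{pl-p}} call; hyphenation and rhymes are deliberately dropped."""
    parts = ["pl-p"]
    # Respellings are carried over as positional parameters.
    parts.extend(respellings)
    # Qualifiers are keyed by the index of the respelling they belong to.
    for index, text in sorted((qualifiers or {}).items()):
        parts.append(f"q{index}={text}")
    if len(audios) == 1:
        parts.append(f"a={audios[0]}")
    else:
        # Two or more audio files are numbered a1=, a2=, ...
        for index, audio in enumerate(audios, start=1):
            parts.append(f"a{index}={audio}")
    if homophones:
        parts.append("hh=" + ",".join(homophones))
    return "{{" + "|".join(parts) + "}}"
```

A page with only {{pl-IPA}} becomes a bare `{{pl-p}}`; parsing the existing Pronunciation section into these pieces is the harder part and is not shown here.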

──────────────────────────────────────────────────────────────────────────────────────────────────── @Vininn126 I wrote a script to convert {{pl-IPA}} to {{pl-p}}. It went fairly fast as I already have similar scripts for Spanish and Italian. I ran it on all 66,591 pages that use {{pl-IPA}}. It generated 109 warnings of various sorts. These include things like multiple occurrences of {{pl-IPA}}, extra text of various sorts, misformatted lines, etc. (This is why I asked you to give me specifics of how to handle these sorts of edge cases ...) See User:Benwing2/pl-IPA-warnings. My bot refused to change these pages; they'll have to be handled manually. Benwing2 (talk) 05:15, 24 January 2022 (UTC)[reply]

@Vininn126 BTW are you *sure* you want all rhyme and hyphenation templates unilaterally removed? For example, the page quiz has a hyphenation template {{hyph|pl|quiz}} and a rhyme template {{rhymes|pl|is}} (plus an audio template). My bot's replacement {{pl-p|kłiz|a=LL-Q809 (pol)-Olaf-quiz.wav}} results in neither hyphenation nor rhyme. Benwing2 (talk) 07:27, 24 January 2022 (UTC)[reply]
There are many other similar pages: meeting, aria, Georgia, Schadenfreude, etc. @Surjection How come {{pl-p}} isn't smart enough to generate at least rhymes for these pages? Given the respelling, you should always be able to generate the appropriate rhyme regardless of whether the respelling equals the pagename. This is how {{it-pr}} (which generates pronunciation, hyphenation and rhyme for Italian terms) works. Benwing2 (talk) 07:32, 24 January 2022 (UTC)[reply]
@Benwing2 I will go through these warnings - 109 is much fewer than I was expecting. Also here is why I want them removed - on the vast majority of pages with them we were adding them in a way that we don't want anymore. Could you generate me a list of pages with respellings? It will probably be also relatively small, and adding h= and r= manually shouldn't be too much of a hassle. Also, thanks for the fast turnaround! Vininn126 (talk) 08:46, 24 January 2022 (UTC)[reply]
@Benwing2 I went through all of the errors and fixed the ones that needed it. If you run it again, hopefully you should only find letters and prefixes, which need {{IPA|pl}}, and should be ignored by the bot. Incidentally, would it be possible to have a bot add {{pl-p}} to all the non-lemmas without it? It would have to check the lemma and see if there's a respelling, and if not, to add it, but if there is a respelling in the lemma, to skip it. Vininn126 (talk) 10:14, 24 January 2022 (UTC)
Thought: I suppose the hyphenation could be absorbed, but there's a chance it won't match up with the IPA. Vininn126 (talk) 10:57, 24 January 2022 (UTC)[reply]
@Vininn126 I will try to get to this tomorrow; today I was swamped due to RL work. What you are proposing for non-lemmas is something I already implemented in Russian, although the Russian code actually propagates the non-standard pronunciation to the non-lemmas, which definitely takes more work. First I will focus on converting the {{pl-IPA}} uses; getting a list of pages that have respellings is easy enough. Benwing2 (talk) 06:40, 25 January 2022 (UTC)
@Vininn126 See User:Benwing2/pl-IPA-has-respelling. WARNING: There are 2,947 such pages. Benwing2 (talk) 06:52, 25 January 2022 (UTC)[reply]
@Benwing2 Okay, no rush. Thanks for the list! Vininn126 (talk) 09:16, 25 January 2022 (UTC)[reply]
@Vininn126 See User:Benwing2/pl-IPA-has-respelling-along-with-rhyme-or-hyph. This contains only the pages that have both respelling and either a rhyme or hyphenation template. This is a much smaller list; only 297 pages. I am going to start converting {{pl-IPA}} but leave these particular pages alone. Benwing2 (talk) 03:50, 26 January 2022 (UTC)[reply]
@Benwing2 Thanks for the lists. Would it be possible to send the bot through both lists? The only thing I'll have to work on are the hyphenations of the respellings. If a page has both, the bot should be able to absorb them both, yeah? Vininn126 (talk) 12:01, 26 January 2022 (UTC)[reply]
@Vininn126 Done. Benwing2 (talk) 04:02, 28 January 2022 (UTC)[reply]
@Vininn126 FYI my script to convert {{pl-IPA}} to {{pl-p}} has finished. There are about 150 entries still using {{pl-IPA}}; almost all are single letters and prefixes/infixes, per your request to not convert these. (If you want these converted to {{pl-p}}, let me know.) You might want to check the pages that aren't single letters, prefixes or infixes that are using {{pl-IPA}} and fix them. Benwing2 (talk) 04:37, 28 January 2022 (UTC)[reply]
@Vininn126, Surjection BTW, something is broken in {{pl-p}} with pandemii. The generated pronunciation has three syllables but the syllabification has 4 syllables. Benwing2 (talk) 05:05, 28 January 2022 (UTC)[reply]
@Benwing2 Thanks for your time and work! It saved us a lot. Mazab IZW has been going through those pages with respellings. And it looks like Surjection already fixed it - we were having some issues with the syllabification recently. Vininn126 (talk) 11:23, 28 January 2022 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Vininn126, Surjection I am doing a run now to add {{pl-p}} to non-lemma forms where the corresponding lemma has {{pl-p}} without respelling. See User:Benwing2/pl-p-non-lemma-warnings for the warnings generated (1,786 of them). Most of them are due to lemmas with respellings. Note that there are only 141 distinct warnings issued, corresponding to 121 lemma pages with respellings, 10 lemma pages without {{pl-p}}, and 10 miscellaneous warnings of various sorts. One thing I notice is a large majority of the respellings are due to the syllabification algorithm not generating the right result by default. It seems to me many of these could be fixed by changing the algorithm. In particular I see a lot of respellings like a'graf.ka, na'ple.tek, prze.pro'wa.dzać, za'trud.niać. Most languages (including all the languages I've written syllabification algorithms for — Italian, Spanish, Portuguese, Latin, Russian, Ukrainian, Belarusian, etc.) group obstruent + l/r together by default when syllabifying; if Polish did the same it would very likely reduce the need for respellings. Other things I see a lot of are (a) spellings to keep s + obstruent together (o'skar.żać, po'sta.rzać) and (b) spellings to keep unlikely onset clusters apart, esp. C + k (pust.ka, rand.ka, trąb.ka, łącz.ka). Both are things that could be handled automatically. For (b), especially, onset clusters like tk, czk are possible but uncommon in Polish, and onset clusters like bk, dk are probably impossible. All of the issues I've mentioned are currently handled by at least the Spanish, Italian and Russian syllabification algorithms. It's true this might require occasional respellings in the opposite direction (i.e. respellings that aren't currently required), but overall it seems it should reduce the respellings, and you can use the template tracking mechanism to find such places (that's what I've done in the past esp. when changing the Russian pronunciation-generation algorithm). Benwing2 (talk) 23:09, 29 January 2022 (UTC)[reply]
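For illustration only, here is a rough sketch of the three defaults described above (obstruent + l/r stays together, s + obstruent stays together, C + k is split). It is written in Python rather than Lua, it is not the code of the actual {{pl-p}} module, and the grapheme lists and the fallback rule (first consonant of a cluster goes to the previous syllable) are simplifying assumptions.

```python
# Hypothetical sketch, not the real pl-p module: default syllable breaks
# for Polish spellings using the three heuristics discussed above.
VOWELS = set("aąeęioóuy")
MULTIGRAPHS = ("ch", "cz", "dz", "dź", "dż", "rz", "sz")  # assumed inventory
LIQUIDS = {"l", "r"}
OBSTRUENTS = {
    "p", "b", "t", "d", "k", "g", "f", "w", "s", "z", "c", "h",
    "ś", "ź", "ć", "ż", "ch", "cz", "dz", "dź", "dż", "rz", "sz",
}


def tokenize(word):
    """Split a spelling into graphemes, keeping digraphs such as 'rz' whole."""
    tokens, i = [], 0
    while i < len(word):
        if word[i:i + 2] in MULTIGRAPHS:
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def is_nucleus(tokens, i):
    """A vowel is a nucleus unless it is 'i' directly before another vowel."""
    if tokens[i] not in VOWELS:
        return False
    if tokens[i] == "i" and i + 1 < len(tokens) and tokens[i + 1] in VOWELS:
        return False  # 'i' here only marks palatalisation, as in 'niać'
    return True


def coda_size(cluster):
    """How many consonants of an intervocalic cluster stay in the coda."""
    if len(cluster) <= 1:
        return 0
    if len(cluster) == 2:
        first, second = cluster
        if first in OBSTRUENTS and second in LIQUIDS:
            return 0  # obstruent + l/r forms the onset together
        if first == "s" and second in OBSTRUENTS:
            return 0  # s + obstruent forms the onset together
    if cluster[-1] == "k":
        return len(cluster) - 1  # C + k: only 'k' starts the next syllable
    return 1  # assumed fallback: first consonant closes the syllable


def syllabify(word):
    tokens = tokenize(word)
    nuclei = [i for i in range(len(tokens)) if is_nucleus(tokens, i)]
    breaks = [
        prev + 1 + coda_size(tokens[prev + 1:nxt])
        for prev, nxt in zip(nuclei, nuclei[1:])
    ]
    pieces, start = [], 0
    for b in breaks:
        pieces.append("".join(tokens[start:b]))
        start = b
    pieces.append("".join(tokens[start:]))
    return ".".join(pieces)


print(syllabify("agrafka"))   # a.graf.ka
print(syllabify("oskarżać"))  # o.skar.żać
```

Stress marks are left out on purpose; the sketch only places syllable boundaries. Words with prefixes would still need separate handling.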

@Benwing2 Part of the problem is l/r but also prefixes in general. Whenever a word is prefixed, the prefix gets its own syllable, even when the start of the next word starts with a consonant cluster (containing a liquid or not). So adding liquid syllabification would fix some, but not all. Vininn126 (talk) 00:01, 30 January 2022 (UTC)[reply]
@Vininn126 Indeed. In the Russian pronunciation module we have a list of prefixes and try to do smart things with them, but the existence of both po- and pod-, na- and nad- etc. makes things difficult. But it still seems we could be smarter than we are now, including with certain prefixes (e.g. recognizing roz- as a common prefix). Benwing2 (talk) 00:11, 30 January 2022 (UTC)[reply]
@Benwing2 My knowledge of Lua isn't good enough to deal with that. However, adding the common Slavic prefixes ending with a vowel would be lovely (so z(e)-, roz(e)-, wy-, nad(e)-, od(e)-, po-, prze-, przy-, u-, w(e)-, za-). Vininn126 (talk) 00:14, 30 January 2022 (UTC)[reply]
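To make the po-/pod- difficulty concrete, here is a small Python sketch (not module code) that lists every prefix from the list above that a word could start with, longest first. The prefix tuple expands the bracketed variants and adds na- and pod- from the preceding comment; the function name is hypothetical.

```python
# Hypothetical sketch: list candidate prefixes for a Polish word.
# The tuple expands z(e)-, roz(e)- etc. and adds na-/pod- mentioned earlier.
PREFIXES = (
    "z", "ze", "roz", "roze", "wy", "nad", "nade", "od", "ode",
    "po", "prze", "przy", "u", "w", "we", "za", "na", "pod",
)


def candidate_prefixes(word):
    """Return all listed prefixes the word starts with, longest first."""
    matches = [p for p in PREFIXES if word.startswith(p) and len(word) > len(p)]
    return sorted(matches, key=len, reverse=True)


print(candidate_prefixes("podpisać"))      # ['pod', 'po']
print(candidate_prefixes("rozprowadzać"))  # ['roz']
```

A purely string-based match cannot tell which candidate is the real prefix (or whether the word is prefixed at all), which is why some respellings would probably still be needed.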

Request for Romagnol


Is there anyone who wants to help me with Module:rgn-pronunciation? If you have questions, write them below.--BandiniRaffaele2 (talk) 13:24, 20 January 2022 (UTC)[reply]

Why isn't the module working?--BandiniRaffaele2 (talk) 13:23, 22 January 2022 (UTC)[reply]

@BandiniRaffaele2 Hi Raffaele. Your module code has a lot of mistakes in it. There were 100 or so pages with errors in them that were caused by the module, so I disabled the module call in Template:rgn-IPA. Now, all pages that invoke {{rgn-IPA}} display a message like this:
This module is broken. User:BandiniRaffaele2 needs to fix it.
This isn't ideal but it's better than showing a Lua error. In general, you should not leave a module in a broken state like this. Benwing2 (talk) 23:19, 22 January 2022 (UTC)[reply]
If Raffaele isn't capable of writing correct module code, I don't see how this will help. Shouldn't the template simply be removed from those entries until it's able to produce correct output? —Μετάknowledgediscuss/deeds 04:58, 23 January 2022 (UTC)[reply]
@Metaknowledge Yes, ideally, although there are around 100 pages that Raffaele added the template to, so it's non-trivial to remove them all. My point is directed mostly at Raffaele: You shouldn't leave stuff half-finished with errors and expect others to clean up after you. Benwing2 (talk)

@Metaknowledge: I think that, since Romagnol pronunciation is akin to Italian, the module is almost ready now, being simply a reproduction of the Italian module. I only have to distinguish some IPA details (such as the fact that the "r" is mute in verbs ending in -êr, -ìr and -ér in Romagnol).-BandiniRaffaele2 (talk) 14:11, 24 January 2022 (UTC)[reply]

The plural is given as !, which means there's some wrong code in there. I'm assuming the editor meant plural unattested. What's the fix? Br00pVain (talk) 18:44, 21 January 2022 (UTC)[reply]

@Br00pVain: Maybe I've fixed it; thanks for the report!--BandiniRaffaele2 (talk) 18:49, 21 January 2022 (UTC)[reply]
@Br00pVain, BandiniRaffaele2, Fay Freak: I've changed it to plural unattested. It's a rare alt-form of a countable noun, so the plural not being attested is more a statistical phenomenon rather than a linguistic one. — Fytcha T | L | C 19:00, 21 January 2022 (UTC)[reply]
Surely, and as is well understandable to WF who created more English nouns than anyone else, I wanted to claim “countable and uncountable” and “plural unattested” at the same time, which is not illogical but in spite of a practice of people equating countability with the capability of having plural forms, which has been recently disputed, and the code probably has behaved differently then as it was just when @Benwing2 had revamped the code even explicitly stating in his commit message that an exclamation mark and a tilde can be combined. Fay Freak (talk) 00:31, 22 January 2022 (UTC)[reply]
@Metaknowledge, Fay Freak I see now that User:Metaknowledge undid my change to support both ! and ? together, at a time I was on a wikibreak, which probably broke the ! support. I'm still confused a bit why people are opposed to ? by itself displaying a message; if "plural unknown or uncertain" is wrong, some other text would be better than no text IMO, maybe "plural unspecified". For example, if we specify ? for the gender, it shows up as ? rather than without gender. Benwing2 (talk) 04:29, 22 January 2022 (UTC)[reply]
You added that text without asking what anyone thought, let alone getting consensus. I removed it with community consensus (discussion). I continue to oppose adding text of any kind, and I think that discussion reflects that many people oppose it. —Μετάknowledgediscuss/deeds 05:46, 22 January 2022 (UTC)[reply]
@Metaknowledge Fine, I will leave it alone and try to fix the current brokenness. I will note, however, that your comment sounds accusatory (to me at least), as if I was intentionally doing something controversial without seeking consensus. This was far from the case; I had no idea it was the least bit controversial, and it was just part of a more general cleanup (IIRC, someone asked me specifically to add simultaneous support for multiple qualifiers). IMO if leaving out a message corresponding to a specific qualifier is intentional, it needs a comment in the code explaining this. Benwing2 (talk) 23:04, 22 January 2022 (UTC)[reply]
It was a bit accusatory, although you didn't act with any ill will, so you didn't deserve that tone from me. I felt defensive, as I was being blamed for breaking something, when I had simply been trying to follow consensus (and had pinged you to fix it yourself). I think what you normally do is to post about what you intend to do in a forum like this, wait a couple days, and then do it if nobody has raised objections, and had you done that in this case as well, we wouldn't have to discuss it multiple times. —Μετάknowledgediscuss/deeds 23:59, 22 January 2022 (UTC)[reply]
@Metaknowledge Thank you, and I am sorry I missed your ping; I was off of Wiktionary for about 2-3 months during that period, and missed several pings that happened during that period. Benwing2 (talk) 05:42, 23 January 2022 (UTC)[reply]

Suggestion: a Gadget to Fix Google Domain Suffixes


I've noticed a problem for some time where I can't see content when clicking a google link posted by someone from another country that the person who posted it can see just fine. This can be fixed by manually changing the url to "google.com" from "google.co.uk", for instance.

Some background:

Google divides the world up into country-level domains: in the US, we have google.com, the UK has google.co.uk, Australia has google.co.au, etc. I think this has something to do with differences in copyright laws- if a work is public domain in one jurisdiction, they can show more of it than in other jurisdictions that have it still under copyright.
The problem comes in when someone in one jurisdiction follows a hyperlink for the google domain in another jurisdiction, e.g. someone in the UK clicks on a link ending in google.com. Apparently google defaults to the least common denominator in such cases: it only shows what is permissible in the most restrictive jurisdiction anywhere. That means that if someone from the UK follows a link ending in google.com and someone from the US follows a link ending in google.co.uk, they won't be able to view the same content that's viewable in the US at google.com or the UK at google.co.uk.

I would like to suggest a gadget that uses javascript to convert any google hyperlink on Wiktionary to the country-level domain of one's choice: if someone from Australia posts a link to books.google.co.au, the link in my browser will be to books.google.com, even though the wikitext still says books.google.co.au. I know just enough about javascript to be confident that this should be fairly easy- it's a simple string replacement in the url, and the code just has to be given the destination domain. If it helps any, it can simply ignore the "x" in google.co.[x] in the source domain, thus having to deal with only .com and .co.[x] (no humongous data tables).
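As a rough illustration of the string replacement involved, here is a sketch in Python rather than gadget JavaScript. It only handles hosts ending in google.com or google.co.[x], as suggested above; the function name and the target_suffix parameter are hypothetical, and a real gadget would apply the same idea to each link on the page.

```python
# Hypothetical sketch of the host rewrite; a real gadget would be JavaScript.
import re
from urllib.parse import urlsplit, urlunsplit

# Optional subdomains, then "google", then ".com" or ".co.xx".
_GOOGLE_HOST = re.compile(r"^((?:[a-z0-9-]+\.)*)google\.(?:com|co\.[a-z]{2})$")


def rewrite_google_link(url, target_suffix="com"):
    """Point a Google link at the reader's preferred country-level domain."""
    parts = urlsplit(url)
    match = _GOOGLE_HOST.match(parts.hostname or "")
    if not match:
        return url  # not a Google host we recognise: leave untouched
    host = f"{match.group(1)}google.{target_suffix}"
    if parts.port:
        host += f":{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(rewrite_google_link("https://books.google.co.uk/books?id=abc&pg=PA1"))
# https://books.google.com/books?id=abc&pg=PA1
```

Only the host changes; the path and query string are passed through as they are, so the link still points at the same book and page.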

I don't know js well enough to do this myself, so I'm throwing this open to anyone with the expertise and the right permissions @Erutuon, perhaps? Chuck Entz (talk) 06:20, 22 January 2022 (UTC)[reply]

Last two days for submitting proposals


Tomorrow is the last day for submitting proposals for the Community Wishlist Survey 2022.

Also, everyone is welcome to translate, promote, and discuss proposals. SGrabarczuk (WMF) (talk) 14:45, 22 January 2022 (UTC)[reply]

Synonym template


Just spitballing here, but would it be possible to get {{syn}} to work like {{also}}? As in, I have all of the same words separated by pipes, and it filters out the current page name? I'm not sure if that would cause any problems, but it would make adding long lists of synonyms to multiple pages easier. We'd have to consider how it interacts with q=, however. Vininn126 (talk) 15:31, 23 January 2022 (UTC)[reply]

@Vininn126 I think you're asking for adding the feature where {{syn}} ignores the current pagetitle if it's specified as a synonym? That can definitely be done. As for interacting with params, IMO the best thing to do is to use the new inline modifiers that I recently implemented (see the documentation of {{syn}}); this avoids the issue entirely of numbering params. But what I'd presumably implement is that ignored synonyms still count in the numbering, so if on page bar you say
  1. {{syn|en|foo|bar|baz|q3=partially}}
it displays only foo and baz but the qualifier is still associated with baz. Benwing2 (talk) 18:13, 23 January 2022 (UTC)[reply]
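The numbering behaviour described here can be sketched as follows, in Python rather than the Lua that {{syn}} actually uses; the function and argument names are made up for the example.

```python
# Hypothetical sketch: skip the current page title but keep the original
# positions, so q3= still attaches to the third listed term.
def visible_synonyms(pagename, terms, qualifiers=None):
    """Return (term, qualifier) pairs for every term except the page itself."""
    qualifiers = qualifiers or {}
    shown = []
    for position, term in enumerate(terms, start=1):
        if term == pagename:
            continue  # ignored, but it still used up its position number
        shown.append((term, qualifiers.get(position)))
    return shown


# On page "bar": {{syn|en|foo|bar|baz|q3=partially}}
print(visible_synonyms("bar", ["foo", "bar", "baz"], {3: "partially"}))
# [('foo', None), ('baz', 'partially')]
```

The same wikitext pasted onto page foo would show bar and baz, with the qualifier still on baz.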
If that could be done, that'd be great. Is this something other editors might want? We primarily use this for Polish entries. Vininn126 (talk) 19:29, 23 January 2022 (UTC)[reply]
Thanks for bringing this up. I've definitely thought about something like this in the past on several occasions and would welcome it (the last case of which was Semesterstart). What would be even better in my opinion is to be able to have some sort of "deep inspecting" synonyms template: The synonyms would be listed in the most common word's entry (let's call it X) with the appropriate qualifiers, then the articles of the synonyms could contain something like {{syns of|de|X}} which would display all the synonyms with the right qualifiers etc. in the right order (kind of similar to how {{desctree}} inspects the referenced article). This would increase consistency immensely in the case where not every article of a synonym group properly references all other synonyms within the group (which is, like, most pages on Wiktionary...). Some care has to be taken of course when it comes to multiple senses etc. (maybe give the synonyms group an ID similar to {{senseid}}). — Fytcha T | L | C 19:41, 23 January 2022 (UTC)[reply]
Now that would be quite the project and tool. If it's doable, that would be very interesting. And would even categorize Thesaurus entries, in a way, sorta like we did with rhymes (due to me and Thadh's complaining). OFC it would be nice if we could apply this potentially to antonyms or even hyper- and hypo-nyms, but synonyms would probably be the most useful. Vininn126 (talk) 19:51, 23 January 2022 (UTC)[reply]
It's only convenient until someone removes things from one of the entries and they're out of synch. We need to decide how the module will deal with things going wrong like the landmarks used by the template in entry A being removed in entry B. Even worse would be someone switching things around so that the landmarks for one part of the section now point to another part- worse because there would be no obvious sign that anything is wrong. Having one entry dependent on another entry with no indicator of that in the other entry is asking for trouble, as the example of {{desctree}} illustrates. I'm usually the one who ends up fixing {{desctree}} when someone removes a Descendants section without checking for desctree pointing to it- I don't want to be fixing synonyms, too.
Another aspect that no one thinks about: in order to examine another entry, the module has to read the wikitext for the entire page into Lua memory. When we're dealing with complex pages similar to the one- or two-character spellings already in CAT:E, this can be enough to push memory usage over the edge. It's not a coincidence that a disproportionate number of the entries in CAT:E are CJKV character entries- many of the templates for those languages use this strategy.
I'm not saying categorically that it should never be done, but any implementation will need to address the potential for such problems. Chuck Entz (talk) 21:07, 23 January 2022 (UTC)[reply]
I say categorically that it should never be done. DTLHS (talk) 21:19, 23 January 2022 (UTC)[reply]

Why does that category appear empty? There are definitely tons of redlinks in translation boxes out there! While we're at it, can we get a category for Romanian too? — Fytcha T | L | C 23:04, 23 January 2022 (UTC)[reply]

polish too pls Vininn126 (talk) 23:50, 23 January 2022 (UTC)[reply]
My recollection is redlinks are enabled only with caution for Latin script languages because they put so much load on the system. Vox Sciurorum (talk) 00:25, 24 January 2022 (UTC)[reply]
There is an older BP/GP discussion somewhere. Wikimedia now offers full HTML dumps ("enterprise" dumps), we could make use of them to do exhaustive link checking for all languages. The two approaches currently in use don't scale (Category:* Redlinks) and are incomplete (User:Jberkel/lists/wanted). – Jberkel 10:59, 24 January 2022 (UTC)[reply]
IMO the old redlink system should be disabled entirely. — SURJECTION / T / C / L / 15:10, 24 January 2022 (UTC)[reply]
Wiktionary:Requests for deletion/Others#Template:redlink category — SURJECTION / T / C / L / 16:05, 24 January 2022 (UTC)[reply]
@Fytcha: The redlinks categories are implemented by a call to {{redlink category}} from a module that handles all links from linking templates everywhere. If you enable a language, that means that Module:redlink category is loaded and run by every instance of {{l}} and {{m}}, as well as for etymology templates like {{der}}, {{inh}}, {{cog}} that link to terms and for templates like {{alter}}, {{syn}}, {{inflection of}} and that's just scratching the surface. Calling a module multiple times every time every page with linking templates anywhere on Wiktionary is viewed is an extremely inefficient and wasteful way to generate lists of things that can only change when someone edits a page.
Pretty much all of the terms in the exclusion list at {{redlink category}} have been added due to their showing up in CAT:E with out-of-memory errors. Adding an entry to this list is the first thing I do to address new out-of-memory errors, and generally it solves the problem without doing anything else. That should tell you something. Chuck Entz (talk) 15:51, 24 January 2022 (UTC)[reply]

Split template frontend modules


e.g. Module:links/templates, Module:etymology/templates, etc. (others?) to have only one template per submodule. Since every code transclusion uses some memory, having something like {{der}} get the code for something like {{desc}} every time is very wasteful. — SURJECTION / T / C / L / 15:12, 24 January 2022 (UTC)[reply]

@Surjection I am in favor of this. Although keep in mind that I have tried splitting modules before and sometimes the memory actually goes up; Lua memory usage is a bit of a mystery. Benwing2 (talk) 06:42, 25 January 2022 (UTC)[reply]

A very common function that would benefit from splitting, but I'm not sure what the best approach is. There is all kinds of kludge here (replacing spaces with newlines for vertical scripts, removing hyphens for Korean text (why?), tracking...). IMO this function should be as simple as possible so getting rid of all three things and moving the remaining small and tidy function to a submodule could do wonders. — SURJECTION / T / C / L / 14:47, 25 January 2022 (UTC)[reply]

The deal with Korean hyphens is that we want Korean links containing hyphens to display in the romanisation, but not in the Korean text. —Μετάknowledgediscuss/deeds 16:07, 25 January 2022 (UTC)[reply]
If it only applies to links, shouldn't it be in link-related code? In fact, doesn't Korean have a custom link template {{ko-l}} in the first place? — SURJECTION / T / C / L / 18:08, 25 January 2022 (UTC)[reply]
@Surjection I put the Korean-related code there on request of User:Tibidibi. This was the only place to put it that worked given the way they wanted the display to work. It should work not only in {{ko-l}} (and not only in links, I think) but in all displayed Korean text. I wrote the code carefully to avoid increasing memory usage. Benwing2 (talk) 03:47, 26 January 2022 (UTC)[reply]
@Benwing2 IMO there should be a better place to put it, but I guess that can be looked into later. In the meantime, was Old Korean specifically exempted or why can't it check if the script code is Kore instead of comparing through a list of possible language codes? — SURJECTION / T / C / L / 10:08, 26 January 2022 (UTC)[reply]
@Surjection I did not specifically exempt Old Korean. It looks like I originally wrote the code only to address Korean, and User:Erutuon added Middle Korean and Jeju. As for using script code Kore, as long as findBestScript returns Kore for both Hangeul and Hanja text in Korean, this should probably work fine. I see that Korean also has Braille as a possible script but I doubt we have to worry about this edge case. Benwing2 (talk) 04:30, 28 January 2022 (UTC)[reply]

Restoring Template:q, Template:gloss, Template:sense, Template:non-gloss definition etc. back to non-Lua variants


At some point in 2017, templates like {{q}}, {{gloss}}, {{sense}} and {{non-gloss definition}} were converted to Lua. I recently created non-Lua versions {{q-lite}}, {{gloss-lite}}, {{sense-lite}} and {{n-g-lite}} initially for the purpose of using them on pages which kept running out of memory. However, these templates serve the exact same purpose and can achieve the same task without using any Lua, so I don't see why the original templates were converted in the first place. IMO this conversion should be reversed, as we should try to jettison as much Lua usage as possible (to have a chance of getting rid of the memory errors) and this to me seems to be in hindsight a perfect example of overreliance on Lua, and the -lite variants should be merged back into the original templates. The only problem is posed by {{q}} supporting arbitrary many qualifiers, which is not possible with pure template syntax, but it should be possible to, for example, check for the 11th parameter and fall back to Lua and otherwise use a plain wikitext template version. The other templates only accept a single parameter to begin with. @Jberkel, JohnC5 as the people who seem to have carried out the conversion to Lua. — SURJECTION / T / C / L / 15:23, 25 January 2022 (UTC)[reply]
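The proposed fallback for {{q}} amounts to a simple dispatch, sketched below in Python purely to show the decision; in practice it would be a parser-function check inside the template. The limit of ten plain parameters follows the "11th parameter" suggestion above; the names are hypothetical.

```python
# Hypothetical sketch of the dispatch: plain wikitext handles |1= .. |10=,
# and only calls that reach an 11th qualifier fall back to Lua.
MAX_PLAIN_QUALIFIERS = 10


def pick_backend(params):
    """Return 'lua' if the call uses an 11th qualifier, else 'wikitext'."""
    overflow_key = str(MAX_PLAIN_QUALIFIERS + 1)
    return "lua" if params.get(overflow_key, "").strip() else "wikitext"


print(pick_backend({"1": "archaic", "2": "rare"}))  # wikitext
```

With this split the common case never touches Lua, and the rare call with many qualifiers keeps working as before.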

Hmm, that was in 2017, I think memory problems weren't that acute back then. Yes, we should simplify them, and these templates are high-usage. Regarding {{q}}, writing something which requires 11 qualifiers seems like a problem in itself. – Jberkel 15:52, 25 January 2022 (UTC)[reply]
Yes, this seems reasonable to me for such simple, high-frequency templates. I wish Lua were more efficient, but that's not a reasonable goal. —*i̯óh₁n̥C[5] 18:16, 25 January 2022 (UTC)[reply]
As the documentation of {{non-gloss definition}} says, it has the Lua-supported functionality of converting bare links to English-section links, which I guess is superfluous as one would use {{l}} or {{m}}, perhaps more exactly with IDs, when this would be needed, and English sections are on top anyway, and one might want to link translingual or even a foreign language instead, as in the particular constellation of palo santo, where the conversion bricks specific links to other sections. Fay Freak (talk) 21:21, 25 January 2022 (UTC)[reply]

I think {{IPA}} could default to a simpler template too, in a large number of cases. I tried with {{IPA-lite}} but I couldn't get it to reduce memory on single-letter entries as much as I thought it would. Cases where the full Lua module is required would be:

  • When the number of transcriptions exceeds the number hardcoded.
  • When the language is one of the (relatively few) that has syllable-counting enabled.
  • When any qualifiers or references are used (this might be able to be implemented in template-code rather than Lua, but I did not attempt it)

In other cases, I think Lua can be avoided entirely. 70.172.194.25 01:31, 26 January 2022 (UTC)[reply]
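The three conditions above can be read as one predicate. The Python sketch below only illustrates that decision; the hardcoded limit and the language set are placeholders, since the thread does not give the real values.

```python
# Hypothetical sketch of when an {{IPA}} call would still need the Lua module.
from dataclasses import dataclass, field

MAX_PLAIN_TRANSCRIPTIONS = 3  # placeholder for "the number hardcoded"
SYLLABLE_COUNTING_LANGS = frozenset({"xx"})  # placeholder code, not a real list


@dataclass
class IPACall:
    lang: str
    transcriptions: list = field(default_factory=list)
    has_qualifiers: bool = False
    has_references: bool = False


def needs_lua(call):
    """True if any of the three listed conditions applies."""
    return (
        len(call.transcriptions) > MAX_PLAIN_TRANSCRIPTIONS
        or call.lang in SYLLABLE_COUNTING_LANGS
        or call.has_qualifiers
        or call.has_references
    )


print(needs_lua(IPACall("pl", ["/a/"])))  # False
```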

Also, I apologize for making a lot of templates and then asking for them to be deleted (although I think IPA-lite and IPAchar-lite should be kept in case anyone wants to take a stab at it). I was careful along the way to not break anything, and I believe everything is exactly the same as before I started. I do not plan to attempt any related project again; this is a very frustrating problem to try to chip away at, and it seems like every time you re-save a page the placement of the first Lua error is at a different random spot. 70.172.194.25 01:46, 26 January 2022 (UTC)[reply]

Wikipedia boxes on narrow screens


The Wikipedia boxes {{wikipedia}} completely mess up the layout on mobile: [16] We should fix this, not only because it looks atrocious but also because people are trying to fix it themselves: diff (seen it multiple times) — Fytcha T | L | C 21:42, 26 January 2022 (UTC)[reply]

The project boxes also don't work well at the top of an entry page with a table of contents, especially a long one. I use inline project templates in such cases. DCDuring (talk) 22:31, 26 January 2022 (UTC)[reply]
Unfortunately these ugly and pixel-y boxes have many fans here. Maybe something could be done with CSS to collapse them on small screens? – Jberkel 00:31, 27 January 2022 (UTC)[reply]
IMO the big box looks better on a computer, where the small box is too easy to miss, as is the "put an inline link at the bottom of the page" approach. But like Fytcha, I see users add brs because the big boxes look bad on a phone. I think the overall solution has to be making entries display differently on mobile vs desktop, because it's not just Wikipedia boxes, things like images have the same page-smooshing effect on mobile, and I see people remove them on that basis, but images are useful and shouldn't be sacrificed just because our mobile site sucks atm. (Our mobile site also has way too much empty space, e.g. after the headers. Could we have the content begin immediately after and on the same line as the header, like "Etymology: From Old French..." and "Noun: foobar plural: foobars"?) Could we make the Wikipedia box display on mobile as a simple link? Could we make images also display as just a link someone can optionally click to load? - -sche (discuss) 08:56, 30 January 2022 (UTC)[reply]

CAT:E is empty!


...for now at least - the Chinese module documentation page comes and goes. I mainly blame Surjection for this highly irregular scenario ;) This, that and the other (talk) 13:56, 27 January 2022 (UTC)[reply]

I can't remember the last time this happened. There are a couple of borderline cases that have popped in and out since I first noticed this yesterday, but the fact that it could happen at all is good news. The Chinese module and its documentation page are due to intermittent processor-time issues rather than memory, so they can be ignored (they generally clear with null edits, anyway).
Of course, there have been improvements before that faded due to a general upward trend in memory usage- it's been varying between about 4 and about 20 for a few years as increased efficiency and increased bells and whistles have fought each other. We'll have to see how long this holds up. Chuck Entz (talk) 15:25, 27 January 2022 (UTC)[reply]

Are there any remaining uses of |lang= or is that otherwise a salient problem? This is quite a resource hog in its current form and it'd probably be a good idea to get rid of it if possible. — SURJECTION / T / C / L / 20:45, 27 January 2022 (UTC)[reply]

Considering that this calls {{deprecated code}}, which adds Category:Pages using deprecated templates to pages, and that the aforementioned category is empty, it looks like there are no remaining uses of |lang= to remove. 70.172.194.25 20:57, 27 January 2022 (UTC)[reply]
@Surjection It is purely template code, so I'm not sure why you believe it is a resource hog. Do you have specific numbers for this? The reason why it is present is to allow historical versions of pages that use |lang= in place of |1= to show in a reasonable form while still discouraging people from using |lang= and allowing us to track any cases where |lang= is being used. For a while the Category:Pages using deprecated templates category regularly filled up and had to be emptied. Perhaps this is no longer the case and people have been trained to use |1=; I'm not sure. Benwing2 (talk) 04:21, 28 January 2022 (UTC)[reply]
@Benwing2 The checks are indeed entirely template code and I don't think it uses any Lua except to categorize. It being a resource hog is easy to see if you check the page report on larger pages; it is usually in top 3, often on the top spot, when it comes to transclusion time. On a, it and {{deprecated code}} together take up about 20% of time:
Transclusion expansion time report (%,ms,calls,template)
100.00% 8372.204      1 -total
 10.75%  899.950    174 Template:check_deprecated_lang_param_usage
 10.28%  860.407    174 Template:deprecated_code
  8.17%  683.865    124 Template:IPA-lite
  7.24%  606.177    103 Template:head
  7.22%  604.791     32 Template:cite-book
  7.17%  600.236     36 Template:cite-meta
  6.45%  540.362    118 Template:l
  6.32%  529.336    149 Template:redlink_category
  6.24%  522.828    117 Template:inh
which seems like a waste considering how little the template actually does these days. I doubt there are any remaining uses of |lang=, but we could check anyway, and then get rid of it and stop treating |lang= exceptionally on modules as well. — SURJECTION / T / C / L / 09:35, 28 January 2022 (UTC)[reply]
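For anyone who wants to check the "about 20%" figure, here is a small Python sketch that parses the report quoted above and adds up the two rows in question. The parsing regex is an assumption about the column layout as pasted here, nothing more.

```python
# Sketch: sum the share of the two deprecated-parameter templates
# in the transclusion report quoted above.
import re

REPORT = """\
100.00% 8372.204      1 -total
 10.75%  899.950    174 Template:check_deprecated_lang_param_usage
 10.28%  860.407    174 Template:deprecated_code
  8.17%  683.865    124 Template:IPA-lite
  7.24%  606.177    103 Template:head
  7.22%  604.791     32 Template:cite-book
  7.17%  600.236     36 Template:cite-meta
  6.45%  540.362    118 Template:l
  6.32%  529.336    149 Template:redlink_category
  6.24%  522.828    117 Template:inh
"""

# Columns: percentage, milliseconds, number of calls, template name.
_ROW = re.compile(r"^\s*([\d.]+)%\s+([\d.]+)\s+(\d+)\s+(\S+)\s*$")


def parse_report(text):
    """Map each template name to (percent, milliseconds, calls)."""
    rows = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            pct, ms, calls, name = match.groups()
            rows[name] = (float(pct), float(ms), int(calls))
    return rows


rows = parse_report(REPORT)
wanted = ("Template:check_deprecated_lang_param_usage", "Template:deprecated_code")
share = sum(rows[name][0] for name in wanted)
millis = sum(rows[name][1] for name in wanted)
print(f"{share:.2f}% of transclusion time, {millis:.3f} ms")
```

Note that these percentages are inclusive: time spent in templates called from inside a row is counted in that row too, so the rows do not add up to 100%.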
It has to be observed that this is a measure of time, not memory. Is there some reason why we should be worried about transclusion time? Presumably this template is at the top of the list because various slow template calls are wrapped in it, although I wouldn't know how to verify that.
As for removing the template, as Benwing alluded to, its only remaining reason for existence is that it allows old revisions to be rendered at least somewhat readably. That's the only functionality left in the template that's worth preserving. This, that and the other (talk) 12:11, 28 January 2022 (UTC)[reply]
This is indeed a measure of time, and I noticed this when I was keeping an eye on parser statistics while trying to track down ways to cut down on Lua memory usage. The point about transclusion time of the inner templates being included is a good one, and seems to mostly hold true - the time consumed by {{check deprecated lang param usage}} specifically appears to only account for a few hundred milliseconds in total on a, the same page I took the statistics above from. Transclusion time may end up mattering in the end if one day (perhaps in 20 years, or maybe that is still too optimistic) Lua GC on memory limit actually becomes a thing. Beside the transclusion time, {{check deprecated lang param usage}} also has a significant impact on the post-expand include size and template argument sizes, the former of which is already over 50% of its maximum limit but neither of which is urgent right now. — SURJECTION / T / C / L / 10:41, 29 January 2022 (UTC)[reply]
@Surjection I have no objection to removing compatibility with |lang= (thereby making old revisions less readable), although many e.g. User:Chuck Entz seem to like being able to have old revisions somewhat readable. I agree with your assessment that we can revisit this if/when it becomes more of an issue. One possibility would be to make -lite versions of templates for use on large pages, as you've already done in various cases, that dispense with things like {{check deprecated lang param usage}}. Benwing2 (talk) 01:37, 30 January 2022 (UTC)[reply]

Change code of Toki Pona

This week Toki Pona received a valid ISO code: "tok". This can be verified here and here. Please change "art-top" to "tok" and move it from Module:languages/datax to Module:languages/data3/t. Robin van der Vliet (talk) (contribs) 12:59, 28 January 2022 (UTC)[reply]

Judging by insource:"art-top" there are ~280 pages to be updated as part of doing this. - -sche (discuss) 08:28, 30 January 2022 (UTC)[reply]
I am willing to update all those pages. Maybe we can first add it to Module:languages/data3/t, and remove it from Module:languages/datax after all those pages are updated. Robin van der Vliet (talk) (contribs) 00:13, 31 January 2022 (UTC)[reply]

@-sche: Can you add the following piece of code to Module:languages/data3/t:

m["tok"] = {
	"Toki Pona",
	36846,
	"art",
	Latn,
	type = "appendix-constructed",
}

After that I will update all the pages. And after that, the old code can be removed from Module:languages/datax. Robin van der Vliet (talk) (contribs) 23:01, 2 February 2022 (UTC)[reply]

@Robin van der Vliet: Done. Please fix the pages as soon as possible, and let me know when you have finished. —Μετάknowledgediscuss/deeds 23:43, 2 February 2022 (UTC)[reply]
@Metaknowledge Done. Robin van der Vliet (talk) (contribs) 00:00, 3 February 2022 (UTC)[reply]
@Robin van der Vliet: Thanks. Your ping didn't deliver, which is probably because you have a template as your signature instead of using a real signature. (It's also against our semi-policy at WT:Signatures.) —Μετάknowledgediscuss/deeds 00:04, 3 February 2022 (UTC)[reply]
Thank you for your help! The final thing that needs to be done is updating Module:languages/extradatax and Module:languages/extradata3/p. After that the old code is completely gone. Robin van der Vliet (talk) (contribs) 00:11, 3 February 2022 (UTC)[reply]
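The page updates discussed in this thread come down to replacing the old code wherever it is passed as a template argument. A minimal Python sketch of that substitution follows. The function name `migrate_language_code` and the regular expression are illustrative assumptions, not the script that was actually used, and the sketch only touches `art-top` when it forms a complete template argument.

```python
import re

# Hypothetical helper: matches "art-top" only when it is a whole template
# argument, i.e. preceded by "|" (optionally "|name=") and followed by
# "|" or the closing "}}". Plain text such as "art-topic" is left alone.
OLD_CODE = re.compile(r"(\|(?:[^|={}]*=)?)art-top(?=\||\}\})")


def migrate_language_code(wikitext: str) -> str:
    """Return wikitext with the old code art-top replaced by tok."""
    return OLD_CODE.sub(r"\1tok", wikitext)
```

For instance, `migrate_language_code("{{m|art-top|toki}}")` yields `{{m|tok|toki}}`. Occurrences outside template arguments (category names, prose) would still need to be reviewed by hand.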

Wikipedias of dead languages

Could some module wizard change the category code so the top level language category (e.g. Category:Ottoman Turkish language) does not link to the language's Wikipedia if the language has extinct=1? These links are permanently red (e.g. ota.wikipedia.org for Ottoman Turkish) or go to sites written in a constructed language sharing the name of a dead language (e.g. ang.wikipedia.org for Old English). Vox Sciurorum (talk) 20:55, 28 January 2022 (UTC)[reply]

I don't know why it links to a Wiktionary that doesn't exist. @Surjection may know. As for your proposal, we link to all sister Wiktionaries, even if we happen to personally dislike them. If you think that ang.wiktionary.org shouldn't exist, you should take that up on Meta. —Μετάknowledgediscuss/deeds 23:49, 2 February 2022 (UTC)[reply]
I am talking about Wikipedias, not Wiktionaries. Vox Sciurorum (talk) 23:53, 2 February 2022 (UTC)[reply]
@Vox Sciurorum User:Metaknowledge is right; I don't see any links to ota.wikipedia.org on the Ottoman Turkish language page, only a red link to ota.wiktionary.org. Omitting a red link would be ideal but I don't know if there is a technical way of doing this. There are various not-so-well-documented extension modules mentioned in the MediaWiki site; maybe one of them lets you determine if a given language Wiktionary exists. Benwing2 (talk) 03:38, 3 February 2022 (UTC)[reply]
This can be accomplished via mw.site.interwikiMap. I have written up an example function to accomplish this: Module:Sandbox/42.
  • {{#invoke:Sandbox/42|doesWiktionaryExist|tr}} => true
  • {{#invoke:Sandbox/42|doesWiktionaryExist|ota}} => false
Hopefully it won't use up too much memory given that it would only be used on language categories, which are very short pages. 70.172.194.25 02:31, 15 February 2022 (UTC)[reply]

As nobody else had made it, Wonderfool created Template:quote-twitter. They used it at whereupon for a Donald Trump tweet. As expected, it totally fucked up the page and was a total failure. Hopefully in the long run it'll be fixed by someone else Br00pVain (talk) 22:22, 28 January 2022 (UTC)[reply]

Komering

The Komering language needs Latin and Arabic in its category. --Apisite (talk) 04:12, 29 January 2022 (UTC)[reply]

A clean-up tool request

I've been slowly working on cleaning up all the Polish entries, and I was wondering if I could request a very specific JavaScript tool. Currently, the two biggest categories needing cleanup are pl-adj or pl-adv without comparative form and pl-adj without corresponding adverb. The current headers of words within these categories might be {{pl-adj|-}}, {{pl-adj|adv=-}}, or {{pl-adj}}, or even {{pl-adv}}, when they should have something in there. The vast majority of them are relational adjectives, which do not compare and do not get adverbs, so a button that adds the missing parameters, producing {{pl-adj|-|adv=-}} or {{pl-adv|-}}, would be very useful. Of course this wouldn't be used on every page, but it would make my life a lot easier. Vininn126 (talk) 00:44, 30 January 2022 (UTC)[reply]

@Vininn126 I can't help you with JavaScript; maybe User:Erutuon could. But an alternative is to make a list of all the adjectives and adverbs without comparatives, and all the adjectives without adverbs; given those lists, I can easily do a bot run to add the needed params. To help you with this, I made two pages, User:Benwing2/pl-adj or pl-adv without comparative form and User:Benwing2/pl-adj without corresponding adverb; put a * or other consistent mark by each page that needs the params added, let me know when you're done, and I'll do the bot run. Benwing2 (talk) 01:20, 30 January 2022 (UTC)[reply]
@Benwing2 I could potentially generate a list of ones the bot should add -'s to, but I'd feel much more comfortable doing it by hand with semi-automation (like with etyl clean-ups), as I might have to check which have adverbs after all, and at that point I might as well just do it by hand. Vininn126 (talk) 11:10, 30 January 2022 (UTC)[reply]
@Vininn126 OK. The other possibility is what I normally do when I need to do a lot of similar edits, which is to load all the pages in question into a file and edit it en masse in a text editor. This is what I did yesterday with relational adjectives and makes it easy to do things like search and replace across several pages. (Technically what I do is keep track of the original as well as the changed file, so I can merge in any changes other people have made to the same pages in the meantime.) The file of all Polish adjectives is 3.6MB, which is unfortunately too big to paste into a userspace page, but I could email it to you or maybe upload it to Commons, maybe as a compressed file (it's about 400KB bzip2-compressed), if Commons allows arbitrary files to be uploaded. Or you could just do whatever you do currently when you refer to "semi-automated" (is this using JWB or something?). Benwing2 (talk) 23:54, 30 January 2022 (UTC)[reply]
@Benwing2 I am currently entering these manually; what I meant is that it would be nice to have a JavaScript tool like Eru's etyl clean-up tool that provides a button. You could potentially email the file (check my userpage). Vininn126 (talk) 23:56, 30 January 2022 (UTC)[reply]
@Vininn126 I emailed you a message. I created a file of just those pages that occur in either pl-adj or pl-adv without comparative form or pl-adj without corresponding adverb; there are only 1,918 pages among the two categories since there is a lot of duplication. Benwing2 (talk) 00:21, 31 January 2022 (UTC)[reply]
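The header cleanup requested at the top of this thread can be sketched in Python as a plain text substitution. This is only an illustration under stated assumptions: the function name `fill_missing_params` is hypothetical, the target forms `{{pl-adj|-|adv=-}}` and `{{pl-adv|-}}` are taken from the request above, and the sketch is meant only for pages already confirmed to be non-comparable adjectives or adverbs without a corresponding adverb.

```python
import re

# Matches a whole {{pl-adj...}} or {{pl-adv...}} call without nested templates.
HEADER = re.compile(r"\{\{(pl-adj|pl-adv)((?:\|[^{}|]*)*)\}\}")


def fill_missing_params(wikitext: str) -> str:
    """Add "-" and (for pl-adj) "adv=-" where those parameters are absent."""

    def fix(match: re.Match) -> str:
        name = match.group(1)
        # group(2) looks like "|a|b=c"; drop the empty piece before the first "|".
        args = match.group(2).split("|")[1:]
        positional = [a for a in args if "=" not in a]
        named = [a for a in args if "=" in a]
        if not positional:
            positional = ["-"]  # no comparative given: mark as not comparable
        has_adv = any(a.split("=", 1)[0].strip() == "adv" for a in named)
        if name == "pl-adj" and not has_adv:
            named.append("adv=-")  # no adverb given: mark as having none
        # Note: named parameters are emitted after positional ones.
        return "{{" + "|".join([name] + positional + named) + "}}"

    return HEADER.sub(fix, wikitext)
```

Headers that already carry both pieces of information, such as `{{pl-adj|lepszy|adv=dobrze}}`, pass through unchanged, so the function can be run over a whole file of marked pages.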

Get language name to language mapping from

Hi, I am parsing a Wiktionary dump from a file, and in the "Translations" part there is a list of translations to different languages.

Each item gives the name of the language in English first, followed by the translated word in that language.

Where can I find the mapping between the name of the language and its language code?

For example, in the Hebrew Wiktionary article בית (home), the translation to English appears as:

* אנגלית: {{ת|אנגלית|home|house}}

Here "אנגלית" is English, but nowhere does the 'en' language code appear.

I want to automate parsing dumps from different language editions, and I don't want to manually map language names to language codes for each one. Is there a convention? Thanks a bunch! — This unsigned comment was added by Alonmln (talk • contribs).
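As an illustration of the parsing step described in this question, here is a minimal Python sketch that pulls the language name and the translated words out of a line shaped like the Hebrew example above. It assumes the exact layout shown there (a bullet, a label, a colon, then a single ת template); the function name `parse_translation_line` is hypothetical, and other entries or other language editions may well use different layouts.

```python
import re

# Assumed layout: "* <label>: {{ת|<language name>|<word>|<word>...}}"
LINE = re.compile(
    r"^\*\s*(?P<label>[^:]+):\s*"
    r"\{\{ת\|(?P<name>[^|}]+)\|(?P<words>[^}]*)\}\}"
)


def parse_translation_line(line: str):
    """Return (language_name, [words]) or None if the line does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    # Keep positional arguments only; skip empty pieces and name=value pairs.
    words = [w for w in match.group("words").split("|") if w and "=" not in w]
    return match.group("name").strip(), words
```

Applied to the example line, this returns the name אנגלית together with the words `home` and `house`; mapping that name to a code such as `en` is the separate problem discussed in the replies.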

There is no standard across different language editions of Wiktionary on how language codes are represented. — SURJECTION / T / C / L / 21:57, 31 January 2022 (UTC)[reply]
I can only speak for the standards here at English Wiktionary. If you go to Wiktionary:List of languages, you will find a list of all the language codes and languages we recognize, and Wiktionary:Language treatment explains our reasoning for many of the choices we've made.
You should be aware that there are serious disagreements about what language code to use in some cases. For instance, we treat Bosnian (bs or bos), Croatian (hr or hrv), Montenegrin (cnr) and Serbian (sr or srp) as all one language: Serbo-Croatian (sh). I'm sure folks at those Wiktionaries consider us clueless barbarians- one of our own admins (not from that part of the world, either) even accused us of linguistic genocide.
Then there are the exception codes, necessary because the people who set the standards haven't gotten around to assigning a code for every language out there. In such cases we take an existing language or family code and add some more letters at the end. For instance, we made the language code art-top for the constructed language Toki Pona, which now finally has its own code, tok. I'm sure other Wiktionaries have their own systems for these. Chuck Entz (talk) 05:12, 1 February 2022 (UTC)[reply]
Oh that's unfortunate. Can you think of a way to find the relevant language page in different wikis? Even manually Alonmln (talk) 17:16, 1 February 2022 (UTC)[reply]
@Alonmln The only thing I can think of is e.g. if the Hebrew wiki uses language names, you can check the pages thereby referenced and see which other-language wikis have entries by that name, and correlate the two of them. For example, if you don't know what אנגלית means, you can check all the אנגלית pages and see which-language wiki contains the most such pages, and it's probably the language associated with אנגלית. This kind of approach definitely requires some manual intervention but it is better than nothing. If you want to be more sophisticated you can use some sort of bipartite graph matching algorithm, but that might not be necessary. Benwing2 (talk) 02:43, 2 February 2022 (UTC)[reply]
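The simple version of the correlation idea above (without any graph matching) can be sketched in Python as follows. This is an assumption-laden illustration rather than a tested pipeline: `guess_language_codes` is a hypothetical name, `words_by_name` is assumed to hold the words listed under each language name in one wiki, and `titles_by_code` is assumed to hold the page titles found in each language's own wiki.

```python
def guess_language_codes(words_by_name, titles_by_code):
    """Map each language name to the code whose wiki shares the most titles.

    words_by_name:  dict of language name -> set of words listed under it
    titles_by_code: dict of language code -> set of page titles in that wiki
    A name with no overlap at all maps to None and needs manual review.
    """
    guesses = {}
    for name, words in words_by_name.items():
        best_code, best_overlap = None, 0
        for code, titles in titles_by_code.items():
            overlap = len(words & titles)  # words that exist as pages there
            if overlap > best_overlap:     # ties keep the first code seen
                best_code, best_overlap = code, overlap
        guesses[name] = best_code
    return guesses
```

With made-up toy data such as `{"אנגלית": {"home", "house", "water"}}` and title sets for `en` and `fr`, the name is assigned to whichever code covers more of its words. Close scores and `None` results are the cases where the manual intervention mentioned above would still be needed.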