
Wiktionary:Grease pit/2023/March

From Wiktionary, the free dictionary

Some templates not showing up in linked Category?


I recently made some new templates which should be showing up in Category:Gujarati noun inflection-table templates, but for some reason only a few of them are appearing on the Category page. Namely, the ones that are showing up correctly are {{gu-ndecl-f-table}}, {{gu-ndecl-table}}, and {{gu-ndecl-f}}, while the ones not showing up are {{gu-ndecl-m}}, {{gu-ndecl-n}}, {{gu-ndecl-um-v}}, and {{gu-ndecl-um-c}}. I placed

<includeonly>
[[Category:Gujarati noun inflection-table templates]]
</includeonly>

at the bottom of each documentation page, which appeared to be a good method based on some other templates I looked at. I'm a little stumped at this point; I can't see anything else being done differently between templates that would make them behave differently. Does anybody have any ideas on what could be happening? – Guitarmankev1 (talk) 15:16, 1 March 2023 (UTC)

Transclusion of categories between templates is weird. When you add a category to a transcluded template, it is displayed on the template that transcludes it, but the transcluding page won't be added to the category until it's edited. If you do a null edit (clicking edit, then clicking publish changes without changing anything) on the transcluding page, it will take care of that. Chuck Entz (talk) 15:33, 1 March 2023 (UTC)

Unsupported titles, part 3


Can someone add :| to MediaWiki:Gadget-UnsupportedTitles.js? Thanks. Binarystep (talk) 03:13, 2 March 2023 (UTC)

@Binarystep: Yeah, added it to MediaWiki:Gadget-UnsupportedTitles.json. — Eru·tuon 04:59, 2 March 2023 (UTC)

Login / notification oddity


I don't know if this is something anyone here knows the answer to or has also experienced, or if I should post on Phabricator, but:

  1. If I log in to en.Wiktionary and then browse over to simple: or de: or another Wiktionary, I'm also logged in there. If I go over to Commons, it takes a split second, but then it swaps out the not-logged-in header text (with the "log in" button etc.) for the logged-in header text with my notifications etc. But if I browse over to en.Wikipedia, I have to log in there separately.
  2. Once I'm logged in on any site other than en.Wiktionary, whether automatically (if I log in manually on en.Wiktionary and then go over to fr.Wiktionary etc. and am automatically logged in there) or manually (on Wikipedia, where I have to log in separately even if I already logged in to en.Wiktionary), my notifications on that site will include all my notifications from any wiki. On en.Wikipedia or fr.Wiktionary, it'll show me where I've been notified, pinged, thanked, etc., not only on those sites but on en.Wiktionary, other Wikipedias, etc. But on en.Wiktionary, I only see notifications for pings and things that happen on en.Wiktionary.

This has been the case for as long as I can remember, and it's clearly never bothered me enough to do anything about it. I suspect the fact that en.Wiktionary is my "primary" account from the point of view of the software/site might have something to do with it. But my en.Wiktionary and Wikipedia accounts are connected: when I'm on Wikipedia it notifies me if I've been pinged on en.Wiktionary, and if I log out on either site it logs me out on the other site (and all wikis). - -sche (discuss) 22:31, 2 March 2023 (UTC)

"But on en.Wiktionary, I only see notifications for pings and things that happen on en.Wiktionary." Might be obvious, but do you have cross-wiki notifications enabled in the notifications tab under preferences? For point 1, your observations are similar to mine; I've noticed needing to log in separately on en.wp as well. —Al-Muqanna المقنع (talk) 22:35, 2 March 2023 (UTC)
Aha, I didn't!
I'm intrigued as to why Unified Login only unifies login on some of the accounts/wikis it links; nothing jumped out at me when I skimmed meta:Help:Unified login. - -sche (discuss) 23:02, 2 March 2023 (UTC)
On one occasion I suspected it had to do with sequential loading and race conditions, as it happened to two editors on one particular page, even within Wiktionary. Fay Freak (talk) 04:47, 4 March 2023 (UTC)

Categorising Pali inflection modules


Template {{autocat}} happily creates Category:Pali inflection-table templates. Is there potentially a similar category for modules? We currently have about a dozen Pali modules supporting inflection. May we therefore have a similar category for modules? We could potentially eliminate most of them, though I think it would upset @Octahedron80, who has found external uses for them, and we might actually want Thai-specific tables for annotating Thai and Lao script masculine locative singulars in SARE UE. --RichardW57m (talk) 13:56, 3 March 2023 (UTC)

Do you mean something more specific than Category:Pali modules? Benwing2 (talk) 21:39, 3 March 2023 (UTC)
In fact somewhere among all my bot files is a half-completed overhaul of the module categorization code which I think adds 'LANG inflection modules' as a subcategory of 'LANG modules'. Benwing2 (talk) 21:40, 3 March 2023 (UTC)
When I try using {{auto cat}} for 'Category:Pali inflection modules', I just get an error message. Should I just start populating the category and wait? Should I create the category page as well? --RichardW57 (talk) 22:19, 3 March 2023 (UTC)
In general you need to wait until the proper code gets written. In this case I went ahead and made inflection modules auto-categorize into 'LANG inflection modules'. Depending on the module suffix, it may or may not be auto-recognized as an inflection module. If it's not, add a call to {{module cat}} on the module's doc page, like this:
{{module cat|type=Inflection}}
See Module:it-verb for a full example of using {{module cat}}. Benwing2 (talk) 06:18, 4 March 2023 (UTC)
Thanks. I've now put the modules used for generating inflection tables into cat:Pali inflection modules, though I don't understand how the unused (at least on Wiktionary) modules Module:pi-decl/noun/Thai-var1 and Module:pi-decl/noun/Mymr-var1 sneaked into the category. (They are orphaned because of the lack of consensus on how to add variant, indeed, splintered, writing systems.) Would it be because Module:pi-decl/noun/Thai and Module:pi-decl/noun/Mymr have been put in the category? Or did I just not wait long enough for the others to be automatically added to the category without invoking {{module cat}}? RichardW57 (talk) 16:49, 4 March 2023 (UTC)
@RichardW57 It appears that there's autodetection of module types based on the name of the module, but it works only if there's no documentation page. The autodetection chops off anything starting with a slash and then looks for a name consisting of a language code, a hyphen and one of certain suffixes, of which decl is one. IMO it should be changed to autodetect also when there's a documentation page, but with some way of overriding this (e.g. if there's a call to {{module cat}} on the documentation page). Benwing2 (talk) 20:57, 4 March 2023 (UTC)
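The name-based autodetection described above can be sketched roughly as follows. This is a Python illustration, not the Lua in the actual module-categorization code: the suffix set is hypothetical apart from decl (the only suffix named in the thread), the language-code pattern is simplified, and the "no documentation page" condition is ignored.

```python
import re
from typing import Optional

# Hypothetical suffix set: the thread only confirms that "decl" is recognised.
INFLECTION_SUFFIXES = {"decl", "conj", "verb", "noun", "adj"}

# Simplified shape of a language code, e.g. "pi", "it" or "ine-pro".
LANG_CODE = re.compile(r"[a-z]{2,3}(?:-[a-z]{2,3})*")


def autodetect_module_type(title: str) -> Optional[str]:
    """Guess a module's type from its title alone (documentation pages ignored)."""
    name = title.removeprefix("Module:")
    # Drop everything from the first slash: "pi-decl/noun/Thai" -> "pi-decl".
    base = name.split("/", 1)[0]
    # Split off the final hyphen-separated suffix: "pi-decl" -> ("pi", "-", "decl").
    code, sep, suffix = base.rpartition("-")
    if not sep or not LANG_CODE.fullmatch(code):
        return None
    return "inflection" if suffix in INFLECTION_SUFFIXES else None
```

Because everything from the first slash onward is discarded, a title like Module:pi-decl/noun/Thai-var1 reduces to pi-decl under this sketch, which would be consistent with the unused variant modules ending up in the category.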

Looking for advice about a {{quote-comic}} template


I've been adding quotes from comic strips to various entries lately, variously using {{quote-book}}, {{quote-journal}}, and {{quote-web}} depending on which I think best suits the task at hand. There could also be a call for {{quote-av}} to reference animated strips hosted on YouTube or similar sites. However, I wonder if there's room for a new quote template designed as a cover-all for all comics, whether they are in print newspapers, books, on the web, or in a video. Often, a comic strip will be published in more than one medium; for example, even in this day and age, publishing a print book is often a goal for many webcomic artists, and it would be helpful to be able to reference both a book and an electronic version of a strip in the same template.

My idea for this template is to mostly base it on quote-book, with a few additional parameters taken from the other templates, some unnecessary parameters removed, and some re-ordering of the parameters we keep.

I would like to say more, but I first want to get the community's opinion on whether this is a good idea. If we agree to go forward with this, then I will go into details about what I would most like to see, and will ask for help with the coding of the parameters. Best regards, Soap 17:50, 4 March 2023 (UTC)

@Soap: my question would be whether there are special parameters required for comics that are not already available in {{quote-book}}, {{quote-journal}}, or {{quote-av}}. If not, I don't see the point of another quotation template. It would be better to quote a comic in book form using {{quote-book}}, in video form using {{quote-av}}, and so on. — Sgconlaw (talk) 19:38, 4 March 2023 (UTC)
I don't think I'd need any parameters that are not available in those templates, no, but then, I can't use three templates for one quote, either. To be honest, I was hoping for an answer more like "Of course, there's always room for another template, just so long as you're going to use it", since I see other people creating templates on their own, but now I'm on the spot and not good at explaining myself. Basically, a separate quote template would do two good things: (1) allow us to give all the relevant parameters, not just most of them; and (2) order the parameters more intuitively, for example by putting (comic) after the name of the strip instead of always forcing it to the end of the line, after the name of the publisher or journal. I can get around that by just not giving those parameters, but again, a separate comic template would provide the best solution.
Webcomics are probably more popular today than print comics, although, as I said, buying a print book is a common way to show appreciation for an artist who shares their work online for free, so many webcomics are also print comics. A separate comic template would help with that as well, as we could use some parameters from {{quote-web}} that aren't available on the other templates, without giving up the parameters that are only available on quote-book. For example, I recently had to use quote-book even for a webcomic, simply because there was no other way to indicate the name of the comic, but this artist does not have a print book, so it feels wrong.
I'm not so wedded to this idea that I'll just give up adding quotes if we decide not to create a new template, but I do think it would help us out, and that once people see it, it wouldn't be just me using it. Soap 05:43, 5 March 2023 (UTC)

Before this week, and probably even more recently, a transliteration supplied to a link template, such as {{link}}, could itself be a clickable link. This no longer happens. For example, {{link|pi|เทวะ|tr={{l|pi|deva}}}}, which yields เทวะ (deva), used to supply links to both the Thai script lemma and the Roman script lemma. Why has this changed? It has broken links in most non-Roman script inflected forms. --RichardW57 (talk) 18:39, 4 March 2023 (UTC)

@RichardW57 I'm guessing it's a side effect of some change made by User:Theknightwho. Let me take a look. Benwing2 (talk) 21:03, 4 March 2023 (UTC)
@RichardW57 User:Theknightwho made a bunch of changes today to Module:links in the area that handles transliterations, so I'm guessing one of these changes broke this. User:Theknightwho, can you do a better job of making sure all your changes have associated changelog msgs? Otherwise I have to guess when looking at these changes. For example, in this change [1] you added parens around a bunch of function calls with no changelog message; I'm guessing that's because you changed the functions in question to return multiple values, but most people won't understand that. (BTW IMO this use of multiple values is a bad idea; Lua's handling of multiple values is terrible and leads to tons of bugs. I would much prefer you not change basic functions like makeEntryName() to return multiple values, but instead create new multi-valued versions of those functions and leave the existing functions single-valued.) Benwing2 (talk) 21:13, 4 March 2023 (UTC)
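A sketch of the backward-compatible shape suggested here, written in Python rather than Lua. It does not reproduce Lua's specific pitfalls (wrapping a call in parentheses truncates it to one value, and extra values are only passed on when the call is the last expression in a list); it only shows the API pattern of leaving the existing function single-valued and adding a separately named multi-valued variant. Both function names and the toy normalization (stripping combining marks) are hypothetical and are not what makeEntryName() actually does.

```python
import unicodedata
from typing import Tuple


def make_entry_name_with_flag(text: str) -> Tuple[str, bool]:
    """Hypothetical new multi-valued variant.

    Toy normalization: strip combining marks, and also report whether
    anything was changed.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    entry = unicodedata.normalize("NFC", stripped)
    return entry, entry != unicodedata.normalize("NFC", text)


def make_entry_name(text: str) -> str:
    """Existing single-valued function: callers still get exactly one string."""
    entry, _changed = make_entry_name_with_flag(text)
    return entry
```

Existing callers of make_entry_name need no changes, and only code that explicitly asks for the extra value calls the new variant.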
Correction: It has broken links in most Pali non-Roman script inflected forms. --RichardW57 (talk) 21:16, 4 March 2023 (UTC)
Also I don't understand in general the gist of all the changes you're making, which makes it very difficult to debug them when something goes wrong. Can you please create a page outlining (a) your overall vision regarding the changes you've already made and the ones you're planning on making, and (b) detailed information on how those changes are going to be made, including which modules need changing and how? Otherwise you risk losing people's trust in your ability to make changes without breaking everything. Benwing2 (talk) 21:19, 4 March 2023 (UTC)
As an example, there are currently 346 pages in CAT:E, most of which are due to some breakage in Module:ja-ruby. This module hasn't been changed in several days, so I'm guessing something you changed recently has broken this, but as it is I have absolutely no way of debugging this because I don't know what changes you've been making and why. Benwing2 (talk) 21:23, 4 March 2023 (UTC)
@RichardW57 @Benwing2 The issue in relation to Pali isn't really a bug as such, as it's down to the fact that Pali doesn't have link_tr enabled in Module:languages/data/2; the changes I made worked on the assumption that we wouldn't (and shouldn't) be including links in transliterations for languages which don't have that, as it was a necessary prerequisite for getting embedded links to work in linked transliterations. This, in turn, made it straightforward for Module:cmn-translit and Module:yue-translit to scrape embedded links from the respective pages (and paves the way towards having the full functionality of {{zh-x}}, which works on a similar principle, but to a more limited degree).
I'm inclined to agree that we should probably have a different solution other than returning multiple values, as it's proven a lot more awkward to work with than I anticipated (mostly due to the number of pre-existing uses that I had to patch up). I'll have a think about the best way to go about changing this.
Just as an FYI, I really need to get some rest right now, but I will look into this as soon as I'm available again. Theknightwho (talk) 21:40, 4 March 2023 (UTC)
@Theknightwho OK, please do get some sleep. But when you wake up think about what I wrote about giving an overview of all the changes you've made and are planning to make, and why. Thanks! Benwing2 (talk) 21:52, 4 March 2023 (UTC)
Thanks, and I will do.
Before I forget - I just wanted to say that I really don’t think it’s a good idea to put link templates through the transliteration parameter under normal circumstances (in the same way that we wouldn’t normally use link templates inside other link templates). It just feels like a kludge. While I’ve certainly made more than my fair share of mistakes, quite a few of these breakages are because modules were relying on workarounds/unintended behaviour. Theknightwho (talk) 22:07, 4 March 2023 (UTC)
This reminds me of the Jabberwocky scene in the blind cooper's workshop (viewable on YouTube). Whatever one may say about the system that was modified, one has to look at what is dependent on things being in a certain place and in a certain state at a certain point in the process. Simply changing things without checking can lead to unanticipated disasters. Chuck Entz (talk) 23:11, 4 March 2023 (UTC)
@Chuck Entz I have spent quite literally hundreds of hours editing the core modules at this point, and about 2/3 of those were spent on laying the groundwork in making sure things are compatible before rolling them out - which you can see from the hundreds of modules I've edited in the last 24 hours alone. I'm perfectly happy to own up to my mistakes, but characterising me as some kind of blind, naive idiot is unacceptable. Thanks. Theknightwho (talk) 23:44, 4 March 2023 (UTC)
I never called you that- you're not blind and you're not an idiot. In the scene in question, the blind cooper represents the status quo, and you're more like Dennis, the outsider who triggered the catastrophe. He was bright and had good ideas, but acted without full understanding.
You've been very good about cleaning up after yourself, but the truth is that every step in the laying of the groundwork that you're referring to has resulted in dozens, hundreds, or even thousands of module errors. From the beginning, you've been making changes without discussion. When you do talk about what you're doing it's in the context of explaining how something bad happened. You've been quite honest after the fact, but it would be better to have prevented the problems in the first place.
Our Lua inventory is the result of dozens of editors creating a huge number of unrelated modules that depend on other modules, often without consulting each other. It ranges from impeccable showcases of first-class programming to amateurish kludges held together by baling wire and duct tape. It's a huge mess, but it's what we've got. No one person really understands all of it, and I'm skeptical that simply studying the modules on one's own could ever give complete mastery of all its flaws and quirks. I'm sure you'll know by the end of all this what you should have known before you started, but that will be too late. You really need to discuss your methodology and ask questions of people from all over the community at each step. Collaboration is an inefficient, often painfully slow process, but all the alternatives are worse. Chuck Entz (talk) 00:53, 5 March 2023 (UTC)
@Theknightwho I agree with the gist of Chuck's comments. I asked you at least a month ago to discuss your changes before making them; IP 70.* made the same request. Yet here we are a month later without any such discussions. I've also asked you multiple times to follow good software engineering practices, but it's not happening; either you're simply unaware of such practices, or are intentionally ignoring them. Among those practices are
  1. using sandbox modules so that your changes, when they're made, come in a single well-thought-out commit with a clean, explanatory commit message, rather than 25 piecemeal commits that are impossible for anyone else to wade through;
  2. making sure all changes have an attached changelog message that clearly describes what was changed and why;
  3. respecting backward compatibility so you don't cause temporary breakage all over the place (the change to add multiple return values to various functions in Module:languages is a good example of this; this is a breaking change and should have been implemented in a backward-compatible manner, e.g. by keeping the existing calling conventions and making new functions that return multiple values);
  4. shopping around your changes in advance so that other Wiktionary coders catch problems *before* the code is committed (in most companies this is implemented through a formal code review process, but we don't have any such thing so it's essentially an honor system to do this).
Honestly, I'm very frustrated with the current state of affairs. I don't want to get to the point where I feel the need to forcibly back out your changes, but it's getting there. Benwing2 (talk) 04:08, 5 March 2023 (UTC)
Some people put links through the translation parameter. Now, the ones I can recall are non-fragment wikilinks, but what if someone wants to use a sense or etymology ID? --RichardW57 (talk) 23:14, 4 March 2023 (UTC)
I don't have an issue with putting links through translation parameters, as they're essentially just glosses (and therefore analogous to definitions). Those should work fine. Theknightwho (talk) 23:44, 4 March 2023 (UTC)
@Theknightwho: The primary problem with Pali is whether homophonous homonyms in the same writing system should be transliterated the same. (There is a Lao script writing system for Pali which loses distinctions also lost in Laotian speech. Think of Walpole's Latin.) A secondary problem is that a transliteration may be very doubtful as Roman script Pali. For these reasons, whereas setting link_tr would work well for most words, we need a mechanism to override it. A previous request for such a mechanism was ignored; the result was {{pi-nr-inflection of}}, which also supports the manual display of the Roman script equivalent when it differs from the transliteration. {{pi-link}} has what I think is a neater way of doing it. In each case, I wanted to avoid re-inventing {{inflection of}} and {{link}}, so I use them in these new templates as much as I can.
I'm wondering whether there has been some confusion over the meaning of table notranslit from Module:headword/data. Non-Roman script Pali should generally be transliterated, but that has been deemed unnecessary in two cases:
  1. The list of fully equivalent forms in other writing systems, a decision copied through at least to Sanskrit (I invite comment from @Kutchkutch). Transliteration is switched off when the list is generated.
  2. In headwords when the transliteration is the same as the equivalent Roman script form and that form is given in the following definition line, overwhelmingly by {{pi-sc}}. Thus, at @Benwing2's urging, its display has been disabled by the headword templates. This disabling could be overridden, either by giving an explicit transliteration (undesirable) or, arguably hackily, by giving white space as the transliteration. This was working nicely on 20 February, though there remain direct usages of {{head}} where it usually had to be switched off manually.
If we were to decide that the roots of verbs should be given in the same script as the verb, we would want them to be transliterated. So far, I have successfully urged against non-Roman script roots, and I finally found the formatting parameter to enable aesthetically satisfactory Roman script for terms in the headword line of non-Roman script 'lemmas'. Similarly, if we switched to giving the not totally predictable feminines of Pali adjectives in the headwords, instead of burying them in a ragged declension table, we would want them transliterated. --RichardW57 (talk) 22:47, 4 March 2023 (UTC)
@RichardW57 This all makes sense - I'll review it tomorrow, as it sounds like we need a more flexible way of handling this (particularly when it comes to languages with large numbers of scripts). A very obvious first step would be making link_tr possible to use on a script-by-script basis (which is pretty easy to implement), but there also needs to be a method to turn it off (as you say), something analogous to {{l|pi||XXX}} in links. Anyway - I'm off. Goodnight. Theknightwho (talk) 23:44, 4 March 2023 (UTC)
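One way to express the precedence under discussion (a per-call override first, then a per-script setting, then the language-wide link_tr default) is sketched below in Python. Only link_tr itself comes from the thread; LanguageConfig, link_tr_by_script and should_link_tr are hypothetical names chosen for illustration, not existing Module:languages features.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LanguageConfig:
    """Hypothetical stand-in for a language's data entry."""
    link_tr: bool = False                                  # language-wide default
    link_tr_by_script: Dict[str, bool] = field(default_factory=dict)


def should_link_tr(lang: LanguageConfig, script: str,
                   override: Optional[bool] = None) -> bool:
    """Decide whether a transliteration may contain links.

    The most specific setting wins: per-call override, then the per-script
    setting, then the language-wide default.
    """
    if override is not None:
        return override
    return lang.link_tr_by_script.get(script, lang.link_tr)
```

For example, a language could enable linking by default, disable it for one script, and still let an individual link switch it back on or off.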
{{l|pi|EQV|XXX|id=ID}} is a good analogy - XXX is the transliteration, EQV is the equivalent in my usage for Pali, and ID is something I haven't implemented but should. I can't think of a good use case for switching link_tr on and off on the basis of script, even though I implemented transliteration last in the inflection tables for Thai and Lao. The reason for that was that I had worries about being able to get the transliteration working. One problem was that I had even mixed abugidic and alphabetic writing systems in the same inflection table, something I now do much less of, and didn't record which system was used for which form. --RichardW57 (talk) 02:26, 5 March 2023 (UTC)
While Pali occasionally needs 3 forms (the writing system's own, the transliteration, and standard Roman), Sanskrit and Prakrit should often need 3 forms: the writing system's own, the transliteration, and respectively standard Devanagari and standard Brahmi. Can the declared communities (notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat) please confirm, or explain why not? The primary usage would be in {{inflection of}}. Possibly, though, they want users to do more hopping around to get to the definitions:
  1. Look up form
  2. Look up stem in same writing system
  3. Look up stem in main writing system of main script
I have noticed that they are less keen on giving quick glosses to jog users' memories. --RichardW57 (talk) 08:52, 5 March 2023 (UTC)
There is already (pretty basic) infrastructure in place relating to writing systems in Module:writing systems, but it is little-used at the moment. I’m sceptical that a universal way to detect writing systems is possible, but depending on what you had in mind it might be possible to put something together that involves treating text differently depending on what writing system is in use. Theknightwho (talk) 17:34, 5 March 2023 (UTC)
I mean writing system in Daniels' sense, which is missing at writing system, and is not the type of system addressed by the module you mentioned. One example of a pair of writing systems is simplified and traditional Chinese; another is US and Commonwealth spellings, in so far as that isn't more generally a difference of language. In the case of Thai and Lao script writing systems for Pali, the main difference is whether the alphasyllabary is an abugida or an alphabet in Daniels' sense, i.e. that 'all' vowels are marked. One can usually detect the difference by the difference in the character repertoire, though there is a Lao alphabet for Pali that uses LAO SIGN PALI VIRAMA as a nukta. The block of code in Module:pi-translit following elseif sc == 'Thai' or sc == 'Laoo' then (currently at line 267) until we get to the declaration of ngf1 (currently at line 318) is mostly concerned with detecting the writing system. The conclusion is stored in the variable explicit, and for the Lao script, also in the variables yLao and nuktaed. Those three variables provide all the information needed for transliteration. The logic is tailored to the known (or rather, encountered) Thai and Lao writing systems for Pali, and is not generic. RichardW57 (talk) 21:07, 5 March 2023 (UTC)
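A greatly simplified Python sketch of detecting the writing system from the character repertoire, for Thai script only. This is not the logic in Module:pi-translit. It assumes, as a hypothetical heuristic, that THAI CHARACTER PHINTHU (U+0E3A), which marks a vowelless consonant, signals the abugida-style system, while SARA A (U+0E30) or MAI HAN-AKAT (U+0E31), which write the vowel a explicitly, signal the alphabetic system. Lao is left out because, as noted above, LAO SIGN PALI VIRAMA can also serve as a nukta.

```python
from typing import Optional

# Hypothetical markers, chosen for illustration only.
PHINTHU = "\u0E3A"                  # THAI CHARACTER PHINTHU (vowelless consonant)
EXPLICIT_A = {"\u0E30", "\u0E31"}   # SARA A, MAI HAN-AKAT (explicitly written a)


def guess_thai_pali_system(text: str) -> Optional[str]:
    """Return "abugida", "alphabet", or None if evidence is missing or mixed."""
    has_virama = PHINTHU in text
    has_explicit_a = any(ch in EXPLICIT_A for ch in text)
    if has_virama and not has_explicit_a:
        return "abugida"
    if has_explicit_a and not has_virama:
        return "alphabet"
    return None
```

Short words without either marker stay undetermined, which is one reason real detection has to be tailored to the writing systems actually encountered.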

@Theknightwho Did you read my comments above about good software practices and the need to slow down your changes, shop them around *before* doing them, and make a document outlining the overall gist of your changes? Benwing2 (talk) 04:28, 6 March 2023 (UTC)

@Benwing2 Yes, and I'm writing something up. The changes I've made today have been dealing with prior issues, which meant they had to be prioritised. Theknightwho (talk) 04:30, 6 March 2023 (UTC)
@Theknightwho Yes, understood. But my larger point is you haven't agreed to the software practices I mentioned above or even acknowledged that there's a problem with the way you are currently operating. It still feels to me like you're not taking this seriously enough. I have a lot of patience in general but it's seriously worn thin at this point; if and when it runs out, I *WILL* start reverting your changes and seek to block you from further code changes. Benwing2 (talk) 05:58, 6 March 2023 (UTC)
@Benwing2 I do understand, and I do acknowledge that there is an issue. I would much rather respond to something like this properly, though, and it's obviously going to take time to do that with what you asked for. I understand that you're frustrated, but I'd also appreciate a little understanding, too. I'm not just ignoring you, which I tried to make clear by tagging you in my edit to Module:headword. Please also remember that (a) I have a real-life job and commitments, too, and (b) I can't just leave things when bugs become apparent. Theknightwho (talk) 15:49, 6 March 2023 (UTC)
@Theknightwho So far I've asked essentially for three things: (1) over-arching documentation/plan concerning what you've done and are planning on doing and why, (2) discussions before making further major changes, (3) changes in your software engineering practices. You've agreed to do (1); I'm not sure about (2), and I've heard nothing about (3). I really need to get your agreement on (3); I can help you if you are not sure how to go about doing things other than piecemeal trial-and-error, but you can't just ignore this and hope it will go away. For an example of how I think things need to be done, see [2], where I rewrote Module:headword to support qualifiers, references and separators attached to individual headwords; added full support for per-headword scripts, which you tried to add but never really finished; and generally reorganized and cleaned up the module. As you can see, this is a single commit, with a detailed change log message [which in truth could stand to be even more detailed, although I added a lot of comments in the code itself], and there was only one minor bug that needed fixing in a follow-on commit. Before pushing, I put the sandbox version at Module:User:Benwing2/headword and created Module:User:Benwing2/test-headword and User:Benwing2/test-head to test it. With this in place, I could also make changes to Module:User:Benwing2/test-headword on-the-fly and use the "preview this template/module" functionality to preview User:Benwing2/test-head without needing to save the changes. The actual rewrite took a couple of evenings, but the testing took only about an hour. The total effect was that the commit history of Module:headword stays clean, there is no churn in CAT:E, and the code is much more likely to be clean. The way you've done things has, frankly, introduced a lot of tech debt that will take a while to clean up, and I'd like to not see this trend continue. Benwing2 (talk) 07:21, 10 March 2023 (UTC)
@Benwing2 I’m happy to agree to all three requests, and I do understand why it’s an issue. I’ve had an unexpectedly busy end of week (which is why I’ve not been very active these last couple of days), but that doesn’t mean I don’t care about this issue, because I realise that it’s causing unnecessary stress. I’ve just not had the time to put the things together that I wanted to. Theknightwho (talk) 00:40, 13 March 2023 (UTC)
@Theknightwho Great, thank you. I understand RL pressures take precedence and I'm not pushing you to follow any particular time schedule, just that when you do find the time to do more core module coding, you follow practices to reduce stress, as you put it. In my experience, with a bit of practice, these procedures don't end up slowing you down and do help a lot in the longer term. Benwing2 (talk) 01:31, 13 March 2023 (UTC)
@Theknightwho, Benwing2: Transliterations supplied as links are now functioning as links again. Thank you for relenting. Is this an indefinite change, or should I continue with my efforts to rewrite {{pi-nr-inflection of}} to bypass that technique, as {{pi-link}} does? (So far I've progressed as far as revamping the unautomated testing.) --RichardW57 (talk) 08:38, 9 March 2023 (UTC)
I've rewritten it anyway, and have partially rewritten the utility {{pi-ml}} that underlies {{pi-link}}. --RichardW57 (talk) 22:49, 10 March 2023 (UTC)

How to report transliteration failures


What is the best way to report that a transliteration module has accepted a string as something it should strive to transliterate, but failed? I had been returning the partially converted string as a useful hint that something had gone wrong, but at least when invoked by {{xlit}}, @Theknightwho just over an hour ago admitted to converting a result with lurking non-Roman characters to nil. (This is a template-breaking change, but seems only to have affected a documentation page.) Would raising a module error with details be appropriate? If it's simply a case of text in an inappropriate script, then returning nil is probably appropriate. --RichardW57 (talk) 01:19, 5 March 2023 (UTC)
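The three options raised here (return the partial string, return nil, or raise an error with details) can be compared in a small Python sketch. The toy table covers only three Tibetan letters, and treating any unmapped non-ASCII character as "non-Roman" is a crude, hypothetical stand-in for a real script check; none of this is how {{xlit}} or any Wiktionary module is actually implemented.

```python
from typing import Dict, Optional

# Toy table for illustration only: TIBETAN LETTER KA, KHA, GA.
TOY_TABLE: Dict[str, str] = {"\u0F40": "ka", "\u0F41": "kha", "\u0F42": "ga"}


class TransliterationError(ValueError):
    """Input was accepted for transliteration but could not be fully converted."""


def transliterate(text: str, table: Dict[str, str],
                  policy: str = "partial") -> Optional[str]:
    out = []
    leftovers = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        else:
            out.append(ch)
            if not ch.isascii():      # crude stand-in for "not Roman script"
                leftovers.append(ch)
    result = "".join(out)
    if not leftovers:
        return result
    if policy == "partial":
        return result                 # partial string as a visible hint
    if policy == "nil":
        return None                   # caller just sees "no transliteration"
    if policy == "error":
        details = ", ".join(f"U+{ord(c):04X}" for c in leftovers)
        raise TransliterationError(f"untransliterated characters: {details}")
    raise ValueError(f"unknown policy: {policy}")
```

The error policy is the only one that tells the caller which characters were left over; the nil policy hides the failure entirely.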

I notice that {{l|bo|࿕ཀ࿖}} currently does not transliterate, but merely yields "࿕ཀ࿖ (࿕ka࿖)", whereas I'm sure it would have done before. This is because of the flanking good-luck symbols, which Unicode foisted on the Tibetan block, egged on by Michael Everson. Unicode does not classify them as being in the Tibetan script, but rather as 'common' across scripts. So, do we now require these symbols to be transliterated when transliterating strings from the Tibetan script, or is someone going to have to go round refining our definitions of the character content of the scripts? In many cases, we have just applied the future-proofing of allocating entire Unicode blocks to a single script; getting a few characters wrong rarely mattered for deciding between ISO scripts. Now it does matter. --RichardW57m (talk) 13:51, 6 March 2023 (UTC)[reply]
Where is the guidance on automatic transliteration of dingbats? --RichardW57m (talk) 13:51, 6 March 2023 (UTC)[reply]
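The block-versus-script distinction can be sketched as follows. The Tibetan block runs from U+0F00 to U+0FFF, but the four svasti signs U+0FD5 to U+0FD8 have the Unicode Script property Common. The sketch treats Common characters as neutral, so they neither disqualify a string nor count as a transliteration failure. The toy table and the simplified `is_common` (svasti signs, whitespace, ASCII punctuation only) are assumptions for illustration. This is Python, not the Lua used on Wiktionary.

```python
# Code points in the Tibetan block that Unicode assigns to the Common script:
# the four svasti ("good luck") signs U+0FD5..U+0FD8.
COMMON_IN_TIBETAN_BLOCK = range(0x0FD5, 0x0FD9)

# Toy transliteration table (hypothetical; real data would be far larger).
TOY_TABLE = {"\u0f40": "ka"}  # TIBETAN LETTER KA


def is_tibetan(ch: str) -> bool:
    cp = ord(ch)
    # Whole block minus the Common exceptions (unassigned points stay "Tibetan").
    return 0x0F00 <= cp <= 0x0FFF and cp not in COMMON_IN_TIBETAN_BLOCK


def is_common(ch: str) -> bool:
    # Simplified: only the svasti signs, whitespace and ASCII punctuation.
    return (
        ord(ch) in COMMON_IN_TIBETAN_BLOCK
        or ch.isspace()
        or (ch.isascii() and not ch.isalnum())
    )


def is_tibetan_text(text: str) -> bool:
    """Script detection that ignores Common characters instead of rejecting them."""
    scripted = [ch for ch in text if not is_common(ch)]
    return bool(scripted) and all(is_tibetan(ch) for ch in scripted)


def transliterate(text: str) -> str:
    # Common characters are copied through unchanged rather than treated as failures.
    return "".join(TOY_TABLE.get(ch, ch) if is_tibetan(ch) else ch for ch in text)
```

With this split, "࿕ཀ࿖" is still detected as Tibetan and comes out as "࿕ka࿖", which is the behaviour the comment above expected.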

The worst case would probably be a quotation in Pali with a tiny bit of Shan in it that relied on automatic transliteration. I'm not sure there'd be a good way to defer the work of transliterating the text bit by bit and stitching the bits together. Might lying about the language work reasonably? --RichardW57 (talk) 01:19, 5 March 2023 (UTC)[reply]

The string I had tried was ພຸດໂທ (buddo), whose decipherment requires one to raise the algorithm's game to work out the writing system and then transliterate it. I use a backdoor to transliterate it in the declension table - I had to choose the writing system to generate it! --RichardW57 (talk) 01:19, 5 March 2023 (UTC)[reply]

@RichardW57 One of the reasons there seem to be consistent issues happening with Pali is that you keep trying to find workarounds/bodges instead of fixing the issue properly. Relying on things like partial transliterations or (your suggestion here) the wrong language code makes it nearly impossible to make changes that don’t break things you’ve written. It was the same when you ran into issues with using unsupported scripts - which is clearly not something anyone could reasonably have anticipated you doing, and you should not have relied on it staying that way, as the only reason you were doing it in the first place was as a workaround.
On a side note - I would appreciate it if you stopped being so passive-aggressive, because it makes me a lot less inclined to help you. I did not “admit” anything, either - I told you that things were working as intended, and that you needed to come up with something better than displaying partial transliterations to users. That remains the case. Theknightwho (talk) 18:26, 5 March 2023 (UTC)[reply]
Partial transliteration is what happens naturally. Unless it is all in some sense fake, the local Pali writing systems of various Tai groups in the Burmese Empire present transliteration problems waiting to happen. There's also an element of chicken and egg. What I think should happen if we can't find reliable external documentation of the systems - and I fear that Unicode Technical Note 11 is the best we can do - is that we rely on people to find texts and start transferring the information here. When we have samples, we can start extending the transliteration scheme. In the meantime, non-Roman characters leaking through is one way of raising and pinpointing the problem. It's not deliberate, it just happens. The same would happen if mixed script text (other than mixed with Roman script) were submitted for transliteration. Returning nil is distinctly unhelpful. --RichardW57 (talk) 19:57, 5 March 2023 (UTC)[reply]
Or are there categories recording the problems? They may not have been populated because in this respect I've managed to be careful. I tend to provide manual transliterations because I work my quotations hard. I'm not sure that anyone else but the banned Intobesa has added non-Roman script Pali quotations or usage examples.
As to fakeness, I've tried hunting for Shan parittas on-line. I found a few, but the clear Pali within them was all in the usual Burmese orthography. I've found a few examples of mixed Shan and Pali in Sai Kam Mong's book the History and Development of the Shan Scripts (references in {{RQ:pi:Sai Kam Mong}}), but the Pali text was in either the Lanna script or the normal Burmese orthography. --RichardW57 (talk) 19:57, 5 March 2023 (UTC)[reply]
@Benwing2, @Theknightwho, @Chuck Entz On the same page, many collocations & usage examples for Korean & Jeju are now broken when it comes to automatic transliterations. Some examples: 은혜 (eunhye), (gap), (beot), ᄒᆞ다 (hawda), the last of which also now shows some weird capitalization. AG202 (talk) 15:35, 5 March 2023 (UTC)[reply]
@AG202 I have a good idea of the problem here, and I’m looking into it. Theknightwho (talk) 18:27, 5 March 2023 (UTC)[reply]
[edit]

Hi, I've been directed here by an error which popped up whilst I was adding language-specific hash anchors to wikilinks over at the Frequency lists (specifically Wiktionary:Frequency lists/TV/2006/12001-14000). The error reads as follows:

This action has been automatically identified as harmful, and therefore disallowed. [...] A brief description of the abuse rule which your action matched is: -meme.

Any feedback on what the issue might be and how it can be avoided in future would be much appreciated. Especially if there's a better way to do what I'm trying to do, as I had intended to do a lot more of it and do not wish to cause any problems for others or the project. Cheers Helrasincke (talk) 00:16, 6 March 2023 (UTC)[reply]

@Helrasincke: try again. I made autoconfirmed users exempt; if you aren't autoconfirmed yet, you soon will be. If it still happens, try breaking your huge edits into smaller ones: the filter looks for new users adding large numbers of certain groups of letters, and a large enough sample of random words coincidentally contains many repetitions of almost any given sequence. Chuck Entz (talk) 01:18, 6 March 2023 (UTC)[reply]
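The explanation of why a long frequency list trips the filter, and why splitting the edit helps, can be shown with a toy model. The real abuse filter's patterns and thresholds are not public here. The sequence "ab", the threshold of 5 and the function name are invented for illustration.

```python
def trips_filter(added_text: str, sequence: str, threshold: int, autoconfirmed: bool) -> bool:
    """Toy model of a letter-sequence filter (hypothetical, not the real rule)."""
    if autoconfirmed:
        # The exemption described above: autoconfirmed users are never flagged.
        return False
    # str.count counts non-overlapping occurrences.
    return added_text.count(sequence) >= threshold


# A word list in which "ab" happens to occur once per word.
WORDS = ["abacus", "abate", "abbey", "label", "cable", "table"]

one_big_edit = trips_filter(" ".join(WORDS), "ab", threshold=5, autoconfirmed=False)
two_small_edits = [
    trips_filter(" ".join(chunk), "ab", threshold=5, autoconfirmed=False)
    for chunk in (WORDS[:3], WORDS[3:])
]
print(one_big_edit, two_small_edits)  # True [False, False]
```

Six innocent words reach the threshold together, but neither half does on its own.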

Problem with Japanese romanizations

On Japanese romanization pages (e.g. shōri), the headword's appearance has changed recently and looks enlarged on my computer. Its HTML element used to have class="Latn headword", now it has class="Jpan headword". Can this be fixed please? Rdoegcd (talk) 00:17, 6 March 2023 (UTC)[reply]

This is fixed. Theknightwho (talk) 02:56, 6 March 2023 (UTC)[reply]

Kazakh transliteration

What has happened to the Kazakh transliteration? It seems to have reverted to a previous version that doesn't match Module:kk-translit. I know the module is imperfect and would require either complex logic or lists of exceptions, but it is closer to the current Kazakh Latin standard.

@Theknightwho, Vtgnoq7238rmqco, Benwing2: FYI. Anatoli T. (обсудить/вклад) 02:42, 6 March 2023 (UTC)[reply]

I've changed it to an exact match. I must have been tired when doing this. Theknightwho (talk) 02:55, 6 March 2023 (UTC)[reply]

Bot request for Persian transliterations (fa)

  1. Convert all transliterations to lower case.
  2. ā -> â, ō -> ô, (diphthong) "ou" -> "ow"
  3. Remove |sc=fa-Arab.

Anatoli T. (обсудить/вклад) 02:45, 6 March 2023 (UTC)[reply]
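The three requests are mechanical enough to sketch. What follows is a Python version working on raw wikitext with regular expressions. It is not Benwing2's bot code. It assumes the Persian entries use the translation templates {{t}}, {{t+}}, {{tt}} or {{tt+}}, and that their parameters contain no nested templates, since templates with nested ones are skipped. The "ou" -> "ow" step is naive: it assumes every "ou" in an Iranian Persian translit is the diphthong.

```python
import re
import unicodedata

# {{t|fa|...}}, {{t+|fa|...}}, {{tt|fa|...}}, {{tt+|fa|...}} without nested templates.
FA_TRANSLATION = re.compile(r"\{\{(?:t|t\+|tt|tt\+)\|fa\|[^{}]*\}\}")
TR_PARAM = re.compile(r"\|tr=([^|}]*)")


def fix_tr(tr):
    tr = unicodedata.normalize("NFC", tr).lower()           # request 1: lower case
    tr = tr.replace("\u0101", "\u00e2")                      # request 2: ā -> â
    tr = tr.replace("\u014d", "\u00f4")                      # request 2: ō -> ô
    tr = tr.replace("ou", "ow")                              # request 2: naive diphthong fix
    return tr


def fix_template(match):
    text = match.group(0).replace("|sc=fa-Arab", "")         # request 3
    return TR_PARAM.sub(lambda m: "|tr=" + fix_tr(m.group(1)), text)


def clean_wikitext(wikitext):
    # Only Persian translation templates are touched; everything else is left as-is.
    return FA_TRANSLATION.sub(fix_template, wikitext)
```

A real run would also need the Dari exclusions discussed in this thread and a review of the diff before saving.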

@Atitarev Can you give some example pages with the bad translits? Also, several years ago I cleaned up Arabic and Russian transliterations and wrote code that listed all the templates I could think of that had translit params and what the appropriate head and tr params were, but (a) it needs updating to reflect recent template changes/additions, and (b) it doesn't know anything about Persian-specific templates. So I need to know the names of all the Persian-specific templates that have translit params in them, and what the names of those translit params are and (ideally) the names of the corresponding Persian-script params.
Also probably something similar needs to be done with Arabic translit. It looks like User:Fenakhay recently changed the translit of hamza and ayin in Arabic, but it was done in a half-assed fashion, so that currently e.g. in انت we have a mess of old-style and new-style transliterated hamzas across multiple different Arabic lects. I'm not even sure there was consensus to make this change but if we are to make it, IMO the current change should be undone, and then a proper plan made to fix it properly across all Arabic lects and in all manual translits, and only then should it be redone. Benwing2 (talk) 04:06, 6 March 2023 (UTC)[reply]
@Benwing2: An example is this revision, where e.g. رسوایی (rosvâyi) was transliterated as "rosvā'i". There is an agreement on Persian transliteration, but there are some old or anonymous transliterations, or ones not matching WT:FA TR.
Unlike Arabic, Persian is currently not automated, so a manual update won't affect anything badly. I am fixing a number of bad transliterations every week, so just would like to save time on this. There are other issues but I only included the ones that won't have any negative impact and won't require double-checking (if a transliteration wasn't made in error).
If it makes it easier, I am only concerned about cases using translation templates {{t+}}, {{t}}, {{tt+}} and {{tt}}. Anatoli T. (обсудить/вклад) 04:38, 6 March 2023 (UTC)[reply]
2nd priority is Persian entries themselves, headword templates, usage example templates, {{l}}, {{m}}. Anatoli T. (обсудить/вклад) 04:40, 6 March 2023 (UTC)[reply]
@Atitarev I already have the code to handle {{t+}}, {{t}}, {{tt+}}, {{tt}}, {{l}}, {{m}}, {{head}}, {{ux}}, {{alt}}, etymology templates, etc., and in general it will handle any template that has params similar to {{l}} or {{m}} and isn't a lang-specific template. What would help me is (a) the Persian-specific templates, and (b) a full list of substitutions. For example, I just enabled tracking for usage of manual translit (see Special:WhatLinksHere/Template:tracking/links/manual-tr/fa) and the first thing that popped up was schizoid personality disorder, transliterated iḵtilāl šaxsiyat askeazwād which contains several dispreferred chars per Wiktionary:Persian transliteration. If you could make a full list of substitutions to make based on Wiktionary:Persian transliteration and any other dispreferred sequences you know of, I can do the conversion. Benwing2 (talk) 05:53, 6 March 2023 (UTC)[reply]
@Benwing2: It may be difficult to make a list of all dispreferred symbols because they may be sporadic, based on users' personal background, knowledge or preference.
There are many problems in the transliteration of اختلال شخصیت اسکیزویید. I am not sure about the exact pronunciation of the last word, but it's incorrect. They used "ḵ" for the first "خ" but "x" for the 2nd! Obviously any case of
  1. "ḵ" should be replaced with "x" ("خ"). It's a rare case you found, but
  2. ḫ, kh = x ("خ")
  3. "w" when used as a consonant (not part of a diphthong) be "v". "و" + vowel
  4. "ch" = č ("چ")
  5. "zh" = ž ("ژ")
  6. "sh" = š ("ش")
  7. "ǧ" = j ("ج")
I hesitated to ask about those. For the above, we need to make sure that the digraphs are used for one letter/sound, not two sounds. Since you know the script, you can figure it out, I hope.
I correct cases where "غ" is represented with "q" instead of "ğ" (there is no difference in modern Iranian between "ق" (q) and "غ" (ğ)). So, here, perhaps a check should be that "q" is used but there is no "ق". I found quite a lot of cases where "غ" was incorrectly transliterated as "q" (easier to type!).
Cases with "gh" should be checked, if they are "ق" (q) or "غ" (ğ).
Dotted Arabic-style letters are not used, e.g.
  1. ṣ - s
  2. ḍ, ż, ẕ - z
  3. ʔ = ' or nothing at the beginning of a word
In modern Persian (not classical or Dari):
  1. ē = ê
  2. ū = u (no length)
  3. ī = i (no length marked)
Again, I wasn't sure if classical Persian will be used, so I didn't ask to check those. Anatoli T. (обсудить/вклад) 06:25, 6 March 2023 (UTC)[reply]
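The context-free part of this list can be applied mechanically, while the digraphs need the check described above. Here is a Python sketch. The function name and the choice to only report digraphs, rather than replace them, are assumptions. So is reading "ʔ = ' or nothing at the beginning of a word" as "nothing word-initially, ' elsewhere". The q/ğ check is not covered because it needs the Arabic script.

```python
import re
import unicodedata

# One-to-one replacements from the list above (modern Iranian Persian only).
SIMPLE = str.maketrans({
    "\u1e35": "x",       # ḵ
    "\u1e2b": "x",       # ḫ
    "\u01e7": "j",       # ǧ
    "\u1e63": "s",       # ṣ
    "\u1e0d": "z",       # ḍ
    "\u017c": "z",       # ż
    "\u1e95": "z",       # ẕ
    "\u0113": "\u00ea",  # ē -> ê
    "\u016b": "u",       # ū
    "\u012b": "i",       # ī
})
# Digraphs that usually stand for one letter but may be two sounds, so they are
# reported for checking against the Arabic script rather than replaced.
DIGRAPHS = {"kh": "x", "ch": "\u010d", "zh": "\u017e", "sh": "\u0161"}


def normalize_fa_tr(tr):
    out = unicodedata.normalize("NFC", tr).translate(SIMPLE)
    out = re.sub(r"(?<!\w)\u0294", "", out)  # word-initial ʔ: dropped
    out = out.replace("\u0294", "'")         # ʔ elsewhere: apostrophe
    needs_review = [d for d in DIGRAPHS if d in out]
    return out, needs_review
```

For example, normalize_fa_tr("shir") returns ("shir", ["sh"]): the string is left alone but flagged for review.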
@Atitarev I will see what I can do. I may need more info. There are two ways to operate, one is with reference to the Arabic script and one is without it. More canonicalizations can be done by looking at the corresponding letters in the Arabic script but sometimes the translit can't be matched with the Arabic if the translit is too messed up. BTW what is the status of Dari vs. Iranian Persian (vs. Tajiki?) translits? Are there any circumstances under which Dari translits occur? Benwing2 (talk) 06:34, 6 March 2023 (UTC)[reply]
@Benwing2: Tajik translit is fully automated.
Dari translits do occur. The preference, AFAIK, is to give Dari-specific words (no equivalent in Iranian Persian) their own transliterations, but I am much less familiar with it. There is no separate policy document for it yet. The consonant symbols are the same. It's usually about short vowels:
  1. Iranian "e" = Dari or classical "i". Dari also has "e"!
  2. Iranian "o" = Dari or classical "u"
  3. Iranian final "e" = Dari or classical "a"
  4. Iranian diphthong "ow" = Dari "aw"
Or the reading is different altogether with some vowel difference.
Dari transliterations are inconsistent or incorrect, even in some dictionaries, and there are errors here too.
Iranian editors often oppose vocalisations (diacritics) because they don't match the classical or regional Persian, even though some dictionaries use them. Anatoli T. (обсудить/вклад) 06:51, 6 March 2023 (UTC)[reply]
@Atitarev Hum. Is there any way of telling automatically whether a given term has or should have a Dari translit, so I can skip them? Maybe in the longer run now that User:Theknightwho added support for multiple variants of a term in a single link, we can think about displaying both the Iranian and Classical/Dari translits. Benwing2 (talk) 06:54, 6 March 2023 (UTC)[reply]
@Benwing2: Based on my request (substitutions), I can't see cases where it would break transliterations for Dari, perhaps only with long vowels with a macron (ē, ū or ī), not sure. Can you see other cases? It's hard to decide what could go wrong, if there is no page for Dari. Anatoli T. (обсудить/вклад) 06:59, 6 March 2023 (UTC)[reply]
@Benwing2: Dari terms are usually marked with {{qualifier|Dari}} or {{qualifier|Afghanistan}}. I use {{qualifier|Dari}}. Anatoli T. (обсудить/вклад) 07:02, 6 March 2023 (UTC)[reply]
@Atitarev OK when I get a chance to implement this I'll get back to you. Benwing2 (talk) 07:02, 6 March 2023 (UTC)[reply]
@Atitarev Any example pages you can supply where these qualifiers are used? Benwing2 (talk) 07:04, 6 March 2023 (UTC)[reply]
@Benwing2: ticket#translations. Anatoli T. (обсудить/вклад) 07:13, 6 March 2023 (UTC)[reply]
@Benwing2 Wait, could you elaborate? Like, are you saying there can be alternative versions of an entry with different transliterations? Like you can switch between which transliteration is used??? If it's what I think, it seems very interesting and I think it's worth considering. Sameerhameedy (talk) 05:16, 8 March 2023 (UTC)[reply]
@Benwing2: The ping wasn't done correctly, since @Sameerhameedy put it in a different edit. (pings and signatures need to be in the same edit). Just repeating. Anatoli T. (обсудить/вклад) 05:32, 8 March 2023 (UTC)[reply]
@Atitarev Also I don't know that much about Persian-specific handling of the Arabic script. What are the gotchas? E.g. under Roman road we have the translation خیابان رومی with translit xiyâbân-e Rumi. The -e (ezafe?) has no equivalent character in the Arabic script. Is that normal? Benwing2 (talk) 07:00, 6 March 2023 (UTC)[reply]
@Benwing2: Yep! That's ezafe, written either as "-e" or "-ye" (after vowels). ایالات متحدۀ آمریکا (eyâlât-e mottahede-ye âmrikâ); note the use of "ۀ", a written ezafe used only when the letter is "ه".
Letter "ه" in the final position after consonants is "e" in Iran or "a" in Dari/classical.
The Tajiks simply write an "и", no hyphen: Иёлоти Муттаҳидаи Амрико (Iyolot-i Muttahida-yi Amriko)
Gotchas are when "و" or "ی" stand for long or short o/ô or ê/e in loanwords, like in Arabic! Anatoli T. (обсудить/вклад) 07:11, 6 March 2023 (UTC)[reply]
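The ezafe rule for the transliteration side is simple enough to write as a function. This is a sketch only. It does not produce the Arabic-script "ۀ". It also assumes that every word in the phrase except the last takes the ezafe, which holds for the two examples in this thread but not for Persian noun phrases in general.

```python
# Vowels of the Iranian Persian transliteration scheme: a â e i o u ê ô.
VOWELS = set("a\u00e2eiou\u00ea\u00f4")


def add_ezafe(word: str) -> str:
    """Attach the ezafe: "-ye" after a vowel, "-e" after a consonant."""
    return word + ("-ye" if word[-1] in VOWELS else "-e")


def ezafe_phrase(words: list) -> str:
    # Every word except the last takes the ezafe (assumption, see above).
    return " ".join([add_ezafe(w) for w in words[:-1]] + words[-1:])
```

ezafe_phrase(["eyâlât", "mottahede", "âmrikâ"]) gives "eyâlât-e mottahede-ye âmrikâ", and ezafe_phrase(["xiyâbân", "rumi"]) gives "xiyâbân-e rumi".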
@Atitarev One other thing, the ZWNJ, when does this occur and how is it transliterated? Always as a hyphen? Contrarily, if there's a hyphen is there always a corresponding ZWNJ in the Arabic script? Benwing2 (talk) 07:16, 6 March 2023 (UTC)[reply]
@Benwing2: ZWNJ and ezafe, haven’t come across the other. Anatoli T. (обсудить/вклад) 07:39, 6 March 2023 (UTC)[reply]
{{ping|Atitarev]} What do you mean you haven't come across the other? Benwing2 (talk) 08:04, 6 March 2023 (UTC)[reply]
@Atitarev Oops. Benwing2 (talk) 08:04, 6 March 2023 (UTC)[reply]
@Benwing2: Hyphen is only used for ZWNJ and ezafe, AFAIK. I don’t remember seeing odd hyphens. Maybe foreign words where they are also used? Anatoli T. (обсудить/вклад) 08:20, 6 March 2023 (UTC)[reply]
@Atitarev I guess what I mean is, if there is a ZWNJ in the Arabic script, is there always a corresponding hyphen in translit? And if there's a hyphen in the translit, and it's not followed by the ezafe endings -e or -ye, is there always a corresponding ZWNJ in the Arabic script? Benwing2 (talk) 08:31, 6 March 2023 (UTC)[reply]
@Benwing2: Generally yes. That should be the case, but let me know if you find cases otherwise. Anatoli T. (обсудить/вклад) 08:41, 6 March 2023 (UTC)[reply]
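The correspondence agreed here can be turned into a consistency check that a bot could run on each script/translit pair, flagging mismatches for review rather than fixing them. Here is a sketch. It treats any "-e" or "-ye" at the end of a word as ezafe. That is an assumption, and it will misfire if a ZWNJ-separated part happens to be exactly "e" or "ye".

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
EZAFE_HYPHEN = re.compile(r"-y?e\b")


def zwnj_matches_hyphens(script: str, translit: str) -> bool:
    """One translit hyphen per ZWNJ, after discounting ezafe hyphens ("-e" / "-ye")."""
    other_hyphens = EZAFE_HYPHEN.sub("", translit).count("-")
    return script.count(ZWNJ) == other_hyphens
```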
@Atitarev, loanwords from Arabic almost never have a long 'ê' or 'ô'; those vowels are mostly from Classical Persian (and modern Dari), both of which have majhûl and ma'rûf variants of vowels. wâw-i/vâv-e ma'rûf = û, wâw-i majhûl = ô; yâ-'i ma'rûf = î, yâ-'i majhûl = ê. Diacritics can also be majhûl vowels in front of glottal consonants. Majhûl vowels were (mostly) removed from modern Iranian Farsi but have "leftovers" in some words. Also, in the Tehrani dialect some diphthongs have collapsed into long vowels, like ow -> ô or ey -> ê, but that's usually non-standard. As for ی and و acting as short vowels, that can be from Arabic but is mostly because in both Iranian Persian and Dari the syllable 'wa' has (inconsistently) collapsed into a short o and u respectively. Sameerhameedy (talk) 18:23, 8 March 2023 (UTC)[reply]
@Sameerhameedy: Long and short vowels are always going to be a problem, since users may add them based on their own background, preference or understanding.
BTW, the û and î symbols are dispreferred in the translit, since "u" and "i" stand for long vowels (these are always long), but "ô" and "ê" are used. (I understand you're using û and î to highlight that they are long.)
How does it affect the transliteration policy at WT:FA TR?
Are you able to make a comparison table, contrasting modern Iranian, classical and Dari in terms of transliteration to be used?
You asked @Benwing2, if it's possible to use multiple transliterations. I don't know. It must be difficult. There's a similar discussion about Hebrew (modern Israeli vs classical). Perhaps we can leave that in the pronunciation section only?
This old vote had the majority for it - Wiktionary:Beer_parlour/2021/December#Persian_automated_transliteration but native speakers were cold about it. So, I guess it won't happen. We discussed transliteration challenges. Some good points there. (I know I made some incorrect assumptions there, since I am a beginner at Persian and don't know Dari at all). Anatoli T. (обсудить/вклад) 00:16, 9 March 2023 (UTC)[reply]
In Dari there are short 'i' and short 'u' sounds.
dil (دِل), shêr (شیر), shîr (شِیر), and mehmânî (مِهمانی) are all different because of majhûl-ma'rûf vowel rules. (That's why Afghanistan and Tajikistan are transliterated with an 'i', instead of 'Afghanestan' or 'Tajikestan'.) Majhûl and ma'rûf vowels don't exist in Iranian Persian, so in Iran these would be pronounced del, shir, shir, mehmâni. So yes, in Iranian Persian, i and u are always long, but that is not the case for other dialects, which can make unmarked transliterations very confusing; that's why I think a table in the pronunciation section can be helpful. It's fine for Iranian Persian to be the "default" since it's the largest dialect, but without multiple romanizations in the pronunciation section I feel it might be confusing for speakers of eastern dialects. Sameerhameedy (talk) 20:52, 9 March 2023 (UTC)[reply]
@Atitarev That's fine; it would be absurd of me to expect everyone to know the phonology of every standard dialect of Persian. I just keep mentioning it because I think it's important that there's some place where the Eastern (and/or classical) transliterations can be shown to avoid confusion. Sameerhameedy (talk) 20:55, 9 March 2023 (UTC)[reply]
@Benwing2: There is another transliteration difference I have recalled.
Similar to Iranian diphthong "ow" = Dari "aw" there is also
  1. Iranian diphthong "ey" = Dari/classical/Tajik "ay". E.g. غیرت (ğeyrat) vs Tajik ғайрат (ġayrat).
I made some assumptions but also need to clarify the long vowels "u" and "i" in modern Iranian: they are always transliterated with the plain letters "u" and "i", with no circumflex or macron. There is no short "u" or "i" in modern Iranian; they are "o" and "e".
The vocalisation in Persian may confuse a bit, if you only know Arabic vocalisations.
  1. رُو = row
  2. رو = ru (unmarked, long u)
  3. رَو = raw (happens in Dari and classical, not modern Persian)
  4. رِی = rey
  5. رَی = ray (happens in Dari and classical, not modern Persian)
  6. ری = ri (unmarked, long i)
As a result, pls see how diphthongs differ and how they are vocalised:
  1. غِیرَت (ğeyrat) - modern Iranian
  2. غَیرَت (ğayrat) - Dari or classical
And
  1. نُوروز (nowruz) - modern Iranian
  2. نَوروز (nawrôz) - Dari or classical (not sure about long vowels in classical Persian or Dari)
Anatoli T. (обсудить/вклад) 03:31, 7 March 2023 (UTC)[reply]
@Atitarev Dari translations and phonology guides are very inconsistent, between different articles on English Wikipedia and even in non-wiki sources. However, unlike English Wikipedia, the Persian Wikipedia has multiple detailed articles about (standard) Dari pronunciation, and it even seems to be more in line with the phonology used by the fa-IPA template, the "Persian phonology" Wikipedia page, and the phonology used by the US Library of Congress for their standard transliteration. Using the Persian phonology Wikipedia page as a reference seems to be the current practice, but I can pull additional information from the Persian Wikipedia and add it somewhere(?). Also, it's worth considering a transliteration table (using a conversion system similar to fa-IPA) that presents the transliterations in a similar fashion to the transliteration tables used on the Korean Wiktionary. The table could even transliterate vocalized Arabic-script text into ~three transliteration styles. It would also be more consistent and eliminate the need to use dialect labels in the headers, as people can see which transliteration is being used by looking at the table. The only issue is that while Classical Persian and modern Dari can be converted to Iranian Persian, the reverse is not true, as Classical and Dari Persian have more complicated vowel rules than modern Iranian Persian. Sameerhameedy (talk) 05:07, 8 March 2023 (UTC)[reply]
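The one-way conversion described here can be sketched from the examples in this thread (dil -> del, shêr/shîr -> shir, mehmânî -> mehmâni, ğayrat -> ğeyrat, nawrôz -> nowruz). This is a Python sketch, not an agreed scheme. It assumes input that marks ma'rûf/majhûl vowels with î/û/ê/ô, as Sameerhameedy does, which is not the current WT:FA TR convention. It does not handle Iranian "leftover" words that keep ê/ô, consonantal w -> v, or final -a from ه -> -e, since the last cannot be told apart without the Arabic script.

```python
import re
import unicodedata

# Applied simultaneously by str.translate, so a long "î" turned into "i" is not then turned into "e".
VOWELS_TO_IRANIAN = str.maketrans({
    "i": "e",        # short i -> e
    "u": "o",        # short u -> o
    "\u00ee": "i",   # î (ma'rûf) -> i
    "\u00ea": "i",   # ê (majhûl) -> i
    "\u00fb": "u",   # û (ma'rûf) -> u
    "\u00f4": "u",   # ô (majhûl) -> u
})
VOWEL = "a\u00e2eiou\u00ea\u00f4"


def dari_to_iranian(tr: str) -> str:
    out = unicodedata.normalize("NFC", tr).translate(VOWELS_TO_IRANIAN)
    # Diphthongs: ay -> ey and aw -> ow, but only when not followed by a vowel.
    out = re.sub(f"ay(?![{VOWEL}])", "ey", out)
    out = re.sub(f"aw(?![{VOWEL}])", "ow", out)
    return out
```

Going the other way would need lexical information, because several Dari vowels collapse into a single Iranian one (shêr and shîr both become shir).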
There was consensus with a supermajority, so I don't know what you are saying. — Fenakhay (حيطي · مساهماتي) 09:52, 6 March 2023 (UTC)[reply]
Since nobody's pointed it out (unless I missed something), the reason there's a mix of two different symbols on the انت page is that the change we made only affected how they appear when rendered through the template, but a lot of the entries still have manually entered transliterations. Undoing what we did won't fix that problem .... we are going to either write a script and run it through a bot, or slog through the entire list of Arabic entries replacing the transliterations one by one. Hopefully the former option will work, and I think that is our plan. Soap 13:20, 6 March 2023 (UTC)[reply]
@Soap, Fenakhay: This discussion is about Persian, not Arabic. User:Benwing2 didn't oppose the change, and he did mention in the relevant discussion that a lot of manual transliterations need to be updated. I guess the point is that Benwing2 doesn't want to rush it, but wants to think it through and check thoroughly.
Re: Arabic transliteration, it's not complete until manual transliterations (standard and possibly dialectal) are updated as well. WT:About Arabic needs to be brought in line with the decision; otherwise, any changes might be considered vandalism by someone who hasn't seen the vote and the discussion. Anatoli T. (обсудить/вклад) 23:04, 6 March 2023 (UTC)[reply]

Arabic transliteration change not done right

@Soap, Fenakhay I stand by my "half-assed" comment. IMO *before* making the switch, you should have already written the bot script to go through and fix the manual transliterations appropriately, and then *as soon as* the switch was made, run the bot script, so there isn't a multiweek or likely multimonth period (as currently) where we have a mess of inconsistent transliterations. You should have also already rewritten WT:About Arabic and have it ready to slot into place as soon as the switch is made. In sum, you need a carefully-thought out plan before making major, breaking changes like this. And undoing the change *will* fix things by putting things back to status quo ante, where the manual and automatic transliterations were consistent with each other. So I have to ask now ... who is going to write the bot script to fix things, and when? Or are you expecting "someone else" (e.g. me) to do it for you? And who is going to update WT:About Arabic, and when? Benwing2 (talk) 09:00, 8 March 2023 (UTC)[reply]
@Soap, Fenakhay I am going to revert the change to the Arabic transliteration scheme in 2 days unless I hear from one of you about a concrete plan to fix the manual transliterations to match the new auto-transliteration scheme. I am not against the new scheme per se but I am definitely against having an ill-thought-out transition plan and a resulting mishmash of transliterations. Benwing2 (talk) 06:43, 9 March 2023 (UTC)[reply]
I imagine you pinged me because I posted above, but this really doesn't involve me. While I've been eagerly following this issue, my only participation in the original thread was to ask a question about web fonts that is entirely moot now, and my only participation in this thread so far has been to point out that it wasn't a bug or a failure on our part that caused the incomplete transition to the new transcription. So, in essence, your demand is really addressed to Fenakhay alone .... maybe they can handle this situation alone, but I would urge you to contact other people who were involved in the original thread, particularly those who voted in it, and wait for their input before taking any action. Some people who are invested in this may not be aware that the discussion has been revived, as the title of this section suggests it's about Persian. Soap 08:52, 9 March 2023 (UTC)[reply]

Persian transliteration cleanup by bot

@Atitarev I have some questions about Persian. For example, for Arabic-script ayin ع, WT:Persian transliteration lists four possibilities: ', a, e, o. Under what circumstances do those four possibilities occur, and can you give examples of each one (as many as possible)? Likewise for Arabic-script ه, with two possibilities given (h, e); Arabic-script و, with five possibilities given (v, u, ow, ô, aw) (you gave some info on this above but some things are still unclear); and Arabic-script ی, with three possibilities given (y, i, ê). I'm asking because I don't know Persian-language phonotactics (i.e. which sounds can occur where in a word). Thanks! Benwing2 (talk) 06:50, 9 March 2023 (UTC)[reply]

BTW @Atitarev speaking of Dari vs. Classical vs. Iranian, I see there is already an etymology-only code fa-cla for Classical Persian, and another such code prs for Dari. Why aren't these being used regularly in Dari-specific translations? This should make it possible to have different vocalization and transliteration schemes for Dari vs. Iranian Persian, which should (maybe) resolve the complaints of the native speakers. Benwing2 (talk) 07:46, 9 March 2023 (UTC)[reply]
@Benwing2: True but all three languages/varieties are treated as one language. Only Tajik is separate. Anatoli T. (обсудить/вклад) 07:55, 9 March 2023 (UTC)[reply]
@Benwing2:
  1. ع - should always be ', IMO. The vowels are the written or unwritten vowels BEFORE/AFTER the letter, not instead of it. E.g. عراق ('erâq). The vowel is AFTER the 'ain. This part needs more clarity in the doco.
  2. ه - "h" in most cases. "e" as the word final after a consonant (+ a short unwritten vowel), for Dari/classical it's "a". Kind of like Arabic ة‎. ﮥ‎ is e-ye in Iranian. So e + ezâfe.
    1. "h" - ایستگاه (fa) (istgâh). It's final but after a long vowel.
    2. -e/-a (Iran/Afghanistan) - کیسه (fa) (kise) (Iranian), خریطه (xarita) (Dari)
    3. -e-ye/-a-ye (Iran/Afghanistan) -کیسۀ پلاستیکی (kise-ye pelâstiki) (Iranian), خریطۀ پلاستیک (xarita-ye palâstik) (Dari)
  3. ی - "i" after consonants (long vowel). fatha (aka fathe, zebar) + ی = "ay" (perhaps not applicable for modern Iranian). kasra (aka kasre, zir) + ی = "ey" (different from Arabic). "y" after any other vowel, long or short or before vowels.
    1. Please see my example غِیرَت (ğeyrat) above. It's "ey" but the first "e" is an unwritten vowel.
    2. کیسه (kise) a long "i".
    3. "ê" - is irregular, used in some native and loanwords
  4. و - "v" before vowels (written or unwritten)
    1. و - ‎"u" after consonants (long vowel). fatha (aka fathe, zebar) + و = "aw" (perhaps not applicable for modern Iranian). damma (aka zamme, piš) + و = "ow" (different from Arabic). "w" after any other vowel, long or short or before vowels. I noticed sometimes people inconsistently transliterate it as "v" or "w" after a long alef.
    2. Please see my example نُوروز (nowruz) above. So "و" is "ow" but the first "o" is an unwritten vowel. I use "ow" for transliteration purpose, even though the actual pronunciation may differ in colloquial Iranian. (A question to the community would be - is it a "w" or "v" in cases like واو - "vâv" or "vâw"?)
    3. کُروز (koruz). "piš" (damma, zamme) results in a short "o", unmarked consonant + "و" - a long "u".
    4. "ô" - is irregular, used in some native and loanwords
Anatoli T. (обсудить/вклад) 07:54, 9 March 2023 (UTC)[reply]
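The rule given for word-final ه can be sketched as a heuristic a bot might use when the Arabic script is available. The rule is "h" after a long-vowel letter, otherwise the vowel "e" (Iran) or "a" (Dari/classical). The function name and the dialect parameter are hypothetical. ا/و/ی before ه can also be consonants or part of a diphthong, and short words such as ده (dah, "ten") have a consonantal ه after a consonant. Those cases would still need a human. ۀ is not handled.

```python
# Persian letters that write long vowels: alef, vâv, ye (U+06CC).
LONG_VOWEL_LETTERS = {"\u0627", "\u0648", "\u06cc"}
HE = "\u0647"  # ه


def final_he(word: str, dialect: str = "iranian") -> str:
    """Heuristic transliteration of a word-final ه (see caveats above)."""
    if not word.endswith(HE):
        raise ValueError("word does not end in he")
    if len(word) > 1 and word[-2] in LONG_VOWEL_LETTERS:
        return "h"  # e.g. ایستگاه (istgâh)
    # After a consonant (with an unwritten short vowel) ه is itself a vowel.
    return "e" if dialect == "iranian" else "a"
```

For example, final_he("کیسه") gives "e" (kise), and final_he("خریطه", "dari") gives "a" (xarita).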

offrir#Conjugation has error caused by fr-conj-auto template

Equinox 08:22, 6 March 2023 (UTC)[reply]

@Equinox What is the error? I don't see any error. Benwing2 (talk) 08:30, 6 March 2023 (UTC)[reply]
It's not parsing the link that appears right before the table. It says
its past participle [[{stem}ert#French|{stem}ert]] is irregular.
At least that's the error I see. Soap 10:01, 6 March 2023 (UTC)[reply]
Done: fixed by User:Theknightwho. Equinox 08:16, 8 March 2023 (UTC)[reply]

Edit incorrectly identified as harmful (2)

Has the Wiktionary security become oversensitive?

This is the second time in a few weeks that one of my posts/edits has been wrongly flagged as harmful. Even stranger, unlike the previous instance, this time I've got no links, and I'm not even editing an article, but rather just attempting to reply to continue a conversation thread (which I initiated).

Here's what I tried to add as a reply to User:Equinox at Wiktionary:Requested_entries_(English)#A:

///Blocked here too, so I'll try to split it///

—DIV (1.145.32.254 12:26, 6 March 2023 (UTC))[reply]

\\\IGNORE TEXT BELOW\\\

Thanks for the contribution.
—DIV (1.145.32.254 12:28, 6 March 2023 (UTC))[reply]
At that Reddit discussion, which is initiated with the spelling '''om''', there are then several variations mentioned, including
1.145.32.254 12:28, 6 March 2023 (UTC)[reply]
///STILL BLOCKED. IT DOES SEEM TO BE STUBBORNLY FIXATED ON THE FORMATTING. TRYING ANOTHER APPROACH///
///PLEASE READ Q AS ' ///
///OK, this is giving me the shits. I can't post the content it's complaining about even in modified form. Maybe it's not the formatting???/// 1.145.32.254 12:34, 6 March 2023 (UTC)[reply]

\\\IGNORE TEXT ABOVE\\\

\\\UNFORTUNATELY WIKTIONARY IS BLOCKING ME FROM DELETING MY OWN (SUPERSEDED) COMMENTS TO MAKE THIS MORE READABLE  :-p \\\


///NO, THE ISSUE IS NOT THE FORMATTING, IT'S THE SPELLING. IT FORBIDS ME FROM USING THE SPELLING WITH MANY REPEATED LETTERS, AS PER THE REDDIT DISCUSSION CITED BY EQUINOX./// 1.145.32.254 12:36, 6 March 2023 (UTC)
Thanks for the contribution.
At that Reddit discussion, which is initiated with the spelling om, there are then several variations mentioned, including "'''''Um⁵ er⁶''' in Leicestershire!''" and "''Ours was so stretched out it was basically “'''om-muh-ne³r⁷'''”''".
I would imagine that, for attestation, it might be mentioned in published diaries/memoirs or children's books, with who-knows-what spelling. But actively searching for it seems a difficult task; it's more a matter of keeping an eye out in case it crops up.
—DIV (~~~~)
The only things I can think of are that I've used a bunch of bold and italic (but not necessarily overdoing it, and I think the syntax is OK), and — yes — I'm editing without logging in to an account (but that's not alluded to in the warning message).
THE PROBLEM APPEARS TO BE THE REPEATED LETTERS. NOW SUPERSCRIPTS INDICATE HOW MANY TIMES EACH LETTER SHOULD BE READ AS APPEARING.

ORIGINAL WARNING MESSAGE:

Warning: This action has been automatically identified as harmful.
Unconstructive edits will be quickly reverted, and egregious or repeated unconstructive editing will result in your account or IP address being blocked. If you believe this action to be constructive, you may submit it again to confirm it.
A brief description of the abuse rule which your action matched is: probably vandalism. If you believe your edit was flagged in error, you may report it on the Wiktionary:Grease pit.
—DIV (1.145.32.254 12:41, 6 March 2023 (UTC))
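The filter that matched is not publicly visible, so its exact rule is unknown. As a hedged illustration of the "repeated letters" diagnosis above, here is a hypothetical Python check that flags any word character repeated six or more times in a row. A rule of that shape would catch an elongated spelling but not the superscript workaround.

```python
import re

# Hypothetical rule, NOT the actual Wiktionary abuse filter (which is private):
# flag any word character followed by five or more copies of itself.
REPEATED_CHAR = re.compile(r"(\w)\1{5,}")


def trips_repetition_rule(text: str) -> bool:
    """Return True if text contains a run of 6+ identical word characters."""
    return REPEATED_CHAR.search(text) is not None


print(trips_repetition_rule("om-muh-ne³r⁷"))       # False: no long run
print(trips_repetition_rule("ommmmmm-muh-nerrr"))  # True: 'm' repeated 6 times
```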
Sorry you had such a hard time with this. It might be that the filters are more easily tripped in the Wiktionary namespace because it is the only non-talk namespace where conversations often happen. That would explain why you couldn't post either on the original page or on this page. I can't see the filter, so I can't say if it's been recently changed, but it might just be a coincidence that this has happened to you twice in a short time if the hits are for unrelated things.
Any administrator would be able to see the message you were trying to post through your contributions page, so all you'd need to do here is say that there was a false positive, but sometimes no administrator is watching this page at a particular time and the response might not be immediate. Another idea if you need to post something but can't is to use Pastebin and then link to that. Soap 12:55, 6 March 2023 (UTC)
Ah, thanks for that information. I assumed that I had to somehow communicate what I was trying to post/edit, because I was under the (mis)apprehension that when the post/edit was blocked no data was stored.
Could you please confirm: did you mean that only administrators (with elevated privileges) can see that information, and nobody else can? Certainly I can't see any detail of the blocked edits/posts at Special:Contributions/1.145.32.254, only the 'accepted' posts/edits.
I guess that could be useful, although it also means that any other people perusing this forum might have difficulty following the discussion.
Just wondering: suppose one of these Administrators goes to my contributions page and finds the blocked content. Would they be able to (re)post it successfully to, say, Grease Pit? Or would Wiktionary block them in the same way that I was blocked?
—DIV (1.145.32.254 14:34, 7 March 2023 (UTC))
The blocked edit won't show up in the main page of your contributions, but if you click the "abuse log" link under the header, you will see a list of various actions that triggered the abuse filter. An admin sees additional information on some of those actions, like the text of attempted edits. — Eru·tuon 22:12, 7 March 2023 (UTC)
Just so you know, if you set up an account and get "confirmed" status (4 days of using the account), you won't have that type of edit blocked anymore. Better for privacy (keeps people from knowing where you are based on your IP address) and lets people know who you are even if you switch IP addresses. — Eru·tuon 22:29, 7 March 2023 (UTC)

Reply button shown, but not allowed to use it.

Have I entered the world of Kafka?

Now I try to reply using the reply button at Wiktionary:Grease_pit#Edit_incorrectly_identified_as_harmful, and I get the message:

The "reply" link cannot be used to reply to this comment. To reply, please use the full page editor by clicking "Edit".

Why show it if I can't use it?!?

Sure, this was from last month, but it doesn't seem to be archived per se, given that it appears directly on Wiktionary:Grease_pit.

—DIV (1.145.32.254 13:01, 6 March 2023 (UTC))

I think everyone gets that. It's not related to permissions. I don't know what's going on under the hood, but it might be related to comments being shifted around such that in some cases the software cannot keep track of what is a reply to what. On the old MediaWiki there were no threads, and "replies" were nothing more than comments indented with colons. It's possible that if just one person in a thread uses the old-school way of replying to a comment, the entire thread must then be parsed in the old manner, meaning the reply button breaks for everybody. Just a guess. But it definitely has nothing to do with being an IP or any other usergroup permission level. Soap 13:06, 6 March 2023 (UTC)
Try replying at Wiktionary:Grease_pit/2023/February#Table_width. I've found that the reply buttons work on the monthly pages, but not on the nominally undivided pages. This annoying limitation has been here for as long as I've been aware of the reply buttons. --RichardW57m (talk) 15:34, 6 March 2023 (UTC)
Thanks for the thoughts on this. I wasn't really a fan of the reply button when it was introduced, because I felt I'd never needed it before.
But, to be fair, it could be a convenient way to ensure consistent indentation. And so I have gradually gotten into a new habit of using them.
Soap: you might well be correct. If that's what happens, then I wish the 'back end' were smart enough to figure out that the relevant reply button(s) should be hidden, or greyed out!
RichardW57m: that's interesting. I can confirm the same behaviour for me: the reply buttons on the "Table width" discussion work fine at Wiktionary:Grease_pit/2023/February#Table_width, but don't work at Wiktionary:Grease_pit#Table_width.
—DIV (1.145.32.254 14:26, 7 March 2023 (UTC))

Serbo-Croatian accel forms: Cyrillic/Latin spelling

I just noticed that the autogenerated Serbo-Croatian accel forms do not show the Latin spelling when applied to Cyrillic entries. For instance брезо.
Interested parties: @DanielWhernchend, Kamen Ugalj, BabaGlupa, Dijan, Ivan Štambuk, Mladifilozof, Tomispev, Bongo4561, Drugoveda, Sjevtic, Biblbroks
Gorec (talk) 15:55, 6 March 2023 (UTC)

@Горец: I've created Module:accel/sh to automatically use {{sh-noun form}} for nouns. — Fenakhay (حيطي · مساهماتي) 17:08, 6 March 2023 (UTC)

{{l}} and {{m}} inside {{ngd}}

... seem to be broken atm, as any preceding space gets deleted: {{ngd|A {{m|en|mention}} and a {{l|en|link}}}}A mention and a link. —Al-Muqanna المقنع (talk) 01:40, 7 March 2023 (UTC)

Fixed. Theknightwho (talk) 03:53, 7 March 2023 (UTC)

Can't create the page nor find the definition, due to Grease pit. The quotation I found is:

    • 2010, Otis Gospodnetic, Erik Hatcher, Michael McCandless, Lucene in Action
      Field(String name, TokenStream, tokenStream, TermVector, termVector) allows you to preanalyze the field value into a TokenStream. Likewise, such fields aren't stored and are always analyzed and indexed.

Also, the word "artistics" doesn't have any definition, but its quotations are found at Citations:artistics. 176.88.80.215 17:13, 7 March 2023 (UTC)

Done. Equinox 08:16, 8 March 2023 (UTC)

Languages per page

Is there a way to find out how many pages have 2 languages, 3 languages, 4, etc.? Thank you. ‑‑Sarri.greek  I 00:13, 8 March 2023 (UTC)

I don't know if there's an easy way to get L2 counts, but I was curious, so I ran some stats on a recent export. A has the most L2 sections, with 229! JeffDoozan (talk) 01:46, 9 March 2023 (UTC)
Thank you @JeffDoozan. I don't know how you did it, but it is interesting. (I was thinking of a horizontal toclimit2 TOC for the pages with too many L2s, but I don't know how to do it. Example: te.) ‑‑Sarri.greek  I 02:39, 9 March 2023 (UTC)
Could someone do something like User:Sarri.greek/toc2-hor? I cannot write CSS, and I do not know exactly how such varieties of TOC modules could be done. ‑‑Sarri.greek  I 05:20, 9 March 2023 (UTC)
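JeffDoozan doesn't say how the stats were produced. One plausible approach, sketched here in Python, is to count level-2 headings per page once the page texts have been extracted from a dump. The pages dictionary and the helper names are hypothetical, and reading the XML dump itself is not shown.

```python
import re
from collections import Counter

# A level-2 heading is "==Name==" on its own line; "===Noun===" and deeper
# headings are excluded because the third character must not be "=".
L2_HEADING = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def count_l2(wikitext: str) -> int:
    """Number of language (L2) sections in one page's wikitext."""
    return len(L2_HEADING.findall(wikitext))


def l2_histogram(pages: dict[str, str]) -> Counter:
    """Map 'number of L2 sections' -> 'number of pages with that many'."""
    return Counter(count_l2(text) for text in pages.values())


# Hypothetical miniature "dump": title -> wikitext.
pages = {
    "te": "==English==\n===Noun===\nx\n==French==\n===Verb===\ny\n",
    "wiki": "==English==\n===Noun===\nz\n",
}
print(l2_histogram(pages))  # Counter({2: 1, 1: 1})
print(max(pages, key=lambda title: count_l2(pages[title])))  # te
```

On a real dump, sorting the histogram keys gives the "how many pages have 2, 3, 4... languages" table directly.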

CSS columns for translations is broken on mobile

If we look at the translations on mobile at https://en.m.wiktionary.org/wiki/wiki (wiktionary_en_all_maxi_2023-02.zim for a permalink), the columns are broken. In the ZIM, this didn't happen for articles like "permalink", but it happens on the web now, so it might be that it was working fine for outdated articles due to Kiwix's caches. This means things were working before, and I see the width was specified in the li elements, though that is quite an outdated way to do things in CSS. Currently, there is a style @media (min-width: 720px) { .content table { width: auto !important; } } that is overriding the needed inline style width: 100%; on the table. Unchecking the width: auto !important; in DevTools gets the columns to work again. I don't know whether it is the translations table or the global style that needs to be changed. Can someone please fix this? Daniel.z.tg (talk) 03:57, 8 March 2023 (UTC)

Wow, that looks terrible. I didn't realise. The following sections need to be removed from MediaWiki:Mobile.css:
  • lines 1-3 (.translations, .translations tr, .translations td, .translations td+td+td)
  • lines 9-12 (.translations td)
  • lines 23-25 (.NavContent ul)
@Fytcha, Erutuon? This, that and the other (talk) 00:23, 9 March 2023 (UTC)
@This, that and the other: Removed those style rules because they don't make sense to me and one of them causes bullets on the list to disappear. I can see that translations look bad in mobile because there is just one column that doesn't take up the full width, but I haven't been able to confirm that the style rules mentioned in the original post cause the problem. — Eru·tuon 05:29, 9 March 2023 (UTC)
@Erutuon good point, I am not exactly sure where the style rule mentioned by @Daniel.z.tg originates. Perhaps it comes from the MediaWiki built-in style sheets. This, that and the other (talk) 06:00, 9 March 2023 (UTC)
I have a style containing table.translations { width: 100% !important; } in a personal userscript to fix this locally. I don't like this solution, though, because !important is a code smell. Daniel.z.tg (talk) 03:01, 10 March 2023 (UTC)
@Daniel.z.tg: Could you try removing that style rule and see if the problem you spotted still remains? I actually wasn't completely clear on the exact problem in your original post, but the translations look better after the change I made. The yellow background of the translations now extends across the translation box whereas it didn't before. — Eru·tuon 03:22, 10 March 2023 (UTC)
@Erutuon: I looked at the new mobile site on desktop. It does look visually better after you fixed it. However, there are still no columns, and the new style does not apply to my phone unless I rotate it into landscape orientation.
The thing is that I've applied the userstyle to Kiwix only as I only go on web to edit. Even though I'm on desktop, it's based on the mobile scrape. There I've forced all NavContent to be opened initially so I don't have to click. Without columns, the opened translation boxes make it a pain now to scroll down from the noun section to the verb section.
Even without that, and using a normal phone instead, I feel like we can have two columns for mobile. We can let it autodetect columns for desktops that have the mobile site open. There are just so many translations and it took me a tiring 10 swipes of my thumb to scroll through them. Daniel.z.tg (talk) 03:22, 11 March 2023 (UTC)
@Daniel.z.tg: I just removed style rules... What style doesn't apply except in landscape? (I mean, what is the visual difference?) The columns aren't there because they were added to MediaWiki:Common.css, which doesn't apply to mobile. It might make more sense to put them in a TemplateStyles page that's added by {{trans-top}} (probably best name for it Template:trans-top/styles.css) because then a single stylesheet can apply to desktop and mobile. — Eru·tuon 07:42, 11 March 2023 (UTC)
Actually, added column width styles for translations to MediaWiki:Mobile.css because it's good to try to fix it immediately, and I forgot, it wouldn't make sense to put them in Template:trans-top/styles.css because they might (eventually) be used for things besides translations. (The classes are just called multicolumn-list, multicolumn-list-wide and multicolumn-list-narrow, nothing naming-wise to do with translations.) — Eru·tuon 07:50, 11 March 2023 (UTC)
Is it possible that it's based on screen space rather than whether the user is on mobile or desktop? See my comment below: even on desktop I see just one column because I have what is nowadays considered a small screen, but all I need to do is zoom out in the browser to 90% and I see two columns like everyone else. PHP can dynamically reorganize the layouts of websites, and I think it might be possible even just in CSS, so the code we need might not be in Mobile.css. Soap 08:39, 11 March 2023 (UTC)
@Soap: When Daniel posted above, there was no mobile CSS making translations have columns. Now there is, so there will be columns when the screen is wide enough. — Eru·tuon 20:58, 11 March 2023 (UTC)
@Erutuon: Thank you for adding that style. It is now working as expected. Daniel.z.tg (talk) 01:44, 12 March 2023 (UTC)
@Soap: Yes, I was implying screen size by "landscape." Also, just forget about what I said about having to swipe 10 times to scroll. I think the majority of people instead would like to keep the current padding at least according to recent UI trends. Daniel.z.tg (talk) 01:44, 12 March 2023 (UTC)

Whatever caused this new change also affects small screens on desktop, but I don't want to distract from the issue at hand, since I suspect relatively few people besides me are having the problem (see below under {{lv-conj}}). I use magnified text, and the trend towards very high screen resolutions has passed me by. Still, I bring this up because there might be common code underneath it all, and it might not be related to Mobile.css. Soap 11:16, 10 March 2023 (UTC)

head2 and Hindi spellings on Urdu entries

Can the 2nd Hindi spelling display (fonts) be fixed for زُبان (zubān) / زَبان (zabān)? Anatoli T. (обсудить/вклад) 01:07, 9 March 2023 (UTC)

@Atitarev This is probably my doing, as I revamped Module:headword last night. The problem is that the second term is showing up as Urdu instead of Hindi. Let me look into it. Benwing2 (talk) 05:42, 9 March 2023 (UTC)
@Benwing2: Thank you!
In some cases, equivalent Hindi words are automatically displayed. I think this happens when the transliteration matches 100% and the entry exists.
Also @Theknightwho: Please consider "productionising" Urdu transliterations. They work if vocalisations are provided and not broken. Semi-automation is fine. Anatoli T. (обсудить/вклад) 05:48, 9 March 2023 (UTC)
@Atitarev What do you mean by "equivalent Hindi words are automatically displayed"? Is this correct or incorrect and can you give an example? Benwing2 (talk) 05:50, 9 March 2023 (UTC)
@Benwing2: I just got this ping from a few days ago.
If you look at کشور (various senses with multiple readings). Try removing |hi= on various headwords and you will see that Hindi is automatically added. It can be correct or incorrect, dependent on the automated Urdu transliteration.
  1. Removing |hi=किशवर on "kiśvar" generates non-existent Hindi किश्वर (kiśvar). The correct Hindi spelling is किशवर (kiśvar)
  2. {{ur-noun|g=m|head=کشور|hi=क्षौर}} transliterates "kaur" (OK for Hindi), the expected one for Urdu is "kśaur"
Also pls check دیش (deś). If you remove |hi=देश, no Hindi is automatically generated.
This functionality is quite interesting and is worth looking into and improving. Also @Theknightwho. Anatoli T. (обсудить/вклад) 21:42, 13 March 2023 (UTC)
@Atitarev This is not actually my doing; this never worked. The template code in {{ur-noun}} is complex and buggy; needs rewriting in Lua. It uses some strange module Module:ur-convert/sandbox created by User:Wyang but mostly developed by User:Tspielberg, who I haven't heard of before and who codes by trial and error (mostly error). This will take some non-trivial amount of fixing. Benwing2 (talk) 06:14, 9 March 2023 (UTC)
They've been contributing in Indic languages for quite a while, but they had their account renamed recently. Chuck Entz (talk) 06:44, 9 March 2023 (UTC)
When it comes to code, their modules are extremely difficult to deal with, and I’ve had to completely rewrite some of them. In part, it’s due to a bizarre formatting issue which I’ve spotted on about 10 modules: e.g. Module:new-Newa-translit. No idea how they managed that, quite frankly. Theknightwho (talk) 09:17, 9 March 2023 (UTC)
@Benwing2: I see. Thanks. Anatoli T. (обсудить/вклад) 21:49, 13 March 2023 (UTC)

@Theknightwho, Atitarev, Benwing2 – My apologies for pinging you all, but I've noticed another issue with Urdu lemmas and I was wondering whether it was related to this. It seems that in lemmas containing the letter ئ, it is being replaced with ی. For instance, I created the page چہ جائے کہ, but if it is linked with the ur language code, it sends the user to چہ جائے کہ, which doesn't exist. نعم البدل (talk) 06:29, 31 March 2023 (UTC)

Sorry, I've just realised it isn't related, but I was hoping you might know what is causing the issue anyway? نعم البدل (talk) 06:36, 31 March 2023 (UTC)
@نعم البدل: Sorry, it was my confusion. Please try again; I have undone the previous change in diff (I need to do the same for Ottoman Turkish).
The links should only be removed for Persian هٔ (link to ه). Anatoli T. (обсудить/вклад) 06:36, 31 March 2023 (UTC)
Yes, it seems to work now: چہ جائے کہ (cih jā'ye kih). Anatoli T. (обсудить/вклад) 06:38, 31 March 2023 (UTC)
Ah, that's fixed it, thank you! نعم البدل (talk) 06:39, 31 March 2023 (UTC)

Page cannot be created. One quotation I found for the past participle:

    • 1986, Raymond Carney, American Vision: The Films of Frank Capra, Page 37
      This is a man who, while claiming to be above the petty commercial compromises of the studio system, previewed and repreviewed each of his films in front of numerous audiences ; recorded their live responses scene by scene ...

And possibly for the verb:

    • 2012, Pariah S. Burke, ePublishing with InDesign CS6
      To remove a custom thumbnail and use the Aquafadas DPS automatically generated thumbnail of the article, click the Delete button and republish or repreview the article or project.

or:

    • 2008, xtine burrough, Michael Mandiberg, Digital Foundations: Intro to Media Design with the Adobe Creative Suite
      If you lift the lid and move the object, you will have to repreview in order to tell the scanner where to locate the selection.

I can't find "a repreview" in Google Books, nor any quotation that uses this word as a noun. 176.88.80.215 15:23, 9 March 2023 (UTC)

Why does {{lv-conj}} (template:lv-conj) only take up partial width?

I clicked a few pages for Latvian verbs and found that on all of them, {{lv-conj}} either has a manual width entered or uses the default inherited parameter, but in every case it compresses itself into the left side of the screen, so I need to use the scrollbar to see the forms. Is this by design? If so, why just this template? It may have been designed with mobile users in mind, but presumably if all of our other verb conjugation templates can get by without horizontal compression, so too can this. I use font magnification to make the screen easier to read, so it's likely that most other people aren't seeing this problem, and if it's literally just me, I'll let this go because I don't work with Latvian and haven't seen this behavior on any other language yet. Soap 08:35, 10 March 2023 (UTC)

The user who created and edited these templates, @Pereru, is long-inactive, so I've just gone ahead and changed it to force 100% width. Probably not an ideal solution, though; the template needs better styling in general and maybe migration to Lua (the nuts and bolts are at {{lv-conj-1}}). —Al-Muqanna المقنع (talk) 10:56, 10 March 2023 (UTC)

Translation adder: entries ending up in "Category:Terms with redundant transliterations/cmn"

Hi, @Benwing2, Erutuon, I notice that entries are ending up in "Category:Terms with redundant transliterations/cmn" (see, for example, absence makes the heart grow fonder). However, removing a pinyin transliteration means it doesn't actually appear at all (it isn't automatically generated by the adder), so it isn't redundant. Is this due to the recent edits to the adder?

Also, @Benwing2, hope the "?" and "?!" gender options for translations can be included in the adder soon! Thanks, guys. — Sgconlaw (talk) 22:21, 10 March 2023 (UTC)

@Sgconlaw Pinyin is now generated automatically in the majority of cases (including the example you gave). The only times when it isn’t are when it’s ambiguous. Theknightwho (talk) 23:47, 10 March 2023 (UTC)
Oh! Hmmm. The last time I noticed, the pinyin wasn’t automatically generated. However I can’t recall which entry it was. — Sgconlaw (talk) 04:40, 11 March 2023 (UTC)
@Sgconlaw Actually, it was probably a bit misleading to say “the majority of cases”, as it isn’t generated in the following cases:
  • If the target has multiple pronunciations (i.e. ambiguous).
  • If the target doesn’t have a pronunciation for that lect.
  • If the target hasn’t been created. However, it will work when you subdivide SOP terms with internal links, as with the second translation on absence makes the heart grow fonder: {{t|cmn|[[相見]][[不如]][[懷念]]}} becomes 相見不如懷念相见不如怀念 (xiāngjiàn bùrú huáiniàn).
We also have transliterations for many of the other lects like Cantonese (with the notable exception of Hakka and Min Nan for now). Theknightwho (talk) 11:51, 11 March 2023 (UTC)
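The three conditions listed above amount to a simple rule: look up each linked component and give up if any lookup is missing or ambiguous. The Python sketch below models that rule with a hypothetical READINGS table standing in for the pronunciation data of existing entries; it is not the actual Lua implementation behind {{t}}, and it only handles plain [[target]] links.

```python
import re
from typing import Optional

# Hypothetical pronunciation data: page title -> Mandarin readings on that page.
# The real data comes from the Chinese entries themselves, not a table like this.
READINGS = {
    "相見": ["xiāngjiàn"],
    "不如": ["bùrú"],
    "懷念": ["huáiniàn"],
    "長": ["cháng", "zhǎng"],  # two readings, so ambiguous
}

# Only plain [[target]] links are handled; piped links are ignored.
PLAIN_LINK = re.compile(r"\[\[([^\]|]+)\]\]")


def auto_pinyin(term: str) -> Optional[str]:
    """Return pinyin for term, or None if it can't be generated unambiguously."""
    parts = PLAIN_LINK.findall(term) or [term]  # subdivide [[a]][[b]] links
    syllables = []
    for part in parts:
        readings = READINGS.get(part, [])
        if len(readings) != 1:  # no entry / no reading, or ambiguous
            return None
        syllables.append(readings[0])
    return " ".join(syllables)


print(auto_pinyin("[[相見]][[不如]][[懷念]]"))  # xiāngjiàn bùrú huáiniàn
print(auto_pinyin("相見不如懷念"))  # None: no entry for the whole phrase
print(auto_pinyin("長"))  # None: ambiguous
```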
@Theknightwho: if a transliteration is not in fact redundant, and one is manually added, does the entry still appear in the category mentioned? It shouldn’t, otherwise the category becomes useless. — Sgconlaw (talk) 12:13, 11 March 2023 (UTC)
It should appear in Category:Terms with manual transliterations different from the automated ones/cmn (or the equivalent for whichever lect it is). Theknightwho (talk) 13:31, 11 March 2023 (UTC)
@Theknightwho: OK, great. — Sgconlaw (talk) 14:53, 11 March 2023 (UTC)
@Erutuon How difficult is it to add ?! to the translation adder? I am not familiar with the code but it seems it shouldn't be difficult. Benwing2 (talk) 04:33, 12 March 2023 (UTC)
@Benwing2: It could be added to the metadata object in MediaWiki:Gadget-TranslationAdder-Data.js, or maybe it could be appended automatically to any gender table that's not empty, and maybe something needs to change in MediaWiki:Gadget-TranslationAdder.js to add a checkbox for it and save its value and make sure it can't be combined with any other "gender". — Eru·tuon 21:29, 12 March 2023 (UTC)
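The last requirement mentioned, that "?!" must not be combined with any other gender, is a small validation rule. The gadget itself is JavaScript; the Python sketch below only models the rule, and the set of gender codes is a hypothetical stand-in for the metadata in MediaWiki:Gadget-TranslationAdder-Data.js.

```python
# Hypothetical set of gender codes; the real list is per-language metadata.
GENDER_CODES = {"m", "f", "n", "c", "?", "?!"}


def validate_genders(selected: list[str]) -> list[str]:
    """Return selected unchanged, or raise ValueError if it is invalid."""
    unknown = [g for g in selected if g not in GENDER_CODES]
    if unknown:
        raise ValueError(f"unknown gender code(s): {unknown}")
    # "?!" is exclusive: it cannot be combined with any other gender.
    if "?!" in selected and len(selected) > 1:
        raise ValueError("'?!' cannot be combined with other genders")
    return selected


print(validate_genders(["m", "f"]))  # ['m', 'f']
print(validate_genders(["?!"]))      # ['?!']
```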

Lua Memory Usage and Module Count

Is there a central location for Lua memory issues? I suspect there is, but I couldn't find one. My question is, "How important is it to keep the memory count low?". To keep the Lua memory usage low, I've been trying to do things in templates rather than the easier way of using modules. Someone reported that modules simply by existing consumed 500 kB apiece (global limit is 50 GB), but I recently took a spot sample that used 22 modules and came out with an average of 242 kB. Has Lua memory consumption dropped? --RichardW57 (talk) 09:36, 11 March 2023 (UTC)

It’s absolutely untrue that they consume 500kB each - using a new module inherently uses 80 bytes (I have tested this). However, using separate modules sometimes necessitates duplication, which is generally when it isn’t a good idea. The global limit is 50MB, by the way - not 50GB. Theknightwho (talk) 11:38, 11 March 2023 (UTC)
The size reported might then be an indication of the complexity of the 'average' module. Apparently something is saved between #invoke calls, even when the module is loaded by require rather than load, and it seemed to be quite a lot. It was probably @Erutuon or @Chuck Entz who reported the stats. We also have memory problems because of parallel execution, which seems to degrade the garbage collection. My statements are second-hand. --RichardW57 (talk) 13:45, 11 March 2023 (UTC)
Reducing the number of invokes does seem to help (which explains the success of {{multitrans}}). However, it would be wrong to assume each additional invoke uses ~500kB, as that would mean 100 uses of {{l}} or {{t}} would exhaust the memory allocation (when in reality some pages use them a few hundred times without needing any of the measures used to reduce memory usage).
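The arithmetic behind that objection, taking the claimed per-invoke cost at face value against the 50 MB global limit (decimal units):

```latex
100 \times 500\,\text{kB} = 50\,000\,\text{kB} = 50\,\text{MB}
```

So if every invoke really cost about 500 kB, a page with only 100 links would already hit the limit, which pages with several hundred links evidently do not.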
My experience is that using the same module over and over causes a logarithmic increase in memory usage, where the first invoke is generally memory intensive (e.g. a page with just one link might use 3-5MB), but each subsequent invoke uses increasingly less. This is presumably down to the use of mw.loadData instead of require. Theknightwho (talk) 00:58, 13 March 2023 (UTC)
@Theknightwho I am convinced that multiple calls to require made by a single module on a single page essentially load the module only once. This is based on the fact that implementing memoization of module loads to ensure that a given module is loaded only once from a given other module seems not to help. I'm not sure if this applies across source modules; if not we could potentially reduce memory usage dramatically by creating a global module memoization structure (if this is possible) so that e.g. something like Module:links is loaded only once per page even across multiple source modules. Modules for which this is done have to be careful not to have global variables storing state that they expect to be different across different invocations. User:Erutuon can you comment on this as I think you understand the innards of the Wikimedia/Lua module implementation a lot better than I do? Benwing2 (talk) 01:27, 13 March 2023 (UTC)
@Benwing2: There isn't memoization per page. require re-executes the module for every module invocation (#invoke:). However, during the first require of the module, it saves the module return value in package.loaded and returns the saved copy for every subsequent require of the module in that invocation (thus require("Module:links") equals a second require("Module:links")). That way there's no way to modify the module return value and have another module invocation see the modification (phab:T67258). I've thought about it a bit but I don't see a way to memoize modules on a page because functions can pass information between invocations in so many ways, and that messes up wikitext parsing. — Eru·tuon 02:01, 13 March 2023 (UTC)
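The behaviour described just above can be modelled in a few lines. The Python sketch below is not Scribunto code: Invocation is a hypothetical stand-in for one {{#invoke:}} with its own package.loaded table, and the module names and execution counter are illustrative only. It shows why repeated requires inside one invocation are cheap while separate invocations each re-execute the module, which is also why batching many calls into a single invocation, as {{multitrans}} does, helps.

```python
from typing import Callable


class Invocation:
    """Model of one {{#invoke:}}: a fresh environment per invocation."""

    def __init__(self, sources: dict[str, Callable[[], dict]]):
        self.sources = sources
        self.loaded: dict[str, dict] = {}  # plays the role of package.loaded
        self.executions = 0  # how many times a module body actually ran

    def require(self, name: str) -> dict:
        if name not in self.loaded:  # first require: execute the module body
            self.executions += 1
            self.loaded[name] = self.sources[name]()
        return self.loaded[name]  # later requires: the saved return value


def links_module() -> dict:
    """Stand-in for a module's body; returns its export table."""
    return {"link": lambda term: f"[[{term}]]"}


SOURCES = {"Module:links": links_module}

# Three separate {{l}}-style calls: three invocations, three executions.
separate = [Invocation(SOURCES) for _ in range(3)]
for inv in separate:
    inv.require("Module:links")
print(sum(inv.executions for inv in separate))  # 3

# One batched invocation handling three links: a single execution.
batched = Invocation(SOURCES)
links = [batched.require("Module:links")["link"](t) for t in ("a", "b", "c")]
print(batched.executions, links)  # 1 ['[[a]]', '[[b]]', '[[c]]']
```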
@Erutuon Thank you, that explains a lot. (BTW User:Theknightwho this probably interests you too.) Essentially what you're saying is that there's memoization of module requires at the invocation level, even across different source modules, but not across invocations. This explains why {{multitrans}} works so well, as effectively the entire translation table is inside of a single module invocation. It also suggests that doing the same trick more broadly would be highly successful; I have proposed doing this in the past with calls to {{l}}, {{m}}, {{lb}} and certain other calls that appear many times on a given page. I definitely think we should move forward with a solution like this in place of the {{*-lite}} templates, which are hacky as well as difficult both to maintain and to use, and (as Theknightwho points out) risk hitting the 10-second time limit because template code is so much slower than Lua code. I think you have tried rewriting {{multitrans}} so the page text is passed inside nowiki tags, and it parses the text itself, so you can write {{t}} instead of {{tt}}? How successful was that and can it be productionized or are there just too many edge cases? Even if we have to go the {{tt}} route, that is IMO a lot better than using {{*-lite}} templates, and no more work to implement; instead of changing the templates to their {{*-lite}} version and worrying about whether all the functionality is supported, you just replace with e.g. {{l$}}, {{m$}}, {{lb$}} etc., wrap in the appropriate {{multi...}} call and it just works. Thoughts? Benwing2 (talk) 02:28, 13 March 2023 (UTC)
@Benwing2 @Erutuon I have explored the idea of a Lua Wikitext parser, which could be one way to solve the issue you mention. See my rudimentary efforts at Module:User:Theknightwho/parser. This takes the Wikitext for a whole page, and parses each template manually by converting it into a Lua function (without using preprocess until *after* all invocations have been resolved, as it provides no memory gains). As it stands, it can only cope with a page with simple templates, though, such as T:head. Commented out is the start of a much more ambitious conversion of mwparserfromhell, but that is a very big job, and may prove totally impractical. Theknightwho (talk) 03:20, 13 March 2023 (UTC)
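The first step of any such parser is locating template calls and splitting their arguments while respecting nesting. Theknightwho's module is Lua and goes much further, turning each template into a function; the Python sketch below is only a hypothetical illustration of that first step. It deliberately ignores {{{parameters}}}, <nowiki>, comments, parser functions and named arguments.

```python
def top_level_templates(text: str) -> list[str]:
    """Return the source of each outermost {{...}} in text, nesting-aware."""
    found, depth, start, i = [], 0, 0, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # remember where the outermost call begins
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                found.append(text[start:i])
        else:
            i += 1
    return found


def split_template(source: str) -> tuple[str, list[str]]:
    """Split '{{name|a|b}}' into ('name', ['a', 'b']) at top-level pipes only."""
    inner = source[2:-2]
    parts, current, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):  # entering a nested template or link
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:  # argument separator
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts[0].strip(), parts[1:]


text = "a {{l|en|[[x|y]]}} b {{m|en|{{w|z}}}}"
for call in top_level_templates(text):
    print(split_template(call))
# ('l', ['en', '[[x|y]]'])
# ('m', ['en', '{{w|z}}'])
```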
@Theknightwho Yeah I think implementing all of mwparserfromhell, even if possible, would be impractical and very hard to maintain. It sounds like the {{l$}}, {{m$}}, {{lb$}} idea will be the better one; it's pretty easy to implement and it doesn't even require that every such call is converted to its $-equivalent. Benwing2 (talk) 03:27, 13 March 2023 (UTC)
@Benwing2 Are there many instances where these multi templates would be the most practical solution (assuming they’re single-purpose)? For example, multiple links are better served by a column template imo. Theknightwho (talk) 03:33, 13 March 2023 (UTC)
@Theknightwho I am not proposing a set of single-purpose multi templates, but a single multi template that supports multiple types of calls; essentially it should support everything for which we have a {{*-lite}} template currently. Benwing2 (talk) 03:36, 13 March 2023 (UTC)
@Benwing2 I see what you mean. The main issue that I've encountered with this is that section edit links disappear, which would therefore necessitate having this in every section. That's not the end of the world, but could start getting pretty messy (given how many headers are usually placed on a large page). Theknightwho (talk) 01:37, 14 March 2023 (UTC)
@Theknightwho Hmm, I see. I would still advocate for a solution of this sort over {{*-lite}} templates and wrap things language-by-language. That would mean loss of edit links for individual subsections within a language, but that doesn't seem like the end of the world. Is there any way to auto-generate the edit links? User:Erutuon do you have any input here? Benwing2 (talk) 06:07, 14 March 2023 (UTC)
@Benwing2 I’ve chatted with @Erutuon about this, and it seems that they appear if you run preprocess on Wikitext headers, but the section edit links don’t actually work properly. Have a look at Attempt #1 in the top answer to this StackOverflow question. The way section edit links work is that the parser gives each section a consecutive number - unrelated to whether the header is L2, L3 etc. - which allows it to draw up only the relevant section when you click the edit button (have a look at the section edit URLs, and you’ll see what I mean). However, section edit links added by preprocess in Lua have their own counter (I suspect because it runs a separate instance of the parser), so if you have 3 of them they’ll always be numbered 1, 2 and 3 - regardless of where they are on the page. Not only that, but it also depends on which frame object created them: by default, it’ll be sections 1, 2 and 3 of the module (which don’t exist). However, you can change these to any arbitrary page - not that it’s much use in most situations - and they’ll take you to the section edit page for section 1, 2 or 3 of that page (whatever those might be).
On top of this, the sections created via preprocess don’t seem to be recognised as separate (e.g. if you have an L2 and then transclude another L2 below it, the first section will include that transcluded L2 and whatever’s below it). That means that it’s effectively impossible to use this feature in a way that doesn’t cause confusing issues for a normal user. (Now I think about it, I bet this is also related to why in-line Reply doesn’t work on the collated community pages, as they rely on transcluding from the monthly pages.)
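A rough model of the numbering described above, in Python: sections get consecutive numbers in page order regardless of header level, while a fragment run through preprocess on its own starts counting from 1 again. The regex only recognises simple `== ... ==` headers on their own line, and `number_sections` is a hypothetical helper, not MediaWiki's actual code.

```python
from __future__ import annotations

import re

# A header is a line like "== Title ==" (levels 1-6), ignoring trailing spaces.
HEADER = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")


def number_sections(wikitext: str) -> list[tuple[int, int, str]]:
    """Return (section number, header level, title) for each header, in order.

    Numbers are consecutive across the text and do not depend on the level.
    """
    result = []
    for line in wikitext.splitlines():
        match = HEADER.match(line)
        if match:
            level = len(match.group(1))
            result.append((len(result) + 1, level, match.group(2)))
    return result


page = "==English==\n===Noun===\n====Translations====\n==French==\n===Noun==="
print(number_sections(page))
# Numbered as part of the whole page, "==French==" is section 4; numbered on its
# own (as a separately preprocessed fragment would be), it becomes section 1.
print(number_sections("==French==\n===Noun==="))
```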
This is actually the main reason I started working on a general-purpose parser, because the only solution to this I could think of is one where we never mix and match proper sections with Lua-generated ones:
  • Place the contents of a very large page at a subpage (e.g. a at a/raw).
  • Invoke a module on a to manually parse/process everything on a/raw. Because this is a single invocation, it has all the advantages of {{multitrans}} over the entire page, but it’s not simply a case of running preprocess, as that runs the internal parser (i.e. it runs each Lua invocation in its own environment, so no memory gains). In other words, it has to be a manual parse in Lua. I suspect we could get partial gains by doing a mix of both, though, which may be the way forward in the short-term.
  • Use a frame object associated with a/raw to preprocess all the “transcluded” headers. That way, all the section edit links will work correctly, as they’ll point to the relevant section on a/raw. However, this only works if we process the entirety of a/raw, or else the section numbers won’t match up.
Theknightwho (talk) 19:30, 15 March 2023 (UTC)[reply]
The effective limit on the number of modules was on the number of different modules; repetitions were not a problem, except insofar as, for example, different invocations of {{l}} invoke different transliteration modules, so one should prefer to share Devanagari transliterators instead of having one per language. My understanding was that the limits applied across all the Lua invocations on a page. I think the CPU time limit is on Lua execution, and does not include wikicode. (I don't know how wikicode invoked from Lua counts.) --RichardW57 (talk) 09:10, 13 March 2023 (UTC)[reply]

Optionally adding parameters within templates

Is there any flexible method for optionally adding parameters to a template call? The only method I can see is to have different template calls for different combinations of parameters, though just possibly one might reduce it to different calls for different numbers of optional parameters via the use of parser function #expr. There might be a trick via a bespoke module, but that seems horrendously complicated and raises other issues. --RichardW57 (talk) 10:52, 11 March 2023 (UTC)[reply]

@RichardW57 Can you clarify what the issue is? You can use e.g. |foo={{{foo|}}} to pass parameters from one template to the next. This will always set the inner foo to some value (the empty string if the outer foo is empty or undefined), but you should never write your code to have a distinction between undefined and empty string. You can test for undefined-or-empty-string using {{#if:{{{foo|}}}|...}}. Benwing2 (talk) 04:29, 12 March 2023 (UTC)[reply]
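To make the distinction concrete, here is a small Python model of the two wikitext constructs mentioned. `{{{foo|}}}` yields the argument's value if it is defined (even if empty) and the default otherwise. `{{#if:}}` only checks whether its condition is non-empty after trimming whitespace, so undefined and empty behave the same. The function names are illustrative only and are not part of MediaWiki.

```python
from typing import Dict, Optional


def triple_brace(args: Dict[str, str], name: str, default: Optional[str] = None) -> str:
    """Model {{{name|default}}}: the value if defined (even empty), else the default.

    With no default, MediaWiki leaves the literal '{{{name}}}' text in place.
    """
    if name in args:
        return args[name]
    return default if default is not None else "{{{" + name + "}}}"


def parser_if(condition: str, then: str, otherwise: str = "") -> str:
    """Model {{#if:condition|then|otherwise}}: true when non-blank after trimming."""
    return then if condition.strip() else otherwise


# Passing |foo={{{foo|}}} down always defines the inner foo ...
for outer in ({}, {"foo": ""}, {"foo": "bar"}):
    inner = {"foo": triple_brace(outer, "foo", "")}
    # ... and #if cannot tell "undefined" from "empty" anyway.
    print(outer, "->", inner, parser_if(inner["foo"], "set", "unset"))
```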
I get a module error from {{link}} if I pass it an empty string as the fifth positional parameter. I hadn't realised it was a bug in Module:parameters. My work-around is to interpose another template that filters out the fifth and subsequent positional parameters. That filter also solves the issue (to me, anyway) that {{link}} prefers empty |4= to non-empty |t= for the (optional) gloss.
What I wanted to write was, for the named parameter example, {{#if:{{{foo|}}}||foo={{{foo}}} }}, but the second pipe in '||' is interpreted as part of the control structure, and wrapping it in another template call doesn't work - it then gets interpreted as an ordinary character all the way through.
What I'm doing is to provide link and form_of templates that link to both a non-Roman form and its corresponding Roman script main lemma/form, e.g. පාචයති (pācayati). --RichardW57 (talk) 07:14, 12 March 2023 (UTC)[reply]
I've also discovered that I can't parametrically switch between positional and named parameters! Again, the work-around is to call a filter that chops out the unpermitted named parameters. --RichardW57 (talk) 19:58, 12 March 2023 (UTC)[reply]
@RichardW57 I don’t understand why you are trying to pass parameter 5 to {{link}}, because it wouldn’t make any sense to do so. The reason Module:parameters throws an error is because parameter 5 is not used by the link template, not because of a bug. If that were changed, parameter 5 still wouldn’t do anything. This is not new, either - it’s a standard feature of many major templates.
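A hedged Python illustration of the two behaviours discussed in this thread. The first function is a validator that rejects any parameter the template does not declare, even when its value is empty, as reported for Module:parameters. The second is a filter of the kind RichardW57 describes interposing, which drops undeclared parameters before the call. The declared set is an assumed, simplified subset rather than the real parameter list of {{link}}, and the error text is made up.

```python
from __future__ import annotations

# Simplified, assumed subset of the parameters a {{link}}-style template declares.
ALLOWED = {"1", "2", "3", "4", "t", "tr", "ts", "sc", "pos", "lit", "id", "g"}


def validate(args: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """Raise on any undeclared parameter, whether or not its value is empty."""
    unknown = sorted(set(args) - allowed)
    if unknown:
        raise ValueError("Parameter(s) not used by this template: " + ", ".join(unknown))
    return args


def drop_undeclared(args: dict[str, str], allowed: set[str]) -> dict[str, str]:
    """The work-around: silently discard parameters the target does not declare."""
    return {key: value for key, value in args.items() if key in allowed}


call = {"1": "pi", "2": "pacati", "4": "", "5": ""}  # empty 5th positional
try:
    validate(call, ALLOWED)
except ValueError as err:
    print(err)  # Parameter(s) not used by this template: 5
print(validate(drop_undeclared(call, ALLOWED), ALLOWED))
```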
By the way, I strongly urge you not to create a custom link template, as we are trying to migrate away from that. Surely what you want could be achieved by enabling linked transliterations? Theknightwho (talk) 00:30, 13 March 2023 (UTC)[reply]
@RichardW57 Agree 100% with User:Theknightwho. Benwing2 (talk) 00:56, 13 March 2023 (UTC)[reply]
@RichardW57 Fix ping. Benwing2 (talk) 00:56, 13 March 2023 (UTC)[reply]
I've already told you why simply linking all transliterations of Pali wouldn't work. Quite simply, a transliteration isn't always good Roman script Pali. @Benwing2 said that one should not distinguish between a parameter set to the empty string and its absence, but this is exactly what Module:parameters does. I understand why - excess parameters usually indicate some sort of error in the ultimate caller. The problem here is that I use a general purpose template {{pi-ml}} to separate arguments of the link templates that belong on the central, Roman script lemma and those that belong on the non-Roman script inflectional, possibly derived, base. As a generalisation of {{link}} and {{mention}}, it was working quite a while ago. It also supported generalisations of five form-of templates. Getting it to support {{pi-nr-inflection of}}, which was working fine until you arrogantly crippled it, was what took the hard work. I ended up with it passing an extra 10 arbitrarily named parameters and 28 (a bit of overkill on the number of inflection tags) positional parameters. The problem is that I couldn't trim these numbers back for specific cases in {{pi-ml}}. --RichardW57 (talk) 08:57, 13 March 2023 (UTC)[reply]
@RichardW57 Quite honestly, I am finding it difficult to discern a coherent point here. Why does any of what you've just said necessitate passing an additional parameter to {{link}}? It would do absolutely nothing under all circumstances. What have I so arrogantly crippled, in your view? Module:parameters has been in place for a lot longer than I've been editing Wiktionary.
As for transliterations not always being good Roman script Pali - that's all very well, but that raises the question of why you want to link transliterations on an ad hoc basis. It's just confusing and unintuitive, and suggests you need to come up with a more systematic method of distinguishing transliterations from links to Roman script entries (even if these are sometimes the same). Pali is far from the only language to have this issue. Theknightwho (talk) 01:35, 14 March 2023 (UTC)[reply]
@RichardW57 In general I agree that you shouldn't have custom link templates. Can you give some examples of where Pali transliterations and Pali Roman script diverge? Also looking at {{pi-ml}}, this stuff REALLY should be done directly in Lua. It would make your life 100x easier, honestly. Benwing2 (talk) 04:48, 14 March 2023 (UTC)[reply]
There are examples at Template:pi-link#Examples and further examples at User:RichardW57/WIP#Template:pi-nr-inflection_of; look for '⇨'. The reason for not doing it in Lua is that there appeared to be a per-module overhead, and therefore using extra modules should be avoided where possible. --RichardW57 (talk) 08:02, 14 March 2023 (UTC)[reply]
@Benwing2 I think this is exactly the kind of situation that the multiple form syntax is good for, though it would necessitate answering the question of how to control the formatting of transliterations when there are multiple forms involved: here, we'd probably want ພຸດໂທ (buddo)buddho, which contrasts with Chinese where we have 詞典词典 (cídiǎn). Theknightwho (talk) 10:04, 14 March 2023 (UTC)[reply]
Where is 'multiple form syntax' documented, and what is it? --RichardW57m (talk) 13:36, 14 March 2023 (UTC)[reply]
You disabled displaying links as transliterations. So far as I am aware, it's been restored. --RichardW57 (talk) 08:13, 14 March 2023 (UTC)[reply]
It wasn't my intention to pass an extra parameter to {{link}}; it's just that I have a general function that invokes link and form-of templates. I've now solved the problem by interposing a filtering template. Unfortunately, it increases inter-template coupling. For example, {{pi-link}} and {{pi-link core}} need to be sufficiently consistent. --RichardW57 (talk) 08:13, 14 March 2023 (UTC)[reply]
@RichardW57 On the topic of modules, your current solution is a lot less efficient in terms of Lua memory usage than doing things entirely in Lua, because it involves multiple invocations for a single template call. If you are concerned about memory usage, then what you are doing is completely counterproductive.
What really concerns me here is that you don't seem to understand that building a bespoke template ecosystem makes things a lot more difficult for you and everyone else. With your link template, it wouldn't be very difficult to integrate the features you want into the standard template (as they don't seem to be all that complex). On the other hand, your bespoke template still lacks three of the standard parameters, and leaves you having to constantly make these kinds of adjustments every time the main modules change.
Could you please provide me with a simple explanation of when transliterations are linked, and when they aren't? How is this controlled by the user, and is there any systematisation to it? Theknightwho (talk) 09:50, 14 March 2023 (UTC)[reply]
No, I'm not sure I can, but I'll try to give you an answer. Each case seems to need separate consideration. The general principle is to link to the main Roman script form when the reader may be interested in the meaning of the word, and we are likely to supply it. Mere transliterations are unlikely to meet CFI. For Pali, it so happens that the main lemma and the transliteration are usually the same. For Sanskrit, one would want to link to the Devanagari script form of the word; the push to add non-Devanagari Sanskrit seems to have subsided; non-Devanagari non-Roman Sanskrit seems to be hard to come by.
  1. For non-Roman Pali, the definition usually starts with {{pi-sc}}, which normally links to the equivalent standard Roman script Pali. (I am not aware of a standard policy for misspellings - they might link to an equivalent in the standard Roman orthography.) By default, it links to the transliteration, but this has to be overridden when it isn't the standard form.
  2. The transliteration in headwords should never be linked. At @Benwing2's request, it is suppressed (by a headword template) when it is the equivalent standard Roman script form; it will then be given in the {{pi-sc}} form.
  3. In form-of contexts and similar, the standard Roman script form should be linked to. This helps reduce the number of pages the user must traverse to find the information he is looking for. Remember that the full set of senses of a word should, with notable exceptions like Serbo-Croat, only be stored on one page.
  4. For inflections that are treated as derivative lemmas, e.g. past participles and causatives and passives, there should be a link to the Roman script form. This has only recently been implemented, by the deployment of {{pi-link}}.
  5. For (other) inflections in tables, there is an option (|showtr=) to suppress the transliteration (not needed since |subst= was implemented), to leave it untransliterated (the default), or to link it. There is no facility to link to an alternative Roman script target. One is generally unlikely to find a use for a link, though I suppose one might find a use for case forms of demonstratives and the relative pronoun. The motivation to add the switch was to have the ability to disable transliteration for difficult words, of which the king is probably ᨾᩉᩣᨻ᩠ᨵᩮᩥᩣ (mahābodhi). (The simplex can be just as bad, but I haven't found a usage of the cruciform spelling of the simplex in a durable source yet.)
  6. For mentions, it would depend upon the purpose of the mention. If the meaning of the word was irrelevant, I probably wouldn't link to the Roman script form. Non-Roman Pali ought to be rare in etymology, though I suppose there is a case for dual-linking if one can justify linking to a non-Roman script as the source of a borrowing. --RichardW57m (talk) 16:42, 14 March 2023 (UTC)[reply]
@RichardW57 Reading through this, the impression I get is that you should be finding some other way of linking to the Roman script Pali version than using the transliterate parameter (because - as you yourself already know - the Roman script version isn’t necessarily a transliteration). What you have effectively done is expand the scope of the tr= in a way that means it is used for multiple things (true transliterations and links to the Roman script form), and as such it’s very difficult to properly cater for this, because they’re two different things that work in different ways. I can think of other instances of this (e.g. Japanese does it too), and it’s similarly problematic. I strongly suggest we find a way to do this that separates them (from a technical standpoint). How they’re then displayed is something that doesn’t necessarily need to change. Theknightwho (talk) 19:56, 15 March 2023 (UTC)[reply]
@Theknightwho: Does the extra memory usage count against the Lua quota?
Which three standard parameters have I omitted? Did you test before or after 16.00 UTC on Sunday? I will admit that the handling of what is sometimes called |alt= (and is usually a damned nuisance and trap) needs some thought - I'm not really convinced it has a good use for non-Roman script Pali with dual linking. RichardW57 (talk) 19:56, 14 March 2023 (UTC)[reply]
I don’t understand your question about extra memory usage. Using multiple invocations uses up more of the 50MB Lua memory limit. As for the parameters, I went from what the documentation says isn’t supported. Theknightwho (talk) 19:58, 15 March 2023 (UTC)[reply]
@Theknightwho: But if the limit is on memory used for Lua, why should what the wikicode parser does be counted towards that limit? (I suppose some or all of the interpretation of #invoke might reasonably be counted.) Can the parsing of a page be interrupted because of a Lua error, such as quota exhaustion, even if it uses no Lua?--RichardW57 (talk) 21:16, 15 March 2023 (UTC)[reply]
The documentation hasn't been updated since before the weekend's edits. With a few exceptions, which I will list, I believe all the parameters of the language-independent base templates are interpreted under some name; if not handled explicitly they will be passed on to the language-independent base template. I had not previously worked out how to handle seemingly arbitrary differences between the various templates' parameters. ({{pi-ml}} started as a way of using the same code for {{pi-link}} and {{pi-mention}}; I then realised it could also support most uses of most form-of templates, and finally generalised it to create a dual-linking extension of {{inflection of}}.) Poor synonyms are mostly not handled, and for list-type parameters there is a limit to how many are passed on. For example, there seems no point in allowing four genders when Pali only has three. These limits are straightforward to relax. The language parameter is not supported - the templates would need to be renamed anyway if they were generalised to cover other languages. --RichardW57 (talk) 21:16, 15 March 2023 (UTC)[reply]
Additionally, for the core parameters, I don't think I should advertise them where the apportionment between non-Roman and standard Roman form still remains to be thought through and implemented. Changing parameter semantics can cause trouble. (I have noticed that I will need an |eqvid= for sense/etymology IDs on the Roman script form - the IDs seem to be underused.) Also, I'm not sure of the correctness (or even validity) of the decision (or private diktat?) that the earliest publications of Pali literature in Roman script do not consist of words that meet CFI. I hope that they are admissible as evidence for words when entered in modern orthography (a type of IAST). --RichardW57 (talk) 21:16, 15 March 2023 (UTC)[reply]
@RichardW57 You invoke Lua via other templates, such as {{xlit}}.
You say that all of the functionality of the language-independent templates is interpreted under "some name", but the main issue here is that you are building a separate template ecosystem that only works for Pali, which frequently breaks whenever changes to the main modules are made. This model is unsustainable, and I don't know how to make it any clearer to you that things would be a lot easier if the same functionality was achieved via the conventional templates. Theknightwho (talk) 21:31, 15 March 2023 (UTC)[reply]
The modules invoked by {{xlit}} would be implemented by method transliterate from Module:languages, which will usually be invoked by {{pi-sc}} from the same page and would also be invoked by a Lua implementation of the dual linking functionality. As the 'conventional templates', which this ecosystem invokes by fundamental design, also invoke Lua, there is nothing new in your observation. What we have is your apparently unfounded and implausible assertion that complicated Wikicode consumes Lua quotas. --RichardW57 (talk) 22:38, 15 March 2023 (UTC)[reply]
What actually broke in this ecosystem was the design-wise isolated template {{pi-nr-inflection of}}, which did however serve as a prototype for the more ambitious {{pi-ml}}. The revision of {{pi-ml}} to form a basis for {{pi-nr-inflection of}} did hit inflexibilities in wikicode, but they have been sidestepped. --RichardW57 (talk) 22:38, 15 March 2023 (UTC)[reply]
@Benwing2, Theknightwho: Is this meant to be a productive invitation to add this functionality to the language-independent modules and templates? Where should the changes be outlined? Module talk:links? Should the design address multi-script languages whose primary Wiktionary script is Latin, or should it also address multi-script languages whose primary Wiktionary script is another script? Serbo-Croatian might be an interesting case, accessing a detailed Latin or Cyrillic term to explain a Glagolitic term, as might Old Khmer where, last time I looked, the primary term might be in either script. --RichardW57 (talk) 22:38, 15 March 2023 (UTC)[reply]
@RichardW57 No, Richard, it is not "unfounded and implausible", because separate invocations of Lua from the same page inherently use more memory than one invocation, even if the overall result is the same. This is because separate invocations are each given their own environment, which inherently involves duplicating things in memory. It is designed this way so that they can't interfere with each other, and in fact, this is one of the most important design constraints that Scribunto has, and is precisely why {{multitrans}} is able to achieve large reductions in memory usage.
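A back-of-the-envelope model of this argument, with made-up numbers. Assume every separate invocation pays a fixed cost for its own environment. Also assume that this memory keeps counting against the page-wide limit, which is exactly the point RichardW57 goes on to question. Then N separate calls pay the overhead N times, while one batched call (as {{multitrans}} does) pays it once. The costs below are illustrative inputs, not measured Scribunto figures.

```python
LIMIT_KB = 50_000  # the 50 MB limit, rounded to 50,000 KB for easy arithmetic


def fits_separately(overhead_kb: int, per_item_kb: int) -> int:
    """Items that fit if each one is a separate invocation with its own overhead."""
    return LIMIT_KB // (overhead_kb + per_item_kb)


def fits_batched(overhead_kb: int, per_item_kb: int) -> int:
    """Items that fit if they all share one invocation, paying the overhead once."""
    return (LIMIT_KB - overhead_kb) // per_item_kb


# Made-up costs, purely to show the shape of the difference.
print(fits_separately(16, 4), fits_batched(16, 4))  # 2500 12496
```

If memory from finished invocations were fully released, the separate-call figure would no longer shrink this way. That is why the water/translations experiment further down the thread is useful evidence either way.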
If you can't accept that, then there is nothing I can do to help you, I'm afraid, but I will also make zero allowances for your bespoke system in any further adjustments to the main modules, as you don't seem willing to understand the problems with your approach. You'll have to figure any issues out yourself. On the other hand, if you can bring yourself to properly consider the advice some of us are giving you, then maybe we can actually get somewhere. Theknightwho (talk) 22:48, 15 March 2023 (UTC)[reply]
So, do you have evidence that the memory for these environments is not released when the invocation has finished execution? One problem that has been reported is that there is parallel execution of sections of pages, which leads me to believe that without a strict enough limit on the number of parallel processes, the page a will permanently cease to work once we have enough languages. It also explains why the memory consumption statistics vary from run to run. There also seems to be a major failing in Lua garbage collection in Scribunto when parallel processing is at work.
If the number of executions of #invoke is indeed a limiting factor that will outweigh the number of modules - and pure data modules, those that one can load with load(), are highly likely to irrecoverably consume memory as loaded, then you do have a good point. (Yuck, that also argues against data modules for sharing data on irregular inflection.) I can then luaise {{{{{ml}}}|pi|{{{1}}}|ts=|tr=-|sc=|lit=|t=||||||||||||||||||||||||Z1=|Z2=|Z3=|Z4=|Z5=|Z6=|Z7=|Z8=|Z9=|Z10=}} with a light heart. Unfortunately, I'm having a problem thinking up tests to evaluate these matters, because deeply nested wikicode control structures reportedly themselves run out of memory regardless of Lua - and '8 can be deep'.
I have considered the advice I have been given. --RichardW57 (talk) 00:29, 16 March 2023 (UTC)[reply]
@RichardW57 As an experiment, try converting all of the translations at water/translations to {{t}} or {{t+}} and remove {{multitrans}}, and see how far down the page it runs out of memory, despite nominally doing exactly the same work. When I tried it just now, it was only able to manage 1,865 of the 3,787 translations listed. {{multitrans}} ensures that we can do them all in one invocation, and it manages the whole thing with just under 10MB to spare. Theknightwho (talk) 01:27, 16 March 2023 (UTC)[reply]
@Theknightwho: Thanks for that precise data point. It is, alas, susceptible of conflicting explanations. --RichardW57 (talk) 07:14, 16 March 2023 (UTC)[reply]
@RichardW57 Are you willing to elaborate? Theknightwho (talk) 17:59, 16 March 2023 (UTC)[reply]
I can think of at least three explanations:
  1. #invoke causing memory leaks.
  2. Too many distinct modules - in one incarnation, back in 2018, changing all the languages to German made the problem go away.
  3. Too many parallel processes, causing wasteful duplication of modules and very poor garbage collection.
And of course, combinations of these possibilities. --RichardW57 (talk) 19:29, 16 March 2023 (UTC)[reply]

After checking a number of these, I have yet to find any that have an actual headword parameter in the entry. As far as I can tell, whatever headword parameter there is gets added at the template (see {{Template:tr-verb}}) or at the module level. Chuck Entz (talk) 02:24, 12 March 2023 (UTC)[reply]

These category links have appeared for many languages and are apparently the result of this change a few days ago by @Benwing2. The usual cause seems to be that, for many languages, the headword templates explicitly supply the page name as a default. Generally people have been fixing the red links by simply creating the category rather than changing the templates, though in these cases the new category is fairly useless (e.g. Category:German terms with redundant head parameter, created yesterday, with its current ~30k entries). —Al-Muqanna المقنع (talk) 02:42, 12 March 2023 (UTC)[reply]
@Chuck Entz, Al-Muqanna Formerly this category was active only for English but I activated it for all languages. My apologies, there is a flag in the data passed to full_headword() that disables adding entries to these categories, which I added to the module code for various languages, but it's not yet added to the Turkish or German code. It's there e.g. in the French module, so that Category:French terms with redundant head parameter has only 431 entries, which don't appear to be false positives. Once it's added to German and Turkish, the number of entries in the categories will drop dramatically. All the red-linked categories will get created tomorrow evening when Special:WantedCategories gets refreshed (which occurs around 5:18am UTC every third day). These categories are hidden, so they won't appear on entries once created. Benwing2 (talk) 03:05, 12 March 2023 (UTC)[reply]
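A sketch of the check as described, in Python. A term is added to the "terms with redundant head parameter" category when an explicit head equals the page name, unless the language's headword code sets an opt-out flag. The flag name `no_redundant_head_cat` and the function shape are assumptions for illustration; this is not the actual Module:headword code.

```python
from typing import Optional


def redundant_head_category(
    lang_name: str,
    pagename: str,
    head: Optional[str],
    no_redundant_head_cat: bool = False,
) -> Optional[str]:
    """Return the tracking category to add, or None.

    head is the explicitly supplied head= value (None if not given).
    no_redundant_head_cat models a per-language opt-out for headword templates
    that always pass the page name as a default head.
    """
    if no_redundant_head_cat or head is None:
        return None
    if head == pagename:
        return f"{lang_name} terms with redundant head parameter"
    return None


print(redundant_head_category("German", "Wasser", "Wasser"))
print(redundant_head_category("German", "Wasser", "Wasser", no_redundant_head_cat=True))
print(redundant_head_category("English", "ice cream", "[[ice]] [[cream]]"))
```

This is why a template that supplies the page name as its default head fills the category with false positives until the opt-out is set for that language.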
BTW once the remaining categories are created I can see which ones are the biggest offenders (so to speak) and fix the corresponding modules. Benwing2 (talk) 03:06, 12 March 2023 (UTC)[reply]
@Benwing2: For fun I created a query to find which languages have the most pages affected. Not sure it finds the ones whose templates need to change, but maybe it's useful. — Eru·tuon 22:20, 12 March 2023 (UTC)[reply]
@Erutuon Thanks! This is super helpful, actually. Benwing2 (talk) 00:01, 13 March 2023 (UTC)[reply]

Template category not updating

I've run into some problems with categories whilst adding documentation to {{Template:lt-grammar tag}}. As part of this I tried changing the name Category:Lithuanian additional templates to Category:Lithuanian supplementary templates to be consistent with the way it's named for other languages, for instance Hungarian, Macedonian, Icelandic, Scots, etc. I checked first that it wouldn't break any links and modified the name. I followed the instructions at Help:Documenting templates and modules#Categories and interwiki links about using the transclusion tags, but when the new category didn't appear I decided that maybe I shouldn't actually be messing around with this since a) it's doing no harm, b) there might be unintended side effects and c) someone might even have named it that for a good reason. So I've changed the wikitext to use the old category, but that doesn't seem to have fixed the issue; the template is still in the nonexistent Category:Lithuanian supplementary templates. I thought perhaps it could be a server-side thing; I've tried null editing and leaving it for a few days in case there's a delay, but the correct category still shows up empty. Since this edit also created the page {{Template:lt-grammar tag/documentation}}, I can't just undo it either. This brings me to three questions:

  • Is there a policy on category naming / renaming?
  • What step am I missing here to get the page to appear in the correct category? I had a look at Help:Category#How to create a category, but none of it seems to apply here since the category already exists and I didn't change anything on the category end anyway.
  • Are the instructions regarding where to put the category label out of date?

Any tips would be much appreciated. (edited to add 3rd question) Helrasincke (talk) 03:26, 12 March 2023 (UTC)[reply]

  • I haven't looked at the instructions lately, but the correct place is inside "<includeonly></includeonly>" in the documentation subpage. The "<noinclude>{{documentation}}</noinclude>" in the template page itself will transclude it. Here's the tricky part: the category will display just fine on the template page, but the category won't know about it until you edit the template page itself, which updates the system's category links for that page. This is peculiar to templates (perhaps modules, too, but I haven't tested it). Category changes transcluded to entries are propagated to those pages automatically, but category changes transcluded to templates aren't. I noticed that you had changed the category in the documentation subpage back to the old one, so I changed it again to the new one and then did a null edit on the template page. The template is now showing in the new category. Chuck Entz (talk) 04:12, 12 March 2023 (UTC)[reply]
    Ok, thank you for clearing that up - I might add a note about that on those two help pages I linked to. P.S. Sorry if my wording wasn't clear, but Category:Lithuanian additional templates was actually the old one, so now you've changed it back to the new (no longer needed) one. So I'll change it back and then try the null edit on the template page. Helrasincke (talk) 04:43, 12 March 2023 (UTC)[reply]

Global RfC filed to enable global abuse filters on large Wikimedia projects by default

Hello!

On Meta-Wiki, a set of global abuse filters is maintained by Meta-Wiki's administrators and the stewards. Global abuse filters are a powerful tool designed to fight against long-term abusers that operate cross-wiki. It is especially useful (and often irreplaceable by other means) when a cross-wiki LTA starts to rapidly change IP addresses (when that happens, regular blocks are significantly limited due to the IP hopping).

As of today, all small/medium Wikimedia projects (as determined by number of articles) are automatically subscribed to global abuse filters. They are not, however, enabled on several Wikimedia projects classified as large (except several large Wikimedia projects that opted in, such as Wikidata). This makes it possible for global long-term abusers to vandalize a project with no global filters enabled, which makes it significantly more difficult for the Stewards to fight against the abuse.

By this message, I'd like to let you know I submitted a global RfC (request for comments), where I propose enabling global abuse filters on large Wikimedia projects as an opt-out feature. This change will make global abuse filters an even more effective tool for combating long-term abuse at the global level. Please feel free to participate in the discussion, which happens at Meta-Wiki.

Thank you for your time.

Sincerely,
--Martin Urbanec (talk) 17:15, 12 March 2023 (UTC)[reply]

Extend actual parameter lists

Following on from #Optionally adding parameters within templates, I have another problem that this time I may be able to solve using Lua. I'm wondering if we already have code that will do the job.

The background is that for Pali I want to be able to manage parameter lists for the inflection templates. Inflection is defined by modifications to regular patterns. I want to generate inflected forms for each stem in each writing system (we have over a dozen). Mostly this can be done by propagating the same set of modifications (though I need to assemble the tooling to convert a set to each writing system), and ideally this would be done by a macro that extends the parameter list of the inflection templates. The same macro would be applied, along the lines of, say,

{{pi-decl-noun|g=m|{{pi-decl-noun-bahu-m}}|aa=both}}

for the Lana script inflection of the masculine of the adjective bahu (many).

Unfortunately, we don't have such a macro capability. However, we could build such a facility by instead having

{{pi-decl-noun|g=m|load=pi-decl-noun-bahu-m|aa=both}}

where the template pi-decl-noun-bahu-m would contain calls to, say, a template addparam whose parameters are the same as those of {{pi-decl-noun}}, for example

{{addparam|genp_mod=before |genp=bahunnaṃ |datp_mod=before |datp=bahunnaṃ}}

This says that there is a genitive/dative plural bahunnaṃ that is commoner than the regular form.

The template addparam would convert its parameter list to an easily parsable string, and, for each load parameter, the Lua function would load the expanded template as a string and parse it to extend the argument list.

@Benwing2, Theknightwho: Is something like this already implemented here? Have I overlooked a simpler method? --RichardW57 (talk) 21:00, 12 March 2023 (UTC)

@RichardW57 I think you’re overcomplicating things here. This seems like something that you are much better off doing in a single module, instead of trying to use a mix of templates involving multiple calls into Lua: the step where you say that addparam would convert its parameter list into an easily parsable string seems pointless, as you may as well just parse the original input directly in Lua. Theknightwho (talk) 00:47, 13 March 2023 (UTC)

@RichardW57 Apologies, I tried and failed to follow the gist of what you're saying. Can you give some more specifics/examples of what the end result you're looking for is? I think you're proposing Lua code to modify a template call according to specific directives; I don't think that exists currently, and it isn't how I'd go about implementing an inflection system for any language. The way I've implemented things e.g. for Italian is to use a specification consisting of the main principal parts along with additional specifications to override the stem or individual forms as needed. For example, scrivere (to write) uses this:
{{it-conj|a\ì,scrìssi,scrìtto}}
where a indicates that the auxiliary is avere, the backslash indicates that the verb has stress on the root instead of the ending, and the following are principal parts for the present (abbreviated as just ì because it's derivable from the infinitive with stress on the specified vowel), past historic (scrìssi) and past participle (scrìtto). A slightly more complex example is aprire (to open), like this:
{{it-conj|a/à,+:apèrsi,apèrto}}
which differs from the previous in specifying two past historic principal parts, + (which stands for the default, in this case aprìi) and apèrsi.
A more complex example is morire (to die):
{{it-conj|e/muòio^muòre,+,mòrto.fut:+:morrò}}
where two present tense principal parts need to be given (1st singular muòio and 3rd singular muòre, separated by ^), the past historic is regular (indicated by +), the past participle is mòrto, and the future stem is optionally irregular and needs to be specified (+:morrò means either regular + = morirò or irregular morrò).
A significantly more complex example is fare (to do), which is notably irregular:
{{it-conj|a/-,féci,fàtto.
  stem:fàce.
  presrow:fàccio,fài,fà*,facciàmo,fàte,fànno.
  sub:fàccia.
  imp:fài:fà'
}}
Here, the present tense is so irregular that I simply list all six forms (1s, 2s, 3s, 1p, 2p, 3p) after presrow:; in addition, there is an irregular subjunctive stem fàccia, two possible imperatives fài or fà', both irregular, and all the remaining forms are as if the infinitive were spelled fàcere, hence the stem: spec.
This sort of syntax needs to be handled in Lua, but a similar, slightly clunkier syntax could be designed that uses separate parameters for all the specs and is implementable mostly or completely in template code. A similar approach should be possible in Pali regardless of the specifics of the inflectional system; basically, you design things so that everything predictable is handled by default and parameters only specify what's unpredictable. I would avoid trying to do things like "insert value X before param Y" and instead use a symbol to represent the default value explicitly, as I've done here with +. Benwing2 (talk) 00:53, 13 March 2023 (UTC)
BTW User:Theknightwho's point is similar to mine, and I think at a certain point you will have to go entirely into Lua. Benwing2 (talk) 00:55, 13 March 2023 (UTC)
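The "+" convention Benwing2 describes can be sketched as follows: a principal-part slot holds colon-separated alternatives, and "+" stands for whatever form the regular rules would produce. This is a minimal Python illustration, not the code behind {{it-conj}}; the function name `expand_slot` is hypothetical, and the default form is supplied by the caller rather than generated.

```python
DEFAULT_MARKER = "+"


def expand_slot(spec: str, default_form: str) -> list[str]:
    """Expand one principal-part slot such as "+:apèrsi".

    Alternatives are separated by ":"; "+" is replaced by the form the
    regular rules would generate (passed in here as default_form).
    """
    forms: list[str] = []
    for alternative in spec.split(":"):
        form = default_form if alternative == DEFAULT_MARKER else alternative
        if form not in forms:  # avoid listing the same form twice
            forms.append(form)
    return forms


# aprire: regular past historic aprìi plus irregular apèrsi
print(expand_slot("+:apèrsi", "aprìi"))  # ['aprìi', 'apèrsi']
# morire: future stem either regular morirò or irregular morrò
print(expand_slot("+:morrò", "morirò"))  # ['morirò', 'morrò']
```

Because "+" names the default explicitly, the editor controls the order of regular and irregular forms simply by where "+" is written, which is the alternative to "insert value X before param Y" directives.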
I also follow the same principles, though Pali is closer to the PIE principle of umpteen principal parts. See for example the invocation of pi-conj-special at karoti (to do). (I could perhaps allow aorist and future stems to be formally derived from the present stems, though strictly they're supposed to be formed directly from the root.) However, allowing for variations in citation form, there is evidence for this word in 14 different writing systems, half of which already have entries for it on Wiktionary. There are also alleged forms of the verb karoti which might be grammarians' fantasies. Consider the maintenance effort when one of these is verified. (Ultimately at least 14 tables to update.) That effort is what I am trying to reduce. (There is a notoriously long list of alleged passives, but the list of passives is currently handled independently, like the list of past participles, which I'm growing as I add quotations for them.)
Returning to the example of bahu, the regular genitive/dative plural exists. The interface already implemented allows a choice of whether the irregular forms are listed before or after the regular form(s), or replace the regular form(s) entirely. By default, the irregular forms follow the regular forms. I try to arrange the forms by frequency, but the interface isn't flexible enough to always do this without formally replacing the regular forms and re-adding them as irregular forms, which I regard as asking for trouble. --RichardW57m (talk) 12:38, 13 March 2023 (UTC)
@Benwing2: So are you suggesting a mechanism for easily handling irregular inflection across 10 scripts? (The Thai, Lao, Tai Tham and Burmese scripts have several different writing systems.) Your advice may assume that there is a mechanism for handling the various writing systems. For Pali, the current implementation relies on script recognition and additional parameters in the invocations requesting an inflection table, but irregular forms have to be given manually in the invocations. For some stems, this labour-intensive mechanism may have to remain, as transliteration of complete words to the target writing system may be unreliable. I do not see any suggested mechanism. --RichardW57m (talk) 16:27, 13 March 2023 (UTC)
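The before/after/replace choice described above amounts to a small merge step per table cell. A Python sketch under stated assumptions: `merge_forms` is a hypothetical name, the mode strings mirror the `genp_mod=before` style used in this thread, and "R" is a placeholder for the regular form rather than an actual Pali form.

```python
def merge_forms(regular: list[str], irregular: list[str], mod: str = "after") -> list[str]:
    """Combine regular and irregular forms for one table cell.

    mod="after" (the default described above) appends the irregular forms,
    mod="before" puts them first, and mod="replace" drops the regular forms.
    """
    if mod == "after":
        combined = regular + irregular
    elif mod == "before":
        combined = irregular + regular
    elif mod == "replace":
        combined = list(irregular)
    else:
        raise ValueError(f"unknown mod value: {mod!r}")
    # Keep the first occurrence of each form so duplicates are not shown twice.
    result: list[str] = []
    for form in combined:
        if form not in result:
            result.append(form)
    return result


print(merge_forms(["R"], ["bahunnaṃ"], "before"))  # ['bahunnaṃ', 'R']
```

The limitation mentioned above is visible here: with only three modes, interleaving regular and irregular forms by frequency is impossible without "replace" plus re-adding the regular form by hand.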
Thinking of how one would generalise the Italian conjugation system to multiple scripts, I've hit upon a solution to the input problem which is quite simple. In the design above, the suggested template {{pi-decl-noun-bahu-m}} can be as simple as
genp_mod=before |genp=bahunnaṃ |datp_mod=before |datp=bahunnaṃ
The naked pipes get treated as ordinary characters. Then, in the revised invocation
{{pi-decl-noun|g=m|load={{pi-decl-noun-bahu-m}}|aa=both}}
the contents of the inner template will be presented as a single string. I still have to code the parsing of the string passed in as |load=, but after expansion by the parser it should be a fairly simple structure to parse, probably no worse than the Italian conjugation strings. --RichardW57 (talk) 07:59, 16 March 2023 (UTC)
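Once the inner template has expanded to a plain string, parsing |load= is a split on pipes and then on the first "=". A minimal Python sketch of that step (on-wiki this would be Lua, e.g. using mw.text.split); `parse_load` and `extend_args` are hypothetical names, and letting arguments written in the call override loaded ones is an assumption, not something the thread settled.

```python
def parse_load(load: str) -> dict[str, str]:
    """Parse an expanded |load= string like "genp_mod=before |genp=bahunnaṃ"."""
    args: dict[str, str] = {}
    for chunk in load.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate empty pieces, e.g. from a trailing pipe
        name, sep, value = chunk.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {chunk!r}")
        # Split on the first "=" only, so values may themselves contain "=".
        args[name.strip()] = value.strip()
    return args


def extend_args(explicit: dict[str, str], load: str) -> dict[str, str]:
    """Merge loaded parameters with those given directly in the call.

    Parameters written in the call itself take precedence (assumed design).
    """
    merged = parse_load(load)
    merged.update(explicit)
    return merged


loaded = "genp_mod=before |genp=bahunnaṃ |datp_mod=before |datp=bahunnaṃ"
args = extend_args({"g": "m", "aa": "both"}, loaded)
print(args["genp"], args["genp_mod"], args["g"])  # bahunnaṃ before m
```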
@Theknightwho: Back in 2018, I originally proposed using Lua databases to implement irregular declension, but @AryamanA rejected the idea on the basis that while ordinary users can edit template calls in wikitext, they would be too daunted to edit a Lua module. In this case, someone who dares to edit a Pali inflection template call for irregularities should not be put off by editing what looks like a template invocation with the same semantics and syntax for the arguments. (Alas, I have no evidence that anyone but me is happy to record irregular declension for Pali.) One could define a database syntax for supplements as being a table indexed by parameter name, but there'd be several syntax differences. In my example one would have a data module consisting of:
return {genp_mod="before", genp="bahunnaṃ", datp_mod="before", datp="bahunnaṃ"}
The difference is a human interface question. My take is that AryamanA's original objection would also apply to using Lua data modules. --RichardW57m (talk) 11:25, 13 March 2023 (UTC)
@RichardW57m The principle of manually specifying irregularities and automating what's predictable remains even when you have 14 writing systems, it just gets more complicated. You'd need to be able to override a given principal part/stem for a given writing system, and auto-generate everything not explicitly given. I don't agree that it's too daunting to put inflection info in Lua; this happens currently e.g. with highly irregular Latin nouns/verbs/adjectives, and I also implemented most irregular Italian verbs that way, so that e.g. the 200 or so collocation entries involving the verb fare (to do, to make) can be conjugated without having to duplicate the entire fare specification in each one. Spanish and Portuguese irregular verbs are also implemented this way. Template syntax isn't so different in many ways from Lua syntax, and editing templates can also be daunting for people not familiar with them. Benwing2 (talk) 22:04, 13 March 2023 (UTC)
I completely agree with @Benwing2. Wikitext quickly becomes a lot more complicated than Lua when you try to scale things up, too. Theknightwho (talk) 10:09, 14 March 2023 (UTC)
Are you forgetting that #invoke is banned from the main namespace? Even simple invocations have to be hidden behind a template. --RichardW57m (talk) 13:28, 14 March 2023 (UTC)
Complex wikitext in templates is a lot harder to maintain, which is one of the reasons you have been running into issues. Theknightwho (talk) 17:39, 14 March 2023 (UTC)

Tagalog categories / watchlist

When I open Tagalog categories containing Baybayin, I see only squares even though I have the font installed: https://en.wiktionary.org/wiki/Category:Tagalog_terms_in_Baybayin_script

If you inspect the element for each link, you can see that the class in <span class="None" lang="tl"> is None. I wonder if you can change it to "Tglg" so that the CSS class marks it as Baybayin script.

I'm not sure how to do this, but in Arabic categories (e.g. https://en.wiktionary.org/wiki/Category:Arabic_lemmas), inspecting an element shows that the links have this HTML: <span class="Arab" lang="ar">

and in Greek categories (https://en.wiktionary.org/wiki/Category:Ancient_Greek_lemmas) the links have <span class="polytonic" lang="grc">.

Maybe you can do something here so that <span class="Tglg" lang="tl"> would appear.

I did a bit of testing, and this might also affect links in Latin script, so perhaps the class should only be applied when the text is detected as Baybayin?

Thanks! Ysrael214 (talk) 01:19, 13 March 2023 (UTC)
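The request amounts to choosing the class per link from the characters it contains, so Latin-script Tagalog links keep their current styling. A Python sketch, assuming detection by Unicode block; on Wiktionary the detection would happen in the Lua link modules, and `script_class` and `tagalog_link_span` are hypothetical names.

```python
# Unicode's "Tagalog" block (U+1700..U+171F) holds the Baybayin characters.
TAGALOG_BLOCK = range(0x1700, 0x1720)


def script_class(text: str) -> str:
    """Return "Tglg" if the text contains any Baybayin character, else "Latn"."""
    if any(ord(ch) in TAGALOG_BLOCK for ch in text):
        return "Tglg"
    return "Latn"


def tagalog_link_span(text: str) -> str:
    """Wrap link text in a span carrying the detected script class."""
    return f'<span class="{script_class(text)}" lang="tl">{text}</span>'


print(tagalog_link_span("bahay"))  # <span class="Latn" lang="tl">bahay</span>
```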

Hiding Sandboxes from CAT:E

Is there any way we can send module errors for all sandboxes to CAT:Pages with module errors/hidden? The whole idea of a sandbox is to be able to experiment without having to mess up the regular part of the site. Over the years we've had template and module sandboxes repeatedly showing up in CAT:E, and frequently staying there for weeks or even months. Since module errors are emergencies, we should do everything we can to keep CAT:E clear of non-emergency clutter.

Ideally, we would want to exclude anything with "/sandbox/" anywhere in the title outside of mainspace and mainspace talk pages. I realize, though, that I may be asking too much from a system that uses straight MediaWiki template code, ParserFunctions, Magic Words, etc. I see that we already exclude a couple of cases, but I would definitely want to exclude any module error in a page called "sandbox" or in its documentation subpage (except, of course, for [[sandbox]], and [[Talk:sandbox]]).

If we can't do that, perhaps we should consider limiting sandboxes to places we can exclude from CAT:E, such as subpages, subsubpages, etc. of Template:Sandbox, Module:Sandbox, and Wiktionary:Sandbox.

Pinging @Erutuon, who's already done some work on this. Chuck Entz (talk) 01:43, 13 March 2023 (UTC)

@Chuck Entz I would advocate not using sandboxes named .../sandbox at all. I never use such things. Instead I create a sandbox copy of the module in my module userspace, e.g. Module:User:Benwing2/links. This way, all the junk stays contained within a given user's module space, there's no question who created the module, and there's no possibility of two users overwriting each other's module work. Benwing2 (talk) 03:22, 13 March 2023 (UTC)
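Chuck Entz's rule can be stated as a predicate over a page's namespace and title. This is a Python sketch of the logic only; on-wiki it would live in the template or Lua code that assigns the error category, and the function name and the case-insensitive matching of "sandbox" are assumptions.

```python
VISIBLE = "Pages with module errors"
HIDDEN = "Pages with module errors/hidden"


def module_error_category(namespace: str, title: str) -> str:
    """Pick the tracking category for a page that has a module error.

    namespace is "" for mainspace; title is the page name without the
    namespace prefix, e.g. ("Template", "foo/sandbox/documentation").
    """
    if namespace in ("", "Talk"):
        return VISIBLE  # [[sandbox]] and [[Talk:sandbox]] stay visible
    segments = [segment.lower() for segment in title.split("/")]
    if "sandbox" in segments:
        return HIDDEN  # any "sandbox" path component, including its /documentation
    return VISIBLE
```

Benwing2's userspace approach (e.g. Module:User:Benwing2/links) falls outside this rule, since such titles contain no "sandbox" component.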

Syllabification/hyphenation for Japanese Romaji words

Hello, can Japanese words be given a syllabification/hyphenation in romaji, even though they are written in kanji, hiragana or katakana? Yuliadhi (talk) 02:02, 13 March 2023 (UTC)

@Yuliadhi Can you give some examples of what you want to do? Benwing2 (talk) 02:00, 15 March 2023 (UTC)
@Benwing2 For example:
Kanji: 重工業
Hiragana: じゅうこうぎょう
Romaji (Hepburn): jūkōgyō
Hyphenation/syllabification: jū‧kō‧gyō
Meaning: heavy industries Yuliadhi (talk) 05:39, 15 March 2023 (UTC)
@Yuliadhi Thanks. Maybe User:Fish bowl or User:Huhu9001 can comment on this, as I'm not super familiar with the Japanese templates and modules, but in general Japanese syllabification is pretty straightforward, so something like this should not be hard to implement if desired. Benwing2 (talk) 06:10, 15 March 2023 (UTC)
@Yuliadhi: I still don't quite understand what you mean. Are you suggesting adding syllabifications to all romaji, which would effectively mean a change to the current Japanese transliteration rules? (Wiktionary:Japanese transliteration) -- Huhu9001 (talk) 08:31, 15 March 2023 (UTC)
@Yuliadhi, Huhu9001 I am guessing they want syllabification added as another line under Pronunciation. Changing the actual translit wouldn't make sense IMO. Benwing2 (talk) 21:50, 15 March 2023 (UTC)
And you definitely want this instead of morae? —Fish bowl (talk) 18:49, 16 March 2023 (UTC)
@Fish bowl Yes, I want hyphenation/syllabification for Romaji (Hepburn) instead of morae, e.g. 戦争 (sensō, war) and サッカー (sakkā, soccer/football) Yuliadhi (talk) 10:30, 19 March 2023 (UTC)
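Assuming the goal is a Pronunciation line like jū‧kō‧gyō (syllables rather than morae), a rough Python sketch of syllabifying lowercase Hepburn romaji follows. It is not based on any existing Wiktionary module; the rules it encodes (a moraic n or the first half of a doubled consonant closes the syllable, and a macron vowel stays in one syllable) are simplifying assumptions.

```python
import re
import unicodedata

VOWELS = "aeiouāēīōū"
# One syllable = optional consonant onset + one vowel + optional coda, where the
# coda is either a syllable-final "n" (not followed by a vowel or "y") or the
# first half of a doubled consonant such as the first "k" of "kk".
SYLLABLE = re.compile(
    rf"[bcdfghjkmnprstvwyz]*[{VOWELS}]"
    rf"(?:n(?![{VOWELS}y])|([bcdfghjkmprstwz])(?=\1))?"
)


def syllabify(romaji: str) -> str:
    """Insert hyphenation points (U+2027) between syllables of lowercase Hepburn."""
    text = unicodedata.normalize("NFC", romaji)
    return "\u2027".join(m.group(0) for m in SYLLABLE.finditer(text))


print(syllabify("jūkōgyō"))  # jū‧kō‧gyō
print(syllabify("sensō"))    # sen‧sō
print(syllabify("sakkā"))    # sak‧kā
```

The sketch treats each vowel letter as its own nucleus, so kaizan comes out as ka‧i‧zan; whether ai should form one syllable, and how to handle "tch" gemination and apostrophes after n, are left open, and characters it cannot match are silently dropped.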

Linking to pages containing "+"

Hi, I've noticed a recent bug in {{l}} (and its related templates). When there is a link to a page whose name contains the plus symbol, the template ignores the plus: +ve, ++ungood, LGBT+. Einstein2 (talk) 13:10, 13 March 2023 (UTC)

@Theknightwho Can you take a look? Benwing2 (talk) 22:05, 13 March 2023 (UTC)
The entity code works FWIW; I used it as a hack at Wiktionary:Information desk/2023/March#How do I link to 18+ under a definition?. —Al-Muqanna المقنع (talk) 22:26, 13 March 2023 (UTC)
We should make plain + work again, though, since having to continually find and change unwitting new uses of + to the entity code will be a hassle. + worked before, as I can see when I check e.g. an old version of the page LGBT. - -sche (discuss) 23:34, 13 March 2023 (UTC)
@Benwing2 @-sche I'll have a look at this. We obviously do want + to work properly, so this is a bug. Theknightwho (talk) 01:42, 14 March 2023 (UTC)
@Benwing2 @-sche @Al-Muqanna This is fixed. The function makeEntryName in Module:languages (which generates the correct page name in links, etc.) calls mw.uri.decode, which decodes any URL percent encodings into plain text. This means it's possible to use percent encodings in entry names and have them point to the right place (e.g. {{l|la|%61%6D%C5%8D}} appears as amō; note that it still happily strips the macron from the link target). This is a default feature of conventional links, so mw.uri.decode is one of many measures we use to ensure that none of the default functionality is lost despite all the extra stuff we do on top. However, by default, mw.uri.decode also converts + into spaces (as that's how spaces are encoded in URL query strings). I've now switched that extra feature off. Theknightwho (talk) 04:50, 14 March 2023 (UTC)
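The fix can be illustrated with Python's standard library, whose `unquote` and `unquote_plus` differ in the same way as decoding without and with "+"-as-space. This is an analogy, not the Lua code in Module:languages; `entry_name` is a hypothetical helper, and stripping macrons stands in for Latin's entry-name rules.

```python
import unicodedata
from urllib.parse import unquote, unquote_plus

COMBINING_MACRON = "\u0304"


def entry_name(link_text: str) -> str:
    """Percent-decode link text, then strip macrons (a Latin-style entry-name rule)."""
    # unquote leaves "+" alone, like decoding in "PATH" mode;
    # unquote_plus would turn "+" into a space, like the "QUERY" default.
    decoded = unquote(link_text)
    decomposed = unicodedata.normalize("NFD", decoded)
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_MACRON, ""))


print(unquote("%61%6D%C5%8D"))      # amō (what is displayed)
print(entry_name("%61%6D%C5%8D"))   # amo (the link target)
print(repr(unquote_plus("LGBT+")))  # 'LGBT ' (the behaviour behind the bug)
print(entry_name("LGBT+"))          # LGBT+
```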
@Theknightwho Thanks! Encoding/decoding of characters in links is such a mess. There is a current bug whereby underscores in the accel-form accelerator parameter don't make it all the way through to the accelerated entries, but get converted to spaces. This causes problems for e.g. Spanish (and similarly Portuguese) verb forms where I pass the conjugation in the accel-form and include it in the call to {{es-verb form of}}. For example, if the verb is arrecir and the original conjugation reads <no_pres_stressed>, the accelerated entry for any form of this verb should have a template call {{es-verb form of|arrecir<no_pres_stressed>}} but it wrongly appears as {{es-verb form of|arrecir<no pres stressed>}}. I haven't been able to fix it because I don't understand well enough all of the transformations/encodings/decodings that happen along the way from the code in Module:links to MediaWiki:Gadget-AcceleratedFormCreation.js to Module:accel plus default MediaWiki encoding/decoding. In particular I don't understand very well the default MediaWiki encoding/decoding steps and where they happen. If you have any insights into this latter issue, I would be grateful. Benwing2 (talk) 06:03, 14 March 2023 (UTC)
@Benwing2 I'm not sure, but it could be the fault of some other use of mw.uri.decode somewhere. The second param of that and mw.uri.encode can have one of three values, which relates to how spaces are handled: "QUERY" (the default; spaces as +), "PATH" (spaces as %20, which I used to solve the issue mentioned above) and "WIKI" (spaces as _). mw.uri.decode("<no_pres_stressed>", "WIKI") would output "<no pres stressed>". Theknightwho (talk) 06:14, 14 March 2023 (UTC)
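The three modes can be mimicked in Python to see which one would turn the accel-form underscores into spaces. This reproduces the behaviour as described in the thread, not Scribunto's source; `uri_decode` is a hypothetical stand-in for mw.uri.decode.

```python
from urllib.parse import unquote


def uri_decode(s: str, enctype: str = "QUERY") -> str:
    """Sketch of the three space-handling modes described for mw.uri.decode."""
    if enctype == "QUERY":
        s = s.replace("+", " ")  # "+" means space (the default)
    elif enctype == "WIKI":
        s = s.replace("_", " ")  # "_" means space, as in page titles
    elif enctype != "PATH":      # PATH: only %20 means space
        raise ValueError(f"unknown enctype: {enctype!r}")
    # Replace before percent-decoding, so an encoded %2B or %5F survives as a literal.
    return unquote(s)


print(uri_decode("<no_pres_stressed>", "WIKI"))  # <no pres stressed>
print(uri_decode("<no_pres_stressed>", "PATH"))  # <no_pres_stressed>
```

On this model, only a "WIKI"-mode decode somewhere in the chain would produce the reported `<no pres stressed>`; the other two modes leave underscores alone.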

Collocation Frequency Information

Next in line in terms of "Vin's obsession of the month", I'm continuing my musings on frequency information (see Wiktionary:Beer_parlour/2023/March#Frequency_information for part one). I've begun thinking about frequency information for collocations. Now, I know most people don't add collocations to pages, but I add plenty to Polish pages all the time.

Currently I sort them by POS (i.e. adj + noun, then noun + adj, then noun + noun, etc.), but the source I'm getting this information from, {{R:pl:NKJP}}, does provide information on frequency within the corpus. I am wondering if we shouldn't include a parameter for absolute frequency in the collocation templates, and then perhaps the {{co-top}} template could allow for different types of sorting (however, this might require some sort of module and a lot of extra work). I believe this could be useful for people: if you have two synonymous collocations, for example, one might be more popular than the other. Thoughts? Vininn126 (talk) 13:20, 13 March 2023 (UTC)
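If the collocation templates gained a frequency parameter, the two sort orders would look roughly like this. A Python sketch only: the `freq` field, the pattern labels and the function names are hypothetical, and on-wiki this would need a Lua module behind {{co-top}}.

```python
from dataclasses import dataclass
from typing import Optional

# Pattern order currently used for manual sorting (labels are illustrative).
PATTERN_ORDER = ["adj+noun", "noun+adj", "noun+noun"]


@dataclass
class Collocation:
    text: str
    pattern: str
    freq: Optional[int] = None  # absolute corpus frequency, if known


def sort_by_pattern(items: list[Collocation]) -> list[Collocation]:
    """Group by syntactic pattern; unknown patterns go last, order otherwise kept."""
    def rank(c: Collocation) -> int:
        if c.pattern in PATTERN_ORDER:
            return PATTERN_ORDER.index(c.pattern)
        return len(PATTERN_ORDER)
    return sorted(items, key=rank)


def sort_by_frequency(items: list[Collocation]) -> list[Collocation]:
    """Most frequent first; collocations without a frequency go last."""
    return sorted(items, key=lambda c: (c.freq is None, -(c.freq or 0)))
```

Both sorts are stable, so ties keep the order in which the collocations were written in the entry.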

[edit]

Requesting automated (bot-facilitated) addition of these reference templates in Tagalog entries: {{R:Pambansang Diksiyonaryo}} and {{R:tl:Pinoy Dictionary}}, similar to what is being done with Spanish, where a link to DRAE is added whenever a word here in Wiktionary is also listed in DRAE. TagaSanPedroAko (talk) 07:13, 14 March 2023 (UTC)

mw.ustring.gmatch

This single line of code can make Lua exceed the time limit:

for p1 in mw.ustring.gmatch('apple', '%f[%a]') do end

while the following runs normally:

for p1 in string.gmatch('apple', '%f[%a]') do end

Any non-empty string instead of 'apple', no matter how short, yields the same result. I guess there is a bug. -- Huhu9001 (talk) 09:59, 14 March 2023 (UTC)

@Huhu9001 Thanks. It's probably worth raising this on Phabricator. Theknightwho (talk) 10:14, 14 March 2023 (UTC)
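For comparison, this is the loop a gmatch-style iterator has to get right. `%f[%a]` only ever matches the empty string, so each step must advance the cursor by hand; a hang like the one reported is consistent with an iterator that fails to do so, though that is a guess, not a diagnosis of mw.ustring. A Python sketch; `FRONTIER_ALPHA` and `iter_matches` are illustrative names.

```python
import re

# Lua's %f[%a] is a "frontier": it matches the empty string at a position where
# the previous character is not a letter and the next one is. The start of the
# string counts as a non-letter. ASCII letters only, as with %a in the C locale.
FRONTIER_ALPHA = re.compile(r"(?<![A-Za-z])(?=[A-Za-z])")


def iter_matches(pattern: re.Pattern, s: str):
    """Yield successive matches, advancing past empty ones."""
    pos = 0
    while pos <= len(s):
        m = pattern.search(s, pos)
        if m is None:
            return
        yield m
        # After an empty match the cursor must still move forward; restarting
        # at m.end() here would find the same empty match forever.
        pos = m.end() + 1 if m.end() == m.start() else m.end()


# 1-based positions, as Lua would report them
print([m.start() + 1 for m in iter_matches(FRONTIER_ALPHA, "an apple")])  # [1, 4]
```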

template to count visits to an entry

WP has a template "annual readership" which is one of a few templates that count pageviews, apparently for any page there. It would be interesting to use something like this to sample views on some of our pages, e.g. mainspace vs. thesaurus, lemmas vs. alt forms, vernacular vs. taxonomic names, etc. Can we implement this?

We would also benefit from knowing which entries are the most popular, but 7 million uses of this type of template would not be the way to go. DCDuring (talk) 12:54, 14 March 2023 (UTC)

The data actually comes from here:
https://pageviews.wmcloud.org/?project=en.wiktionary.org&platform=all-access&agent=user&range=latest-365&pages=Wiktionary%3AGrease+pit%2F2023%2FMarch
Getting this data does not require implementing a template. -- Huhu9001 (talk) 01:18, 15 March 2023 (UTC)
Thanks. I have the feeling that a template might be useful to facilitate comparisons, though daily pageviews are not useful for most purposes I have in mind. DCDuring (talk) 14:10, 15 March 2023 (UTC)

Etymology and Accel modules

Hello! Is it possible to define the Etymology heading and {{af|foo|-bar}} code in Accel modules? Gorec (talk) 16:12, 14 March 2023 (UTC)

@Горец Not currently, but it wouldn't be hard to support. Can you give me more specifics? Benwing2 (talk) 01:59, 15 March 2023 (UTC)
One example might be including Slavic verbal nouns? Vininn126 (talk) 11:11, 15 March 2023 (UTC)
@Benwing2, Vininn126 It will certainly be useful for other forms as well. I had in mind the Macedonian augmentatives, which are listed in the headword line. I added some rules for them in Module:accel/mk (scroll to bottom), and I wanted to add Etymology and {{af|mk|... under it. Gorec (talk) 12:03, 15 March 2023 (UTC)
@Vininn126, Горец Should be supported now. Benwing2 (talk) 06:09, 16 March 2023 (UTC)
@Benwing2 I just tested it and it works perfectly. Thank you! --Gorec (talk) 09:36, 16 March 2023 (UTC)
Cool! Thanks. Vininn126 (talk) 09:54, 16 March 2023 (UTC)

Another problem with Japanese romanizations

Example: kaizan. The definition line should say "Rōmaji transcription", not "R transcription". Rdoegcd (talk) 21:36, 14 March 2023 (UTC)

@Theknightwho, Benwing2: Seems to be another example where a module doesn't like certain symbols. I've substituted a character code for the ō and it works again, but ideally the root problem should be fixed. —Al-Muqanna المقنع (talk) 01:51, 15 March 2023 (UTC)
Root problem fixed: Special:Diff/71661170. -- Huhu9001 (talk) 02:29, 15 March 2023 (UTC)

Delinking two diacritics

Please delink ـٰ (dagger alif or "ARABIC LETTER SUPERSCRIPT ALEF", U+0670) for Urdu and Persian, i.e. strip it from link targets as is done for other diacritics. This is already done for Arabic but not for other Arabic-script languages such as Pashto. یٰ or یٰ should link to ی, and قُرُونِ وُسْطیٰ (qurūn-i vustā) should link to قرون وسطی (currently only the Persian entry exists).

Also, هٔ should link to ه. The symbol ـٔ ("ARABIC HAMZA ABOVE", U+0654) should be removed from links in Persian, Urdu and Pashto (not sure about others). Anatoli T. (обсудить/вклад) 02:53, 15 March 2023 (UTC)

@Benwing2: Thanks for fixing! I've done the same for Ottoman Turkish. --Anatoli T. (обсудить/вклад) 21:39, 15 March 2023 (UTC)
@Benwing2: While changing some entries, I found that some entries had ۀ (U+06D5) as the displayed form instead of the correct هٔ (U+0647 U+0654). --Anatoli T. (обсудить/вклад) 22:15, 15 March 2023 (UTC)
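What is being asked for is a per-language set of characters to drop when computing a link target. A Python sketch under assumptions: the exact set each language strips lives in Wiktionary's Lua language data and may differ, and `link_target` and the character sets below are illustrative. Note that precomposed ۀ is a single code point, so a rule like this would not touch it.

```python
# Harakat U+064B..U+0652 (tanwin, fatha, damma, kasra, shadda, sukun), plus the
# two marks requested above: superscript alef U+0670 and hamza above U+0654.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}
EXTRA_MARKS = {"\u0670", "\u0654"}
STRIP = HARAKAT | EXTRA_MARKS


def link_target(text: str) -> str:
    """Remove the listed combining marks to get the page a link should point to."""
    return "".join(ch for ch in text if ch not in STRIP)
```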

Can someone incorporate this into the underlying system(s) used by {{auto cat}} please? Acolyte of Ice (talk) 11:54, 15 March 2023 (UTC)

@Solomonfromfinland: you have been creating a lot of categories for entries manually; we generally don't do this here at Wiktionary. Instead, such categories are created by updating the Lua module "Module:labels/data" and its subpages. If you're not sure how to do this, please ask for help here. — Sgconlaw (talk) 19:51, 15 March 2023 (UTC)

Is there a reason for this to be coded as a distinct (non-Lua) template from {{given name}} rather than a special invocation? At the moment it's missing out on many of the features of the latter, e.g. the from= parameter doesn't show up in the definition line. —Al-Muqanna المقنع (talk) 18:08, 15 March 2023 (UTC)

@Al-Muqanna: I can’t see one. This should definitely just be folded into {{given name}}. The words “of historical usage” and any categorisation could be added with historical=1.
Its third parameter doesn’t have an equivalent in {{given name}}, as it’s for adding “notorious” historical people with the name. At the moment, it requires the WP link to be done manually, and if you want to add more than one person you need to do it yourself in the same parameter. I suggest that instead we use the format notableN=, to allow multiple people to be listed in a straightforward way. By default, these should be English Wikipedia links, but with the option to add interwiki prefixes to specify a different Wiki (e.g. having de:XXXX would link to the German WP). If this happens, we should use the little superscript langcode to indicate it.
An advantage of this is that it would allow us to decouple the list of notable people from the “historical” qualifier. Perhaps we could even allow arbitrary qualifiers (e.g. if a notable person is associated with it for a specific reason). Theknightwho (talk) 18:42, 15 March 2023 (UTC)
@Theknightwho, Al-Muqanna Agreed. This template is old and dates from before the time that I Lua-fied {{given name}}. I never got around to figuring out what to do with it so I left it alone. Benwing2 (talk) 20:59, 15 March 2023 (UTC)
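Theknightwho's notableN= proposal could be processed as below. A Python sketch of the intended behaviour, not existing code behind {{given name}}; the parameter name comes from the proposal, while the language-prefix heuristic, the [[w:...]] link form and the superscript marker format are assumptions.

```python
import re

# Heuristic (assumption): a 2-3 letter lowercase prefix before ":" is a wiki language code.
LANG_PREFIX = re.compile(r"([a-z]{2,3}):(.+)")


def notable_links(args: dict[str, str]) -> list[str]:
    """Turn notable1=, notable2=, ... into Wikipedia links, stopping at the first gap."""
    links = []
    n = 1
    while f"notable{n}" in args:
        value = args[f"notable{n}"].strip()
        m = LANG_PREFIX.fullmatch(value)
        if m:
            lang, title = m.groups()
            # Another Wikipedia: link via the interwiki prefix and flag the language.
            links.append(f"[[w:{lang}:{title}|{title}]]<sup>{lang}</sup>")
        else:
            links.append(f"[[w:{value}|{value}]]")  # English Wikipedia by default
        n += 1
    return links
```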

Translation adder - Glosses should be unique error

"Could not find translation table for '****'. Glosses should be unique" on South_Korea#Translations. Anatoli T. (обсудить/вклад) 21:41, 15 March 2023 (UTC)

@Atitarev, try it now. I think I've fixed it; the position of the id parameter was the problem. Gorec (talk) 22:33, 15 March 2023 (UTC)
@Горец: Hi, Belgium#Translations strikes back with the "Glosses should be unique" error. --Anatoli T. (обсудить/вклад) 03:36, 22 March 2023 (UTC)
Hello @Atitarev. Sorry for the delay in responding; I was away for some time.
I see, it's the same problem. Fixed. It was most likely caused by some recent changes. The problem occurs when the id parameter is in the first field. @Benwing2, Theknightwho Can you find a solution for this? Thanks. Gorec (talk) 16:12, 12 April 2023 (UTC)
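Gorec's diagnosis points to a lookup that treats the first field of the translation-table header as the gloss. Reading the gloss as the first positional parameter, wherever id= appears, avoids that. The gadget itself is JavaScript; this is a Python sketch of the parsing idea only, with a hypothetical function name and a made-up id value in the example.

```python
def split_template_params(wikitext: str) -> tuple[str, list[str], dict[str, str]]:
    """Split a simple template call into name, positional and named parameters.

    As in MediaWiki, any part containing "=" is treated as named, wherever it
    appears. Nested templates and links (which also contain "|") are not handled,
    and positional values are trimmed here for convenience.
    """
    inner = wikitext.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    name, *parts = inner[2:-2].split("|")
    positional: list[str] = []
    named: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:
            named[key.strip()] = value.strip()
        else:
            positional.append(part.strip())
    return name.strip(), positional, named


_, positional, named = split_template_params("{{trans-top|id=Q1|country in Europe}}")
print(positional[0], named["id"])  # country in Europe Q1
```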

{{Babel}} shows rubbish

{{Babel}} shows rubbish as users' language level. Anatoli T. (обсудить/вклад) 00:47, 16 March 2023 (UTC)

It's OK now. --Anatoli T. (обсудить/вклад) 07:23, 17 March 2023 (UTC)

Avoiding suggestion of error by automatic transliteration

We never agreed on a mechanism to feed writing system information to Pali transliteration; my suggestion to use the script code did not meet with approval. Where automated transliteration using the standard transliteration interface will not work but is very much needed, I use an alternative interface in the module (function trwo) and pass the resulting transliteration to function full_link() in Module:links. Unfortunately, I use a horrible hack to stop that module from then adding these pages to Category:Terms with redundant transliterations/pi or Category:Terms with manual transliterations different from the automated ones/pi. Is there a proper way to do this? Should I perhaps just edit the categories out of the generated wikitext and let the spurious warnings (via miscategorisation) flood through if the category names change? --~ RichardW57m (talk) 13:20, 16 March 2023 (UTC)

Another possibility would be to format the addition of the transliteration locally rather than delegate it to full_link(). Would that be preferred? I don't like it because this format may diverge from what Module:links does, so we may end up with adjacent abugidic and alphabetic inflection tables using different formats. (There are quite a few words whose stem is spelt the same in both the abugidic and alphabetic systems.) --RichardW57m (talk) 12:57, 17 March 2023 (UTC)
A further option is to use the fourth parameter to full_link(), but that looks as though it is meant to be a private back door and in no way a stable interface. Or is it generally available? --RichardW57 (talk) 09:14, 18 March 2023 (UTC)
@Benwing2, Huhu9001: I see that since 26 March, the fourth parameter (no_check_redundant_translit) has been mentioned in the documentation for Module:links. Does this mean it is now part of the module's public API? Or is its being documented a documentation bug? --RichardW57m (talk) 11:36, 29 March 2023 (UTC)
Sorry, it could be that @Theknightwho should be the one to answer. --RichardW57m (talk) 11:40, 29 March 2023 (UTC)
¯\_(ツ)_/¯ So you can see how ridiculous it is to claim that these general utility functions are "well-documented". This parameter was added more than a month ago and there is not a single word of explanation for it. -- Huhu9001 (talk) 11:50, 29 March 2023 (UTC)
@Huhu9001: One doesn't normally publish documentation of backdoors! The comments in the code make this parameter look like a back door, or at least experimental. If I do use it, it seems I should then ask for its new user to be recorded in Module:links. --RichardW57m (talk) 12:21, 29 March 2023 (UTC)

Those two categories contain some elements simply because of the nature of the different writing systems. What is the recommended way of recording a list of pages that belong there? The implication is that pages in these categories can be improved. I worry about the categories being automatically deleted because some error leaves them devoid of pages. --~ RichardW57m (talk) 13:20, 16 March 2023 (UTC)
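A simplified model of the check under discussion, to make clear what a flag like no_check_redundant_translit switches off. This is a Python sketch, not the code of Module:links: the category names are taken from the thread, and the comparison logic is an assumption about how such a check works in general.

```python
from typing import Optional

REDUNDANT = "Terms with redundant transliterations/pi"
DIFFERENT = "Terms with manual transliterations different from the automated ones/pi"


def translit_categories(manual: Optional[str], automatic: Optional[str],
                        no_check_redundant_translit: bool = False) -> list[str]:
    """Return tracking categories for a manually supplied transliteration."""
    if no_check_redundant_translit or manual is None or automatic is None:
        return []  # nothing to compare, or the caller opted out of the check
    if manual == automatic:
        return [REDUNDANT]
    return [DIFFERENT]
```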

Koine/Byzantine inflection tables

At the moment {{grc-conj}} and {{grc-decl}} generate a warning notice at the bottom about non-Attic dialects being poorly attested if a dialect other than Attic is specified. For relatively obvious reasons, I think this warning should be disabled for Koine and Byzantine Greek.

Another nice-to-have feature might be to set the default forms to SP, i.e. without the dual, for Koine and Byzantine adjective and noun tables. —Al-Muqanna المقنع (talk) 13:34, 16 March 2023 (UTC)

Hello @Al-Muqanna, I am delighted that someone is interested in greek. I had no special training in anc.greek, but I think, traditionally, Koine inflections are just like attic, with a note at appendix or at table (that prosody and dual were lost by then). At el.wikt, we just place dual at the bottom of table at both ancient and koine tables. (e.g. wikt:el:βερίκοκκον and Cat.Koine.words.with tables). But prosody was designed to help students, so it could be retatained, i guess. We do not put prosody and dual at later use of quasi ancient words (up to 1970s, at Katharevousa (example wikt:el:κεντρικότης). For us, Koine, extends up to 6th century as in most of our sources;up to Iustinianus.
As for 'byzantine', en.wiktionary does not have a separate section for it. It would be a nice idea to rename it to Medieval Greek (not associating language to historical places and periods?) Our med.sources cover from 7th centruy up to 1700. No inflectional endings are marked near the lemmata at sources like {{R:LBG}} or {{R:Kriaras Medieval}}, because they vary so much: phase1 they are like Koine, phase3, like modern.gr. The Cambridge Grammar, has tables with possible endings according to register. At el.wikt, we just try to make lists of various forms, and hopefully, some day, add documentation for them with quotations. (example). Thank you!! ‑‑Sarri.greek  I 02:30, 17 March 2023 (UTC)[reply]
@Sarri.greek: As far as I know you're right about the inflectional differences between Koine and Attic being fairly minimal, so for Koine this would just be a minor housekeeping change for the inflection templates when |dial=koi is specified. We do have the Byzantine label and Category:Byzantine Greek, but I agree it could be renamed to Medieval Greek since it would be clearer that it also covers the Greek spoken outside of the Byzantine Empire itself. Handling inflections in medieval Greek properly is definitely a bit more complicated than Koine, and I don't think there's any automatic system for it at the moment—at any rate it seems odd for e.g. μασγίδιον to have a declension table marked "Attic" with a link to info about other ancient dialects. —Al-Muqanna المقنع (talk) 02:39, 17 March 2023 (UTC)[reply]
Thank you, @Al-Muqanna: A major problem with not having Medieval Greek separately is these 'pseudocategories', such as Ancient Greek nouns and Category:Ancient Greek terms derived from Latin, as at ὁσπίτιον etc. I do not know why it could not have its own category, even for a few words... :( en.wikt has many languages with very few words; one more would be possible perhaps? ‑‑Sarri.greek  I 02:48, 17 March 2023 (UTC)[reply]
@Sarri.greek I think you are proposing allowing for derived-from and borrowed-from categories where the destination language is etymology-only, e.g. Category:Byzantine Greek terms derived from Latin. There's no technical issue with implementing this but can you bring it up in the Beer Parlour to make sure people agree with the principle? Benwing2 (talk) 16:45, 17 March 2023 (UTC)[reply]
Thank you @Benwing2, I do not know how to approach the issue of gkm because I felt that there is no interest in this subject.
First, the name 'Byzantine' needs to be discussed. Then, would en.wikt consider treating Medieval Greek as a language? (The introduction of the above Grammar makes interesting points about the study of the language.) Another issue is the periodization of Koine (up to when?) and Medieval Greek (from Iustinianus up to 1453? Or up to 1700, as in the {{R:Kriaras Medieval}} dictionary and our current sources?). If the language section were created, only very few words would be lemmatized, usually those appearing in Modern Greek etymologies. Thanks for your response. ‑‑Sarri.greek  I 17:05, 17 March 2023 (UTC)[reply]
@Sarri.greek There are multiple issues here, all of which should be discussed at the Beer Parlour (and separately). Can you start the relevant discussions there? I can comment on the possibility of etymology-language derived/borrowed-from categories. I can't help much with changes to Greek language naming or periodization but there should be others who can help. Benwing2 (talk) 17:11, 17 March 2023 (UTC)[reply]
Just a note that I've implemented the first of the changes I mentioned, removing the poorly attested dialect notice from Koine and medieval Greek, which I hope is non-controversial. In the process I've fixed a marginal bug at {{grc-conj}} where the notice was displayed when Attic was specified explicitly. —Al-Muqanna المقنع (talk) 17:57, 17 March 2023 (UTC)[reply]

Currently Module:links tracks every "list" with the tracking category Template:tracking/links/list, which it defines as any link template which contains links arranged as {{l|[lang]|[[link1]], [[link2]], [[link3]] ...}}. To me, this seems like a completely pointless waste of resources, as there's a whole little subroutine that the module has to go through in order to determine this.

Would anyone object to me just removing this? Theknightwho (talk) 13:38, 16 March 2023 (UTC)[reply]

@Theknightwho Please remove. In general I would remove all sorts of tracking that has been there awhile and doesn't serve an obvious purpose, as it uses up memory and time. Benwing2 (talk) 01:42, 17 March 2023 (UTC)[reply]

Malfunctioning Template:ru-noun+


The head line for холм has bad links, displaying as square brackets:

холм • (xolm) m inan (genitive [[холма#Russian|]], nominative plural [[холмы#Russian|]], genitive plural [[холмов#Russian|]], diminutive хо́лмик)

The template {{ru-noun+}} has not changed recently but User:Theknightwho changed one of its dependencies today. Vox Sciurorum (talk) 14:17, 16 March 2023 (UTC)[reply]

Frustratingly, I saved a slightly older version than I intended to while working on this, which still included this bug. I thought I'd already dealt with it. It affected link templates with redundant links inside them. Theknightwho (talk) 15:01, 16 March 2023 (UTC)[reply]

Undefined Variable in Module:string utilities


When trying to check the quality of some of my code by using require("strict"), I discovered that Module:string utilities was using undeclared variable plain. Can some privileged being please fix this; this breach of our coding conventions has halted the clean-up of my own code. --RichardW57m (talk) 18:01, 16 March 2023 (UTC)[reply]

@RichardW57m: If require("strict") doesn't work, you can use Module:log globals and look down at the Lua logs to find the places where global variables are accessed. But I think I fixed the problem. — Eru·tuon 18:45, 16 March 2023 (UTC)[reply]
Thanks for both the fix and telling me about module log globals. I needed it to finish my check, because Module:links also had two accidental globals - variable m_utildata in function makeLangLink() and variable tr_fail in function full_link(). (Pinging @Theknightwho as he's been working on the latter module.) --RichardW57 (talk) 20:23, 16 March 2023 (UTC)[reply]
Module:links has now been fixed by Theknightwho. --RichardW57m (talk) 12:28, 17 March 2023 (UTC)[reply]

Would anyone care to make such a TOC?


I know you all have more urgent things to do. Would anyone be interested in making a TOC that changes columns at L2, or an inline toclimit2 (as needed for pages like te)? I cannot do it myself. I just tried a plan at template User:Sarri.greek/toc2-hor (and at wikt:el:Template:toc-test), a test for Module:User:Sarri.greek/toc2-hor, with CSS like User:Sarri.greek/toc2-hor/style.css. Pages with 2-4 languages would then look something like this manual toc. Thank you! ‑‑Sarri.greek  I 01:50, 17 March 2023 (UTC)

removing the horizontal rule


Anyone object if I do a bot run to remove this? The vote at Wiktionary:Votes/2023-02/Removing the horizontal rule passed 24-2-9. Benwing2 (talk) 02:57, 17 March 2023 (UTC)[reply]

BTW Wonderfool was right on time closing the vote. Benwing2 (talk) 02:58, 17 March 2023 (UTC)[reply]
FYI: There are 477,764 pages needing doing as of the March 1 dump. I will do a second run when the March 20 dump is released. The command to be run looks like this: python rewrite.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.2) --from '\n+---+\n+==' --to '\n\n==' --from '\n+---+\n*\Z' --to '' --diff --comment 'remove horizontal rule separators per [[Wiktionary:Votes/2023-02/Removing the horizontal rule]]'. Pay attention to the regexes and replacements used. I will run this with parallelism 10 so it takes less than a day. User:Erutuon or someone, we should enable the CSS to auto-add the rule. Benwing2 (talk) 08:27, 17 March 2023 (UTC)[reply]
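For readers checking what the two --from/--to pairs do, here is a rough equivalent using Lua patterns (the language of the other code on this page). It is an illustration that assumes the Python regexes behave as written; it is not the bot's code, and remove_rules is a hypothetical name.

```lua
-- Sketch of the two substitutions from the bot command, as Lua patterns.
-- In Lua patterns "-" is magic, so literal hyphens are escaped as "%-";
-- "%-%-%-+" therefore means "three or more hyphens", like "---+" in Python.
local function remove_rules(text)
  -- '\n+---+\n+==' -> '\n\n==' : a rule between two language sections
  text = text:gsub("\n+%-%-%-+\n+==", "\n\n==")
  -- '\n+---+\n*\Z' -> '' : a rule left dangling at the very end of the page
  text = text:gsub("\n+%-%-%-+\n*$", "")
  return text
end
```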
Are you going to add a colour perhaps @Benwing2? Like fr or de wikt? ‑‑Sarri.greek  I 08:30, 17 March 2023 (UTC)[reply]
Already I'm finding how accustomed I am to the wikitext horizontal rule when editing. I regret my vote, not that it would have mattered. DCDuring (talk) 14:47, 20 March 2023 (UTC)[reply]
@DCDuring I understand your pain, I'm also used to looking for the rule to distinguish language sections. Hopefully we'll adjust. Benwing2 (talk) 18:35, 20 March 2023 (UTC)[reply]
It's just a matter of time before we all get used to it. Changes at first are always hard - I, too, am put off, but I know that in a week or month or so I'll have gotten used to it. Vininn126 (talk) 18:38, 20 March 2023 (UTC)[reply]
If this problem persists, we could potentially look into creating an editing tool that automatically adds and removes the rule in the editor, much like the Incubator does with links. Thadh (talk) 18:56, 20 March 2023 (UTC)[reply]
Thanks. That cheers me up a bit. DCDuring (talk) 20:05, 20 March 2023 (UTC)[reply]
FYI There were another 1780 pages since Mar 1 for which I removed the separator based on the Mar 20 dump. Note that I only just now removed the separator from the accelerator gadget code, so any accelerator entries added to existing pages within the last day (e.g. мостом) will have the separator, and some people may still be adding it manually. I'll do another run based on the Apr 1 dump; hopefully that should be enough. Benwing2 (talk) 00:36, 21 March 2023 (UTC)[reply]

Converting named to numbered parameters


I was wondering. Is there a way to convert named parameters to different patterns of numbered positions? e.g. a module like {{Q}} has 5 named params: lang|author|work|subpage|anchornumber. If one wants to convert them to numbered, in a way to fascilitate editors, positions vary: some are omitted. lang is always 1. Of the positions 2345 either author or work always exists. I found six patterns: 2345, 2304 (no subpage), 2034 (only one work, not mentioned e.g. Thucydides has |grc|Th|X|(X).), 2003 (only anchor), 0234 (no author), 0203. I see e.g. at {{quote-book}} how named are converted to numbered. I tried to do patterns here but cannot find a way to handle so many patterns. Thank you. ‑‑Sarri.greek  I 09:01, 17 March 2023 (UTC)

Is fascilitation painful? (It reads like beating with sticks.) For long lists, numbered parameters are worse than named parameters and harder for revisers to read. Write only quotations are not a good idea. --RichardW57m (talk) 12:39, 17 March 2023 (UTC)[reply]
I prefer named parameters, so I hope we aren't eliminating them. Using named parameters, I can copy/paste a cite in almost any format and insert the variable names as appropriate. Reordering the items to fit a positional system is a challenge to my fine motor skills, arthritis, and declining vision. DCDuring (talk) 13:07, 17 March 2023 (UTC)[reply]
I definitely think both should be allowed? I don't see the value of having exclusively one over the other. Vininn126 (talk) 13:29, 17 March 2023 (UTC)[reply]
Thank you all. About Ancient Greek quotation links: we usually copy and paste abbreviations from Liddell-Scott {{R:LSJ}}, like
LXX.3.45 or Th.2.22. or Α.Ch.43 All we need to do is replace dots with pipes. But with unfixed positions we have to think: o! does this have an author? Or, does this have chapters? etc. and add blanks for the missing (or named) params like
{grc||LXX|3|45} or for Thucydides, {grc|Th||2|22}, Aeschylus {grc|A|Ch||43}. That is why I asked, @Vininn126, but, true: it is a detail... ‑‑Sarri.greek  I 15:57, 17 March 2023 (UTC)[reply]
Could it possibly be done at the template level or with some customization of the module? DCDuring (talk) 17:36, 17 March 2023 (UTC)[reply]

@Benwing2: Sir, if it is not too time-consuming or complicated, could you give me a tip? I cannot understand why I can't get numbered params to work at wikt:el:Template:sarritest... The problem is the variety of numbered sequences. I've tried all kinds of things for 4 days now :( ‑‑Sarri.greek  I 21:08, 18 March 2023 (UTC)[reply]

@Sarri.greek: There are some obvious oddities looking at your current code: for example |page={{{page|{{#if:{{{3|}}}|{{{4|}}}|{{{page|}}}}}}}} is telling it to use 4 for page if 3 has been specified and page has not (and to otherwise ignore 3), and the last {{{page|}}} is redundant since the function will only be evaluated if page isn't set in the first place. In an {{#if:A|B|C}} function, the second part is the 'then' and the third part is the 'else': B is displayed if A is not empty. Similarly |author={{{author|}}} {{#if:{{{2|}}}|{{{2|}}}}} is telling it to show both the author and 2, and {{#if:{{{2|}}}|{{{2|}}}}} is generally equivalent to just {{{2|}}}. The usual way to go about this is much simpler: for example, {{{author|{{{2|}}}}}} is all you need to use author if it's been set, and 2 otherwise. —Al-Muqanna المقنع (talk) 02:51, 19 March 2023 (UTC)[reply]
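The same fallback carries over directly if the template is ever moved into a Lua module. A minimal sketch, assuming an args table whose numbered parameters use number keys (as frame.args does); pick and resolve are hypothetical names, not existing Wiktionary functions.

```lua
-- Prefer a named parameter, falling back to a positional one. Like the
-- wikitext {{{author|{{{2|}}}}}}, the fallback is used only when the named
-- parameter is absent (nil); an explicitly empty |author= still wins.
local function pick(args, name, pos)
  local value = args[name]
  if value == nil then
    value = args[pos]
  end
  return value or ""
end

-- Resolve every parameter in a {name = position} mapping.
local function resolve(args, mapping)
  local out = {}
  for name, pos in pairs(mapping) do
    out[name] = pick(args, name, pos)
  end
  return out
end
```

For example, resolve(args, { lang = 1, author = 2 }) gives author from |author= when present and from |2= otherwise, which is exactly what {{{author|{{{2|}}}}}} does in wikitext.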
Never mind, I'll restart in a different way. Thank you ‑‑Sarri.greek  I 11:50, 19 March 2023 (UTC)[reply]

Change of font in Cyrillic-alphabet headwords


Why was the font for Cyrillic-alphabet headwords changed some time ago from the roman serif font that is used for Latin-alphabet headwords to the sans-serif font (presumably Arial at a glance) that is used now? Can it be changed back? It looks ugly and out of place. Emmalib (talk) 17:35, 17 March 2023 (UTC)[reply]

@Emmalib It's the same font that gets used in headwords, but at a much bigger size (and I agree it's pretty ugly). This is because the title gets tagged as being Cyrillic text, so the CSS uses the standard Cyrillic font. @Erutuon I think it's probably best if we still keep tagging the titles as being Cyrillic, but perhaps we could use a different font for them? Theknightwho (talk) 18:32, 17 March 2023 (UTC)[reply]
I get that, but that isn't the case for Latin-alphabet entries, e.g. in English. Also it never used to be the case - the font used to be the same roman font that is still used for English etc. So why the change? (Can't remember exactly when it changed, and I wanted to query it at the time but didn't have an account. It wasn't so long ago though, within the past 12 months for sure.) Emmalib (talk) 18:51, 17 March 2023 (UTC)[reply]
Checking archived versions of various entries, the change in font happened sometime between November 2022 and the 8th of March. - -sche (discuss) 01:40, 19 March 2023 (UTC)[reply]
At some point alphabet styling was applied to titles, which it wasn't previously (note the span class of the title is mw-page-title-main in the earlier one and Cyrl in the latter). It could be manually overridden by setting .mw-first-heading .Cyrl to the regular h1 fonts in common.css, though I'm not sure in that case that applying the Cyrl class to it is doing anything very useful? —Al-Muqanna المقنع (talk) 03:19, 19 March 2023 (UTC)[reply]
@Emmalib Thank you for bringing this up, it's been bugging me for a while now. I had the same thought too regarding the very plain font used for Arabic script headwords. I agree that it would be nice to change it, perhaps we could bring these both more into line with the more elegant Latin script style, considering it's one of the first things a person notices on opening a page. Helrasincke (talk) 11:21, 19 March 2023 (UTC)[reply]
Yes, I am not experienced enough at the moment to do anything about it other than posting on here. I wonder if anyone with skills can sort this out now that we have some discussion about it. It is, for example, the "correct" roman font on Russian Wikipedia and Wiktionary. Emmalib (talk) 05:47, 22 March 2023 (UTC)[reply]

Recently I noticed someone is replacing all mw.ustring methods in Lua modules with ones in mod:string utilities. I suppose the latter are smart interfaces that can decide between string.xxx methods and mw.ustring.xxx methods to improve performance. But since these functions typically run through additional long-winded checks and iterations to achieve their goal, is it really beneficial to replace every mw.ustring call with them? This may be the case for languages that use mostly ASCII letters plus a few others. But for languages that use exclusively non-ASCII letters, mod:string utilities functions will often end up falling back to mw.ustring functions anyway, so the extra checks are done for no benefit. In two examples I found, mod:string utilities actually increased Lua time usage for both mod:ja and mod:ja-ruby on the page 菩薩 by approx. 20%.

In addition, I suspect this adds quite some difficulties to code maintenance. And I also saw instances where even the native lua string library functions are replaced by mod:string utilities functions, which seems highly inexplicable to me. -- Huhu9001 (talk) 15:03, 18 March 2023 (UTC)[reply]

@Huhu9001 Take a look at the ustring functions themselves in the ustring library; they are even more complex than these. That’s why I opted for this compromise. In many cases, Module:string utilities is actually able to use the string library for Japanese text, as in many cases it’s able to modify the input pattern. I know this, because the quotation at 賦斂 uses 38 capture groups and caused an error, as the string library has a max of 32. I’ve since fixed that.
I agree we should just be using the string functions where we know patterns don’t need ustring, though. I was probably just tired/not thinking if I did that. Theknightwho (talk) 17:25, 18 March 2023 (UTC)[reply]
No matter how complex mw.ustring themselves are, the inequality
  • mw.ustring + mod:string utilities > mw.ustring
holds because
  • mod:string utilities > 0
And the current result is that they are indeed counterproductive.
38 capture groups can easily be worked around without any change to the infrastructure in many ways, with the simplest solution being just adding some more % markups. Also I think people are expected to "be not tired/be thinking" when editing modules transcluded by a hundred thousand. -- Huhu9001 (talk) 00:27, 19 March 2023 (UTC)[reply]
@Huhu9001 That's correct, but the point is that there will be enough instances where it's possible to use the string library. You seem to just be assuming that any instances of higher codepoints are automatically going to need ustring, but that just isn't true. You can't just use one page to say that it's counterproductive, either. The nature of a change like this is that different pages will be affected in different ways, because it's a compromise. Plus, the issue of 38 capture groups was something I was using as evidence that the string library was actually being invoked, because it's not a concern with ustring. The point is that your assumption was wrong.
Also, I can't find instances where I converted string to the new function. Could you please point out where that happened? Theknightwho (talk) 00:47, 19 March 2023 (UTC)[reply]
It was you who made the bold assumption that all modules need the new functions, not I. What I said is that in non-ASCII languages, cases that actually call for the native string library are not as frequent as in ASCII languages, so the benefit of the mod:string utilities functions is balanced out by their heavy overhead. I never said the string library is never invoked. Please don't straw-man me.
Converting string to the new function was my misreading. It did not happen. Sorry. -- Huhu9001 (talk) 01:39, 19 March 2023 (UTC)[reply]
@Huhu9001 Right, but your basis for saying that is a comparison of timings on a single page, on which I saw quite different results (there will always be variations in load times, for a variety of reasons). The number of instances in which ustring has to be invoked is actually quite low, and of course in many cases it's possible for us to determine ahead of time whether that's the case. The best option is to make that determination in advance wherever possible, and then to use Module:string utilities when it could go either way. Theknightwho (talk) 02:47, 19 March 2023 (UTC)[reply]
The best option is to only use mod:string utilities in those translingual modules when it is really hard to tell whether it is ASCII or not. For individual languages, individual editors make better judgements than one single centralized decision maker. -- Huhu9001 (talk) 03:03, 19 March 2023 (UTC)[reply]

There is also some evidence that some mod:string utilities funcs do not perform significantly better than mw.ustring even on pure-ASCII strings.

local export = {}

local f = require'mod:string utilities'.len
--local f = mw.ustring.len

function export.show(frame)
    for i = 1, 1000000 do f(string.char(i % 128):rep(100)) end
end

return export

The result is:

  • 2.977 seconds, 683,118 bytes for require'mod:string utilities'.len
  • 3.028 seconds, 612,761 bytes for mw.ustring.len

One should not just simply assume mod:string utilities good, mw.ustring bad. -- Huhu9001 (talk) 03:50, 19 March 2023 (UTC)[reply]
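Because timings vary from run to run (a point raised above), a comparison like this one can be made a little less noisy by repeating each measurement and keeping the fastest run. A minimal sketch; time_once and best_of are hypothetical helpers, and they measure CPU time only (os.clock), not Lua memory, which the thread notes does not scale linearly anyway.

```lua
-- Time n calls of f, in CPU seconds.
local function time_once(f, n)
  local start = os.clock()
  for i = 1, n do
    f(i)
  end
  return os.clock() - start
end

-- Repeat the measurement and keep the fastest run, which is usually the
-- one least disturbed by unrelated work on the server.
local function best_of(f, n, runs)
  local best = math.huge
  for _ = 1, runs do
    local t = time_once(f, n)
    if t < best then
      best = t
    end
  end
  return best
end
```

For example, each candidate f from the benchmark above could be wrapped as best_of(function(i) f(string.char(i % 128):rep(100)) end, 100000, 5).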

@Huhu9001 This feels like a pretty disingenuous comparison, because you're comparing the len function, which is a total reimplementation, and therefore has nothing to do with your original point. It's certainly plausible that len and sub don't provide any advantages (as they don't need to call into the full parser in the ustring library), but that has no bearing on the other functions which do. Plus, any differences are obviously going to be exacerbated over 1,000,000 iterations anyway. Theknightwho (talk) 04:01, 19 March 2023 (UTC)[reply]
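For context on len: a common way to count code points using only the byte-level string library is to count the bytes that are not UTF-8 continuation bytes. This sketches that general technique; it is not necessarily how Module:string utilities implements len (and Lua 5.3's utf8.len is not available, since Scribunto runs Lua 5.1). utf8_len is a hypothetical name.

```lua
-- Count code points in a valid UTF-8 string: every code point has exactly
-- one byte outside the continuation range 0x80-0xBF, so count those bytes.
local function utf8_len(s)
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end
```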
My original point is to question the necessity of all functions in mod:string utilities and their imposition on various modules. If mod:string utilities's len and sub do not do better even on ASCII strings, there is definitely no reason to force other modules to use them. -- Huhu9001 (talk) 04:07, 19 March 2023 (UTC)[reply]
@Huhu9001 You know very well that I'm saying that you've not provided a reason to avoid using the other functions. Theknightwho (talk) 04:11, 19 March 2023 (UTC)[reply]
You know very well that I'm saying I am providing a reason to avoid using these functions. I am not obliged to mention every thing every aspect at the same time. -- Huhu9001 (talk) 04:15, 19 March 2023 (UTC)[reply]
@Huhu9001 It makes no sense to disregard the functions that your reason doesn't apply to. Stop being dishonest. Theknightwho (talk) 04:18, 19 March 2023 (UTC)[reply]
You seem not arguing with reason but occupied by emotion to defend mod:string utilities. Remember your own word, w:WP:OWN. Someone questioning the applicability of a Lua module you write is not the end of the world. -- Huhu9001 (talk) 04:19, 19 March 2023 (UTC)[reply]
@Huhu9001 Not really, no. I drew a distinction between len and sub and the others, which you were using as a way to justify your blanket revert. I've also suggested an optimal solution, which is to use the most appropriate function in each place where it's possible to determine that, which you totally ignored. Rather than saying "no you", perhaps reflect on why I brought up WP:OWN in the first place, because the only solution you'll settle for here is keeping your preferred version, on the basis that you think you know best. That's not how it works. Theknightwho (talk) 04:26, 19 March 2023 (UTC)[reply]

Confirmed, mod:string utilities upper() is also unnecessary.

local export = {}

local f = require'mod:string utilities'.upper
--local f = mw.ustring.upper

function export.show(frame)
    for i = 1, 100000 do f(string.char(i % 4 + 0x61):rep(100)) end
end

return export

The result is:

  • 0.350 seconds, 662,684 bytes for require'mod:string utilities'.upper()
  • 0.188 seconds, 596,423 bytes for mw.ustring.upper()
I assume lower() to be the same. -- Huhu9001 (talk) 04:51, 19 March 2023 (UTC)[reply]
However, when you compare gsub with higher codepoints, the results are dramatic:
function export.show(frame)
    for i = 1, 100000 do
    	local char = mw.ustring.char(i % 4 + 0x10061)
    	f(char:rep(100), char, "")
    end
end
  • 4.550 seconds for mw.ustring.gsub
  • 0.988 seconds for require'mod:string utilities'.gsub
Also noting that Lua memory usage does not scale linearly, and can't be accurately predicted from isolated results (as we've found over and over before).
Theknightwho (talk) 05:43, 19 March 2023 (UTC)[reply]
Please calm yourself down. We are testing these functions one by one. We do not analyse one function while simultaneously meddling with another. -- Huhu9001 (talk) 05:50, 19 March 2023 (UTC)[reply]
@Huhu9001 Don't patronise me. The two functions were compared with otherwise identical parameters, which makes your point nonsensical. Theknightwho (talk) 05:53, 19 March 2023 (UTC)[reply]
Please. Again. Calm yourself down, take a rest while I am doing the rest testing. -- Huhu9001 (talk) 05:57, 19 March 2023 (UTC)[reply]

Next we test find(), we only test the so-called "modified version of the pattern" or where mod:string utilities find() would use string.find() even if the pattern contains non-ASCII character classes. The relevant code is from Module:string_utilities#L-188 to Module:string_utilities#L-283. The reasoning is that, in other cases, module editors will have no difficulty deciding whether to use string.find() or mw.ustring.find().

1. Multi-byte patterns

local export = {}

local f = require'mod:string utilities'.find
--local f = mw.ustring.find

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), 'ð*') end
end

return export

The result is:

  • 8.065 seconds, 1,142,947 bytes for require'mod:string utilities'.find()
  • 0.712 seconds, 595,007 bytes for mw.ustring.find()

The result is astonishing. mod:string utilities find() performed far worse than mw.ustring.find().

2. the dot ('.')

local export = {}

local f = require'mod:string utilities'.find
--local f = mw.ustring.find

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), '.') end
end

return export

The result is:

  • 0.601 seconds, 1,142,071 bytes for require'mod:string utilities'.find()
  • 0.721 seconds, 594,979 bytes for mw.ustring.find()

This time mod:string utilities performed slightly better in time, needing only about 80% of the running time of mw.ustring.find(). But Lua memory usage is still nearly doubled, which is hard to ignore.

Here my conclusion is, it is still generally a good idea to use just string.find() and mw.ustring.find() when editors can clearly tell whether the string pattern is ASCII or non-ASCII. mod:string utilities find() should be only used when the string pattern is uncertain. And its benefit is also questionable. -- Huhu9001 (talk) 05:47, 19 March 2023 (UTC)[reply]

Next, match()

1. Multi-byte patterns

local export = {}

local f = require'mod:string utilities'.match
--local f = mw.ustring.match

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), '.') end
end

return export

The result is:

  • 0.494 seconds, 1,141,059 bytes for require'mod:string utilities'.match()
  • 0.624 seconds, 595,007 bytes for mw.ustring.match()

2. the dot ('.')

local export = {}

local f = require'mod:string utilities'.match
--local f = mw.ustring.match

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), '.') end
end

return export
  • 0.312 seconds, 1,141,247 bytes for require'mod:string utilities'.match()
  • 0.607 seconds, 594,979 bytes for mw.ustring.match()

3. patterns with position

local export = {}

local f = require'mod:string utilities'.match
--local f = mw.ustring.match

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), '().') end
end

return export
  • 0.880 seconds, 1,141,731 bytes for require'mod:string utilities'.match()
  • 0.624 seconds, 594,979 bytes for mw.ustring.match()

mod:string utilities performs much better this time. With the dot it only needs half the running time as mw.ustring.match(). And we can somehow see an underlying rule that mod:string utilities is usually better when it does not need to return a number value. But concerning the memory usage, mod:string utilities still uses double the memory as mw.ustring.match(). One can see a trade-off between time and memory usage. It is reasonable to say choosing between these two is a personal choice. As for me, I still prefer mw.ustring.match(), given that Lua memory is a bigger problem on Wiktionary than running time. -- Huhu9001 (talk) 06:25, 19 March 2023 (UTC)[reply]
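One plausible reason that position-returning calls fare worse (an assumption, not something established in the thread): the byte-level string functions report byte offsets, while mw.ustring reports code-point offsets, so a wrapper built on the string library has to convert. A minimal sketch of such a conversion; byte_to_char_index is a hypothetical name.

```lua
-- Convert a byte offset b (from string.find or a () capture) into the
-- code-point offset mw.ustring would report, for valid UTF-8 text.
-- It counts the code points that start at or before byte b, i.e. the bytes
-- in s:sub(1, b) that are not continuation bytes (0x80-0xBF).
local function byte_to_char_index(s, b)
  local _, count = s:sub(1, b):gsub("[^\128-\191]", "")
  return count
end

-- The "a" in "ðða" starts at byte 5 but is the 3rd character.
local byte_pos = ("ðða"):match("()a")             -- 5
local char_pos = byte_to_char_index("ðða", byte_pos) -- 3
```

The extra pass over the string is the kind of work a plain string.match without position captures never has to do.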

Next, sub()

1. small index

local export = {}

local f = require'mod:string utilities'.sub
--local f = mw.ustring.sub

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), 1, -1) end
end

return export
  • 0.225 seconds, 1,140,775 bytes for require'mod:string utilities'.sub()
  • 0.532 seconds, 594,987 bytes for mw.ustring.sub()

2. large index

local export = {}

local f = require'mod:string utilities'.sub
--local f = mw.ustring.sub

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), 1, 50) end
end

return export
  • 3.746 seconds, 1,141,644 bytes for require'mod:string utilities'.sub()
  • 0.456 seconds, 595,112 bytes for mw.ustring.sub()

mod:string utilities is better at small indices, but awful at large indices. Memory usage is the same as above. So I would suggest just using mw.ustring.sub(). -- Huhu9001 (talk) 10:02, 19 March 2023 (UTC)[reply]
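One possible explanation for cost growing with the index (an assumption; the module's actual code is not examined here): without an index into the string, finding the j-th code point means walking the string from the start. A minimal sketch of such a naive code-point substring, for valid UTF-8 and positive indices only; char_sub is a hypothetical name.

```lua
-- One UTF-8 encoded character (Lua 5.1 pattern; %z stands for the NUL byte).
local UTF8_char = "[%z\1-\127\194-\244][\128-\191]*"

-- Characters i..j (1-based, positive) of a valid UTF-8 string. The loop
-- walks from the start one character at a time, so the work grows with j.
local function char_sub(s, i, j)
  local out, n = {}, 0
  for c in s:gmatch(UTF8_char) do
    n = n + 1
    if n > j then
      break
    end
    if n >= i then
      out[#out + 1] = c
    end
  end
  return table.concat(out)
end
```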

Next, gsub()

local export = {}

local f = require'mod:string utilities'.gsub
--local f = mw.ustring.gsub

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), 'ð*ï*', '') end
end

return export
  • 0.548 seconds, 1,141,243 bytes for require'mod:string utilities'.gsub()
  • 0.866 seconds, 595,031 bytes for mw.ustring.gsub()
local export = {}

local f = require'mod:string utilities'.gsub
--local f = mw.ustring.gsub

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), 'ð*ï*', '') end
end

return export
  • 0.636 seconds, 1,141,464 bytes for require'mod:string utilities'.gsub()
  • 0.596 seconds, 595,034 bytes for mw.ustring.gsub()
local export = {}

local f = require'mod:string utilities'.gsub
--local f = mw.ustring.gsub

function export.show(frame)
    for i = 1, 100000 do f(('ð'):rep(100), 'ð*ï*î*ï*î*ï*î*ï*î*ï*î*ï*î*', '') end
end

return export
  • 3.053 seconds, 1,145,509 bytes for require'mod:string utilities'.gsub()
  • 0.757 seconds, 595,067 bytes for mw.ustring.gsub()

mod:string utilities is better at the simplest patterns. Its performance deteriorates quickly as the pattern grows, and it is soon outperformed by mw.ustring.gsub(). Memory usage is the same as above.

So maybe we can say that with very simple patterns there is some benefit to using mod:string utilities. But its performance is unstable. Personally I would still suggest just using mw.ustring.gsub() throughout, unless you are willing to alternate between the two functions in your code depending on how long your pattern is.

As far as I know, the only commonly used simple pattern that would be fed to gsub() is '.', to iterate over each single Unicode character. In most cases, if we choose to use mod:string utilities, this will be the only place in the whole code where we want it. If you ever bother to care about this special case at all, perhaps a much simpler way is just to use

('some string'):gsub('[\1-\255][\128-\191]*', function(c)
    -- c is one UTF-8 encoded character; do whatever is needed with it
end)

without the need of any other string utility module.-- Huhu9001 (talk) 10:02, 19 March 2023 (UTC)[reply]

@Huhu9001: Have you timed this? It's also a lot obscurer for the common case :gsub('.', table). --RichardW57m (talk) 15:40, 21 March 2023 (UTC)[reply]
No I did not time this. But I guess it is quicker than mw.ustring.gsub('.', ...). -- Huhu9001 (talk) 16:38, 21 March 2023 (UTC)[reply]
For short strings, perhaps. For transliterating a quotation, a long series of function calls may outweigh the overhead of the callback. --RichardW57m (talk) 17:34, 21 March 2023 (UTC)[reply]
@RichardW57m Neither Module:string utilities nor mw.ustring should ever be used if it is known that the input will always be ASCII. However, it would be a bad idea to use . + a table with non-ASCII input in most instances. Theknightwho (talk) 17:02, 21 March 2023 (UTC)[reply]
There's some confusion here, possibly caused by my writing :gsub('.', table) for mw.ustring.gsub(text, '.', string). text = gsub(text, '.', tt) (literally) is extremely common in transliterators for Indic scripts, where we've mostly copied from one another. It tends to be used to deal with things like independent vowels, digits and punctuation. --RichardW57m (talk) 17:30, 21 March 2023 (UTC)[reply]
@RichardW57m In that situation, it’s best to use the pattern for a UTF character with the normal string library. Theknightwho (talk) 18:23, 21 March 2023 (UTC)[reply]
So to confirm, are you just saying that in the non-ASCII case, text=mw.ustring.gsub(text, '.', tt) should usually be modified to text=text:gsub('.[\128-\191]*', tt)? @Huhu9001's use of function in the comment is misleading!! (We'd obviously need a symbolic name for the pattern to keep it readily intelligible.) --RichardW57 (talk) 21:44, 21 March 2023 (UTC)[reply]
@RichardW57 I tend to declare local UTF8_char = "[%z\1-\127\194-\244][\128-\191]*" at some point and use that as the pattern, yeah. Theknightwho (talk) 01:25, 22 March 2023 (UTC)[reply]
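A minimal sketch of the approach just described for a transliterator: declare the UTF-8 character pattern once and pass a lookup table straight to the string library's gsub. The table tt here is a made-up two-entry example, not taken from any existing module, and translit is a hypothetical name.

```lua
-- One UTF-8 encoded character (Lua 5.1 pattern; %z stands for the NUL byte).
local UTF8_char = "[%z\1-\127\194-\244][\128-\191]*"

-- Hypothetical transliteration table: whole characters as keys.
local tt = {
  ["ð"] = "dh",
  ["þ"] = "th",
}

local function translit(text)
  -- With a table as the replacement, gsub looks up each whole match;
  -- characters missing from the table (lookup gives nil) are left unchanged.
  -- The extra parentheses drop gsub's second return value (the count).
  return (text:gsub(UTF8_char, tt))
end
```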

Next, gmatch()

As I have stated somewhere above (Wiktionary:Grease_pit/2023/March#mw.ustring.gmatch), mw.ustring.gmatch probably has some bug that causes it to run easily into infinite loops. But fortunately gsub() can do everything that gmatch() can. So we can just use the conclusion of gsub().
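To illustrate the claim that gsub() can stand in for gmatch(): pass a callback that records each match and ignore the returned string. This sketch uses string.gsub so it runs in plain Lua; mw.ustring.gsub also accepts a function as its third argument, so the same shape applies there. collect is a hypothetical helper name.

```lua
-- Collect what gmatch(s, pattern) would yield by letting gsub call a
-- function for every match. Each entry holds the captures, or the whole
-- match when the pattern has no captures.
local function collect(s, pattern)
  local found = {}
  s:gsub(pattern, function(...)
    found[#found + 1] = { ... }
    -- Returning nothing keeps the original text, so no replacement is made.
  end)
  return found
end
```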

Finally, to summarize it all, I come to the conclusion that the mod:string utilities functions have many problems. It is difficult to say their benefits outweigh their shortcomings. Module editors should use their own discretion with them, and it is generally a bad idea to impose them blindly on all other modules. @Theknightwho -- Huhu9001 (talk) 10:02, 19 March 2023 (UTC)[reply]

@Huhu9001 Reading over all of this is interesting. I think there may be ways to better optimise the balance of the Module:string utilities functions (at least with match, gsub and gmatch, which are the most promising). Currently, the process is as follows:
  1. They pass the text and pattern to patternSimplifier, which runs a series of pre-screening tests as to whether the pattern is string compatible. These are a modified version of those performed by the ustring library itself, with the major difference being that the ustring library fails anything which contains bytes above 0x7F (i.e. anything with a codepoint of U+0080 and above). If it passes all of these, then the pattern is returned unchanged and flagged as 'simple' (i.e. suitable for string).
  2. If the pattern fails, patternSimplifier iterates over the pattern byte-by-byte, keeping track of various factors (e.g. whether there's a charset, if a % escape applies and so on). It saves these bytes in a new_pattern table, and in some cases these are modified for the sake of string compatibility (e.g. a . unaffected by magic characters is saved as [%z\1-\127\194-\244][\128-\191]*, which is the pattern for a single UTF-8 character). Certain situations cause an automatic fail (e.g. a multibyte character followed by *), which results in the original pattern being returned with a fail flag. However, if this doesn't happen, new_pattern is then concatenated and returned. In a small number of situations, the pattern can be considered simple if and only if the text contains no codepoints of U+0080 and above, so this is checked where necessary.
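The two steps above can be illustrated with a heavily reduced sketch. It is not the code in Module:string utilities: the name `simplify` and every rule below are assumptions chosen for clarity. It treats a pattern with no magic characters as simple, rewrites a bare `.` to the UTF-8 character pattern, accepts non-negated character sets with ASCII members only, and gives up (returning the original pattern with a false flag) on anything riskier, such as `%a`-style classes, `()` position captures, a quantified `.`, or a multibyte character followed by a quantifier. It skips the text-dependent checks mentioned at the end of step 2, and it has `match`/`gsub`-style use in mind (with `find`, byte offsets would still differ from character offsets).

```lua
-- Hypothetical, heavily reduced sketch of a "pattern simplifier".
-- Not the code in Module:string utilities. Returns a pattern plus a flag:
-- true means the string library can be used with the returned pattern.
local UTF8_char = "[%z\1-\127\194-\244][\128-\191]*"
local QUANTIFIER = { ["*"] = true, ["+"] = true, ["-"] = true, ["?"] = true }

local function simplify(pattern)
	-- Step 1: no magic characters means a plain literal, which matches the
	-- same bytes in string as in mw.ustring.
	if not pattern:find("[%^%$%(%)%%%.%[%]%*%+%-%?]") then
		return pattern, true
	end
	-- Step 2: walk the pattern, copying or rewriting each piece.
	local out, i = {}, 1
	while i <= #pattern do
		local c = pattern:sub(i, i)
		local nxt = pattern:sub(i + 1, i + 1)
		if c == "%" then
			-- %a, %w, %b, %f and friends behave differently: give up.
			if nxt == "" or nxt:find("%a") then
				return pattern, false
			end
			out[#out + 1] = c .. nxt
			i = i + 2
		elseif c == "(" and nxt == ")" then
			-- Position captures would return byte offsets, not characters.
			return pattern, false
		elseif c == "[" then
			-- Accept only non-negated sets whose members are all ASCII.
			local j = i + 1
			if pattern:sub(j, j) == "^" then
				return pattern, false
			end
			if pattern:sub(j, j) == "]" then
				j = j + 1 -- a "]" right after "[" is a literal member
			end
			while true do
				local d = pattern:sub(j, j)
				if d == "" or d:byte() > 127 then
					return pattern, false -- unterminated, or non-ASCII member
				elseif d == "]" then
					break
				elseif d == "%" then
					if pattern:sub(j + 1, j + 1):find("%a") then
						return pattern, false
					end
					j = j + 2
				else
					j = j + 1
				end
			end
			out[#out + 1] = pattern:sub(i, j)
			i = j + 1
		elseif c:byte() > 127 then
			-- Copy one whole UTF-8 character (lead byte plus continuations).
			local char = pattern:match("^[\192-\255][\128-\191]*", i)
			if not char then
				return pattern, false -- stray continuation byte
			end
			-- A quantifier here would apply only to the character's last byte.
			if QUANTIFIER[pattern:sub(i + #char, i + #char)] then
				return pattern, false
			end
			out[#out + 1] = char
			i = i + #char
		elseif c == "." then
			-- A bare "." becomes "one UTF-8 character"; a quantified one is
			-- left to mw.ustring in this sketch.
			if QUANTIFIER[nxt] then
				return pattern, false
			end
			out[#out + 1] = UTF8_char
			i = i + 1
		else
			out[#out + 1] = c
			i = i + 1
		end
	end
	return table.concat(out), true
end
```

A caller would use `string.gsub` with the returned pattern when the flag is true, and fall back to `mw.ustring.gsub` with the original pattern otherwise.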
Finding the optimal balance is difficult, because the more tests we add, the more performance degradation there will be for any patterns which ultimately end up failing. On the other hand, any patterns which pass will generally see performance improvements.
It's important to remember that patterns such as ð* are actually pretty unusual: the only magic character routinely used with individual multibyte characters is ?, which does pass the second set of tests as the patternSimplifier converts (e.g.) ð? to \195?\176?, which it manages by looping back and updating the relevant entries in new_pattern. However, there are some tests for unusual patterns which we can probably sacrifice, such as those for patterns involving %b and (): they're hardly ever used, in my experience.
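The `ð?` to `\195?\176?` rewrite described above can be sketched as a one-line helper (the name `optional_char` is hypothetical). It makes every non-ASCII byte of a multibyte character optional. Note that the result is slightly looser than `ð?` in `mw.ustring`: it would also accept either byte on its own, which only matters if the surrounding pattern can line up with the middle of another character.

```lua
-- Hypothetical helper: turn one multibyte UTF-8 character into a pattern
-- fragment in which every byte is optional, e.g. "ð" -> "\195?\176?".
-- ASCII bytes are left untouched, so ASCII input comes back unchanged.
local function optional_char(char)
	return (char:gsub("[\128-\255]", "%0?"))
end

local frag = optional_char("ð")
print(frag == "\195?\176?")                -- true
print(("xðy"):match("^x" .. frag .. "y$")) -- xðy
print(("xy"):match("^x" .. frag .. "y$"))  -- xy
```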
As for memory usage, I'll repeat what I said before: you can't really take it in isolation, because memory usage does not scale linearly in Scribunto. Module:string utilities gsub currently manages all text substitution in Module:languages/doSubstitutions, and actually caused a decrease in memory usage on more pages than not (along with a speed-up). On a page such as water/translations, I suspect it's being run more than 50,000 times. However, that doesn't mean we can't make further improvements. Theknightwho (talk) 12:50, 20 March 2023 (UTC)
Just one more suggestion. Could you please at least write documentation for anything you are trying so hard to propagate, especially something that is supposed to be used as a general utility library? I have already asked you to update the documentation of Module:languages/data/2 for the "generate_forms" stuff you made before, and it is still nowhere to be found. Do you expect others to build their code on something whose behaviour is largely unknown, and then, when it somehow breaks, panic and wait for you to rescue them? Yes, I know it feels good to be the sole world-saver who knows more than anyone else by simply not sharing information, but please remember that is not how we work as a community. -- Huhu9001 (talk) 14:52, 20 March 2023 (UTC)
@Huhu9001 The reason generate_forms doesn't have documentation yet is because the general syntax still needs to be properly determined, which I already explained to you. Every time I have tried to work collaboratively with you, you have ignored what I've said and responded with patronising snark. It's profoundly unhelpful. Theknightwho (talk) 03:06, 21 March 2023 (UTC)
So what is the reason why the new mod:string utilities functions do not have a single bit of documentation either? You seem to have no time for properly determining "the general syntax", which is essentially the basis of the whole Chinese Lua infrastructure and could not be more important. Nor do you have any time to write any instructions for the new string functions you are pushing so hard. But you have already made the decision for the whole community that it is time they must use them, no ifs or buts, and you do have plenty of time for getting mad at someone who disagrees and throwing all kinds of bad-faith-assuming attacks against them. I am deeply amazed. -- Huhu9001 (talk) 03:53, 21 March 2023 (UTC)
@Huhu9001 The documentation is commented in the module itself, and is pretty self-explanatory anyway. I'm also not annoyed with you for disagreeing with me; I'm annoyed at you because you've made zero effort to be collaborative. I have repeatedly suggested potential ways forward, which you have ignored in favour of whataboutism and insinuations. Theknightwho (talk) 04:11, 21 March 2023 (UTC)
Great. Next time I try to search, like, the Lua manual, I hope it tells me "Go look at the source code! It is all there! We have comments!"
And you are still calling yourself "collaborative" while at the same time refusing to even write documentation. Oh my boy. -- Huhu9001 (talk) 04:25, 21 March 2023 (UTC)
@Huhu9001 I'm not "refusing" to do anything - you're just trying to justify why you're acting like an arsehole. Goodbye. Theknightwho (talk) 04:34, 21 March 2023 (UTC)
I must admit that, given all your behaviour above, I am still a little surprised to see profanities here from an admin. -- Huhu9001 (talk) 04:42, 21 March 2023 (UTC)
@Theknightwho, Huhu9001: I also find the lack of test cases distressing.
I would also remind you that documentation is supposed to go on the documentation page, rather than in comments in the code. Be grateful that comments are now allowed; they used to be well-nigh prohibited. --RichardW57m (talk) 14:17, 21 March 2023 (UTC)
@Theknightwho, Huhu9001, RichardW57 IMO User:Theknightwho needs to learn not to get worked up in arguments, and to end the discussion when further arguing is counterproductive. OTOH, I don't agree at all that putting documentation in the Lua code is wrong. I actually prefer it this way, because (a) if you are calling a function you probably have to look up its code in any case, and (b) it's much more likely that the documentation will stay up to date if it is directly attached to the function and/or module in question than if it sits off on some other page. E.g. you will find in general that the modules I've created are heavily documented in the code of the modules (see Module:parse utilities for an example), but not on the module doc pages, which generally just refer to the comments in the code. Benwing2 (talk) 08:17, 22 March 2023 (UTC)
Bullet 7 at WT:Coding_conventions#Lua says:
"Comments should be used sparingly; good code does not need much commenting. Keep comments brief and to the point. Do not put ASCII art in comments.
Comments should not be used for documentation; use the documentation subpage instead."
Perhaps we should change that part of the convention. While interfaces very much belong in the documentation, the details of implementation don't sit so well in main documentation pages. --RichardW57m (talk) 09:35, 22 March 2023 (UTC)
@Benwing2: Documentation provides a high-level understanding of a module, freeing people from the mental labour of understanding its low-level implementation. Having to look up the source code before running a function is generally a bad idea. It can be a real headache with some complex functions like those found in Module:languages. Grasping what a function actually does may require you to hop back and forth between the helper functions it uses, sometimes even in other modules. It is also a sign that the code is highly coupled, as you are wary of the likely unexpected influence that may come from a module's subroutine, which normally you should have the right to ignore. This is basically what is called "w:Modular programming". The name "modules" that we are using is closely related to it. -- Huhu9001 (talk) 09:56, 22 March 2023 (UTC)
@Huhu9001 Dude, I do software for a living. I know how to write clean code. There is no fundamental difference between putting documentation in the form of a comment above a function, and putting it elsewhere in a documentation file. If you doubt me, please read some of my code and the comments I've written. Benwing2 (talk) 01:42, 23 March 2023 (UTC)
@RichardW57 That statement about comments in code is nothing but someone's (rather misguided) opinion; I disagree completely. Benwing2 (talk) 01:43, 23 March 2023 (UTC)
@Benwing2: An easy mistake for someone who does "software for a living" is assuming that everyone they work with is a professional developer like those they meet in their everyday workplace, which is not the case on Wiktionary. Inline comments contain much irrelevant information, such as an older implementation of some function in your example Module:parse utilities. It may be good for developers, but not for users. They didn't write wt:Coding_conventions this way for no good reason. -- Huhu9001 (talk) 02:26, 23 March 2023 (UTC)
If Android or Lua or Python were to replace their documentation with "just look at our code, we have all the comments there", I would expect lots of people to kick them in the teeth. -- Huhu9001 (talk) 02:32, 23 March 2023 (UTC)

@Benwing2: We can easily find many examples of good utility libraries with well-written documentation. I cannot know for sure whether their creators "do software for a living", but they definitely don't just say "read my inline comments". Perhaps Benwing2's idea works well for projects limited to a workgroup or something (I never write any damned doc either for something I write only for myself), but a utility library open to indefinitely many users is a completely different story.

-- Huhu9001 (talk) 03:17, 23 March 2023 (UTC)

@Huhu9001 Whatever. Please don't lecture me. Under Module:parse utilities you will see detailed documentation at the top of every function, which you seem to have failed to notice. API documentation of the sort you cite is normally generated exactly from such comments. I should add, people who write modules and call functions in other modules are necessarily "developers", so I don't understand at all your comment about "not good for users". Benwing2 (talk) 03:56, 23 March 2023 (UTC)
@Benwing2: If you are giving your reasoning, I am surely allowed to give mine. This is not to be labeled "lecturing" and forbidden. I would actually appreciate it very much if you could do this "generated exactly from such comments" thing here, as it is obviously a necessary job that all the projects I listed above have done.
If you don't like the terms "developers" and "users", I can use the terms "developers from the same working group" and "developers beyond the same working group", but they are longer. -- Huhu9001 (talk) 04:05, 23 March 2023 (UTC)
@Huhu9001 There is no framework to generate documentation from Lua code comments, so you will have to read the code comments. This is two extra clicks compared to reading the module documentation page, that's all. Benwing2 (talk) 04:09, 23 March 2023 (UTC)
@Benwing2: I don't think so. Lua code in the Module namespace can be fetched automatically just as easily as any other wikitext. There may be some tricks with the formatting, but it is far from impossible. The reason why I think it is not "two extra clicks" has been explained above in my comments. -- Huhu9001 (talk) 04:18, 23 March 2023 (UTC)
@Benwing2: Now here is a less abstract request. Can you please add a brief description of which characters in a string fed to mod:languages and mod:links will be modified, and what one can do to escape them? Quite a few languages have had trouble with them. But it is written nowhere, neither in docs nor in comments, and it is also not easy to deduce from the code. I guess at least you would agree this small piece of doc is useful and worth having. -- Huhu9001 (talk) 04:34, 23 March 2023 (UTC)
@Huhu9001 I have no idea anymore. User:Theknightwho will need to write this up. Benwing2 (talk) 04:43, 23 March 2023 (UTC)
@Huhu9001 @Benwing2 Sure, I'm happy to write that up. However, I'm only aware of Japanese having issues (and by extension, Okinawan, as its modules are just a clone of the Japanese ones). Which others have had problems?
As for documentation, it seems to me that there is a very obvious need to separate documentation from explanatory comments: documentation gives the purpose and general overview, while explanatory comments offer clarification on why a certain implementation was used. However, it's fine to separate these by placing the documentation above the function in the module code, while making sure the explanatory comments are in-line.
And no - I'm not interested in getting into a protracted argument about this. This is obviously a matter of personal preference. Theknightwho (talk) 08:12, 23 March 2023 (UTC)
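To make the split between documentation and explanatory comments concrete, here is an illustrative function laid out that way. It is not taken from any Wiktionary module; the name and behaviour are invented for the example. The overview comment above the function states purpose, parameters and return value, while the short in-line comments explain why the implementation looks the way it does.

```lua
--[==[
Hypothetical example only. Returns `text` with every run of whitespace
collapsed to a single space and leading/trailing whitespace removed.

Parameters:
  text (string): the input string.
Returns:
  string: the normalized text.
]==]
local function normalize_spaces(text)
	-- Collapse first, so that the trim below only ever sees single spaces.
	text = text:gsub("%s+", " ")
	-- After collapsing there is at most one space at each end, so an
	-- optional single space on each side is enough.
	return (text:match("^ ?(.-) ?$"))
end

print(normalize_spaces("  a \t b  ")) -- a b
```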
Appendix:Quenya/mista had been hanging in cat:E for a week or longer; apparently nobody was able to do anything because, without any documentation, how mod:languages manipulates strings was largely a secret. -- Huhu9001 (talk) 08:23, 23 March 2023 (UTC)
@Huhu9001 No, it was for 2 days. I check CAT:E extremely often, and considered it a low priority because it's an appendix language which seems to use a custom character set for a specific font.
And no, it isn't a secret - it's actually explained in Module:languages#L-95. I don't care if you don't like comments in the code - it's dishonest to pretend that documentation isn't there, simply because it isn't in the format you would like. Yes, it will be written up in more detail soon enough, but the fact you complained about even Ben's heavily documented code is just entitled and unreasonable. Theknightwho (talk) 08:45, 23 March 2023 (UTC)
Good, "risky characters" (Module:languages#L-95) is a very good explanation which could not be more helpful. And "low priority" actually means just putting it aside and being busy playing with your own sandboxes. It turns out that it is you who should stop being dishonest. -- Huhu9001 (talk) 08:50, 23 March 2023 (UTC)
Also, I roughly remember someone having said "I'm not interested in getting into a protracted argument about this." I appreciate them for doing so, rather than quickly getting emotional and resuming attacks again. -- Huhu9001 (talk) 08:59, 23 March 2023 (UTC)
"Playing with my own sandboxes" meaning "writing, testing and deploying new code" haha. Good grief. Theknightwho (talk) 09:03, 23 March 2023 (UTC)
Does that justify ignoring existing problems? What "low priority" are you actually assigning to a proper Wiktionary page, even lower than your own sandboxes? This basically again shows your lack of respect for others.
"Also, I roughly remember someone having said 'I'm not interested in getting into a protracted argument about this.' I appreciate them for doing so, rather than quickly getting emotional and resuming attacks again." -- Huhu9001 (talk) 09:14, 23 March 2023 (UTC)
@Huhu9001 The only person showing a lack of respect for others here is you, and unfortunately you keep making this my business by personally attacking me. I suggest you take a break from this page for a day or two. Theknightwho (talk) 09:21, 23 March 2023 (UTC)
I suggest that someone who uses the vulgarity "arsehole", and who stated "I'm not interested in getting into a protracted argument about this" but immediately broke that, take a break instead. -- Huhu9001 (talk) 09:24, 23 March 2023 (UTC)

Proto-Hellenic Conjugation Templates

These were created a while back by a certain IP who added inflection templates for a wide variety of proto-languages, but these are the only ones having problems, and only now.

As far as I can tell, all the entries using certain of these are in CAT:E: the template adds the * to indicate the terms are reconstructions, but by the time Module:links checks for it, it's not there. I'm guessing that this is a side effect of all the preprocessing that @Theknightwho has been adding to link parameters, but there seems to be something else at work, as well.

At any rate, the relevant code in the template is {{l-self|grk-pro|*{{{1|?}}}}}, with each cell in the table having its own numbered parameter instead of 1. Oddly enough, not all of the instances of this give the error in all of the templates (within the same template in the same entry, some cells display properly while others have the error), and some of the templates that use named parameters seem to be okay. Of course, this has been progressing, so only a few entries were in CAT:E originally and more appeared later, so this might be temporary.

I suppose we could get rid of the asterisks in the templates and add them to the parameters in the entries (there are a few dozen entries, so far), but I'd like to make sure this isn't a problem that could progress to other templates but that just hasn't as yet.

I don't have the expertise to figure this out, so I'm bringing it here. Chuck Entz (talk) 17:06, 18 March 2023 (UTC)

@Chuck Entz
Theknightwho (talk) 17:22, 18 March 2023 (UTC)
Turns out it was actually a bug in Module:languages. Fixed. Theknightwho (talk) 18:17, 18 March 2023 (UTC)

Vertical scripts in categories

Previously, terms in languages like Manchu that are written vertically displayed vertically even on category pages like Category:Manchu lemmas (see e.g. this version of the page from last November). Currently, they no longer do. (Here's the discussion from when verticalization was first implemented, which might help with figuring out how to re-enable it.) - -sche (discuss) 04:05, 19 March 2023 (UTC)

@Theknightwho You'll have to look into this; the catfix code that (I think) sets things up is in Module:utilities, but there may have been changes elsewhere that led to this. Benwing2 (talk) 07:51, 20 March 2023 (UTC)
@Benwing2 @-sche I don't have time to find the specific thread right now, but I remember this coming up a few months ago after the Mongolian script got subdivided. The new script codes were mnc-Mong, xwo-Mong and sjo-Mong. If that's the case, I don't have the permissions to fix it as it's CSS related. However, it seems to affect Mong as well, so there might be two issues going on. Theknightwho (talk) 12:05, 20 March 2023 (UTC)

Ukrainian accelerated forms for words with alternative stress

The accelerated forms generator seems to use the right form to generate the IPA for words like військовий which have multiple stress variants, but it still uses the same form for both "inflection of ..." forms (it seems to default to the first named variant). See for instance [3]. Helrasincke (talk) 11:04, 19 March 2023 (UTC)

@Helrasincke I'll take a look; probably need to port the Russian accelerator code. Benwing2 (talk) 23:43, 19 March 2023 (UTC)
@Benwing2 Great, thank you! Helrasincke (talk) 05:16, 20 March 2023 (UTC)
@Helrasincke The bug is here: [4]. It always takes the first lemma when there are several. Module:inflection utilities is a general library module for implementing inflection tables, but the Russian code doesn't use it (because it was written before Module:inflection utilities existed) and rolls its own accelerator code, so it doesn't run into the same issue. Fixing it properly may not be super-trivial; I'll look into it tomorrow. Benwing2 (talk) 07:45, 20 March 2023 (UTC)
@Helrasincke The case you mentioned should be fixed; see зимовий for an example. Please also check out черга and click on the genitive singular or nominative plural form черги and let me know if it looks right; this one is tricky to get right. Also see міст and click on instrumental singular мостом; this is another tricky case. Ideally all of these should have only one pronunciation section with all pronunciations given there with |ann=1 set; I think Russian does that currently, and I'll have to see how to get that for Ukrainian. Benwing2 (talk) 23:50, 20 March 2023 (UTC)
@Benwing2 Yes, that seems to be working as desired; thanks for fixing that. As for the pronunciation section, yes, they are still divided, though that could be a matter of editorial preference. As you mentioned, the Russian entries use |ann=y and it seems to me that in due course this would be a handy feature to have irrespective of language. Helrasincke (talk) 00:13, 21 March 2023 (UTC)
@Helrasincke I implemented combined pronunciation sections and added |ann=1 to the Ukrainian generated pronunciation whenever there is more than one target. User:Горец I'm not sure if it's realistically possible to have multiple etymologies or pronunciations in Macedonian, but if so, this will change the way they are displayed. I added a way to customize the generation of the combined sections if you need it. Benwing2 (talk) 08:56, 21 March 2023 (UTC)
Thanks, @Benwing2. Sorry for the delay in responding; I was absent for some time.
Maybe in rare cases. If we need further help with this we will contact you. Gorec (talk) 15:48, 12 April 2023 (UTC)

Template-protected edit request to Template:cite-newsgroup on 19 March 2023

Template:cite-newsgroup's documentation page numerically labels the date/year parameter as |6=. Currently, the output display is author, then date/year, then title. I ask that the section of the source code which deals with the date/year be moved up from |6= to |3=, in keeping with the standard output.

There seems to be another problem with the template which I've raised at Talk:Mexican standoff#Template display. Thank you. — CJDOS, Sheridan, OR (talk) 20:40, 19 March 2023 (UTC)

It was just a typo. The numbers should not have appeared at all. I have corrected it. — Sgconlaw (talk) 17:04, 20 March 2023 (UTC)
Thank you, Sgconlaw, for the assistance. I see that in the process you've reverted my edit, returning the date parameter to after the |title= and |newsgroup= parameters. The output is still Author (Date), "Title", which is standard; I just wanted the documentation to match the actual order of output. — CJDOS, Sheridan, OR (talk) 06:27, 21 March 2023 (UTC)

Incorrect Category:Hungarian terms with hyphenation

This new (red-link) category appears in every Hungarian entry and in other languages, as well. When I remove {{hyphenation}}, it disappears. Was this template edited recently? Panda10 (talk) 17:09, 21 March 2023 (UTC)

@Panda10 User:Fenakhay added this earlier today. Can you explain your addition of this category and removal of French from the languages where syllable-counting categories are added? Benwing2 (talk) 19:28, 21 March 2023 (UTC)
@Panda10, Fenakhay I have disabled this category addition for the moment. We should discuss what its purpose is and whether it's really necessary; e.g. if you need it for some internal purpose, you should just use the template tracking mechanism. Benwing2 (talk) 08:06, 22 March 2023 (UTC)

Bug with {{l}} or {{m}} in non-gloss definitions

When plain text is followed by a link generated by {{l}} (or {{m}}) inside a {{n-g}} line, the space before the link is not displayed (e.g. POS#Noun, def #4). Einstein2 (talk) 17:41, 22 March 2023 (UTC)

gloss/gl template

I've noticed this a few times in the past, but ("woodsman"; via Old French bigre from Medieval Latin bigarius) - when gl or gloss is used in this way, the spacing gets removed between the names of the languages and the remaining text. This is occurring with cog but also with m+. Leasnam (talk) 18:46, 22 March 2023 (UTC)

@Leasnam This is fixed, as per the below. Theknightwho (talk) 07:50, 23 March 2023 (UTC)
Thank you! Leasnam (talk) 13:54, 23 March 2023 (UTC)

"phonetic alphabet" template error

See Yankee. There is a missing space, so it says "code for the letterY" [sic]. Equinox 01:37, 23 March 2023 (UTC)

@Equinox This is the same bug as noted in the preceding two sections. I don't know the cause but I am guessing it was introduced by some change made recently by User:Theknightwho; can you take a look? Benwing2 (talk) 03:59, 23 March 2023 (UTC)
@Benwing2 @Equinox The full explanation for what caused this is complicated, but it affected situations where HTML entities for spaces (in this case &#x20;) were being placed right before links. It's fixed.
I will put this in the documentation, but the way trimming is supposed to work is that Module:languages should only trim whitespace which gets added during the substitution process, and keep any which gets passed to it in the first place. Theknightwho (talk) 07:26, 23 March 2023 (UTC)
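A minimal sketch of that trimming rule, assuming a generic `substitute` callback that stands in for the real substitution step. The function names here are hypothetical and are not the ones used in Module:languages. The caller's own edge whitespace is set aside before substitution, any edge whitespace the substitution produces is trimmed, and then the caller's whitespace is restored.

```lua
-- Hypothetical sketch: apply `substitute` (any text-transforming function)
-- but only trim whitespace that the substitution itself introduced at the
-- edges; whitespace the caller passed in is preserved.
local function substitute_keep_edges(text, substitute)
	-- Whitespace supplied by the caller at each end, and the core text.
	local lead, body, trail = text:match("^(%s*)(.-)(%s*)$")
	local result = substitute(body)
	-- Strip whatever leading/trailing whitespace the substitution added...
	result = result:match("^%s*(.-)%s*$")
	-- ...and put back the caller's own whitespace.
	return lead .. result .. trail
end

-- Toy substitution that pads its output with spaces.
local function pad_upper(s)
	return " " .. s:upper() .. " "
end

print("[" .. substitute_keep_edges(" ab", pad_upper) .. "]") -- [ AB]
```

Note that Lua's `%s` only covers literal whitespace characters, so an entity such as `&#x20;` would need separate handling.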
It seems that the {{l}} template strips out a space whenever the last word in the link is just one letter; for example, ooo o links to oooo and not to ooo o. This affects only a few pages, one of which is I know you are but what am I. But it's still a bug. Is it possible to get the link template to use the string literal input instead of processing it? Soap 08:48, 23 March 2023 (UTC)
@Soap Sorry - that was just down to a brainfart when I did this fix. It was supposed to conditionally trim added whitespace before the first character, but was conditionally removing whitespace before the last character instead. Theknightwho (talk) 09:00, 23 March 2023 (UTC)
@Soap In response to your question above, I'm not exactly sure what is going on currently, but in general Module:links has to process its input at least to convert links to language-specific ones. Benwing2 (talk) 17:04, 23 March 2023 (UTC)

{{n-g}} deletes colon after brackets

Related? {{n-g|a burden: what it is}} renders as a burden: what it is. Another example of something that crops up only very rarely, on just a tiny number of pages, but is still a bug. Found on dildo, fixed temporarily by using nowiki around the colon. Soap 18:36, 23 March 2023 (UTC)

@Theknightwho, Soap This is surely related. Is this connected with processing interwiki links? If so, you need to check that what precedes is only in the form of a language code, and probably that there's no space following. (In any case, however, {{n-g}} isn't supposed to do link processing at all, so I think some rethinking of the def_t code is in order.) Benwing2 (talk) 18:46, 23 March 2023 (UTC)
@Benwing2 I suspect you're right about it being related to interwiki links - I'll investigate, as it's supposed to check against the server's own interwiki prefix list (meaning no false positives/negatives). {{n-g}} does actually process language links, as it's meant to generate links to the English section. This was formerly implemented as a special case, but now there's a general mechanism for it. Theknightwho (talk) 18:56, 23 March 2023 (UTC)
@Benwing2 @Soap This was a very easy fix: makeDisplayText in Module:languages checks for interwiki prefixes at the start of the string (and : on its own is one of them). makeDisplayText has a keepPrefixes param, which disables their removal. This was already being used when Module:links detected no links at all, but if embedded links are found Module:links follows a different routine when it comes to non-linked text. It was that routine which wasn't using keepPrefixes. Theknightwho (talk) 19:16, 23 March 2023 (UTC)
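A minimal sketch of a keepPrefixes-style switch, to show why a bare leading colon disappeared. The prefix table is a hypothetical stand-in (the real list comes from the server's interwiki configuration and is far longer), and `strip_prefix` is not the actual makeDisplayText code. Because an empty prefix counts as "a bare leading :", text that merely begins with a colon loses it unless prefixes are kept, while ordinary words followed by a colon are untouched.

```lua
-- Hypothetical stand-in for the wiki's interwiki prefix list. The empty
-- string represents a bare leading ":".
local INTERWIKI = { [""] = true, w = true, s = true, wikt = true }

-- Remove a leading "prefix:" when the prefix is a known interwiki prefix,
-- unless keep_prefixes is set (mirroring the keepPrefixes idea above).
local function strip_prefix(text, keep_prefixes)
	if keep_prefixes then
		return text
	end
	local prefix, rest = text:match("^([^:]*):(.*)$")
	if prefix and INTERWIKI[prefix:lower()] then
		return rest
	end
	return text
end

print(strip_prefix(": what it is"))       -- colon stripped: " what it is"
print(strip_prefix(": what it is", true)) -- colon kept
```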
@Theknightwho Thanks. Why is it checking for interwiki links, though, in this case? It should only check at the start of the actual string, not in parts following links that may end up string-initial due to the way you chop things up. Benwing2 (talk) 19:22, 23 March 2023 (UTC)
@Benwing2 There are two modules interacting here. Module:links has three main functions (which I will get on properly documenting tomorrow).
  • language_link (which is called by full_link) defaults to a language link, unless embedded links are found, in which case it deals with them individually, and then doesn't link the other text.
  • plain_link, which is a non-language link (mostly for use by {{also}}), which means it's able to take advantage of various special features of other link templates without needing a language param (e.g. unsupported titles).
  • embedded_language_links, which is called by templates like {{n-g}} and {{lang}}; this deals with any embedded links, but won't link the text if none are found.
Module:links will deal with all the stuff that's specific to links, but will offload general language processing to Module:languages, which has makeEntryName and makeDisplayText, which determine the target and display form respectively. When it comes to non-linked text, Module:links still pushes it to Module:languages to get the display form. When there are no links involved it's straightforward, but where it gets tricky is handling all the non-linked text when there are embedded links involved. This is done in an iteration after the links have been processed. Module:languages was receiving : what it is on the second iteration, which is where the problem was coming from.
The main issue is that it's currently not a good idea to process text twice, because it can result in double escapes or double substitutions that are not desirable (e.g. a transliteration might get wrongly processed as a Latin script term). The reason it's possible to nest link templates to an arbitrary level is because Module:links specifically avoids doing this, as it essentially strips it all out and reprocesses the link. Theknightwho (talk) 19:43, 23 March 2023 (UTC)
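The iteration described above, with links handled first and the surrounding non-linked text handled separately, can be sketched as follows. `render_embedded`, `process_link` and `process_text` are hypothetical names standing in for the real Module:links and Module:languages routines. The key detail is the second argument to `process_text`: any text segment that does not start the original string is processed with prefixes kept, so `: what it is` after a link keeps its colon.

```lua
-- Hypothetical sketch: split text with embedded [[...]] links into linked
-- and non-linked pieces. process_link/process_text stand in for the real
-- link handling and display-text handling.
local function render_embedded(text, process_link, process_text)
	local out, pos = {}, 1
	for start, link, finish in text:gmatch("()%[%[(.-)%]%]()") do
		if start > pos then
			-- Text between links: keep prefixes unless it starts the string.
			out[#out + 1] = process_text(text:sub(pos, start - 1), pos > 1)
		end
		out[#out + 1] = process_link(link)
		pos = finish
	end
	if pos <= #text then
		out[#out + 1] = process_text(text:sub(pos), pos > 1)
	end
	return table.concat(out)
end

-- Toy handlers: links become <a> tags; a bare leading ":" is removed from
-- text unless prefixes are kept.
local function toy_link(target)
	return "<a>" .. target .. "</a>"
end
local function toy_text(s, keep_prefixes)
	if not keep_prefixes then
		s = s:gsub("^:", "", 1)
	end
	return s
end

print(render_embedded("[[burden]]: what it is", toy_link, toy_text))
-- <a>burden</a>: what it is
```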
@Theknightwho I agree we should not be processing text twice. What I'm saying is when Module:languages receives : what it is, it should know not to even check for interwiki links; you don't want it to be passed e.g. {{n-g|burdens: serious problems}} and interpret the s as a Wikisource link. Let's see what it does currently: burdens: serious problems. (Seems to work; maybe you are already accounting for this?) Benwing2 (talk) 19:50, 23 March 2023 (UTC)
@Benwing2 Yep - that's exactly what the keepPrefixes param is for :) I wanted to implement prefix processing in Module:links itself (for obvious reasons), but doing that would make it impossible to use : as a way of overriding entry names, e.g. Latin . Although : is overloaded, this is something we shouldn't get rid of imo, because it's already standard MW behaviour with other special links like categories. Unfortunately, the only way to handle prefixes properly was to do it right in the middle of all the language processing, which suggests we need to do a major redesign, as Module:links and Module:languages are too interwoven at the moment. Theknightwho (talk) 19:59, 23 March 2023 (UTC)

Allow user to close all sections with one click

wgMFCollapseSectionsByDefault should be a preference that we can change. Etc. Please see https://phabricator.wikimedia.org/T332852 . Jidanni (talk) 10:48, 23 March 2023 (UTC)

Misformatted altforms

A large proportion of alternative forms listed on Wiktionary incorrectly use {{l}} rather than {{alt}}. A quick selection of examples: Dalmatian recolegro, English metre, Ukrainian весь (vesʹ). Can we have a bot fix this? Nicodene (talk) 13:01, 23 March 2023 (UTC)

@Nicodene I have a script to do this and I have run it for various languages but not universally. It feels safer doing this one language at a time so I can check the output; any specific languages you want run? Benwing2 (talk) 17:06, 23 March 2023 (UTC)
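For illustration only, here is a sketch of the simplest case of such a conversion. It is not Benwing2's script, and the rules are assumptions. It rewrites a line of the form `* {{l|en|metre}}, {{l|en|meter}}` as `* {{alt|en|metre|meter}}`, and returns `nil` (leaving the line for a human) whenever it sees anything it does not fully understand: extra template parameters, mixed language codes, or other text such as qualifiers on the line.

```lua
-- Hypothetical sketch: convert a simple Alternative forms line made only of
-- {{l|code|term}} links separated by commas into a single {{alt}} call.
-- Returns nil for anything more complex, so it can be reviewed by hand.
local function l_to_alt(line)
	local body = line:match("^%*%s*(.-)%s*$")
	if not body then
		return nil
	end
	local codes, terms = {}, {}
	-- Remove each {{l|code|term}}; "{", "}" and "|" are not magic in Lua
	-- patterns, so they match literally here.
	local leftover = body:gsub("{{l|([^|{}]+)|([^|{}]+)}}", function(code, term)
		codes[#codes + 1] = code
		terms[#terms + 1] = term
		return ""
	end)
	-- Anything other than comma separators left over means extra content.
	if #terms == 0 or leftover:find("[^,%s]") then
		return nil
	end
	for _, code in ipairs(codes) do
		if code ~= codes[1] then
			return nil -- mixed languages
		end
	end
	return "* {{alt|" .. codes[1] .. "|" .. table.concat(terms, "|") .. "}}"
end

print(l_to_alt("* {{l|en|metre}}, {{l|en|meter}}")) -- * {{alt|en|metre|meter}}
```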
@Benwing2 Aromanian (rup) and Romansh (rm) are some of the worst offenders that I've come across. Dalmatian (dlm), Lombard (lmo), and Friulian (fur) are the runners-up.
If it would speed things along, I could check outputs as well. Nicodene (talk) 17:16, 23 March 2023 (UTC)[reply]
@Nicodene This is done. I had to run a separate script beforehand to templatize raw links, esp. for Aromanian, which had a ton of them. Let me know if there are other languages needing doing. All of the languages you mentioned are minor Romance languages, and I'm not sure I've run my script on all the major Romance languages; it looks like I've done French, Italian and Portuguese, as well as (on request) a bunch of Philippine languages, but not Spanish, Romanian or Catalan. Benwing2 (talk) 22:11, 23 March 2023 (UTC)[reply]
@Benwing2 Looks great, thank you. That's my main headache sorted. I can't say I've noticed misformatted altforms being a major issue for the other Romance languages, though there are a few here and there. Nicodene (talk) 22:30, 23 March 2023 (UTC)[reply]
@Nicodene: Where is it decreed that {{alt}} shall be used instead of {{l}}? Retract your vile allegation. --RichardW57m (talk) 09:42, 24 March 2023 (UTC)[reply]
Of all the bizarre hills to die on that has to be one of the funniest. Nicodene (talk) 11:20, 24 March 2023 (UTC)[reply]
@RichardW57m You know, you don't have to start so aggressively. Please watch your tone. {{alt}}, believe it or not, is better set up to handle the information needed in that section, such as clarifications, and is machine-readable. Vininn126 (talk) 11:26, 24 March 2023 (UTC)[reply]
Sorry, small administrative issue sorting out the paperwork, here it is: “Whereas {{alt}} is better suited for the task of listing alternative forms than {{l}}, I decree that {{alt}} shall be substituted for {{l}} in every place whatsoever where the latter may be found in an Alternative forms section. Given this 24th day of March, 2023, in the Abyss of Grease. Tremble and obey.” —Al-Muqanna المقنع (talk) 11:50, 24 March 2023 (UTC)[reply]
@al-Muqanna, Vininn126: Ultra vires. Besides, burying it in a Grease Pit discussion, though far from the worst place, is not very effective. For a mere list of alternative forms, I don't actually see that {{alt}} is superior to {{l}}, though it better accommodates qualifiers and the like. I would therefore not aggressively say that the use of {{l}} is incorrect. For example, how does the change actually improve the entry at English metre? --RichardW57m (talk) 12:38, 24 March 2023 (UTC)[reply]
So while it accommodates qualifiers better, it's not better? You also realize Al-Muqanna was joking, right? Vininn126 (talk) 12:40, 24 March 2023 (UTC)[reply]
Accommodating qualifiers better is of no use if there are no qualifiers to accommodate. Indeed, {{alt}} might be slightly worse, for it causes more Lua modules to be loaded. Now, working with {{desctree}} is an advantage - I think it's worth mentioning it in Template:alter/documentation. --RichardW57m (talk) 13:09, 24 March 2023 (UTC)[reply]
Grand, looks like it's strictly better unless a page is hitting memory limits so let's stick to the consensus to use {{alt}} and deal with that problem if and where it actually comes up. —Al-Muqanna المقنع (talk) 13:23, 24 March 2023 (UTC)[reply]
How about documenting this claimed consensus? --RichardW57m (talk) 13:46, 24 March 2023 (UTC)[reply]
Do we need to worry about the interactions of {{desc}} and {{desctree}} with {{pi-alt}} and {{sa-alt}} and any of their analogues? The latter two list the alternative forms of a word in different writing systems in the multi-script languages Pali and Sanskrit. I've never seen any need to list alternative forms listed by {{pi-alt}} separately. --RichardW57m (talk) 13:46, 24 March 2023 (UTC)[reply]
@RichardW57m This is precisely why it’s a bad idea to create your own template ecosystem. I did tell you this. Theknightwho (talk) 15:01, 24 March 2023 (UTC)[reply]
@Theknightwho, Octahedron80, Svartava: So what are you proposing? Replacing {{pi-alt|Laoo=ພຸທ຺ຘ|Laoo2=ພຸທຘະ|Laoo3=ພຸທທ຺ະ|Laoo4=ພຸດທະ|Latn=buddha}} (expansion is at buddha) with something like {{alt|pi|𑀩𑀼𑀤𑁆𑀥<q:Brah>|बुद्ध<q:Deva>|বুদ্ধ<q:Beng>|බුද‍්ධ<q:Sinh>|ဗုဒ္ဓ<q:m0,m1>|ၿုၻ္ꩪ<q:m2>|ၿုၻ်ꩪ<q:m3>|พุทฺธ<q:t0>|พุทธะ<q:t1>|ᨻᩩᨴ᩠ᨵ<q:Tham>|ພຸທ຺ຘ<q:l0>|ພຸທຘະ<q:l1>|ພຸທທ຺ະ<q:l2>|ພຸດທະ<q:l3>|ពុទ្ធ<q:Khmr>|𑄝𑄪𑄘𑄴𑄙<q:Cakm>}} with all 16 non-Roman forms entered manually? At present the {{pi-alt}} invocation is present for all forms, which happens to be helpful for an editor, but I'm starting to replace its more complicated invocations by bespoke templates. This will make an editor's life easier but a scraper's life harder. Do you think we should prioritise a scraper's convenience over editors'? --RichardW57m (talk) 16:17, 24 March 2023 (UTC)[reply]
Nobody has proposed any of that stuff. The proposal was to run a bot job to replace bare {{l}} with {{alt}} in a few specific languages, and you've reacted by popping off about decrees and about bespoke, unrelated templates that exist for totally different languages nobody else was talking about. To be clear, my "decree" was a joke—we do things on the basis of established practice and community consensus, not decrees handed down from above, so demanding to see them is strange. —Al-Muqanna المقنع (talk) 16:57, 24 March 2023 (UTC)[reply]
How was I supposed to know that the original hope was not for the change to be done for all languages? --RichardW57 (talk) 18:49, 24 March 2023 (UTC)[reply]
You've replied to a reply to the complaint of @Theknightwho, not directly to the request of @Nicodene for some bot actions. The answer to my original query with regard to {{pi-alt}} might be that it isn't an issue at present, because the set of alternatives generally only contains one form in the main script for Wiktionary. I can see two plausible problems:
  1. The most likely problem is how I expose the optional sandhi that converts final 'ṃ' to 'm' before vowels in the Roman script; that erases a word boundary in the supported non-Roman scripts, so isn't recorded there. Perhaps I need to stop half-hiding it under {{pi-alt}}.
  2. For the main script, one can have two citation forms, implicitly or explicitly in the |Latn= parameter. For this the solution is for scrapers to treat {{pi-alt|Latn=xxx}} as {{alter|xxx}}, and, depending on the styles of editors, {{sa-alt|Deva=xxx}} as {{alter|xxx}}. There may, however, be an issue with wrapping {{pi-alt}} in a template.
Do you, Al-Muqanna, want discussion of this interaction moved to another topic? --RichardW57 (talk) 18:49, 24 March 2023 (UTC)[reply]
ALT allows for multiple forms to be neatly put together, while L forces awkward chaining. The matter of machine readability has also been mentioned- note that L fails to work at all with desctree and desc|alts=1. Nicodene (talk) 12:50, 24 March 2023 (UTC)[reply]

Estonian - declension type 17/elu - template error

Not all words of type 17 have a short illative form. However, the current template for type 17/elu takes the short form for granted and generates a nonexistent short form, such as lõvi > lõvisse/*lõvvi.

It would be better to add an extra parameter into the template to manually indicate whether a specific word has a short form.

Going one step further, phonological restrictions (b d g h j k l m n p r s t) could be applied to determine automatically whether a short form is available. Kaʋel (talk) 13:53, 23 March 2023 (UTC)[reply]

@Coywjs Unfortunately the editor who created these tables (User:Rua) is not currently active. The code that controls this is around line 500 of Module:et-nominals but someone who knows a bit about Estonian will have to work on this. Benwing2 (talk) 19:19, 23 March 2023 (UTC)[reply]
Thank you so much for taking the time to track down the original contributor.
With the help of the link you offered, I tried revising it a little, and it now works. Kaʋel (talk) 14:34, 24 March 2023 (UTC)[reply]

Template {{RQ:KJV}}

I notice that this template, used for linking to scripture quotations, always renders "The Holy Bible" in italics. Most style guides maintain that the name of a particular religion's holy book should not be italicized. Is this a fixworthy issue? – HelpMyUnbelief (talk) 21:37, 23 March 2023 (UTC)[reply]

I would say no, because we are treating it as the title of a book. Here at the Wiktionary it is important that the specific edition or version of a work (including a piece of religious scripture) is specified, as the text varies from version to version. — Sgconlaw (talk) 21:45, 23 March 2023 (UTC)[reply]
I agree here. I think the style guides you are referring to apply to running text, e.g. "It says in the Koran that ..." looks better than the same sentence with Koran italicized; but here it's being used in a citation, which is a different story. Benwing2 (talk) 22:07, 23 March 2023 (UTC)[reply]
@HelpMyUnbelief @Sgconlaw On a separate-but-related point, I'm surprised that we're citing the 1611 version, as it's hardly ever used and likely to be wrong in most instances. The first page I clicked on was because, and it's using the modernised (18th c.) text: "work" is spelled "worke" in the 1611 ed. Theknightwho (talk) 23:13, 23 March 2023 (UTC)[reply]
For lexicographical purposes it makes sense to use the first edition since it reflects the original context in which the word choice was made. I've corrected KJV citations citing modernised text to the text of the 1611 ed. whenever I come across them and the word is actually in the 1611. —Al-Muqanna المقنع (talk) 00:01, 24 March 2023 (UTC)[reply]
Yes, the 1611 version is significant because it is the first edition. Generally it is lexicographical practice to use the first edition of a work regardless of whether it is the most accurate version of the work or not, as it is likely to be the earliest instance of the use of a term in the work. (Of course in some cases a particular term does not appear in the earliest edition but only in a later edition. For example, some terms only appear in the First Folio of Shakespeare’s plays and not in the earlier quartos.) — Sgconlaw (talk) 01:17, 24 March 2023 (UTC)[reply]
So it's not about copyright? I've made entries for a few Bible quotes or things that derive from Bible quotes, and thought that we had to use the 1611 site because the more modern NKJV is under copyright protection in England. For at least a few entries, I'd think a newer version would be a better fit. Soap 05:47, 24 March 2023 (UTC)[reply]
In England, all versions are under the protection of something akin to crown copyright, perhaps better described as a monopoly right. The law is different in pariah states like the US, so one is probably OK if one is not working in the UK, using UK computing facilities, or is a British national. However, I'm not sure in which of His Majesty's domains said crown right still applies, so caveat editor. RichardW57m (talk) 09:37, 24 March 2023 (UTC)[reply]

Redirects for emojis with variation selectors?

@Erutuon

Just FYI, I'm tagging all emoji+vs sequence redirects for deletion. Checking as I go that none are linked from anywhere, because unlike the behaviour of a search, they won't be automatically connected if linked. So far only one was. kwami (talk) 00:22, 24 March 2023 (UTC)[reply]

@Kwamikagami I think it's fine to have hard redirects on things like this. They're something that someone might plausibly type into the search bar without realising there's a VS there (as most people don't know what they are). Theknightwho (talk) 17:38, 24 March 2023 (UTC)[reply]
They will redirect automatically. Equinox 17:40, 24 March 2023 (UTC)[reply]
Great - I didn't realise that. Theknightwho (talk) 18:05, 24 March 2023 (UTC)[reply]
They redirect automatically from the search engine, but not from a bracketed link in an article. But apart from some mistakes I made, only one of them (the copyright) had been linked to that way. Now when I get a red link, that tells me I made an error. kwami (talk) 20:39, 24 March 2023 (UTC)[reply]
I edited the nonexistent page text to put up a link if there's a title without a variation selector, and that makes existing JavaScript code redirect to it after three seconds. — Eru·tuon 02:39, 25 March 2023 (UTC)[reply]
Fantastic, thank you. - -sche (discuss) 19:14, 25 March 2023 (UTC)[reply]
Do you mean the line (e.g.) "Did you mean 0?" Yes, that handles the problem quite nicely, I think. I don't follow your comment about JS, but I probably don't need to. kwami (talk) 21:42, 25 March 2023 (UTC)[reply]
Should we have such a line even when the character doesn't have an article, so that people creating a new article will do so at the right place? Or will we know to move it if they do? (I don't know if this would actually be a problem.) kwami (talk) 21:53, 25 March 2023 (UTC)[reply]
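Erutuon's change lives in the MediaWiki nonexistent-page text plus existing JavaScript, neither of which is reproduced here. The check it implies (strip the variation selectors from the requested title, then see whether the bare title exists) can be sketched in Python as below. The helper names and the `existing_titles` set are hypothetical stand-ins for a real page-existence lookup; the character ranges are the standard Unicode variation selectors, VS1 to VS16 at U+FE00..U+FE0F and VS17 to VS256 at U+E0100..U+E01EF.

```python
import re
from typing import Optional, Set

# Unicode variation selectors: VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF).
VARIATION_SELECTORS = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]")


def strip_variation_selectors(title: str) -> str:
    """Hypothetical helper: remove every variation selector from a page title."""
    return VARIATION_SELECTORS.sub("", title)


def suggested_title(title: str, existing_titles: Set[str]) -> Optional[str]:
    """Return the selector-free title to point readers at, or None if there is none.

    `existing_titles` stands in for a real page-existence check (an assumption).
    """
    bare = strip_variation_selectors(title)
    if bare != title and bare in existing_titles:
        return bare
    return None
```

For example, `suggested_title("\u00a9\ufe0f", {"\u00a9"})` (the copyright sign followed by VS16) returns the plain `"\u00a9"`, while a title that contains no selectors returns `None`, so no "Did you mean" link would be shown for it.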

Double Spanish past participles

I used the ACCEL tool to create crepado, clicking from crepar. The result was a repeated defn line. That needs fixing, somehow. Van Man Fan (talk) 22:32, 24 March 2023 (UTC)[reply]

Template:normalized needs to be normalized

I just fixed a module error at 𐤊𐤋𐤌 caused by this template. There was no error displayed, and the only way I figured out which template was causing the error was by systematically commenting out different parts of the wikitext until I found the one that made the error go away. It turned out that the language code in the first parameter was for an etymology-only language, which Module:languages/templates rejected as invalid. The module error was fed into an {{#ifexist:}} ParserFunction, and since there is no page that includes a Lua error message in its name, nothing was displayed.

What's more, it's not obvious why this even takes a language code for the first parameter. As far as I can tell, the language code is used to display the "Wiktionary:About" page for the language if there is one, and to provide the language code for {{lang}} to display the |from= parameter. The addition of the |fromlang= parameter just made it more confusing.

I'm not sure how to untangle this mess, but I would appreciate if someone would take a look at this who knows template code and can figure out what it's supposed to do. We can't have a template throwing invisible, undiagnosable module errors if someone gets the language code wrong. Pinging @The Editor's Apprentice, who added the ParserFunction part. Chuck Entz (talk) 02:36, 25 March 2023 (UTC)[reply]

Hi there @Chuck Entz, I've sandboxed a new version that should now display this type of error to users; see the current test cases for a demonstration. For details about how the sandboxed version works, I've left comments in the code. Let me know if you have more questions. I'll also ping @Catonif since they use the template a decent amount (AFAICT) and have edited its code lightly in the past and in case they want to add anything. —The Editor's Apprentice (talk) 06:55, 25 March 2023 (UTC)[reply]

Strips L3

> This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please start a new Grease pit discussion and describe what you were trying to do. A brief description of the abuse rule which your action matched is: strips L3

I'm trying to mark a page as 'no entry', similar to ادھنا, but it doesn't let me submit it because of this issue. Is there a way to skip this warning? نعم البدل (talk) 05:11, 25 March 2023 (UTC)[reply]

Your specimen entry looks wrong - it currently lacks an L2 header! --RichardW57 (talk) 09:37, 25 March 2023 (UTC)[reply]
I've fixed the entry in question. {{no entry}} entries should contain the language header for the relevant language. This, that and the other (talk) 21:25, 25 March 2023 (UTC)[reply]
Thank you! نعم البدل (talk) 22:06, 25 March 2023 (UTC)[reply]

Why is this categorizing into Spanish diminutive nouns? The {{dim of}} template has |pos=adjective. Ultimateria (talk) 16:34, 25 March 2023 (UTC)[reply]

It looks like the POS= parameter is case-sensitive. JeffDoozan (talk) 16:47, 25 March 2023 (UTC)[reply]
Could you change it to not be? I imagine this isn't the only entry with this error. Ultimateria (talk) 18:35, 26 March 2023 (UTC)[reply]
@Ultimateria |pos= is used for a different purpose, otherwise spelling it that way would throw an error. We could consider renaming |pos= to something else but in general template params are case sensitive. Benwing2 (talk) 19:00, 26 March 2023 (UTC)[reply]

different fonts for different Syriac-script languages

At MediaWiki talk:Common.css#Applying_correct_noto_fonts_for_aii_and_tru, it was requested that different fonts be set for .Syrc:lang(tru) vs .Syrc:lang(aii). I implemented the request since MediaWiki:Common.css already sets other script+language specific fonts in that way, like .Hani:lang(vi) and .Xsux:lang(hit) vs .Xsux:lang(sux), but User:ColumbaBush reports it's not having the desired effect (though it does seem to have an effect for me). Is there any problem on our end that needs fixing (in particular: does .SCRIPT:lang(LANGUAGE) work? and independent of that, should we specify Syrn and Syrj as separate from Syrc in Module:scripts/data, and/or create codes like aii-Syrc, a la fa-Arab?), or is the problem on the end of the users, who need to download fonts if they want those fonts? - -sche (discuss) 19:11, 25 March 2023 (UTC)[reply]

I responded on the talk page and changed the order of the fonts in the CSS because the old order would have applied the Noto Sans Syriac font to both languages because it was before the Eastern and Western fonts in the font-family list. Does that fix the problem, ColumbaBush? — Eru·tuon 21:25, 25 March 2023 (UTC)[reply]
@ColumbaBush, if you/relatives still aren't seeing things in Noto Sans Syriac Western/Eastern, I get the impression that the issue is that they have to install the font. I'm not sure if we can install it for them(?), at least not on the level of Wiktionary's CSS. Or can we? I recall we had to file a Bugzilla (now Phabricator) request one time to get "WebFonts" to use a certain font for Hebrew; is "WebFonts" still a thing (I can't find any local documentation) and would it help to file a Phab request for WebFonts to load these Noto Sans fonts? - -sche (discuss) 21:36, 25 March 2023 (UTC)[reply]
Thanks so much for both of your follow ups, I really appreciate it 🙂
Just a bit of background: the Estrangela, Eastern and Western fonts were regrettably rolled into one a while back (https://github.com/notofonts/syriac/issues/10#issuecomment-1160919173), which is the root of this issue, but there's now an active effort to decouple them (https://github.com/notofonts/syriac/issues/10#issuecomment-1160919180). Once this code gets merged (https://github.com/google/fonts/pull/5740) and there's a new macOS and iOS upgrade that provides these fonts, I think everything should render properly. (I was trying to shortcut this by suggesting we change the lang attr, but I can be a bit patient here.)
I.e., there are no more changes we need to make here on Wiktionary.
ColumbaBush (talk) 21:48, 25 March 2023 (UTC)[reply]

Old Portuguese → Old Galician-Portuguese

Per this discussion, and the earlier one, we appear to have a solid consensus for renaming the language.

Quite a few Galician etymologies will need to be fixed, namely the ones with a manually entered "From Old Galician and {{inh|gl|roa-opt|-}}" (e.g. abella). I could do that part if needed. Nicodene (talk) 20:11, 25 March 2023 (UTC)[reply]

@Nicodene OK I'll wait a few days and then make the change. I recently changed 'lou' from 'Louisiana Creole French' to 'Louisiana Creole' and exactly the same steps work here. Benwing2 (talk) 22:41, 25 March 2023 (UTC)[reply]
@Nicodene This change should be complete. Let me know if you see any instances of 'Old Portuguese' still. There were 5 new categories beginning with 'Old Portuguese' that were created between when I last listed all categories (last night) and now; there may be other stragglers of this nature that contain 'Old Portuguese' in the name but don't begin with 'Old Portuguese', but hopefully not. Benwing2 (talk) 22:48, 28 March 2023 (UTC)[reply]
Brilliant, thank you. I'll get to work sorting out the Galician etymologies. Nicodene (talk) 22:57, 28 March 2023 (UTC)[reply]
@Nicodene Don't worry about those, I'm doing a run to fix them. There are about 2,150 of them so a lot to do by hand. Benwing2 (talk) 23:23, 28 March 2023 (UTC)[reply]
Thanks again! Nicodene (talk) 23:25, 28 March 2023 (UTC)[reply]
@Nicodene See User:Benwing2/galician-lemmas-old-galician-or-galician-portuguese. This is a list of the remaining 163 Galician lemmas that have the strings 'Old Galician' and/or 'Galician-Portuguese' in them. These need manual cleanup. Benwing2 (talk) 01:34, 29 March 2023 (UTC)[reply]
@Benwing2 A few were actually fine as-is (using 'Galician-Portuguese' in a modern sense). The rest should be fixed now. Nicodene (talk) 02:34, 29 March 2023 (UTC)[reply]

Please allow multiple |head= like with other header templates.

E.g. بْرُولِيتَارِيّ (brūlītāriyy) has two heads for a noun but only one for nisba. Anatoli T. (обсудить/вклад) 00:30, 27 March 2023 (UTC)[reply]

@Atitarev Should be fixed. I rewrote the {{ar-nisba}} code in Lua. Benwing2 (talk) 02:11, 27 March 2023 (UTC)[reply]
@Benwing2: While cleaning up warnings about ineffective parameters, specifically قُبْطِيّ (qubṭiyy), I noticed that you forgot to modulize {{ar-noun-nisba}}, which can be placed on the bulk of pages having a nisba adjective. I tried to use {{ar-sing-noun}} as an alternative but it throws an error for |f=, though the feminine of قُبْطِيّ (qubṭiyy, Copt) obviously exists, so only {{ar-noun}} works, which does not express the collective–singulative relation with قُبْط (qubṭ). Fay Freak (talk) 01:38, 6 August 2024 (UTC)[reply]
@Fay Freak I'm working on rewriting Module:ar-verb. (Some time) after that I'll fix {{ar-noun-nisba}}. Benwing2 (talk) 02:36, 6 August 2024 (UTC)[reply]

"Lua error in Module:languages at line 791"

"The function getByCode expects a string as its first argument, but received a table." In e.g. outfire#Etymology. (Also noted Module_talk:languages#Error.) Is moving fast and breaking things breaking things again? - -sche (discuss) 07:36, 27 March 2023 (UTC)[reply]

@Sbb1413, Theknightwho If he is, he seems to have an imitator at Module:bn-IPA that accounts for most of the Bengali entries in cat:E. We've got the number of now-fixed module errors down from 6,000+ to under 300, most of which are Bengali words. As the module's declared not to be ready for deployment and has a set of testcases, I suggest the best thing for the Bengali module errors for now is to put suitable alternative text into Template:bn-IPA to allow rapid deployment if it's ever got working. (I'm not sure that automatic text to IPA can work well enough for Bengali - the implicit vowel may present a challenge.) --RichardW57m (talk) 14:50, 27 March 2023 (UTC)[reply]
This was down to a confluence of two things: a latent undeclared variable in Module:compound/templates (which was previously just getting ignored), which happened to have the same name as a variable that was accidentally declared as a global in Module:languages. Module:compound/templates was therefore trying to process the global, and it caused this unexpected issue. I've fixed both. Theknightwho (talk) 15:02, 27 March 2023 (UTC)[reply]
The Bengali issue has also been resolved, by the developer backing out the change, and there have been discussions with that developer on how to stop that problem reappearing in his current work. --RichardW57m (talk) 17:24, 27 March 2023 (UTC)[reply]

Template:compound has a Lua error

Saw an error on 看板娘 where this template is used. The error also seems to happen in the template page as well. Someone fix? I'm not familiar with Lua in Mediawiki. User670839245 (talk) 07:36, 27 March 2023 (UTC)[reply]

Edit: looks like the message above me is talking about the same thing User670839245 (talk) 07:37, 27 March 2023 (UTC)[reply]

Lua error on eyeball

Hi, there is a Lua error in the etymology section of eyeball. I'll leave it to you to fix. Pamputt (talk) 07:47, 27 March 2023 (UTC)[reply]

Deprecating Template:zh-der

Previous relevant discussions:

I think it's safe to say that there isn't any reason at all to keep {{zh-der}} as a separate template now that simplified forms can be generated automatically. Also note that the Lua code is terribly written, and the template syntax for specifying tr/t/q is very clumsy: in fact I couldn't add both tr and t since they both use the colon as input, and the only way to remove the pinyin output is to specify tr (which I could not) or to add an asterisk and specify the simplified form. @justinrleung, theknightwho – Wpi31 (talk) 10:11, 27 March 2023 (UTC)[reply]

Support. One handy thing that we have is {{subst:zh-new/der}}, which autopopulates {{zh-der}} with entries from 國語辭典 or 漢語大詞典. We would have to tweak that to autogenerate {{der3}} (?) instead, which should be simple. — justin(r)leung (t...) | c=› } 12:59, 27 March 2023 (UTC)[reply]
Support - this all makes sense. Theknightwho (talk) 14:58, 27 March 2023 (UTC)[reply]
Support. It seems a lot of the Chinese-specific (and other East Asian) code is badly written and has weird syntactic quirks (e.g. {{zh-l}} allows param 2= to be either the simplified form, the gloss or the transliteration, and guesses which one it is). There are 38000+ uses of {{zh-der}} so it definitely needs a bot to orphan it. What should be the replacement template? {{der3}}/{{der2}}, {{col3}}/{{col2}}, ...? It seems {{zh-der}} looks at the number of characters in the pagename, and uses three columns if there's one character and two columns otherwise. Should we preserve this behavior or standardize on some specific number in all cases? Benwing2 (talk) 21:51, 27 March 2023 (UTC)[reply]
Just as a heads up, I did some page previews to see what the memory impact would be, and it seemed like {{der3}} caused a significant spike. I’m not entirely sure why. Theknightwho (talk) 22:03, 27 March 2023 (UTC)[reply]
@Theknightwho Hmmm, can you give some numbers and specifics on what you tested? The only significant differences I can see between {{der3}} and {{zh-der}} are (1) {{der3}} is wrapped in {{check deprecated lang param usage}} (which it seems shouldn't make a difference) and (2) {{der3}} uses Module:columns instead of Module:columns/old. It might be useful to do a few experiments to see if there's a significant memory difference between those two modules. Benwing2 (talk) 23:35, 27 March 2023 (UTC)[reply]
Note also that {{zh-der}} adds the language span tags itself, which causes full_link() not to be invoked; possibly it's expensive stuff that your rewrite of Module:links is doing. Benwing2 (talk) 23:41, 27 March 2023 (UTC)[reply]
@Benwing2 I've realised I was previewing this with my userspace modules, which deep copy any data modules so that I can make test changes to data in a clear way (e.g. Module:User:Theknightwho/languages/data/2). Obviously that increases the memory hit, though. Theknightwho (talk) 22:34, 28 March 2023 (UTC)[reply]
Aha, makes sense. Can you do a test or two just replacing {{zh-der}} with {{der3}} and previewing a production page to see if it makes any difference? I would hope not ... Benwing2 (talk) 22:40, 28 March 2023 (UTC)[reply]
@Theknightwho Benwing2 (talk) 22:41, 28 March 2023 (UTC)[reply]
@Benwing2 So there's actually still a big memory hit by default, but if you set sort=0 there's generally a modest improvement instead. This makes sense, as {{zh-der}} doesn't use automatic sorting. We could always turn automatic sort off, but I'd prefer keeping it where possible, and just using sort=0 as the first port of call if we start seeing memory issues on a page.
I'm not sure how we could make Module:Hani-sortkey more efficient, but it may be possible. @Erutuon? Theknightwho (talk) 22:56, 28 March 2023 (UTC)[reply]
In any case, I think we can start by migrating the ones in non-single character entries, which shouldn't cause any issues with memory or whatsoever. – Wpi31 (talk) 06:14, 28 March 2023 (UTC)[reply]
Support. The thing I don't like is when someone I know mixes derivations from multiple topolects into one list when the terms are not used in Mandarin. Perhaps there is a way to group those somehow into subgroups like "Cantonese", "Min Nan/Hakka", etc., or give appropriate labels. (It's not against topolects but for the sake of users who want to make sense of those lists.) E.g. if a derivation in the list doesn't belong to Mandarin, it's a candidate for a subgroup. It may be done differently (more simply) if a derivation list is for, e.g., a Cantonese entry and all derivations are also Cantonese. --Anatoli T. (обсудить/вклад) 23:56, 28 March 2023 (UTC)[reply]
Agree that derivations should not always be mixed into one giant list. Note that there are a few cases where the pronunciations do not always match one-to-one, e.g. between Mandarin and Cantonese at , where a number of derived terms (and maybe senses, I haven't checked in detail) should be under the other pronunciation section based on the Cantonese pronunciation (but this probably warrants a separate discussion). For now, I think the current practice is good enough.
I should also mention the cases where there is only one compounds/derived term section but multiple etymologies, such as (which I assume all of them could be safely regarded as etymology 1), which I reckon to be a more pressing matter than separating the lists between lects. – Wpi31 (talk) 07:07, 1 April 2023 (UTC)[reply]

Making Chinese sortkeys more efficient

@Theknightwho, Erutuon This should not be hard. Looking at Module:Hani-sortkey/data/009 for example, it is a table of 500 entries with consecutively numbered keys. Store this instead as an offset + a single string of all the values concatenated together and use a simple offset+string-index scheme and you should get huge memory savings. I also think the large number of modules is an issue; once converted to strings we should reduce the number of modules. Beyond that, another simple way to reduce the string length is to encode the stroke count as a single ASCII char rather than two numbers. Beyond that, some sort of run-length encoding is possible since nearby values tend to be the same, but that would make the lookup function more complex (it would have to binary-search to find the right value). Benwing2 (talk) 23:36, 28 March 2023 (UTC)[reply]
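The offset-plus-string idea above can be sketched in Python. This assumes, hypothetically, that each code point maps to a (radical character, residual stroke count) pair and that the stroke count is stored as one ASCII character via a fixed base; the real Module:Hani-sortkey/data layout may differ. Python indexes strings by character, so this sketch sidesteps the byte-width question a Lua version has to deal with.

```python
from typing import Dict, Tuple

STROKE_BASE = ord("0")  # assumption: stroke count n is stored as chr(STROKE_BASE + n)


def serialize(table: Dict[int, Tuple[str, int]], first: int, last: int) -> str:
    """Pack consecutive code points first..last into one string, two characters per entry."""
    parts = []
    for cp in range(first, last + 1):
        radical, strokes = table[cp]
        parts.append(radical + chr(STROKE_BASE + strokes))
    return "".join(parts)


def lookup(packed: str, first: int, cp: int) -> Tuple[str, int]:
    """Recover (radical, strokes) for code point cp using only offset arithmetic."""
    i = 2 * (cp - first)
    return packed[i], ord(packed[i + 1]) - STROKE_BASE
```

The lookup needs no table at all: one subtraction gives the position, so a whole data module shrinks to a start offset and a single string.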

@Benwing2, Erutuon I've fixed this by creating Module:Hani-sortkey/data/serialized - I'm now seeing memory savings even with sorting turned on, which is promising.
Obviously it is completely unreasonable to expect a human to manually update the serialised data, so I've also created Module:Hani-sortkey/serializer, which generates it from the data modules. I think it would be a good idea to amalgamate the 196 data modules into a simple table, which can be updated by hand: it's more user-friendly and makes the serializer code simpler. It shouldn't be too hard to add this to Module:data consistency check. Theknightwho (talk) 02:03, 29 March 2023 (UTC)[reply]
I almost forgot - the data is fragile in one way: it is entirely dependent on only using radicals in the U+3400-9FFF range for sorting, as these are 3-byte codepoints. I don't think any variant radicals are outside that range, but if any do get added then it'll fuck up the sortkey for every character after it, as U+10000 and up are 4-byte characters. Doing it this way means we can use a single string.sub, but it is something to be aware of. We really want to avoid the ustring library, as it has a 10,000 character limit that is relevant here. Theknightwho (talk) 02:10, 29 March 2023 (UTC)[reply]
@Theknightwho Great, thanks for doing this! Definitely I agree that the serialized data should be generated automatically. I will look into converting {{zh-der}}. I think maybe I'll just use {{col3}} for everything as {{col2}} seems wasteful; I looked at some two-character entries using {{zh-der}} and most of them have a lot of blank space. Benwing2 (talk) 02:20, 29 March 2023 (UTC)[reply]
@Benwing2 Great - that'd be a big help. I noticed that this change pushed a few pages into CAT:E, but converting {{zh-der}} to {{der3}} on the affected pages seems to be having a dramatic effect for the better. Plus, they're generally pages where pronunciation has already been hidden, too - so we're getting the pronunciation back + sorting, and still making big memory savings. Theknightwho (talk) 02:26, 29 March 2023 (UTC)[reply]
@Theknightwho Great to hear. Benwing2 (talk) 02:31, 29 March 2023 (UTC)[reply]
@Benwing2 It seems to be having the best effect on pages where the simplified forms have been manually specified. I'll look into serializing the simplification data, too. I don't think the effect will be as big, but it should help. Theknightwho (talk) 02:33, 29 March 2023 (UTC)[reply]
@Theknightwho Sounds good. Benwing2 (talk) 02:35, 29 March 2023 (UTC)[reply]

The issue with template transclusion limit

Since Module:zh-translit obtains the page source for the pronunciation, this counts as a transclusion and is problematic on pages with many derived terms, e.g. on where I experimented with it and found that the templates only load up to midway through the Korean section. This wasn't a problem before with {{zh-der|hide_pron=1}}, which doesn't involve transclusion. I suppose the tr will have to be specified in such cases. – Wpi31 (talk) 09:03, 30 March 2023 (UTC)[reply]

@Wpi31, Theknightwho I don't know the specifics of how this works, but can we either avoid doing this at all for characters with only one possible reading, and/or cache the results in at least some circumstances? Benwing2 (talk) 03:15, 31 March 2023 (UTC)[reply]
@Wpi31, Benwing2 This is a nontrivial problem, because the pronunciations are obtained from the pages for compound terms directly. However, it seems like nonexistent pages are currently counting towards the limit, so I'll see if there's a way to avoid that. That should provide us with enough of a buffer for the time being. Theknightwho (talk) 03:20, 31 March 2023 (UTC)[reply]
Actually, I'm not sure why there's such a big spike, as {{zh-der}} only causes it to use 1.5MB of the 2MB limit, even if you turn the pronunciations on; and it works via a similar method. I'll have to investigate. Theknightwho (talk) 03:59, 31 March 2023 (UTC)[reply]
@Wpi31, Benwing2 Took me a while, but I've found the culprit: the post-expand include size limit (i.e. the maximum size that the expanded wikitext of templates is allowed to be) is 2MB, but certain parser functions have a multiplier effect on any text run through them: notably, {{#if:}} and {{#switch:}}. Many templates run their outputs through {{check deprecated lang param usage}} via an {{#if:}}, which means that the entire output's contribution to the limit is doubled. In the case of (rén), this was doubling the contribution of a {{col3}} call with 2,279 terms, which obviously caused a massive uptick.
To help solve this, I've modified Module:columns so that it handles the checking for deprecated lang parameters internally instead, which brought the page under the limit. Unfortunately, that doesn't help if I want to add the 1,550 additional terms from 漢語大詞典 that we're missing (i.e. 3,779 in total), or for the 4,893 terms currently listed at (and yes, I know these numbers are insane). Theknightwho (talk) 14:52, 15 June 2023 (UTC)[reply]
Awesome, thank you User:Theknightwho. Benwing2 (talk) 18:18, 15 June 2023 (UTC)[reply]

Posting here because {{Edit protected}} directs me here. Please flip the redirects around for cat's pyjamas and the cat's pyjamas, and likewise for the alternative spellings. Then please edit Wiktionary:Criteria_for_inclusion#Articles to either use a better example (perhaps bomb vs the bomb) or to explain why one entry redirects to the other. If we keep the pajama example in it might be good to change the spelling as well to save people an extra click. Thank you, Soap 10:39, 27 March 2023 (UTC)[reply]

To clarify, the reason I'm requesting this is that I think the expression is only used with "the" preceding it. If I'm mistaken, then I don't think we need to have an entry for the cat's pyjamas at all, and therefore it should be replaced on the CFI page with a better example. However, I think bomb ~ the bomb is a better contrast regardless of what we do with the other entries. Soap 11:31, 27 March 2023 (UTC)[reply]
@Soap Hmm, since CFI says we should omit articles unless it makes a difference, shouldn't we leave the main form at cat's pyjamas? I agree that the expression is normally the cat's pyjamas and the article wouldn't normally be omitted (except in an SOP usage). Examples where the article makes a difference are indeed bomb vs. the bomb, also shit vs. the shit (it's been pointed out before that it's shit and it's the shit are nearly opposite in meaning). We should also fix the bee's knees to point to just bee's knees (and consider creating an entry for dog's tuxedo with the same meaning :) ...). Benwing2 (talk) 22:02, 27 March 2023 (UTC)[reply]
I'm inclined to agree with Soap on this and follow the bee's knees. It does make a difference since "cat's pyjamas" without the is just an error unless you're mentioning the phrase itself, and with another article/determiner it loses its idiomatic meaning: "that cat's pyjamas", "a cat's pyjamas" etc. (The only example I could find of it being used, and not mentioned, in running text without the article is the dubious example "Aren't you cat's pyjamas" in a self-published dictionary.) This is different from proper nouns where "the" sometimes appears in headwords but not the title: those can drop the in some contexts like attributive use. —Al-Muqanna المقنع (talk) 00:25, 28 March 2023 (UTC)[reply]
Just sayin'. DCDuring (talk) 00:42, 16 June 2023 (UTC)[reply]
@Soap: Wiktionary:English entry guidelines § Phrases also needs to be changed after your move. J3133 (talk) 09:14, 5 October 2023 (UTC)[reply]

Alphabetization of tables and lists for editors

We have templates that alphabetize tables for the benefit of users of such tables. This seems to work where the delimiter is " | ". AFAICT, we do not currently have a means of alphabetization that works for editors who maintain such tables.

Is there a means of alphabetizing lists (delimiter = " , ") and tables with delimiters like "CR/LF* "? Such alphabetization would be very handy for eliminating duplicates and other purposes. It could be something that only worked in the various edit windows and only on demand. It is cumbersome to copy such content to an offline editing tool that supports alphabetization and preserves wikiformat and then copy back the sorted and cleaned list or table. DCDuring (talk) 18:55, 27 March 2023 (UTC)[reply]

@DCDuring Are you looking for a tool that will generate the alphabetised wikitext and automatically eliminates any duplicates, essentially? That shouldn't be too difficult to do. Theknightwho (talk) 20:36, 27 March 2023 (UTC)[reply]
@Theknightwho I was thinking that exact duplicate elimination is not hard for a human either. Usually the alphabetization makes all sorts of other things easier too. DCDuring (talk) 20:44, 27 March 2023 (UTC)[reply]
@DCDuring, Theknightwho I am guessing DCDuring wants a Javascript gadget that works in the edit window and has a button or similar to alphabetize lists in templates such as {{col3}}/{{der3}}/{{rel3}}. I hardly know Javascript and don't know how to write gadgets, but if you can do this, please go ahead; I think there are a lot of potential gadgets that could be written to do lots of useful things for editors (or we could have one gadget with lots of options). Benwing2 (talk) 21:55, 27 March 2023 (UTC)[reply]
@Benwing2 @DCDuring Gotcha - I was thinking of something involving subst:, where you can use the diff to preview the Wikitext. A JavaScript gadget would be more handy, but that’s out of my skillset. Theknightwho (talk) 22:00, 27 March 2023 (UTC)[reply]
A subst:ed template like that would be great. A JS gadget might be easier for newer editors, but a subst:ed template is all I need. I assume/hope that it could ignore wikitext differences like use or non-use of the various list templates (and {{vern}} and {{taxlink}}).
Is it practical to have a template work similarly on horizontal lists, with commas and semicolons (and other punctuation?) as delimiters? There might be complications because both commas and semicolons appear in the same list, as well as commas inside parentheses that should not be interpreted as delimiters.
Thanks for considering this. DCDuring (talk) 23:24, 27 March 2023 (UTC)[reply]
@DCDuring Yes, it's definitely possible to make a substing template that works with horizontal lists and handles all the issues you bring up. However, can you clarify what you mean when you mention {{vern}} and {{taxlink}}? I assume that 'subst' on a template with arguments operates after the templates in the arguments are expanded, so the substing code wouldn't see {{vern}} or {{taxlink}} but instead the output of those templates (unless wrapped in <nowiki>...</nowiki>). But I think User:Theknightwho knows the ins and outs of template expansion and such better than I do, from having worked extensively on Module:links. Benwing2 (talk) 23:32, 27 March 2023 (UTC)[reply]
Right, the preview displays HTML, so wikitext formatting, templates, etc. don't matter. But my other concern remains: that commas within parentheses not be treated as delimiters, and that alphabetization work between semicolons, not across them. I.e., a list "Ab, Cb, Ba; Aa" should be alphabetized as "Ab, Ba, Cb; Aa". If this is too picky or idiosyncratic, I could live without it. Could the scope of alphabetization be limited to selected text? DCDuring (talk) 23:49, 27 March 2023 (UTC)[reply]
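The rule DCDuring describes (sort within each semicolon-separated group, never across groups, and ignore commas inside parentheses) can be sketched in Python. This only handles plain text. The function names are hypothetical, and a real tool would also have to cope with templates and links, which this sketch does not attempt.

```python
def split_top_level(text: str, sep: str) -> list:
    """Split `text` on `sep`, ignoring separators nested inside (...)."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def alphabetize_groups(text: str) -> str:
    """Sort comma-separated items within each semicolon-separated group."""
    groups = []
    for group in split_top_level(text, ";"):
        items = [item.strip() for item in split_top_level(group, ",")]
        groups.append(", ".join(sorted(items, key=str.casefold)))
    return "; ".join(groups)
```

With the example above, `alphabetize_groups("Ab, Cb, Ba; Aa")` returns `"Ab, Ba, Cb; Aa"`, and `alphabetize_groups("oak (red, white), ash")` returns `"ash, oak (red, white)"`, keeping the parenthesized comma inside its item.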
@DCDuring What you're asking for isn't hard to implement. However, I'm not sure what you mean by "be limited to selected text". Also can you ping me in your responses? Otherwise I won't be notified and may not see them. Thanks! Benwing2 (talk) 02:50, 28 March 2023 (UTC)[reply]
@User:Benwing2 It may only be that one portion of what is visible in an edit window (for a section) that should be alphabetized. For example, there are some entries that have two derived terms sections, one for those with taxonomic or geographic content, one for other derived terms. Each should be alphabetized separately. DCDuring (talk) 12:01, 28 March 2023 (UTC)[reply]
@DCDuring I guess I don't quite understand your concern. What I'm thinking of implementing is something like {{subst:alpha|col3|en|term1|term2|term3|...}}; if you want to auto-alphabetize the contents of a given template, add subst:alpha| before the template, and it will do the rest, outputting the appropriate template call with the terms alphabetized (provided it recognizes the template name). This is similar to how {{chars}} operates. It wouldn't be applied to large chunks of wikitext. Benwing2 (talk) 00:50, 31 March 2023 (UTC)[reply]
I was imagining something more visual, but what you are thinking of implementing should address my problem as long as the contents (parm1) of {{vern}} and {{taxlink}} are treated like those of, say, {{l}}. I hope the template would be able to address a derived-terms section like that of [[duck]]. DCDuring (talk) 12:32, 31 March 2023 (UTC)[reply]
@DCDuring Try to ping if you can; I don't use Watchlists because it overwhelms me. The problem with the derived-terms section of duck that you're referring to (under Etymology 2) is that nested templates are expanded *BEFORE* the code that handles the subst: even runs. So the code would not see {{vern|...}} but the output of that template, and the result of substing would also contain the template output and not the actual call to {{vern|...}}. Essentially, this approach won't work at all with nested templates. That's why I said originally that you are thinking of a Javascript gadget. (User:Erutuon and User:Surjection seem to know how to write such gadgets; maybe you could ask them for help?) Benwing2 (talk) 01:13, 1 April 2023 (UTC)[reply]
Perhaps such a gadget would be possible, but supporting templates like {{vern}} and {{taxlink}} would probably require a full template parser, which is considerable work to implement. — SURJECTION / T / C / L / 14:18, 1 April 2023 (UTC)[reply]
I guess that means I have to extract items involving organism names from such tables and put them in a separate table (e.g., of derived terms) using old-fashioned wikitext, which allows the table to have useful orange links. I would then be alphabetizing the somewhat shorter lists manually, and the templates would handle the rest, except for duplicate elimination and near-duplicate grouping. DCDuring (talk) 17:10, 1 April 2023 (UTC)[reply]
@DCDuring The other possibility is to make a version of {{col}} etc. ({{taxcol}}?) that has built-in support for {{vern}} and {{taxlink}}, e.g. using a syntax like taxlink:Fragaria vesca subsp. vesca f. semperflorens<rank:form> or taxlink:Felis domestica<rank:species><wslink:cat>. Neither template looks so complex so this wouldn't be too hard. That way, {{alpha}} support for {{taxcol}} can be added without too much trouble. Benwing2 (talk) 17:38, 1 April 2023 (UTC)[reply]
I was reacting to User:Surjection's thoughts on this. I also fear that, after significant work was put into the effort, the result would have problems from my idiosyncratic POV. After all, I could simply copy a table into something offline, alphabetize it using Perl or something simpler with regex, and copy the result back to Wiktionary. This would only be necessary for the larger tables. DCDuring (talk) 23:29, 1 April 2023 (UTC)[reply]
I was pinged over to here but I'm not entirely clear on what's being asked for. Alphabetizing bare words separated by commas is easy, but it's harder if the comma-separated stuff includes links or templates (which could include embedded commas), as Surjection says. Newline-separated stuff would be much easier because then most likely the items to be sorted would not contain newlines. Would it need special language-specific alphabetizing rules like in {{col-auto}}? To make it more concrete, what are some things you've wanted to alphabetize?
If this needs any sorting rules from Module:collation, either JavaScript or Lua could parse the list (to allow Lua to do it, the list would have to be enclosed in nowiki tags in #invoke:), and Lua would sort the list and JavaScript put it back into the edit box. Either Lua or JavaScript has to at least parse basic templates and not split them up over commas or newlines. The parser could reject complicated input, like maybe non-matching curly and square brackets, or nowiki tags, or HTML comments, to ensure it doesn't garble the list. Having implemented a restricted parser that fairly well parses entries in non-wiki-compatible Lua (using LPeg), it's really complicated to get all this stuff right, but it's probably possible to at least only split on commas that are not inside templates or links (only at the top level). — Eru·tuon 00:58, 2 April 2023 (UTC)[reply]
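A rough Python sketch of the restricted parser described above: split only on separators at the top level (not inside `{{...}}` or `[[...]]`), and refuse input it cannot handle safely, such as unbalanced brackets, HTML comments or nowiki tags. It does not attempt language-specific collation from Module:collation or handle `<ref>` tags. The function name is hypothetical.

```python
from __future__ import annotations


def split_wikitext_list(text: str, sep: str = ",") -> list[str]:
    """Split `text` on `sep`, but only outside {{templates}} and [[links]]."""
    if "<!--" in text or "<nowiki" in text.lower():
        raise ValueError("comments and nowiki tags are not supported")
    stack: list[str] = []  # currently open "{{" / "[[" delimiters
    parts: list[str] = []
    start = i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("{{", "[["):
            stack.append(pair)
            i += 2
            continue
        if pair in ("}}", "]]"):
            opener = "{{" if pair == "}}" else "[["
            if not stack or stack[-1] != opener:
                raise ValueError(f"unbalanced {pair!r} at offset {i}")
            stack.pop()
            i += 2
            continue
        if text[i] == sep and not stack:
            parts.append(text[start:i].strip())
            start = i + 1
        i += 1
    if stack:
        raise ValueError(f"unclosed {stack[-1]!r}")
    parts.append(text[start:].strip())
    return parts
```

Sorting the resulting items by their raw wikitext would group them by template name rather than by term, so a real tool would still need a sort key derived from the displayed term.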
@User:Erutuon What I was looking for was a way to alphabetize the contents of tables from the PoV of a contributor using an edit window. If I could do so for tables that have both taxonomic names and English names (in the same item) and templates like {{l|mul}}, {{taxlink}}, and {{vern}}, then I would have no problem with the goal of universal use of the various proposed column templates. Using newlines as item separators is useful to me, but is inconsistent with the design of the column templates, which seems to appeal to others. I expect to have to extract the items that contain {{taxlink}}, {{vern}}, and {{l|mul}} and its relatives to put them in separate tables. That is not unreasonable from a user PoV. Such tables would not use pipes as separators. A simple alphabetization tool using newlines as item separators would be all that I'd need to alphabetize such tables. DCDuring (talk) 14:12, 2 April 2023 (UTC)[reply]
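For the newline-separated case, a minimal Python sketch: one item per line, exact duplicates removed, case-insensitive order. It sorts on the raw line text, so lines that all begin with the same markup (e.g. `* [[`) end up ordered by the term that follows. `sort_lines` is a hypothetical name.

```python
def sort_lines(text: str) -> str:
    """Alphabetize newline-separated items and drop exact duplicates."""
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    unique = dict.fromkeys(lines)  # keeps one copy of each exact duplicate
    return "\n".join(sorted(unique, key=str.casefold))
```

Near-duplicates (for example the same term with different glosses) are left alone, since grouping them is a judgment call.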

Translation adder issues

Something happened with {{t+}} - most (not all) translations lost (+) with a wikilink.

At Ethiopia#Translations, the translation adder gives the common error "Glosses should be unique". Anatoli T. (обсудить/вклад) 00:48, 28 March 2023 (UTC)[reply]

@Atitarev This is due to some code changes by User:Theknightwho in Module:languages. I have reverted them. User:Theknightwho Before reinstating them, let's please discuss the plan for etymology language changes and make sure we're on the same page; I haven't even had a chance to review all the sandbox code you pinged me about. A redesign like this doesn't have to happen today or tomorrow; it can wait a few days. Benwing2 (talk) 02:49, 28 March 2023 (UTC)[reply]
+1. Please please please, let's discuss changes like this beforehand. There've been too many entry-breaking changes recently, and on my end, that's only with the Koreanic languages that I'm more aware of. I'm truly worried about the languages that I don't edit for and if they suddenly are broken with no one noticing. AG202 (talk) 12:27, 28 March 2023 (UTC)[reply]

The "Glosses should be unique" bug still pops up when trying to add a translation. However, as far as I can tell, it is limited to translations tables where {{trans-top}} uses the id= parameter. Einstein2 (talk) 21:13, 28 March 2023 (UTC)[reply]

The first issue, with {{t+}}, is gone, but the "Glosses should be unique" error persists, e.g. at Ethiopia#Translations. Anatoli T. (обсудить/вклад) 03:13, 30 March 2023 (UTC)[reply]
[edit]

Any idea why links to local Wiktionary in language categories disappeared? TagaSanPedroAko (talk) 02:24, 28 March 2023 (UTC)[reply]

Selected Recent Changes

Would it be at all possible to enable adding multiple categories to the Recent Changes filter? It would be nice to be able to keep a specified view on certain languages. Vininn126 (talk) 13:41, 28 March 2023 (UTC)[reply]

I'm not sure how Special:RecentChanges is implemented internally but I suspect you would need to file a Phabricator feature request for this, as I think the implementation is built into the MediaWiki software. Benwing2 (talk) 23:42, 28 March 2023 (UTC)[reply]
[edit]

I noticed that the interwiki links generated by {{t+}} do not parse the // syntax (which was added by TKW recently to allow displaying terms with multiple forms), for example in the Chinese translations at Taiwanese#translations. I suppose the desired behaviour would be to link only to the first form. – Wpi31 (talk) 17:02, 28 March 2023 (UTC)[reply]

@Wpi31 Ah yeah - I see what you mean. This is a bit tricky, as sometimes we might want to link both, so I'll have a think and get back to you shortly. Theknightwho (talk) 22:29, 28 March 2023 (UTC)[reply]

Bengali transliteration question

The headword line of the entry উপজেলা transliterates it "upôjela" (matching the pronunciation, which has a vowel between the "p" and "j"), but the automatic transliteration in upazila lacks the "ô". Which is correct? Is it possible to fix the automatic transliteration to include the "ô" automatically (assuming the "ô" is correct), or is this a known/insurmountable bug? - -sche (discuss) 21:13, 28 March 2023 (UTC)[reply]

Not a Bengali buff (and not sure who's editing it atm) but ô is an inherent vowel in Bengali, meaning it's typically not spelled out. It's probably not possible to automatically determine its presence or absence, as WT:About Bengali suggests ("Although the transliterations are automatically generated, some terms still need a manual transliteration since there are some exceptions to realize the inherent vowel in Bengali"). —Al-Muqanna المقنع (talk) 21:37, 28 March 2023 (UTC)[reply]
@Al-Muqanna, -sche: That's right. The "shwa-dropping" rules for the inherent shwa vowel (transliterated as "ô" in Bengali or "a" in Hindi, etc.) differ a little between North Indic languages, and more than one reading of the same word may be possible. There are also some compound words which are not read by the generic rules (they are read as separate words). The transliteration modules are sometimes written by different people, and not all straightforward "shwa-dropping" rules have been applied consistently between Hindi, Bengali, Gujarati, Nepali, etc. All this causes "shwa-dropping" to be applied inconsistently, and sometimes to be missing when it is required. So, a manual transliteration is often in order. In this case, no "shwa-dropping" happens, but the rule behind it is not clear (maybe an exception?).
(The "shwa-dropping" doesn't occur in Sanskrit and most South Indic languages, where an inherent vowel needs to be explicitly "killed".) Anatoli T. (обсудить/вклад) 00:19, 29 March 2023 (UTC)[reply]
@Sbb1413 is currently working on automatic transliteration (Module:bn-translit) and transcription (Module:bn-IPA). The testcases currently reveal a lot of problems, but they might just be notoriously difficult cases. There seems to be no consensus on transliteration - o circumflex is currently not present at Wiktionary:Bengali_transliteration. There are also contextual rules on how the Bengali inherent vowel surfaces when it does (w:Bengali language#Orthographic depth). --RichardW57m (talk) 09:19, 29 March 2023 (UTC)[reply]
Just for clarity, note that schwa-dropping rules vary from language to language, and, in the case of tatsamas, even by register. We shouldn't always expect other languages to be more regular than English. --RichardW57m (talk) 09:19, 29 March 2023 (UTC)[reply]
There used to be some issues while using Ô (O circumflex) to translit the inherent vowel. However, when I've replaced it with a plain O, the result is better. As far as I know, Ô was initially used to transcribe the open-mid back rounded vowel /ɔ/ in Bengali. Sbb1413 (he) (talkcontribs) 09:41, 29 March 2023 (UTC)[reply]
While it is more regular in Hindi, the schwa-dropping rule in Bengali is quite complex and often irregular. Sbb1413 (he) (talkcontribs) 09:50, 29 March 2023 (UTC)[reply]
Just noting that the entries have been edited to be inconsistent in the opposite way now: upazila gives "upojela", উপজেলা gives "upjela". - -sche (discuss) 20:14, 29 March 2023 (UTC)[reply]
I've restored manual transliteration for the headline of উপজেলা. However, the manual IPA below looks wrong - isn't the vowel in the sample /ɔ/ rather than /o/? (I could be wrong - I don't know the phonetic range of Bengali vowels.) --RichardW57 (talk) 08:01, 30 March 2023 (UTC)[reply]

Pronunciation of upazila

@Donnanz, -sche: Where did you two get the English pronunciation of upazila from? Given the spelling, I would have expected it to have /z/ as in some Bengali. --RichardW57m (talk) 10:47, 30 March 2023 (UTC)[reply]

@RichardW57m: From the Wikipedia article, but I did qualify this (in the Tea room) as "more than likely wrong". DonnanZ (talk) 10:56, 30 March 2023 (UTC)[reply]
OK, I've taken the discussion there. It's a shame there's no link from the word's page to the Tea Room. --RichardW57m (talk) 11:59, 30 March 2023 (UTC)[reply]

Can we please:

  1. have an optional |dari2= parameter per Template_talk:fa-regional#Optional_2nd_Dari
  2. and make it possible to omit |dari= or |tg= altogether? The reason being - it's easier to find both standard Iranian Persian and Tajik terms than Dari, so users just copy-paste the Iranian into Dari, which may not be right.

Anatoli T. (обсудить/вклад) 03:11, 30 March 2023 (UTC)[reply]

@Atitarev: Sorry, missed your post on the talk page somehow. I made both changes. Let me know if there are any problems. — Eru·tuon 04:13, 30 March 2023 (UTC)[reply]
@Erutuon: Great, thank you! I have used the code at اتوبوس (otobus) and لتونی (letoni). Anatoli T. (обсудить/вклад) 04:29, 30 March 2023 (UTC)[reply]

Usage example/collocation collapsing

[edit]

I've implemented this in MediaWiki:Gadget-defaultVisibilityToggles.js, but kept it disabled for now. It would work pretty much the same way as it works for quotations, provided that the usage examples are formatted correctly. Would there be interest in enabling it? — SURJECTION / T / C / L / 14:46, 31 March 2023 (UTC)[reply]

I'd rather it stay as-is; usage examples are generally shorter and less numerous than quotations, and more critical to understanding a given sense. In addition, it would be a pain when scanning multiple definitions if the usexes were all hidden and you had to individually click on them to open them. Benwing2 (talk) 18:03, 31 March 2023 (UTC)[reply]
I'm not sure I agree. There are quite a few cases with many usage examples, and unless they use inlining (which they often don't), each usage example takes up as much vertical space as two definitions. As for the part about having to click them individually, there's a "Visibility" panel (on desktop, it's in the sidebar; on mobile, it's at the bottom of the page) that would let you show or hide all usage examples. How about collocations? — SURJECTION / T / C / L / 18:23, 31 March 2023 (UTC)[reply]
Collocations are still rare and in many ways not so different from inline usexes; in fact they might have previously used inline usexes. If there are a lot of collocations and they take up a lot of space, IMO they should not use the inline format at all. Note also that the default format of {{rel-top}} et al., I think, is to show the first line or so of them and hide the rest, which could be a good compromise (i.e. if for some reason there are lots of usexes or collocations, hide all but the first or first two). But I'd definitely be opposed to hiding all by default, and I don't think the "Visibility" panel button makes up for it; it's very non-obvious and located far from the definition itself. Benwing2 (talk) 18:33, 31 March 2023 (UTC)[reply]
Many inline nym templates are still quite rare, but the code still collapses them. Collocations taking up a lot of space is already reality in a large number of Polish entries. białko is one example. No alternative has been proposed for where else to put these collocations either. "Hiding all by default" is not at all a necessary part of this proposal (indeed it isn't, only that they can be collapsed at all), but I would argue that makes sense for collocations, which are not as important to understanding a definition as usage examples are. — SURJECTION / T / C / L / 19:02, 31 March 2023 (UTC)[reply]
Actually, that needs to be collapsed into {{co-top}} and {{co-bottom}}, something I'm working on, for what it's worth. Vininn126 (talk) 19:05, 31 March 2023 (UTC)[reply]
Personally, I agree with Benwing. Frankly, I wonder how often users click through to see any of the things we collapse, like coordinate terms; even I sometimes fail to spot the button for those when scanning an entry, which makes listing them under the sense rather than in their own section useless (if no-one sees them), even though in theory (if they were visible) it'd be the less-space-using and better, closer-to-the-definition way to list them. (The Visibility toggle, which is already not the most findable thing on mobile as you note, resets if I log out / clear my cookies.) I wonder, inspired by the ongoing effort to put whole translations tables into one template/module call rather than one per translation, if we could deploy one template for all the various "under the definition" things, which would display the inline alt forms, synonyms, antonyms, coordinate terms, etc., all even more compactly on one line, unhidden by default but possibly hiding ones that went past one line in length behind some obvious and clickable fade or ... or something. And put usexes all together on one line like that, too. We could even then add "Collocations:" to one or the other of those templates. E.g. T:semantic relations (shortcut T:semrel) or something:
{{semantic relations|altform=[[fubar]], [[foobahr]]|syn=[[barfoo]]|ant=[[snafu]], [[baz]]|coord=[[bazoo]]|hyper=[[fudgeup]], [[tafoo]]|hypo=[[some]], [[other]], [[string]], [[of]], [[words]]}}
(and then something like the existing {{usex}})
to display:
  1. Definition.
    Alternative forms: fubar, foobahr; Synonym: barfoo; Antonyms: snafu, baz; Coordinate term: bazoo; ... [more]
    There was a foobar. | This is another usex about foobars illustrating some feature of... [more]
where clicking the "... [more]" would unhide the rest of the content. - -sche (discuss) 19:20, 31 March 2023 (UTC)[reply]
Yes please. I think the user interface should allow the user to collapse both use-examples and quotes, and to do so separately for each. I'm a big fan of both, and I've even written up a short essay on the merits of each. But I think they serve different purposes and a user should be able to hide one, both, or neither as per their preference. If we can agree to allow the readers to collapse use-examples, perhaps we could also talk about what the default state of the use-examples should be .... perhaps they could remain as open since they are the more user-friendly of the two, whereas quotes appeal more to heavy Wiktionary users who are more likely to be familiar with the interface. Soap 14:44, 6 April 2023 (UTC)[reply]