Wiktionary:Grease pit/2023/January

Delinking punctuation

Can someone use their bot to remove links to hyphens in English headwords? E.g. diff. I've cleared out all other punctuation marks, but there are a lot of these. Ultimateria (talk) 00:41, 1 January 2023 (UTC)[reply]

@Ultimateria This is running. I looked for pages linking to - with hyphens in their names, so possibly it missed something. I do hope you aren't getting spammed by pings, as I mentioned your username in the changelog message. Benwing2 (talk) 03:58, 3 January 2023 (UTC)[reply]
@Benwing2: Thanks! I've manually cleaned up a few other uses. FYI I didn't get notifications from the pings. Ultimateria (talk) 01:17, 5 January 2023 (UTC)[reply]

Module error

Something went wrong, I'm seeing "Lua error: bad argument #1 to 'lc' (string expected, got function)" whenever language codes (and other things?) are called, e.g. in Hispania. - -sche (discuss) 01:18, 1 January 2023 (UTC)[reply]

Was about to post this. Here are two more affected pages: sob story, schameler DFlhb (talk) 01:20, 1 January 2023 (UTC)[reply]
Sorted. Theknightwho (talk) 01:20, 1 January 2023 (UTC)[reply]
That was quick! Bravo. DFlhb (talk) 01:22, 1 January 2023 (UTC)[reply]
However, there are now 5000 entries in CAT:E. Someone should get a bot to null-edit them. 70.172.194.25 01:24, 1 January 2023 (UTC)[reply]
I am running this null edit, and it's almost done. Benwing2 (talk) 09:30, 1 January 2023 (UTC)[reply]

Lua error: bad argument #1 to 'lc' (string expected, got function)

This is happening on every single entry I visit; why? Chuterix (talk) 01:20, 1 January 2023 (UTC)[reply]

This keeps happening randomly, I don't know what the problem is and sometimes it fixes itself. Chuterix (talk) 01:20, 1 January 2023 (UTC)[reply]
It was a short-lived (mis-)edit to a major module. (I've combined the post about it in Wiktionary:Grease pit/2022/December and the post about it here.) - -sche (discuss) 01:24, 1 January 2023 (UTC)[reply]
This happens when someone (in this case, me) does something that goes wrong in one of the fundamental modules underpinning everything. What happened is that I pressed "rollback", expecting it to undo one edit, and it decided to undo all of the previous edits I'd made, which therefore caused everything to break, because the old revision was no longer compatible with everything else. Theknightwho (talk) 01:25, 1 January 2023 (UTC)[reply]
@Theknightwho Sorry to harp on this but this isn't the first or second time this has happened. Editing production modules in place is a really bad practice IMO. I'd strongly recommend using sandbox modules when editing fundamental modules. Essentially, copy the module and any dependencies you will be editing to your userspace, test carefully, then push all the changes to the modules at once (or one right after the other, as quickly as you can manage). I routinely do this and very rarely fill up CAT:E as a result. Benwing2 (talk) 09:37, 1 January 2023 (UTC)[reply]
@Benwing2 It was a combination of not realising that rollback would do that, and the fact that it’s actioned immediately (without a preview screen). Ordinarily, I use the preview function to test even small changes. Theknightwho (talk) 14:59, 1 January 2023 (UTC)[reply]

See the front page. Equinox 18:42, 1 January 2023 (UTC)[reply]

@Erutuon: do you know how to update {{WOTD/previous or next day}} to fix this? — Sgconlaw (talk) 21:12, 1 January 2023 (UTC)[reply]
@Sgconlaw: Can't replicate because it's now the second day of January in England and the yesterday link on Wiktionary:Word of the day/2023/January 1 has the year 2022. Possibly something going on with the template code that made it misbehave on the Main Page, but I don't really understand it. — Eru·tuon 03:52, 2 January 2023 (UTC)[reply]
@Erutuon: you don’t think it’s just some miscoding at {{WOTD/previous or next day}}? — Sgconlaw (talk) 04:43, 2 January 2023 (UTC)[reply]
@Sgconlaw, Erutuon Almost certainly yes, some miscoding in that template having to do with January 1/December 31, but the code is hard to understand and uses some built-in time functions. Probably easier to just rewrite in Lua. Benwing2 (talk) 00:04, 5 January 2023 (UTC)[reply]
@Benwing2: be our guest, when you have time! — Sgconlaw (talk) 04:56, 5 January 2023 (UTC)[reply]
Using built-in time functions is vastly preferable to hacking anything together yourself. There are lots of odd exceptions around leap years, daylight saving, etc. Equinox 12:48, 5 January 2023 (UTC)[reply]
@Equinox: I thought that's what we were doing with {{WOTD/previous or next day}}, actually. We use the {{#time}} function. One difficulty is that the links need to point to different locations depending on where {{WOTD}} is used. When used on the main page, they need to point to dates in the current year (except for 1 January or 31 December). When used as WOTD fallbacks (for example, "Wiktionary:Word of the day/January 1"), they should simply be pointing to the previous or next day, omitting the year. However, they currently don't work properly in that context. — Sgconlaw (talk) 13:39, 5 January 2023 (UTC)[reply]
@Equinox By "rewrite in Lua" I mean to port the template code of {{WOTD/previous or next day}} to Lua, not to rewrite the built-in time functions; I agree we should call the built-in ones whenever possible. Benwing2 (talk) 04:31, 8 January 2023 (UTC)[reply]
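To illustrate the point about relying on built-in date arithmetic, here is a minimal Python sketch (not the template code, and not a proposed Lua port). It assumes the page-name patterns quoted in this thread, "Wiktionary:Word of the day/2023/January 1" for dated pages and "Wiktionary:Word of the day/January 1" for fallbacks; the function name `wotd_page` and its parameters are hypothetical.

```python
from datetime import date, timedelta

# Month names are spelled out so the result does not depend on the locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def wotd_page(day: date, offset_days: int, include_year: bool) -> str:
    """Return the WOTD page name for the day `offset_days` away from `day`.

    The standard library handles year boundaries and leap years, so
    1 January minus one day lands on 31 December of the previous year.
    """
    target = day + timedelta(days=offset_days)
    month_day = f"{MONTHS[target.month - 1]} {target.day}"
    if include_year:
        # Dated page, as linked from the main page.
        return f"Wiktionary:Word of the day/{target.year}/{month_day}"
    # Fallback page, which omits the year.
    return f"Wiktionary:Word of the day/{month_day}"


print(wotd_page(date(2023, 1, 1), -1, include_year=True))
print(wotd_page(date(2023, 12, 31), 1, include_year=False))
```

The year-boundary case that misbehaved on the main page needs no special handling here, because the year is taken from the computed target date rather than from the current date.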

This entry has a module error due to edits to Module:ko-pron by @Kopronfixer, but, in fairness, this is a very unusual use case. @Tibidibi embedded {{ruby}} in the |text= parameter of {{quote-book}} in order to show the hangul transliteration of the Han characters in the (1677) quote; this is very creative and seems to have worked well for a while, but the current transliteration module has no clue what to do with such things.

Is there any workaround to bypass the module error using the template's parameters until the transliteration module can be fixed to deal with this? The obvious solution would be to add a |transliteration= parameter, but I'm not very good with Korean, so I can't come up with the content for it. A better option would be to extract the hangul from the current text and have that in the |text= parameter, with the ruby-enhanced version as the display. It's just that I'm not sure if there's a parameter for the display part. It would be equivalent to the |head= parameter in our linking templates. Chuck Entz (talk) 22:43, 1 January 2023 (UTC)[reply]

@Chuck Entz: I made a version of ko-translit that should handle {{ruby}} better, see Module:ko-translit/sandbox. (I would recommend copying this over to the main module.) However, the real problem in this instance is something else: the code [[-이|-ㅣ]]. Removing the piped display text for this suffix fixes the problem (although it also alters the quote, which is obviously not a great solution). And conversely, to replicate the issue you don't even need ruby: {{xlit|ko|[[선남선녀]][[-이|-ㅣ]]}} generates the same error. 70.172.194.25 02:21, 2 January 2023 (UTC)[reply]

Accidentally "vandalized" my userpage.

I was trying to list citations for the words xxx and XXX on my userpage, so that I could have a record of them for citing purposes. I was just about to publish my newest changes (where I managed to find some cites for xxx as a letter-closer, and XXX referring to porn), but then I got a message saying that my edits could not go through because of "vandalism", and to go here if I believed my action to be constructive. And so I did. CitationsFreak (talk) 23:01, 1 January 2023 (UTC)[reply]

We get loads and loads of vandalism around the pages that contain XXX. (It seems to be clueless users from less tech-literate countries attempting to find pornography.) If you want, mail me your desired page contents and I will replace the page with that. Changing the abuse filters to try to fix this one edit would be a hassle. Equinox 23:04, 1 January 2023 (UTC)[reply]
How do I send mail? CitationsFreak (talk) 23:17, 1 January 2023 (UTC)[reply]
I see you worked it out! User:Chuck Entz, you spend a lot of time with abuse filters, don't you? So have a think. Equinox 00:01, 2 January 2023 (UTC)[reply]
@Equinox I haven't done much recently, so I'm a bit rusty. @Surjection knows abuse filter coding a lot better than I do. It's Abuse Filter 35, and I have only a vague idea how the edit in question triggered it. Without going into detail for obvious reasons, it does check for various types of strings of meaningless text, and some of the content might be interpreted that way. I don't want to discuss how to avoid the problem, anyway, because genuine vandals do read Grease pit discussions.
This is a "dumb vandalism filter", so I suppose we might consider whether it needs to check userspace. After all, user pages of new users are tagged by another filter and some of us check them regularly. That said, it's a very useful filter that we've had in place for 8 1/2 years, so I'd rather we not mess with it unnecessarily or without careful thought. At any rate, it only checks edits by users with accounts while they're still new- so the problem will solve itself in this case soon enough. IPs are another matter- the system considers them new regardless of how long they've been editing. Chuck Entz (talk) 01:41, 2 January 2023 (UTC)[reply]

Most words that appear in the thesaurus do not have links to the thesaurus entries containing that word. Only those few words that are chosen as headwords have these links.

I would like this policy to change so that it is easier to look up a word in the thesaurus, once I have the wiktionary entry open. As it stands, in most cases I have to open the URL https://en.wiktionary.org/wiki/Wiktionary:Thesaurus and retype the word.

In the 'Wiktionary:Thesaurus/Benefits' page, it says "Having a separate thesaurus isn't too inconvenient for the reader: they only need to click on the thesaurus link to navigate to it. A single click is all that is required.". But that's not how wiktionary currently works. The Benefits page describes how I would like Wiktionary to work. — This unsigned comment was added by 209.183.136.7 (talk) at 02:58, 2 January 2023 (UTC).[reply]

EDIT: I looked around some more, and now I think that wiktionary entries only have Thesaurus links if an editor has explicitly added them by hand, and it isn't done in a consistent way. I was hoping for something that would automatically insert the appropriate Thesaurus links. In the OED, there is a simple 'Thesaurus' link beside each sense of a word, in the right column where it isn't obtrusive. That would be nice, and perhaps that could be done automatically. — This unsigned comment was added by 209.183.136.7 (talk) at 03:39, 2 January 2023 (UTC).[reply]

Generating new monthly discussion pages

Generating new discussion pages must be done in a very specific way if we want the new pages to be on the same watchlists as the old ones. In the past, that wasn't a problem, because @Rua has a script for it and would always do it for us without being asked. For the past couple of years, however, this wasn't done, so I did it myself. It's really quite simple, so I was able to figure it out quickly- but it's rather time-consuming to do it by hand. I'd rather not have to. This year I only did the January pages.

Can someone set up a script or bot run to do this when needed?

Let me describe the process:

It takes advantage of the fact that when a page on a watchlist is moved, the original page at its new destination stays on the watchlist and the redirect left at the old location does as well. By a series of moves and editing of redirects, all the new pages are created and all of them are on all the same watchlists as the old ones. The steps:

  1. Move the most recent monthly page for a specific forum (Beer parlour, Etymology scriptorium, etc.) to the name for a future month's page, leaving a redirect.
  2. Move it back over the redirect, leaving a redirect at the new location.
    1. Move the page at the new location to the next location
    2. Repeat until all the monthly pages are created for the forum
    3. Replace the redirects at all the new locations with the template {{discussion month}}
  3. Repeat for all the other forums that have monthly pages


Just to make it clear, here's an example for this year:

  1. Move Wiktionary:Beer parlour/2023/January to Wiktionary:Beer parlour/2023/February
    Wiktionary:Beer parlour/2023/January is now a redirect to Wiktionary:Beer parlour/2023/February
  2. Move Wiktionary:Beer parlour/2023/February to Wiktionary:Beer parlour/2023/January over the redirect.
    Wiktionary:Beer parlour/2023/February is now a redirect to Wiktionary:Beer parlour/2023/January
  3. Move Wiktionary:Beer parlour/2023/February to Wiktionary:Beer parlour/2023/March
  4. Move Wiktionary:Beer parlour/2023/March to Wiktionary:Beer parlour/2023/April
  5. Move Wiktionary:Beer parlour/2023/April to Wiktionary:Beer parlour/2023/May
  6. Move Wiktionary:Beer parlour/2023/May to Wiktionary:Beer parlour/2023/June
  7. Move Wiktionary:Beer parlour/2023/June to Wiktionary:Beer parlour/2023/July
  8. Move Wiktionary:Beer parlour/2023/July to Wiktionary:Beer parlour/2023/August
  9. Move Wiktionary:Beer parlour/2023/August to Wiktionary:Beer parlour/2023/September
  10. Move Wiktionary:Beer parlour/2023/September to Wiktionary:Beer parlour/2023/October
  11. Move Wiktionary:Beer parlour/2023/October to Wiktionary:Beer parlour/2023/November
  12. Move Wiktionary:Beer parlour/2023/November to Wiktionary:Beer parlour/2023/December
  13. Replace the redirect at Wiktionary:Beer parlour/2023/February with {{discussion month}}
  14. Replace the redirect at Wiktionary:Beer parlour/2023/March with {{discussion month}}
  15. Replace the redirect at Wiktionary:Beer parlour/2023/April with {{discussion month}}
  16. Replace the redirect at Wiktionary:Beer parlour/2023/May with {{discussion month}}
  17. Replace the redirect at Wiktionary:Beer parlour/2023/June with {{discussion month}}
  18. Replace the redirect at Wiktionary:Beer parlour/2023/July with {{discussion month}}
  19. Replace the redirect at Wiktionary:Beer parlour/2023/August with {{discussion month}}
  20. Replace the redirect at Wiktionary:Beer parlour/2023/September with {{discussion month}}
  21. Replace the redirect at Wiktionary:Beer parlour/2023/October with {{discussion month}}
  22. Replace the redirect at Wiktionary:Beer parlour/2023/November with {{discussion month}}
  23. Replace the redirect at Wiktionary:Beer parlour/2023/December with {{discussion month}}

Do the same with Wiktionary:Etymology scriptorium/2023/January, Wiktionary:Grease pit/2023/January, Wiktionary:Information desk/2023/January and Wiktionary:Tea room/2023/January.

You should start with the current monthly page to be sure that you replicate the most recent watchlist membership. That means you need to be quick with the moves for the first page so that no one is editing the page when you move it and no one hits a redirect instead of a discussion page.

As I said, this is rather time-consuming to do by hand, with 125 steps in total, i.e. (13 moves + 12 templates) x 5 forums (I've already done 2 of the moves and 1 of the templates for this year for each of the 5 forums, leaving 110 more for the year). It will be much quicker with a script or a bot.

Whoever ends up doing it this year, we need to set it up so we can be sure that there's always someone assigned to do it every year and we know who it is. Chuck Entz (talk) 05:30, 2 January 2023 (UTC)[reply]
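The move sequence described above can be sketched as a small Python function that only builds the ordered list of actions for one forum, mirroring the 23-step Beer parlour example. It does not talk to the wiki: actually performing the moves would need a bot framework and the right permissions, which are outside this sketch. The function name `plan_forum` is hypothetical.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def plan_forum(forum: str, year: int) -> list:
    """Return the ordered actions for one forum, starting from January."""
    pages = [f"Wiktionary:{forum}/{year}/{month}" for month in MONTHS]
    steps = []

    # Move January to February and straight back, so that January keeps the
    # discussion and February becomes a redirect on the same watchlists.
    steps.append(f"Move {pages[0]} to {pages[1]}")
    steps.append(f"Move {pages[1]} to {pages[0]} over the redirect")

    # Pass the redirect along month by month; each move leaves a redirect
    # behind at the location it came from.
    for source, target in zip(pages[1:], pages[2:]):
        steps.append(f"Move {source} to {target}")

    # Finally turn every redirect (February to December) into a real page.
    for page in pages[1:]:
        steps.append(
            f"Replace the redirect at {page} with {{{{discussion month}}}}"
        )
    return steps


for number, step in enumerate(plan_forum("Beer parlour", 2023), start=1):
    print(f"{number}. {step}")
```

Keeping the plan separate from its execution makes it easy to review the order of moves before anything is run, which matters because the first two moves have to happen in quick succession.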

@Chuck Entz How pressing is this, i.e. have you done January for all the forums? Benwing2 (talk) 04:07, 3 January 2023 (UTC)[reply]
@Benwing2: yes, I did January. We have until the end of the month. Chuck Entz (talk) 04:18, 3 January 2023 (UTC)[reply]
@Chuck Entz I made a script to do this and used it to create April and May for ety scriptorium. To do the other discussions, you'll have to drop the page protections so I can run it. Is it better to create all the pages once a year, or just create the next month a few days before the month starts? JeffDoozan (talk) 17:35, 28 February 2023 (UTC)[reply]
@JeffDoozan Thanks! You beat me to it ... was just about to start writing the script :) ... Benwing2 (talk) 05:19, 1 March 2023 (UTC)[reply]
@Chuck Entz I saw you modified the protections to allow template editors (thanks!). I'd prefer to run the script under my bot account, can you change the permissions to allow bots to move the page? JeffDoozan (talk) 00:45, 3 March 2023 (UTC)[reply]
Where do things stand on the idea of doing this (for e.g. a year's worth of upcoming subpages at a time) by script? I notice Chuck had to do it (by hand?) again just now. - -sche (discuss) 00:34, 1 May 2023 (UTC)[reply]
To be fair, I had to do it by hand because I never got back to him on this, and I procrastinated too long to ask anyone else to do it for me. Let's try for next month. Chuck Entz (talk) 01:32, 1 May 2023 (UTC)[reply]

accents documented in the most entries; Template Tiger

I want to find out which values ("GA", "India", "Bristol", ...) are most often passed to T:a. Even better would be to know which are most often passed to instances of T:a inside an ==English== section. This would help with determining which national standards we most often document already in entries, and hence which are the highest priority to cover in Appendix:English pronunciation (see Appendix talk:English pronunciation#Other_National_Varieties_of_English). Secondarily, I want to know if a functional version of Template Tiger exists (https://templatetiger.toolforge.org/ doesn't seem to do anything; nothing but a mostly-blank page with no input fields comes up if I click enwiktionary), because I wonder if it could help with this. - -sche (discuss) 02:26, 3 January 2023 (UTC)[reply]

@-sche The former can be done with template tracking added to Module:accent qualifier, but the latter can only be done by examining a dump file. You can write a Python script with the help of mwparserfromhell [1], which is what I use to parse Wikimedia markup. I have recently added the ability to certain pronunciation templates to specify accent qualifiers without physically specifying {{a}} or {{accent}}, which can complicate things somewhat, but given how recent this support is, it's probably not worth worrying about esp. as it mostly or exclusively affects non-English languages. Benwing2 (talk) 04:04, 3 January 2023 (UTC)[reply]
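As a rough illustration of the dump-scanning approach, here is a Python sketch that counts the values passed to {{a}} or {{accent}} inside the ==English== section of one page's wikitext. To stay dependency-free it uses regular expressions rather than mwparserfromhell, so it is only an approximation: it ignores nested templates and the newer inline accent qualifiers, and reading pages out of a dump file is not shown. The function name `count_english_accents` is hypothetical.

```python
import re
from collections import Counter

# {{a|...}} or {{accent|...}} with no nested templates inside.
ACCENT_RE = re.compile(r"\{\{(?:a|accent)\|([^{}]*)\}\}")

# Everything between "==English==" and the next level-2 heading (or the end).
ENGLISH_RE = re.compile(r"^==English==\s*$(.*?)(?=^==[^=]|\Z)", re.M | re.S)


def count_english_accents(wikitext: str) -> Counter:
    """Count accent labels used in the English section of one page."""
    counts = Counter()
    section = ENGLISH_RE.search(wikitext)
    if section is None:
        return counts
    for params in ACCENT_RE.findall(section.group(1)):
        for value in params.split("|"):
            value = value.strip()
            # Skip empty slots and named parameters such as "lang=en".
            if value and "=" not in value:
                counts[value] += 1
    return counts


sample = (
    "==English==\n===Pronunciation===\n"
    "* {{a|GA}} {{IPA|en|/kæt/}}\n"
    "* {{a|RP|India}} {{IPA|en|/kat/}}\n\n"
    "==French==\n===Pronunciation===\n"
    "* {{a|Quebec}} {{IPA|fr|/kat/}}\n"
)
print(count_english_accents(sample).most_common())
```

Summing the per-page counters over every page in a dump would give the overall ranking of labels.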
BTW I've never heard of Template Tiger. Benwing2 (talk) 04:05, 3 January 2023 (UTC)[reply]
Template Tiger used to be useful; it showed which templates were most used, which parameters were most used in particular templates, which entries used particular parameters (e.g. if you want to find all entries which use the ninth parameter of T:label i.e. have at least eight labels). Alas, I'm not sure it's still maintained. Perhaps it was felt that all its functions could be handled by Lua / template tracking. - -sche (discuss) 06:30, 3 January 2023 (UTC)[reply]
Click on the information link on the upper-right-hand corner and read the discussions. Apparently things got too big for the system to handle without timing out, so it was abandoned. Chuck Entz (talk) 06:36, 3 January 2023 (UTC)[reply]
I started work on a Rust program that would make a database of IPA transcriptions, including their accent labels, which could be searched somewhat like enwikt-translations. The database would contain a full list of {{a}} labels, but I got as far as generating an event stream of the headers, templates, and list syntax before getting stuck at how the database should be structured. I would also like to index semi-automatically generated transcriptions like from {{fr-IPA}}, which requires a bit more work. This is a good reminder that someone wants it, so maybe I will get back to it soon.
I have the TemplateHoard IPA tool, but it doesn't have {{a}} information because it's just naively searching the parameters of {{IPA}} instances in each page.
It would be nice to have a Template Tiger type tool for Wiktionary (I think I saw another template tool that was updated but wasn't structured well for Wiktionary's purposes), but it's a little harder for us because we have various parameter conventions, such as list parameters, and a naive tool that just lets you search when parameter x equals value y wouldn't work well with them. For instance, the language and term parameter are the first and second parameters to {{m}} and {{l}} but the second and third parameters to {{inh}}, and often you would want to search all of those and other link templates simultaneously. In {{a}} you'd want to examine all the numbered parameters equally because accent or dialect labels can be in any of them. In {{lb}}, to find the uses of labels, you'd want to examine all numbered parameters but the first, which is the language code. And there are list parameters and the weird convention in which {{alter}} has terms and dialects in numbered parameters separated by an empty parameter. So the tool would have to take into account template-specific logic to be useful for some of the tasks that people want to do. I've thought about this a bit, but don't really have a plan for how this template information could be organized in a database so that it could be searched. — Eru·tuon 15:56, 3 January 2023 (UTC)[reply]
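One possible shape for the template-specific logic described above is a lookup table of per-template rules. The Python sketch below is only an illustration of that idea under stated assumptions: the names `LABEL_RULES`, `LANG_TERM_INDEX`, `labels_of` and `lang_and_term` are hypothetical, only the templates mentioned in this post are covered, and the special conventions of {{alter}} and list parameters are left out.

```python
from typing import Callable, Dict, List, Optional, Tuple


def all_numbered(params: List[str]) -> List[str]:
    """Every non-empty numbered parameter is a label (as in {{a}})."""
    return [p for p in params if p]


def all_but_first(params: List[str]) -> List[str]:
    """The first parameter is the language code (as in {{lb}})."""
    return [p for p in params[1:] if p]


# Which numbered parameters hold labels, per template.
LABEL_RULES: Dict[str, Callable[[List[str]], List[str]]] = {
    "a": all_numbered,
    "lb": all_but_first,
}

# Zero-based positions of (language, term) among the numbered parameters.
LANG_TERM_INDEX: Dict[str, Tuple[int, int]] = {
    "m": (0, 1),
    "l": (0, 1),
    "inh": (1, 2),
}


def labels_of(name: str, params: List[str]) -> List[str]:
    rule = LABEL_RULES.get(name)
    return rule(params) if rule else []


def lang_and_term(name: str, params: List[str]) -> Optional[Tuple[str, str]]:
    index = LANG_TERM_INDEX.get(name)
    if index is None or len(params) <= index[1]:
        return None
    return params[index[0]], params[index[1]]


print(labels_of("lb", ["en", "informal", "dated"]))
print(lang_and_term("inh", ["en", "enm", "cat"]))
```

With rules normalised like this, a search for "term parameter equals X" could run over {{m}}, {{l}} and {{inh}} at once without the caller knowing the parameter positions.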
@-sche: I've got a program to generate a database with {{accent}} instances and pronunciations from {{IPA}}. Here is a table of the most common labels by language derived from the database. The reasoning to generate the list isn't perfect. It finds a language only if {{IPA}} is nested directly at a lower list level below the accent template in a list. It adds one to the count for every pronunciation in {{IPA}} that is nested below the accent template, so the more alternative pronunciations (phonemic or phonetic), the greater the count. Both of these can probably be improved. But it's a good starting point. — Eru·tuon 02:23, 23 January 2023 (UTC)[reply]
Changed to number of occurrences of the label in this edit. The diff shows how many labels have multiple pronunciations nested directly under them. — Eru·tuon 02:39, 23 January 2023 (UTC)[reply]
Thank you. A lot fewer entries have any non-US/UK pronunciations (e.g. Irish English pronunciations) than I would've thought; perhaps adding other national accents (as suggested on Appendix talk:English pronunciation), while interesting, is not pressing. - -sche (discuss) 18:59, 24 January 2023 (UTC)[reply]

Glitch in the place template

At Cape Coast, marking the Central Region of Ghana inexplicably activated categories for Malta. I assume that the Central Region bit of the code somewhere automatically assumes it's Maltese which (a) is silly given the generality of the term and (b) is silly because it should be checking the country regardless. Anyway, something to clean up when one of the place template coders has a minute. — LlywelynII 12:34, 4 January 2023 (UTC)[reply]

@Llywelyn Yes, you are right. This is currently a general problem with the place category handling; it doesn't check the country when categorizing sub-country regions. It will take a bit of doing to fix; the {{place}} code is quite complicated. Benwing2 (talk) 23:55, 4 January 2023 (UTC)[reply]
@LlywelynII Blah. Benwing2 (talk) 23:55, 4 January 2023 (UTC)[reply]

(Other edits)

Some more visual noise has been added to Special:RecentChanges and page histories: the text "(other edits)" next to every tag. Given that about a third of edits have at least one tag, often multiple, this adds up. The CSS .mw-tag-other-edits {display:none} will hide it, but leaves an annoying space. 70.172.194.25 20:18, 4 January 2023 (UTC)[reply]

Now it displays as "(⧼tag-link-other-edits⧽)", which is even worse. 70.172.194.25 22:36, 11 January 2023 (UTC)[reply]
Huh, now it's gone completely. Strange. 70.172.194.25 22:55, 11 January 2023 (UTC)[reply]

Request for edit filter to detect unclosed templates and html comments

Would it be possible to create an edit filter that verifies that count("{{") == count("}}") and that if "<!--" exists that "-->" follows it later in the page? Hopefully it would catch edits like this and this. JeffDoozan (talk) 23:17, 4 January 2023 (UTC)[reply]
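The logic of the requested check can be sketched in Python as below. This is not abuse filter syntax, only an illustration of the two conditions; the function name `has_unclosed_markup` is hypothetical. A plain count can misfire on pages that legitimately contain unbalanced braces (for example inside nowiki tags), so a real filter would probably need exceptions.

```python
def has_unclosed_markup(wikitext: str) -> bool:
    """Return True if braces are unbalanced or an HTML comment is unclosed."""
    # Condition 1: as many "{{" as "}}" (non-overlapping occurrences).
    if wikitext.count("{{") != wikitext.count("}}"):
        return True

    # Condition 2: every "<!--" is followed by a later "-->".
    position = 0
    while True:
        start = wikitext.find("<!--", position)
        if start == -1:
            return False  # no further comment openers
        end = wikitext.find("-->", start + 4)
        if end == -1:
            return True  # opener without a closer
        position = end + 3


print(has_unclosed_markup("{{m|en|cat}} <!-- note -->"))
print(has_unclosed_markup("{{m|en|cat}"))
```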

[...] tricks passage character count of {cite-meta}

As already requested on Template talk:cite-meta.

See the references at treblanof, grabovim, in which the passage is displayed in a new block because the character count includes the HTML tags printed by {{...}}. Can anyone who has editing permission at line 512 of {{cite-meta}} change len into len_visible? It's a new function that should hopefully fix the problem. Catonif (talk) 11:43, 5 January 2023 (UTC)[reply]

Done. — Fenakhay (حيطي · مساهماتي) 12:17, 5 January 2023 (UTC)[reply]
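For readers unfamiliar with the problem, the difference between a raw length and a visible length can be illustrated with the Python sketch below. It is not the implementation of len_visible, only an assumption about the general idea: markup that the reader never sees should not count towards the passage length. The function name `visible_length` and the sample markup are hypothetical.

```python
import re

# Anything that looks like an HTML tag: "<", then no angle brackets, then ">".
TAG_RE = re.compile(r"<[^<>]*>")


def visible_length(text: str) -> int:
    """Length of the text once HTML tags are stripped out."""
    return len(TAG_RE.sub("", text))


passage = 'first part <span class="q-hellip-sp">[…]</span> second part'
print(len(passage), visible_length(passage))
```

A threshold compared against the raw length would treat this short passage as much longer than it appears, which matches the symptom described above.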

Linking to Korean terms with a hyphen

At 때문 (ttaemun) I am having trouble figuring out why one usage example links correctly to 이다 (-ida) (copula with an initial hyphen in the title) and the other does not; it links to 이다 (ida) instead. I don't want to copy the whole usage examples; one of them is my citation (the one that works).

If you see the wikicode, my citation uses -이었습니다 (working usex), the last usage example uses -입니다. I expect both to link to 이다 (-ida) but the last one doesn't work. Anatoli T. (обсудить/вклад) 23:56, 5 January 2023 (UTC)[reply]

@Atitarev Trying to figure this out; I created this code so I should be the one to figure out what's going wrong. However, I don't see where the working usex -이었습니다 occurs, can you give me a bit more info? Benwing2 (talk) 01:36, 6 January 2023 (UTC)[reply]
@Benwing2: Thanks for looking! That one is using {{quote-book}}. It's only showing in the expanded mode. Click on "quotations" on the headword or "Show quotations" on the left panel to see the quotation. Anatoli T. (обсудить/вклад) 03:07, 6 January 2023 (UTC)[reply]
@Atitarev It's the fault of {{ko-usex}}. Line 45 of Module:ko removes all hyphens from the "Hangul" argument (the first argument). I don't know why this is being done, but it's wrong. User:Fish bowl, maybe you understand what's going on in this module as you've worked on it? Can you comment? Benwing2 (talk) 04:26, 6 January 2023 (UTC)[reply]
@Benwing2, Fish bowl:
The idea is, in summary:
  1. The symbol "^" is used to force capitalisations but it's invisible to the end-user (except for capitalising the next Roman symbol) and is removed on any links (including interwikis). E.g. {{m|ko|^서울}} -> 서울 (Seoul)
  2. The symbol "-" is displayed in the transliteration but it's invisible to the end-user on Korean terms and it's NOT removed on links. E.g. {{m|ko|-이다}} -> 이다 (-ida).
So, the behaviour is different on "^" and "-". Anatoli T. (обсудить/вклад) 04:39, 6 January 2023 (UTC)[reply]
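The two rules above can be summarised in a short Python sketch. It is not the code of Module:ko; the function names `link_target` and `display_text` are hypothetical, and removing every hyphen from the displayed text (rather than only a leading one) is an assumption.

```python
def link_target(term: str) -> str:
    """Page title to link to: "^" is dropped, "-" is kept."""
    return term.replace("^", "")


def display_text(term: str) -> str:
    """Hangul shown to the reader: both "^" and "-" are hidden."""
    return term.replace("^", "").replace("-", "")


for term in ("^서울", "-이다"):
    print(term, "->", link_target(term), "/", display_text(term))
```

Stripping hyphens before computing the link target, instead of only before display, would send "-이다" to the page "이다", which is the symptom reported at the top of this section.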
@Atitarev OK, I changed line 45 of Module:ko to not strip hyphens. I think this is correct, and it fixes the issue on 때문 (ttaemun); let me know if you see any weirdness. Benwing2 (talk) 05:29, 6 January 2023 (UTC)[reply]
@Benwing2: Great job, thank you! Anatoli T. (обсудить/вклад) 05:38, 6 January 2023 (UTC)[reply]

Translation adder sorting is broken

It occasionally adds new entries to the end of tables after the recent changes, e.g. in Special:Diff/70690810, Special:Diff/70688470. — SURJECTION / T / C / L / 09:48, 6 January 2023 (UTC)[reply]

It seems to be struggling when Egyptian translations are present in the table. I presume it is getting confused by the tables in the hieroglyphic display. No idea why this has suddenly started happening after the recent changes. This, that and the other (talk) 04:55, 7 January 2023 (UTC)[reply]

Specifying an unknown gender, etc.

If we’re fixing the translation adder, may I suggest we add the ability to indicate that the gender of a word is unknown (equivalent to adding a “?” to {{t}} or {{t+}})? — Sgconlaw (talk) 20:15, 6 January 2023 (UTC)[reply]

It would also be great if there was a way to add a literal gloss to a translation too (equivalent to adding |lit=). — Sgconlaw (talk) 22:57, 27 January 2023 (UTC)[reply]

@Benwing2: as you recently fixed the translation adder, are the above features you can add? Thanks. — Sgconlaw (talk) 03:45, 9 February 2023 (UTC)[reply]

These should not be hard to add but I have little fluency in JavaScript. Maybe IP 70.* can help? Benwing2 (talk) 03:53, 9 February 2023 (UTC)[reply]
@Benwing2: ah, OK. Does 70.* have the right permissions to make the edits? How do we ping them? — Sgconlaw (talk) 04:21, 9 February 2023 (UTC)[reply]
@Sgconlaw No way to ping them but they usually seem to read the main talk pages. They have made changes in the past in a sandbox. Benwing2 (talk) 04:30, 9 February 2023 (UTC)[reply]

Arabic variants bot request

Could somebody who has a dump and knows how to use it run a bot to add or edit {{also}} at the top of each entry to link to other pages that differ only in

  • Variants of alef: (ا آ أ إ) codes 0622 0623 0625 0627
  • Variants of kaf: (گ ک ك) codes 0643 06a9 06af
  • Variants of yeh: (ی ئ ي ى) codes 0626 064a 06cc 0649

These are not all the variants that some people put in {{also}}; it is a minimal set that causes more confusion than the number of dots.

Vox Sciurorum (talk) 20:02, 6 January 2023 (UTC)[reply]
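The grouping part of this request can be sketched in Python as follows: fold each variant listed above to one representative character, then collect page titles that become identical after folding. This is not the OrphicBot code; the names `VARIANT_GROUPS`, `FOLD` and `also_groups` are hypothetical, the choice of representative character is arbitrary, and reading titles from a dump and editing {{also}} are not shown.

```python
from collections import defaultdict

# Representative character -> all variants from the request (code points
# written as escapes so the table is unambiguous).
VARIANT_GROUPS = {
    "\u0627": "\u0622\u0623\u0625\u0627",  # alef: 0622 0623 0625 0627
    "\u0643": "\u0643\u06a9\u06af",        # kaf:  0643 06a9 06af
    "\u064a": "\u0626\u064a\u06cc\u0649",  # yeh:  0626 064a 06cc 0649
}

# Translation table mapping every variant to its representative.
FOLD = str.maketrans(
    {variant: canon for canon, variants in VARIANT_GROUPS.items()
     for variant in variants}
)


def also_groups(titles):
    """Return sorted groups of titles that differ only in the variants."""
    buckets = defaultdict(list)
    for title in titles:
        buckets[title.translate(FOLD)].append(title)
    return [sorted(group) for group in buckets.values() if len(group) > 1]


titles = [
    "\u0643\u062a\u0627\u0628",  # spelled with Arabic kaf (0643)
    "\u06a9\u062a\u0627\u0628",  # same letters with keheh (06a9)
    "\u0628\u0627\u0628",        # unrelated title
]
print(also_groups(titles))
```

Each returned group is a set of pages that would list one another in {{also}}.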

@Vox Sciurorum This used to be done once upon a time by User:OrphicBot. User:Isomorphyc is no longer active but thankfully they included the source code for their bot (here: User:OrphicBot/updateAlso.py), and in particular the table of equivalences used to create the {{also}} links (at User:OrphicBot/equivalences.txt). I would offer to do this but practically speaking I have so many other tasks to do that it won't get done soon, although feel free to add an entry to User:Benwing2/todo, which records various tasks (often bot-related) needing to be done. Benwing2 (talk) 21:44, 6 January 2023 (UTC)[reply]

Moving User:Conrad.Irwin/creation.js/intro to somewhere more appropriate

Really minor thing that's been bugging me for a while. Currently, User:Conrad.Irwin/creation.js/intro is used by the acceleration script MediaWiki:Gadget-AcceleratedFormCreation.js to generate the warning at the top of the page during page creation ("Please ensure that the information is both complete and correct" etc etc). Conrad.Irwin hasn't been active since 2014 (except for one edit in 2021), and it feels a bit silly to have a page like that in someone's userspace - particularly as it appears in the URL when creating any accelerated form.

Could we move it to MediaWiki:Gadget-AcceleratedFormCreation.js/intro instead? Theknightwho (talk) 05:12, 7 January 2023 (UTC)[reply]

Seems like a good idea. — Sgconlaw (talk) 05:27, 7 January 2023 (UTC)[reply]
@Theknightwho Moved; let me know if something breaks. Benwing2 (talk) 23:13, 7 January 2023 (UTC)[reply]
@Benwing2 Seems to work - thanks. Theknightwho (talk) 23:16, 7 January 2023 (UTC)[reply]

Change Tagalog alt forms from q template to double piped alt

Please change Tagalog (and maybe all Philippine languages that have this) alt forms that are of this form:

Before:

  • {{alt|tl|<the alt form>}} {{q|<the note>}}

After:

  • {{alt|tl|<the alt form>||<the note>}}

Relevant thread: https://en.wiktionary.org/wiki/User_talk:Mar_vin_kaiser#c-Vininn126-20221205115700-Mar_vin_kaiser-20221205114900 Ysrael214 (talk) 09:09, 9 January 2023 (UTC)[reply]
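The requested before/after change can be sketched as a regular-expression substitution in Python. This is only an illustration, not the script that was eventually run: it assumes one alt form and one qualifier per line with no nested templates, and the function name `merge_qualifier` is hypothetical. Lines that do not fit the pattern (for example a {{q}} with several qualifiers) are left unchanged for manual review.

```python
import re

# {{alt|<lang>|<form>}} followed by {{q|<note>}}, separated by optional
# spaces or tabs; none of the captured parts may contain "|" or braces.
ALT_Q_RE = re.compile(
    r"\{\{alt\|([^{}|]+)\|([^{}|]+)\}\}[ \t]*\{\{q\|([^{}|]+)\}\}"
)


def merge_qualifier(line: str) -> str:
    """Fold a trailing {{q|note}} into the preceding {{alt}} template."""
    return ALT_Q_RE.sub(r"{{alt|\1|\2||\3}}", line)


print(merge_qualifier("* {{alt|tl|altform}} {{q|dated}}"))
print(merge_qualifier("* {{alt|tl|altform}} {{q|dated|rare}}"))
```

Because the pattern is not anchored to the end of the line, trailing whitespace after the {{q}} template does not prevent a match.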

@Ysrael1214 Yeah this should probably be done for all languages, but in the mean time can you make a list of the other most common Philippine languages that might have the "before" construction? Ilocano, Cebuano, Hiligaynon, ...? Benwing2 (talk) 04:04, 10 January 2023 (UTC)[reply]
@Ysrael214 My mistake ... Benwing2 (talk) 04:04, 10 January 2023 (UTC)[reply]
@Benwing2 The most common would probably be the ones we are most likely to edit:
Ilocano
Kapampangan
Tagalog
Bikol Central
Masbateño
Cebuano
Hiligaynon
Waray-Waray
Maranao
Maguindanao
Tausug
But maybe you should replace all {{l|<lang>|<alt form>}} that are inside an alternative forms header to be {{alt|<lang>|<alt form>}} too? I think I see some {{l}} that haven't been replaced by {{alt}} Ysrael214 (talk) 11:47, 10 January 2023 (UTC)[reply]
@Benwing2 By the way, some {{alt}} are still written as {{alter}}, just to note. Thanks.
@Ysrael214 This is a nontrivial task but as it happens I have a pre-existing script to do this. I'm running it now on Tagalog; let me know if you see any mistakes, if not I will run it on the other languages above. Benwing2 (talk) 04:14, 12 January 2023 (UTC)[reply]
@Benwing2 Looks fine to me so far. Ysrael214 (talk) 09:55, 12 January 2023 (UTC)[reply]
@Benwing2 This one didn't get changed for some reason. https://en.wiktionary.org/w/index.php?title=sinungaling&diff=70806571&oldid=69381686&diffmode=source
But I can just manually fix that. Other than that all's fine. Ysrael214 (talk) 10:00, 12 January 2023 (UTC)[reply]
@Ysrael214 That's probably because the script doesn't (yet) know how to handle multiple qualifiers; let me see if I can fix that. Benwing2 (talk) 19:14, 12 January 2023 (UTC)[reply]
It was actually the trailing whitespace that confused it. Benwing2 (talk) 20:05, 12 January 2023 (UTC)[reply]
Should be done. Benwing2 (talk) 04:15, 13 January 2023 (UTC)[reply]
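For readers following the thread, the before/after rewrite requested above can be sketched roughly as follows. This is a minimal illustration in Python, not the script that was actually run; the function name `merge_qualifier` and the regular expression are assumptions made for the example. It handles only a single {{alt}} form followed by a single-argument {{q}}, and it tolerates trailing whitespace, which is what tripped up the real run.

```python
import re

# Hypothetical pattern: one {{alt|LANG|FORM}} followed by one {{q|NOTE}},
# optionally separated and followed by spaces or tabs.
ALT_Q = re.compile(
    r"\{\{alt\|(?P<lang>[^|{}]+)\|(?P<form>[^|{}]+)\}\}[ \t]*"
    r"\{\{q\|(?P<note>[^|{}]+)\}\}[ \t]*$",
    re.MULTILINE,
)


def merge_qualifier(wikitext: str, lang: str = "tl") -> str:
    """Fold a trailing {{q|note}} into the preceding {{alt}} call."""

    def repl(match: re.Match) -> str:
        if match.group("lang") != lang:
            return match.group(0)  # other languages are left untouched
        form = match.group("form").strip()
        note = match.group("note").strip()
        return "{{alt|%s|%s||%s}}" % (lang, form, note)

    return ALT_Q.sub(repl, wikitext)
```

Lines with several qualifiers, such as {{q|a|b}}, do not match the pattern and are left as they are, so they would still need manual review.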

Replace all Tagalog KWF templates

Hello, I also would like to request to change all {{R:KWF Diksiyonaryo}} inclusions to be {{R:Pambansang Diksiyonaryo}}.

The KWF (Commission on the Filipino Language) has a newer dictionary website, so {{R:KWF Diksiyonaryo}} should have separate content from {{R:Pambansang Diksiyonaryo}}. Currently, {{R:KWF Diksiyonaryo}} just redirects to {{R:Pambansang Diksiyonaryo}}. Thank you! Ysrael214 (talk) 09:14, 9 January 2023 (UTC)[reply]

@Ysrael214 Not quite sure what you are asking; are you just asking to rename occurrences of {{R:KWF Diksiyonaryo}} to use {{R:Pambansang Diksiyonaryo}} instead or do you want something else done? Benwing2 (talk) 04:06, 10 January 2023 (UTC)[reply]
@Benwing2 Yes, what you said is correct. Just rename. Ysrael214 (talk) 11:42, 10 January 2023 (UTC)[reply]
@Ysrael214 Done. Benwing2 (talk) 04:01, 11 January 2023 (UTC)[reply]
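A rename like this amounts to a guarded text substitution on each page. The sketch below is only an illustration in Python; the function name `rename_reference` is made up for the example and this is not the code that was used. The lookahead ensures that only the exact template name is matched, not a longer name that merely starts with the same text.

```python
import re

# Match the old template name only when it is followed by "|" or "}}".
OLD_NAME = re.compile(r"\{\{\s*R:KWF Diksiyonaryo\s*(?=[|}])")


def rename_reference(wikitext: str) -> str:
    """Replace calls to the old reference template with the new name."""
    return OLD_NAME.sub("{{R:Pambansang Diksiyonaryo", wikitext)
```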

defective recording in File:Sv-bolag.ogg

The pronunciation of "ett bolag" is cut off after the first syllable of the noun and is missing "-lag". This file and its use should be deleted from various Wiktionaries and Wikimedia Commons because now it is very misleading. -- Espoo (talk) 12:13, 10 January 2023 (UTC)[reply]

It looks like @Derbeth imported it to Commons from another source. I wonder if the original they got it from has the same problem. Chuck Entz (talk) 14:18, 10 January 2023 (UTC)[reply]
Yes, https://en.wiktionary.org/wiki/File:Sv-bolag.ogg explains that the source is shtooka.net, and https://shtooka.net/listen/swe/ett%20bolag sounds exactly the same. --Espoo (talk) 16:19, 10 January 2023 (UTC)[reply]

I nominated the file for deletion: commons:Commons:Deletion requests/File:Sv-bolag.ogg. --Derbeth talk 11:23, 11 January 2023 (UTC)[reply]

Sensewise templates for certain semantic relations that lack them to date

As seen at Wiktionary:Semantic relations, there are templates for sensewise entry of most semantic relations, including {{syn}}, {{ant}}, {{cot}}, {{hyper}}, {{hypo}}, {{hol}}, {{mer}}, and {{troponym}}; but as of this writing, there are none yet for "derived", "related", or "see also" (unless I'm missing something, which is entirely possible). Could anyone create them? They would be similar in construction to Template:coordinate terms. I would do it myself (better to ask for forgiveness than for permission, as they say), but I lack the requisite skills. Quercus solaris (talk) 06:40, 13 January 2023 (UTC) [Edit:] More info about the nature of "derived" and "related" is under Wiktionary:Entry_layout#Derived_terms. Quercus solaris (talk) 06:45, 13 January 2023 (UTC)[reply]

@Quercus solaris: whether there's any reason to create templates for them or not, none of those are semantic relations. "Derived terms" and "Related terms" are etymological relations, and "See also" can be just about anything. The first two tend to be less attached to specific senses than most of the true semantic relations, at any rate. Chuck Entz (talk) 07:20, 13 January 2023 (UTC)[reply]
@Chuck Entz Yes, you're quite right regarding etym, but they are often semantically related as well, which is why it comes up. I recognize that the "derived" sense-wise type are not often needed. But occasionally a derived term is sense-dependent. It would be good if the capability existed for occasional times when they are needed. I will plan to bring concrete examples. The "see also" ones are the best examples. Often a term is not a syn/ant/hyper/hypo or even cot (without going a bit farther into the semantic field for what the hypernymic noun phrase is) but it is most definitely semantically connected, in a sense-wise manner. It is probably true that the "related" type could be done without, because a sense-wise label could be used under the H4. I just realized that I should have brought concrete examples up front. Will plan to do later. Thanks. Quercus solaris (talk) 14:02, 13 January 2023 (UTC)[reply]
@Quercus solaris I would prefer not to have two very different ways of specifying derived terms. IMO if we need to distinguish derived or related terms by senses (rather than by etymology), this can be done using subheaders under ==Derived terms== or ==Related terms==; but in my experience this is quite rare. Benwing2 (talk) 07:23, 14 January 2023 (UTC)[reply]
@Benwing2 Sounds good. Probably not worth chasing, after all. Thanks, all, for input. Quercus solaris (talk) 00:28, 15 January 2023 (UTC)[reply]

Conversion of bor to lbor for Classical->Modern languages

When a modern language borrows from a classical language, it's learned. Of course in Romance languages you have to make the distinction between borrowing and inherited from Latin, but the borrowings are still learned. What technical aspects would we need to consider to switch these over?

We would need a list of relations for conversion, (i.e. Latin -> English), but also we may want to consider when to include {{{notext}}} or not, plus there's always the problem of people disliking hyperlinked etymologies. Vininn126 (talk) 12:21, 13 January 2023 (UTC)[reply]

{{bor}}s cannot be indiscriminately converted to {{lbor}} because {{slbor}} is a thing. — SURJECTION / T / C / L / 12:52, 13 January 2023 (UTC)[reply]
Ach, that completely slipped my mind. I think there might still be cases where that's possible, i.e. Polish/Old Polish borrowings from Ancient Greek/Latin. I suppose that would fall under the list of relations; if a language can possibly have a semi-learned borrowing then it should be excluded. Vininn126 (talk) 12:58, 13 January 2023 (UTC)[reply]
And we can add most tatsamas borrowed from Sanskrit into Hindi, such as तत्सम (tatsam, tatsama). --RichardW57m (talk) 15:38, 19 January 2023 (UTC)[reply]
If it isn't an inheritance from Middle English, is mandamus a borrowing from Latin? --RichardW57m (talk) 15:51, 19 January 2023 (UTC)[reply]
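The "list of relations" idea from this thread could be sketched as an allowlist of (borrowing language, source language) pairs, with every other pair left alone because a semi-learned borrowing ({{slbor}}) might apply. The Python below is a rough illustration only: the pairs in `LEARNED_PAIRS` are examples taken from the discussion (Polish from Latin or Ancient Greek), not an agreed list, and the function name `convert_learned` is hypothetical.

```python
import re

# Example allowlist only: (borrowing language code, source language code).
# Which pairs are really safe to convert would need community agreement.
LEARNED_PAIRS = {("pl", "la"), ("pl", "grc")}

BOR = re.compile(r"\{\{bor\|([^|{}]+)\|([^|{}]+)(?=[|}])")


def convert_learned(wikitext: str) -> str:
    """Turn {{bor}} into {{lbor}} only for allowlisted language pairs."""

    def repl(match: re.Match) -> str:
        pair = (match.group(1), match.group(2))
        if pair in LEARNED_PAIRS:
            return "{{lbor|%s|%s" % pair
        return match.group(0)  # possibly semi-learned: leave for a human

    return BOR.sub(repl, wikitext)
```

An allowlist is the conservative choice here: a pair that nobody has reviewed is never converted by accident.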

Repeated derived terms/synonyms/related terms

A few times over the last year I noticed a Derived term appearing twice in the same entry (e.g. {{der2|en|biscuit head|biscuit face|biscuit ball|biscuit head}}). Granted, a small number of times it was my own doing. Could someone whip up a cleanup list of such terms? Celui qui crée ébauches de football anglais (talk) 18:28, 14 January 2023 (UTC)[reply]
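A cleanup list of this kind could be produced by scanning each column template for terms that occur more than once. The following Python sketch is an assumption-laden illustration: it only looks at templates named der2 to der4 and rel2 to rel4, ignores named parameters, and does not handle nested templates or inline modifiers. The function name `repeated_terms` is made up for the example.

```python
import re
from collections import Counter

# Only flat calls to these column templates are considered (an assumption).
COLUMN_TEMPLATE = re.compile(r"\{\{((?:der|rel)[2-4])\|([^{}]*)\}\}")


def repeated_terms(wikitext: str) -> list[str]:
    """Return terms listed more than once within a single column template."""
    repeated = []
    for match in COLUMN_TEMPLATE.finditer(wikitext):
        params = match.group(2).split("|")
        # params[0] is the language code; skip named parameters like title=...
        terms = [p.strip() for p in params[1:] if "=" not in p and p.strip()]
        counts = Counter(terms)
        repeated.extend(term for term, n in counts.items() if n > 1)
    return repeated
```

Run over a dump, any page for which the function returns a non-empty list would go on the cleanup list.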

Sudden increase in memory usage

There are now upwards of 40 entries in CAT:E that weren't there about 12 hours ago, and of these, all the ones I've checked so far have been due to Lua memory errors (there were a few that weren't, but they've been cleared). Several are there for the first time, and adding them to the {{redlink category}} exclusion list didn't fix them (here again, there were a couple where that worked, but they're no longer in CAT:E).

That tells me that something has substantially increased memory requirements recently. A bit of brute-force checking of the transclusion lists for recent edits leads me to two main suspects: Module:languages/data3/c, and Module:scripts, both of which have been edited in the past 12 hours and were in the transclusion lists of all the entries I've checked so far. Pinging @Surjection, Theknightwho, who have edited one or the other of those modules in question during that period. Chuck Entz (talk) 00:14, 16 January 2023 (UTC)[reply]

All I changed was a comment in Module:scripts. I've been fixing a few of these with multitrans, which usually results in a dramatic drop in memory usage. The change to Module:languages/data3/c shouldn't have affected the vast majority of these pages, as they don't have Mandarin entries. Theknightwho (talk) 00:15, 16 January 2023 (UTC)[reply]
Actually, some of them do have Chinese entries- but far from all. At any rate, they don't have to have Chinese entries, they just have to have them in their etymologies. Given that the changes to both modules involved scripts for Sinitic languages, there might be some kind of interaction between the two.
At any rate, I'm not trying to pin blame. Something has happened, and someone who knows more than I do is going to have to figure out what to do about it. This is the largest single increase in memory errors over such a short period that I can remember, so I'm concerned. Chuck Entz (talk) 00:48, 16 January 2023 (UTC)[reply]
Actually, I'm wondering if it's the translation sections, as those are by far the main place that cmn gets used. By necessity, Hant and Hans work in a different way to all other scripts (including Hani), and the method is slightly more intensive as it requires drawing down the conversion tables. It's been in place for a while for zh, but that won't have affected translation sections as they don't use zh. In any event, I'm going to keep clearing these. The big advantage of multitrans is that it's a genuine solution rather than a bodge, as it doesn't require cutting down on features like the lite templates do. As such, if we can clear most errors with it (which has been the case for most of these so far), we're not left with a headache about what to do going forward. Theknightwho (talk) 01:01, 16 January 2023 (UTC)[reply]
I wonder if it's the HantHansHani-related changes. Any addition of code to script checking can have huge repercussions. User:Theknightwho, can you do a couple of experiments on some of the pages that were formerly in CAT:E due to memory issues, checking the memory usage before and after the changes to Module:languages/data3/c that introduced the extra scripts? In general I think we need to be very diligent about checking memory usage when making changes like this, and rethink the method if it leads to several new memory-error pages. (This was something I did, for example, when adding the Korean hyphen-related hacks requested by User:Tibidibi awhile ago; I had to experiment with several ways of adding this functionality to make sure it didn't impact the memory of a lot of pages.) Also can you try to include changelog messages indicating what was changed? It helps a lot when trying to review changes to a module, vs. seeing ~40 changes without changelog messages, as in [2]. Benwing2 (talk) 21:27, 16 January 2023 (UTC)[reply]
@Benwing2 I'm pretty confident in saying that it was the cause, as every single page affected linked to cmn. However, the corresponding changes to the other langs grouped under Chinese (e.g. yue) had a negligible effect, as there are very few large pages that link to one of them that don't also link to cmn. I have been able to achieve significant decreases in memory usage by using {{multitrans}} (e.g. cloud/translations is now using 22MB), and I've also implemented some efficiency measures in Module:scripts/findBestScript to ensure that the more intensive Hant/Hans checks are only run when (and for as long as) absolutely necessary.
The overall problem is that the checks for Hant and Hans require drawing down Module:zh/data/ts or Module:zh/data/st respectively, and checking each character in the term until it finds one that's a table key. To mitigate this, I've set it so that a Han script is returned as soon as a matching character is found, on the assumption that (1) traditional and simplified characters won't be mixed together in languages that use both, (2) any terms using Han + Latin will still need a Han code, and (3) Han characters aren't mixed in any other way. Doing that means we can take certain shortcuts, but I'm certain that the biggest memory issue is the fact that one/both the tables have to be loaded in the first place. Traditional is always checked first, on the assumption that it's more likely to be used in links as it's the lemma form.
Fundamentally, though, I've just adapted (and streamlined) the method that {{zh-l}} has been using for years, so if we want to radically change how traditional and simplified characters are recorded, we'll need to ensure that the Chinese modules are updated accordingly as well. Theknightwho (talk) 01:07, 17 January 2023 (UTC)[reply]
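The early-return check described above can be illustrated with a small Python sketch. It is not the code in Module:scripts/findBestScript: the two dictionaries are toy stand-ins for Module:zh/data/ts and Module:zh/data/st, the `is_han` range test is a simplification, and falling back to Hani when no table character is found is an assumption made for the example.

```python
# Toy stand-ins for the real conversion tables (assumed shape: char -> char).
TRAD_TO_SIMP = {"國": "国", "學": "学"}
SIMP_TO_TRAD = {"国": "國", "学": "學"}


def is_han(ch: str) -> bool:
    """Simplified test: only the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def best_han_script(term: str):
    """Return "Hant", "Hans", "Hani", or None for a term."""
    # Traditional is tried first; each scan stops at the first table hit.
    for code, table in (("Hant", TRAD_TO_SIMP), ("Hans", SIMP_TO_TRAD)):
        for ch in term:
            if ch in table:
                return code
    # Assumed fallback: Han characters present, but none are table keys.
    return "Hani" if any(is_han(ch) for ch in term) else None
```

The per-term work is small once a table is in memory; as noted in the thread, the dominant cost is loading the table at all.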
@Theknightwho Thanks for the detailed message. Several months ago I suggested redoing various Chinese tables as a single large string and I think that would work very well here. In this case it looks like all you need to do is a set lookup to see whether a character is in the traditional or simplified set. Instead of or in addition to having two big Lua tables, have one big string sorted by Unicode code point for the simplified, and a corresponding string for the traditional. This will take a LOT less memory. You could for example construct a character class of all the characters in the string (possibly as simple as just "[" .. string .. "]") and then use mw.ustring.find() to check if any of the characters is in the string. An alternative taking advantage of the string being ordered is to binary search character-by-character (however, this might not turn out any faster since the searching will be in Lua, while the regex search implementation may end up being a loop in C code). The same strings could potentially be used for traditional->simplified conversion as well using the same binary-search technique; find the index of the traditional character and fetch the value at the corresponding string index in the simplified string. But at first for just Hans and Hant checking you don't need to worry about the conversion, and you don't need to replace the tables entirely, just add new modules holding the strings. Benwing2 (talk) 07:28, 17 January 2023 (UTC)[reply]
@Benwing2 Thanks - that's an excellent suggestion. The tables are updated semi-regularly (e.g. after new extensions to the CJK set, or if we decide to change which traditional variant should be treated as canonical etc). The main disadvantage of your proposal is that it's far less user-friendly to maintain, while being far more prone to user error. As such, I suggest we create an auxiliary module that can generate the regex(/whatever) from the current tables, and then use it to update Module:scripts/data(/wherever) every time the tables are updated. Ideally, this could be done by bot.
I've been adapting this Lua implementation of the Unicode collation algorithm in my userspace (in the hopes I can make it efficient enough for us to use), and that overcomes a similar problem: converting the data Unicode publishes into an efficient form. It's a bit clunky outputting raw Lua code from a template, but it works (and here's the input string, for comparison). Theknightwho (talk) 12:34, 17 January 2023 (UTC)[reply]
@Theknightwho It's a pity you aren't implementing the slightly more powerful CLDR collation algorithm. Are you similarly planning to tailor off-line? --RichardW57m (talk) 14:19, 19 January 2023 (UTC)[reply]
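The sorted-string proposal in this thread, together with the suggestion to generate the strings from the existing tables, can be sketched as follows. This is Python rather than Lua and uses a three-character toy table, so it only illustrates the technique: two parallel strings, the traditional one sorted by code point, with a binary search for membership and the shared index used for conversion. All names here are hypothetical.

```python
from bisect import bisect_left

# Toy table standing in for the real traditional -> simplified data.
PAIRS = {"國": "国", "學": "学", "語": "语"}

# Generated, not hand-maintained: sort the keys, then build the parallel string.
TRAD = "".join(sorted(PAIRS))
SIMP = "".join(PAIRS[ch] for ch in TRAD)


def index_of(ch: str, sorted_chars: str) -> int:
    """Binary-search a code-point-sorted string; return -1 if absent."""
    i = bisect_left(sorted_chars, ch)
    return i if i < len(sorted_chars) and sorted_chars[i] == ch else -1


def has_traditional(term: str) -> bool:
    return any(index_of(ch, TRAD) >= 0 for ch in term)


def to_simplified(term: str) -> str:
    out = []
    for ch in term:
        i = index_of(ch, TRAD)
        out.append(SIMP[i] if i >= 0 else ch)  # same index in both strings
    return "".join(out)
```

Because the strings are derived from the table, editors would keep maintaining the readable table and regenerate the strings after each update, which addresses the maintainability concern raised above.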

Rename/deprecate the |uc= parameter of Template:senseno

This parameter uppercases the first letter of the template's output. Most other templates with a parameter like this call it |cap= or |nocap=, so it'd just be nice for consistency. 217.229.84.41 16:37, 19 January 2023 (UTC)[reply]

Mass import of Polish Dictionary Source Abbreviations

This might be better for WT:BOTREQ but I am unsure. I have been working on converting lists of sources and their abbreviations into RQ templates for Polish. I currently have a spreadsheet with 6699 potential templates to create (however only 5966 will be imported, one set is from an older dictionary whose usage of these abbreviations isn't so regular and will have to be checked with time). This is done for the convenience of Polish editors - it will help check the etydating of entries as well as quoting.

A bot should be able to look at the spreadsheet and know exactly what to do.

In column A I have the name of template, for example (with exact text) T:RQ:pl:Abram. Okul.

In column B I have all of the information, for example {{quote-book|pl|author=Ignacy Abramowicz|title=Podręcznik okulistyki dla studentów i lekarzy|year=1947|page={{{page|}}}|pages={{{pages|}}}|text={{{1}}}|t={{{2|}}}}}<noinclude>[[Category:Polish quotation templates|A]]</noinclude>

I am unsure of the best way to share the full list: currently I have them separated out into tabs within the spreadsheet. I can potentially create a userspace page and paste in all the ones I want created, or I could email a bot owner the list. I would be eternally grateful for help implementing this final step, as I have been working tirelessly on this for about 2 months. Vininn126 (talk) 16:52, 19 January 2023 (UTC)[reply]
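The spreadsheet layout described above (template name in column A, template text in column B) maps naturally onto a small import step. The Python sketch below assumes the sheet has been exported as CSV and that the "T:" prefix stands for the Template: namespace; the function name `planned_pages` and the `existing` argument are hypothetical, and the actual page saving by a bot is deliberately left out.

```python
import csv
import io


def planned_pages(csv_text: str, existing=frozenset()):
    """Return (page title, wikitext) pairs to create from a two-column CSV."""
    pages = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        name, body = row[0].strip(), row[1]
        # Assumption: "T:" in column A abbreviates the Template: namespace.
        title = "Template:" + name[2:] if name.startswith("T:") else name
        if title in existing:
            continue  # e.g. test templates that were already created
        pages.append((title, body))
    return pages
```

Returning a plain list first makes it possible to review the planned titles, or check them for clashes, before anything is written to the wiki.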

I should add some details:
I believe I have checked this 3 times and no template should share a name.
Some may or may not have a few formatting mistakes in them, I will manually clean those up when the task is done. I don't think it should be very many at all. Also, some will be redirects - these are for cases where two dictionaries share a source but not an abbreviation. Finally, 2 already exist, these were tests. They can be skipped. Vininn126 (talk) 17:03, 19 January 2023 (UTC)[reply]
@Vininn126 I can potentially help with this. However, I have a bit of a concern about having this many distinct reference templates. When you say "sources" are you referring to primary sources from which Polish usexes/quotes can be pulled, or secondary sources along the nature of dictionaries, grammars, scholarly papers, etc.? If the former, it might be better to use {{Q}} for all of them. Module:Quotations, which implements {{Q}}, supports abbreviations for authors and works, and if there are missing features in {{Q}}, I may be able to add them. If you're referring to secondary sources, you'd presumably want to group them into fewer templates in some logical fashion. Benwing2 (talk) 04:22, 21 January 2023 (UTC)[reply]
@Benwing2 These are titles of books and journals and such. Vininn126 (talk) 11:13, 21 January 2023 (UTC)[reply]
@Benwing2 I will also note that English itself has thousands of templates and it doesn't seem to cause a problem, you also told me yourself an RQ template would be better! Vininn126 (talk) 11:16, 21 January 2023 (UTC)[reply]
@Vininn126 Hmm you are right about English. IMO the English templates used for primary sources would maybe be better structured using {{Q}} but at this point it's not clear it is worth it to change everything. When you say titles of books and journals, are they scholarly works or primary sources? Maybe it would help me to see some examples of what's in your spreadsheet. Can you paste maybe 200-300 lines into a userspace page? Benwing2 (talk) 20:48, 21 January 2023 (UTC)[reply]
@Benwing2 I added some to my sandbox. These are primary sources used in other dictionaries. The aim is to be able to use these abbreviations found in them (i.e. Bobiń. Chłop. here: https://sjp.pwn.pl/doroszewski/uwiazac;5513191.html). Not every quote from every dictionary will be added, more ones that will be difficult to quote and also for checking dates of things. I.e. I have many from a dictionary from 1807 that often has the earliest usage of something. Vininn126 (talk) 20:55, 21 January 2023 (UTC)[reply]
@Benwing2 Should I upload a list somewhere of all the templates I'd like to add? Vininn126 (talk) 10:26, 24 January 2023 (UTC)[reply]
@Vininn126 Apologies, I haven't had a chance to review the ones you've uploaded so far, but feel free to email me the full list if it's too big to upload to a userspace page. Benwing2 (talk) 09:25, 25 January 2023 (UTC)[reply]
@Benwing2 I emailed you the document with instructions. Vininn126 (talk) 19:23, 26 January 2023 (UTC)[reply]
@Vininn126 Thanks, I'll review it shortly. Benwing2 (talk) 03:18, 27 January 2023 (UTC)[reply]
@Benwing2 Progress report? Vininn126 (talk) 19:51, 5 February 2023 (UTC)[reply]
@Vininn126 Let me see what I can do about this today or tomorrow. Benwing2 (talk) 22:06, 5 February 2023 (UTC)[reply]
@Benwing2 Vininn126 (talk) 19:26, 8 February 2023 (UTC)[reply]

@Vininn126 I took a look at the Excel file you sent me. Mostly it looks good. Some comments:

  1. There are several templates where the noinclude section that adds it to the appropriate category is in the middle of the template code, and looks to be duplicating the noinclude section at the end; e.g. #624 under SXVI, and nearby ones. This should be fixed (and in any case I'm pretty sure I can hack things so this category is added automatically, so you don't have to include the noinclude section at all).
  2. Some of the names have brackets in them, like #610 under SXVI. I don't think brackets in template names are allowed; and in any case it's not a good idea to have them. You could potentially use parens, although maybe better to use something else, like slash or colon.
  3. Some of the names e.g. #495 under SXVI have names that don't look quite right: T:RQ:pl:Otwin(?)Erot. Do we really want a question mark like this in the name?
  4. Under the Linde tab, the definitions are split across several cells. Can you put them into a single cell like for the other tabs?

Thanks! Benwing2 (talk) 22:41, 8 February 2023 (UTC)[reply]

1) Please hack them!
2) Let's use colon.
3) We can use a space instead of (?)
Please do not add the Linde tab, as mentioned before. Vininn126 (talk) 22:48, 8 February 2023 (UTC)[reply]
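Two of the fixes agreed above are mechanical enough to sketch: dropping a duplicated noinclude section from the middle of a template body, and replacing "(?)" in a template name with a space. The Python below is only an illustration with made-up function names, not the clean-up that was actually applied; the bracket-to-colon renaming is left out because its exact form was not spelled out in the thread.

```python
import re

NOINCLUDE = re.compile(r"<noinclude>.*?</noinclude>", re.DOTALL)


def keep_final_noinclude(body: str) -> str:
    """Remove every noinclude section except the last one."""
    blocks = list(NOINCLUDE.finditer(body))
    if len(blocks) < 2:
        return body
    last = blocks[-1]
    head = NOINCLUDE.sub("", body[: last.start()])
    return head + body[last.start():]


def clean_name(name: str) -> str:
    """Replace a "(?)" marker in a template name with a single space."""
    return re.sub(r"\s*\(\?\)\s*", " ", name).strip()
```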
@Benwing2 Any other technical issues? Vininn126 (talk) 08:21, 11 February 2023 (UTC)[reply]
@Vininn126 Ack, sorry, I didn't see your message, I was expecting a ping. I may encounter other issues as I push them; I'll let you know. Benwing2 (talk) 08:23, 11 February 2023 (UTC)[reply]
@Vininn126 Haven't forgotten this; I looked more into the templates you specified and there are various other issues, e.g. stray quotation marks and duplicated definitions, that will need more hacking to fix. Also, given the way you've set up the templates I continue to think it's better to use {{Q}}; I'm looking into this. Benwing2 (talk) 23:49, 13 February 2023 (UTC)[reply]
@Benwing2 Care to elaborate? Vininn126 (talk) 08:48, 14 February 2023 (UTC)[reply]
@Benwing2 Why would Q be better? Vininn126 (talk) 13:58, 17 February 2023 (UTC)[reply]
@Vininn126 My apologies for the delay. The basic issue is that each of the templates has a very similar (hence repetitive) structure, including e.g. |page={{{page|}}}|pages={{{pages|}}}|text={{{1}}}|t={{{2|}}}. Here you are propagating the |page= and |pages= parameters down to {{quote-book}}. There are lots of other parameters that {{quote-book}} takes. Now, imagine you decide that you want your templates to be able to specify one of those other parameters. If you have 6000 separate templates, you have to modify every one of those 6000 templates any time you want to make a change of this sort. If you have only a single template, however, like {{Q}}, you only have to modify one place. User:Sgconlaw and I have run into this exact issue with the English quotation templates. Similar issues occur with the documentation pages; with 6000 templates, you need 6000 documentation pages to document the parameters, and if you add a new parameter, you have to modify 6000 documentation pages. In addition, the structure of the English quotation templates at least involves fully-written-out author last names and titles, whereas you'd prefer to have abbreviations like AgrNauka; this is essentially what {{Q}} already supports. I think your main concern with using {{Q}} is (a) the lack of a page number (which I will add), and (b) the param format, which requires you to use |quote= to specify the source text and |t= to specify the translation. In terms of addressing this, a simple fix is to add an abbreviated |q= to specify the source text but IMO a better way is to have a different interface where instead of using the numbered params to specify author, title, section, etc., there is a single param to specify all this information (maybe using a slash or comma as a separator), and further params specify the source text and translation, so we have something like this:
{{Q|pl|AgrNauka|TEXT|TRANSLATION}}
There are about 8000 uses of {{Q}} currently so it should not be hard to convert them to a new interface by bot. Benwing2 (talk) 03:28, 18 February 2023 (UTC)[reply]
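The maintenance argument made here (one data-driven template instead of thousands of near-identical ones) can be shown with a small sketch. This is Python, not the Module:Quotations code, and it does not reflect how {{Q}} is implemented; the `SOURCES` table and the function `quote_book_call` are hypothetical, with one entry copied from the example earlier in the thread. The point is that a new pass-through parameter is added in one function rather than in every template.

```python
# Hypothetical data table: abbreviation -> fixed bibliographic fields.
SOURCES = {
    "Abram. Okul.": {
        "author": "Ignacy Abramowicz",
        "title": "Podręcznik okulistyki dla studentów i lekarzy",
        "year": "1947",
    },
}


def quote_book_call(lang, abbrev, text, translation="", **extra):
    """Build one {{quote-book}} call from a source abbreviation."""
    params = {**SOURCES[abbrev], **extra, "text": text}
    if translation:
        params["t"] = translation
    # Any extra keyword (page, pages, url, ...) is passed straight through.
    parts = [f"{key}={value}" for key, value in params.items() if value]
    return "{{quote-book|" + lang + "|" + "|".join(parts) + "}}"
```

Supporting a further parameter such as a URL then means touching only the table or the call site, not thousands of template pages and their documentation.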
@Benwing2 Would it be possible to include urls? Vininn126 (talk) 08:53, 18 February 2023 (UTC)[reply]
@Vininn126 Yes, why not? Benwing2 (talk) 11:04, 18 February 2023 (UTC)[reply]
@Benwing2 If that's the case, then sure, that's fine by me. I'd want to convert any existing Polish RQ templates to Q. Vininn126 (talk) 11:10, 18 February 2023 (UTC)[reply]
@Benwing2 Progress? Is there anything I can do to help? Vininn126 (talk) 15:29, 26 February 2023 (UTC)[reply]
@Vininn126 Apologies, I've been going through the Module:Quotations code but distracted by lots of other things. I'll make this a higher priority as it sounds like you need it soon. Benwing2 (talk) 20:38, 26 February 2023 (UTC)[reply]
@Benwing2 It's more that I'm just trying to make sure it doesn't get buried and forgotten! I appreciate you're a busy person, coders on Wiktionary usually are, that's why I've been waiting a bit of time between each reminder. However there have been many times where it'd've been very useful. Vininn126 (talk) 20:41, 26 February 2023 (UTC)[reply]
@Benwing2 If you convert the existing RQ templates, please do them all except T:RQ:pl:ASZDziennik. Vininn126 (talk) 23:32, 1 March 2023 (UTC)[reply]
@Vininn126 Thanks. Haven't forgotten about your request :) ... Benwing2 (talk) 23:45, 1 March 2023 (UTC)[reply]
@Benwing2 sorry, I just remembered T:RQ:pl:Wiedza is also special and shouldn't be converted because it uses a module! Vininn126 (talk) 01:47, 2 March 2023 (UTC)[reply]
@Benwing2 Okay sorry for another ping, but I just remembered I've set up some templates as redirects, will you be able to convert that into the Q module? Vininn126 (talk) 11:55, 11 March 2023 (UTC)[reply]

Swedish/northern Sami bug?

Hey there, I'm a total noob on Wiktionary but I found a weird thing on this page: https://en.wiktionary.org/wiki/haugr

{cog|se|hög||hill} should link to the wiki article on Swedish, but instead it shows up as Northern Sami hög (hill). Which it does here as well! Envispojke (talk) 19:03, 19 January 2023 (UTC)[reply]

@Envispojke Hi - this wasn't a bug. The code for Swedish is sv, but someone had put se (the code for Northern Sami) instead. I've corrected it. Theknightwho (talk) 19:29, 19 January 2023 (UTC)[reply]
Ok! Just a coincidence that it's a minority language in Sweden, then. Weird abbreviation by ISO I think, but that's another story. Thanks for fixing it and solving the mystery! Envispojke (talk) 20:07, 19 January 2023 (UTC)[reply]

Speech Politics is a thing

Speech Politics:

English:

1- Speech politics, also politics of speech, reductionism of speech, and speech reductionism, refers to political positions that defend that certain expressions and words carry a political, social, cultural and historical nature regardless of their origin and etymology.

2- The idea that certain expressions carry a political and social nature regardless of their origin, etymology, and context.

References:

https://pt.m.wiktionary.org/wiki/Pol%C3%ADtica_da_Fala

I don't see why is this concept harmful, since Speech Politics is as real as identity politics, postmodernism, neopositivism, atheist fanaticism, atheist fundamentalism, atheist extremism, cultural reductionism (reducionismo cultural), and class reductionism. — This unsigned comment was added by 2804:14D:AE82:8048:147E:2139:85D6:61FA (talk) at 19:03, 20 January 2023 (UTC).[reply]

Your entry creation was blocked by the abuse filter because it was an unformatted wall of text with a url in it- which is a sign of vandalism 90% of the time.
As for the entry itself:
  1. Wiktionary is a descriptive dictionary. We deal with words, phrases, etc. in the languages of the world as they are used- not "things". In other words: terms, not topics. If you want to write about a topic, go to an encyclopedia like Wikipedia- just make sure you have references from reliable sources to back you up.
  2. A link to a wiki is not a reference. It's too easy to add stuff that's just made up, and people do it all the time. Besides which, wikis are for presenting material that's referenced from somewhere else, not for being the original source.
  3. The source for our main content should be usage (etymologies, etc. are different), though less-documented languages can be referenced from reliable works that describe what is known about the language as it was or is used.
  4. Wiktionary is case-sensitive, so the entry would be at speech politics if we were to decide to have one.
  5. Wiktionary has a well-established format that maintains consistency across our millions of entries and thousands of languages. You can't just write an encyclopedia article or a blog and stick it on a page.
  6. This is English Wiktionary, and our definitions are in English. Your English needs work.
Please read Wiktionary:What Wiktionary is not, Wiktionary:Criteria for inclusion and Wiktionary:Entry layout.
Chuck Entz (talk) 23:44, 20 January 2023 (UTC)[reply]

Hyphenation and multiword terms

Hello. What is the proper way to use the Hyphenation template on multiword terms, for instance on момина солза? Thanks. Gorec (talk) 12:42, 21 January 2023 (UTC)[reply]

Hyphenation is different from syllabification. Either way I don't think the template is used on multiword entries, instead individual words get it. Vininn126 (talk) 12:44, 21 January 2023 (UTC)[reply]
Thanks @Vininn126. The "trick" which @Fytcha used, in the meantime, actually works :) I also thought the template doesn't work on multiword terms.
It seems only some of the languages have their own Syllabification templates, and there is no generic Syllabification template!? In Macedonian, more or less, the hyphenation and syllabification are the same, except in some cases. I wondered how to add Syllabification in those cases?! Gorec (talk) 13:17, 21 January 2023 (UTC)[reply]
What you need to do for syllabification is set the title. Vininn126 (talk) 13:22, 21 January 2023 (UTC)[reply]
How do I do that? If you meant adding a parameter "title=Syllabification", it says "The parameter "title" is not used by this template.". Do I need to create a separate template? Thanks.
Pinging interested parties: @Andrew012p, Martin123xyz, Dimithrandir -- Gorec (talk) 13:37, 21 January 2023 (UTC)[reply]
Using caption, as listed in the documentation ;) Vininn126 (talk) 13:38, 21 January 2023 (UTC)[reply]
It actually works :) I thought of that, but I wasn't sure if that was the correct way. I didn't want to make a mistake. Thanks, Vininn126! Gorec (talk) 13:45, 21 January 2023 (UTC)[reply]

Module errors in Swadesh list Appendixes

Appendix:Indo-Aryan Swadesh lists and Appendix:Pahari Swadesh lists are currently in CAT:E because they ran over the Lua time limit. I've looked at the edit histories of everything in their transclusion lists, and there were no changes immediately preceding that could explain this. I suspect that recent changes to script-related modules might have started a process that led to this, but that's just a guess. It could also be an interaction of various factors and perhaps changes on the backend that we don't know about.

At any rate, this has me concerned. I have no reason to believe that this is something unique to these lists, so we will likely see more and more of these, until all of the comparative Swadesh lists that use {{Swadesh list auto}} are affected.

In the past, I've fixed quite a few problems like this in big appendix pages by simply splitting them into smaller ones. I can't do that here, because the entire list comes from a single module invocation. Converting these back to using wikitables without the Swadesh list modules seems like a step backwards.

Perhaps we should provide support for using parts of the list: a [first number] and a [last number] for the range included. If [first number] is empty it would default to 1, and if [last number] is empty it would default to the end of the list. Splitting by thematic patterns within the lists, as I have done for some Swadesh lists, or by arbitrary numbers based on how much a single page can safely contain, would both work. Chuck Entz (talk) 23:51, 21 January 2023 (UTC)[reply]
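The range defaults proposed above can be sketched as follows. This is a Python illustration of the logic only, not the Lua in Module:Swadesh; the function name select_range and its parameters first and last are hypothetical.

```python
from typing import List, Optional, Sequence


def select_range(items: Sequence[str],
                 first: Optional[int] = None,
                 last: Optional[int] = None) -> List[str]:
    """Return the entries numbered first..last (1-based, inclusive).

    An empty `first` defaults to 1 and an empty `last` defaults to the
    end of the list, as proposed in the comment above.
    """
    start = 1 if first is None else first
    end = len(items) if last is None else last
    if start < 1 or end > len(items) or start > end:
        raise ValueError(f"invalid range {start}..{end} for {len(items)} items")
    return list(items[start - 1:end])


# Hypothetical five-item stand-in for a 207-item Swadesh list.
concepts = ["I", "you", "we", "this", "that"]
first_page = select_range(concepts, last=2)
second_page = select_range(concepts, first=3)
```

With defaults like these, a long appendix could be split across pages by giving each page its own range, without going back to hand-written wikitables.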

@Chuck Entz It could be e.g. something that changed in an Indo-Aryan transliteration module, since both lists incorporate lots of Indo-Aryan languages. It might even be that someone introduced an infinite loop. Benwing2 (talk) 05:06, 22 January 2023 (UTC)[reply]
@Benwing2 I think I would have spotted that. There are three translit modules in the transclusion list for Appendix:Pahari Swadesh lists: Module:ne-translit, Module:hi-translit, and Module:Guru-translit, none of which has changed since last July. There's nothing recent in the Module:Swadesh submodules, either (I thought I missed one, but it was January 23, 2021 instead of January 21, 2023). There was an edit to Module:IPA/data, which added an item to the list for Maltese. There was an edit a bit less than 23 hours ago to Module:languages/data2 fiddling with the display text for Mongolian and Sanskrit (neither of which is part of the Swadesh list itself), and similar ones to Module:languages/data3/b (Buryat), Module:languages/data3/c (Classical Mongolian), Module:languages/data3/d (Daur), and Module:languages/data3/x (Classical Tibetan). Other than that, nothing within 24 hours. Of course, within the last week or so there have been all the changes re: script recognition, etc. that are related to the progressive increase in memory errors. It's plausible that they might have similar effects on execution times, but that's all over my head. Chuck Entz (talk) 06:57, 22 January 2023 (UTC)[reply]
@Chuck Entz Hmm, the best I can think of is to try removing some of the languages from the lists and see how many have to be removed to get it under 10 seconds. This should tell us whether the times have increased slightly or dramatically; in the latter case, it's probably a bug in a specific module (not sure which one, of course), but in the former case, it's less clear. Benwing2 (talk) 07:11, 22 January 2023 (UTC)[reply]
The timeouts in the Swadesh lists have been mysteriously resolved. I looked at recent changes to modules and there are lots of changes to transliteration modules and language modules, but nothing jumped out at me. I made some changes to Module:Swadesh to slightly reduce the number of Lua bytecode instructions executed in loops (mainly setting local variables for fields in tables rather than indexing the fields multiple times), but that didn't fix the problem at the time.
The biggest share of the extra Lua execution time seems to be taken by some code with relatively expensive operations done in loops. When I preview Appendix:Indo-Aryan Swadesh lists with this old version of Module:languages, the Lua time is only 1.9 seconds, whereas it's 8.9 seconds now (and was over 10 seconds when it was timing out two days ago). After that version, Theknightwho made a change to Module:languages that causes the makeEntryName and transliterate functions in Module:languages to put text in certain scripts through the toFixedNFC and toFixedNFD functions in Module:string, which replace some deprecated characters and fix the order of combining diacritics (?). These scripts are identified by normalizationFixes = true in Module:scripts/data, and some of them are Indic scripts used in entry names in Appendix:Indo-Aryan Swadesh lists. makeEntryName and transliterate are used to generate link targets and link text respectively in Appendix:Indo-Aryan Swadesh lists. toFixedNFC and toFixedNFD call str.fixDiscouragedSequences and fixNormalization, which do somewhat time-intensive things in loops, like calling mw.ustring.gsub (which sends a string to PHP and back), concatenating strings, and modifying tables. Actually, str.fixDiscouragedSequences seems to take most of the extra time. Commenting out the call to str.fixDiscouragedSequences in fixNormalization in Module:string and previewing Appendix:Indo-Aryan Swadesh lists gives Lua time usage of 1.9 seconds, as before normalization fixes were added. Perhaps that function can be optimized. — Eru·tuon 09:58, 23 January 2023 (UTC)[reply]
In what I suspect is a related problem, two or three of the Pali script-specific noun-declension tests (Module:pi-decl/noun/Deva/testcases, Module:pi-decl/noun/Sinh/testcases and, inconsistently, Module:pi-decl/noun/Beng/testcases) are timing out. The Brah, Myan, and Khmr variants are working fine. Thai and Laoo may be working fine, but I'm not sure that there should be no transliteration to the Roman script. The commonality with the Swadesh lists is likely to be the heavy use of transliteration. I was looking at the behaviour between 12:15 and 12:45 UTC, which may be relevant if someone were experimenting with the transliteration set-up, but I don't see any changes. --RichardW57m (talk) 13:11, 23 January 2023 (UTC)[reply]
There isn't a problem with the Thai and Lao script tests for Pali noun description - I explicitly disabled transliteration for space reasons. RichardW57m (talk) 18:22, 23 January 2023 (UTC)[reply]
We also now have the similar Module:pra-decl/noun/testcases/documentation running out of time. --RichardW57m (talk) 13:11, 23 January 2023 (UTC)[reply]
Isn't the built-in Lua function require() itself expensive? Shouldn't its unchanging results be cached in Module:languages rather than invoked a couple of times for every transliteration? --RichardW57m (talk) 14:02, 23 January 2023 (UTC)[reply]
@RichardW57m: require does cache module return values within a module invocation ({{#invoke:module|function}}). See the source code. So caching module return values in Module:languages could only save a function call and table lookup. This is negligible unless require is called many times in a module invocation, and that does not happen in common templates like {{m}} or {{head}}. — Eru·tuon 17:21, 23 January 2023 (UTC)[reply]
I'm having trouble investigating the timing of the Pali declension testcases. Is there a simple way of experimenting with Module:languages or Module:scripts/data? Fiddling with either's sandbox, at least without saving it, doesn't seem to work, and I can't find the instructions for using sandboxes. I can do some testing by unsaved editing of e.g. Module:pi-decl/noun/Deva/testcases, and that tells me that switching off transliteration makes the test cases complete for scripts Deva and Sinh. (Beng is on the edge - it only sometimes times out.). I'm wondering if it would be quicker for str.fixDiscouragedSequences to call mw.string.gsub instead of mw.ustring.gsub; the substitutions don't need the concept of character. --RichardW57m (talk) 18:37, 23 January 2023 (UTC)[reply]
@RichardW57m: You can preview a page with a module using the "Preview page with this template" box below the editing area of a module or template. There is also Special:TemplateSandbox, which lets you test multiple modules in your user sandbox, Module:your username/Module:module name (Module:User:Erutuon/Module:grc-conj for instance; the redundant Module: is unfortunately necessary). — Eru·tuon 19:52, 23 January 2023 (UTC)[reply]
Unfortunately, the first sentence only applies if one is allowed to edit the module. I'm not allowed to edit the modules I asked about. The second method does work, though it's a little confusing in that the 'Sandbox prefix' in the form also requires the word 'Module', and seems to only work on saved pages. So thanks, I can now confirm that it is the setting of the normalizationFixes for script Deva that is causing the test bed not to complete; the extra time consumed is mostly spent in gsub(). --RichardW57m (talk) 10:37, 24 January 2023 (UTC)[reply]
Yeah, I think it's a design flaw that someone can't change the source code and "preview page with template" on a module where they don't have the permission to save the edit. That would be very useful and safe, and allow them to propose changes after verifying that they work (as long as they don't require editing more than one module), though it would be annoying if someone makes a lot of changes and then suddenly notices they aren't permitted to save them. — Eru·tuon 17:58, 24 January 2023 (UTC)[reply]
What's the case for using toFixedNFC()? Doesn't it just delay the correction of incorrect data? --RichardW57m (talk) 14:02, 23 January 2023 (UTC)[reply]
@RichardW57 The main use case is that it normalizes the data, which (a) ensures it displays correctly and links work etc., and (b) ensures that it's in the expected form before it gets processed (e.g. for transliteration). In terms of manually fixing these, both {{head}} and any linking templates (e.g. {{l}}) will categorise any pages that contain these in Category:Pages using discouraged character sequences - which currently contains about 200 pages. Once those are all fixed, we might want to change things so that they simply throw an error (but that could have downsides, as it means less tech-savvy contributors are likely to be discouraged from contributing if they don't understand what the issue is; particularly when it comes to things like the Malayalam chillus). Theknightwho (talk) 07:43, 25 January 2023 (UTC)[reply]
So how do I search for the discouraged text in quarter? It would help to split the category by script. The categories would benefit from having a warning that they should not be deleted just because they are empty. --RichardW57m (talk) 17:19, 25 January 2023 (UTC)[reply]
There's something odd going on with the categorisation - the category only lists one of the two well-known deprecated Khmer script independent vowels. (Isn't U+17A8 ឨ KHMER INDEPENDENT VOWEL QUK also discouraged?) --RichardW57m (talk) 17:19, 25 January 2023 (UTC)[reply]
For throwing an error, the error should say what the error is, e.g. 'Chillu RR should be encoded as the atomic character ർ, not as three code points, even though they look the same'. This would actually be useful now! --RichardW57m (talk) 17:19, 25 January 2023 (UTC)[reply]
For the first five chillus supported, the Unicode 14.0 Standard (Section 12.9) says, "Because older data will use different representation for chillu forms, implementations must be prepared to handle both kinds of data". As the Wiktionary search engine does not fold the two encodings together, the sequence encoding should exist as a hard link to the atomic encoding. It makes sense to do the same for all nine chillus. --RichardW57m (talk) 17:19, 25 January 2023 (UTC)[reply]
@Theknightwho: The vast majority of those 200 pages use the older chillu encoding, so the count overstates the use of discouraged sequences. In looking through the pages, I've actually found a case where bad transliteration would have highlighted the problem if anyone had looked at the table, which is hidden by default. (A bad Devanagari dependent vowel sequence O E had been converted to AU, stopping transliteration from failing and making it obvious.) --RichardW57m (talk) 11:35, 26 January 2023 (UTC)[reply]
@RichardW57 The older chillu encoding is actually worse than the other issues, because it creates redlinks even when we have pages, yet doesn’t appear broken under normal circumstances. It’s not relevant that it used to be the proper sequence, as it hasn’t been in quite a long time now.
Given that the display, link and transliteration are all corrected for, there is no urgency with any of these anyway. Theknightwho (talk) 12:54, 26 January 2023 (UTC)[reply]
Some links have actually been broken, e.g. Kashmiri اٟ (ụ̄) to ٳ. --RichardW57m (talk) 16:19, 26 January 2023 (UTC)[reply]
The page was moved today, so the break no longer shows. Both references now, after following a hard link, reference the same page. --RichardW57 (talk) 15:11, 28 January 2023 (UTC)[reply]
We should move that page, because that character has explicitly been deprecated, like the ones in Tibetan and Khmer. It seems there is a bug where some (but seemingly not all) headwords which use these aren’t categorising as using discouraged sequences, so I will fix that. Once I’m home, I’ll also add an override, as there are still very occasional times when we might want to link to these (e.g. , which essentially just says “this is deprecated” and explains why). An override already exists for entry name changes (e.g. Latin needs to include the macron), and I think this can use the same method. Overrides will be very rare, in any event.
On your other points:
  • Searching for discouraged text on a page is an issue, but I didn’t want to add 200 pages to CAT:E, which would have made this an urgent problem. The best solution to get rid of our pre-existing ones is probably to use a bot, with an error/edit filter put in place afterwards to stop the issue happening again.
  • No need for a warning about not deleting the category, as it has the __EXPECTUNUSEDCATEGORY__ magic word which prevents it from being listed for deletion when empty.
  • The two deprecated Khmer characters are U+17A3 () and U+17A4 (អា). I’ll have to look into why the latter wasn’t being categorised, but it is now. Same bug as with ٳ.
  • Your suggestion about a hard redirect isn’t possible to implement automatically, as each hard redirect must be added manually. What this fix does is the next-best thing, which is to link to the proper page automatically, display the correct form (where that would appear broken), and to feed the expected format to the transliteration modules et al. For all normal intents and purposes, it doesn’t matter which the user uses. This is important, because it creates a form of standardisation that means we don’t end up with redlinks or broken transliterations etc due to users using different formats. However, we do still want to fix them eventually, because bots tend to look at the raw wikicode.
Theknightwho (talk) 17:36, 26 January 2023 (UTC)[reply]
@Theknightwho: One reason hard redirects are needed is that the Wiktionary search function, unlike Google's, doesn't fold the two encodings of chillus. --RichardW57 (talk) 08:56, 27 January 2023 (UTC)[reply]
I'm surprised bots can't create hard redirects. Broken transliterations should only result from gross misspellings; they may turn up because of an incomplete transliteration scheme. Incidentally, typing half form plus vowel for a full Devanagari letter is an abomination and in no way deserves support. We don't support backspace (U+0008) for overstriking. --RichardW57 (talk) 08:56, 27 January 2023 (UTC)[reply]
Overrides will probably also be useful in catalogues - I think that applying the cleanup to most Devanagari strings is still breaking some pages. --RichardW57 (talk) 09:01, 27 January 2023 (UTC)[reply]
One trick for indicating text that has been altered without raising a module error is to add warning text in red as is done on Wikipedia with the CS1 and CS2 reference formatting. It may be a bit fiddly, as I have a recollection of some transliteration modules not tolerating embedded Roman script. It's something one should get working before installing, rather than trash the page cache yet again. --RichardW57m (talk) 13:21, 27 January 2023 (UTC)[reply]
I will look at optimising this issue. I'm surprised at the extreme additional load this has imposed, but it shouldn't be complex to fix at all. Theknightwho (talk) 20:59, 23 January 2023 (UTC)[reply]
Well, when a test compares 2 ways of generating 20+ forms (8 cases, 2 numbers, many with alternative forms) of 27 words, transliterating each of them separately, the numbers soon mount up. --RichardW57 (talk) 09:13, 24 January 2023 (UTC)[reply]
20+ averages out as 29. --RichardW57m (talk) 10:38, 24 January 2023 (UTC)[reply]
The bad news is that mw.ustring.gsub is much slower than string.gsub. If one replaces the former by the latter in function fixDiscouragedSequences in Module:string, the timing problem goes away. I can't check the change doesn't break anything - there don't seem to be test cases for these changes! Given the error-prone organisation of the array fixedCompositions, which I think should be transposed, there ought to be confirmation that one of the Egyptian characters is modified correctly, e.g. of the rendering of 𓈗. (It is a shame that the fix stops it displaying properly on MS Edge with the standard Windows 10 fonts.) --RichardW57m (talk) 11:40, 24 January 2023 (UTC)[reply]
@RichardW57m: Only one pattern in fixedCompositions requires mw.ustring.gsub, the second one "्‍?ा" because it contains ? (see Wiktionary:Scribunto#Basic string patterns). All the others are literal strings, so string.gsub will behave exactly the same. Another optimization would be to put the 9 of the 140 replacements that search for a single code point into a table single_character_replacements = { [input] = "replacement" } and do text:gsub(non_ascii_utf8_code_point_pattern, single_character_replacements). I might write some code to categorize or transform the replacements and then paste them into Module:string and rewrite fixedCompositions. — Eru·tuon 21:27, 24 January 2023 (UTC)[reply]
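The split described above (one table lookup for single code points, plain literal replacement for the rest) can be sketched as follows. This is a Python analogue rather than the Lua in Module:string, and the three table entries are illustrative assumptions, not a copy of fixedCompositions; the names fix_discouraged, SINGLE_CODE_POINT and LITERAL_SEQUENCES are hypothetical.

```python
import re

# Illustrative entries only; the real table is said to hold about 140 replacements.
SINGLE_CODE_POINT = {
    "\u17A3": "\u17A2",        # Khmer INDEPENDENT VOWEL QAQ -> LETTER QA
    "\u17A4": "\u17A2\u17B6",  # Khmer INDEPENDENT VOWEL QAA -> QA + VOWEL SIGN AA
}
LITERAL_SEQUENCES = [
    ("\u0905\u093E", "\u0906"),  # Devanagari A + VOWEL SIGN AA -> LETTER AA
]

# Matches any single non-ASCII code point.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def fix_discouraged(text: str) -> str:
    # One pass over the text handles every single-code-point replacement.
    text = NON_ASCII.sub(
        lambda m: SINGLE_CODE_POINT.get(m.group(), m.group()), text
    )
    # Multi-character sequences are literal strings, so no pattern engine is needed.
    for old, new in LITERAL_SEQUENCES:
        text = text.replace(old, new)
    return text
```

The single pattern that genuinely needs a quantifier would still go through the Unicode-aware pattern function; it is left out of this sketch.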
@Erutuon: Whoa!
Before you put effort into a load of substitutions, why do we actually want these substitutions in the first place? Surely we want to notice misencoded text (and renderers generally try to help us there) and correct it, rather than eliminate valid warnings inserted by renderers. --RichardW57 (talk) 00:23, 25 January 2023 (UTC)[reply]
While "्‍?ा" does need mw.ustring.gsub(), it may be computationally quicker to do the rest by dumb application of string.gsub(). If we keep this functionality, we want the code to be easy to maintain, as we are likely to get more bad sequences whenever an Indian Indic script is added to Unicode. --RichardW57 (talk) 00:23, 25 January 2023 (UTC)[reply]
@Theknightwho: There is faulty logic in checking whether any of the changes need to be made. It assumes that the strings to be replaced are single characters, but that only applies to Arabic, Khmer and Egyptian. In most cases, it is the replacement that is a single character. One could improve the logic by just looking for the first characters of the strings to be replaced, though that wouldn't be much of an improvement for Malayalam. (Are you replacing grandfathered chillus? Some fonts even support the old encoding for subsequently discovered chillus!) --RichardW57m (talk) 12:38, 24 January 2023 (UTC)[reply]
@RichardW57m Could you please explain the faulty logic? I'm not seeing where this is happening at present, as the substitutions work regardless of the number of Unicode characters to be replaced. Theknightwho (talk) 23:26, 24 January 2023 (UTC)[reply]
@Theknightwho: You guard the whole set of substitutions by
if mw.ustring.match(text, "[" .. table.concat(fixedCompositions[1]) .. "]") then
with the comment, 'If no characters need fixing, just return text'. If each element of fixedCompositions[1] were a single character, that would work fine as well as looking elegant. However, what it translates as is, "If text contains any of the characters in any of the strings to be replaced, then apply all the transformations." For Devanagari, they will apply to any text string with LETTER A, LETTER E, VIRAMA, VOWEL SIGN AA, VOWEL SIGN E, or VOWEL SIGN AI. That's a very high proportion of Devanagari text strings, whereas I think you expected the guard to eliminate the vast majority of text strings. Now, when the guard fails to prevent a totally unnecessary set of substitutions, the code as a whole does what it was intended to, but far more laboriously than intended. --RichardW57 (talk) 00:10, 25 January 2023 (UTC)[reply]
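The difference between the two guards can be illustrated with a small Python analogue. The two sequences are hypothetical stand-ins for fixedCompositions[1], and Python's re module stands in for mw.ustring.match; only the shape of the logic is meant to carry over.

```python
import re

# Hypothetical stand-ins for two of the multi-character strings to be replaced.
SEQUENCES = [
    "\u0905\u093E",  # Devanagari LETTER A + VOWEL SIGN AA
    "\u090F\u0947",  # Devanagari LETTER E + VOWEL SIGN E
]

# Guard as written: a character class built from the concatenated strings.
# It fires on ANY text containing A, E, VOWEL SIGN AA or VOWEL SIGN E.
loose_guard = re.compile("[" + "".join(SEQUENCES) + "]")

# Stricter guard: fires only if one of the whole sequences is present.
strict_guard = re.compile("|".join(re.escape(s) for s in SEQUENCES))

# KA + VOWEL SIGN AA + MA: ordinary text containing no discouraged sequence.
text = "\u0915\u093E\u092E"

loose_fires = loose_guard.search(text) is not None    # True: VOWEL SIGN AA is in the class
strict_fires = strict_guard.search(text) is not None  # False: neither sequence occurs
```

The output of the substitutions is the same either way; the loose guard just lets nearly all Devanagari text through to the expensive loop.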
Correct, though it was just intended as a provisional screening measure. I also implemented it when the number of substitutions was small (and therefore affected a small number of pages). Do note that we also apply these kinds of substitutions en masse via Module:languages/data2 et al without such issues. As such, I'm reworking things so that they work in the same way, which should drastically reduce the number of substitutions performed. Theknightwho (talk) 00:24, 25 January 2023 (UTC)[reply]
@Theknightwho: If you're talking about collation weights, I don't think we would ever calculate the sort keys for 1554 strings on a single page. --RichardW57 (talk) 01:21, 25 January 2023 (UTC)[reply]
@RichardW57 Sort keys are calculated for every term in a column template, and entry names are calculated for every linked term - so yes, on some pages we will be doing that. For example, water/translations contains around 3,500 links. In any event, I've implemented a new method for calculating these along the same lines, and the timeouts seem to have stopped. Theknightwho (talk) 01:27, 25 January 2023 (UTC)[reply]
It's also affecting Module:pra-decl/noun/gallery/Brahmi/documentation, and Module:pra-decl/noun/gallery/Devanagari/documentation, both of which I had split off from a much larger page just days ago, which itself was originally split off from a larger page in May, for similar reasons. As of today Wiktionary:Frequency lists/Hindi 1900 is also in CAT:E, and I don't remember ever having seen it there before. It seems something is going on with the Indic script modules. 70.172.194.25 23:18, 24 January 2023 (UTC)[reply]
Yes. It affects anything using many transliterations of Indian Indic scripts. It's being worked on. --RichardW57 (talk) 00:30, 25 January 2023 (UTC)[reply]
I've made a dedicated function to generate the table in Wiktionary:Frequency lists/Hindi 1900, so at least it manages to complete, though it's still slow. — Eru·tuon 00:50, 25 January 2023 (UTC)[reply]
For what it's worth, the old revision of this is no longer showing errors - but there's no point in changing it back. Theknightwho (talk) 01:35, 25 January 2023 (UTC)[reply]
@Erutuon @RichardW57 I've amended the substitution functions in Module:languages and Module:scripts to pre-screen whether mw.ustring.gsub is actually necessary before doing each substitution, by checking for the presence of specific magic characters. If they aren't there, it just uses string.gsub. This seems to have speeded things up across the board: the Lua CPU time on Appendix:Indo-Aryan Swadesh lists is now 2.5 seconds (as opposed to 1.9 seconds before this function was added). Theknightwho (talk) 07:31, 25 January 2023 (UTC)[reply]
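A minimal sketch of that pre-screening idea, written in Python as an analogue: str.replace plays the role of the fast string.gsub path and re.sub the role of the slower mw.ustring.gsub path. The names is_literal and substitute, the metacharacter set, and the example replacements are assumptions for illustration, not the code in Module:languages.

```python
import re

# Characters that give a Python regex non-literal meaning (assumed screening set).
REGEX_MAGIC = set(".^$*+?{}[]\\|()")


def is_literal(pattern: str) -> bool:
    """True if the pattern contains no magic characters."""
    return not any(ch in REGEX_MAGIC for ch in pattern)


def substitute(text: str, pattern: str, replacement: str) -> str:
    if is_literal(pattern):
        # Cheap path: plain substring replacement.
        return text.replace(pattern, replacement)
    # Expensive path: only for patterns that really need the pattern engine.
    return re.sub(pattern, replacement, text)


# Literal sequence: takes the cheap path.
fixed = substitute("\u0905\u093E\u092E", "\u0905\u093E", "\u0906")
# Pattern with an optional ZERO WIDTH JOINER: takes the pattern path.
stripped = substitute("\u0915\u094D\u200D\u093E", "\u094D\u200D?\u093E", "")
```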
Wow, this really is much faster! Great work! Wiktionary:Frequency lists/Hindi 1900 renders in about 1 second now, and even the version prior to Erutuon's optimization is down to 3 (out of the limit of 10 that it was exceeding earlier today). 70.172.194.25 08:47, 25 January 2023 (UTC)[reply]
@70.172.194.25 I intend to revert Module:pra-decl/noun/gallery/documentation to before the split, as the deleterious text changes no longer exhaust the time limit. I will then request the deletion of the then-isolated former portions, possibly on an experimental charge of copyright violation. --RichardW57m (talk) 15:45, 25 January 2023 (UTC)[reply]
I've reverted it and tagged those subpages with {{delete}}. That said, the time usage is now 7.501/10.000 seconds, certainly better than before but still not exactly great, so we may need to split it again someday if Lua bloat increases (hopefully won't be necessary). 70.172.194.25 09:00, 26 January 2023 (UTC)[reply]

Ghost text

I have just found some odd behaviour of the site when using Chrome -- perhaps you can explain it. I think I also had a similar thing happen once before (a few years back), but I was in a hurry then, so didn't have time to investigate.

I was reading a social media post where a woman described herself as a fashionista. I could take a good guess what that meant but, as I often do, I checked our site to see if we had the word, and if so, whether we had it in the sense in which it seemed to be used.

I did a search for it using our normal search box (and later did a site: Google search with the same result), and it came up with 1 use, on the design page, with the context partitive designia designeja inessive designissa designeissa elative designista designeista illative designiin designeihin adessive designilla designeilla.... That had the look of a table of conjugation or declension, and I loved the names of some of the items, particularly elative(!) and wondered which language's grammar used that term, so I clicked through to design and did a ^F search for designista. No matches. I wondered if perhaps Windows ^F was now designed to ignore hidden text, so I went down the page showing everything I could find which was hidden. I did another ^F search -- again no hits. Perhaps it was in the raw text? I clicked on edit and searched again. No hits. Perhaps there was just a list of endings which were concatenated with the root? So I ^F searched the raw text for ista. Still no hits.

So I'm intrigued, and out of ideas. Why can't I find the text? --Enginear 02:18, 22 January 2023 (UTC)[reply]

Go down to the Finnish entry and view the declension table- those are all there (Finnish has lots of interesting grammatical cases). I believe those are all generated by a module, so they wouldn't be in the wikitext. The one unanswered question is why only some lines in the table are included. Chuck Entz (talk) 02:37, 22 January 2023 (UTC)[reply]
It's clear when you look at the search results: the form the user searched for (designista I presume) is slap bang in the middle. This, that and the other (talk) 04:16, 22 January 2023 (UTC)[reply]
You missed my point -- I saw it in the search results -- and even quoted that line in my post, but I couldn't find it on the page, which puzzled me. Now sorted. --Enginear 06:08, 22 January 2023 (UTC)[reply]
To be clear, when I said that, I was replying to Chuck's "unanswered question". This, that and the other (talk) 06:12, 23 January 2023 (UTC)[reply]
Oh yes, thanks. I must have failed to click "more" on the Finnish table. I was looking for "show" links and missed it. ^F gives the result when it is shown.
It's only a context line showing a few words before and after, and it seems to have included the 2 lines before and after, so that's OK. --Enginear 06:05, 22 January 2023 (UTC)[reply]
I assume you mean designista (Special:Search/designista). This, that and the other (talk) 04:15, 22 January 2023 (UTC)[reply]

The verb gire and {{it-verb}}

Hello, everyone.
I've been working on the Italian entry for gire, a (now mostly obsolete) defective verb derived from Latin īre meaning “to go”.
Now, this verb ostensibly lacks a 1st-person singular indicative present form, so I used {{it-verb}} as follows:

{{it-verb|-,+,+}}

which returns:

gìre (no present, first-person singular past historic gìi, past participle gìto, no subjunctive, no imperative, auxiliary èssere)

This is incorrect, though, because the verb does have attested present forms, namely gìmo (1st-person plural) and gìte (2nd-person plural); also, the imperative form gìte (2nd-person plural) is present. While specific forms can be added when using {{it-conj}}, {{it-verb}} doesn't seem to allow that.
My question is: is there a way to actually specify the existing forms using {{it-verb}}? Or maybe the fact that the headword says there are no present or imperative forms, while the conjugation table instead lists some, is just a non-issue?
Also, if this is not the right place to ask this, please let me know. — GianWiki (talk) 10:23, 22 January 2023 (UTC)[reply]

@GianWiki Hi. You are right that {{it-verb}} currently doesn't handle verbs like gire perfectly. If there's no first-person singular, it looks for a third-person singular, and if that's missing, it falls back to displaying "no first-person singular". Generally what you should do is copy whatever you put in {{it-conj}} into {{it-verb}} as well to keep them in sync. Let me see about fixing it so it looks for other forms from the same row; in the meantime we can leave it slightly incorrect with the knowledge that the conjugation table shows everything correctly. Benwing2 (talk) 21:54, 22 January 2023 (UTC)[reply]
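The fallback order described above, and the proposed extension to other forms in the same row, can be sketched roughly like this. It is a Python illustration, not the Lua behind {{it-verb}}; the slot names (pres1s, pres3s, and so on), the labels, and the search_whole_row switch are all hypothetical.

```python
from typing import Dict

# Order currently tried, per the description above.
PREFERRED = [
    ("pres1s", "first-person singular present"),
    ("pres3s", "third-person singular present"),
]
# Remaining slots of the present row, used only by the proposed extension.
REST_OF_ROW = [
    ("pres2s", "second-person singular present"),
    ("pres1p", "first-person plural present"),
    ("pres2p", "second-person plural present"),
    ("pres3p", "third-person plural present"),
]


def present_summary(forms: Dict[str, str], search_whole_row: bool = False) -> str:
    candidates = PREFERRED + (REST_OF_ROW if search_whole_row else [])
    for slot, label in candidates:
        if forms.get(slot):
            return f"{label} {forms[slot]}"
    return "no first-person singular present"


# gire has only plural present forms attested.
gire = {"pres1p": "gìmo", "pres2p": "gìte"}
current = present_summary(gire)
proposed = present_summary(gire, search_whole_row=True)
```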

Intentionally unknown gender

At the moment, marking g=? in e.g. {{la-noun}} automatically categorises entries under Category:Requests for gender by language, and there's no way to avoid this unless I'm missing something. It would be helpful to have an option to distinguish entries with unknown gender, for things like hapaxes where there's no way to infer the gender, meaning the "?" is deliberate and not a request. This might need an extra option in Module:gender and number/data, or per-template handling. —Al-Muqanna المقنع (talk) 08:03, 23 January 2023 (UTC)[reply]

@Al-Muqanna We can introduce an unknown gender but to me that is synonymous with '?'. I think maybe it should be called something like unknowable instead, or we should use some symbol whose tooltip displays unknowable. If you agree with this, and we figure out the right symbol, this can be implemented in Module:gender and number/data. Benwing2 (talk) 08:42, 23 January 2023 (UTC)[reply]
Maybe ?! ("I really don't know!") could work? — SURJECTION / T / C / L / 10:08, 23 January 2023 (UTC)[reply]
Maybe just unk, or even unknown which contrasts with ? because a question mark is asking something (to the reader/editor) while "unknown" is just stating a fact. Catonif (talk) 10:16, 23 January 2023 (UTC)[reply]
When you are done with it, you may apply it through {{cln}} to farfarum, where even the citation form is unknown so it makes no sense to have it with the headword, which presupposes a gender through the choice of an arbitrary form. See also miscix. Fay Freak (talk) 05:10, 25 January 2023 (UTC)[reply]
@Al-Muqanna, Fay Freak, Surjection, Catonif I added ?!, which displays unattested with a tool tip gender unattested (maybe it should have an abbreviation but I can't think of any good, clear ones for unattested). Benwing2 (talk) 02:44, 31 January 2023 (UTC)[reply]
Thanks @Benwing2. My one thought is that the label "unattested" is potentially confusing since when it's written out it's not obvious it refers to the gender rather than the word in general without looking at the tooltip, maybe it should just go the whole hog and read "gender unattested". We also need the autocat implementation for Category:Latin unattested gender nouns and the declension subcategories, haven't figured out how to add it myself. —Al-Muqanna المقنع (talk) 11:06, 31 January 2023 (UTC)[reply]

bad display for translation tables in letter

@This, that and the other Maybe you can figure out a solution for this ... under letter, under the "a symbol in the alphabet" table, I see three columns on my large monitor in a half-width Chrome window, which is fine, but there's a problem with the entries for Fula: Fula is in the rightmost column, but it has two subentries for Adlam script and Latin script, and they appear in the *leftmost* column instead of underneath Fula in the rightmost column. This makes it appear as if Basque rather than Fula has two subentries for Adlam and Latin. It may be related to the Chinese translations in the middle column, which appear directly to the left of Fula and have six subrows under them. If you can't reproduce this on your end, let me know and I can send you a screenshot that I just took, or upload it to Commons. Benwing2 (talk) 08:39, 23 January 2023 (UTC)[reply]

Never mind, User:Fenakhay seems to have fixed it. Benwing2 (talk) 08:44, 23 January 2023 (UTC)[reply]
All good! I have a memory of some documentation page that said to use :* instead of *:; I think I corrected it, but there may be other places that this incorrect advice persists. This, that and the other (talk) 08:50, 23 January 2023 (UTC)[reply]

Template help requested


I'd like to use a parameter for the table title in {{hu-infl-pos-table}}. Original table title: Possessive forms of PAGENAME. If a new parameter perspron=y is added, the table title should be Personal-pronoun-suffixed forms of PAGENAME. I added the code, but it doesn't work. For testing, I used the entry három: {{hu-pos||||[[hármunk]]<br>[[hármónk]]|[[hármatok]]<br>[[hármótok]]|[[hármuk]]<br>[[hármójuk]]<br>[[hármójok]]|n=sg|perspron=y}} but the title does not change to the new one. Can anybody help me make the necessary edit? Thanks! Panda10 (talk) 19:37, 23 January 2023 (UTC)[reply]

Does this help? 70.172.194.25 19:45, 23 January 2023 (UTC)[reply]
@Panda10 I think I fixed it. Check it. Gorec (talk) 19:57, 23 January 2023 (UTC)[reply]
No, you made it so that it always displays "Personal-pronoun-suffixed forms". The pipe is necessary. I think with just the edit I made before to t:hu-pos it is fixed. 70.172.194.25 20:05, 23 January 2023 (UTC)[reply]
The condition doesn't allow that, but maybe the edit you made in the other template affected my edit in this template. When I tested it worked fine. Gorec (talk) 20:20, 23 January 2023 (UTC)[reply]
Go back to the revision you saved and preview the page Debrecen (which doesn't use the perspron parameter, or t:hu-pos) and you'll see the problem. 70.172.194.25 20:29, 23 January 2023 (UTC)[reply]

Adding the new parameter to {{hu-pos}} resolved the problem. Thank you very much! Panda10 (talk) 21:13, 23 January 2023 (UTC)[reply]

[edit]

@Surjection, Benwing2, Erutuon, Rua, Theknightwho, Atitarev, JeffDoozan, Andrew012p
When we add Macedonian translations that contain accented letters (А́ а́ Е́ е́ И́ и́ О́ о́ У́ у́ Л́ л́ Р́ р́), the autogenerated edit descriptions that appear in Revision history or in Contributions history always show redlinks, even if those entries already exist. For instance, the entry забележителен already exists, but in the autogenerated edit description "забележи́телен" is shown as redlink (t+mk:забележи́телен (Assisted)). Can this be fixed somehow? Thank you. Gorec (talk) 22:13, 23 January 2023 (UTC)[reply]

@Горец This can be fixed by generating a two-part link in the edit comment. I'm not especially familiar with how to do this from Javascript (you have to call makeEntryName() in Module:links) but I'm sure it's possible. Benwing2 (talk) 22:20, 23 January 2023 (UTC)[reply]
MediaWiki:Gadget-TranslationAdder-Data.js already defines a table variable called diacriticStrippers and a function computeRawPageName. The table has an entry for mk that should strip acute and grave accents. The code for the edit summary in MediaWiki:Gadget-TranslationAdder.js#L-958 refers to a rawPageName variable. Something has gone wrong with the implementation, but I'm not sure at which step. 70.172.194.25 22:24, 23 January 2023 (UTC)[reply]
It seems like a much simpler way to normalize the page name would be to just call the API, e.g. https://en.wiktionary.org/w/api.php?action=expandtemplates&text={{l%7Cang%7Cġehīeran}}&prop=wikitext&format=json to see that calling {{l|ang|ġehīeran}} normalizes the link to gehieran. That would also have the benefit of not requiring maintenance of two functions for entry name normalization (in Lua and JavaScript) that will get out of sync over time. 70.172.194.25 22:33, 23 January 2023 (UTC)[reply]
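A minimal sketch of the request suggested above, written in Python rather than the gadget's JavaScript for brevity. The function name `expandtemplates_url` is made up for illustration; the endpoint and the `action`, `text`, `prop` and `format` parameters are the ones in the URL quoted in the comment. It only builds the query URL and does not send it.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def expandtemplates_url(lang_code: str, term: str) -> str:
    """Build an expandtemplates query that expands {{l|<lang_code>|<term>}}.

    Hypothetical helper: it mirrors the URL quoted in the thread,
    not the gadget's actual code.
    """
    params = {
        "action": "expandtemplates",
        "text": "{{l|%s|%s}}" % (lang_code, term),
        "prop": "wikitext",
        "format": "json",
    }
    # urlencode percent-encodes the braces, pipes and non-ASCII letters.
    return API_ENDPOINT + "?" + urlencode(params)


url = expandtemplates_url("ang", "ġehīeran")
```

The normalized page name then has to be read out of the expanded wikitext in the JSON response, so the Lua side remains the single source of truth for entry name normalization.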
For Macedonian it should strip away the acute accent only, for А́ а́ Е́ е́ И́ и́ О́ о́ У́ у́ Л́ л́ Р́ р́.
The other letters Ѐ ѐ Ѝ ѝ Ќ ќ Ѓ ѓ should keep it. Gorec (talk) 22:35, 23 January 2023 (UTC)[reply]
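To illustrate the rule described above (a sketch only, not what Module:links or the gadget actually does): after NFC normalization, Ќ ќ Ѓ ѓ and Ѐ ѐ Ѝ ѝ are single precomposed code points, while the stressed vowels and Л́ л́ Р́ р́ have no precomposed form and keep a separate combining acute (U+0301). Removing the remaining U+0301 characters therefore strips only the stress marks. The function name is an assumption.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, used as the Macedonian stress mark


def strip_macedonian_stress(term: str) -> str:
    """Remove stress marks while keeping the letters ќ, ѓ, ѐ and ѝ intact."""
    # NFC composes к/г + U+0301 into ќ/ѓ and е/и + U+0300 into ѐ/ѝ,
    # so the only U+0301 characters left afterwards are stress marks.
    composed = unicodedata.normalize("NFC", term)
    return composed.replace(ACUTE, "")
```

With this rule, забележи́телен becomes забележителен, and a word containing ќ or ѓ is left unchanged even if it was typed in decomposed form.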
So even if the JavaScript were working as intended, which it's obviously not, it would still be wrong (!), because the JavaScript normalization function is out of sync with the Lua normalization function. This underscores the benefit of just calling the API and copying the output of {{l|mk|забележи́телен}}. The Lua editing community on Wiktionary is very active, continuously making changes, and I don't think an approach of manually porting the normalization code over to JS could ever be sustainable in the long-term. Maybe someone will figure out a way to fix the issues with Macedonian within the existing framework in the short-term, which would be nice, but there are a lot of languages that use entry name normalization that aren't even included in TranslationAdder-Data.js at all. 70.172.194.25 22:41, 23 January 2023 (UTC)[reply]
I agree 100%. I have no idea why the Javascript code does its own normalization but I bet it dates from before the time when Lua modules existed. Benwing2 (talk) 22:48, 23 January 2023 (UTC)[reply]
OK maybe it doesn't; it was created in 2017 by User:Dixtosa, who would know why it was created but unfortunately is no longer active. Benwing2 (talk) 22:51, 23 January 2023 (UTC)[reply]
The gadget code already includes a similar expandtemplates API call at MediaWiki:Gadget-TranslationAdder.js#L-800 to access language data (via the helper module Module:languages/javascript-interface). That code was also added by Dixtosa in 2017. So apparently they were aware of this method of communicating between Lua and JS, but either didn't think to apply it to entry name normalization, or decided against it for some unclear reason. 70.172.194.25 22:58, 23 January 2023 (UTC)[reply]
It's hard to translate the makeEntryName function into JavaScript. There is a lot of logic beyond what is described by the language data, and Lua and JavaScript have very different concepts of strings, and we don't have enough testcases to verify that the hypothetical JavaScript does the same thing as the Lua. It is easier to generate wikitext like t+mk:{{ll|mk|забележи́телен}} and then expand it to t+mk:[[забележителен#Macedonian|забележи́телен]]. (This would run into the edit summary length limit a lot more with piped links, so maybe it would be a good idea to expand it to t+mk:[[забележителен]] instead.) — Eru·tuon 00:24, 24 January 2023 (UTC)[reply]
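The second step described above (turning the expanded piped link into a bare link to save edit-summary space) could look roughly like this. It is a sketch in Python with a hypothetical `shorten_links` helper, and it assumes the expanded wikitext contains ordinary `[[target#Section|display]]` links with no nested brackets.

```python
import re

# Matches [[target]], [[target#Section]], [[target|display]]
# and [[target#Section|display]]; group 1 is the bare target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def shorten_links(wikitext: str) -> str:
    """Reduce every wikilink to [[target]], dropping section and display text."""
    return WIKILINK.sub(lambda m: "[[" + m.group(1) + "]]", wikitext)
```

Applied to the example in the comment, `t+mk:[[забележителен#Macedonian|забележи́телен]]` is reduced to `t+mk:[[забележителен]]`.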
As a quick and dirty solution, it's better to just get the links working. I don't think diacritics etc really matter too much for this use case. Theknightwho (talk) 00:54, 24 January 2023 (UTC)[reply]
Sorry, what quick and dirty solution are you referring to? Using {{ll}}? — Eru·tuon 01:04, 24 January 2023 (UTC)[reply]
Just to clarify misconceptions I am not an original author of MediaWiki:Gadget-TranslationAdder-Data.js. I only split it out of the main code (MediaWiki:Gadget-TranslationAdder.js). So if it is magic to you it most certainly is magic to me too. Dixtosa (talk) 12:02, 24 January 2023 (UTC)[reply]
@Горец: Two points here.
  1. The edit summary doesn't remove diacritics, but the edits themselves are fine. I just learned to live with it and ignore it. It shows what edits were actually made, with or without diacritics, which may also be important.
  2. Specifically for Macedonian, I started a few messages in Wiktionary talk:Macedonian transliteration regarding the stress marks: stress is stable in a large number of cases, which makes stress marks for Macedonian redundant. This practice wasn't introduced by me; I just want to make it a rule. (Notifying Dimithrandir, Горец, Martin123xyz, and also @Andrew012p): Please advise if you are for or against the proposal to use stress marks only when necessary (per linked page).
Anatoli T. (обсудить/вклад) 01:11, 24 January 2023 (UTC)[reply]

@Erutuon, Benwing2: What do you think of this code? I tried to implement the idea I had above (and that both of you agreed with) of using an expandtemplates API call. I didn't bother removing all the other code about entry name normalization, but none of it seems to be operative, so this should be okay as a stopgap measure. Here is the new version in action. I also left commented alternative code in case you want to keep the piped display name. 70.172.194.25 04:06, 24 January 2023 (UTC)[reply]

@Горец To be clear, the change has not yet been copied into the main gadget code — we need an admin to do that. On that note, I've tested this once more (this time with Arabic) and it seems to fix the edit summary, and nothing else should be changed, so could an admin just copy this new code over? (If you noticed that the ordering is messed up in my tests, that is because I had to load my new version after the older version was already loaded. This won't occur in production.) 70.172.194.25 22:48, 24 January 2023 (UTC)[reply]
@70.172.194.25 Your idea and the code changes you've made solve the problem, rendering the words correctly. I don't know which admins need to approve that, but I think that at least 2 of the people I pinged have administrator permissions. Gorec (talk) 23:56, 24 January 2023 (UTC)[reply]
@Горец, Erutuon, Benwing2: The IP (70.172.194.25) at 04:06, 24 January 2023 did something that worked. Can this be implemented? Anatoli T. (обсудить/вклад) 05:47, 25 January 2023 (UTC)[reply]
{[ping|Atitarev|Горец}} I copied it over, apologies for the delay. Benwing2 (talk) 06:50, 25 January 2023 (UTC)[reply]
@Atitarev, Горец Damn it, second time's the charm ... Benwing2 (talk) 06:51, 25 January 2023 (UTC)[reply]
@Benwing2: Thank you! Anatoli T. (обсудить/вклад) 06:55, 25 January 2023 (UTC)[reply]
I just noticed though that another case needs to be handled — that of {{t-needed}}. So here's an updated version that fixes this. @Benwing2. 70.172.194.25 08:20, 25 January 2023 (UTC)[reply]
@Theknightwho, Atitarev could you copy this over as a stopgap measure? So far two edits have run into this and I'd like for that number not to increase. I'm sorry for not noticing this initially in testing. 70.172.194.25 08:42, 25 January 2023 (UTC)[reply]
I don't have permissions, as you need to be an interface administrator. Sorry! Theknightwho (talk) 08:45, 25 January 2023 (UTC)[reply]
I copied it over. BTW my browser is Chrome 67.something, which is probably several years old by now since I'm still on Mavericks :) ... Benwing (talk) 18:03, 25 January 2023 (UTC)[reply]
I just tested it here. Works perfectly! :)
Thank you @70.172.194.25 and @Benwing! Thank you everyone! Gorec (talk) 18:14, 25 January 2023 (UTC)[reply]

Links to languages with spaces don't work when the entry is on the same page


See the page noche and try clicking on the term in "From Old Spanish noche". The link is malformed, it is using a space character rather than an underscore. Another example is "From Old Norse regn".

This bug only appears when linking to another section on the same page. It was not present a month ago on the Wayback Machine. Hvergi (talk) 13:25, 24 January 2023 (UTC)[reply]

@Hvergi: was the behaviour different previously? I thought if a link is to the same page, nothing happens (i.e., the user is not brought to the top of the page). To cause the link to move to a different section, one would have to type {{inh|es|osp|[[noche#Old Spanish|noche]]}}. — Sgconlaw (talk) 14:25, 24 January 2023 (UTC)[reply]
Yes it was different, in this archived version the section links correctly use an underscore. {{inh}} and other templates always include the section link, an example: "From Danish fryd". Hvergi (talk) 14:32, 24 January 2023 (UTC)[reply]
@Hvergi: ah, in that case I have no idea. — Sgconlaw (talk) 16:39, 24 January 2023 (UTC)[reply]
I think this is probably due to a change in MediaWiki, as I always remember spaces in such contexts being automatically transformed into underscores in HTML output. Can we report this on Phabricator? I imagine other projects might be having similar issues. 70.172.194.25 16:45, 24 January 2023 (UTC)[reply]
You're correct, adding [[noche#Old Spanish|link]] to the page noche gives an incorrect link while adding [[noches#Old Spanish|link]] to that same page gives a correct link. Hvergi (talk) 17:51, 24 January 2023 (UTC)[reply]
Someone has reported it to Phabricator T327467 Hvergi (talk) 17:53, 24 January 2023 (UTC)[reply]
They've already pushed the fix to Wiktionary. That was pretty fast. — Eru·tuon 22:03, 25 January 2023 (UTC)[reply]
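For reference, the expected shape of such a section link can be sketched as follows. This is a simplified illustration in Python, not MediaWiki's actual anchor-encoding code: it only covers the space-to-underscore replacement discussed in this thread, and the `section_url` helper is hypothetical.

```python
from urllib.parse import quote

WIKI_BASE = "https://en.wiktionary.org/wiki/"


def section_url(page: str, section: str) -> str:
    """Build a link to a language section, using underscores instead of spaces."""
    title = quote(page.replace(" ", "_"))
    anchor = quote(section.replace(" ", "_"), safe="")
    return WIKI_BASE + title + "#" + anchor


url = section_url("noche", "Old Spanish")
```

Here `url` ends in `noche#Old_Spanish`, which is the form the archived version of the page used; the bug produced a literal space in place of the underscore.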

Lua errors


@Theknightwho I'm seeing extensive Lua errors now popping up on headword lines; e.g. sacrificator ("Lua error in Module:headword at line 156: attempt to index global 'sc' (a nil value)"), sacrificalis ("Lua error in Module:headword at line 156: attempt to index global 'sc' (a nil value)"), sacrate. Urszag (talk) 02:30, 25 January 2023 (UTC)[reply]

This has been fixed, but someone should run a bot to null-edit everything in CAT:E. 70.172.194.25 02:35, 25 January 2023 (UTC)[reply]
The issue was caused by me neglecting to test the changes on pages that should have otherwise been unaffected, which was an (annoying) oversight. Theknightwho (talk) 02:42, 25 January 2023 (UTC)[reply]
For future reference, making this call from the API sandbox is a fast way of clearing CAT:E. It will only do 10 at a time, but it's a hell of a lot faster than doing it page by page. I've just used it to clear the whole lot. Theknightwho (talk) 06:02, 25 January 2023 (UTC)[reply]
That's very handy, thanks! 70.172.194.25 06:30, 25 January 2023 (UTC)[reply]
I've just spotted a way to increase the limit: this method clears 100 at once. You can increase it up to 5,000, but it seems to cause timeouts if you try too many. Theknightwho (talk) 07:53, 25 January 2023 (UTC)[reply]
I've added this link to the top of CAT:E for easy future access. 70.172.194.25 08:49, 25 January 2023 (UTC)[reply]
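The exact API sandbox call is only linked, not quoted, in this thread. A plausible reconstruction, assuming it is a standard MediaWiki `action=purge` request that uses the `categorymembers` generator with `forcelinkupdate`, is sketched below in Python. The helper name and the category title passed to it are assumptions; the batch sizes match the figures mentioned above (10 by default, 100 in the second call, up to 5,000).

```python
def purge_category_params(category: str, limit: int = 100) -> dict:
    """Build parameters for purging one batch of pages in a category.

    Hypothetical helper: the request must be sent as a POST to
    /w/api.php, which this sketch does not do.
    """
    if not 1 <= limit <= 5000:
        raise ValueError("limit must be between 1 and 5000")
    return {
        "action": "purge",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmlimit": str(limit),
        # Re-run the links update so category membership is refreshed.
        "forcelinkupdate": "1",
        "format": "json",
    }


# Assumption: CAT:E points at this category.
params = purge_category_params("Category:Pages with module errors")
```

Since large batches were reported to time out, repeating a moderate batch size until the category is empty is probably safer than a single very large request.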
[edit]

A change was made recently where definitions that include redlinks show up in the "definitions needed" category, along with entries explicitly marked with the rfdef template. I don't mind the change, really, except that the entries do not go away when the item that the redlink points to is created, so the list ends up full of entries where there is no problem. For example, some people (I have noticed since this change was made), tend to create the inflected forms before creating the entry. These inflected forms all show up as "definition needed", even though there is nothing wrong. Kiwima (talk) 02:22, 26 January 2023 (UTC)[reply]

@Kiwima Can you give me an example of where this happens? I'm not sure who made the change or where it was made, but it sounds like it's not a good change because the nature of the category support means that once entries are in a category like this they will likely hang around for awhile (or until someone modifies the page itself, not just creates redlinked pages). Benwing2 (talk) 03:37, 26 January 2023 (UTC)[reply]
@Benwing2: I also have no idea who made the change, or where it was made. I had been concentrating on reducing the list of definitions needed (which was over 1000 when I started), and then suddenly all these new entries started appearing. An example is borepins: borepin has also been added, but clearly borepins was added first, so it appears in Category:Requests for definitions in English entries. Another example is pseudoprobabilities, which is a definition with a legitimate redlink - that one is a problem that did not appear in the list when I started working on it, but which appears in the list now. Kiwima (talk) 20:28, 28 January 2023 (UTC)[reply]
@Kiwima This change was made by User:This, that and the other on Dec 28, 2022 to the {{plural of}} template. Should it be reverted? Benwing2 (talk) 20:45, 28 January 2023 (UTC)[reply]
@Kiwima if a page is showing up in a category when it shouldn't be, you can null-edit it: simply click "Edit", then click "Publish changes" without making any changes. Categories do sometimes take some time to update and you should not expect an instantaneous response upon resolving the issue that caused the page to be in the category in the first place. As an example, I just noticed that PB cups was in Category:Requests for definitions in English entries even though PB cup exists, so I null-edited PB cups and now it is gone from the category.
If it were only one or two entries, that would be a workable solution, but it isn't. There are some editors who routinely enter the plural form before the singular. I am wasting hours of my time null-editing: an unrewarding activity. Kiwima (talk) 19:01, 29 January 2023 (UTC)[reply]
@Kiwima clearly it is not OK for you to be spending so much time null-editing. I have reverted the change so that {{plural of}} no longer categorises pages into the "Requests for definition" categories. This, that and the other (talk) 21:25, 29 January 2023 (UTC)[reply]
Thank you. Kiwima (talk) 21:54, 29 January 2023 (UTC)[reply]
@This, that and the other @Kiwima There is a way to mass-clear categories with null edits. See this thread. Theknightwho (talk) 22:08, 29 January 2023 (UTC)[reply]
@Benwing2 I figured that this change was a good way of drawing attention to these redlinked plural entries, as they are functionally equivalent to a singular entry with {{rfdef}}. I'd actually rather see this change extended to all {{form of}}-type templates, although it might make browsing through the category itself a bit confusing. This, that and the other (talk) 01:35, 29 January 2023 (UTC)[reply]
@This, that and the other The problem is that categories sometimes take days or weeks to update, and there's no precedent for adding red links like this to "Requests for X" categories. Maybe as a compromise we can have a separate category for apparent red links like this, so they don't pollute a "Requests" category and we can add a notice indicating that the category contents may be outdated. Benwing2 (talk) 01:48, 29 January 2023 (UTC)[reply]
@Benwing2 there is already Category:Plurals with a red link for singular but nobody looks at that, probably because it is not broken down by language. Perhaps it could be transformed to a new "Requests for X" structure, like "Requests for lemma definitions in English entries"??? This, that and the other (talk) 04:14, 29 January 2023 (UTC)[reply]
@This, that and the other Maybe something more like 'LANG non-lemma forms with redlinked lemma'? Many of the red links are due to the lemma being deleted, whereas "Requests for lemma definitions" suggests that the lemma needs to be created when in some cases it's rather that the non-lemma forms need deleting. Benwing2 (talk) 06:09, 29 January 2023 (UTC)[reply]
@This, that and the other Until recently, the same could have been said about requests for definition, until I adopted it as a pet project. Some categories do languish. I think the solution is not to suddenly dump those entries on a category that is moving, but rather to make the languishing category more visible. @Benwing2, in answer to your question, yes, I think This, that and the other's change should be reverted. It is an abuse of the system. A better solution would be to break redlinked entries down by language, and add a link to that category in the "Things to do" section of the Community Portal. Kiwima (talk) 19:01, 29 January 2023 (UTC)[reply]
[edit]

For example, at part, clicking on the part#Middle English link in the etymology section of English sends the user to the Middle English section. However, doing the same thing while editing the etymology section produces the URL https://en.wiktionary.org/w/index.php?title=part&action=edit&section=2#Middle_English instead of the expected https://en.wiktionary.org/wiki/part#Middle_English. – Wpi31 (talk) 07:47, 26 January 2023 (UTC)[reply]

apc and ajp merged


How should we deal with that? (see the change request, please note that I'm the primary requester) A455bcd9 (talk) 09:43, 27 January 2023 (UTC)[reply]

Could a bot modify all ajp entries to change the header into "==Levantine Arabic==" and replace "ajp" by "apc" (ex: {{l|ajp|متأكّد|tr=mitʔakked|t=sure}})?
We should also rename the various templates. What do you think @Fenakhay? A455bcd9 (talk) 13:14, 28 January 2023 (UTC)[reply]
@A455bcd9 Hmm. I don't know that much about Levantine Arabic lemmas, but are there a significant number that are found only in the northern or southern area? If so should we try to preserve this by adding the appropriate labels (which could be 'North Levantine' or 'South Levantine', or more specifically by country: 'Syria', 'Jordan', 'Palestine', 'Israel', 'Lebanon' etc.)? This would have to be done by hand, I think, because even if we have for example only a lemma under 'North Levantine Arabic', that doesn't mean it doesn't also exist in the southern area. In general this sort of merger could be done largely by bot although it would require some planning. Benwing2 (talk) 20:37, 28 January 2023 (UTC)[reply]
Hi @Benwing2, I suggest we add a template saying "Bulk import from the deprecated South Levantine entry". Then, editors will manually replace these tags by the appropriate labels over time ('Syria', 'Jordan', 'Palestine', 'Israel', 'Lebanon', etc.) What do you think? A455bcd9 (talk) 22:59, 28 January 2023 (UTC)[reply]
@A455bcd9 Sounds good to me. Benwing2 (talk) 01:38, 29 January 2023 (UTC)[reply]
@Benwing2: perfect. So... when and how should we do that? A455bcd9 (talk) 08:51, 29 January 2023 (UTC)[reply]
@A455bcd9 It will take a little while for me to get to this. (Notifying Atitarev, Mahmudmasri, Metaknowledge, Wikitiki89, Erutuon, ZxxZxxZ, عربي-٣١, Fay Freak, AdrianAbdulBaha, Assem Khidhr, Fenakhay, Fixmaster, M. I. Wright, Roger.M.Williams, Zhnka, Sartma): Does anyone who is listed in the Arabic workgroup (sorry, there are a lot of you) have comments about this merger? Benwing2 (talk) 19:26, 29 January 2023 (UTC)[reply]
Please note that @AdrianAbdulBaha, @ShahdDibas and I are the three authors of the (now accepted) proposal to merge ajp and apc in the ISO standard (source). A455bcd9 (talk) 19:31, 29 January 2023 (UTC)[reply]
I'm not sure how bots work & thus what's possible to automate vs. not. But I would say, in theory: 1) automatically change all "South Levantine Arabic" to "Levantine Arabic"; don't do this for "North Levantine Arabic" automatically so that they can be handled on a case-by-case basis; alternatively, do so for "North Levantine Arabic" except in case of conflict, leaving those to humans; 2) transform all ajp templates to apc; overwrite apc templates in case of conflict, as there are far more ajp entries, so there's more of a set standard for entries; 3) moving forward, geographically limited terms & senses should be labeled as such — manually — & I think the described plan by @A455bcd9 & @Benwing2 is as good as any. (Actually, someone might want to double-check those apc templates; if there were ones for verbal conjugations, those would certainly be different from the ones used for ajp. In that particular case — not sure if there might be others — regional dialects may need distinct templates for the same purpose, or alternatively one mega-template for, say, conjugations, including data for multiple regional variants; e.g. إطلع vs. طلاع.) AdrianAbdulBaha (talk) 20:07, 29 January 2023 (UTC)[reply]
@AdrianAbdulBaha Thanks for your comments. What you suggest is essentially what I'm planning on doing: rename both 'South Levantine Arabic' and 'North Levantine Arabic' lemmas to 'Levantine Arabic' unless a given lemma has entries for both lects, in which case the merging has to be done manually. The template merging needs to be done manually as well. As for conjugation templates, I imagine we can handle all regional variants in a single template unless it gets really unwieldy (although I know that some Levantine dialects merge short i and u into a schwa, which will impact a lot of paradigms; not sure the best way to handle that). BTW إطلع vs. طلاع, are these different imperative variants? Benwing2 (talk) 21:41, 29 January 2023 (UTC)[reply]
Yes, those are different imperative variants. I agree that conjugation templates would need to include multiple regional variants; still, I think this is better than having more or less duplicate "North Levantine" & "South Levantine" lemmas. Alternatively, the current ajp templates could have the header "Jerusalem & Amman" & separate templates for other regional dialects (e.g. "Beirut") could be created & inserted back-to-back within entries.
I'm drawing some inspiration from how Armenian handles Eastern & Western varieties (https://en.wiktionary.org/wiki/սիրել); they don't have Western Armenian conjugation templates, but the templates that are featured specify that they represent Eastern Armenian specifically.
By the way, I've purposefully chosen to label the dialect f.k.a. "South Levantine Arabic" as "Jerusalem & Amman" & I hope you'll see why. While ajp was politically delimited as being of Palestine & Jordan (misguidedly so, which is why we changed it), the scope of "Levantine Arabic" makes no reference to political geography; moving forward, we should avoid using terms like "Palestine" to label sub-dialects within Levantine Arabic, because in reality there are usually multiple significantly distinct sub-dialects within the political borders of any given Levantine country. In the case of the urban dialects, I recommend choosing a set of representative cities or regions, suggesting that the sub-dialect "radiates out" from this core area while interacting with other varieties, rather than existing in the same form uniformly across a territory that is actually very diverse. As for the southern regions, I suggest "Jerusalem & Amman" & "Haifa & the Galilee". However, non-urban dialects don't have such a center, so they will inevitably be to a certain extent abstractions of a general sociolect; I recommend "Southern Bedouin" & "Southern Rural" for now to encompass those varieties. AdrianAbdulBaha (talk) 10:57, 1 February 2023 (UTC)[reply]
My comment is that I consent to the merger and see little difficulties—it is better sooner than later in fact, as the likelihoods are such that there would be bare duplication if anyone were to document supposed “North Levantine” Arabic (the whole sound of it has been artificial all the time, the concept is mostly one of those remote database creations, of which we already have deleted other Arabics) on a massive scale, which would only happen if that person know regiolects of it well but merely lack experience of South Levantine Arabic regiolects to become convinced of their comparative identities, and on the other hand, for necessary distinction, labels and dedicated templates are in any case apt to be enough. Fay Freak (talk) 02:55, 30 January 2023 (UTC)[reply]
I, too, agree with the merger. And having checked our Arabic language treatment now, I actually think similar efforts should be made to unify Baharna, Omani, and Shihhi all under Gulf Arabic. Just as Qatari, Emirati, and Eastern Saudi don't have their standalone headings, these shouldn't either, although the respective sociolects across the national borders are indeed varieties in their own right. They just aren't distinct enough to warrant a language status, which I know is still pretty arbitrary in the unique case of Arabic diglossia. Assem Khidhr (talk) 13:58, 30 January 2023 (UTC)[reply]
(Side note: if you think that some varieties should be merged, not only in Wiktionary but also in the ISO standard (and therefore in all publications that reuse this standard, starting with Ethnologue), then I suggest you submit a change request. It's not that hard: we did it successfully for Levantine and others did it for Judeo-Tunisian before.) A455bcd9 (talk) 14:05, 30 January 2023 (UTC)[reply]
Hi @Benwing2: how can we move forward on this project? A455bcd9 (talk) 10:51, 11 March 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── We need a plan. I can put one together but it will involve several steps, some of which need to be done by you (or another native Levantine speaker). I'd need you to be active and available to answer questions and help with changes, so that if things temporarily get in a messed-up state they don't stay there for long. Does that make sense? My plan might be something like this:

  1. Rename 'apc' to Levantine Arabic. This can be done by changing the language data and using a bot to rename the headers in the existing entries.
  2. Merge the headword and declension templates (or at least the headword templates at first). This will take significant work and partly needs to be done by you (at least as an advisor, and someone who will review the existing templates and come up with a plan as to what the merged 'apc-*' templates should look like). I can help in the coding but don't have the relevant expertise in North/South differences.
  3. Move South Levantine lemmas without equivalent North Levantine lemmas to 'Levantine Arabic'. This will require some conversion of headword and declension templates, which hopefully can be done by bot but depends on what the templates end up looking like after the previous step. It will also require renaming occurrences of 'ajp' to 'apc' in templates inside the moved lemmas (e.g. {{lb}}), which can be done by bot. Note that we'll probably have to manually identify a block-list of lemmas not to be moved, if there are cases where the same term is spelled differently in North vs. South Levantine and entries for both currently exist.
  4. Change the language code of all references to the moved lemmas (e.g. in translation tables) from 'ajp' to 'apc'. This can be done by bot.
  5. Figure out how to appropriately convert transliterations of the terms that were moved to 'Levantine Arabic', and do the conversion. I don't know much about North vs. South differences in phonology, so I'd need significant help working this step out.
  6. Merge the remaining South Levantine lemmas that have equivalent North Levantine lemmas. This needs to be done entirely by hand.
  7. Change the language code of all remaining references to 'ajp' to 'apc', and fix up remaining transliterations (see two steps ago).
  8. Delete the 'ajp' code from the language data.

This plan needs refinement and it's a lot of work that will take awhile. Between the moving of lemmas and the changing of references to those lemmas from 'ajp' to 'apc', those references will be messed up and won't link to the right place, so we don't want too much time to elapse between these steps. Please do comment on the plan. Benwing2 (talk) 02:57, 12 March 2023 (UTC)[reply]
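As a rough illustration of the bot-able parts of steps 3 and 4 (renaming the header and switching the language code inside templates), here is a minimal Python sketch that operates on the wikitext of a single entry. The function name and both regular expressions are assumptions made for this example; it only handles templates whose first parameter is the language code (such as {{l}} or {{lb}}), and it does nothing about the block-list, the template conversion or the transliteration changes the plan calls for.

```python
import re

# Level-2 language header of the entries to be moved.
OLD_HEADER = re.compile(
    r"^==[ \t]*South Levantine Arabic[ \t]*==[ \t]*$", re.MULTILINE
)
# "ajp" as the first parameter of a template, e.g. {{l|ajp|...}} or {{lb|ajp|...}}.
LANG_ARG = re.compile(r"(\{\{[^{}|]+\|)ajp(?=[|}])")


def retag_entry(wikitext: str) -> str:
    """Rename the language header and switch first-parameter 'ajp' to 'apc'."""
    wikitext = OLD_HEADER.sub("==Levantine Arabic==", wikitext)
    return LANG_ARG.sub(r"\1apc", wikitext)


sample = "==South Levantine Arabic==\n\n{{l|ajp|متأكّد|tr=mitʔakked|t=sure}}"
converted = retag_entry(sample)
```

A real run would first have to skip any page that already has a North Levantine section, since those are the entries the plan says must be merged by hand.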

@A455bcd9 Forgot to ping you. Benwing2 (talk) 02:58, 12 March 2023 (UTC)[reply]
@Benwing2: perhaps it would be better to develop all the headword and inflection templates first, under temporary names to avoid conflicts. They could be tested in the entries in preview, without saving/publishing. It would take longer, but would minimize the time that in-between stages are live. It would also give a chance to find the trouble spots and develop a better feel for the best way.
Another idea would be to develop tables somewhere of the North vs. South differences by extracting the relevant parts from all the entries and manually sorting them so the North and South counterparts are together, then using them to plan out what needs to be done to which entry. Here again, this would minimize the time needed to have the in-between stuff live in the entries, and make for better-informed implementation.
These are just suggestions off the top of my head, but they reflect the measure twice and cut once strategy I like to use in my own projects. Chuck Entz (talk) 05:08, 12 March 2023 (UTC)[reply]
@Chuck Entz Thanks, this makes sense. @A455bcd9 what do you think? Can you help merge the headword and inflection templates? Benwing2 (talk) 05:21, 12 March 2023 (UTC)[reply]
Hi @Benwing2, I'm so sorry for the late reply, I got caught in non wiki stuff. I'm not a native Levantine speaker, so that's probably the biggest hurdle, although I think that some tasks can be done by anyone. In any case, I won't have much time in the next few weeks so I'm afraid that, unless someone else is motivated, this merge will have to be postponed... A455bcd9 (talk) 20:30, 18 April 2023 (UTC)[reply]
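The bot step in the plan above (changing references from 'ajp' to 'apc') can be sketched roughly as follows. This is a minimal illustration, not an actual bot script: the function name `retag` is hypothetical, and the list of templates is an assumed subset of templates that take the language code as their first positional parameter. Templates that carry the code elsewhere (for example as a second parameter) are deliberately left untouched and would need separate handling, and a real run would more likely use a proper wikitext parser than a regular expression.

```python
import re

# Assumed, illustrative subset of templates whose FIRST positional
# parameter is a language code. A real bot would need the full list.
LANG_FIRST_TEMPLATES = ("t", "t+", "l", "m")


def retag(wikitext, old="ajp", new="apc"):
    """Replace the language code `old` with `new` in the listed templates.

    Only the first positional parameter is changed, and only when it is
    exactly `old` (followed by '|' or the closing '}}').
    """
    names = "|".join(re.escape(name) for name in LANG_FIRST_TEMPLATES)
    pattern = re.compile(
        r"(\{\{\s*(?:" + names + r")\s*\|\s*)"   # '{{t|' and similar
        + re.escape(old)
        + r"(?=\s*(?:\||\}\}))"                  # end of the parameter
    )
    # A function is used as the replacement so that `new` is inserted literally.
    return pattern.sub(lambda match: match.group(1) + new, wikitext)


if __name__ == "__main__":
    print(retag("* Levantine: {{t|ajp|WORD|tr=TRANSLIT}}"))
    print(retag("{{bor|en|ajp|WORD}}"))  # second-parameter code: left unchanged
```

Note that this only handles the code itself; the transliteration fix-up described in step 5 is a separate problem and is not attempted here.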

This category seems severely depopulated. It currently lists six entries at A, but a cursory look at Category:Thesaurus reveals we have dozens of English Thesaurus entries beginning with A. I assume the problem affects other languages as well, though in smaller numbers. Could this be solved with a bot? It is usually done with {{ws header}}. brittletheories (talk) 12:40, 27 January 2023 (UTC)[reply]

Suggestion: import gadget w:Help:CharInsert

This is essentially a javascript version of Mediawiki:Edittools, but it allows users to customize the symbols according to their own preference. – Wpi31 (talk) 15:25, 27 January 2023 (UTC)[reply]

The Wikipedia version of our MediaWiki:Gadget-Edittools.js is w:MediaWiki:Gadget-charinsert-core.js. The source code of MediaWiki:Edittools is more complex than the source code of Wikipedia's version (in the gadget), but we do it this way so we can add custom formatting, such as fonts to ensure rarer characters display correctly, and tooltips explaining what the characters are (particularly useful in the "Modifiers and combining diacritics" menu), which as far as I can see isn't possible in the Wikipedia version. You can in fact add menus in our version by putting code like this in your Special:MyPage/common.js, though this isn't documented anywhere. — Eru·tuon 22:15, 27 January 2023 (UTC)[reply]

Suggestion: add WikiBlame to MediaWiki:Histlegend

On multiple occasions, I've wanted to search for who added or removed certain text, and out of instinct gone to the history page to look for the WikiBlame link (which is present on the English Wikipedia, among other projects). It's really pretty useful. Anyway, the link code would be:

[http://wikipedia.ramselehof.de/wikiblame.php?lang=en&project=wiktionary&article={{FULLPAGENAMEE}} Find addition/removal]

I don't see any reason not to add it. 70.172.194.25 22:25, 27 January 2023 (UTC)[reply]

I use this all the time too. Added (with somewhat less cryptic link text, "Find when text was added/removed"). This, that and the other (talk) 12:09, 28 January 2023 (UTC)[reply]

the categorial status of catenative complements

They should [keep us informed].

What's the categorial status of the constituent bracketed in the sentence above? As far as I know, it is analyzed as a clause in CGEL. However, in the literature on generative syntax, it is analyzed as a VP, not a TP (clause) at all. Which analysis should be favored? — This unsigned comment was added by Victor Bob (talk • contribs) at 03:22, 28 January 2023 (UTC).[reply]

To hell with generative syntax IMO but thankfully we are not in the business of debating syntactic theories; I don't see how it's relevant how such a constituent is analyzed. Benwing2 (talk) 20:40, 28 January 2023 (UTC)[reply]
The only relevance I can see is to an argument over whether 'to keep us informed' is an SoP or not. In this particular case, I don't see that the bracketed words are a constituent of the larger sentence, unless they be the equivalent of a lexical verb. --RichardW57 (talk) 21:47, 28 January 2023 (UTC)[reply]

Rendering of Tamil ஸ்ரீ/ஶ்ரீ

Neither is rendering as the ligature on Ubuntu Jammy. (The font Lohit Tamil Classical does render them correctly on that OS.) Incidentally, there was only one meaning in common between the pages for the two of them, the meaning Lakshmi! I've now merged the pages. --RichardW57 (talk) 15:49, 29 January 2023 (UTC)[reply]

It's most likely a font problem. It could be a font renderer problem (rendering individual code points separately, not as grapheme clusters) if no font was capable of rendering this combination of code points correctly. The OS doesn't narrow the problem down. Find out the font with the "inspect element" functionality of your browser. The exact process depends on the browser. Right click on the offending text and select "Inspect" in Chrome (also in Edge, and possibly in Chromium in general) or "Inspect element" in Firefox. This brings up the developer tools and focuses on the element that contains the offending text. In Chrome, click on the "Computed" tab (to the right of "Styles") and scroll to the bottom and report the font names under "Rendered fonts". In Firefox, click on the "Fonts" tab (to the right of "Layout") and report the font names at the top under "Fonts used". — Eru·tuon 18:53, 29 January 2023 (UTC)[reply]
I agree it's mostly a font problem, with possible exceptions like a dangerous combination of Internet Explorer on Windows XP. I was just registering the problem here. It usually takes a little thought to find a solution that works across likely platforms, and I was surprised how hard I had to work to find a solution for Ubuntu - Lohit Tamil Classical works, but Lohit Tamil doesn't! --RichardW57 (talk) 23:08, 29 January 2023 (UTC)[reply]

Preservation of Old Encodings

@Theknightwho has said on this page that he will provide a facility to preserve old encodings where needed. One of the possibly obscure places where this will be needed is {{wikipedia}} e.g. for ஶ்ரீகாகுளம் (śrīkākuḷam), Tamil Wikipedia uses the old encoding with U+0BB8 SA. I've mostly worked round it by escaping the first half of the onset to hex, but the link colour comes out as red instead of blue. --RichardW57 (talk) 16:28, 29 January 2023 (UTC)[reply]

The problem with {{wikipedia}} is that the underlying module Module:interproject generates links with the full_link function of Module:links, so it applies our Wiktionary-internal entry name modifications to the link target. Those should not be applied to titles on other projects, which might not follow our conventions. It's probably okay to use a simple linking function that doesn't transform the input, because I don't see that any features of full_link are used beyond language and script tagging and bolding. — Eru·tuon 18:02, 29 January 2023 (UTC)[reply]
@Erutuon That is probably my fault because I cleaned up that module relatively recently. The call to full_link is probably there because {{lw}} uses it and I did the same by analogy. However, IMO the correct fix is not to remove the call to full_link but to fix full_link to recognize links to Wikipedia etc. and not remove accents from them, because lots of code passes Wikipedia links to full_link. Benwing2 (talk) 19:59, 29 January 2023 (UTC)[reply]
@Erutuon Actually, the code already does that on line 38. Can you explain what the issue is then? Benwing2 (talk) 20:02, 29 January 2023 (UTC)[reply]
@Benwing2: We have two encodings of a Tamil syllable: the deprecated encoding (U+0BB8, 0BCD, 0BB0, 0BC0: Tamil sa, virama, ra, ii) used by Tamil Wikipedia, and the updated encoding (U+0BB6, 0BCD, 0BB0, 0BC0: Tamil sha, virama, ra, ii) used by us. The link target in {{wikipedia}} has to have the deprecated encoding so that the link will work. I thought full_link was putting the updated encoding in the link target, but when I was editing the module, I discovered (after massive confusion) it was putting the deprecated encoding in the link target and the updated encoding in the link text. This is what the if-statement on line 38 of Module:links achieves. Without that if statement, the link target would have the updated encoding and the link would not work. I am confused because I don't see a module edit that would have changed the encoding of the link target since RichardW57 started this thread. So it seems that it was only the link text that was wrong all along. I think it is right to have the deprecated encoding in the link text because it is in the link target, so we still need the custom link function, or need to add yet another option to full_link to not apply makeEntryName and makeLinkText to the target and text of a link. — Eru·tuon 20:29, 29 January 2023 (UTC)[reply]
@Erutuon Are we sure we want the deprecated encoding in the link text? The link text is for display, and doesn't affect the actual link, so it would seem we should use the updated encoding there. But if we do want to use the deprecated encoding, IMO it should be done automatically by Module:links whenever it is working with a Wikipedia link; hence no need for a custom link function or additional param. Benwing2 (talk) 21:44, 29 January 2023 (UTC)[reply]
@Benwing2 Deprecated encodings won’t be consistently used. Personally, I’d just create a new hard redirect on the target wiki, but there are also other reasons to want to escape correction to the normal encoding (e.g. linking to a deprecated character’s page). Theknightwho (talk) 21:50, 29 January 2023 (UTC)[reply]
@Theknightwho I should have clarified what I meant, which is that either Module:links should apply makeEntryName/makeLinkText to the link display text for Wikipedia links or it shouldn't, but in either case this should be done automatically. (But we shouldn't try to transform the new encoding to the deprecated one.) Benwing2 (talk) 21:54, 29 January 2023 (UTC)[reply]
@Benwing2 I see what you mean. Yes - I agree. We also have makeDisplayText, which is probably what we want to use here. It’s designed for fixing issues like bad encoding, but doesn’t make substantive changes (like the removal of diacritics). Theknightwho (talk) 22:01, 29 January 2023 (UTC)[reply]
It was confusing to me while editing the module for the link text to be different from the link target in {{wikipedia}}, and it might be confusing to other people (sort of false advertising or something), but now that I'm not editing the module and I've figured out what was going on it's not a big deal to me. Perhaps it's worth seeing how many other cases of {{wikipedia}} have differences in link target and link text and coming up with a rationale for when target and text should be different. — Eru·tuon 23:11, 29 January 2023 (UTC)[reply]
@Theknightwho, Erutuon: Where is the use of TAMIL SA in the ligature formally deprecated? Unicode 15.0 says, in Section 12.6, that both encodings should be supported. We may therefore find that both encodings are used in the Tamil Wikipedia, and similarly for other cases where Unicode enjoins support for old encodings, and for links we may have to make a manual selection. I've found that Google seems to treat the old and new Malayalam chillu encodings as equivalent in searches, though this might be done word by word rather than by folding the encodings together. --RichardW57 (talk) 23:20, 29 January 2023 (UTC)[reply]
@RichardW57 I'm pretty sure that it is (though I can't remember exactly where). In any event, it's important for us to use one consistently. I will implement the way to escape it (probably today/tomorrow), but I really do think your best bet is to just create the hard redirect on Tamil WP anyway. Theknightwho (talk) 02:28, 1 February 2023 (UTC)[reply]
Passing the old encoding to {{wikipedia}} already seems to be working here, except for the colour of the link, which comes up purple, even though the link works. --RichardW57 (talk) 02:40, 1 February 2023 (UTC)[reply]
@RichardW57: Purple in the default CSS indicates that you've visited the link before. — Eru·tuon 16:10, 1 February 2023 (UTC)[reply]
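For readers following the encoding discussion above, here is a small sketch of the two code point sequences involved and of one of the options that was discussed: leave the link target exactly as supplied (so a title in the older encoding still resolves on the other project) while showing the updated encoding in the display text. This is only an illustration in Python, not the code of Module:links or Module:interproject; the function names are hypothetical.

```python
import unicodedata

# TAMIL LETTER SA / SHA, then VIRAMA, RA, VOWEL SIGN II
OLD_SRI = "\u0BB8\u0BCD\u0BB0\u0BC0"  # older encoding (with SA), as on Tamil Wikipedia
NEW_SRI = "\u0BB6\u0BCD\u0BB0\u0BC0"  # updated encoding (with SHA), as used in entries


def to_new_encoding(text):
    """Rewrite the older SA-based sequence to the SHA-based one."""
    return text.replace(OLD_SRI, NEW_SRI)


def interproject_link_parts(title):
    """Return (target, display) for a link to another project.

    The target is passed through untouched; only the display text is
    converted to the updated encoding.
    """
    return title, to_new_encoding(title)


if __name__ == "__main__":
    # The two sequences are distinct characters, not canonical equivalents,
    # so Unicode normalization alone does not unify them.
    print(unicodedata.normalize("NFC", OLD_SRI) == NEW_SRI)
    target, display = interproject_link_parts(OLD_SRI)
    print([hex(ord(c)) for c in target])
    print([hex(ord(c)) for c in display])
```

Whether the display text should follow the target or the local convention is exactly the open question in the thread, so the second function reflects just one of the positions taken.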

Is a TOC with columns possible?

I was wondering, is a Table of Contents possible as __TOC__ with breaks at Level 2 to create columns? It would be nice for pages with 2 or 3 sections. Thank you. ‑‑Sarri.greek  I 18:59, 29 January 2023 (UTC)[reply]

@Sarri.greek Hi, do you mean for there to be a multicolumn table of contents like in a newspaper, or do you have in mind some other way of being more efficient with the whitespace to the right of the existing table of contents? In either case I'm not sure if we have much control over this, although it might be possible with CSS; I think User:Erutuon or IP 70.* would know. Benwing2 (talk) 21:47, 29 January 2023 (UTC)[reply]
Yes @Benwing2 Contents that are a bit long vertically do not use a lot of space at their right. I mean, columns by language (L2), not random. Also pages with 2 or 3 languages, sometimes the same language like English, Old English. ‑‑Sarri.greek  I 22:04, 29 January 2023 (UTC)[reply]
I've been meaning to make a formal proposal regarding our Tables of Contents, specifically, to reorient them to a horizontal structure that only displays the language names and nothing else. If you habitually use the Tabbed Languages gadget, it's easy to forget just how utterly awful the default Table of Contents looks on long entries. Of course, if the new Vector 2022 skin is coming to Wiktionary soon, it would be a moot point: see fr:papa for an example of how this new skin displays the ToC. This, that and the other (talk) 09:35, 30 January 2023 (UTC)[reply]
This, that and the other, Benwing2: because I recently learnt that 'skins' (I didn't know what they were!) might change again and again over the decades, it would be nice, for some pages at least, to format the body text of a lemma ourselves, independently and unaffected by the skin, especially for the needs of a dictionary and/or a specific page. Just as a trial, only some pages could be presented in this manner, something like this: (example as in lemma σκληρός)

Would readers like it? PS, also, for pages with too many languages like te, a horizontal toclimit TOC could be designed? Thank you ‑‑Sarri.greek  I 10:02, 30 January 2023 (UTC)[reply]

@Sarri.greek I'd be all for this or something like it; the current table of contents is extremely wasteful of space. Benwing2 (talk) 00:13, 31 January 2023 (UTC)[reply]

The image used is only 136px and thus looks bad on hidpi displays. Would like it to be increased, say to 391px. Also necessary to remove most of the extra CSS, which was used to fix the size and positioning but is redundant and actually instead breaks the size and positioning when the image size changes. The skin already specifies background-position.

New code:

#p-logo a {
	background-image: url(//upload.wikimedia.org/wikipedia/commons/thumb/f/ff/WiktionaryEn.svg/391px-WiktionaryEn.svg.png);
}

Tbodt (talk) 07:01, 30 January 2023 (UTC)[reply]

This looks significantly better to me (the image can be viewed here: [3]). Any comments/objections/etc. to using it in place of the current logo? Benwing2 (talk) 00:15, 31 January 2023 (UTC)[reply]
Looks good to me. 70.172.194.25 15:43, 7 February 2023 (UTC)[reply]
Personally I think we should change the logo to the tiles, as the current logo looks bad. Benwing2 (talk) 03:17, 10 February 2023 (UTC)[reply]
Hi @Benwing2, this change broke the display of the tiles logo. Please revert it when you get the chance. —The Editor's Apprentice (talk) 20:49, 10 February 2023 (UTC)[reply]
@The Editor's Apprentice I suspect the extra CSS that was removed might fix things if put back but I know hardly anything about CSS so I don't know how to properly test it. Benwing2 (talk) 20:55, 10 February 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── There was a vote on the previous logo change, so I don't think there should be another change without a fresh vote. Personally, I voted in favour of the current logo and against the tiles one, which I thought looked old-fashioned. — Sgconlaw (talk) 21:02, 10 February 2023 (UTC)[reply]

Peer review for an improved version of Template:R:DARE

Hello! Template:R:DARE is currently a pretty bare-bones, generic reference to all of the volumes of the Dictionary of American Regional English (DARE). Since the DARE has the potential to be a very valuable resource for English-language editors, I worked to create a version, located at Template:R:DARE/sandbox, which is fully functional, i.e. can cite specific text, pages, and volumes, among other details. I am not entirely confident in my template coding abilities, though, so I am wondering if one or two people would be willing to take a look at the code as well as the test cases to make sure everything looks in order and I haven't made any obvious mistakes. Also, since the current name for the template is slightly nonstandard, after I get confirmation that things look good and I push the new version, I'll move everything to Template:R:Dictionary of American Regional English. Thank you very much for reviewing the new template code and its output, as well as for any feedback! Take care. —The Editor's Apprentice (talk) 17:29, 30 January 2023 (UTC)[reply]

@The Editor's Apprentice: happy to have a look when I find some time. — Sgconlaw (talk) 22:48, 30 January 2023 (UTC)[reply]
Thank you! —The Editor's Apprentice (talk) 06:10, 31 January 2023 (UTC)[reply]
Hey Sgconlaw, it has been a while since you first offered your help, and it seems you are still decently busy since you haven't found the time yet. If you think that is likely to remain the case for the near future, I'll just go ahead with making the changes. Thanks again for the offer and take care. —The Editor's Apprentice (talk) 23:05, 25 February 2023 (UTC)[reply]
Are all parameters optional? I hope there won't be messages requesting missing data for any of the parameters.
How is the online edition (paywalled, AFAICR) to be handled?
I actually paid for the house copies of DARE to be sent to me. I now have 3 duplicated volumes (older editions) which I would be happy to pack up for shipment if anyone wants them. DCDuring (talk) 00:51, 26 February 2023 (UTC)[reply]
Thanks for the response, DCDuring. Yes, all parameters are optional, so there should be no problems. The first testcase represents this situation. The template is not set up to handle the online edition, which you are right is paywalled. I don't think it should be, either; a separate template should exist for the online version, as is the case with Template:R:OED Online. If you have duplicate versions of the fifth and sixth volumes (the ones not currently on the Internet Archive website), you can donate them to the IA, which I'm sure they and many others would appreciate. If you're not interested in that, you can email me through Wiktionary and I would be happy to have them shipped to me. —The Editor's Apprentice (talk) 02:32, 26 February 2023 (UTC)[reply]
I don't have vol. 6. I have extra old editions of 1-3. IA app says they don't need vol 1 of my edition. More kindling. DCDuring (talk) 02:53, 26 February 2023 (UTC)[reply]
Unfortunate, so it goes. —The Editor's Apprentice (talk) 03:14, 26 February 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @The Editor's Apprentice: I finally remembered to review {{R:DARE/sandbox}} and have made some updates. — Sgconlaw (talk) 15:48, 4 March 2023 (UTC)[reply]

Wikipedia now has a "talk" tab instead of a "discussion" tab

I don't know when they renamed it, but I noticed it today. Will we/should we follow suit? Equinox 20:08, 30 January 2023 (UTC)[reply]

It was 11 years ago w:Special:Diff/470458069. Vriullop (talk) 20:42, 30 January 2023 (UTC)[reply]
Time flies when you're having fun. Equinox 20:45, 30 January 2023 (UTC)[reply]

Lua error: bad argument #1 to 'toNFD' (string expected, got nil)

This is happening on almost every single page with multiple languages or something (e.g. Korean 쓰다 (sseuda, bitter); (see?)).

What is going on? Chuterix (talk) 21:39, 31 January 2023 (UTC)[reply]

@Chuterix: @Theknightwho was engaging in inadequately checked micro-pessimisation (pointless normalisation) and fell foul of a bad arrangement of data (from/to pairs). The triggering problem seems to be the form of m["Kore-entryname"] in Module:languages/shareddata. --RichardW57m (talk) 10:22, 1 February 2023 (UTC)[reply]
@RichardW57m More accurately, I was engaged in ensuring that the Mandarin pinyin sort worked correctly due to the fact it involves double-substitutions that both rely on normalising to the NFD form first. This was explained in the edit summary. The issue was also solved very quickly, and the form of m["Kore-entryname"] is working exactly as intended. Please do not change it.
A word of advice: don’t jump to calling things pointless just because you don’t know why the change was made. It’s needlessly antagonistic, and you could have found the answer for yourself very quickly. Given that you clearly found the edit in question, I’m going to give you the benefit of the doubt that you weren’t being intentionally misleading, too. Theknightwho (talk) 10:36, 1 February 2023 (UTC)[reply]
@Theknightwho: You did not explain that the change was to address pinyin sorting. In general, for a chain of substitutions to work as intended, a substituted string as a whole has to be converted to NFD at each stage. Moreover, you should not need to convert deprecated substrings on each of the from and to items in the definition; simple :toNFD() should do. I confess I did misread the code; I thought line 16 of Module:languages was in the inner loop, which would have made conversion of the replacement string of the substitution unnecessary whatever it contained. --RichardW57m (talk) 13:11, 1 February 2023 (UTC)[reply]
Of course, replacing an NFD substring of an NFD string by another NFD string does not guarantee that the resulting string is in form NFD. (Just consider replacing a decomposed combining character with one of a different canonical combining class.) --RichardW57m (talk) 13:11, 1 February 2023 (UTC)[reply]
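The point about substitution not preserving NFD can be checked with a small example. The sketch below uses Python's standard `unicodedata` module rather than the Lua functions used in the modules, and the particular characters are chosen only for illustration: the original string, the substring being replaced, and the replacement are each in NFD, yet the result is not, because the replacement mark has a lower canonical combining class than the mark before it.

```python
import unicodedata


def is_nfd(s):
    """True if NFD normalization leaves s unchanged."""
    return unicodedata.normalize("NFD", s) == s


DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, combining class 220
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, combining class 230
CEDILLA = "\u0327"    # COMBINING CEDILLA, combining class 202

# Base letter + dot below (220) + acute (230): already in canonical order.
text = "q" + DOT_BELOW + ACUTE

# Substitute one NFD string (acute) with another NFD string (cedilla).
replaced = text.replace(ACUTE, CEDILLA)

# Renormalizing reorders the marks by combining class.
renormalized = unicodedata.normalize("NFD", replaced)

if __name__ == "__main__":
    print(is_nfd(text), is_nfd(ACUTE), is_nfd(CEDILLA))
    print(is_nfd(replaced))
    print([hex(ord(c)) for c in renormalized])
```

This is why a chain of substitutions that each expect NFD input has to renormalize the whole string between steps, as stated above.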
The from/to structures are in general wrong. There is a partial check that the 'from' lists are not shorter than the 'to' lists. However, some of the lists are so long that it is very difficult to work out what is being changed to what. This is a bad design. The inputs and outputs should be lined up. --RichardW57m (talk) 13:11, 1 February 2023 (UTC)[reply]
The way I found the edit is depressing. I just looked at the list of your contributions to see what you had recently been editing. While flooding cat:E is no longer so bad of itself, monitoring it seems to be an integral part of your testing process. I'm also surprised that there haven't been complaints about the frequent trashing of the page cache - changing the language and script modules invalidates most of the page cache for the main namespace. Perhaps this comes under the heading 'Wiktionary is not Wikipedia'. --RichardW57m (talk) 13:11, 1 February 2023 (UTC)[reply]
@RichardW57m The edit summary did not mention pinyin sorting specifically, but it did mention the exact problem I told you about. I have no patience for misleading pedantry, which you are knowingly engaging in by carefully specifying that I didn't mention pinyin sorting. "In general, for a chain of substitutions to work as intended, a substituted string as a whole has to be converted to NFD at each stage." Correct. That's why I made this change. Glad we could clear that one up. And no, it wouldn't "do" to only use toNFD(), because the fixed NFD form involves more than getting rid of deprecated character sequences.
You're more than welcome to set up a more rigorous testing environment, but the sad fact is that it can sometimes be very difficult to identify where errors like these will occur. Although a lot of pages were affected, the great majority were not. You're also more than welcome to design a new system of substitution, too, if this one isn't satisfactory to you. Rather than making this personal, I suggest you actually do something productive about it. I'm sure you'll have lots of fun with all the memory errors caused by trying to line things up, though, because I tried that one already and it simply isn't efficient enough. Never mind the fact that doing that makes it impossible to ensure the substitutions happen in a particular order without introducing even more inefficiencies. Have fun. Theknightwho (talk) 13:26, 1 February 2023 (UTC)[reply]
I don't have permission to edit these modules. I could start adding testcases, though. --RichardW57m (talk) 13:45, 1 February 2023 (UTC)[reply]
I have separated out the removal of discouraged character sequences from fixed normalization, so that discouraged sequences are only corrected once. The drawback to this is that it relies on nobody adding any to the character substitution tables, so depending on performance it might be worth doing it twice: at the start and end of the process (so that substitutions don't have to account for them, and to fix any added/created during the substitution process). Fixed normalization must be done each loop, however, because the reason it was created in the first place was to change character combining classes where the default causes problems. Currently, this is only done for the tsa-phru in the Tibetan script (e.g. without it, the transliteration of ཁུ༹་དཱ (xu dā) would break). However, I know there are other instances where we could (and should) do the same. Theknightwho (talk) 14:16, 1 February 2023 (UTC)[reply]
@Theknightwho: Where are we going to maintain a list of these renderer problems? For example, the Windows documentation of the pre-USE says TSA -PHRU should follow the vowel, but Word (2017?) on Windows 10 doesn't concur, while HarfBuzz-based renderers (e.g. modern MS Edge) have no problem with it. Similarly, HarfBuzz deals with Davis's error in Tai Tham, namely normalisation breaking the sequence SAKOT + consonant, and the less than optimal ordering of Hebrew pointing. A good place would be Module talk:scripts/data. --RichardW57 (talk) 09:03, 3 February 2023 (UTC)[reply]
For ཁུ༹་དཱ, the simple answer is that the transliteration should have been fixed. --RichardW57 (talk) 09:03, 3 February 2023 (UTC)[reply]
@RichardW57 We don’t need to worry about the renderer, because the MediaWiki software does its own internal NFC normalization on the output from the module before displaying it.
Initially I did do the fix for the tsa-phru in the Tibetan transliteration module, but I implemented a general solution because this combining class issue isn’t isolated. Plus, it means that the character substitutions for entry names, sortkeys etc can also take advantage of the fix. To give an example: any sortkey algorithms involving ཁ༹ (x) will need to use fixed NFD normalization, because otherwise the presence of a vowel would prevent it from working properly. As such, it’s used in Module:Tibt-translit and Module:Tibt-sortkey; both are complex enough as it is, without adding needless duplication. While this example could be accounted for with a simple regex, it would need to be accounted for specially in the module/substitution table, and it is unlikely that a normal editor would be aware of the issue. Indeed, until very recently we had been doing substitutions with the default NFC form, which meant many substitutions weren’t working as intended (or they were massively bloated to account for all the precomposed characters). We don’t want users to have to think about this stuff, because it’s a major avenue for error.
Module:scripts/data is already being used for the purpose you suggest. Take a look at Tibt to see how discouraged sequences and combining classes are handled. Which others do you think we should add? Theknightwho (talk) 18:13, 3 February 2023 (UTC)[reply]
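To make the tsa-phru case above concrete, here is a rough Python sketch of the idea behind "fixed" NFD normalization. It is not the implementation in Module:languages or Module:scripts/data: the function name `fixed_nfd` is hypothetical, and the rule used (move the tsa-phru back so that it directly follows the nearest preceding character of combining class 0) is an assumption made for illustration. Standard NFD places the vowel sign (class 132) before the tsa-phru (class 216), which separates the tsa-phru from its consonant and stops a plain substitution for ཁ༹ from matching.

```python
import unicodedata

KHA = "\u0F41"           # TIBETAN LETTER KHA
VOWEL_SIGN_U = "\u0F74"  # TIBETAN VOWEL SIGN U, combining class 132
TSA_PHRU = "\u0F39"      # TIBETAN MARK TSA -PHRU, combining class 216


def fixed_nfd(text):
    """NFD, then move each tsa-phru back past any preceding combining marks.

    Assumption for this sketch: the tsa-phru should sit immediately after
    the nearest preceding character whose combining class is 0.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch == TSA_PHRU:
            i = len(out)
            while i > 0 and unicodedata.combining(out[i - 1]) != 0:
                i -= 1
            out.insert(i, ch)
        else:
            out.append(ch)
    return "".join(out)


syllable = KHA + TSA_PHRU + VOWEL_SIGN_U
plain = unicodedata.normalize("NFD", syllable)  # vowel sign now precedes tsa-phru
fixed = fixed_nfd(syllable)                     # tsa-phru stays next to the consonant

if __name__ == "__main__":
    # A substitution for kha + tsa-phru only matches in the fixed form.
    print((KHA + TSA_PHRU) in plain)
    print(fixed.replace(KHA + TSA_PHRU, "x") == "x" + VOWEL_SIGN_U)
```

Because the output is normalized again when the page is rendered, as noted above, a reordering like this would only matter for the intermediate strings that substitutions operate on.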