
Wiktionary:Grease pit/2017/June

From Wiktionary, the free dictionary

Template:IPA letters - rhoticity

Template:IPA letters produces ɑː for the letter R. This should, I suppose, be ɑː(ɹ) or suchlike. Equinox 01:01, 2 June 2017 (UTC)

Only if Wiktionary adopts Wikipedia's trans-dialectal (diaphonemic) transcription system. Up till now, we haven't (though I suppose rhymes pages use some sort of diaphonemic system, or choose a particular dialect). The transcription of the above-mentioned vowel would be /ɑː(ɹ)/ for RP, /ɑɹ/ for General American, and /ɐː(ɹ)/ in Australia and New Zealand. — Eru·tuon 16:54, 2 June 2017 (UTC)
I've updated it. The other pronunciations, like /əʊ/, are also British, so this was obviously intended to be ɑː(ɹ). Ideally it would be updated to give multiple dialects' pronunciations, possibly through the use of separate Template:IPA letters/en-US and Template:IPA letters/en-UK subpages and through accepting en-US and en-UK as codes. - -sche (discuss) 18:21, 3 June 2017 (UTC)

Terms derived from Chinese

Why are the various "terms derived from Chinese" categories (e.g. CAT:Irish terms derived from Chinese) not subcategories of the corresponding "terms derived from Sinitic languages" categories (e.g. CAT:Irish terms derived from Sinitic languages)? Can and should it be fixed? —Aɴɢʀ (talk) 12:24, 2 June 2017 (UTC)

Chinese is a synonym of the Sinitic languages. —CodeCat 12:43, 2 June 2017 (UTC)
Then why do we have both? —Aɴɢʀ (talk) 13:26, 2 June 2017 (UTC)
The Chinese category is used when someone hasn't specified which Chinese language something is derived from, while the Sinitic languages category is an umbrella category in which the categories for particular Chinese languages are placed. It is illogical to have both, though, because they're synonymous... — Eru·tuon 16:51, 2 June 2017 (UTC)
Then I'd say we should merge them to "Chinese". —Aɴɢʀ (talk) 11:00, 3 June 2017 (UTC)

Oldest tagged RFVs

The lists of "Old tagged RFVs" atop the RFV pages are displaying "No pages meet these criteria." They should be fixed, since they're the main way we come to notice tagged-but-not-listed RFVs. As a separate matter, maybe a cleanup bot could periodically list unlisted RFVs... it wouldn't have to run often or represent a major time investment; even once every few months would work. - -sche (discuss) 18:11, 3 June 2017 (UTC)

It is now working for English. The problem lies in the split of the RfV page and the associated elimination of a single category for all RfVs. This approach never worked for RfV items that didn't have an RfV template.
The non-English RfV category has only subcategories, for each of which we could have an oldest-RfV list. There are solutions. One class of solutions would require a new category, e.g., for all non-English RfVs, or new categories, e.g., for all languages in a given family or in a given script or differentiated by nature of attestation. Another might be some dump-based solution. It is beyond my paygrade to even participate seriously any further in the discussion. DCDuring (talk) 19:04, 3 June 2017 (UTC)
  • @Daniel Carrero, was this you? —Μετάknowledgediscuss/deeds 14:04, 5 June 2017 (UTC)
    Thanks, DCDuring, for identifying the problem and solving it on the English page. There are two simple ways the remaining problem could be addressed. All RFVs could double-categorize into both the language-specific category and a general category of all RFVs (like the one they previously went into). This would have benefits besides just finding the oldest tagged RFVs, namely the same benefits as the per-language categories: it'd be there if someone wondered what terms needed verification and wanted to help cite them, for many languages and RFVs at once. The other approach, which is not mutually exclusive and which is more useful for this specific issue, is to have non-English RFVs double-categorize into an "RFVs (non-English)" category, to feed the Oldest list on the non-English page. If it would be too difficult to implement the idea of categorizing all non-English RFVs, or conceptually untidy/undesirable, I could just add an "all RFVs" category. - -sche (discuss) 15:38, 5 June 2017 (UTC)
    I fixed Wiktionary:Requests for verification/Non-English. See if you like this approach or if you would change something. --Daniel Carrero (talk) 15:45, 5 June 2017 (UTC)
    Your solution seems to require constant maintenance, obliging people to notice and update that template whenever a term in a new language is RFVed (or do I misunderstand?). But one point of the Oldest Tagged list is to catch cases that people don't notice have been RFVed. So I suggest we use one of the two double-categorization approaches I outlined above. (Perhaps your template could be updated to fetch the contents of Module:languages'(s) data modules and check every language's RFV category, so as to catch whenever a code is added, whenever a term in a new language is RFVed, etc., but that seems like a very unnecessary and probably expensive use of Lua for something {{rfv}} could do by just automatically inserting a certain category.) - -sche (discuss) 19:38, 5 June 2017 (UTC)
    I see. What about using "CategoryTree"? I added an example here in the discussion now. This way the list would still be organized by language, and it would be updated in real time as new language categories are created. It would be nice if we could use JavaScript to replace "Requests for verification in Volapük entries" with just "Volapük", and do the same for all languages. --Daniel Carrero (talk) 20:31, 5 June 2017 (UTC)
    @Daniel Carrero: I've created a JavaScript function that does that; see User:Erutuon/scripts/sandbox.js. It does make the list much easier to read. — Eru·tuon 20:52, 5 June 2017 (UTC)
    @Erutuon: Thanks! I added the categorytree in WT:RFD, WT:RFC and WT:RFV. Do you think you could update the JS code to do the same for the other two? Note that RFD and RFC also have a "language code missing" category at the start, but RFV doesn't (because you can use RFD and RFC in entries without a language code, but you can't use RFV without a language code without triggering a module error).
    Your code worked when I added it to User:Daniel Carrero/common.js, but I basically don't understand any JS, and I don't know whether we need to edit MediaWiki:Common.js somehow to implement it properly for everyone to see. --Daniel Carrero (talk) 21:22, 5 June 2017 (UTC)
    @Daniel Carrero: Yep, I've modified it to do the same for RFD and RFC categories. The "language code missing" category could perhaps stand to be shortened, but I'm not sure what to shorten it to. — Eru·tuon 21:30, 5 June 2017 (UTC)
    Oh, the code is working now. Maybe we could replace "Requests for (deletion, cleanup) with the language code missing" with just "language code missing". I also implemented a page count in all the categories. Do you think you could use JS to remove the "0 c," from all the categories? A category like Category:Requests for deletion in Lithuanian entries obviously doesn't have any subcategories. --Daniel Carrero (talk) 21:42, 5 June 2017 (UTC)
    The sandbox script now removes the useless "0 c, ". It will probably do that to any category tree generated in the same way as the one above, though it doesn't do anything on category pages like Category:English nouns; those must use different classes. — Eru·tuon 22:53, 5 June 2017 (UTC)
    Thanks, it's working perfectly. --Daniel Carrero (talk) 23:07, 5 June 2017 (UTC)
  • To retain some focus on problematic older entries, perhaps each language with more than, say, 20 entries in one of the categories could use the "oldest" listing, using the approach now used in RfV/English. DCDuring (talk) 23:24, 5 June 2017 (UTC)
    I believe it's not possible to use "categorytree" to focus on the older entries anymore, because of the change in categories. Technically, all the "older" entries in Category:Requests for verification in English entries are the ones that were added when I created the new category on May 18, 2017. --Daniel Carrero (talk) 03:58, 6 June 2017 (UTC)
    Well, right now all the entries in each category were added at about the same time (when you created the categories, as you said), but several months from now, when old RFVs have been dealt with and new ones have come up, it will once again be possible to sort (a category) by age, won't it? - -sche (discuss) 15:59, 7 June 2017 (UTC)
    That is correct. --Daniel Carrero (talk) 16:02, 7 June 2017 (UTC)
I consider it suboptimal that I have to click to expand each language's category to see its entries; however, assuming that the sum of all those language categories contains all of the entries which would be contained in a (list of the oldest RFVs in a) single category of all non-English RFVs similar to the one we previously had (which was for all RFVs, before WT:RFV was split by language), this is adequate; thank you for your help. I do still think we could benefit from restoring a general category for all RFVs to double-categorize into, so that the oldest ones (independent of language) can be found. (Likewise for RFCs and RFDs.) - -sche (discuss) 15:59, 7 June 2017 (UTC)
I'm pretty sure we could leave the list completely un-collapsed by default, but assuming we want that, I'm not sure how to do it.
Here's an idea for a different categorization approach, though it would require constant maintenance. Wikipedia has cleanup categories organized by date, such as w:Category:Articles with unsourced statements from February 2017. I guess we could have categories for each year (the month is not necessary because we don't have that many requests to deal with, I guess). If we created categories like Category:Requests for verification from 2017 (and even Category:Requests for etymology from 2017, Category:Requests for pronunciation from 2017, you get the point), maybe some bot could constantly add the correct year in entries and automatically create new categories each year? I could help by making {{auto cat}} usable in these categories. I could also fill the categories by checking the current entries with RFVs and perhaps other types of request. --Daniel Carrero (talk) 16:16, 7 June 2017 (UTC)
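The label clean-up that Erutuon's sandbox script performs in JavaScript is plain string rewriting. Below is a minimal Lua sketch of the same idea, for illustration only: the function names are hypothetical, and the CategoryTree count format (assumed here to look like "(0 c, 12 e)") is an assumption rather than something taken from the script.

```lua
-- Hypothetical helper: turns "Requests for verification in Volapük entries"
-- into "Volapük"; any other category name is returned unchanged.
local function shorten_label(category_name)
  local lang = category_name:match("^Requests for %a+ in (.+) entries$")
  return lang or category_name
end

-- Hypothetical helper: drops a leading "0 c, " (zero subcategories) from a
-- page-count label. The "(N c, M e)" format is an assumption.
local function strip_zero_subcats(count_label)
  local result = count_label:gsub("^(%(?)0 c, ", "%1")
  return result
end
```

The same pattern covers the RFD and RFC categories, since "%a+" matches "deletion" and "cleanup" as well as "verification".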

Bug in inserting IPA characters

If you edit a page, and then delete a character and then immediately click on one of the IPA symbols under the "IPA and enPR" section at the bottom of the page (at least, this happens with ɛ), it inserts the character one character to the right of where it should be. This happens to me consistently on Chrome under Mac OS X 10.9. It doesn't happen if the last thing you did before clicking on the IPA symbol is to insert rather than delete a character. Do people see this on other systems? Someone should file a Phabricator bug if there isn't one already (I'm still not quite sure how to do that). Thanks! Benwing2 (talk) 19:19, 3 June 2017 (UTC)

Flags for Hijazi Arabic and Najdi Arabic

See MediaWiki_talk:Gadget-WiktCountryFlags.css#Hijazi_and_Najdi for more details. --Lo Ximiendo (talk) 02:31, 4 June 2017 (UTC)

Citations at citations

Why do we bother putting {{citations}} at the top of every Citations page? I reckon it should be automatically included by the software. --Celui qui crée ébauches de football anglais (talk) 11:17, 4 June 2017 (UTC)

Well, we still have to specify the language the citations are in, and we need a template to handle that. —Μετάknowledgediscuss/deeds 14:03, 5 June 2017 (UTC)
True. But on a related note which I've mentioned before, why do we put {{reconstruction}} atop most reconstruction pages? Could that be done by a MediaWiki page or JS? Or a bot... - -sche (discuss) 15:44, 5 June 2017 (UTC)
I think we can do that if we install the PageNotice extension. - [The]DaveRoss 18:31, 5 June 2017 (UTC)
Well, that sounds like a good idea, then. Before we set up a vote on installing that extension, does anyone have any comments or see any obvious problems? - -sche (discuss) 19:42, 5 June 2017 (UTC)
@-sche, if you still feel this requires a vote to install, did you want to go ahead with that? --Victar (talk) 19:27, 24 September 2018 (UTC)
Assuming the extension will indeed do what we want, I'm in favour of it and am fine with doing it without a vote. The people over on Phabricator may want some kind of show of consensus, though. - -sche (discuss) 21:45, 24 September 2018 (UTC)
Hard to say it's exactly what we need without giving it a go, but it looks to fit the bill. There is also this discussion I started, which brings the in-favor count to 6 (@Metaknowledge, TheDaveRoss, -sche, Victar, Fay Freak, Erutuon). --Victar (talk) 02:56, 26 September 2018 (UTC)
I am indeed in favour, but I'm not clear on where your count is coming from — when did I say that? —Μετάknowledgediscuss/deeds 03:13, 26 September 2018 (UTC)
I didn't indicate support either, or rather I did and then deleted it because it was premature. I would support it if nobody brings up any issues. But this kind of question should be raised on a current discussion page where more people might respond, probably the Beer Parlour.
I agree with -sche; Phabricator folks often ask for a link to a discussion.
Hm, Rua proposed this last year (Wiktionary:Beer parlour/2017/September § Proposal: install mw:Extension:PageNotice), but nobody responded. There's an old Phabricator task that we could piggyback on: Review the PageNotice extension for deployment. They mention the extension's usefulness in the Draft namespace (a Wikipedia thing); to that we can add the Reconstruction namespace. — Eru·tuon 03:52, 26 September 2018 (UTC)
I suppose "in favor" was too strong a word; perhaps non-objecting would be better. --Victar (talk) 00:09, 26 September 2018 (UTC)
Support: I support the installation of mw:Extension:PageNotice. --Victar (talk) 05:16, 26 September 2018 (UTC)
I will support, as long as due diligence is done to ensure that it will accomplish what we want. I only briefly looked into it, and it seemed like it might work; it would be good to know that it will work. - TheDaveRoss 13:31, 26 September 2018 (UTC)
@TheDaveRoss I don't know if we can really know to what extent it works for our purposes until we give it a try. --Victar (talk) 16:09, 30 September 2018 (UTC)

rel-top columns messed up the 2015 NORM vote

For the record, the recently added automatic columnization (?) of {{rel-top}} messed with this 2015 vote: Wiktionary:Votes/pl-2015-11/NORM: 10 proposals.

All the collapsed parts of the vote are formatted as two columns now. They were normal text without columns before. --Daniel Carrero (talk) 12:21, 5 June 2017 (UTC)

Deletion of {{script}}

Why was {{script}} deleted? I think it would be really useful to be able to type {{sc|Mani}} and get Manichaean. Yes, I'm aware that {{sc}} is currently for {{smallcaps}}. @CodeCat, Daniel Carrero? --Victar (talk) 03:12, 6 June 2017 (UTC)

Why would that be useful? It isn't a significant change in character count, and it isn't likely to change frequently over time, so why not just type the name of the script? - [The]DaveRoss 18:43, 6 June 2017 (UTC)

Memory Weirdness at

How is it that this diff pushes the memory usage from 44.6 MB to over the 50 MB limit? If I edit and preview the section containing the template, I see "Lua memory usage 7.95 MB/50 MB" in the parser profiling data table. If I then remove that template and preview again, I get "Lua memory usage 7.01 MB/50 MB".

0.94 MB ≠ 5.4 MB+ Chuck Entz (talk) 03:48, 6 June 2017 (UTC)

Weird. I even added as an exception in Template:redlink category and then did a "hard purge" on the diff, but this did not stop the memory errors. --Daniel Carrero (talk) 03:54, 6 June 2017 (UTC)
Well, that template uses Module:zh-cat, which uses Module:zh; perhaps the problem lies there. Perhaps it would help if Module:zh were split into smaller modules. — Eru·tuon 15:47, 6 June 2017 (UTC)
Never mind; Module:zh would already be invoked in any Chinese entry. — Eru·tuon 15:49, 6 June 2017 (UTC)
I went through many of the Chinese modules and removed global variables. I hoped that it would help with the memory error, but apparently not. — Eru·tuon 18:26, 6 June 2017 (UTC)

Header vs headword-line POS mismatch

I wrote some probably inefficient, possibly horrifyingly inelegant, case-sensitive regex which I used to search the most recent database dump and find 1666 entries where the part-of-speech header gave one part of speech but the headword-line template gave another. I cleaned up many pages which matched this same regex in 2014, so it appears we're seeing an edit or two a day (on average) introducing a header-headword POS mismatch. Would it be worthwhile to have an edit filter check for this in real time? Would it be overly expensive? My regex can probably be improved upon; perhaps it can be made case-insensitive by adding = at the start of each header and by ignoring entries with nonstandard spaced headers (=== Noun ===), although the current approach also catches cases of "Proper Noun". It does not find all cases of mismatch, just common ones. (A related but separate approach would be to monitor for edits that introduce a header without also introducing a corresponding headword-line template.) If the filter works, it could even be updated to warn users against the edits. - -sche (discuss) 16:55, 7 June 2017 (UTC)

As to the regex, the header part could easily be simplified: Noun(===| ===|====| ====) could become Noun ?====?. So could the language code part: (..|...) could become [^\|]+. (The latter would also allow it to catch the exceptional codes from Module:languages/datax.) — Eru·tuon 17:14, 7 June 2017 (UTC)
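Lua patterns have no alternation, so a group like (===| ===|====| ====) cannot be written directly; the same check can instead be sketched by walking the wikitext line by line. This is a minimal illustration, not the regex actually run on the dump: the header-to-POS table is hypothetical and deliberately small, and it assumes headword templates name the POS after a hyphen ({{en-noun}}) or in the second parameter of {{head}}.

```lua
-- Hypothetical, deliberately incomplete map from POS header to the POS
-- word expected in the headword-line template.
local EXPECTED = {
  ["Noun"] = "noun",
  ["Proper noun"] = "proper noun",
  ["Verb"] = "verb",
}

-- Set of POS words we know how to compare; other templates are ignored.
local KNOWN = {}
for _, pos in pairs(EXPECTED) do
  KNOWN[pos] = true
end

-- Extracts the POS from {{head|xx|noun...}} or from the word after a hyphen
-- in a template name such as {{en-noun}} or {{en-proper noun}}.
local function headword_pos(line)
  local pos = line:match("^{{head|[^|}]+|([^|}]+)")
  if pos then
    return pos
  end
  return line:match("^{{.-%-([%a ]+)[|}]")
end

-- Returns {header = ..., template_pos = ...} for each POS header whose first
-- recognised headword template names a different POS.
local function find_mismatches(wikitext)
  local mismatches = {}
  local header -- most recent header whose headword line is still unchecked
  for line in (wikitext .. "\n"):gmatch("(.-)\n") do
    -- " ?" spacing around the title is tolerated, as in "=== Noun ===".
    local h = line:match("^=+%s*([^=]-)%s*=+%s*$")
    if h then
      header = h
    elseif header and EXPECTED[header] then
      local found = headword_pos(line)
      if found and KNOWN[found:lower()] then
        if found:lower() ~= EXPECTED[header] then
          mismatches[#mismatches + 1] = { header = header, template_pos = found }
        end
        header = nil -- only the first recognised headword line is checked
      end
    end
  end
  return mismatches
end
```

Like the original regex, this is a heuristic: it only catches the mismatches it has been told about, and templates it does not recognise are skipped rather than reported.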

Wikidata not working?

To try it out, I queried a single Wikidata item at Module:User:CodeCat. However, I get an error "attempt to index field 'wikibase' (a nil value)". I thought Wikidata access was enabled for Wiktionary already? —CodeCat 17:29, 11 June 2017 (UTC)

See phab:T159316. Wikidata is expected to work sometime, but it's not working yet. --Daniel Carrero (talk) 17:43, 11 June 2017 (UTC)
Ah, so we only have to wait for someone to enable it then? —CodeCat 17:46, 11 June 2017 (UTC)
I believe the next step of the big Wikidata plan is to enable custom interwikis. This is phab:T158323. The interwiki feature will be enabled on June 20th, as said in Wiktionary:Beer parlour/2017/June#Enable sitelinks on Wikidata for Wiktionary pages (outside main namespace).
Apparently this has to be done first, before implementing normal Wikidata queries ("arbitrary access") as described in phab:T159316. I don't know when they'll do it, but it's marked "ready to go". --Daniel Carrero (talk) 17:53, 11 June 2017 (UTC)
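The error CodeCat saw means the mw.wikibase table does not exist at all until access is enabled, so indexing it throws. A minimal sketch of a guard that fails gracefully instead: it takes the mw table as a parameter (an assumption made only so the sketch can run outside Scribunto with a stub) and calls mw.wikibase.getLabel, which the Wikibase client's Lua library provides once enabled. The function name is hypothetical.

```lua
-- Returns the label of a Wikidata item, or nil plus a message when the
-- Wikibase client library is not available on this wiki.
-- `mw_env` is normally Scribunto's global `mw`; passing it in lets the
-- sketch be tested with a stub table.
local function get_item_label(mw_env, item_id)
  local wikibase = mw_env and mw_env.wikibase
  if not wikibase then
    return nil, "Wikidata access (mw.wikibase) is not enabled on this wiki"
  end
  return wikibase.getLabel(item_id)
end
```

On-wiki the call would be get_item_label(mw, "Q42"); until arbitrary access is switched on, it returns nil and the message instead of raising a script error.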

Template:place problem with Parijs

I just converted the entry Parijs to use {{place}}, but now the category Category:nl:Cities in France has disappeared from the page. It is the capital city of France, so isn't it also a city in France by definition? —CodeCat 18:05, 11 June 2017 (UTC)

It's a known bug (Template_talk:place#Capital_cities). The whole thing needs to be rewritten. DTLHS (talk) 18:13, 11 June 2017 (UTC)

Requested change to Template:trans-top

Partially related: Template talk:trans-top#link to wikidata item

Right now, on the first line, there is this:

{{#switch:{{{1|}}}||Translations to be checked=|id{{=}}"Translations-{{anchorencode:{{{1}}}}}"}}

I'm requesting that this be changed to:

{{#ifeq:{{{1|}}}|Translations to be checked||{{#if:{{{id|{{{1|}}}}}}|id{{=}}"Translations-{{anchorencode:{{{id|{{{1}}}}}}}}"}}}}

This change will allow the id= parameter on translation tables, so that each table can be matched to the corresponding senseid on an individual sense. It will create an anchor on the page with the format Translations-{{{id}}}, so that the translation table can be directly linked to. It is probably desirable to, eventually, remove {{{1}}} from the anchor (if no id is present), so that only ids are used as anchors and not glosses. Moreover, it would be nice if there was a small link to the corresponding senseid in the top bar. —CodeCat 11:51, 12 June 2017 (UTC)

Done. It is probably good to have a separate |id= parameter. — Eru·tuon 19:38, 13 June 2017 (UTC)
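To make the behaviour of the requested #ifeq/#if code explicit, here is the same decision logic as a small Lua function. It is only an illustration, not part of the template: the function names are hypothetical, and anchorencode below is a deliberately simplified stand-in for the real {{anchorencode:}} parser function.

```lua
-- Simplified stand-in for {{anchorencode:}}: it only replaces spaces with
-- underscores, whereas the real parser function handles more characters.
local function anchorencode(s)
  return (s:gsub(" ", "_"))
end

-- Mirrors the requested template logic: `gloss` is {{{1}}}, `id` is {{{id}}}
-- (nil when the parameter is not given at all). Returns the anchor or nil.
local function trans_top_anchor(gloss, id)
  gloss = gloss or ""
  if gloss == "Translations to be checked" then
    return nil
  end
  -- {{{id|{{{1|}}}}}}: a given id wins, even when empty; otherwise the gloss.
  local key = id
  if key == nil then
    key = gloss
  end
  if key == "" then
    return nil
  end
  return "Translations-" .. anchorencode(key)
end
```

Note that, as in the wikitext, a table titled "Translations to be checked" gets no anchor even when |id= is supplied.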

preg_match_all equivalent

Basic question. What's the equivalent of preg_match_all in Lua? I was hoping that I could create a table/array like so: mw.ustring.match(content, "\n=+[^=]*=+[^=]*"). --Victar (talk) 14:32, 12 June 2017 (UTC)

@Victar: There isn't any equivalent. But I created the function matchToArray in Module:string, which hopefully will work for that purpose. — Eru·tuon 17:48, 12 June 2017 (UTC)
Either that, or if you plan to do something to each match that doesn't require knowing which number match it was, you can use a for loop with mw.ustring.gmatch. — Eru·tuon 18:12, 12 June 2017 (UTC)
@Erutuon: Thanks again! Man, you'd think that would be a basic function in Lua. --Victar (talk) 18:19, 12 June 2017 (UTC)
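A minimal sketch of such a helper, built on string.gmatch (on-wiki, mw.ustring.gmatch would be the choice for non-ASCII patterns). It is not necessarily how matchToArray in Module:string is implemented; the name match_to_array is hypothetical.

```lua
-- Collects every match of `pattern` in `text` into an array, roughly what
-- PHP's preg_match_all does. If the pattern contains captures, gmatch yields
-- the captures instead of the whole match; only the first one is kept here.
local function match_to_array(text, pattern)
  local results = {}
  for match in string.gmatch(text, pattern) do
    results[#results + 1] = match
  end
  return results
end
```

When the index of each match is not needed, the for loop over gmatch can be used directly, as Eru·tuon notes, without building the table at all.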
@Erutuon: I'm pretty sure I'm asking for the impossible, but I'm trying to use this to create an array with one entry per section on a page: local content = require("Module:string").matchToArray(content, "\n=+[^=]*=+[^=]*"). I need this part, [^=]*, to match everything and stop at two consecutive equals signs (==). I thought maybe [^={2}]* or .*(==), but no luck. Again, I assume this is impossible, but I thought I'd ask. --Victar (talk) 03:38, 13 June 2017 (UTC)
@Victar: If you want to stop at just two equals signs, I think .-==[^=] would work. .- is a non-greedy quantifier, equivalent to JavaScript .*?. — Eru·tuon 03:44, 13 June 2017 (UTC)
@Erutuon: \n=+[^=]*=+.-==[^=] is better, but it's stealing the first ='s and letter of the next section. You can find my test here. --Victar (talk) 04:43, 13 June 2017 (UTC)
@Victar: Well, I found a solution that grabs the content of the Descendants section. Was that what you wanted to do, or did you want to grab the whole rest of the language section below the Descendants header? — Eru·tuon 04:54, 13 June 2017 (UTC)
@Erutuon: I need to grab the section header and the content until the next section header. --Victar (talk) 04:58, 13 June 2017 (UTC)
@Victar: You can move the parentheses to capture whatever you want. — Eru·tuon 05:03, 13 June 2017 (UTC)
@Erutuon: Cool, but the problem with matching content that I'm not including is that it's taking it from the next array item, which causes every other section to be skipped. --Victar (talk) 05:09, 13 June 2017 (UTC)
@Victar: Ouch. I can't think of a way to fix that, except by coming up with an entirely different approach to finding the Descendants section. — Eru·tuon 05:15, 13 June 2017 (UTC)
@Erutuon: If I just wanted to grab the Descendants section, that would be a breeze, but I'm trying to do something like GET KEY_1 of MATCH for "{{senseid|xxx}}" THEN FIND MATCH for "Alternative forms" with KEY_2 +/-2 from KEY_1. --Victar (talk) 05:35, 13 June 2017 (UTC)
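The skipping happens because each match consumes the start of the following section. One way around it, sketched here under the assumption that a section runs from one header line to the next header of any level, is to locate only the header positions first and then slice the text between them, so nothing is consumed twice. The function name is hypothetical, and this uses plain string functions rather than mw.ustring.

```lua
-- Splits wikitext into an array of sections. Each entry records the header
-- level (number of "=" signs), the trimmed title, and the text running from
-- the header line up to, but not including, the newline before the next header.
local function split_sections(wikitext)
  local text = "\n" .. wikitext -- lets a header on the first line match too
  local headers = {}
  local init = 1
  while true do
    -- "(=+)" captures the opening equals signs; "%1" demands the same run
    -- after the title, so "===Noun===" is read as a level-3 header.
    local s, e, eq, title = string.find(text, "\n(=+)([^=\n]+)%1", init)
    if not s then
      break
    end
    headers[#headers + 1] = {
      start = s + 1, -- skip the leading newline
      level = #eq,
      title = title:match("^%s*(.-)%s*$"),
    }
    init = e + 1 -- the header's trailing newline stays available to the next search
  end

  local sections = {}
  for i, h in ipairs(headers) do
    local next_header = headers[i + 1]
    -- next_header.start - 1 is the newline before the next header.
    local stop = next_header and (next_header.start - 2) or #text
    sections[i] = { level = h.level, title = h.title, text = text:sub(h.start, stop) }
  end
  return sections
end
```

With the sections in an array, the index of the section containing a given {{senseid|...}} can be found with string.find on each entry's text, and neighbouring sections (for example one with the title "Alternative forms") can then be checked at nearby indices.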

Automatic Palindromes and नून

The automatic palindrome categorizer is good, but seems to be incorrect for abugidas. नून (nūn) isn't a palindrome, because backwards it would be ननू (nanū). Things like ननून, नूनू, प्रतंप्र, etc. would be considered palindromes. DerekWinters (talk) 19:24, 12 June 2017 (UTC)

Is there a rule that can be added to Module:palindromes/data to fix it? DTLHS (talk) 19:28, 12 June 2017 (UTC)
I think the reason why it's considered a palindrome is that नून consists of three characters, न ू न, and Module:palindromes determines palindromes based on individual characters, not combinations of letter and diacritic. It can be made to find letter plus diacritic combinations instead. (More technically, it has to put letter plus diacritic combinations into slots in the table that it uses, rather than individual characters.) — Eru·tuon 19:41, 12 June 2017 (UTC)
That's a language-independent process though, isn't it? If we know in advance which characters need to combine. Can we use our Unicode database for this? —CodeCat 19:45, 12 June 2017 (UTC)
It might be possible, but it would be more complicated than simply doing it for one script where there's a relatively small number of combining diacritics. With one script it would be fairly simple to make a pattern to search for letter–diacritic combinations, whereas the full list of Unicode combining characters would be huge and might require a more complicated function that uses the is_combining function in Module:Unicode data. — Eru·tuon 20:37, 12 June 2017 (UTC)
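A minimal sketch of the single-script approach Eru·tuon describes: group each Devanagari letter with its following diacritics into one slot, then compare slots instead of characters. Assumptions: the list of combining marks below is a hand-picked subset of the Devanagari block (U+0900 to U+097F) rather than data from Module:Unicode data, and a consonant that follows a virama (U+094D) is attached to the preceding cluster so that a conjunct like प्र counts as one unit. This is plain Lua byte handling, not the code of Module:palindromes.

```lua
-- Decodes one UTF-8 encoded character (1 to 4 bytes) to its code point.
local function codepoint(ch)
  local b1, b2, b3, b4 = ch:byte(1, 4)
  if b1 < 0x80 then
    return b1
  elseif b1 < 0xE0 then
    return (b1 - 0xC0) * 0x40 + (b2 - 0x80)
  elseif b1 < 0xF0 then
    return ((b1 - 0xE0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)
  end
  return (((b1 - 0xF0) * 0x40 + (b2 - 0x80)) * 0x40 + (b3 - 0x80)) * 0x40 + (b4 - 0x80)
end

local VIRAMA = 0x094D

-- Hand-picked Devanagari combining marks: signs such as anusvara, nukta,
-- vowel signs, virama and stress marks (an assumed subset).
local function is_devanagari_mark(cp)
  return (cp >= 0x0900 and cp <= 0x0903)
    or (cp >= 0x093A and cp <= 0x093C)
    or (cp >= 0x093E and cp <= 0x094F)
    or (cp >= 0x0951 and cp <= 0x0957)
    or (cp >= 0x0962 and cp <= 0x0963)
end

-- Splits a string into clusters: a base character plus its following marks;
-- a character right after a virama joins the previous cluster (conjuncts).
local function clusters(s)
  local out, prev = {}, nil
  for ch in s:gmatch("[\1-\127\194-\244][\128-\191]*") do
    local cp = codepoint(ch)
    if #out > 0 and (is_devanagari_mark(cp) or prev == VIRAMA) then
      out[#out] = out[#out] .. ch
    else
      out[#out + 1] = ch
    end
    prev = cp
  end
  return out
end

-- Compares clusters from both ends instead of individual characters.
local function is_palindrome(s)
  local c = clusters(s)
  for i = 1, math.floor(#c / 2) do
    if c[i] ~= c[#c - i + 1] then
      return false
    end
  end
  return true
end
```

With this grouping, नून becomes the two slots नू and न, so it is no longer reported as a palindrome, while ननून, नूनू and प्रतंप्र still are.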

more pages exceeding the memory limit

fire now runs out of memory, following the addition of two Tamil translations. This lends support to the point, also made by some on Phabricator, that our auto-transliteration modules seem to be among the main culprits behind the errors. (They demonstrably are culprits behind a large chunk of the errors that plagued water.) As more and more pages run out of memory, perhaps we should rethink whether translations really need to provide transliterations. - -sche (discuss) 06:08, 13 June 2017 (UTC)[reply]

I'd rather move the translations to a subpage /translations than remove transliterations, which are very helpful. However, even that solution will run into problems in the future, when the translations themselves go over the 50MB limit. — Eru·tuon 06:57, 13 June 2017 (UTC)[reply]
@Erutuon In that case, how about /translations (A) for languages, whose names start with the letter A (for example, Arabic); /translations (B) for language names starting with B, and so forth. --Lo Ximiendo (talk) 07:04, 13 June 2017 (UTC)[reply]
If I remove any 2 translation languages with transliterations then the page renders fine. It seems that most weird transliteratins are Korean and Thai. If I remove Korean translations then it's fine. If I fill the Korean transliteration in the translation template then it still uses 4 modules for Korean and it breaks. There is something inefficient there calling modules that it really don't use at the end. If that is fixed then a bot could subst missing tr parameters avoiding large upload of translit modules. The cons is to refresh them after an update in translit modules. --Vriullop (talk) 09:14, 13 June 2017 (UTC)[reply]
I think we should stop using modules for static text which is unlikely to change much or often, like transliterations. Make the script code and transliteration into parameters and only call the module when there is no value provided. As Vriullop says, a bot can do the work to keep them up to date. - [The]DaveRoss 11:15, 13 June 2017 (UTC)[reply]
Module:links generates a transliteration whenever a transliteration module is available, in order to compare manual to automated transliterations, so providing a manual transliteration actually uses slightly more memory. This problem of escalating memory usage is fairly recent- a matter of a few months- so we should look at changes made this year to see if any of them have side effects on memory usage. Chuck Entz (talk) 13:01, 13 June 2017 (UTC)[reply]
It may do that, it doesn't need to do that. - [The]DaveRoss 14:04, 13 June 2017 (UTC)[reply]
I second that: checking transliterations is obviously too taxing. Ideally the transliterations for a word should be created once and stored in the page (or in the Future: in Wikidata). It can be automated (by bot or with a gadget or an extension). — Dakdada 15:06, 13 June 2017 (UTC)[reply]
I agree we should stop using modules to automatically provide transliterations in translations (if not also in links), and "subst" them in by bot. (@Dakdada, accessing Wikidata is expensive and relatively slow, so that would not improve things over what is currently done, and might make things worse. And, of course, transliterations vary between wikis. It would be better to store transliterations directly on the page.) So, we need to (1) rewrite {{t}} to stop expensively checking the auto-translit against the manual translit, and (2) "subst" the auto translits into all entries. - -sche (discuss) 15:46, 13 June 2017 (UTC)[reply]
Saying "use a bot" to solve all our problems is intellectually lazy. Whose bot? How often is it expected to be run? What happens if the bot runner leaves the project or dies? How will the bot code be updated if we decide to change a formatting detail? DTLHS (talk) 15:51, 13 June 2017 (UTC)[reply]
For translations, I would think that a bot could run once to subst all existing {{t}}s, and then perhaps the translations-adder script could be changed to fetch and add (spelled out) the transliteration that the automatic transliteration module provides. Perhaps we should require the code of the bot to be available, something we did not require of some previous AutoFormat-esque bots whose users subsequently became much less active or were globally banned.
But even just step 1, rewriting {{t}} so that it does not compare a manual transliteration to the automatic one, together with the addition of manual transliterations to the {{t}}s in the entries that are currently broken, would fix most of the breakages we are seeing by stopping the invocation of the expensive translit modules. Even just {{t-simple}}, if it never invokes a transliteration module and only provided a transliteration when one was manually given, ought to work... - -sche (discuss) 16:20, 13 June 2017 (UTC)[reply]
Shared bot projects are possible if the toolserver is used. They can be maintained by a group and can be run against either the replica databases or via the API. - [The]DaveRoss 17:03, 13 June 2017 (UTC)[reply]
Another test in fire page, on previous version without t-simple: if I remove translations with tr parameter then the page renders fine with Lua memory usage 47.64 MB/50 MB. With current version with some translations changed to t-simple it uses 49.93 MB. To compare manual and auto transliterations is really a waste of resources. It is fine in a language by language basis for checking purpouses but it should be temporary and it should be switched off when the task is finished or nobody is checking it. If the translation-adder script adds tr parameter then maybe a bot is not needed. --Vriullop (talk) 17:06, 13 June 2017 (UTC)[reply]

I've added manual transliterations for the Korean and Thai words on the page fire with {{subst:xlit|lang|term}}. Now transliterations are shown, but they don't use any Lua memory. — Eru·tuon 17:10, 13 June 2017 (UTC)[reply]

Fantastic. Hmm, would it be possible to update your t-simplifier gadget to convert translations in non-Latin scripts, adding the (manual) translit and script parameters? - -sche (discuss) 19:20, 13 June 2017 (UTC)[reply]
Probably. I just haven't done it yet because I imagine it will be complicated. I should, though. Once made, it will simplify things bigly. — Eru·tuon 19:27, 13 June 2017 (UTC)[reply]

{{t-simple}} now supports interwiki links. Turn them on with |interwiki=1. — Eru·tuon 19:28, 13 June 2017 (UTC)[reply]

Okay, I have an idea regarding the transliteration comparison, mentioned above: how about disabling the transliteration comparison on particular pages that are running into memory errors? So, on water and such pages, {{t}} would only invoke a transliteration module if no |tr= parameter was provided. — Eru·tuon 19:16, 15 June 2017 (UTC)[reply]

I prefer a solution which will solve the problem rather than playing whack-a-mole. As a stop-gap I think your solution is reasonable. - [The]DaveRoss 17:31, 16 June 2017 (UTC)[reply]
We can add e to the list, as reporting by an IP on the talk page of the March 2017 Grease Pit (for some reason). The longest running template in that case is {{head}} which hasn't been changed lately. - [The]DaveRoss 22:52, 29 June 2017 (UTC)[reply]
Also, can someone explain to me the reasoning behind {{zu-letter}} and its ilk? Why do we need to perform an expensive call to the languages module for a template which is always going to refer to a single language? The languages module is at the root of nearly every memory error we encounter here, and it often provides no benefit to offset the problems. - [The]DaveRoss 23:01, 29 June 2017 (UTC)[reply]
Based on our own observations and the limited observations of the folks at Phabricator, we know that the functions that provide and check automatic transliterations are (some of) the biggest wasters of Lua memory. Disabling the "checking" that compares manual to automatic transliterations, updating {{t}} to not invoke the transliteration modules at all if a transliteration has been supplied manually, and if possible having a bot (even just as a one-off) subst in automated transliterations for languages where that can be done reliably (i.e. most languages), would save a lot of memory. Perhaps the translation-adder script could be updated to automatically subst: in transliterations, too (via {{subst:xlit}}?). We could also have {{t}} accept langname= and have the translation-adder also subst that in, and have {{t}} not invoke the language modules to ask for langnames when they are manually provided. Basically, make {{t}} more like {{t-simple}}, perhaps eventually combining the two. - -sche (discuss) 19:27, 30 June 2017 (UTC)[reply]
I'm not sure if having {{t}} accept the parameter |langname= would use more or less memory, because the module would still create a language object, and the language name parameter, since it would be passed to Module:translations and Module:links, would have to be stored in Lua memory. It would also require us to modify the linking functions or create a new one... — Eru·tuon 22:20, 2 July 2017 (UTC)[reply]
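As a rough illustration of the proposal to have {{t}} skip the transliteration modules when |tr= is supplied, here is a minimal plain-Lua sketch. The function names are hypothetical, and the stand-in transliterator replaces the real language-specific modules; it only shows the control flow (no manual-vs-automatic comparison), not how Module:translations is actually written.

```lua
-- Hypothetical sketch: return a transliteration for a translation,
-- calling the (expensive) transliteration function only when no
-- manual |tr= value was given. Manual and automatic values are never compared.
local function get_translit(term, manual_tr, transliterate)
  if manual_tr and manual_tr ~= "" then
    return manual_tr
  end
  return transliterate(term)
end

-- Stand-in for a real transliteration module; counts how often it runs.
local calls = 0
local function fake_transliterate(term)
  calls = calls + 1
  return "translit(" .. term .. ")"
end

print(get_translit("огонь", "ogonʹ", fake_transliterate)) -- manual value, module not called
print(get_translit("огонь", nil, fake_transliterate))     -- falls back to the module
print(calls)                                              -- 1
```

The memory saving in this design comes from the early return: when a manual transliteration exists, the module that would produce the automatic one is never loaded or run.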

Why is the entry Asia in Category:en:Countries? It doesn't belong there as far as I can tell, but I can't find where on the page the category is being added to the page. —CodeCat 18:08, 13 June 2017 (UTC)[reply]

{{list:countries of Asia/en}} DTLHS (talk) 18:09, 13 June 2017 (UTC)[reply]
I've made it no longer categorize if the pagename is "Asia", and re-added the template (after @CodeCat removed it) under Hyponyms. — Eru·tuon 18:19, 13 June 2017 (UTC)[reply]
The same is happening to Europe as well. —CodeCat 12:58, 14 June 2017 (UTC)[reply]

Auto-expand translation table when linking to its anchor


The translation table on Republic of Macedonia contains a link to Macedonia#Translations-Q221, which takes you to the right translation table. However, the translation table stays collapsed. Could it be made so that the translation table expands whenever you link to its anchor (if it has one)? —CodeCat 18:10, 14 June 2017 (UTC)[reply]

The anchor to Macedonia#Translations-Q221 doesn't work for me (it links further down the page, probably because of things loading in the wrong order) (Chrome). DTLHS (talk) 18:16, 14 June 2017 (UTC)[reply]
That happens with fragments in general, very commonly on discussion pages. But if you wait for the page to load and then click the address bar and hit enter, it jumps to the right place for me. —CodeCat 18:18, 14 June 2017 (UTC)[reply]

Swahili on Wiktionary


"Welcome to the English-language Wiktionary, a collaborative project to produce a free-content multilingual dictionary. It aims to describe all words of all languages using definitions and descriptions in English."

Swahili has some words that do not fit into verb, noun, adjective, etc. because they are sentences when translated into English, such as 'mtaona' (sic), which means 'you (plural) will see'.

How can we make pages for words such as this? Anjuna (talk) 09:51, 16 June 2017 (UTC)[reply]

I believe those are considered "verb forms". —suzukaze (tc) 10:11, 16 June 2017 (UTC)[reply]
Yes. It's no different than Latin vidēbitis. —Aɴɢʀ (talk) 12:03, 16 June 2017 (UTC)[reply]
I'll also point out the About_Swahili page, which covers some of the nuances with regards to the Swahili language treatment here on Wiktionary. I am not sure it addresses this issue directly, but the "conjugated form" section demonstrates how verb forms should be formatted. - [The]DaveRoss 13:33, 16 June 2017 (UTC)[reply]

Thank you all! Yes, I've made a derived verb page. I'll file them all under verb. I'm afraid that some will be taken down, though; they're a little long sometimes. I won't bother to make pages for words with object infixes, such as 'ninakuchukua', literally 'I am taking you', since those words contain both the nominative and the predicate, and there are too many combinations.
Anjuna (talk) 00:56, 17 June 2017 (UTC)[reply]

How about this one: ataona? Is there any more formatting that I must do to make this acceptable? Anjuna (talk) 01:31, 17 June 2017 (UTC)[reply]

Put {{head|sw|verb form}} under the Verb header. DTLHS (talk) 01:44, 17 June 2017 (UTC)[reply]
Also make sure you check for any irregular forms before you add {{sw-conj}} to a page. DTLHS (talk) 01:49, 17 June 2017 (UTC)[reply]

Thank you all so much! I found the verbal derivation. I found it helpful. This section can be deleted now, I suppose. — This comment was unsigned.

I see that many of the questions have already been answered. @Metaknowledge is probably able to help with any questions that remain, such as whether the more heavily inflected forms need to be created or not. - -sche (discuss) 03:17, 17 June 2017 (UTC)[reply]
Inflected forms are much lower priority when a language's coverage is nowhere near a decent level for a dictionary. Before e.g. a Russian inflected entry for уви́дите (uvídite, you (plural) will see) was created, many lemmas like уви́деть (uvídetʹ) were made. --Anatoli T. (обсудить/вклад) 07:01, 17 June 2017 (UTC)[reply]

User:Metaknowledge hasn't replied to me, so how do I signify the negative?

@Science Bird: they replied on their talk page (which is a common practice), have a look there. Also it is helpful if you sign all your comments with four tildes (~~~~) so that everyone knows who is making a statement or asking a question. - [The]DaveRoss 18:48, 21 June 2017 (UTC)[reply]

Renaming a senseid?


Is there a particular way to rename a senseid? Right now, the music sense on house has genre of music as its senseid, but there is also a Wikidata item, d:Q20502. If the senseid is to be changed to match the Wikidata item, how do we deal with all the existing uses of the genre of music id? —CodeCat 16:58, 16 June 2017 (UTC)[reply]

The simplest solution would seem to be to allow multiple senseids, so that the one which is intelligible to humans can be kept, and the Wikidata one can be added. - -sche (discuss) 17:01, 16 June 2017 (UTC)[reply]
I suppose that would work. Should {{senseid}} take multiple id parameters then? —CodeCat 17:03, 16 June 2017 (UTC)[reply]
Yes. (I mean, AFAICT we could just use multiple instances of {{senseid}}, but obviously your suggestion is better.) Some senseids and anchors are linked to from Wikipedia entries, and others may be linked to from other off-wiki sites. While I'm not entirely opposed to letting links rot sometimes, for example if a particular senseid is badly named (very misleading, offensive, etc.), in general adding additional IDs seems preferable, with the understanding that there should be no vast proliferation of them (one human-readable ID and one Wikidata ID seems reasonable; five human-readable IDs, if one person wanted orange#the_colour and one wanted orange#the_color, etc., would be too many). - -sche (discuss) 17:52, 16 June 2017 (UTC)[reply]
How are we going to make sure that senseids are not changed or removed without links to them being fixed too? — Ungoliant (falai) 17:10, 16 June 2017 (UTC)[reply]
We can't, really, unless we have a way to track down all uses of a particular id. —CodeCat 17:11, 16 June 2017 (UTC)[reply]
So we're just supposed to accept inevitable link rot? Would anyone else support banning senseids altogether? DTLHS (talk) 17:19, 16 June 2017 (UTC)[reply]
Not without an adequate replacement. — Ungoliant (falai) 17:20, 16 June 2017 (UTC)[reply]
Yes, glosses, which are independent of what they reference. DTLHS (talk) 17:21, 16 June 2017 (UTC)[reply]
Senseids are a complement to, not a replacement of, glosses. Even glosses + senseids with link rot is an improvement over just glosses. Glosses form the informational connection between link and definition, while senseids form the software connection. — Ungoliant (falai) 17:27, 16 June 2017 (UTC)[reply]
I support only using them if they are reasonably robust. I think the long-term goal is that the editor and back-end infrastructure get to the point that a senseid would be intelligible to humans and also not suffer from the possibility that it becomes out of sync with the unique identifier. Not sure how long that term is though. - [The]DaveRoss 17:23, 16 June 2017 (UTC)[reply]
@TheDaveRoss: Would it be possible to make an edit filter that would catch changes of senseids? — Eru·tuon 18:15, 16 June 2017 (UTC)[reply]
I think it would be possible, I will take a look. - [The]DaveRoss 18:32, 16 June 2017 (UTC)[reply]
Changed my mind. I can make a filter which can detect if a senseid is added or removed, or a line which contains a senseid is changed. Possibly also if the first instance of senseid has been changed. But without flow control I don't think there is a way to see if any instance of senseid has been changed. - [The]DaveRoss 19:32, 16 June 2017 (UTC)[reply]
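Outside the edit-filter language, where loops are available, checking whether any senseid was removed between two revisions is straightforward. Here is a minimal plain-Lua sketch, assuming senseids appear in wikitext as {{senseid|lang|id}} with no extra parameters or stray whitespace; the function names are hypothetical and this is not an existing module or filter.

```lua
-- Collect every "lang:id" pair given to {{senseid|lang|id}} in some wikitext.
-- Assumes exactly two unnamed parameters and no extra whitespace.
local function senseids(wikitext)
  local ids = {}
  for lang, id in wikitext:gmatch("{{senseid|([^|}]+)|([^|}]+)}}") do
    ids[lang .. ":" .. id] = true
  end
  return ids
end

-- Return, sorted, the senseids present in the old text but missing from the new one.
local function removed_senseids(old_text, new_text)
  local old, new = senseids(old_text), senseids(new_text)
  local removed = {}
  for key in pairs(old) do
    if not new[key] then
      table.insert(removed, key)
    end
  end
  table.sort(removed)
  return removed
end

local before = "# house music {{senseid|en|genre of music}}"
local after = "# house music {{senseid|en|Q20502}}"
for _, key in ipairs(removed_senseids(before, after)) do
  print("removed: " .. key) -- removed: en:genre of music
end
```

Note that keeping the old ID alongside the new one (the multiple-ID approach suggested above) would make this report nothing, since no existing ID disappears.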

Search results from sister projects


TOW has just added a nice search feature called "results from sister projects", so that searching Wikipedia also shows results in a sidebar from Wiktionary, Wikibooks, Wikivoyage, Wikiquote, and Wikisource. I wonder if we can get this...? I remember suggesting it a long time ago in response to the argument "we should have entries about TV series because people might not find them on other sites". Equinox 00:36, 19 June 2017 (UTC)[reply]

I like it. - [The]DaveRoss 13:27, 19 June 2017 (UTC)[reply]

Is my abuse filter not working?


[1] is supposed to prevent edit summaries of the word "nothing". It has blocked a few edits. How did this one get through? [2] Equinox 17:51, 21 June 2017 (UTC)[reply]

You can add whitespace to the end of the edit summary and it will be stripped in the history, but your edit filter won't recognize it. I'm guessing that's what happened. DTLHS (talk) 18:02, 21 June 2017 (UTC)[reply]
Converting the rule to regex could probably deal with that. — Eru·tuon 18:26, 21 June 2017 (UTC)[reply]
Aha. I have changed it to ^\s*[Nn]othing\s*$. Hope that's correct. Equinox 18:33, 21 June 2017 (UTC)[reply]
That works if they aren't editing a section (unless section names are automatically stripped from the summary?). To account for that, it would have to be something like ^\s*\/\*.*?\*\/\s*[Nn]othing\s*$ (though there could be errors, because I'm not sure what version of regex is used). — Eru·tuon 18:36, 21 June 2017 (UTC)[reply]
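For comparison, here is the same check written with Lua patterns rather than the PCRE-style regex the abuse filter uses, so it illustrates the logic only, not filter syntax. It is a minimal sketch assuming that section edits prepend a "/* Section */" auto-comment, as the regex above does; the function name is hypothetical. Unlike the two regexes above, it accepts summaries both with and without the section comment.

```lua
-- Hypothetical check: is the edit summary just "nothing" (or "Nothing"),
-- ignoring surrounding whitespace and an optional leading /* Section */ comment?
local function is_nothing_summary(summary)
  -- Drop a leading "/* ... */" section auto-comment, if present.
  local rest = summary:gsub("^%s*/%*.-%*/", "", 1)
  -- Trim leading and trailing whitespace.
  local trimmed = rest:match("^%s*(.-)%s*$")
  return trimmed == "nothing" or trimmed == "Nothing"
end

print(is_nothing_summary("nothing "))            -- true
print(is_nothing_summary("/* Noun */ Nothing"))  -- true
print(is_nothing_summary("nothing much"))        -- false
```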

Lua error: attempting to index upvalue 'm_data'


Many of the templates (of various kinds) on the page for the Malay/Indonesian term burung kakaktua are showing this error:

Lua error in Module:script_utilities at line 167: attempt to index upvalue 'm_data' (a boolean value).

Each of the templates looks fine to me individually when I look at it in edit mode. Other Malay/Indonesian words look fine. --46.226.49.232 11:33, 22 June 2017 (UTC)[reply]

This happens sometimes. Someone made an error when changing the code in one of the modules and then it was fixed, but there are so many entries using the module that it takes a while for the system to update all of them. If you find any more like that, simply edit the entry and save/publish it without making any changes (what we call a "null edit"). This will make the system apply all the edits waiting for that entry and bring it up to date. If that doesn't solve the problem, then something still needs to be fixed and you can let us know here. Thanks! Chuck Entz (talk) 13:43, 22 June 2017 (UTC)[reply]

Language name to ISO converter


I just discovered that the language name to ISO code converter that I used to be able to see in the sidebar is no longer showing up for me in either Firefox or Chrome, although it is still selected in my per-browser preferences. Any idea why this might be? It does warn that it's one of the potentially buggy gadgets, but it was also working before... Andrew Sheedy (talk) 03:34, 23 June 2017 (UTC)[reply]

[edit]

I recently noticed that зелений (zelenyj) didn't have a see-also link at the top to зелёный (zeljónyj). I'm surprised a bot didn't add it, and wanted to check that this was just an anomaly. — Eru·tuon 17:42, 23 June 2017 (UTC)[reply]

@Erutuon: But one has a soft ending, the other a hard one. --Barytonesis (talk) 17:43, 23 June 2017 (UTC)[reply]
Ohh... duh. Well, they both have hard endings, actually, just spelled in different alphabets. — Eru·tuon 17:44, 23 June 2017 (UTC)[reply]
I hadn't even noticed one was Ukrainian, not much better... --Barytonesis (talk) 17:48, 23 June 2017 (UTC)[reply]
The see-also links are mostly added by hand anyway. There was one bot that briefly added some, but it missed a lot. I just recently added {{also|weder}} to the top of Weder, for example. —Aɴɢʀ (talk) 06:54, 24 June 2017 (UTC)[reply]
@Angr: (Belated response.) I remember seeing bots add these links before, but I hadn't noticed that they stopped, probably because I hid bot edits in my watchlist. — Eru·tuon 06:41, 8 July 2017 (UTC)[reply]

Affix sense differentiation


Several affix entries include two or more (often completely unrelated) meanings, yet the words they derive are all categorized together:
e.g. modesty and messy both fall under [[Category:English words suffixed with -y]], with no mention of how -y was used in two completely different fashions.
I'm fairly new to this discussion page, so I can't say for certain this hasn't been discussed before, but I would like to hear whether anyone else is... well, kinda bothered by this. – GianWiki (talk) 15:20, 24 June 2017 (UTC)[reply]

I agree it's not ideal. Another good example is e-. Are there any practical ways to get around this? Equinox 15:22, 24 June 2017 (UTC)[reply]
You can add a gloss, for example merger. DTLHS (talk) 15:29, 24 June 2017 (UTC)[reply]
Better still, use a senseid. —CodeCat 16:29, 24 June 2017 (UTC)[reply]
As is currently done with Category:English words suffixed with -y (diminutive). — Eru·tuon 17:46, 24 June 2017 (UTC)[reply]
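The sense-labelled category naming shown above can be sketched as a small helper. This is a hypothetical plain-Lua function, not the code behind {{suffix}}; how the real template receives the sense label is not shown here.

```lua
-- Hypothetical helper: build an affix category name, optionally
-- disambiguated by a sense label such as "diminutive".
local function affix_category(lang_name, affix, sense)
  local cat = lang_name .. " words suffixed with " .. affix
  if sense and sense ~= "" then
    cat = cat .. " (" .. sense .. ")"
  end
  return cat
end

print(affix_category("English", "-y"))               -- English words suffixed with -y
print(affix_category("English", "-y", "diminutive")) -- English words suffixed with -y (diminutive)
```

With a label like this, modesty and messy would land in different categories as long as each entry says which sense of -y it uses.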

I seem to remember someone bringing this up before, though I don't remember the outcome: this tracking category currently has 33,393 entries in it, most of which seem to be Serbo-Croatian and proto-languages. I believe the reason for the latter is a statement in the block of code in Module:headword#full_headword that generates the category:

if not mw.ustring.find(cat, "^" .. data.lang:getCanonicalName()) then

I'm sure there are efficiency/speed benefits to converting the language name into a pattern this way, but it seems to me like any language name with pattern characters in it such as "-" would give unexpected results. Is there any way to use an escaped version of the language name instead? Or maybe we should skip language names with characters like "-" in them?

If there are reasons not to fix this, the question then becomes: how can you use a tracking category with 99% false positives? I ran into a couple of Latin entries in the category that seem to have some other problem (Alba Pompeia, for one), but looking through 33,393 entries for similar cases seems pointless (there is Special:WhatLinksHere/Template:tracking/headword/no lang category/lang/la, though, if you know you want to look specifically for Latin entries).

Of course, I'm not well-versed enough in Lua to write even the simplest of code, so I may not be understanding this correctly, but I figure it's worth bringing up anyway. Thanks! Chuck Entz (talk) 01:37, 26 June 2017 (UTC)[reply]

Looking further, I notice that there are 53,191 entries in Category:Serbo-Croatian lemmas, so a simple language-name explanation (literally) doesn't add up. After a quick look through the tracking category, I notice that there don't seem to be any verb or adjective endings; perhaps it has something to do with the code for nouns in function export.noun of Module:sh-headword? There seem to be a good number of Esperanto terms in the tracking category as well. Chuck Entz (talk) 02:26, 26 June 2017 (UTC)[reply]
For some reason, the category name Serbo-Croatian lemmas doesn't get processed by the code mentioned above. In adsorpcija, the category name that put the entry in Category:head tracking/no lang category was Serbo-Croatian feminine nouns, not Serbo-Croatian lemmas. (I printed out the offending category name in the Lua log.) So perhaps only the S-C entries that have categories besides the lemma category ended up being put in the tracking category, or categories besides the lemma and basic part-of-speech category. — Eru·tuon 03:30, 26 June 2017 (UTC)[reply]
I think you're right that the problem was due to the minus-hyphens in the language names. I pattern-escaped the language name, then tested the entry adsorpcija; this removed it from the category. — Eru·tuon 03:22, 26 June 2017 (UTC)[reply]
I took a look at abateco, an Esperanto noun. It was in the tracking category because the Esperanto headword module Module:eo-headword put the category Category:Missing Esperanto noun forms into the table of categories supplied to Module:headword. I solved that problem by having Module:eo-headword put that category in a separate table. That should begin emptying Esperanto entries from the tracking category. — Eru·tuon 05:05, 26 June 2017 (UTC)[reply]
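The pattern-escaping fix described above can be illustrated in plain Lua; string.find behaves like mw.ustring.find for these ASCII names. The helper names are hypothetical and this is not the actual code in Module:headword. Without escaping, the "-" in "Serbo-Croatian" or "Proto-Indo-European" is read as a lazy "zero or more" quantifier on the preceding letter, so the prefix check fails and the category is wrongly treated as lacking the language name.

```lua
-- Escape every Lua pattern magic character so the string matches literally.
local function pattern_escape(s)
  return (s:gsub("[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0"))
end

-- Hypothetical version of the headword check: does the category name
-- begin with the language's canonical name?
local function starts_with_lang(cat, lang_name)
  return string.find(cat, "^" .. pattern_escape(lang_name)) ~= nil
end

-- Alternative that avoids patterns entirely: compare the prefix directly.
local function starts_with_plain(cat, lang_name)
  return cat:sub(1, #lang_name) == lang_name
end

local cat = "Serbo-Croatian feminine nouns"
-- Unescaped, "o-" means "zero or more o, lazily", so this check fails.
print(string.find(cat, "^Serbo-Croatian") ~= nil) -- false
print(starts_with_lang(cat, "Serbo-Croatian"))    -- true
print(starts_with_plain(cat, "Serbo-Croatian"))   -- true
```

The Esperanto case is different: a category like "Missing Esperanto noun forms" genuinely does not start with the language name, so it fails either check, which is why moving it to a separate table was the fix there.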

Misspelling kinda


Hello! In many languages I've found words that haven't been misspelled but rather misconjugated. In Swedish the verbs skära and bära are both strong verbs, but many people wrongly conjugate them as weak ones, hence skärde, skärt, skärd, skärda, bärde, bärt, bärd and bärda. Is there a template for misconjugations? If there isn't one, I think it would be good to have one, since a misspelling isn't really what it is. I see that most English examples (fighted, swimmed, shooted) seem to have just been labeled (nonstandard), as if they weren't incorrect, just not as common. Should the Swedish verb forms also be labeled (nonstandard), or what do the guidelines say? Jonteemil (talk) 02:00, 26 June 2017 (UTC)[reply]

nonstandard is for terms that are generally considered incorrect: “Not conforming to the language as accepted by the majority of its speakers.” — Ungoliant (falai) 03:33, 26 June 2017 (UTC)[reply]

Could somebody please fix this template? It was broken after Template:RQ:Authorized Version was redirected here; see for example Nathan. There is no advice on how to write the names of biblical books that contain numbers (2 Samuel, 2-Samuel, whatever). Also, the links don't work or link to the wrong place, and ":" after the old template produces an empty line. A user complained in March 2016 on the template talk page, but nothing has been done.--Makaokalani (talk) 13:57, 28 June 2017 (UTC) Pinging @Smuconlaw. --Makaokalani (talk) 14:12, 6 July 2017 (UTC)[reply]

Actually, it was broken before I redirected the template; I never noticed that the problem mentioned on the template talk page in 2016 hadn't been fixed. Anyway, I think I've fixed the template now – try it out. — SGconlaw (talk) 17:24, 13 July 2017 (UTC)[reply]