Wiktionary:Grease pit/2010/December

From Wiktionary, the free dictionary
This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

December 2010

It would be good to have a help page explaining how to install the various fonts needed to avoid seeing numbered squares, particularly (in my experience) for Gothic, Egyptian, Sumerian and Burmese. There's a Wikipedia page of the same name, I believe. Mglovesfun (talk) 16:15, 2 December 2010 (UTC)

I remember bringing that up a few years ago. Don't remember where it is... ah, it was Wiktionary:Unicode -- Prince Kassad 16:23, 2 December 2010 (UTC)

Interwiki gadget

At the Dutch Wiktionary, we have had an interwiki gadget for quite some time, and it was recently reactivated. It gives a list of pages with the same name and, if that list differs from the list of interwikis, also lists the collection of interwiki links (see e.g. the list for the entry granule). I thought this might be handy for the English Wiktionary as well.

The server-side PHP tool crawls through the databases on the toolserver. One advantage is that it only sends HTTP requests to the individual wiktionaries when the toolserver databases are not functional. It uses the following JavaScript code to create a button that goes to the toolserver page: User:Annabel/Interwiki.js.

I am aware of the bot status request for ChuispastonBot, which would resurrect the interwicket bot code. But if necessary, I can generalise the gadget code into interwiki bot code that would not need to build a database of links.

Annabel 15:25, 3 December 2010 (UTC)

That must be useful for w:Wikipedia:WikiProject_Interlanguage_Links. JackPotte 18:39, 26 December 2010 (UTC)
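For readers wondering what the gadget's comparison amounts to, here is a minimal sketch. It assumes you already know on which wiktionaries a page with the same title exists (the gadget gets this from the toolserver databases) and have the entry's wikitext. The regular expression, the `known_codes` set and `interwiki_report` are illustrative assumptions, not the gadget's actual code.

```python
import re

# Interwiki links look like [[fr:granule]]. Only lowercase codes are matched,
# so namespace links such as [[Category:...]] are ignored.
INTERWIKI = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)?):([^\]|]+)\]\]")


def interwiki_report(wikis_with_page, wikitext, known_codes, home="en"):
    """Compare the wikis that have a same-named page with the interwikis present.

    "missing": wikis that have the page but are not linked.
    "stale":   linked wikis that do not have the page.
    """
    linked = {code for code, _target in INTERWIKI.findall(wikitext) if code in known_codes}
    existing = set(wikis_with_page) - {home}
    return {
        "missing": sorted(existing - linked),
        "stale": sorted(linked - existing),
    }


if __name__ == "__main__":
    text = "==English==\n...\n[[fr:granule]]\n[[nl:granule]]\n[[Category:English nouns]]"
    report = interwiki_report({"en", "fr", "it"}, text, known_codes={"fr", "it", "nl"})
    print(report)  # {'missing': ['it'], 'stale': ['nl']}
```

Restricting matches to `known_codes` keeps ordinary prefixed links (e.g. `[[w:...]]`) from being mistaken for interwikis.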

QuasiBot has been extended to Esperanto verbs

This is to announce that QuasiBot has been set up to conjugate Esperanto verbs as well. The code and procedure are available for scrutiny at the QuasiBot page, and are significantly shorter than those for Latin. —AugPi 10:39, 10 December 2010 (UTC)

Support. Mglovesfun (talk) 18:25, 12 December 2010 (UTC)

Hi all, I am wondering how to link categories such as es:Categoría:EN:Antropología to an equivalent here, since you have no such Category:en:Anthropology. I was content to link it to Category:Anthropology, but I forgot that bots would get confused by this, with the result that es:Categoría:EN:Antropología is now linked to fr:Catégorie:Anthropologie nl:Categorie:Antropologie no:Kategori:Antropologi pt:Categoria:Antropologia tr:Kategori:Antropoloji, which is irrelevant. Is there a way to bypass that? Creating empty redirect categories each time? - Olybrius 12:44, 11 December 2010 (UTC)

I started a vote some time ago to add en: to English topical categories, and it failed. Not distinguishing English topical categories from those of other languages with en: is essentially the cause of the problem. Mglovesfun (talk) 18:24, 12 December 2010 (UTC)
So that's hopeless? All right, delinking then... - Olybrius 09:32, 15 December 2010 (UTC)

Quick request

A list of all the entries that contain [[Category:French nouns]] as a written category. This should almost always be covered by {{fr-noun}}, or occasionally by {{infl}}. Many of the ones that don't use {{fr-noun}} are lacking a valid plural. Perhaps at [[User:Mglovesfun/French nouns]], s'il vous plaît. Mglovesfun (talk) 18:28, 12 December 2010 (UTC)

Retracted. Mglovesfun (talk) 18:32, 13 December 2010 (UTC)

{{termx}} is broken when used on non-appendix languages

I've tracked the problem down to a rather small piece of code, but I'm really stuck. It seems like a bug in the wiki software to me. Take a look at User:CodeCat/sandbox. It is part of the code that gets included in {{termx|fier||far|tr=|sc=Xyzy|lang=fy}}. Now try putting subst: in front of the #if and saving the page. It adds a newline for some reason, and this puts the # at the beginning of a line, so that it is now treated as a numbered list. It shouldn't do that, because the # is really used to create a section link. Very strange... —CodeCat 10:50, 18 December 2010 (UTC)

That's f*cking annoying, yes. Mglovesfun (talk) 12:28, 18 December 2010 (UTC)
Please use Template:compound/test to fix the bug; compound is too widely used to serve as a test template. Mglovesfun (talk) 12:33, 18 December 2010 (UTC)
The problem is in {{termx}}, not {{compound}}. You can just revert it to using {{term}} as it did before, until the newer template gets fixed properly. —CodeCat 14:56, 18 December 2010 (UTC)
OK, I think it's fixed now; I moved the # out of the #if. It's not a perfect solution, because now it adds the # even if there is no section to link to. But that's not a big problem. —CodeCat 15:05, 18 December 2010 (UTC)
I haven't checked out the problem here at all (I have only read this discussion), but if your template puts # (or * or : or ;) at the start of a line, you can often stop it from being interpreted as a list item by preceding it with <nowiki/>. —msh210 (talk) 20:55, 20 December 2010 (UTC)

Bot request

Hello, I have been adding Vietnamese readings for Han characters for 5 years now, inputting each by hand using data from two reputable Vietnamese/Han character databases. I believe I am well over half finished. In order to identify which entries still need to be done, could a bot operator do the following?

  • Generate a bluelinked list of Han character entries at en:Wiktionary that do *not* have a Vietnamese heading

I realize that there are more than 20,000 Han character entries at Wiktionary, and I am guessing that about 5,000 or so of them do not yet have a Vietnamese heading.

Thank you very much,

71.66.97.228 18:36, 21 December 2010 (UTC)

That task would probably be best performed with a dump analysis rather than a bot. Connel and Conrad used to do those, but I don't think either is around. I am not sure if anyone already has the current dump downloaded, but the analysis would not be too tricky to perform. - [The]DaveRoss 20:29, 21 December 2010 (UTC)

Wonderful. Should I ask in a different place? There is no hurry, but it would speed up the task: at this point it's difficult to identify, from lists of characters, which ones haven't been done yet, as most of them have. 71.66.97.228 22:28, 21 December 2010 (UTC)

I've found dump-analysis help from Wikipedians in the past. —msh210 (talk) 22:44, 21 December 2010 (UTC)
Didn't see anyone jumping on this, so I took a swing at it. I think this list is correct and comprehensive as of October 30, 2010. If you see any problems let me know and I can fix it and rerun it, but the few I checked looked right. Enjoy. - [The]DaveRoss 21:16, 23 December 2010 (UTC)

Thank you kindly; however, by my rudimentary count those are links to 19,000+ characters, and they include links to many entries that already have a Vietnamese header. Is it not possible to do a bot run excluding ones that have a Vietnamese header? I was trying to isolate the entries that don't yet have one. 71.66.97.228 21:50, 23 December 2010 (UTC)

Can you give me a few examples of ones that do have Vietnamese headers? - [The]DaveRoss 23:10, 23 December 2010 (UTC)

For example, or . 71.66.97.228 23:32, 23 December 2010 (UTC)

I made that list from the most current data dump, which was created on October 30, 2010. Any changes made since then will not be reflected in the results. I can run it again once the next dump comes along, but until then we have to use stale data if we are going by a dump. A bot could get more current data, but it would take quite a bit longer, since bots are only allowed to load a certain number of pages at a time. If all of the Han pages are categorized, it might be a bit faster. You are about right on the number; there were about 18,000 at the time of the dump. - [The]DaveRoss 23:41, 23 December 2010 (UTC)
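A dump analysis of the kind requested could look roughly like the minimal sketch below. It assumes a standard pages-articles XML dump, and it assumes that a Vietnamese section always starts with a level-2 `==Vietnamese==` heading on its own line. It treats any single-character title whose Unicode name begins with "CJK UNIFIED IDEOGRAPH" or "CJK COMPATIBILITY IDEOGRAPH" as a Han character entry. That test, the function names and the file name in the comment are assumptions for illustration; this is not the script that produced the list above.

```python
import io
import re
import sys
import unicodedata
import xml.etree.ElementTree as ET

# Assumed entry layout: a language section starts with a level-2 heading on its own line.
VIETNAMESE_HEADING = re.compile(r"^==[ \t]*Vietnamese[ \t]*==[ \t]*$", re.MULTILINE)


def is_han_character(title):
    """True if the title is a single CJK (unified or compatibility) ideograph."""
    if len(title) != 1:
        return False
    name = unicodedata.name(title, "")
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def han_pages_without_vietnamese(dump):
    """Yield titles of Han-character pages whose text has no ==Vietnamese== heading.

    `dump` is a path or a binary file object for a pages-articles XML dump.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(dump, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "title":
            title = elem.text or ""
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            if title and is_han_character(title) and not VIETNAMESE_HEADING.search(text):
                yield title
            title, text = None, ""
            elem.clear()  # release the finished page so memory stays flat on a large dump


# Tiny stand-in for a real dump, used when no file name is given.
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">'
    "<page><title>水</title><revision><text>==Translingual==\n\n==Vietnamese==\n</text></revision></page>"
    "<page><title>火</title><revision><text>==Translingual==\n\n==Japanese==\n</text></revision></page>"
    "<page><title>water</title><revision><text>==English==\n</text></revision></page>"
    "</mediawiki>"
).encode("utf-8")

if __name__ == "__main__":
    # With a real dump (hypothetical file name): python han_vi.py enwiktionary-pages-articles.xml
    source = sys.argv[1] if len(sys.argv) > 1 else io.BytesIO(SAMPLE)
    for page_title in han_pages_without_vietnamese(source):
        print(f"* [[{page_title}]]")  # bluelinked list item, as requested
```

Streaming with `iterparse` and clearing each finished `<page>` element avoids loading a multi-gigabyte dump into memory at once.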

Customized Signature; Align to Right

OK, I tunneled through [My Preferences], Help -> Information Desk -> Grease Pit, and searched both archives for a way to make my signature align right, always be in bold, and start on a new line. If I place the <br> code in the preference, it splits the date from the signature when I type the four tildes (~), which I don't want.
Is there a more detailed cheatsheet?
==== Red 03:38, 24 December 2010 (UTC)

Can you show us what you would like your signature to look like? - [The]DaveRoss 05:41, 24 December 2010 (UTC)
Like the one I have now, but aligned to the right and with a tilde (~) between the name and date, so that I don't have to keep typing <br>'''~~~~'''.
==== Red 11:38, 27 December 2010 (UTC)
Maybe <p style="text-align:right"><b>==== <i>[[User talk:Red|Red]]</i></b> ~ {{CURRENTTIME}}, {{CURRENTDAY}} {{CURRENTMONTHNAME}} {{CURRENTYEAR}} (UTC)</p> and then use only ~~~ instead of ~~~~? I haven't tested it. —msh210 (talk) 17:13, 27 December 2010 (UTC)
That's too long for me to type. I was wondering if I could use User:Red/Sig to code my signature, and then just type the tildes as a macro for faster typing.

==== Red ~ 21:59, 14 November 2024 (UTC)

P.S. It's not working ^_^

Indeed, using <p style="text-align:right">...</p> around your signature will align it to the right side of the page; however, I would advise against it. The purpose of a signature is to let people readily identify whose text they are reading, and modifying the signature too far can detract from that purpose. Many will see a signature aligned to the right of the screen as such a detraction. As a side note, your current signature points to the user page of someone other than yourself, which is not a good thing at all; you should change that immediately. - [The]DaveRoss 20:27, 27 December 2010 (UTC)
I see, but having it coded properly will avoid any confusion, even in a long threaded discussion. Also, I lost the email password for the old account; it's mine.
____________________________==== Red ~ 00:55, 28 December 2010 (UTC)
You can fill out a little form at WT:CHU and take over your old account name; since you only had one edit, it shouldn't be too much hassle. Also, I tested it, and you can use <br /> and <p /> in signatures; no idea what you might be doing differently than I am. For example: User:TheDaveRoss/testsig

I've recently been seeing strangely coloured links, a sort of dark brownish red. They are for terms that exist, i.e. the link should be blue. As I have poor eyesight, it is very confusing. SemperBlotto 11:04, 27 December 2010 (UTC)

Do you have examples of the circumstances, so we can try to replicate them? What browser do you use? DCDuring TALK 12:33, 27 December 2010 (UTC)
Take a screenshot (Prt Scr, near F12) and paste it into Microsoft Paint or something equivalent. Mglovesfun (talk) 12:36, 27 December 2010 (UTC)
They wouldn't happen to be this color, would they? --Yair rand (talk) 13:35, 27 December 2010 (UTC)
It has something to do with Special:Preferences --> Appearance --> Advanced options, which colors links to pages that are smaller than a given byte threshold. However, it's coloring random links that way for me in Firefox now as well. — lexicógrafa | háblame 17:06, 27 December 2010 (UTC)
I'm using Google Chrome at the moment. A screenshot is at [1]. The strange link colours include that for "Main Page" over on the left. In the main body of Latin words, the ones at the top (funny colour) are newish; the ones at the bottom don't exist yet. The effect comes and goes, by the way. SemperBlotto 16:49, 27 December 2010 (UTC) (Oh, and the colours don't look quite right after all the processing)
When I visit a redlink, do not save anything, and then use "back" to return to the page that had the redlink, the redlink changes to a maroon/brown color. I thought that was a desirable feature, serving on occasion to remind me of unsaved work. If I then visit another redlink, the same thing happens. If I refresh the page, the saved edits turn blue, the unvisited ones remain red and the visited/unsaved ones remain maroon. Could that be what you are noticing? DCDuring TALK 20:48, 27 December 2010 (UTC)
Now I see. Using FF3.6.13, I just got the maroon color when moving synonyms from a show-hide box to display after {{sense}}. I did not use any of the links. This is new behavior, not particularly functional AFAICT. DCDuring TALK 00:33, 28 December 2010 (UTC)

Note to admins: Please watch major templates for a potential vandal

This notice is being cross-posted to the major administrators noticeboard (incidents or alerts) style pages on all the major projects.

Earlier today, a w:User:Meepsheep2 was blocked on the English Wikipedia. Apparently in reprisal, he vandalized a major template on English Wiktionary with a fake fundraising banner that he photoshopped. Someone reported it on IRC, and we blocked him quite quickly for this time of night, but we want you to be on the lookout for future similar incidents. Please help keep an eye on major templates for vandalism specifically related to the fundraiser banners, and if they occur, globally lock their accounts (if you do not have that access, please block them locally on the wiki they vandalized, and then find someone on IRC who can globally lock the account). Stewards can assist with this. I know you guys all watch the high value templates anyway, and I'm not asking you to do anything different with those. I'm specifically referring to incidents that spoof the fundraising banners. Please keep an extra careful eye out for those, and take the extra step of asking a steward to globally lock the account to prevent future recurrences of this specific kind of vandalism. Please send any questions to drosenthal (at) wikimedia.org, or use my English Wikipedia User Talk page as I cannot respond locally on all projects. DanRosenthal Wikipedia Contribution Team 07:04, 29 December 2010 (UTC)

rfp to categorize by dialect?

I think it would be helpful if {{rfp}} were to categorize by region/dialect/accent. People looking to add pronunciations could then find the entries more readily. Perhaps {{#ifexist:template:accent:{{{1}}}|[[category:Requests for pronunciation ({{{1}}} {{langname|{{{lang|en}}}}})]]}} in addition to the current categories? Even better would be

{{#if:{{isValidPageName|{{accent:{{{1}}}|l=}}}}<!--
then-->|[[category:Requests for pronunciation ({{accent:{{{1}}}|l=}} {{langname|{{{lang|en}}}}})]]
[[Category:Wiktionary-namespace discussion pages]]
<!--
else-->|{{#ifexist:template:{{{1|}}}|<!--
  test if a non-context template-->{{#ifeq:{{{{{1}}}|sub=}}|{{{{{1}}}|sub=1}}||<!--
       else-->[[category:Requests for pronunciation ({{{{{1}}}|sub=[something that makes it just display the label]}}
               {{langname|{{{lang|en}}}}})]]}}}}}}

, if we add an l parameter to the accent templates as we have for the language ones, and allow for context tags also. (There are probably typos there, though, and I can't think how to devise a context "sub" template that displays just the label as plain text (without using JS), as labels often contain wikicode or HTML.) Thoughts? —msh210 (talk) 21:04, 23 November 2010 (UTC)

Okay, no one's commented as yet, so I'm going to be more explicit about a proposed change, and am hereby explicitly labeling it as a proposed change rather than a mere idea. I want to change {{#if:{{{lang|}}}|[[Category:Requests for pronunciation ({{langname|{{{lang}}}}})]]}} to
{{#if:{{isValidPageName|{{accent:{{{1}}}|l=}}}}<!--
then-->|[[Category:Requests for pronunciation ({{accent:{{{1}}}|l=}} {{langname|{{{lang|en}}}}})]]<!--
  and-->[[Category:Requests for pronunciation ({{langname|{{{lang|en}}}}})]]<!--
else-->|{{#if:{{{lang|}}}|[[Category:Requests for pronunciation ({{langname|{{{lang}}}}})]]}}<!--
-->}}
and add an {{{l}}} parameter to each accent: template so that, for example, {{accent:US}} will read {{{l|[}}}{{{l|[w:American English{{!}}}}}US{{{l|]]}}} instead of [[w:American English|US]]. Thoughts, support, objections? —msh210 (talk) 16:46, 1 December 2010 (UTC), modified slightly 18:03, 1 December 2010 (UTC)
How did I miss this excellent suggestion? This may provide all of you with the opportunity to hear my version of US pronunciation. DCDuring TALK 17:41, 1 December 2010 (UTC)
Okay. I've taken the first step: I've added {{{l}}}, as described above, to every accent template (more precisely, to every template that's not a redirect and that's on the list at [[special:prefixindex/template:accent]]).— This unsigned comment was added by msh210 (talkcontribs).
And now I've added to {{rfp}} code similar to the above, but categorizing only if lang= is set. —msh210 (talk) 10:01, 7 February 2011 (UTC)
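To see what the {{{l}}} trick does, here is a small Python model of MediaWiki parameter defaults: {{{l|x}}} yields x only when l is absent, so passing an empty l= strips the link markup and leaves the bare label. `rfp_categories` mirrors the #if logic proposed on 1 December (the version finally added, per the last comment, only categorizes when lang= is set). `is_valid_page_name` and `LANGNAMES` are simplified, hypothetical stand-ins for {{isValidPageName}} and {{langname}}, not their real implementations.

```python
# Toy model of the wikitext proposed above; not MediaWiki's parser.
LANGNAMES = {"en": "English", "fr": "French"}  # hypothetical stand-in for {{langname}}


def param(args, name, default):
    """{{{name|default}}}: the default is used only when `name` is absent.
    An explicitly empty value (l=) still counts as set."""
    return args[name] if name in args else default


def accent_us(args):
    """{{accent:US}} after the change: {{{l|[}}}{{{l|[w:American English{{!}}}}}US{{{l|]]}}}"""
    return (param(args, "l", "[")
            + param(args, "l", "[w:American English|")  # {{!}} expands to |
            + "US"
            + param(args, "l", "]]"))


def is_valid_page_name(text):
    """Simplified stand-in for {{isValidPageName}}: no empty text, no link/template syntax."""
    return bool(text) and not any(ch in text for ch in "[]{}|")


def rfp_categories(accent_label=None, lang=None):
    """Categories the proposed {{rfp}} code would add, given {{accent:X|l=}}'s output."""
    if accent_label is not None and is_valid_page_name(accent_label):
        name = LANGNAMES[lang or "en"]
        return [f"Category:Requests for pronunciation ({accent_label} {name})",
                f"Category:Requests for pronunciation ({name})"]
    if lang:
        return [f"Category:Requests for pronunciation ({LANGNAMES[lang]})"]
    return []


if __name__ == "__main__":
    print(accent_us({}))         # [[w:American English|US]]
    print(accent_us({"l": ""}))  # US
    print(rfp_categories(accent_us({"l": ""})))
```

The design point is that one template serves both uses: called normally it renders a link, and called with l= it returns a plain label suitable for a category name.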

Replacing Xyzy, langscript and others

Discussion moved from WT:RFDO#Template:Xyzy

This sounds like a "Be bold" kind of situation. How about someone who knows how to make a bot just takes {{langscript}}'s pile of language-script listings, autocreates things like {{sc:en}}, {{sc:fr}}, {{sc:he}}, etc. (with the four-letter script code as their content), and fills those not covered by langscript with "None". Then we can deprecate Xyzy and other templates. --Yair rand (talk) 17:15, 29 December 2010 (UTC)

Let's not be too bold, given the number of pages involved. In fact I'd strongly advise caution, testing, etc. Mglovesfun (talk) 17:19, 29 December 2010 (UTC)
[e/c] Right! —msh210 (talk) 17:25, 29 December 2010 (UTC)
There are already a ton of language templates. I don't think creating a ton more for each possible piece of information we would like to store about a language is a very good idea. I think it would be better to give each language template a t= parameter, which decides what information to return. This is probably faster too, because re-using the same template saves a separate template retrieval. —CodeCat 17:21, 29 December 2010 (UTC)
That sounds good, and is what some other templates do. —msh210 (talk) 17:25, 29 December 2010 (UTC)
Otoh, I don't think everything should go in there. For example, the script code should, but not the script name (so that if we decide to change what we call it, it's centrally located at template:script or wherever). —msh210 (talk) 17:32, 29 December 2010 (UTC)
So what could go in there? The only other language information that I can think of is language family, and that's only necessary in main language categories, and is handled pretty well by {{langfamily}}. --Yair rand (talk) 17:36, 29 December 2010 (UTC)
Family, yes. Link to a Wikipedia article on the language, perhaps. Status as living, if we have use for that now or in the future, yes. Maybe other things? —msh210 (talk) 17:39, 29 December 2010 (UTC)
Type of language (mainspace, proto, conlang) would be useful too, because it's used by {{langprefix}}, and that's used by a lot of other templates. —CodeCat 17:45, 29 December 2010 (UTC)
Including language family in these templates would be completely useless, since the families have no effect on any mainspace entries. Bits of information on the language would also be useless if they aren't useful for the entries themselves. Keep in mind that these templates are loaded, usually dozens or even hundreds of times, on every single page in Wiktionary. --Yair rand (talk) 12:25, 30 December 2010 (UTC)
I really doubt that the templates are actually loaded more than once. Most likely they are loaded once and re-parsed with different parameters each time. —CodeCat 12:27, 30 December 2010 (UTC)
A translation table with a dozen translations is a dozen language templates loaded. --Yair rand (talk) 12:36, 30 December 2010 (UTC)
True, but those are already loaded as it is. That wouldn't change at all in the new situation. —CodeCat 12:49, 30 December 2010 (UTC)
I don't think one extra small template load per language would make all that much difference. Trying to load this onto parser functions for every usage, plus using safesubst for everything so that it's still substable, would probably be less efficient. --Yair rand (talk) 17:29, 29 December 2010 (UTC)
But if we use {{sc:he}} (or {{he/sc}}), we'd need to use a parser function for it anyway (#ifexist on the script template), which is not necessary if we load it into {{he}} (just use {{he|sub=sc}}, and the parser function (#switch) is in the template itself). Seems to me we come out even. —msh210 (talk) 17:39, 29 December 2010 (UTC)
If we fill all the unknown script codes with "None" as I suggested, then we can avoid parser functions entirely. --Yair rand (talk) 17:43, 29 December 2010 (UTC)
We can't rely on its being filled in for all languages, as new language templates can be created at any time. (Unless we create an sc: (or /sc) template for every two- or three-letter string?) —msh210 (talk) 17:58, 29 December 2010 (UTC)
I thought new language templates were only ever added by a bot? --Yair rand (talk) 18:04, 29 December 2010 (UTC)
On the other hand, if we make it mandatory that every language template have a corresponding script template, we can easily catch cases where a human creates a language template without an associated script template. -- Prince Kassad 01:04, 30 December 2010 (UTC)

Here is a first draft: User:CodeCat/en, User:CodeCat/ru, User:CodeCat/gem. It's short and simple, but it works the way it should. Thoughts? —CodeCat 17:54, 29 December 2010 (UTC)

Language templates need to be substable. --Yair rand (talk) 18:04, 29 December 2010 (UTC)
Oh... and this isn't? I don't really know how else to do it then. Do you know? —CodeCat 18:10, 29 December 2010 (UTC)
OK, I think I found a way. How is this now? —CodeCat 18:12, 29 December 2010 (UTC)
No, that way won't work non-substed, simply transcluded. The only way I know to make parser functions work both substed and non-substed is to use safesubst: surrounded by includeonly, or to use {{{|safesubst:}}}. --Yair rand (talk) 18:15, 29 December 2010 (UTC)
All right, I fixed that, and it seems to work now. —CodeCat 18:18, 29 December 2010 (UTC)
Shouldn't this discussion be on WT:GP, where people will expect to find it? Mglovesfun (talk) 01:35, 30 December 2010 (UTC)
Moved, then. :) —CodeCat 12:15, 30 December 2010 (UTC)
I quite like the idea of {{he/sc}} (et al.). The information already exists on this wiki in langscript, so a bot could create them. I bet Conrad.Irwin could do it; it's just that he's not active right now. So basically templates would do {{{{he/sc}}|{{{1}}}}}; since the entire content of he/sc would be Hebr, this would become {{Hebr|{{{1}}}}}. Mglovesfun (talk) 12:48, 30 December 2010 (UTC)
With my proposal that would just become {{{{he|t=sc}}|{{{1}}}}}, which isn't really all that different. —CodeCat 12:53, 30 December 2010 (UTC)
Well, 'my' way (actually msh210's, above), you wouldn't need to modify any of the templates themselves; it would be done entirely by subtemplates not transcluded by the language templates themselves. Mglovesfun (talk) 12:55, 30 December 2010 (UTC)
I remember my old proposal of editing Template:en to display various names: see the old revision and its code. It would be easy to add support for scripts and families, plus a good safesubst: to make it work, and deprecate {{langscript}}, {{langfamily}} and {{Xyzy}} in the process.
However, I may be wrong, but I believe that one template with a list of 825 scripts is parsed much more quickly than a group of 825 templates with one script each. --Daniel. 13:18, 30 December 2010 (UTC)
While that may be the case, you have to look at it from a practical point of view. How many language templates are used on average on a single page? I doubt the average is higher than about 10-20. Most pages only have one language on them. —CodeCat 13:28, 30 December 2010 (UTC)

So we've got a number of possible solutions:

  1. The current way of using Xyzy, which holds a decent-sized switch function to choose scripts for some of the more popular languages, going through a few dozen options for every case.
  2. The way used by Daniel.'s templates, using {{langscript}} itself, using a very large switch function to go through hundreds of languages to get the right script.
  3. CodeCat's suggestion of changing over the existing language templates to use switch functions.
  4. The langcode/sc method, creating one /sc template per language, which would include the ISO 15924 script code for the language's script. (Another possibility would be to just redirect the templates to the script template, though I don't know whether that would be better or worse.)

Option number 3 would be used (assuming we're talking about a link, which is usually the case) pretty much as {{{{{{{lang}}}|t=sc}}|lang={{{lang}}}|[[target#{{{{{lang}}}|l=}}|target]]}}, which would be loaded as this: {{(transclusion #1, of the language template){{#switch:sc|=Hebrew|sc=Hebr|possibly other options...}}|lang=he|[[target#(transclusion of the same template again){{#switch:|=Hebrew|sc=Hebr|possibly other options...}}|target]]}}. Option #4 would be used as {{{{sc/{{{lang}}}}}|lang={{{lang}}}|[[target#{{{{{lang}}}|l=}}|target]]}}, which would be loaded as {{(transclusion #1)Hebr|lang=he|[[target#(transclusion #2, of a different template)Hebrew|target]]}}. For the same function, #3 uses two transclusions of the same template and two switch functions, and #4 uses two transclusions of different templates and no parser functions. Multiple transclusions of the same template probably don't have to be loaded separately, so the extra transclusion in #4 would only have an effect on the first load of that template in each page. Language templates and script templates are used in pretty much all links; there is tons of template duplication on some pages, and virtually none on others. Translation tables will often have links to dozens of different languages, and language sections will often have dozens of links to entries in the same language. At a rough guess, we're weighing, on average, one extra transclusion (option #4) against about three to five small parser functions and a bunch of extra data (option #3), or against a bunch of massive parser functions with hundreds of options (#2), or against a bunch of decent-sized parser functions and some inaccuracies (#1). However, the real problem is that none of us, as far as I'm aware, has the slightest clue how MediaWiki handles these things. :) --Yair rand (talk) 13:33, 30 December 2010 (UTC)
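The transclusion counting above can be made concrete with a toy model. It assumes, as the comment does, that repeated transclusions of one template on a page are fetched only once; the template contents mirror the Hebrew example, and the dictionaries and `render_links` are hypothetical. It only counts distinct template fetches and #switch evaluations and says nothing about MediaWiki's real costs.

```python
# Toy model of options #3 and #4; not MediaWiki's parser.
OPTION3 = {"he": {"": "Hebrew", "sc": "Hebr"}}  # {{he}} switches on t=
OPTION4 = {"he": "Hebrew", "sc/he": "Hebr"}     # separate one-line templates


def render_links(terms, lang, option):
    """Expand one link per term; return (wikitext, distinct fetches, #switch evaluations)."""
    fetched = set()  # assumption: a page-level cache, so each template is fetched once
    switches = 0
    out = []
    for term in terms:
        if option == 3:
            fetched.add(lang)
            script = OPTION3[lang]["sc"]  # {{he|t=sc}} -> one #switch
            name = OPTION3[lang][""]      # {{he}}      -> another #switch
            switches += 2
        else:
            fetched.update({"sc/" + lang, lang})
            script = OPTION4["sc/" + lang]  # {{sc/he}}
            name = OPTION4[lang]            # {{he}}
        out.append(f"{{{{{script}|lang={lang}|[[{term}#{name}|{term}]]}}}}")
    return " ".join(out), len(fetched), switches


if __name__ == "__main__":
    for option in (3, 4):
        print(option, render_links(["shalom", "mayim"], "he", option))
```

Both options produce identical wikitext; the model only makes visible the trade-off described above, one extra distinct fetch per language (#4) against two #switch evaluations per link (#3).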

We should really make a benchmark for this kind of thing. But I don't really know how to do that... —CodeCat 13:39, 30 December 2010 (UTC)
Certainly not me. BTW, we need to consider multi-script languages. Mglovesfun (talk) 13:40, 30 December 2010 (UTC)
As Yair and others correctly said, {{langscript}} uses a very large switch function to go through hundreds of languages to get the right script. However, I believe a group of {{sc:en}}, {{sc:fr}}, {{sc:he}} and 822 other templates would be even worse, because it would go through millions of page names to get to the right MySQL row. --Daniel. 13:53, 30 December 2010 (UTC)
MySQL uses indexing, which makes these requests really quick. Switch functions generate much more overhead, because they need to be processed by the CPU. -- Prince Kassad 13:57, 30 December 2010 (UTC)
If we mandate that each language template must have a fixed set of subtemplates, then it would be a good idea to have a maintenance script/bot check each template and make sure that all subtemplates exist. Otherwise we would have to use #ifexist everywhere... —CodeCat 14:50, 30 December 2010 (UTC)
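A minimal sketch of such a maintenance check, assuming the script subtemplates are named Template:<code>/sc (one of the naming schemes floated in this thread) and that the bot already has the list of language codes, for instance from a tracking category, plus the titles of existing templates. The naming pattern, the function and its inputs are hypothetical.

```python
import re

SCRIPT_SUBPAGE = re.compile(r"^Template:(.+)/sc$")  # assumed naming scheme


def check_script_subtemplates(lang_codes, template_titles):
    """Report language codes lacking a /sc subtemplate ("missing"), and /sc
    subtemplates whose language code is not a known language ("orphaned")."""
    codes = set(lang_codes)
    have_sc = set()
    for title in template_titles:
        match = SCRIPT_SUBPAGE.match(title)
        if match:
            have_sc.add(match.group(1))
    return {
        "missing": sorted(codes - have_sc),
        "orphaned": sorted(have_sc - codes),
    }


if __name__ == "__main__":
    titles = ["Template:en", "Template:en/sc", "Template:gem/sc", "Template:xx/sc", "Template:term"]
    print(check_script_subtemplates(["en", "he", "gem"], titles))
```

Run periodically (for example after each batch of new ISO codes), a report like this would let the templates stay free of #ifexist.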
I don't think making sure that each language code has a script subtemplate would be too difficult. ISO doesn't add new codes all that often, do they? --Yair rand (talk) 14:54, 30 December 2010 (UTC)
Annually. -- Prince Kassad 14:59, 30 December 2010 (UTC)
I really dislike option #2, both because Robert Ullmann (talkcontribs) apparently felt that it would be too expensive, and also because I think it's ugly. I think of default display behavior as a "property" or "attribute" of languages, rather than something that languages get mapped to, if that makes sense. (Or maybe I've just been an OO programmer for too long.) Options #3 and #4 are both good IMHO. I tend to prefer the parenthesized variant of option #4 (whereby {{he/script}} redirects to {{Hebr}}, as in fact {{HEchar}} already does) as the cleanest approach, but option #3 does have the benefit of matching {{etyl:xx}}. —RuakhTALK 22:48, 30 December 2010 (UTC)
Just a note that if we do end up using option 4, the templates should be added to a category (like Category:Language-to-script associations). -- Prince Kassad 17:34, 31 December 2010 (UTC)
I second the calls to just "test" this: create a page that contains the templates you want, open it with ?action=purge&forceprofile=true, then look for the massive comment in the page source to find out how long parsing took (it's probably the biggest number), and retry a few times to sort out all the randomness. From very basic testing, it seems that the switch statement doesn't scale very well (User:Conrad.Irwin/script1): it requires O(n) time, so the language codes starting with a z are very slow, though loading 800 simple templates is OK (User:Conrad.Irwin/script2); Xyzy 800 times is halfway between the two (User:Conrad.Irwin/script3). [This of course is an "invalid" test: all I'm testing is the wikitext on the page. It would be more useful to test re-implementations of typical Wiktionary pages, particularly those with which we're having problems.]
I suspect that options 3 and 4 will perform similarly, though the more information gets put into the templates in 3, the slower it will get. Option 2 seems an odd choice: it may provide a big O(1) speedup, but at the cost of a small O(n) slowdown, so longer pages are made slower even though shorter pages are faster; still, I trust there's a reason it's like that (Ullmann is better at this game than I am). If we were to use option 4, it would be better to manually copy the contents of the script template to each language instead of using a redirect (that way you're loading only two templates per language instead of three). Actually, I suppose all the options would be greatly improved if we didn't need to load the script template at all. Are there any differences that can't be implemented in CSS once you know the {{{lang}}}, {{{sc}}} and {{{face}}}? Conrad.Irwin 17:22, 1 January 2011 (UTC)
I already suspected that large switches would be slower. However, I imagine that the underlying code just looks at each case until it finds a match. So I suspect that if the first case is always taken, it is faster than if it has to fall back to the default case. And if so, the switch-per-template approach will probably be very fast if we order the cases by usage frequency. This is what I already did in my little test above. —CodeCat 18:24, 1 January 2011 (UTC)
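The ordering argument can be checked with a simple cost model. It assumes, as the comment does, that #switch compares the key against each case label in order, and that a miss inspects every label before falling through to the default. The usage counts in the example are made up for illustration, and the function names are hypothetical.

```python
def comparisons(cases, key):
    """Labels a linear #switch inspects for `key` (all of them on a miss)."""
    for position, label in enumerate(cases, start=1):
        if label == key:
            return position
    return len(cases)


def expected_comparisons(cases, usage):
    """Average comparisons per call, weighting each key by its usage count."""
    total = sum(usage.values())
    return sum(count * comparisons(cases, key) for key, count in usage.items()) / total


if __name__ == "__main__":
    usage = {"en": 70, "zh": 20, "fr": 10}  # hypothetical usage counts
    alphabetical = ["en", "fr", "he", "ru", "zh"]
    by_frequency = sorted(alphabetical, key=lambda code: -usage.get(code, 0))
    print(expected_comparisons(alphabetical, usage))  # 1.9
    print(expected_comparisons(by_frequency, usage))  # 1.4
```

The model also reflects Conrad.Irwin's caveat: when every code is requested equally often, as on a page like water, the total work is the same whatever the order, so frequency ordering only helps pages dominated by a few languages.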
If the problems are only arising on pages like water, that won't help, as you need all branches of the #switch: (and some more...). Where are the current problems with these templates best exhibited? Conrad.Irwin 18:34, 1 January 2011 (UTC)[reply]
No, all the script stuff is in the CSS. But we would need to use a bot to add the script information to every single foreign script entry, therefore it's easier to create a bunch of templates. -- Prince Kassad 19:09, 3 January 2011 (UTC)[reply]
Technically I think CSS3 supports bidi stuff, and obviously JS does, but personally I'd greatly prefer that that remain in the HTML proper. But aside from that, everything can be done in CSS. Anyone have any thoughts how to handle bidi stuff without loading script templates? (One option, that I don't really like, is just to say that sc= is mandatory when explicit bidi marking is required. This should be relatively rare; basically it would only come up when either (1) a term in a right-to-left script either starts or ends with a directionally neutral character or (2) two terms in right-to-left scripts are listed next to each other with only directionally neutral characters in between. In the former case, a browser would mistakenly treat the neutral character as left-to-right, and in the latter case, it would mistakenly treat the characters as right-to-left.) —RuakhTALK 19:21, 3 January 2011 (UTC)[reply]
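The two cases Ruakh describes can be detected mechanically from Unicode bidirectional classes. A minimal sketch using Python's standard `unicodedata` module; it treats every character that is not strongly L, R or AL as "directionally neutral", which is a simplification (the Unicode Bidirectional Algorithm distinguishes weak types such as digits from true neutrals). The function names are hypothetical, not part of any Wiktionary template or tool.

```python
import unicodedata

# Bidi classes with a strong direction; everything else is treated as "neutral" here.
STRONG_RTL = {"R", "AL"}
STRONG = STRONG_RTL | {"L"}

def is_rtl_term(term: str) -> bool:
    """True if the term contains at least one strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in STRONG_RTL for ch in term)

def needs_explicit_bidi(term: str) -> bool:
    """Case (1): an RTL term whose first or last character has no strong direction."""
    if not term or not is_rtl_term(term):
        return False
    first = unicodedata.bidirectional(term[0])
    last = unicodedata.bidirectional(term[-1])
    return first not in STRONG or last not in STRONG

def adjacent_rtl_terms(left: str, right: str) -> bool:
    """Case (2): two RTL terms listed next to each other (only neutrals in between)."""
    return is_rtl_term(left) and is_rtl_term(right)

print(needs_explicit_bidi("שלום!"))            # True: '!' has bidi class ON
print(needs_explicit_bidi("שלום"))             # False
print(adjacent_rtl_terms("שלום", "سلام"))      # True: Hebrew (R) next to Arabic (AL)
```

Only terms flagged by checks like these would need an explicit sc= (or explicit bidi markup), which matches Ruakh's expectation that such cases are relatively rare.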
It just came to my mind that you could use attribute selectors in the CSS, like [lang=ru],[lang=uk] etc., which would make script templates obsolete for good. It only breaks IE6. Is this worth looking into? -- Prince Kassad 19:25, 3 January 2011 (UTC)[reply]
Well, on second thought, you could just change templates like {{infl}} to add a class with the language code (like class="lang-ru"), which could then be formatted in CSS. That would be an IE6-safe method. -- Prince Kassad 19:19, 4 January 2011 (UTC)[reply]
Indeed; or class="lang-ru-term". —RuakhTALK 15:19, 5 January 2011 (UTC)[reply]
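The two selector styles discussed here can be compared by generating the stylesheet from a single language-to-script table. A minimal sketch with a hypothetical mapping and placeholder font stacks (the real Wiktionary script classes and fonts are not given in this discussion). It emits either attribute selectors such as `[lang=ru]`, which IE6 does not support, or class selectors such as `.lang-ru`, the IE6-safe variant.

```python
# Hypothetical data: which script each language uses, and a placeholder font stack per script.
SCRIPT_OF = {"ru": "Cyrl", "uk": "Cyrl", "ar": "Arab", "fa": "Arab"}
FONTS = {"Cyrl": "'DejaVu Sans', sans-serif", "Arab": "'Scheherazade', serif"}

def build_css(script_of, fonts, use_classes=True):
    """Emit one CSS rule per script, grouping every language that uses it."""
    langs_by_script = {}
    for lang, script in sorted(script_of.items()):
        langs_by_script.setdefault(script, []).append(lang)

    rules = []
    for script in sorted(langs_by_script):
        if use_classes:
            # Class selectors (e.g. class="lang-ru" added by {{infl}}): IE6-safe.
            selectors = [f".lang-{lang}" for lang in langs_by_script[script]]
        else:
            # Attribute selectors on the lang attribute: not supported by IE6.
            selectors = [f"[lang={lang}]" for lang in langs_by_script[script]]
        rules.append(f"{', '.join(selectors)} {{ font-family: {fonts[script]}; }}")
    return "\n".join(rules)

print(build_css(SCRIPT_OF, FONTS))
print(build_css(SCRIPT_OF, FONTS, use_classes=False))
```

Either way, adding a language becomes a one-line change to the table instead of a new script template.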

I implemented the CSS method for now, at least until we can decide on another method. Notice its effects on water. Before:

  • Preprocessor node count: 125094/1000000
  • Post-expand include size: 359213/2048000 bytes
  • Template argument size: 68657/2048000 bytes
  • Expensive parser function count: 54/500

After:

  • Preprocessor node count: 110350/1000000
  • Post-expand include size: 342076/2048000 bytes
  • Template argument size: 68516/2048000 bytes
  • Expensive parser function count: 54/500

Note especially how far the first two went down: the preprocessor node count fell by 14,744 (about 12%) and the post-expand include size by 17,137 bytes (about 5%). And Xyzy is not even used very much on water, because most translations are implemented using {{t-simple}}, which circumvents it. -- Prince Kassad 21:39, 8 January 2011 (UTC)[reply]
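The before/after limit-report figures for water can be compared directly. This small sketch only computes the absolute and relative drop for each line of the report, using the numbers quoted above.

```python
# Limit-report figures for [[water]], copied from the before/after lists.
before = {
    "Preprocessor node count": 125094,
    "Post-expand include size": 359213,
    "Template argument size": 68657,
    "Expensive parser function count": 54,
}
after = {
    "Preprocessor node count": 110350,
    "Post-expand include size": 342076,
    "Template argument size": 68516,
    "Expensive parser function count": 54,
}

def changes(before, after):
    """Return (absolute drop, percentage drop) for each metric."""
    return {
        name: (before[name] - after[name],
               100 * (before[name] - after[name]) / before[name])
        for name in before
    }

for name, (drop, pct) in changes(before, after).items():
    print(f"{name}: down {drop} ({pct:.1f}%)")
```

The template argument size barely moves and the expensive parser function count is unchanged, so the gain comes almost entirely from the first two metrics.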

  • ...Hm, so what happened with this? The change to Xyzy was reverted by Ruakh with the edit summary "that solution is not ready yet. for one thing, CSS changes take >30 days to reach all users" over a month ago. Is there any way to make it ready? --Yair rand (talk) 15:04, 21 February 2011 (UTC)[reply]

Okay, it looks like the CSS solution causes unexplained problems. Therefore, I retract my previous decision and now vote to delete the template outright, replacing it with {{Latn}}. Yes, defaulting to Latin script is evil™, but it seems there's no other solution ready yet, so we're stuck with this. -- Prince Kassad 22:20, 8 March 2011 (UTC)[reply]

User:MalafayaBot for operation in article namespace

Hi, all.

I'm requesting authorization to run my interwiki bot in the article namespace. It runs in -auto mode, which means no existing redirects will be removed.

Please, read and cast your vote at Wiktionary:Votes/bt-2010-12/User:MalafayaBot for operation in Article namespace. Thanks, Malafaya 14:44, 30 December 2010 (UTC)[reply]

You should announce the vote in the BP: AFAICT that's SOP. Can you explain the edit to [[Tajik]]?​—msh210 (talk) 20:23, 30 December 2010 (UTC)[reply]
I also announced it at the BP. It's been over 3 years since that edit, but I can think of two possible explanations:
  • I was supposed to run the bot for Wikipedia and inadvertently started it for Wiktionary, and a wrong interwiki in some other Wiktionary was found;
  • I was supposed to run it for categories and ran it against articles, and a wrong interwiki in some other Wiktionary was found.
As it was the first edit, I suppose I was still doing some tests to apply for a bot flag, which was then declined (I had asked for article interlinking, which at the time was done exclusively by Interwicket). Anyway, I immediately noticed the mistake and corrected it. Malafaya 18:15, 31 December 2010 (UTC)[reply]
I should add that if new edits are required, I can easily provide them. Malafaya 18:25, 31 December 2010 (UTC)[reply]
You say "and a wrong interwiki in some other Wiktionary was found". Do you mean (or, anyway, is it true) that if another Wiktionary has an interwiki link to us ([[foo]]) from a different pagename ([[he:bar]]), then your bot will link our page ([[foo]]) to theirs ([[he:bar]]) also?​—msh210 (talk) 16:01, 3 January 2011 (UTC)[reply]
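msh210's concern can be made concrete. On Wiktionary, interwiki links are meant to join pages with exactly the same title, so a correct bot should ignore an incoming link from a differently named page. The sketch below is hypothetical and is not MalafayaBot's or pywikipedia's actual logic: it contrasts naively linking back to every page that links to us with keeping only same-title targets.

```python
# Each incoming link: (wiki, title on that wiki) -> title it links to on en.wiktionary.
# Hypothetical example: he:bar wrongly links to our [[foo]].
incoming = {
    ("fr", "foo"): "foo",
    ("de", "foo"): "foo",
    ("he", "bar"): "foo",
}

def naive_links(our_title, incoming):
    """Link back to every page that links to us, whatever its title."""
    return sorted(f"{wiki}:{title}"
                  for (wiki, title), target in incoming.items()
                  if target == our_title)

def same_title_links(our_title, incoming):
    """Link back only to pages whose title matches ours exactly."""
    return sorted(f"{wiki}:{title}"
                  for (wiki, title), target in incoming.items()
                  if target == our_title and title == our_title)

print(naive_links("foo", incoming))       # ['de:foo', 'fr:foo', 'he:bar']
print(same_title_links("foo", incoming))  # ['de:foo', 'fr:foo']
```

The naive variant would reproduce exactly the kind of wrong link msh210 asks about; the same-title filter drops it.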

This used (unless I'm mistaken) to link to {{{1}}} and display {{{2}}}. Now it links to and displays {{{2}}}, breaking, e.g., [[nil sub sole novum]] (and doubtless many other pages).​—msh210 (talk) 18:08, 30 December 2010 (UTC)[reply]

That was a consequence of this edit by Daniel. to {{makelink}} on 23 October 2010. The edit had no edit-summary, but the goal was apparently to replace a call to {{wlink2}} with equivalent code, for reason or reasons unknown. (I have some guesses, but I don't care enough to speculate.) It looks like copy-and-paste went awry, with {{#if:{{{3|}}}|{{{3}}}|{{{2}}}}} appearing in two places where {{{2}}} must have been meant. I've fixed it now.
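For readers less familiar with ParserFunctions: `{{#if:{{{3|}}}|{{{3}}}|{{{2}}}}}` yields parameter 3 when it is given and non-blank, and parameter 2 otherwise, whereas plain `{{{2}}}` always yields parameter 2, so the two differ exactly when a caller passes a third parameter. A minimal Python model of the two expressions; the function names are hypothetical, and it relies on `#if` trimming whitespace from its test, so a blank parameter counts as empty.

```python
def fallback_3_then_2(params):
    """Model of {{#if:{{{3|}}}|{{{3}}}|{{{2}}}}}: param 3 if non-blank, else param 2."""
    test = params.get("3", "")  # {{{3|}}} defaults to the empty string
    if test.strip():            # #if trims whitespace before testing
        return params["3"]
    # An undefined parameter with no default renders literally as {{{2}}}.
    return params.get("2", "{{{2}}}")

def plain_2(params):
    """Model of {{{2}}}: always param 2."""
    return params.get("2", "{{{2}}}")

args = {"2": "display", "3": "alt"}
print(fallback_3_then_2(args), plain_2(args))  # alt display
```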
I seem to recall Daniel. (talkcontribs) agreeing at some point to stop engaging in this sort of behavior. Does anyone remember the details of that? Specifically, was it before the above-linked edit, or after?
RuakhTALK 20:52, 30 December 2010 (UTC)[reply]

three genitive forms

The German word Aar has three different genitive forms: Aares, Aars, Aaren. Template:de-noun only allows two. Can the template be expanded, or should I use a different template? - -sche 23:48, 30 December 2010 (UTC)[reply]

I don't think we want to add a gen3 parameter just for one word. You could use {{infl}}. Mglovesfun (talk) 17:32, 1 January 2011 (UTC)[reply]
Why not? We've created a pl3 parameter for the sole reason that Junge uses it. -- Prince Kassad 19:25, 1 January 2011 (UTC)[reply]
Abakus also uses three plural forms. How would I use template:infl? - -sche 21:39, 1 January 2011 (UTC)[reply]
Park and its compounds (Tierpark) also have three plural forms, as does Sozius (see de:Sozius). - -sche 19:42, 10 January 2011 (UTC)[reply]

Atlas also uses three genitive forms. I have tried to add a third parameter to de-noun, but it does not work. - -sche 19:45, 10 January 2011 (UTC)[reply]

Why? Where's the problem? -- Prince Kassad 21:52, 10 January 2011 (UTC)[reply]
It works now. Possibly I just had to update my cache, or there was a delay before the software picked up the new version of the template. I can't tell, however, whether my last edit, which added the two }}, was necessary. - -sche 22:31, 10 January 2011 (UTC)[reply]
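A third genitive can be added without disturbing existing entries if each extra form is shown only when its parameter is supplied. This is a hypothetical Python model of that headword logic, not the real {{de-noun}} code (whose parameter names and output format are not quoted in this discussion); `gen`, `gen2` and `gen3` are assumed names, by analogy with the pl3 parameter mentioned above.

```python
def genitive_text(params):
    """Join whichever genitive parameters are present (hypothetical names gen, gen2, gen3)."""
    forms = [params[key] for key in ("gen", "gen2", "gen3") if params.get(key)]
    return "genitive " + " or ".join(forms) if forms else ""

print(genitive_text({"gen": "Aares", "gen2": "Aars", "gen3": "Aaren"}))
# genitive Aares or Aars or Aaren
print(genitive_text({"gen": "Aares", "gen2": "Aars"}))
# genitive Aares or Aars
```

Entries that pass only two genitives render exactly as before, which is why an optional extra parameter is cheap to add.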

Proposal to remove prefixes from language templates

Right now we have three different kinds of language templates: those with no prefix like {{en}}, those for reconstructed languages like {{proto:ine}} and those for conlangs like {{conl:tlh}}. The problems this creates are kind of obvious when you look at all the extra templates we've now needed to create: {{langprefix}}, {{languagex}}, {{langnamex}}, {{lx}}, {{termx}} and so on. And still, a lot of important templates don't work for languages with codes that have prefixes. So I'm hereby proposing to do away with this and just use the code directly: {{proto:ine}} becomes {{ine}}, {{conl:tlh}} becomes {{tlh}} and so on. Note that this does not mean that we should now allow these languages in the main namespace. This is only about the names of the language templates, nothing more. —CodeCat 17:38, 31 December 2010 (UTC)[reply]
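Under this proposal the change to existing wikitext is mechanical. A minimal sketch of the rewrite a bot could apply, assuming the only prefixes are `proto:` and `conl:` (as in the examples given) and that codes consist of lowercase letters and hyphens. The regex only touches bare language-template calls such as `{{proto:ine}}`; calls like `{{proto|Eskimo|...}}` have no colon and are left alone.

```python
import re

# Matches {{proto:xxx}} or {{conl:xxx}} with no further parameters.
PREFIXED_LANG_TEMPLATE = re.compile(r"\{\{(?:proto|conl):([a-z-]+)\}\}")

def drop_language_prefixes(wikitext):
    """Rewrite {{proto:ine}} -> {{ine}} and {{conl:tlh}} -> {{tlh}}."""
    return PREFIXED_LANG_TEMPLATE.sub(r"{{\1}}", wikitext)

print(drop_language_prefixes("{{proto:ine}}, {{conl:tlh}} and {{en}}"))
# {{ine}}, {{tlh}} and {{en}}
```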

But the problem is that AutoFormat will consider them eligible for the mainspace if they exist without a prefix. This is why we had to choose this option to begin with. -- Prince Kassad 17:42, 31 December 2010 (UTC)[reply]
Then can't AutoFormat be fixed so that it doesn't? It seems to me that by trying to solve a small problem we've created a much bigger one. —CodeCat 17:42, 31 December 2010 (UTC)[reply]
A bit of technical background: AutoFormat checks the page User:AutoFormat/Languages to know which languages it accepts. This page is regularly refreshed by UllmannBot (well, at the moment this procedure is frozen...), which checks all templates with two or three lowercase letters. It then sorts out those that are not language templates (like rfv, rfd, ...) and adds all the others to this list.
Any changes to this would require either maintaining the list manually (which is very tiresome), or changing the UllmannBot code, which I have no access to since it's not public. -- Prince Kassad 17:48, 31 December 2010 (UTC)[reply]
Given the proposal above with respect to language templates, I think it would be easy to add language type as one of the pieces of information associated with languages. So, for example, {{ine|t=type}} or {{ine/type}} could return proto. UllmannBot could then use that information when it makes its list. —CodeCat 17:53, 31 December 2010 (UTC)[reply]
Or instead just add a comment to the template: <!-- (constructed) -->. Bots are much better at reading comments than at expanding templates. Conrad.Irwin 17:31, 1 January 2011 (UTC)[reply]
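The list-building step described for UllmannBot, combined with the marker-comment idea, can be sketched as follows. This is hypothetical: UllmannBot's code is not public, so the exclusion set, the marker strings (`<!-- (proto) -->` is an assumed companion to `<!-- (constructed) -->`) and the input format (template name mapped to its wikitext) are all assumptions for illustration.

```python
import re

# Two- or three-letter lowercase template names are candidate language codes.
CANDIDATE_NAME = re.compile(r"[a-z]{2,3}")
# Non-language templates with code-like names (only the two mentioned; the real list is longer).
NOT_LANGUAGES = {"rfv", "rfd"}
# Hypothetical marker comments that keep a language out of the main namespace.
EXCLUDE_MARKERS = ("<!-- (constructed) -->", "<!-- (proto) -->")

def mainspace_languages(templates):
    """templates: dict of template name -> wikitext. Returns codes AutoFormat should accept."""
    accepted = []
    for name, text in sorted(templates.items()):
        if not CANDIDATE_NAME.fullmatch(name) or name in NOT_LANGUAGES:
            continue  # not a language-code-shaped name, or a known maintenance template
        if any(marker in text for marker in EXCLUDE_MARKERS):
            continue  # reconstructed or constructed language
        accepted.append(name)
    return accepted

templates = {
    "en": "English",
    "fr": "French",
    "ine": "Proto-Indo-European<!-- (proto) -->",
    "tlh": "Klingon<!-- (constructed) -->",
    "rfv": "...",
    "langprefix": "...",
}
print(mainspace_languages(templates))  # ['en', 'fr']
```

With markers like these, unprefixed names such as {{ine}} and {{tlh}} would no longer leak into the list, and the list could be regenerated by any bot rather than only by UllmannBot.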
We may have to assume that UllmannBot won't be updating that page, and its code won't be released, ever. If that's true, then another solution is needed, anyway. Conrad's and CodeCat's, just above, sound good.​—msh210 (talk) 15:56, 3 January 2011 (UTC)[reply]

Proto-Eskimo template

Could I get some help creating a Proto-Eskimo template so that I can add Proto-Eskimo-derived words from Eskimo-Aleut languages? Many thanks- Jakeybean 19:25, 31 December 2010 (UTC)[reply]

What's wrong with {{proto|Eskimo|foo|lang=kl}}? Mglovesfun (talk) 17:31, 1 January 2011 (UTC)[reply]
Sorry, what I mean is that it's red-linked, so it doesn't exist in the list of languages here. Or do we only list those relevant to English, i.e. Proto-Indo-European and Proto-Germanic? Sorry for the slight confusion. Jakeybean 17:58, 1 January 2011 (UTC)[reply]
Do you mean Proto-Eskimo, the way we have Proto-Germanic? —Stephen (Talk) 18:06, 1 January 2011 (UTC)[reply]
Yeah, and the category also. Thanks Stephen! I just wasn't sure what the consensus was with Proto languages in general so was reluctant to create anything. Jakeybean 18:17, 1 January 2011 (UTC)[reply]
Proto-Eskimo entries would be stored in the Appendix namespace, like *inuk. Compare *ahwō. —Stephen (Talk) 18:46, 1 January 2011 (UTC)[reply]
Cool, thanks. Just FYI, there are two descendants of Proto-Eskimo: Proto-Inupik and Proto-Yupik. Proto-Inupik can also be called Proto-Inuit, but I've used the term Proto-Inupik as it gets more Google hits. Jakeybean 18:56, 1 January 2011 (UTC)[reply]