Wiktionary:Grease pit/2008/January

Bot request

Is there a bot that can batch-add IPA links to many pages, as I have done in this edit, i.e. replacing [[w:IPA|IPA]] with {{IPA|...}} on every page that has it? Thanks in advance. --Keene 01:37, 1 January 2008 (UTC)

That means more than a simple text replacement--you have to take the following text correctly and embed it inside the template you're adding. Also, I've found that the formatting isn't consistent in those cases, so it could be tricky to do this with a bot. And this particular case includes other mistakes that need to be corrected, such as the fact that there is no primary stress marker at all. It might be better to handle these manually, from a bot-generated list, or at least a list generated from the last xml dump. One thing a bot could do easily is to replace all uses of {{AHD}} with {{enPR}}, since the former is currently a redirect. --EncycloPetey 16:58, 1 January 2008 (UTC)
Cool, let's get a list then. It can't be that hard, can it? Connel makes random-looking lists all the time. --Keene 18:19, 1 January 2008 (UTC)
Our native search functionality is notoriously bad, but it looks like there are only a handful of these left in mainspace. Somebody with a convenient copy of the latest DB dump could confirm that... -- Visviva 07:35, 2 January 2008 (UTC)[reply]
Enjoy! User:Keene/IPA. --Connel MacKenzie 07:52, 2 January 2008 (UTC)[reply]
I added AHD -> enPR to AF's regex table, so those will eventually get crunched through. (They also get picked up on prescreen, so AF doesn't just find them at random; it won't take forever.) Robert Ullmann 16:23, 5 January 2008 (UTC)
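
The split described in this thread (rename the template automatically, but only list the bare IPA links for manual cleanup) can be sketched roughly as follows. This is a minimal illustration, not AF's actual regex table: the function names and the sample line are made up for the example.

```javascript
// Hypothetical helper: rename {{AHD|...}} to {{enPR|...}} in wikitext.
// Only the template name changes; the arguments are left untouched.
function replaceAhdWithEnPR(wikitext) {
  // Require "|" or "}}" right after the name so that a template such as
  // {{AHDx}} is not rewritten by accident.
  return wikitext.replace(/\{\{\s*AHD\s*(?=\||\}\})/g, "{{enPR");
}

// Hypothetical helper: report whether a page still uses the bare
// [[w:IPA|IPA]] link, so it can go on a list for manual cleanup
// instead of being edited blindly by a bot.
function hasBareIpaLink(wikitext) {
  return wikitext.includes("[[w:IPA|IPA]]");
}

// Made-up sample line, for illustration only.
const sampleLine = "* {{AHD|kăt}}, [[w:IPA|IPA]]: /kæt/";
console.log(replaceAhdWithEnPR(sampleLine));
console.log(hasBareIpaLink(sampleLine));
```

Keeping the IPA case as a detector rather than a rewriter reflects the concern above that the surrounding formatting is too inconsistent for a blind replacement.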

Aiming for Google keyword define

One thing I have noticed about Wiktionary is that it never, ever comes up when I search for "define word" in Google. (It also does not come up in the list if I try "define: word", but neither do other online dictionaries, so we will have to wait for Google on that.) If I search for "define word wiktionary" I will often, if not always, get Wikipedia before Wiktionary - which is partly because Wikipedia's Pagerank™ is huge, but also partly, I believe, because Wiktionary does not use the word 'define' or 'definition' on its pages. For example, here are some Google searches that can (or might) be done. As you can see, only the "term Wiktionary" search returns anything useful.

define+pseudo | define+pseudo+wiktionary | pseudo+wiktionary | pseudo+definition | pseudo+definition+wiktionary

define+alphabet | define+alphabet+wiktionary | alphabet+wiktionary | alphabet+definition | alphabet+definition+wiktionary

To this end, I propose we add the words "define" and "definition" somewhere in the page source of every entry. There are a few places we can do this. One of the most effective (in use by most online dictionaries) is to have "definition" in the Mediawiki:Pagetitle, so "term - Wiktionary" could be amended to "term - Wiktionary, a Definition" or "term, a Wiktionary definition" (as having Wiktionary in the title is useful for people). The other place in the MediaWiki page is the MediaWiki:Tagline ("From Wikipedia, the free encyclopedia"), which could be used to say "A free definition from Wiktionary" or something.

"Define" is much harder to include nicely; however, it is more important that we do so (it is certainly what I would search for if I wanted a definition). To this end I suggest that we add something along the lines of "Wiktionary aims to define every word. You can help." to the bottom of every page (if that is possible) or, if someone has better suggestions, something less cheesy somewhere else. "Wiktionarians define {{PAGENAME}} as follows:" would also be a good fragment to include. Any thoughts? Conrad.Irwin 15:38, 1 January 2008 (UTC)

It might also be good to include "dictionary", "free dictionary", "online dictionary", "free online dictionary" somewhere on definition pages. Conrad.Irwin 15:49, 1 January 2008 (UTC)[reply]
I heartily applaud all these ideas. A little basic search-term work like this should pay big dividends. I wouldn't worry too much about how cheesy it appears to our refined non-commercial sensibilities. It is low-format text that will not be noticed by most humans. Not that our best efforts aren't warranted. "define", "dictionary", "word", "definition" seem by far the most natural in English. Would other words like "usage", "spell", "spelling", "translation", "example", "pronunciation", "hyphenation", "glossary", "idiom", "phrase" help? For example, could the citations namespace pages be made searchable by Google and include the words: "usage", "example", "definition", "word", "phrase", "idiom"?
Would a look at the HTML for competitors' pages possibly yield insight into what their paid help has found works?
Doing this successfully will yield more new potential contributors. This in turn will further tax our ability to handle first contributors. I would suggest that we piece out our efforts, adding one search word at a time, waiting at least a week before adding another element. Possibly starting with the one least likely to overwhelm us.
Very interesting. DCDuring 16:17, 1 January 2008 (UTC)[reply]
I think it might be simpler to explain the problem to my friend who works at Google and see if he can address the problem. --EncycloPetey 16:53, 1 January 2008 (UTC)[reply]
More applause!!! DCDuring 17:38, 1 January 2008 (UTC)[reply]
I'm fairly sure MediaWiki has a message for the meta keywords. Currently, they are "keyword,Wikipedia,code,command,function,identify,information,key,keywords,link,parola chiave" So it's not exactly surprising that we get this. Circeus 20:22, 1 January 2008 (UTC)[reply]
I took a look at the source code and there is no system message for this. The keywords consist of a normalized version of the page title and the first 10 links on the page (see addKeywords in OutputPage.php). There is an extension called mw:Extension:MetaKeywordsTag that allowed an arbitrary list of meta keywords to be added to a page, but this would have to be a fixed list for the whole wiki because of the interaction of parser functions and extensions (i.e. the content of the extension attribute can't be conditional and trying to use multiple calls to the tag in different branches of a conditional would result in all of the branches being called anyways). Mike Dillon 20:31, 1 January 2008 (UTC)[reply]
I am not sure that that is what we want; it seems to be for adding keywords on a page-by-page basis, whereas we really want to just throw "dictionary definition" out every time. Shall we try adding the word "definition" to our pages for now, as that is the term the other dictionaries concentrate on? It is likely that we won't rank terribly high, so it may be a gentle introduction to any increased vandalism. (Though 'pedia was waiting for a vandalism run for ages, it never got out of control - well, you know what I mean.) I would like to edit the Mediawiki:Pagetitle to read "$1, a Wiktionary definition" or a better suggestion. Would that meet with approval? Conrad.Irwin 01:59, 2 January 2008 (UTC)
I hope there are enough folks standing by and that it can be reversed quickly by someone who is, just in case. Are there any quantitative measures of what would be affected by the change: hits from Google, I guess, or raw hits per hour, or something else? DCDuring 02:39, 2 January 2008 (UTC)
I believe the extension could be called from Mediawiki:Pagetitle and have the side effect of adding keywords. However, as I said, this has to be an all-or-nothing change (nothing conditional or dynamic). I should have made it clear that I was referring to calling the extension from a shared message like Mediawiki:Pagetitle or Mediawiki:Tagline. Mike Dillon 02:44, 2 January 2008 (UTC)[reply]
Messing with MediaWiki:Pagetitle sounds like the best approach. It can always be rolled back. --Connel MacKenzie 07:21, 2 January 2008 (UTC)[reply]
The suggested wording seems OK to me. Maybe just "Wiktionary definition of $1"; the article seems unnecessary. -- Visviva 07:31, 2 January 2008 (UTC)[reply]
The reason I prefer having the words "{PAGENAME}" then "Wiktionary" at the start of the title is that it makes it obvious which tab holds what in browsers. Any change we make won't affect our search results instantly; we would have to wait until Google re-indexes. Then, were we to revert the change, we would have to wait again. There are usage statistics somewhere, but the link escapes me for the moment - I think that they only show hits/second, but presumably the referer headers are stored somewhere. Yes, adding the <keywords> into the Mediawiki:Pagetitle is a good idea, as we want the same keywords ("define dictionary definition") on most pages. I wonder if it can be added to MediaWiki:Nstab-main to ensure that it only goes onto the main namespace. Should we ask Brion if he can install it for us? Conrad.Irwin 13:13, 2 January 2008 (UTC)
Please don't mess with the page title. What we want is to have the meta keywords to start with "(pagename), dictionary, definition, define, ..." The extension doesn't seem to be helpful at all. Should figure out how to do it right. Robert Ullmann 13:30, 2 January 2008 (UTC)[reply]
IMHO, Mike Dillon is on the right track: there is no MW message for keywords, but there should be. And it should only be used for NS:0 pages. (whatever we do must be NS:0 only, so pagetitle is right out. Nstab-main is the link/anchor text, really crappy to try to wedge something in there.) Robert Ullmann 13:48, 2 January 2008 (UTC)[reply]
Does anyone know how often and when Google typically makes a new index available these days? I would have thought they would need to be doing it incrementally and continuously. DCDuring 15:43, 2 January 2008 (UTC)
Yes, of course they update their entire index incrementally; the wikt gets respidered (is that a word? ;-) about once a month apparently. Robert Ullmann 16:00, 2 January 2008 (UTC)[reply]
Spider in this sense is already an entry. Respider is attestable and merits an entry, especially since it might appear to have something to do with breathing. The infrequent respidering makes it all the more important that we have a well-designed experiment. The spidering is done by consent so it might be knowable when it happens. I would then assume that there would be little time lag between respidering and "release" of new material collected into the index. It might be useful to have one low-risk constructive change implemented promptly so we don't miss an opportunity to experiment, pending more detailed and specific knowledge about Google's timing. DCDuring 16:18, 2 January 2008 (UTC)[reply]

(cleared indent) I doubt Google knows when it will spider sites, though Wiktionary seems to get spidered frequently - try "keyword define" site:en.wiktionary.org and look in the cache to see what the state of this conversation was when they last indexed. I also doubt that they spider the whole of Wiktionary in one go; it is probable that they are progressively indexing slowly. It is almost impossible to find out what Google is doing unless you have access to the request logs, or have signed up to Google Analytics - which isn't an option. I have tried to write an extension that, based on a simple syntax at Mediawiki:Metakeywords, allows custom keywords to be added per namespace to the start of the list (after the pagename). It would be useful if we had wiktionarydev back so that it can be tested a bit more rigorously than I can do at home - but I think that this would be a better solution than trying to use <keywords>. See User:Conrad.Irwin/MetaKeywords.php for more information. Conrad.Irwin 20:57, 2 January 2008 (UTC)

I ran some simple test searches on entries I've created lately and found that Google has already indexed "asstunnel" (not my proudest entry), but not "backbites", created about 3 days ago. Which I take as an indication that their system "looks for" sites that include relatively high-demand words. It would seem that we would need to make sure we had our neologisms and protologisms well handled so that we got "satisfied" users "clicking through" to register with Google. That would also mean that not all of our content should appear on the Google page, but some teaser should, something like "example:" or "quotation:". DCDuring 23:14, 2 January 2008 (UTC)[reply]
My understanding is that the system keeps track of the expected update frequency of known pages based on the history of changes it has seen. It finds new pages based on links from known pages and starts spidering them at a pretty aggressive rate; if it sees that they aren't being updated, it decreases frequency. So, if a new term is linked from a high-traffic page, it is more likely to be added to the index quickly. On top of this, the robots.txt file excludes all pages starting with /w/ as well as Special:Search, Special:Random, and a bunch of pages that are project specific (the robots.txt is shared across all WMF sites). In addition to this, all Special pages are excluded with META tags (they have "noindex" and "nofollow"). Mike Dillon 03:08, 3 January 2008 (UTC)[reply]
Nice extension. It looks like it would do the trick nicely. The one thing I'm not sure about is whether or not it should use wfMsgForContentNoTrans. I don't really see a problem with the keywords or description being translated if we decide to provide them. It won't affect spiders anyways unless they're crawling a page with uselang (in which case it will be indexed separately from the page without uselang). Also, it may be worthwhile to try to get the content of {{PAGENAME}} into the META description tag with a $1 instead of forcing it to be generic. The other thing I'm not sure about is having the namespace-specific description appended to the default one; it seems like it might be better to let it override the default. Mike Dillon 02:55, 3 January 2008 (UTC)[reply]
I was copying Extension:Gadgets quite closely, having never coded for MediaWiki before, and that's what they use. I agree that we might want to translate them, but I am not sure how Google would set its interface language preferences while spidering - so it would probably be better, certainly no worse, to use one of the other message-getting functions. I will go and have another play with the extension; your ideas are good! (By the way, feel free to hack around with it too.) Conrad.Irwin 10:35, 3 January 2008 (UTC)
Google doesn't set interface language preferences when spidering, at least not in a way that Mediawiki would respect. The only ways to set interface language preferences that will actually change the language are the uselang parameter and setting the preferred language for a logged-in user. Mediawiki doesn't do content negotiation based on Accept-Language or anything like that. Mike Dillon 16:05, 3 January 2008 (UTC)
I was just thinking some more about this and it's probably best to continue using wfMsgForContentNoTrans for now. If translation were allowed, then the use of memcached would need to be altered to avoid unpredictable behavior with the content (although it could probably be addressed simply by adding the interface language to the cache key). Since nobody really has any intention to translate these right now, it's not worth figuring out how to make it work for that case. Mike Dillon 16:56, 3 January 2008 (UTC)[reply]
That was the conclusion I came to too, I have updated it to use only the site description or the page description, which allows use of the $1 pagename variable. Conrad.Irwin 21:30, 3 January 2008 (UTC)[reply]
I've updated the code to use MediaWiki's built-in handling of "$1" instead of handling it ourselves. Mike Dillon 03:38, 4 January 2008 (UTC)[reply]
Thanks, for this and the Javascript pointers. Conrad.Irwin 12:18, 4 January 2008 (UTC)[reply]
Too bad the "$1" thing broke caching; nice catch there. I guess I need to revamp my MediaWiki install so that I can actually test these things (although I never had memcached running anyways, so I could have missed it). Mike Dillon 16:08, 4 January 2008 (UTC)[reply]
No worries, it wasn't obvious that it read the page into the cache and used the same version again and again. All the editing has to be done on output, not input. Now, I wonder how long it will take before this is well tested enough to go into Wiktionary.... 15:21, 5 January 2008 (UTC)
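
The two ideas that came out of this sub-thread (custom keywords per namespace placed right after the page name, and "$1" substituted on output so that the cached message stays page-independent) can be sketched as below. This is not the code of User:Conrad.Irwin/MetaKeywords.php; the configuration object, the cache and the function names are all assumptions made for illustration.

```javascript
// Hypothetical per-namespace configuration, loosely modelled on the idea of a
// Mediawiki:Metakeywords message. Keys are namespace numbers (0 = main).
const keywordsByNamespace = {
  0: ["dictionary", "definition", "define"],
};

// Cache of raw message text. The text is stored with "$1" still in place,
// so one cached entry can be reused for every page.
const messageCache = new Map();

function getRawDescription(namespace, loadMessage) {
  if (!messageCache.has(namespace)) {
    messageCache.set(namespace, loadMessage(namespace));
  }
  return messageCache.get(namespace);
}

// Keyword list: page name first, then the namespace keywords, then up to ten
// links from the page (the default behaviour described earlier in the thread).
function buildKeywords(pageName, namespace, pageLinks) {
  const custom = keywordsByNamespace[namespace] || [];
  return [pageName, ...custom, ...pageLinks.slice(0, 10)];
}

// "$1" is replaced on output, never before the text goes into the cache.
function buildDescription(pageName, namespace, loadMessage) {
  return getRawDescription(namespace, loadMessage).split("$1").join(pageName);
}
```

Substituting before caching would freeze the first page's name into the cached text, which matches the caching problem described above.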

Just thought I'd point out that Urban Dictionary uses as its page header: "Urban Dictionary is a slang dictionary with your definitions. Define your world." bd2412 T 04:18, 4 January 2008 (UTC)[reply]

I would like to see a change of the heading, in addition to the addition of meta tags. Mainly because I find the - multiple - dashes - annoying. Conrad.Irwin 12:18, 4 January 2008 (UTC)[reply]
Seeing as the "free definition" idea was reverted from the page title, what are we going to do about this? Conrad.Irwin 13:57, 13 January 2008 (UTC)

New feature

I noticed today that redlink pages can now be protected. So perhaps even WT:PT can go away, if someone plows through that list, right? Or do we still want a central location where they are all listed (i.e. by date added.) --Connel MacKenzie 07:23, 2 January 2008 (UTC)[reply]

A central location allows us to see where the problems have occurred. Unless there is a simple way to generate such a list from the protection logs, the central location makes more sense to me. --EncycloPetey 17:16, 2 January 2008 (UTC)[reply]
The new feature comes with its own special page, Special:Protectedtitles.
Also, I think we should be much better about using the expiration dates on protections, especially in NS:0, where no page should be protected indef/inf. (There are a few that come to mind that might have to be renewed every year or so...) A reasonable policy might be that no NS:0 page protection should last more than one year?
Along that line: if we use title protection and routinely set expiration to 1 year, we can see easily when an entry was presumed to have been added, and the cruft will go away eventually as well. (A lot of the existing list is entirely obsolete.) Robert Ullmann 11:42, 3 January 2008 (UTC)[reply]
Agreed on all counts. (Or mostly, anyway. I really don't mind if all the …/w/index.php pages get permaprotected. :-) —RuakhTALK 04:12, 4 January 2008 (UTC)[reply]
I also agree. Is there a way to protect pages by regexp, so we could protect all of those pages? Conrad.Irwin 15:54, 5 January 2008 (UTC)
Two things. One is that there is a bug in this function where it doesn't distinguish between lc and UC page titles; trying to protect "Roflcopter" and "roflcopter" doesn't work properly. I suspect it is just the key attribute in the SQL table, from the effects I see. Need to write up a bug. Also, there is some bug/enhancement request for regex pagename screening; I don't remember the number. I think it has more to do with defining restrictions on valid pagenames rather than protection per se.
Um, actually three things: if you leave an "/index.php" page alone, it will get deleted and added to the (present) protected pages list. It looks like I'm doing it, but not exactly ;-) Robert Ullmann 16:12, 5 January 2008 (UTC)[reply]

Hi everyone, this is long overdue: can I bring up some discussion about things that need changing as we migrate, hopefully reasonably soon, into Common.js for JavaScript? Many of the things in Monobook.js are fine, but there are a few changes that need to be made. Here is a checklist of what I would like to change, and the possible effects. Feel free to add both to the list and to the problems.

  1. Done Remove the ta[] array, it is classed as redundant in wikibits.js.
    • The access key for logging out ("o") would no longer work; all the others are hardwired into the HTML.
      We could just keep one line of ta[], until they remove support for it; or work out how to add the access key properly. (Wikipedia:Keyboard_shortcuts gives the complete list)
    • Some of the hover titles on the interface links would change, though not greatly, so not really a problem.
  2. Done Remove the Featured article link code, unless it is actually used for something.
  3. Done Only conditionally include User:Connel MacKenzie/custom.js if it is needed, we don't need to give it to every anon who comes along for one definition.
    • Might fractionally delay its loading when a user who does use it is not loading from cache.
  4. Done Move the search provider code to a separate file that is only included on the search provider page - wherever it has got to.
    • Adds another conditional to every page load, better than sending 50 lines of code to every anon.
  5. Done Add an importScript() function, to allow people to include scripts nicely preventing accidental duplication. This would then be used in the future reconstruction of WT:PREFS.
  6. Add support for leaving short NavFrames open (See above) and generally tidy up that section of code.
    • Could lead to long trans-tables being left open by accident, though that is no more of a problem than pages that don't have {trans-top} at all.
  7. Update doRedirect as above to allow for a link back to the auto-redirected article in some cases. This is still a bad hack around the bug mentioned above.
  8. Technical things to think about in the future include
    • WT:PREFS overhaul
    • Using edittools.js - or something along those lines
    • Removing the addLoadEvent wrapper
    • Fix the bugs that cause the doRedirect to be necessary
    • A tab to the [[Citations:]] namespace

I think the best course of action is to chug slowly through the list, adding code to Common.js as each section is fixed and then removing the old code from Monobook.js. If we ensure that we add the functions to Common.js before we remove them from Monobook.js, then the old versions will override the new until they are removed, so we need not worry about synchronisation too much. Any thoughts? Conrad.Irwin 00:29, 3 January 2008 (UTC)

These all sound like good ideas. I've got a version of importScript as well. The main useful difference it has is the ability to use oldid to pull in a particular revision of a script. I've used this to avoid being surprised when Lupin changes the navigation popups on Wikipedia :). Mike Dillon 03:19, 3 January 2008 (UTC)[reply]
I have added oldid support to mine, that is a good idea considering that we include popups, and other things, generically. I prefer the syntax on mine though, slightly easier for the less bracketly inclined. Conrad.Irwin 23:12, 3 January 2008 (UTC)[reply]
The reason I chose that syntax was to allow the call to be self-documenting. It can get confusing when you have to pass nulls to skip positional arguments. Mike Dillon 03:37, 4 January 2008 (UTC)[reply]
Unless I'm reading something wrong, it looks like the ternary expression for your oldid support has its conditions reversed. Mike Dillon 04:11, 4 January 2008 (UTC)[reply]
In general, I rabidly support refactoring of all the WT:PREFs stuff. Everything you've outlined here seems reasonable. (And making Alt-O go away is probably a good thing, anyhow. I've only ever used that accidentally, to date.) --Connel MacKenzie 21:24, 3 January 2008 (UTC)[reply]
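
An import helper along the lines discussed here (no accidental duplicates, optional oldid to pin a revision, and a self-documenting options object rather than positional arguments) might look something like this. It is a sketch, not either of the versions mentioned above: the names importScriptOnce and buildScriptUrl, the /w/index.php path and the injected loader are assumptions.

```javascript
// Keys of scripts that have already been requested, to prevent duplicates.
const requestedScripts = new Set();

// Build the raw-JavaScript URL for a wiki page. The /w/index.php path is an
// assumption about the site layout.
function buildScriptUrl(options) {
  const title = encodeURIComponent(options.page.replace(/ /g, "_"));
  let url = "/w/index.php?title=" + title + "&action=raw&ctype=text/javascript";
  const hasOldid = options.oldid !== undefined && options.oldid !== null;
  // Pin a revision only when an oldid was actually supplied.
  if (hasOldid) {
    url += "&oldid=" + encodeURIComponent(String(options.oldid));
  }
  return url;
}

// inject is whatever actually loads the URL; in a browser it would append a
// <script> element to the document head.
function importScriptOnce(options, inject) {
  const hasOldid = options.oldid !== undefined && options.oldid !== null;
  const revision = hasOldid ? String(options.oldid) : "latest";
  const key = options.page + "@" + revision;
  if (requestedScripts.has(key)) {
    return false; // already requested, do nothing
  }
  requestedScripts.add(key);
  inject(buildScriptUrl(options));
  return true;
}
```

A call such as importScriptOnce({ page: "User:Example/popups.js", oldid: 12345 }, loader) reads without needing nulls for skipped arguments, which is the point made above about self-documenting calls. The page name, revision number and loader here are placeholders.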

This part:

if ((wgPageName == 'Wiktionary:Main_Page' || wgPageName == 'Wiktionary:Main_page')
    && wgAction != "diff") {
    // FIXME: To use the CSS DOM if possible, the HTML DOM if not
    // Hide the page title, tagline, subtitle and last-modified line on the main page.
    document.write('<style type="text/css">#lastmod, #siteSub, #contentSub, h1.firstHeading { display: none !important; }</style>');
}

Could almost be done with pure CSS in MediaWiki:Monobook.css:

.page-Wiktionary_Main_Page #lastmod,
.page-Wiktionary_Main_Page #siteSub,
.page-Wiktionary_Main_Page #contentSub,
.page-Wiktionary_Main_Page h1.firstHeading,
.page-Wiktionary_Main_page #lastmod,
.page-Wiktionary_Main_page #siteSub,
.page-Wiktionary_Main_page #contentSub,
.page-Wiktionary_Main_page h1.firstHeading {
    display: none !important;
}

I'm generally not a fan of changing CSS rules for visible elements during page load, but unfortunately we can't distinguish the diff from the non-diff case with CSS. But if bugzilla:4438 were fixed... Mike Dillon 16:06, 4 January 2008 (UTC)[reply]

Done, thanks again. Conrad.Irwin 15:18, 5 January 2008 (UTC)[reply]

I've got another suggestion here: for the sourcelinks() function, do the checks on wgNamespaceNumber, wgPageName, and wgAction outside of the function. If they fail, don't define the function or call addOnloadHook(). Mike Dillon 06:33, 5 January 2008 (UTC)[reply]

The issue of conditionals in or out of the function seems to be a bit of swings and roundabouts: the slight speed gain makes almost as little difference as the confusion of having this function formatted differently to the others, which rely on DOM nodes and so have to be run on load. I prefer keeping things as they are, as it means that the function is self-contained. Conrad.Irwin 15:18, 5 January 2008 (UTC)
To be honest, I'm not a fan of how most of these things are formatted. I don't like huge nested conditionals with no "else"; I prefer things like the check for "bodyContent" in that function to be done as early returns from the function to avoid unnecessary indent-itis. As for conditionals around adding page-specific onload hooks, I think pretty much any function that is only loaded for a small subset of pages should be loaded from a separate script. Just my two cents. Mike Dillon 16:07, 5 January 2008 (UTC)[reply]
I have moved it to a separate page, which was how I thought it was going to be originally. If you have more comments about the site js, please let us know. Conrad.Irwin 00:17, 6 January 2008 (UTC)
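
The two styles debated here can be put side by side in a small sketch. It does not reproduce the real sourcelinks() function, whose body is not shown in this discussion; addSourceLink, pageQualifies, the data attribute, and the env and doc parameters (stand-ins for the wg* globals and the document) are hypothetical.

```javascript
// Shared check on the page-level variables.
function pageQualifies(env) {
  return env.wgNamespaceNumber === 0 && env.wgAction === "view";
}

// Style 1: self-contained function that uses early returns instead of
// nested conditionals.
function addSourceLink(env, doc) {
  if (!pageQualifies(env)) return false; // wrong namespace or action

  const body = doc.getElementById("bodyContent");
  if (!body) return false; // nothing to attach to

  // Placeholder for the real work: mark the content element.
  body.setAttribute("data-source-link", env.wgPageName);
  return true;
}

// Style 2: do the cheap checks first and register the onload hook only when
// the page qualifies. addHook stands in for addOnloadHook.
function registerSourceLink(env, doc, addHook) {
  if (!pageQualifies(env)) return false;
  addHook(function () {
    addSourceLink(env, doc);
  });
  return true;
}
```

Either way the DOM lookup still has to wait for page load; the second style only avoids registering a hook on pages that can never use it.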

Citations ns

Is there a way to get the page title (the "firstHeading" at the top of the page above all the user-edited text) in the Citations ns to be a link to the main ns?—msh210 17:09, 3 January 2008 (UTC)[reply]

It could be done with Javascript; I don't think there is any other way to do it. Mike Dillon 17:25, 3 January 2008 (UTC)[reply]
I am assuming you want the title to read Citations of [[Word]], this could be added to Common.js reasonably easily. Conrad.Irwin 12:15, 4 January 2008 (UTC)[reply]
Oh, that's clever - treat Citations: namespace just like the Talk: namespace, with ALT-C returning to the main entry in NS:0? I like it. Yes, JS is the way to go. --Connel MacKenzie 05:27, 6 January 2008 (UTC)[reply]
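
As a rough sketch of the JavaScript approach suggested above, the heading text could be computed from the page name like this. The function names and the /wiki/ path are assumptions, and writing the result into the firstHeading element is left out so that the logic stays independent of the page.

```javascript
// Escape text for safe use inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Return replacement HTML for the heading of a Citations: page, or null for
// any other page. pageName is expected in the wgPageName form (underscores).
function citationsHeadingHtml(pageName) {
  const prefix = "Citations:";
  if (!pageName.startsWith(prefix)) return null;

  const entry = pageName.slice(prefix.length).replace(/_/g, " ");
  // The /wiki/ path is an assumption about the site layout.
  const href = "/wiki/" + encodeURIComponent(entry.replace(/ /g, "_"));
  return 'Citations of <a href="' + href + '">' + escapeHtml(entry) + "</a>";
}

console.log(citationsHeadingHtml("Citations:grease_pit"));
```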

We've had a request on Template talk:inflection of to allow this template to accept lang= in order to link to the appropriate language section. I also think this would be a very good idea. --EncycloPetey 00:12, 4 January 2008 (UTC)[reply]

Done. :-) —RuakhTALK 00:31, 4 January 2008 (UTC)[reply]

hostler template malfunction

The RP pronunciation line for hostler is not displaying correctly. I don't know whether this results from something that affected {{a}} or {{RP}}, but neither template has changed recently. It must be either a change to another template or a bit of code on which one of these templates depends. --EncycloPetey 18:08, 4 January 2008 (UTC)

Hmmm... seems to have just been a weird transient error. --EncycloPetey 18:09, 4 January 2008 (UTC)[reply]
Well, sort of transient. I'm seeing similar errors when I've edited other pages. The formatting looks odd because bits of coding are being displayed in the page just after editing, usually around the first L3 section header or in the first section after it. This is usually the Pronunciation section, but just now it happened editing a WOTD template. In each instance, refreshing the page makes the problem go away, but editing another page sometimes makes the problem reappear. --EncycloPetey 05:23, 5 January 2008 (UTC)
It's a side effect of some coding changes Conrad and Hippietrail have been doing; they have been making changes to monobook.js etc. This was discussed on IRC; I think the result was that it was understood and fixed. But at least known. Robert Ullmann 16:00, 5 January 2008 (UTC)
It hasn't been fixed as of five minutes ago. I sent some screenshots to Commons... but it turns out they're not "free content" and were just deleted. So, I can't show you the problems. --EncycloPetey 16:16, 5 January 2008 (UTC)

Most used templates thing fixed.

Just a note: User:Connel MacKenzie/mstuse has now been fixed to exclude pages that have improper <restriction> indicators on Special:Export and http://download.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-meta-current.xml.bz2, by taking into account the additional restrictions of http://download.wikimedia.org/enwiktionary/latest/enwiktionary-latest-page_restrictions.sql.gz. So now, at long last, it works right. Note that if not indef-protected I still report it as not being protected (since it is not.) Temporary lowering of protection can still happen as needed. I did the first three; it would be nice if a sysop went through the rest (>5,000 = sysop only, >1,000 = autoconfirmed, IIRC.) --Connel MacKenzie 05:24, 6 January 2008 (UTC)[reply]
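
The thresholds quoted from memory at the end of the post can be written out as a small rule. This assumes the numbers refer to how many times a template is used, and since they are marked "IIRC" above they should be treated as provisional; the function name is made up.

```javascript
// Map a template's usage count to the protection level suggested in the post.
// Thresholds are as recalled there ("IIRC"), not confirmed policy.
function suggestedProtection(usageCount) {
  if (usageCount > 5000) return "sysop";
  if (usageCount > 1000) return "autoconfirmed";
  return "none";
}

console.log(suggestedProtection(7200));
console.log(suggestedProtection(1500));
console.log(suggestedProtection(300));
```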

Is there any particular reason why the template {{uncountable}} does not add the noun to the category list? - Or is it an error that could be fixed? - Algrif 10:34, 6 January 2008 (UTC)[reply]

I've added topcat=Uncountable I hope that's OK. - Algrif 12:23, 6 January 2008 (UTC)[reply]
Shouldn't this category (and Category:Uncountable) be included directly by {{en-noun}}? -- Visviva 12:35, 6 January 2008 (UTC)[reply]
That's what I thought, too. Meanwhile, I've fixed the ones I discovered by using topcat= in the template. - Algrif 12:40, 6 January 2008 (UTC)
No, Category:Uncountable should be nominated for deletion. It is not descriptive of the contents. The category should be Category:English uncountable nouns. --EncycloPetey 15:27, 6 January 2008 (UTC)[reply]
I was actually thinking perhaps it should be "English nouns with uncountable senses," (or something more elegant) since {{uncountable}} is used to mark specific senses. -- Visviva 15:46, 6 January 2008 (UTC)[reply]
Category:English uncountable nouns sounds good to me. "English nouns with uncountable senses", would go into the preamble on the page itself. It would be nice to get this one sorted out AQAP if possible. - Algrif 15:53, 6 January 2008 (UTC)[reply]
Visviva, we've never named any categories that way. Words with obsolete senses are categorized as simply "obsolete", even if there are senses that aren't obsolete. Irregular verbs are categorized as irregular, even if a regular form also exists. Etc. So while your proposal would be a more accurate name, it wouldn't be as concise or consistent with current practice. You might raise this as a general issue in the Beer Parlour, but for now I think we should go ahead with the simpler fix, unless there are further objections. --EncycloPetey 17:49, 6 January 2008 (UTC)
Since {{en-noun}} states "uncountable", I would say yes (fixing the cat name if needed). DAVilla 01:38, 8 January 2008 (UTC)[reply]
The template {{uncountable}} should only add the category if a language can be specified. Most uses of this template do not specify the language. --EncycloPetey 02:00, 8 January 2008 (UTC)[reply]
Then I hope there will also be a "Category:Swedish uncountable nouns" at some point, right? (that is, that the lang= param. works as it should). \Mike 18:07, 9 January 2008 (UTC)[reply]
I really believe that we have a bias toward declaring something to be uncountable. To do it at the PoS level rather than the sense level makes the excessive declaration of uncountability carry over to any added senses (when even the first sense is usually erroneous). In a test I ran of uncountable category items 80% had senses (not necessarily part of WT entry) that were not uncountable. I suppose I should want the category entries to be eliminated unless produced by a template and that both the uncountable tag and the en-noun indications be recorded -- so I can more easily correct them. DCDuring 18:25, 9 January 2008 (UTC)[reply]
Exactly where I'm coming from. My plan was to spend some time cleaning up erroneous "uncountable" entries. If we can have this category correctly named and only allowable at the sense level using {{uncountable}}, then I believe we can put a bit of order to a category that is in danger of getting overused, under categorised, badly defined, and generally out of hand. - Algrif 19:03, 9 January 2008 (UTC)[reply]

I added an {{rfc}} template to this entry - but the reason does not show up. Any ideas? SemperBlotto 16:27, 6 January 2008 (UTC)[reply]

It shows up for me, possibly a cache issue? Conrad.Irwin 16:42, 6 January 2008 (UTC) read before you write :( Conrad.Irwin 22:04, 6 January 2008 (UTC)[reply]
It's not showing up for me either, so it can't be a caching problem. --EncycloPetey 17:44, 6 January 2008 (UTC)[reply]
Fixed. (MediaWiki won't auto-number parameters that have equals signs in them, so you have to number them explicitly. Adding 1= fixed the problem here. Annoyingly, this can happen even if the equals sign is buried in the return value of another template. IMHO, this is one reason named parameters are better.) —RuakhTALK 19:00, 6 January 2008 (UTC)[reply]
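The behaviour Ruakh describes can be illustrated with a deliberately simplified model. The sketch below is not MediaWiki's actual parser; the function name `number_template_args` is hypothetical, and the only rule it models is "an argument containing an equals sign is treated as named, everything else is numbered in order".

```python
def number_template_args(raw_args):
    """Simplified model: split each argument at its first '=' if it has one."""
    params = {}
    next_position = 1
    for arg in raw_args:
        if "=" in arg:
            # Anything before the first '=' becomes the parameter name.
            name, value = arg.split("=", 1)
            params[name.strip()] = value.strip()
        else:
            # No '=': the argument is numbered automatically.
            params[str(next_position)] = arg
            next_position += 1
    return params


# A reason containing '=' is swallowed as a (useless) named parameter ...
broken = number_template_args(["see http://example.org/?a=b"])
# ... while an explicit 1= keeps the whole text as the first parameter.
fixed = number_template_args(["1=see http://example.org/?a=b"])
```

Under this model `broken` has no parameter `1` at all, which matches the symptom of the reason not showing up.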

Exporting enwikt to other formats


Has anyone tried exporting enwikt into other formats? Spellchecker files (w:aspell, w:hunspell, etc.) would be the easiest to create because one wouldn't have to parse the entries too much. Bilingual dictionaries could be made as well. If you didn't want to parse the definitions, a first step would be to strip an XML dump file of unwanted language sections and translations. Any thoughts? --Bequw¢τ 20:20, 6 January 2008 (UTC)[reply]
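As a rough sketch of the "first step" mentioned above, the following splits an entry's wikitext into its level-2 language sections and keeps only the titles that have an English section. It assumes the pages have already been read out of the XML dump into a `title -> wikitext` dictionary; the function names are hypothetical and no real export tool is implied.

```python
import re

# A level-2 heading such as "==English==" (but not "===Noun===").
LANGUAGE_HEADING = re.compile(r"^==([^=].*?)==\s*$", re.MULTILINE)


def language_sections(wikitext):
    """Return a dict mapping language name -> text of that language section."""
    matches = list(LANGUAGE_HEADING.finditer(wikitext))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.end():end]
    return sections


def english_word_list(pages):
    """Sorted titles of all pages that contain an English section."""
    return sorted(
        title for title, text in pages.items()
        if "English" in language_sections(text)
    )
```

The resulting list would still need filtering (misspelling entries, regional labels) before it could serve as a spell-checker word list.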

Yes, lots of thoughts - both User:Hippietrail and I clog the WT:IRC channel talking about this kind of thing. If you want to see how things are progressing try the WT:PREF (conrad's buggy paper view) for example. The answer is, in short, yes it is possible but trying to work out what to do first, and the specifics are all too exciting to let us get down and do some real coding :). Conrad.Irwin 22:03, 6 January 2008 (UTC)[reply]
That IS neat. I'll ponder that a bit. --Bequw¢τ 23:54, 7 January 2008 (UTC)[reply]
Unfortunately, spelling information is not really our forte; we don't have any good way to indicate at the entry for, say, foubarre, that it's a uniquely American spelling (in that I'm an American and I just made it up now), unless it gets an {{alternative spelling of}} entry. If it gets actual definitions, we don't label every sense {{US}}, though we might halfway-imply it by listing * [[foobar]] {{i|UK}} under "Alternative spellings" — and even that we couldn't do in this case, because foobar isn't a UK-specific spelling. This might be something we could work on, though I fear that there might be an irremediable clash between the prescriptiveness inherent in a spell-checker and the NPOV policy we hold dear. (Also, I think a good spell-checking library will be intentionally incomplete, making a point of omitting certain spellings that are rarer in their standard uses than they are as misspellings. We could mimic that, of course, by giving priority to {{misspelling of}} senses when they're more common than standard uses — but do we want to?) —RuakhTALK 23:24, 6 January 2008 (UTC)[reply]
I've generated a few different lists for different purposes, but I don't trust en.wiktionary English definitions (personally, myself) enough to replace /etc/words with one of these just yet. http://tools.wikimedia.de/~cmackenzie/onelooka.txt (index for OneLook.com) http://tools.wikimedia.de/~cmackenzie/pos.txt (another list by POS) etc. --Connel MacKenzie 21:53, 13 January 2008 (UTC)[reply]

Template lacking documentation


Template:yue-hanzi has protected text stating that there is documentation on the talk page, when this is incorrect. This should be removed. __meco 20:49, 7 January 2008 (UTC)[reply]

Perhaps a better solution would be if someone who knows what it does could add some documentation - even if it is very simple. Conrad.Irwin 22:48, 7 January 2008 (UTC)[reply]
Done: Template talk:yue-hanzi. —RuakhTALK 01:05, 9 January 2008 (UTC)[reply]

Folding or collapsing of translations not working


Since yesterday, the folding or collapsing of translations does not work for me; I cannot unfold them, as the link [show] is invisible or simply not there. When I log out from Wiktionary and access the pages as an anonymous user, the folding works. Notice that, in preferences, I have set that editing of sections is done using right mouse button, instead of the [edit] links; I've always used this setting though. Anyone else having this problem? Environment: Windows Vista, Firefox 2.0.0.11. --Daniel Polansky 08:09, 8 January 2008 (UTC)[reply]

Can you force reload the pages you're seeing this on with Ctrl+Shift+R? There have been a lot of changes happening to site Javascript recently, including the code that handles the expanding containers. Also, can you please post any error messages from Tools -> Error Console (after you clear the errors and reload the page). Mike Dillon 08:19, 8 January 2008 (UTC)[reply]
I have tried this; forced reload does not fix the problem. Error console shows only warning messages. --Daniel Polansky 08:26, 8 January 2008 (UTC)[reply]
The warning in the error console is, roughly translated into English: "Error when parsing the value of the property 'white-space'. Declaration omitted." --Daniel Polansky 08:29, 8 January 2008 (UTC)[reply]
Indeed, when I enable the user preference "Enable section editing via [edit] links", folding works again. So the issue seems to be related to that preference. --Daniel Polansky 08:26, 8 January 2008 (UTC)[reply]
I just set the same options you had originally (edit sections with right click = on, edit sections with [edit] = off) and I didn't see this problem. I tested it on the page "and" using Firefox 2.0.0.8 on Linux. Do you see this behavior on every page with collapsible sections, or just some pages? Mike Dillon 08:36, 8 January 2008 (UTC)[reply]
I see the behavior at all the pages, including stone, in most of the collapsible sections, including translations, related terms, and derived terms. The collapsing in the table of contents works fine though; I assume that it uses different collapsing code. The buggy behavior is there also in Internet Explorer 7.0, as I have just tried. --Daniel Polansky 10:11, 8 January 2008 (UTC)[reply]
This problem should now be fixed; it was (as you suspected) dependent on the [edit] links being visible because, to encourage consistency, it was using the same style. I have added a rule to the stylesheet that should override the case when the edit links are hidden. If you could confirm this is now working, after a hard refresh, that would be great. 86.159.27.37 11:39, 8 January 2008 (UTC) (That's the problem with logging out to test :) Conrad.Irwin 11:44, 8 January 2008 (UTC)[reply]
Works fine; thanks a lot for the fix. --Daniel Polansky 12:25, 8 January 2008 (UTC)[reply]
Umm, Firefox worked just fine before. With the changes, any users logged on now will need to ctrl/shift/r reload. ;-) Robert Ullmann 15:28, 8 January 2008 (UTC)[reply]

My life as a bot

We turned on the bot flag for my account two days ago, to experiment with having an account that is Sysop+Bot. Turned off now. A bit annoying, since other patrol ops (deletes, etc) wouldn't show.

I had created automation to carefully go through and delete the redirects from capitalized to lc page titles resulting from the run of the conversion script in June 2005. It was intended to be a low-priority task, just to be shoved into the background to munch on the problem. In that way whenever someone decided to sort them all, it would be a smaller task. However, even small blocks showing up in RC seem to be a problem, and some people (well, at least Connel ;-) want it done quickly.

Faster, Pussycat! Kill! Kill!

It will still take a long time; DAVilla made the excellent suggestion to go longest-to-shortest: there are more and more problems as it goes, and they can be evaluated and fixed. So, we can if we want to create a sysop+bot account to do this. It is discouraged in general, precisely because deletes (or other sysop functions) will not show in RC. The English Wikipedia prohibits sysop+bot entirely.

I created the account name User:Robert Ullmann SysopBot. Rather a mouthful, but it is IMHO very important to say exactly what it is. It will also always list immediately after me (e.g. in Special:Listusers). You shouldn't have to type it anyway.

More task description is on the User: page, but essentially it is about very carefully deleting the redirects without breaking desired links. It fixes some (notably in Transwiki: space), ignores others (Gutenberg frequency lists, which shouldn't be modified, and almost all come from sentence capitalization), and logs the remainder.

Question: should it automatically fix links in NS:0 entries? For example in shipshape. (And thousands of others.)

I'm trying a few of these. Robert Ullmann 14:29, 8 January 2008 (UTC)[reply]
Um, no good. First two were Pertaining which needed to be unlinked (adj. def. lines), and Philistine which should be an entry. There's going to be lots of stuff to clean up. Robert Ullmann 15:32, 8 January 2008 (UTC)[reply]

Do we want this set up this way? Robert Ullmann 13:36, 8 January 2008 (UTC)[reply]

I have no problem with the sysop+bot flag, indeed it does make sense for deletions that don't need to show up in RC (like these). However it doesn't really help solve the problem in hand because the admin logs (deletion in particular) are not filterable by bot flag - afaik. There have been three possible solutions to this problem proposed.
  1. We let them tick away in the background for a year (20 ignorable log entries per hour) with the bot flag so that RC doesn't notice.
  2. We ask Ullmann to remove the speed control, leaving 120ile/h, again RC would only be minimally affected, but this time the logs would be very overcrowded - given that the deletion rate of the bot would be equal to the manual contribution rate to Wiktionary. This solution would lead to a month of very clogged logs.
  3. We ask Ullmann to distribute copies of the bot to a couple of other sysops who run it on their own account (or - very insecurely through the SysopBot account), and we try and get the rate up to 1000ile/h this would allow us to finish in a day and a half but the deletion log for this time period would be utterly destroyed - as it can only show about 5000 entries at a time, only five hours worth of normal deletions would show up on each screen.
My choice is between solutions 1 and 3, and neither of them really require the bot flag - 1 because the rate is too low, 3 because it would be insecure to distribute widely. While solution 2 would help RC by using the bot flag, it would make the deletion logs very heavy for far too long. Conrad.Irwin 14:12, 8 January 2008 (UTC)[reply]
Somewhere in the middle (1-2), run it at several hundred a day (at 500/day it would be 70 days). The deletion logs will (and should) have all the entries, but it will always be possible to read 5+ days worth (limit=5000).
I do wonder why the deletion log is an issue? Robert Ullmann 14:29, 8 January 2008 (UTC)[reply]
I thought that was what Connel was worried about, so it may just be me barking up the wrong tree. Conrad.Irwin 16:38, 8 January 2008 (UTC)[reply]
Should also note that it is impossible to "finish" in a day and a half (or whatever), the exception rate will increase as the words get shorter and more commonly used; there will be lots of manual sorting and then re-runs after an XML dump. Robert Ullmann 14:45, 8 January 2008 (UTC)[reply]
I am a supporter of "Get it over with fast", therefore, 3 until the exception count gets high enough to make that unworkable. ArielGlenn 16:34, 8 January 2008 (UTC)[reply]
Has anyone consulted the developers about this? Maybe they could hack something for this 'problem'. RJFJR 16:42, 8 January 2008 (UTC)[reply]
I've considered asking Brion about it. The main problem there is that a database-side loop to delete the redirects would be much less discriminate. We've already encountered several types where the deletion isn't a given. The developers are a very limited resource, as a whole; their time is better spent on more important things like CentralAuth. I feel it is petty to ask them to try something, when we are unclear on the exact exceptions list, as it is.
Returning to option 3 for a moment: the reason I like that option the best is that the (currently unusable) deletion log would only be unusable for one last day. I am very uncomfortable leaving it in an unusable state for the next year. I don't refer to the deletion log every day, but when I do, I'd like to be able to find what I'm looking for.
--Connel MacKenzie 19:16, 13 January 2008 (UTC)[reply]
Someone (IIRC Amgine) suggested that this be just run as an SQL op; that cannot be done unless one wants to just break all the links, errors be damned. But Connel is incorrect: it simply can't be done in a day. Or 5 or ten or whatever. The bulk of the process is sorting the exceptions. If Connel can and will tell us what is the acceptable amount to show up in a day's deletion log, fine, I can max that (easier with the bot flag). But the idea of "flood it and get it over with" will not work! Sorry, it just will not. Robert Ullmann 22:06, 13 January 2008 (UTC)[reply]
Have you ever received an e-mail along the lines of "Why did you delete my entry" where the person didn't spell (or perhaps capitalize) the word they entered, correctly? Maybe this is a poor example, but it reflects the most recent time I tried to use Special:Log/delete...obviously there are other uses. But the "throttled" speed you have it running now, makes Special:Log/delete useless for that. (Again, just an example.) The sooner it is over, the better. If you take off throttling and continue running, give me half the list for me to run, give Conrad half the list, give the toolserver half the list...well, you get the idea. --Connel MacKenzie 23:20, 13 January 2008 (UTC)[reply]
I never get email; they complain directly to OTRS ;-). (I've gotten one email in the last few months, from someone blocked for a spammy entry, that I immediately unblocked as the mail was quite reasonable.) You do know that you can filter the deletion log by user? (e.g. you)
You keep asking me to un-throttle it and get it over with. That will not work. There are ~34K tasks left, not counting the redirects from CS moves that were reversed. Each takes 3-5+ transactions. The exception rate is now >5%, and growing as the words get shorter. If I simply ran it "unthrottled", it would take 45-50 days, not counting handling exceptions and re-running. If 3-4 people ran it, it would still mean at least two solid weeks of logs and Recent Changes being utterly unusable.
Further: a lot of the exceptions take manual work out and re-running (from the subsequent XML), so it will take a number of passes anyway. The process will extend necessarily over some months. This is why I have been from the beginning trying to put it in the "background". If the current rate of entries in the deletion log is still too high, the only answer is to slow it down. Robert Ullmann 14:02, 17 January 2008 (UTC)[reply]
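The rate figures quoted in this thread can be checked with simple arithmetic. The sketch below only divides the ~34K remaining tasks mentioned above by a deletion rate; it deliberately ignores the exception handling and re-runs that Robert Ullmann says dominate the real schedule, so the results are lower bounds. The function name is hypothetical.

```python
def days_needed(tasks, deletions_per_day):
    """Idealised duration: no exceptions, no re-runs, constant rate."""
    return tasks / deletions_per_day


remaining = 34_000  # "~34K tasks left", as stated in the thread

# Middle option: about 500 deletions a day.
print(round(days_needed(remaining, 500)))           # 68 (the thread's "70 days")

# Option 3: 1000 deletions an hour, around the clock.
print(round(days_needed(remaining, 1000 * 24), 1))  # 1.4 (the thread's "day and a half")
```

These numbers reproduce the estimates given by both sides; the disagreement is about whether the idealised rate is achievable, not about the division.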

For import


w:Korean-Words-English-Transliterated may be worth importing. – Mike.lifeguard | @en.wb 15:10, 8 January 2008 (UTC)[reply]

is already queued to show up at Transwiki:Korean-Words-English-Transliterated. But is not a very useful list, it entirely lacks hangeul and the transliterations are, um, imaginative. Should just get tossed (there and here). Robert Ullmann 15:40, 8 January 2008 (UTC)[reply]
Are our typical Korean entries much better? Or better at all? --Connel MacKenzie 23:14, 13 January 2008 (UTC)[reply]
I hope so. Granted there is still a certain amount of irredeemable gunk floating around. But if they aren't better on average, I would appreciate if someone would hit me with a clue stick to let me know what's going wrong. -- Visviva 14:28, 17 January 2008 (UTC)[reply]

"Similar Titles" algorithm a bit dodgy.


http://en.wiktionary.org/w/index.php?title=Special:Search/embelished&go=Go

I searched for "embelished". The "Similar Titles" search did not find "embellished" (two "l"s) but it did find "acciacatura". Very odd.

Apologies if this is the wrong place to post this, it was the most appropriate place I could find. --203.113.234.214 09:57, 9 January 2008 (UTC)[reply]

It doesn't do spelling errors like that, but does do stemming (for English only? presumably) so it found the entry containing "embelishment". If you try Google "embelished site:en.wiktionary.org" it will ask if you meant "embellished". (And there are 75K misspellings out there.) Yes, the search here does some odd things. Robert Ullmann 11:58, 9 January 2008 (UTC)[reply]
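For contrast with the stemming-based search described above, here is a small sketch of spelling-tolerant title matching using Python's standard `difflib`. This is not how the MediaWiki search works; the title list and the function name `similar_titles` are made up for illustration.

```python
import difflib

# Hypothetical handful of page titles; a real run would use the full title list.
TITLES = ["embellished", "embellishment", "acciacatura", "establish"]


def similar_titles(query, titles, limit=3):
    """Return up to `limit` titles ranked by string similarity to `query`."""
    return difflib.get_close_matches(query, titles, n=limit, cutoff=0.6)


matches = similar_titles("embelished", TITLES)
```

With this kind of edit-distance-style ranking, the one-letter-off title comes first and an unrelated title such as "acciacatura" is not returned.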

Unconjugated French verbs


Seeing as such a good job was done with the creation of my incredibly long IPA-template-missing page (thanks CMK), could another page be created with all pages for French verbs lacking the conjugation heading. I.e. ==French== ===Verb=== - ====Conjugation====. Just to give myself extra work. Thanks in advance--Keene 18:23, 9 January 2008 (UTC)[reply]
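A list like the one requested could be produced from a dump along the following lines. This is only a sketch: it assumes each page's wikitext is already available as a string, that headings are written exactly as `==French==`, `===Verb===` and `====Conjugation====`, and the function names are hypothetical.

```python
import re


def french_section(wikitext):
    """Text between the ==French== heading and the next level-2 heading."""
    match = re.search(
        r"^==French==\s*$(.*?)(?=^==[^=]|\Z)",
        wikitext,
        re.MULTILINE | re.DOTALL,
    )
    return match.group(1) if match else ""


def lacks_conjugation(wikitext):
    """True if the French section has a Verb heading but no Conjugation heading."""
    section = french_section(wikitext)
    has_verb = re.search(r"^===\s*Verb\s*===\s*$", section, re.MULTILINE)
    has_conjugation = re.search(r"^=+\s*Conjugation\s*=+\s*$", section, re.MULTILINE)
    return bool(has_verb) and not has_conjugation
```

Restricting the check to the French section avoids counting a Conjugation heading that belongs to another language on the same page.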

HTML entities in pages


I've heard several times (and agree) that we should replace HTML entities with their unicode counterparts in entries. I analyzed the January db dump and created a list if anyone wants to take a crack at it. Is there any context where we'd want to leave the entity? If not, is this something that a bot would be much better at? --Bequw¢τ 22:47, 10 January 2008 (UTC)[reply]

Firstly, a nitpick: You don't mean "entities", but rather "references", a term that covers both numeric character references (like &#x26; and &#38;) and [general] entity references (like &amp;, amp being the name of a [general] entity). (It also covers parameter entity references, which aren't relevant here.) Secondly: I think all references can safely be replaced with whatever they evaluate to, except (1) references that evaluate to characters in the ASCII range (since when those are escaped, it's generally to prevent MediaWiki from interpreting something as markup) and (2) references that evaluate to whitespace characters (&nbsp; and &#x200B; and whatnot; since it'd be a pain to edit entries that use them un-escaped, if that even works). Judging from your list, it looks like you agree. :-)   Thirdly: Yeah, probably fodder for a bot. :-)   —RuakhTALK 02:19, 11 January 2008 (UTC)[reply]
Dodde has a bot, it hasn't been run for a while. Robert Ullmann 15:16, 11 January 2008 (UTC)[reply]
I think this is best done as an w:WP:AWB task. If I can find my copy of it, I'll start on them later. --Connel MacKenzie 19:48, 12 January 2008 (UTC)[reply]
Ugh. The HTML entities are the least of the problems with these entries...this is a good general cleanup list.  :-(   I'll add the rest of those entities one by one to my copy of AWB and restart it tomorrow. Too bad AWB can't honor normal Javascript extensions...most of the rest of the formatting changes I have semi-automated in JS already. (Anyone know where the AWB bot list thing is supposed to be kept? I'd like to speed the AWB phase up by auto-saving when it does find one of these entities.) --Connel MacKenzie 08:05, 13 January 2008 (UTC)[reply]
Wow Thanks Connel! --Bequw¢τ 19:28, 17 January 2008 (UTC)[reply]
Even without figuring out how to set myself as a bot, I managed to get AWB to edit (at one point) 14 pages per minutes. The HTML entities I think have now all been replaced, but these entries (almost all) have numerous problems remaining (no JS in AWB, so no normal cleanup done for the lot of them, yet.) AWB is an unpleasant tool - I wish to never do this again. If you see someone entering html entities like these again, please permablock them. (Ugh! That was nasty.) --Connel MacKenzie 23:12, 13 January 2008 (UTC)[reply]
Besides whitespace and ASCII characters (mentioned by Ruakh), other things not to convert include RLM (&rlm; and the dec and hex equivalents), LRM (&lrm; and equivalents), and other no-width characters (for the same reason as whitespace: they're hard to find).—msh210 18:43, 15 January 2008 (UTC)[reply]
Also various hyphens and dashes, which look like one another in the fixed-width edit textareas.—msh210 17:16, 24 January 2008 (UTC)[reply]
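The rules collected in this thread (replace references, but leave ASCII, whitespace, zero-width/directional marks, and hyphens/dashes escaped) could be sketched as below. This is an illustration, not the bot or AWB configuration actually used; the function names are hypothetical, and using Unicode categories `Cf` (format characters) and `Pd` (dash punctuation) to approximate "no-width characters" and "hyphens and dashes" is an assumption.

```python
import html
import re
import unicodedata

# Numeric (decimal or hex) and named character references.
REFERENCE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def should_stay_escaped(value):
    """Decide whether a decoded reference is better left as a reference."""
    if len(value) != 1:
        return True                      # multi-character result: leave alone
    if ord(value) < 128:
        return True                      # ASCII, probably escaped to dodge markup
    if value.isspace():
        return True                      # e.g. &nbsp;
    # Format characters (ZWSP, LRM, RLM) and dashes are hard to see when editing.
    return unicodedata.category(value) in ("Cf", "Pd")


def replace_references(text):
    def substitute(match):
        reference = match.group(0)
        value = html.unescape(reference)
        if value == reference or should_stay_escaped(value):
            return reference             # unknown or protected: keep as written
        return value
    return REFERENCE.sub(substitute, text)
```

For example, `replace_references("caf&eacute; &amp; t&#233;")` leaves `&amp;` alone and turns the other two references into `é`.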

To provide a small boilerplate intro to the Requests (language) categories. Conditionally describes a few of the standard sub-cats, and other references. See Category:Requests (French) Robert Ullmann 15:13, 11 January 2008 (UTC)[reply]

Template name - "lvn"


Is lvn an ISO code? I was asked to rename template:lvn to template:new lvn "before getting mired any deeper", can someone please explain me this? Thanks! Miasma 22:03, 12 January 2008 (UTC)[reply]

Latvian is "lv". All the templates should start with lv-. "Preload" templates for Latvian should start with "new lv ". What are you doing? It looks like it(they) should be "lv-(something)" Robert Ullmann 22:50, 12 January 2008 (UTC)[reply]
well ok, is lv-n OK? or should it be lv-noun? What are preload templates? (this Help:Tips_and_tricks#Using_preload_templates is the only page I could find that makes a mention of preload templates). This template was intended to supersede/consolidate all the separate declension templates, of course if a user prefers it to other templates, well anyway I wanted a more sophisticated template. OK, Template:new lvn 'll do. Miasma 00:33, 13 January 2008 (UTC)[reply]
It depends on how you intend the template to be used. {{lv-noun}} would be the name of the template for the Latvian noun inflection line (see {{la-noun}} or {{es-noun-m}} for similar examples). For a declension table, {{lv-decl-noun}} would be a better name. --EncycloPetey 04:13, 13 January 2008 (UTC)[reply]

Yai! Thanks for moving it! Guess, now I can start advertising it on the inflection templates page, considering it's more or less functional and properly named. Miasma 17:26, 13 January 2008 (UTC)[reply]

Wiktionary spell checker


An interesting bot request has come up on the English Wikipedia. The request is to write a program that displays the percentage of words that are exclusive to the UK or US. However, I am unable to find dictionaries that don't include some of the crossover words.

I eventually came up with using Wiktionary as people could update entries. There are quite a few interesting possibilities like measuring the frequency of colloquial, loan, slang, and informal words in a text. With some clever programming it could be possible to show iambic pentameter and a syllable counter. River has a good start with his wikify tool which looks up words in a Wikipedia and links it if it can (doesn't work well for Wiktionary since it capitalizes the word first). However, the principal metrics that I seek to distinguish a word as being UK or US are lacking, as words do not have templates or categories to distinguish them. Any suggestions? -- Dispenser 21:34, 13 January 2008 (UTC)[reply]

Neat idea. I've done some stuff at User:Connel MacKenzie/US vs. UK but that remains rather incomplete. Entries here are tagged with either {{US}} or {{UK}} respectively (which are sometimes bitterly fought over, but nevermind that) which have a better degree of accuracy. That is an interesting request - basically a page-sniffer that decides if a page is US or UK? Lemme think about this some. Well, other ideas are welcome, of course. I mean I have nothing on hand for it, immediately. --Connel MacKenzie 22:53, 13 January 2008 (UTC)[reply]
What about a spell checker for the content of the Wiktionary and Wikipedia websites based on vocabulary in the Wiktionary database? Surely this isn't too complex to implement and it could target a lot of human attention to erroneous content. --Josh Greig 2:53, 27 February 2008 (UTC)
WT:PREFS has a crude spell-check feature you can turn on, if you aren't using Firefox 2's built-in spell-checker. After each XML dump, this is updated. --Connel MacKenzie 05:10, 28 February 2008 (UTC)[reply]
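The "page-sniffer" idea discussed above could look something like the sketch below. It assumes two word sets, one of US-only and one of UK-only spellings, have already been extracted somehow (for instance from entries tagged {{US}} or {{UK}}); the sets in the usage example are invented for illustration and the function name is hypothetical.

```python
import re


def regional_share(text, us_only, uk_only):
    """Percentage of words in `text` found in each region-exclusive set."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"US": 0.0, "UK": 0.0}
    us_hits = sum(word in us_only for word in words)
    uk_hits = sum(word in uk_only for word in words)
    return {
        "US": 100 * us_hits / len(words),
        "UK": 100 * uk_hits / len(words),
    }


# Illustrative word sets only; real data would come from the dictionary itself.
share = regional_share("The colour of the truck", {"color", "truck"}, {"colour", "lorry"})
```

The quality of the result depends entirely on how reliably the two word sets exclude crossover words, which is the gap Dispenser points out.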

protected


Could someone get Wiktionary:Information desk/Archive 2007 moved to Wiktionary:Information desk/Archive 2007/January-June , because I think having two pages for the Information Desk 2007 archive is better, space-wise. The second half of 2007 is to be eventually archived at Wiktionary:Information desk/Archive 2007/July-December. I guess that Wiktionary:Information desk/Archive 2007 will just contain the 2 links to subpages. Thanks--Keene 12:36, 14 January 2008 (UTC)[reply]

Done. :-) —RuakhTALK 13:28, 14 January 2008 (UTC)[reply]

Testing new parser


It seems that there is testing going on at Wikipedia for a new parser/preprocessor being developed by Tim Starling. I can't find a tracking ticket on Bugzilla or any real documentation, but the new parser can be tested by appending a timtest=newpp parameter to page URLs. The only way I can see to tell that this is working is to look at the HTML source and see if the comments at the end of bodyContent say "Pre-expand include size" or "Preprocessor node count".

Since the new preprocessor is installed on Wiktionary too, I was wondering if anyone is testing it. The new system differs slightly from the old one and has the potential for breakage, especially when complex templates are involved. Mike Dillon 05:37, 15 January 2008 (UTC)[reply]

vervaardigen vs vervaardigen (new) is choking on {{infl}}. Splarka is eyeballing it now. --Connel MacKenzie 07:57, 15 January 2008 (UTC)[reply]
OK, one down. [1]. --Connel MacKenzie 08:19, 15 January 2008 (UTC)[reply]
{infl} was inadvertently using one of the odd side effects of if/switch parsing Robert Ullmann 11:44, 15 January 2008 (UTC)[reply]

It looks like the new entry header is broken too: [2]. Mike Dillon 04:14, 17 January 2008 (UTC)[reply]

Wiktionary:Project-Newarticletext looks OK when viewed directly, but something weird is happening with template expansion when it is called through Mediawiki:Newarticletext at the top of a page... I tried changing "ucase" to "uc", but that doesn't solve the problem. Unfortunately, it seems to be dependent on {{NAMESPACE}} and I can't get Special:Expandtemplates to use the new pre-processor. Mike Dillon 04:31, 17 January 2008 (UTC)[reply]
we are getting one extra close brace for each invocation of {didyoumean} ... Robert Ullmann 12:23, 17 January 2008 (UTC)[reply]
FYI: The preprocessor token can be set by GET, POST, or cookie. You can set it semipermanently with timtest=setcookie and remove it with timtest=removecookie. Also see the new temporary testing extension Special:ParserDiffTest. Splarka 22:12, 17 January 2008 (UTC)[reply]
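For anyone comparing many pages, the `timtest` parameter mentioned in this thread can be appended to URLs programmatically. The sketch below uses only Python's standard `urllib.parse`; the helper name `with_timtest` is hypothetical, and the parameter values are the ones quoted above (`newpp`, `setcookie`, `removecookie`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_timtest(url, value="newpp"):
    """Return `url` with a timtest=<value> query parameter added or replaced."""
    parts = urlsplit(url)
    query = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != "timtest"
    ]
    query.append(("timtest", value))
    return urlunsplit(parts._replace(query=urlencode(query)))


new_parser_url = with_timtest("http://en.wiktionary.org/wiki/vervaardigen")
```

Fetching the same page with and without the parameter and diffing the HTML is one way to spot templates that behave differently under the new preprocessor.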

dump analysis request


This was mentioned in another forum, but nothing came of it, I think because it seemed complicated. Maybe someone has ideas, though.

I'm thinking of analyzing the en.wp and en.wikt dumps to get a list of WP pages that we don't have. Of course, there are lots and lots of WP pages we don't want, such as most personal names, so we'd have to restrict the list somehow. The following seems reasonable, but please suggest emendations: The list should contain all terms, like foo, that satisfy the following:

  • We don't have an entry foo.
  • WP has an entry w:Foo.
  • WP's article w:Foo, if it's not a redirect, contains the word foo in lowercase within it somewhere. [This is to get rid of names and the like.]
  • If WP's article w:Foo is a redirect to w:Bar, then the article w:Bar contains the word foo in lowercase within it somewhere. [Again, to get rid of proper nouns et al., and redirects from misspellings.]
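The four criteria above can be sketched as a single filter. This assumes the dumps have already been reduced to three in-memory structures (a set of Wiktionary titles, a `title -> text` mapping of Wikipedia articles, and a `title -> target` mapping of Wikipedia redirects); those structures and the function name are assumptions made for illustration, and non-empty titles are assumed.

```python
import re


def missing_terms(wikt_titles, wp_articles, wp_redirects):
    """Lowercase-initial terms WP covers, Wiktionary lacks, and WP uses in lowercase."""
    wanted = set()
    for title in list(wp_articles) + list(wp_redirects):
        term = title[0].lower() + title[1:]          # w:Foo -> foo
        if term in wikt_titles:
            continue                                 # we already have the entry
        target = wp_redirects.get(title, title)      # follow a redirect if any
        text = wp_articles.get(target, "")
        # Case-sensitive whole-word search weeds out names and misspelling redirects.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            wanted.add(term)
    return sorted(wanted)
```

Chained redirects are not followed here; a redirect whose target is itself a redirect simply yields no article text and is skipped.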

(I cannot analyze the dumps myself, not having knowhow.)

Opinions would be appreciated; such analysis more so.—msh210 16:18, 17 January 2008 (UTC)[reply]


Perhaps this has been discussed before, but why do the interwiki links show up (in the "in other languages" list) as languages' names in their own language, rather than in English? The benefit of having the link to fr: (e.g.) show up as "Français" and not "French" is that someone who knows only French and comes across our page will know how to find the definition (or other info) in his own language. But what are the chances that (a) he'll come across our page, (b) he doesn't know what the word "French" means, and (c) he does know what "in other languages" means, or can guess what the list is for? The benefit of having the link show up as "French" and not "Français" is that an English speaker, who doesn't know what "Français" means (not likely in the case of fr:, but pretty likely in the case of, say, ha:), can look up the word elsewhere and try to make sense of it: I've done this myself. It seems to me having English-language names is the way to go.—msh210 16:58, 17 January 2008 (UTC)[reply]

My understanding is that this is a universal feature across all the WikiMedia projects. The language names are therefore consistent across all the projects, instead of being different on each and every project. --EncycloPetey 18:47, 17 January 2008 (UTC)[reply]
For stylistic reasons I agree with Msh210, and changed my personal view to show English translations using w:User:Tra/sidebartranslate.js (mentioned here for reference). --Bequw¢τ 19:57, 17 January 2008 (UTC)[reply]

SimpleForms


Hi, i would like to propose using the mediawiki extension SimpleForms here on Wiktionary. With the help of this extension we could easily provide "input wizards" like forms for adding new words with checkboxes "noun (gender), verb, adjective" etc.. which then insert the right template with the right parameters. It could also be used to create drop-down menus for categorizing where the user chooses from existing categories instead of having to look up the according category. I think having forms for adding new words can make it a lot easier for the occasional user or newbie to add good pages. I could get people to add words which i couldnt talk into taking part in Wiktionary yet. Mutante 20:16, 17 January 2008 (UTC)[reply]

The challenges I foresee are (1) different languages have different templates, (2) would the wizard adapt when the word is being added into an existing page, versus a new page? --EncycloPetey 01:43, 18 January 2008 (UTC)[reply]

I didnt think of one universal wizard yet, but first just of one like "Add new german word", which would be specific to language and only for new pages. Mutante 22:00, 18 January 2008 (UTC)[reply]

I agree that enabling that extension would be much more elegant than our current kludge of MediaWiki:Noexactmatch + MediaWiki:Noexactmatch/fr (etc.) + Template:new en basic + Template:new en basic intro + Template:new en noun + Template:new en noun intro (etc.) + Special:Prefixindex/Template new stuff. --Connel MacKenzie 00:02, 19 January 2008 (UTC)[reply]
I agree also, once we have the extension doing simple things we can then work out whether it is suitable for more advanced editing - but it would be nice to have something better than the new templates, as above. I think that some of the bugs mentioned on the mediawiki page will prevent it from being installed for WMF projects for the moment - but lets try and fix them rather than let that stop us. Conrad.Irwin 00:32, 19 January 2008 (UTC)[reply]
Mutante, I hinted at it with the "/fr" link above, but to be a little more explicit: If you set your Special:Preferences language to German, you'll get MediaWiki:Noexactmatch/de instead of MediaWiki:Noexactmatch. (You can edit that page's talk page, if you aren't a sysop yet.) You can create the "preload" templates needed, as Special:Prefixindex/Template:new de , e.g. {{new de noun m}} and "introduction" text (instructions) at {{new de noun m intro}}. --Connel MacKenzie 01:45, 20 January 2008 (UTC)[reply]

{{infl}}

This needs extra parameters, or something. Typing in {{infl|en|suffix}} doesn't add the page to Category:English suffixes. I think the same might be for {{infl|en|prefix}} too. At the moment, {{infl|en|suffix}} just puts the page into Category:English language. How to change this? --Keene 00:34, 18 January 2008 (UTC)[reply]

Use {{infl|en||cat=suffixes}}. And FYI, templates' talk pages are often quite informative. :-) —RuakhTALK 01:00, 18 January 2008 (UTC)[reply]
{{infl|en|suffixe}} works too I've found. --Keene 00:17, 19 January 2008 (UTC)[reply]
Should be {{infl|en|suffix|cat=suffixes}}. Point being that automation/whatever can find the string "infl|en|suffix" if one wanted to (say) convert them all to a new en-suffix template. Robert Ullmann 12:27, 20 January 2008 (UTC)[reply]
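As a rough illustration of the point about automation being able to find the string "infl|en|suffix", here is a minimal Python sketch. It assumes the entry wikitext is already available as a plain string; the helper name `needs_cat` and the sample text are hypothetical, and the pattern only covers the exact forms discussed in this thread.

```python
import re

# Matches {{infl|en|suffix}} with or without further parameters.
# Forms such as {{infl|en||cat=suffixes}} are deliberately NOT matched,
# which is the searchability problem raised above.
INFL_SUFFIX = re.compile(r"\{\{infl\|en\|suffix(\|[^{}]*)?\}\}")


def needs_cat(wikitext: str) -> list[str]:
    """Return the {{infl|en|suffix...}} calls that lack a cat= parameter."""
    return [
        m.group(0)
        for m in INFL_SUFFIX.finditer(wikitext)
        if "cat=" not in (m.group(1) or "")
    ]


sample = (
    "{{infl|en|suffix}}\n"
    "{{infl|en|suffix|cat=suffixes}}\n"
    "{{infl|en||cat=suffixes}}"
)
print(needs_cat(sample))
```

Only the first call in the sample is reported: the second already has `cat=`, and the third cannot be found by searching for "infl|en|suffix" at all.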

Fonts change when editing

When I edit anything, the fonts of the surrounding page (the navigation box, tabs etc.) change from the normal sans-serif to one with serifs. Is this deliberate? SemperBlotto 09:36, 18 January 2008 (UTC)[reply]

I've noticed same for a few days. DCDuring 12:23, 18 January 2008 (UTC)[reply]
And it is only here - still normal on Wikipedia, Wikispecies and Wikinews. SemperBlotto 12:10, 20 January 2008 (UTC)[reply]
Fixed - see "Opera issue" down a bit. SemperBlotto 22:25, 20 January 2008 (UTC)[reply]
What SB said. DCDuring TALK 18:35, 23 January 2008 (UTC)[reply]

RFDO archiving

I've written code again (seems like it's been a while). Checking http://en.wiktionary.org/wiki/Wiktionary:Grease_pit_archive/2007/April#Better.2C_more.2C_faster_archiving there has always been strong consensus for just about any archiving method that will work. Discussing it today and yesterday, we hammered out some of the more technical details. At this point, I have it working for archiving WT:RFDO redlinks only, setting all the nice cross references we want. Right now, I have it doing only one at a time (as per the Ec convention of years ago).

My question is, should I start this going steadily at once (just one entry) an hour, or some other rate? It seems pretty clear that this archiving is overdue.

--Connel MacKenzie 11:16, 18 January 2008 (UTC)[reply]

At the moment, I'm still squashing bugs and retesting along the way. --Connel MacKenzie 11:33, 18 January 2008 (UTC)[reply]
Relevant links to watch: WT:DEL, WT:RFDO, WT:RFDA. --Connel MacKenzie 21:52, 18 January 2008 (UTC)[reply]
Most bugs seem to be addressed on this proof-of-concept. Feedback would be appreciated. Do people want me to cut it loose doing one entry a minute, or one an hour or what? --Connel MacKenzie 21:51, 18 January 2008 (UTC)[reply]
  • Perhaps a better test of the automation, would be to remove the break that does only one (and exits) instead letting it just do all ~250 sub-items all at once? That would just be one test-run. --Connel MacKenzie 00:05, 19 January 2008 (UTC)[reply]
Just do it all at once, man. Nobody likes archiving anyway, just get it all out of the way in one swoop. And still, it's ONLY archiving, nothing that important. Get the RFD pages as small as possible, cos then it won't take 2 minutes to load each time. --Keene 00:16, 19 January 2008 (UTC)[reply]
I think that this is a good way forward, it means that we don't actually need to keep all the discussions permanently as they are easily findable anyway. Perhaps for the RFDO pages it would be nice to sort them by namespace within the letters - not really sure though. Also, shouldn't these pages be in the Wiktionary: namespace - a minor point that would require lots of effort to fix; at least this is better than the Appendix namespace these pages were in previously. This is certainly a good way forward, are you going to be able to go back through the history and link to the previously closed discussions? Conrad.Irwin 00:24, 19 January 2008 (UTC)[reply]
Moving them to the Wiktionary: namespace is an idea (Appendix: was way off) but I'm not sold on it.
Yes, I think I can extract the other histories from the full XML dump. I did that twice before (before the XML dump change that broke my routines) so at the very least, I'll be able to incorporate those...leaving a gap perhaps. (A gap in some history about entries we've decided we don't want, that is.) I plan on sorting all the subpages when at a logical breaking point, even though they have individual #links for you to reach them all directly, anyhow.
OK, I'll let 'er rip. Each chunk takes 1.5 minutes, doing it nicely. I'll continue with that for a while...about five hours for the RFDO redlinks. I'll have to make sure I keep my "bluelinks" thing separate until that finishes.
--Connel MacKenzie 00:41, 19 January 2008 (UTC)[reply]
Oh, also, things like MediaWiki:Previously deleted entries/B#Business Practices Officer can be dealt with two ways, too. What makes the most sense, to me, is to put that section back onto WT:RFD and let it get re-archived nicely, there. But that should probably wait until the RFD run starts/is tested. --Connel MacKenzie 01:00, 19 January 2008 (UTC)[reply]
Note to self, re-add this when RFV archiving is working. Also move [B], [I], [V] and [Y] 's back. --Connel MacKenzie 07:54, 19 January 2008 (UTC)[reply]

Status update:

Dunno what to do about talk page redirects though, for "kept" messages. Pywikipediabot refuses to follow the redirects in that setup. --Connel MacKenzie 09:25, 19 January 2008 (UTC)[reply]

WT:RFDO went from 1.5 MB down to 1/2 MB. Once people get in the habit of striking out resolved items, it will shrink a bit more dramatically. --Connel MacKenzie 10:00, 19 January 2008 (UTC)[reply]

...starting WT:RFD testing now. --Connel MacKenzie 17:43, 19 January 2008 (UTC)[reply]

Followup on question from irc: yes, the intent is to resort the alphabetic indexes when the bot is caught up. Sorting doesn't really matter though, as they have anchors you can link to, as it is. Also, it isn't the sort of page for lite browsing. --Connel MacKenzie 20:08, 19 January 2008 (UTC)[reply]

Location of WT:DEL

At the moment these archives are being put into the MediaWiki namespace, which is a lot better than their previous location in the Appendices. I would like to move these pages to the Wiktionary namespace which is their logical home. When I proposed this on IRC it was counter-proposed that these pages should be in the MediaWiki namespace so that Google doesn't index our deletion archives. I think that firstly google indexing our deletion archives is not a problem, and secondly that even if we didn't want Google to search our deletion archives there ought to be a better way of marking pages as such, that could be used for all the pages that we want to hide. Conrad.Irwin 22:14, 19 January 2008 (UTC)[reply]

If you take a look at http://en.wiktionary.org/robots.txt, you'll see that there are already a number of project-specific exclusions in there. The most analogous one is w:Wikipedia:Articles for deletion. I don't see any reason we shouldn't just ask for the eventual location to be added to robots.txt. That only leaves the on-wiki search which doesn't search "Wiktionary:*" by default as far as I know. Mike Dillon 22:35, 19 January 2008 (UTC)[reply]
I'm not sure if it was the accusative tone of the original question (on IRC) or what (that threw this so far out of context.) The intent of having a boat-load of page-specific information "somewhere else" was first and foremost driven by how cluttered the Wiktionary: namespace currently is. The choice of putting it at WT:DEL was a side-benefit to finally get that stuff out of search-engine's paths. I find it suspicious that the community-at-large (only propeller-heads read WT:GP) is being excluded from this conversation, under the guise of it being purely a technical concern. Yes, I am aware that we can request specific robots.txt, if we happen to have community consensus, if the developers feel like doing it, if we have the whole batch of pages prepared long in advance and if we don't mind waiting a month (or six - oh wait - it's been 24 to 36 months so far) for it to finally happen. Oh, and if we don't mind all that same data being replicated by XML-dump mirrors. --Connel MacKenzie 01:24, 20 January 2008 (UTC)[reply]
I'm not sure why you're getting all defensive about this, but I don't really care either since it seems to be some sort of IRC-related personality conflict. All I have to say on this is that putting content pages in the MediaWiki namespace is highly irregular. As far whether the community at-large is being "excluded", it seems a little paranoid to me to jump to that conclusion, but you're certainly right that this isn't a technical question. Mike Dillon 01:52, 20 January 2008 (UTC)[reply]
Paranoid? Um, no, you should have heard Robert's and Conrad's comments. About the MW namespace: it is irregular, but is also useful. In this case, it represents Wiktionary's internal housekeeping regarding items it does not want. --Connel MacKenzie 02:46, 20 January 2008 (UTC)[reply]
I'm puzzled as to why we have alphabetic indices at all, or why they would be in MW space when the monthly archives already are (and should be) in project space. Of course, if these are being automatically maintained, it's no great matter either way. But I don't really see the use value. For those who actually want to find a previous discussion of a particular word, "What links here" is a quite effective tool. -- Visviva 05:50, 20 January 2008 (UTC)[reply]

Enhancements for RFDO, RFD, RFV archiving

Sorry, but RFC will have to wait for a little while.

FEATURE: In some side discussions, other enhancements and suggestions are appearing. For example, when an item is stricken out, I think I will add a check to load the page and verify that the respective {{rfv}} or {{rfd}} tag has been removed - if not, the bot can add a comment to that section, indicating that it was inactive and stricken but still tagged.
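A minimal sketch of the check described in this FEATURE item, written in Python since the bot runs on Pywikipediabot. It works on an entry's wikitext passed in as a string and makes no wiki API calls; the function names and the wording of the reminder are assumptions for illustration, not the bot's actual code. Variants such as {{rfv-sense}} are not matched by this pattern.

```python
import re

# Matches {{rfv}} and {{rfd}}, optionally with parameters, e.g. {{rfv|lang=fr}}.
REQUEST_TAG = re.compile(r"\{\{\s*(rfv|rfd)\s*(\|[^{}]*)?\}\}", re.IGNORECASE)


def lingering_tags(entry_wikitext: str) -> list[str]:
    """Return the request tags (lower-cased, sorted) still present in an entry."""
    return sorted({m.group(1).lower() for m in REQUEST_TAG.finditer(entry_wikitext)})


def reminder(entry_title: str, entry_wikitext: str):
    """Build a reminder comment for a struck-out section, or None if untagged."""
    tags = lingering_tags(entry_wikitext)
    if not tags:
        return None
    names = ", ".join("{{" + tag + "}}" for tag in tags)
    return f"Note: [[{entry_title}]] is struck out here but still tagged with {names}."


print(reminder("foo", "==English==\n{{rfv}}\n===Noun===\n# A thing."))
```

The sketch only reports the leftover tag, in line with the preference for posting a reminder rather than editing the entry itself.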

THROTTLING: I don't know what to do, to limit edit conflicts. I assume there won't be anything like yesterday (and still going on, on RFV now) in the future. I think this will work best checking once every hour, on the hour, once it settles down.

Other cross references: The monthly summary indexes haven't been maintained in a year on some pages. If I revive this concept, I'll do so with uniquely named subpages. Is this a strongly desired feature? It used to seem useful, before we had the ability to look at previously failed entries coherently. (And before we had the ability to lock entries, etc., etc.)

More ideas welcome. Should the pages' preambles be enhanced to describe the new technique yet? Everyone hate the new archiving or love it? Both?

--Connel MacKenzie 05:51, 20 January 2008 (UTC)[reply]

FEATURE: Link-together renominations. (I suppose that happens implicitly on the talk page. For future archivings, anyhow. Nevermind.) --Connel MacKenzie 05:55, 20 January 2008 (UTC)[reply]

Looks great. I don't really like the concept of archiving to history (since it makes the discussions unsearchable), but it's better than what we've had. -- Visviva 06:01, 20 January 2008 (UTC)[reply]
Actually on reflection I really dislike the concept of archiving to history (since it makes the discussions unsearchable and almost inaccessible, and since a great deal of our communal discourse is within those discussions, to say nothing of remarks, links & citations which may be useful for entry improvement). So for future runs it would be nice
a) if the discussion is actually archived somewhere, preferably continuing current practice, i.e. to the Talk page for kept entries and the monthly archive for deleted entries.
b) also if the RFD/RFV tags are automatically removed from the entries when archiving. In theory this should already have been done by the closer, but that often isn't the case.
But anyway, my opinions on deletion tend to be idiosyncratic. Hopefully others will weigh in. -- Visviva 12:43, 20 January 2008 (UTC)[reply]
I really dislike the notion of the bot removing the tags. Posting a reminder (to the section of RFV/RFD) seems much more helpful and consistent. --Connel MacKenzie 19:34, 20 January 2008 (UTC)[reply]
Not searchable is undesirable. I like to be able to find old discussions so as not to waste the time of those with longer tenure here. There is a search box and engine. It seems silly to have to ask - and then get a vague answer like "I think we had a discussion a year ago". If we could codify everything into policies and guidelines, then the cases and precedents would not be as important, but cases seem to rule. DCDuring 12:59, 20 January 2008 (UTC)[reply]
No one (not even I) has suggested they not be searchable. The only question is whether external searches should be encouraged/promoted/spammed. --Connel MacKenzie 19:34, 20 January 2008 (UTC)[reply]
But when the archive is just a link pointing to history, that removes the discussion from internal search as well. (for the record, I find it mightily convenient to be able to search old discussions with Google, also). -- Visviva 10:23, 21 January 2008 (UTC)[reply]

Opera issue

Much to my chagrin, something happened a few days ago and I can't edit Wiktionary (but only the English one) using Opera, my main net browser. The English Wiktionary is the only wiki I've been unable to edit using Opera. English Wikipedia, Romanian and French Wiktionaries - fine. This might not bother me so much if I didn't hate Firefox, which is my only other option. So basically I just want to know if somebody changed something that can be fixed, or if it's just some freak occurrence and I'm going to be stuck using this piece of crap forever. :) — [ ric ] opiaterein 16:03, 20 January 2008 (UTC)[reply]

Doesn't Opera have a nice "Javascript console" where it reports errors? --Connel MacKenzie 19:35, 20 January 2008 (UTC)[reply]
Thanks for stopping in on IRC, by the way. Please try and describe the problem more clearly in the future - I thought your comment was a joke about the Wikipedia sandbox, before reading this. (Or at least mention that you just posted a new message on WT:GP, next time.) There has been a flurry of development activity recently. A more specific description would help narrow it down. --Connel MacKenzie 19:43, 20 January 2008 (UTC)[reply]
I tried to do some editing with Opera and whenever I hit "Save page", "Show preview", or "Show changes", I'm shown Special:Search (without an external redirect). I opened up Wireshark and this seems to be happening on the backend, not in the browser itself. The URL stays on the expected URL, but "wgPageName" says "Special:Search" and the page content is the search page. Mike Dillon 20:48, 20 January 2008 (UTC)[reply]
It may well be a problem with some JavaScript: in Opera with JavaScript turned on, the page appears distorted (edittools is top left, followed by page content, then the navigation tool bar). Having turned the JavaScript off, everything seems to be working OK. Conclusion: some seriously borked JavaScript somewhere, argh. Conrad.Irwin 21:14, 20 January 2008 (UTC)[reply]
Ok, now logged in with javascript enabled again - everything looks fine. Just try the save page button... Conrad.Irwin 21:23, 20 January 2008 (UTC)[reply]
Problem caused by edittools, probably by some bad HTML in the Message - ugh, it's awful. I'll try and fix it now, but if you want instant gratification, including importScript('User:Conrad.Irwin/edittools.js'); in your Monobook.js will override the problem. Conrad.Irwin 21:38, 20 January 2008 (UTC)[reply]
Hopefully fixed now. For some reason the Ancient Greek font <span> had not been closed, and thus the DOM was broken. Why it hurt Opera so much I have no idea; it is probably an idea to inform their devs, if we can work out exactly what it was. It certainly caused about 30 subsequent errors in the HTML validator. Conrad.Irwin 21:52, 20 January 2008 (UTC)[reply]
It looks like the DOM breakage was causing Opera to include the "search" input field from the search form, which was causing the backend to think a search had been submitted... Mike Dillon 22:29, 20 January 2008 (UTC)[reply]
By the way - whatever you fixed also fixed the strange way the fonts changed upon editing (see up a bit). SemperBlotto 22:24, 20 January 2008 (UTC)[reply]
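For illustration, the kind of unclosed <span> described above can be caught before saving with a small tag-balance check. This is a minimal sketch using Python's standard `html.parser`; the class and function names are hypothetical, the void-element list is a small illustrative subset, and the sample markup is invented rather than taken from MediaWiki:Edittools.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (illustrative subset).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class UnclosedTagFinder(HTMLParser):
    """Track open tags on a stack and record any that are never closed."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag in self.stack:
            # Pop back to the matching opener; anything popped on the way
            # was opened inside it and never closed.
            while self.stack:
                top = self.stack.pop()
                if top == tag:
                    break
                self.problems.append(top)
        else:
            # A closing tag with no opener at all.
            self.problems.append("/" + tag)


def unclosed_tags(html: str) -> list[str]:
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    return finder.problems + finder.stack


print(unclosed_tags('<div id="editpage-specialchars"><span lang="grc">α β γ</div>'))
```

A check like this only reports tag imbalance; it says nothing about how a particular browser recovers from it, which is what differed for Opera here.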

JSLib: namespace

Wiktionary has a lot of people who are building stuff in JavaScript, and to save us from constantly repeating ourselves and re-implementing everything a thousand times over, it seems sensible to reserve a section of Wiktionary that is used exclusively for this development. Below are some thoughts and points I have had; does anyone want to add to or take away from this before I start a VOTE for it? This discussion may belong in the Beer Parlour (I seem to have the wrong idea about where the split should be); if so, could someone please move it.

My thoughts, in no particular order:

  • It could contain individual functions on separate pages, that could be transcluded into multiple JavaScript files.
  • It could contain macros (like templates) to do common tasks that should be inline (loading internationalised strings from Mediawiki space is one such contender -see JSLib:int)
  • Allows the creation of reusable libraries etc.
  • Possible protection issues, for using the libraries themselves importScript supports oldids so the pages can be left un- or semi-protected. But transcluded functions and macros would have to be sysop only, there is a great security risk with allowing people to edit javascript that is used by other people.
  • Automatic syntax highlighting would be nice, like current js pages, but it is possible that enabling that feature breaks all the parser fun, so it may be better to leave it off
  • Transclusion *must* be done using {{msgnw:}} - don't want the parser getting a look in.

The vote would be on the creation of the namespace, not on how it should be used in any great detail. Conrad.Irwin 00:35, 21 January 2008 (UTC)[reply]

  1. Support Connel MacKenzie 07:01, 21 January 2008 (UTC)[reply]
  2. Support RuakhTALK 18:25, 21 January 2008 (UTC)[reply]

A vote is premature at this stage. There are thus far only three JSLib: pages. I am all for separating and reusing code, but have a full go at it before creating the namespace. Also, why not just use a .js suffix? And I imagine there are disadvantages to doing this in the Mediawiki namespace instead? DAVilla 21:07, 21 January 2008 (UTC)[reply]

Um, my "support" un-vote was tongue-in-cheek, meant as encouragement, not a !vote. But, it is pretty hard to populate the namespace, when the very first one gets RFD'ed as being in an invalid namespace.
The disadvantages of the MediaWiki: namespace are: A) shortening the "include" names is defeated by lengthening them, B) non-sysops can't add their code in place (instead only in disastrous userspace) C) The MediaWiki namespace itself is already cluttered (and these presumably would be all over the place.)
--Connel MacKenzie 08:45, 22 January 2008 (UTC)[reply]
There was some talk about using JS: instead of JSLib:, this might be slightly shorter - but it might be confusable at some point in the future with a WMF cross project link - which tend to use short prefixes in front of the :. Any thoughts?
It also obviates the need to remember whether to capitalize the 'L'.—msh210 22:08, 23 January 2008 (UTC)[reply]

And Category:Etymology templates. It's hard enough to memorize dozens of ISO codes, and every source language now has its own special "code" for the source language. What I'd like to see is something like {{e|ISO code 1|ISO code 2}} that would expand the ISO codes into Template:lang:xx, transclude the language names out of it, and use them generically.

The only cases where specific etymology templates would be necessary are languages/language families that don't have their own code (e.g. {{Sla.}}, {{PG.}} etc.). --Ivan Štambuk 16:02, 21 January 2008 (UTC)[reply]

Not sure whether this is what you meant, but {{e|xx|qq}}, expanding to [[w:Xyzzy language|Xyzzy]][[category:qq:Xyzzy derivations]], sounds good to me, much better than what we do now.—msh210 18:15, 21 January 2008 (UTC)[reply]
Yes, something like that. Just like now, the second parameter should be optional (i.e. not used for English lexemes). It would also assume that 'pedia has all the articles (or redirects to them) in the format of "<language name> language" (which it doesn't for some obscure ones like w:Tocharian A language, where it only has w:Tocharian A). That part would probably need to be manually checked. --Ivan Štambuk 18:32, 21 January 2008 (UTC)[reply]
Sounds good to me. Personally I've just been creating ISO versions, like {{Fr.}} (fr being French), but your way seems better. And if need be, we can always use an etym: template pseudo-namespace, testing for {{etym:fr}} and falling back on [[w:{{lang:fr}} language|{{lang:fr}}]] [[Category:{{lang:fr}} derivations]]. (Though I can't imagine that Wikipedia would oppose a redirect from w:Tocharian A language to w:Tocharian A.) —RuakhTALK 18:40, 21 January 2008 (UTC)[reply]
This is an excellent idea. I sort of wonder why no one thought of this before. Atelaes 08:58, 22 January 2008 (UTC)[reply]
One caveat, however. I think it might be better for the template to be {{etym}} instead of {{e}}. It just seems more prudent to not needlessly throw away a one letter template namespace. Atelaes 09:01, 22 January 2008 (UTC)[reply]
I don't think we should use template:etym for something else just yet, even assuming it gets deleted: give editors a chance to stop accidentally using it for its old use.—msh210 18:23, 23 January 2008 (UTC)[reply]
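The expansion proposed above for {{e|xx|qq}} can be sketched outside the template language to make the intended output concrete. This is a minimal Python illustration, not the template itself: the `LANG_NAMES` table stands in for the Template:lang:xx lookups and holds only a few example codes, and `expand_etym` is a hypothetical name.

```python
# Stand-in for the {{lang:xx}} templates (illustrative subset only).
LANG_NAMES = {"fr": "French", "de": "German", "la": "Latin"}


def expand_etym(source_code: str, entry_code: str = "en") -> str:
    """Return the Wikipedia link plus derivations category for a source language.

    entry_code is the language of the entry being edited; as suggested in the
    thread, it is optional and left out of the category name for English.
    """
    name = LANG_NAMES[source_code]
    link = f"[[w:{name} language|{name}]]"
    if entry_code == "en":
        category = f"[[Category:{name} derivations]]"
    else:
        category = f"[[Category:{entry_code}:{name} derivations]]"
    return link + category


print(expand_etym("fr"))        # English entry derived from French
print(expand_etym("la", "fr"))  # French entry derived from Latin
```

The sketch assumes every Wikipedia article follows the "<language name> language" pattern, which is exactly the assumption flagged above as needing a manual check for cases like Tocharian A.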

Ok, how about {{etyl}} (etymon language). Perhaps I'm overthinking this.......Atelaes 20:57, 25 January 2008 (UTC)[reply]

The template has been created. Please see the discussion at Wiktionary_talk:Etymology#New_template. Thanks. Atelaes 04:38, 28 January 2008 (UTC)[reply]

When you press the wiki-link button while editing, [[#English|{{subst:ucfirst:{{subst:PAGENAME}}}}]] is automatically displayed. In my opinion this is a good function, but I would like lower case to become the standard, since it's the most common on pages in Wiktionary ([[#English|{{subst:lcfirst:{{subst:PAGENAME}}}}]]). Any comments on this request? /Natox 17:40, 22 January 2008 (UTC)[reply]

You are right, it shouldn't have ucfirst, but not lcfirst either I would think, just whatever page you are on. I changed it (Mediawiki:Link sample) to take out the subst:ucfirst Robert Ullmann 17:50, 22 January 2008 (UTC)[reply]
I think something went wrong, now it just displays [[#English|Link sample]]. Shouldn't it be something like [[#English|{{subst:PAGENAME}}]]? /Natox 18:05, 22 January 2008 (UTC)[reply]
Sorry, didn't use the {{nosubst}} trick. Robert Ullmann 12:57, 24 January 2008 (UTC)[reply]

User search mechanics

I am having trouble understanding the operations of the WT search box that users must rely on. Here are some of my questions. How long does it take before the content of a new entry (or new material in an old entry) is indexed and searchable? Are any portions of the content not indexed and searchable? Is there any stemming? How are characters with diacritical marks handled when a user enters characters without diacritical marks in the search box? I would not mind doing a little work to get a modicum of understanding about this because it would affect, say, my opinions about inflected forms of verb phrases (phrasal verbs and idioms). Where should I start? DCDuring TALK 19:27, 23 January 2008 (UTC)[reply]

The WT search box is woefully inadequate in most respects. I believe it runs off a normalised set of data, probably collected twice a week with the other maintenance tasks, probably Connel MacKenzie knows most about this. In terms of Diacritics etc. I believe it handles a few, but not many - though Hippietrail will be able to tell you everything, he has written an Extension that can handle most diacritics and (as a neat side-effect) can add the {{see}} automagically to articles, so it would be good to get that installed. AFAIK all current content is searchable, though not anything from the History. Conrad.Irwin 20:23, 23 January 2008 (UTC)[reply]
"The WT search box is woefully inadequate in most respects." It's a defect of MediaWiki itself, and one that is readily acknowledged by the developers. Not sure what is their schedule about it, though. Circeus 20:40, 23 January 2008 (UTC)[reply]
Conrad, no, I was not involved in the Lucene search stuff at all. As far as I know, it is an up-to-date index (DB server slave lag can sometimes get as high as 3 seconds, but DB writes are auto-locked whenever that happens). Absolutely no stemming transformations are performed that I know of. Diacritics get no special treatment for searches - they are just characters that either match or do not match exactly. Entries with diacritics that mention the bare forms are likely to be found when doing a search for the bare forms. The "Exact-case search" is a relatively new extension that does some diacritics-folding, IIRC. I haven't used the new feature quite enough yet to say if it is so much better...it does now allow for case-sensitive searches, at the very least. --Connel MacKenzie 20:51, 24 January 2008 (UTC)[reply]
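To make the difference between exact matching and "diacritics-folding" concrete, here is a minimal Python sketch. It only illustrates the general technique (Unicode decomposition followed by dropping combining marks); it is an assumption-laden example, not a description of how the Lucene index or the extension mentioned above actually works, and the function names are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_match(query: str, title: str) -> bool:
    """Plain comparison: diacritics are just characters that match or don't."""
    return query == title


def folded_match(query: str, title: str) -> bool:
    """Comparison after folding diacritics and case on both sides."""
    return fold_diacritics(query).casefold() == fold_diacritics(title).casefold()


print(exact_match("resume", "résumé"))   # False
print(folded_match("resume", "résumé"))  # True
```

Letters that have no Unicode decomposition, such as ø, are left unchanged by this approach, so a real folding table would need extra mappings for them.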

Fixing cardinal numbers and ordinal numbers categories

Unfortunately, a large number of these categories were created with the wrong naming convention; these are not topic categories, these are POS categories. For example, Category:fr:Cardinal numbers should be Category:French cardinal numbers, they (the entries) are cardinal numbers in French not French words about cardinal numbers.

For example, past participles in French are in the POS Category:French past participles not in the topic Category:fr:Past participles.

Reasonably large mess to clean up; a lot of entries use the {{cardinal}} and {{ordinal}} grammar/context templates, but a lot do not.

And yes, I know the origins of this mess go way back (look at the page history on Category:fr:Cardinal numbers), but it has been made a lot worse in the last year by its not getting straightened out while we were sorting out some of the other confusion (almost) two years ago. Robert Ullmann 13:36, 24 January 2008 (UTC)[reply]

If you're edit warring, know that anyways the best way to handle these is not the mixed method employed, but to distinguish the language name consistently, perhaps in another way, such as Category:Past participles in French or Category:Past participles (French). The full language names are preferred to the language codes except for technical difficulties in resolving them. However, Category:French past participles does not generalize because Category:French mountains is misleading and terms like "English" have double meaning. DAVilla 16:03, 24 January 2008 (UTC)[reply]
Currently we use the bare/ISO form for two kinds of categories: topical categories, that hold entries for words with senses relating to that topic (Category:Horses and so on), and context categories, that hold entries for words used in a particular way (Category:Slang and so on); and, we use the language-name form for two kinds of categories: POS categories, that hold lemmata belonging to a certain language and part of speech, and POS-form categories, that hold non-lemmata belonging to a certain language, part of speech, and form. (If/when we eliminate the distinction between lemmata and non-lemmata, obviously this distinction will be lost as well; I'm just talking about how we do things right now.) To me it seems that a solid case could be made for treating "cardinal number" as its own part of speech, or for treating it as a property of certain determiners. In the former case, it would get its own POS header and POS (language-name) category; in the latter case, it would get the "Determiner" (or some such) header and category, as well as a bare/ISO category. —RuakhTALK 23:40, 24 January 2008 (UTC)[reply]
I'd be for the change. And would be willing to help out. --Bequw¢τ 17:43, 25 January 2008 (UTC)[reply]
I'm against the change for these categories, since they are topical, not POS. That is, the "Cardinal numbers" categories were created to be a subcategory of Category:Mathematics. The content is topical, because the items included are cardinal numbers, but may not function grammatically as numbers (which would be required for this to be a POS category). That is, there are words which name cardinal numbers which do not function grammatically as numbers, but are nevertheless cardinal numbers.
Note that "Cardinal number" is not a distinct part of speech any more than "personal pronoun" or "indeclinable noun" would be. The part of speech is Numeral / Number. The solution is to have a POS category Category:French numbers or Category:French numerals (a previous vote to standardize these names deadlocked in "no consensus"). Then, Category:fr:Cardinal numbers may be placed as both a subcategory of that POS category as well as under the topical category Category:fr:Mathematics. --EncycloPetey 05:58, 2 February 2008 (UTC)[reply]

ʤ in Edit tools

Can a sysop please add ʤ to the edit tools in the IPA section? It seems to be missing. I thought ɮ was it, but it isn't. --Keene 15:19, 24 January 2008 (UTC)[reply]

IIRC, We don't use that character, as it breaks the searches. That is, "dj" is equivalent and outside of a description somewhere, is used instead. What language has that anomaly? --Connel MacKenzie 17:56, 24 January 2008 (UTC)[reply]
Apparently the special ligatures are no longer official IPA usage; instead, you can use d͡ʒ, or dʒ, or simply . —RuakhTALK 23:46, 24 January 2008 (UTC)[reply]
They are no more official than "fi" is necessary as a ligature in any font, that doesn't prevent either of them from being widely used. Circeus 02:06, 25 January 2008 (UTC)[reply]
True, but the question isn't "do people use it?", but rather "do we want to use it?". And from past discussions, the answer appears to be "no", for the reason I stated. —RuakhTALK 02:09, 25 January 2008 (UTC)[reply]

uncountable

Why does {{uncountable}} categorize words into category:Uncountable, while {{en-noun|-}} doesn't?—msh210 00:15, 25 January 2008 (UTC)[reply]

Because Ec threw a fit when I had {{en-noun}} auto-categorize entries into Category:English nouns, so I stopped there. Seems appropriate to me, though. Rod (A. Smith) 04:10, 25 January 2008 (UTC)[reply]
At this point I would be strongly in favor of somehow distinguishing the entries that have only uncountable senses, to focus on reviewing them for correctness. In the long run certainly en-noun should simply categorize entries as having uncountables. I am curious as to how the category is used apart from the review of the appropriateness of the uncountability claims. DCDuring TALK 04:40, 25 January 2008 (UTC)[reply]
I'm in favor of having the categorization as part of the template. I'm not in favor of having to add the category as a separate step. Also, putting it into the template means that updating the template on a page will automatically update the category (i.e. if we change the template to no longer say uncountable, the page will be removed from the category). RJFJR 20:51, 25 January 2008 (UTC)[reply]
If we are going to place the nouns in categories like this, then the correct category would be Category:English countable nouns and so on.--Williamsayers79 00:26, 13 February 2008 (UTC)[reply]

Customizing the namespace area in searches

Is it easy to change the bottom of the search page where it lists the namespace checkboxes (after Search in namespaces:)? There's no marking between the namespace pairs, which can be confusing, and with certain window widths the checkbox can be separated from its text label. Any usability enhancements we can make? --Bequw¢τ 17:59, 25 January 2008 (UTC)[reply]

For <form id="powersearch">, I think it would be useful to have some buttons before it - [Select all] - [Select none] - [Select content namespaces] - [Select discussion namespaces]. The same JavaScript that adds those can also add an mdash after each "talk" namespace. Re-arranging it into a table might be overkill. --Connel MacKenzie 18:42, 25 January 2008 (UTC)[reply]

W3C validator

Happily, our front page validates wonderfully for XHTML. It does have an error and some warnings with its CSS (see report). Our edit page also seems to have some XHTML problems (report). Can any of this be fixed? (Maybe it would've caught the Opera problem, though I don't know how it handles JavaScript.) --Bequw¢τ 22:30, 25 January 2008 (UTC)[reply]

Fixed MediaWiki:Edittools (A Gr. error.) --Connel MacKenzie 22:41, 25 January 2008 (UTC)[reply]

I can't get the "double dagger" sign to show up properly. Any ideas? SemperBlotto 11:23, 26 January 2008 (UTC)[reply]

I think I got it working, just by copying and pasting it from another page. Dmcdevit·t 12:25, 26 January 2008 (UTC)[reply]
Thanks - fine now. SemperBlotto 08:35, 27 January 2008 (UTC)[reply]

Mapping translations

On IRC we were kicking around the idea of putting translations on a map, so I tried it out. I made a mock-up which didn't turn out terribly. It looks like this in an article. It's made to be large so you can fit lots of languages on it, so I made it scroll and hide with NavFrame. I don't know if anyone else thinks this is useful. If so, we could finish putting in the coordinates for the languages (and there is some problem with the <small>s that I can't figure out). I was using [3] to get the coordinates, but it was still taking a while. Just thought I'd post this here for others to see and build on. :-) Dmcdevit·t 12:06, 26 January 2008 (UTC)[reply]

Well, you'll need a legend on the map, so people can understand what all the colours are. My initial feeling is that this map idea would be a spectacular failure, rather misleading or extremely cluttered, but it is a nice idea. Looks like it'll be incredibly time-consuming too, though I wish you luck with it. --Keene 12:12, 26 January 2008 (UTC)[reply]
My thoughts on these are that the colors aren't really very important; we could even switch to a blank map. The current map is colored by language family because I was curious if you'd be able to correlate the word similarities with the families, or if it would be meaningful in any way. It will of course be misleading (languages aren't so neat as to fit perfectly on a map; languages can coexist not only in the same region, but in the same person, of course!) and perhaps not pretty when full (though most of our articles won't fill a map yet), but could still be worthwhile anyway. The actual implementation/maintenance of it is the major issue. However, each language could use the normal language code to identify it. In theory, I don't think there's any reason they couldn't be added and updated by a bot like Tbot every time the translations are edited. I'm far from the most technical one here, though, so I'm happy to hear your ideas. Dmcdevit·t 12:35, 26 January 2008 (UTC)[reply]
Where would you plot a dead language or an international language? Harris Morgan 12:46, 26 January 2008 (UTC).[reply]
I don't know if it's useful, but it's certainly awesometastic! :-D —RuakhTALK 16:18, 26 January 2008 (UTC)[reply]
Keep it around. I bet someone would like to test ideas out with it. --Bequw¢τ 19:05, 27 January 2008 (UTC)[reply]

{{IPA}} and stylesheets

I would like to remove the style="font-family: {{IPA fonts}}" from this and all of its friends. The declaration is redundant to the styles declared in Mediawiki:Common.css, and in some cases the lists of fonts are already slightly different. There are numerous advantages to having the fonts in the stylesheet. Most of them are purely niceness concerns, but one important thing is that with the fonts declared inline, user stylesheets cannot override them without the !important modifier. Is there any compelling reason not to do this that I have overlooked? The templates apparently still have to exist to get round bugs in IE6 (yippee)-: which still holds far too many internet users in its grasp. Conrad.Irwin 00:49, 27 January 2008 (UTC)[reply]

The templates also enable a link to the associated phonology explanation chart on Wikipedia, so the template should continue in use even if it is no longer strictly necessary for formatting the IPA characters. It also contains code that allows auto-formatting of secondary or alternative pronunciations. I can't speak to the initial question, however, since that bit of code and its function are beyond what I know. --EncycloPetey 05:47, 2 February 2008 (UTC)[reply]
Yes, I would leave the templates where they are and just remove the CSS information as it is already given in Mediawiki:Common.css. Conrad.Irwin 10:01, 2 February 2008 (UTC)[reply]
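
The override problem raised at the top of this thread can be sketched with a deliberately simplified model of the CSS cascade, written in Python. It assumes all declarations come from the same origin and compares them only by importance, inline-ness, selector specificity, and source order; the `Declaration` class, the `winning` function, and the placeholder font values are hypothetical and are not taken from the templates or from Mediawiki:Common.css.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Declaration:
    source: str                    # where the font-family declaration lives
    value: str                     # placeholder for the font list
    important: bool = False        # declared with !important
    inline: bool = False           # declared in a style="..." attribute
    specificity: tuple = (0, 0, 0) # (ids, classes, elements) of the selector
    order: int = 0                 # later stylesheets get higher numbers

def winning(declarations):
    """Pick the declaration that applies: !important first, then inline
    style, then selector specificity, then source order."""
    return max(declarations,
               key=lambda d: (d.important, d.inline, d.specificity, d.order))

inline = Declaration("template inline style", "fonts from the template", inline=True)
common = Declaration("Common.css", "site font list", specificity=(0, 1, 0), order=1)
user = Declaration("user stylesheet", "user's preferred font", specificity=(0, 1, 0), order=2)

print(winning([inline, common, user]).source)  # inline style beats both stylesheets
print(winning([common, user]).source)          # without it, the later user rule applies
```

In this model, dropping the inline declaration lets an ordinary user rule win, while keeping it forces the user to add `important=True` (the !important modifier) to get the same effect.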

message atop each page

The message atop each page still reads "Voting is open for Commons:Picture of the Year", although that's now false.—msh210 17:01, 28 January 2008 (UTC)[reply]

Now blanked as that seems to have been what was done last time. Conrad.Irwin 17:06, 28 January 2008 (UTC)[reply]
I just got that message when I was here without logging in. -- carol 16:21, 30 January 2008 (UTC)[reply]
Hmmm, probably a caching issue - it seems fine here when logged out. Conrad.Irwin 12:59, 31 January 2008 (UTC)[reply]

WOTD Template malfunction

Striking. Never mind, I finally figured it out. --EncycloPetey 05:38, 30 January 2008 (UTC)[reply]

[edit]

I went to the current version of bus and tried to click on the Swedish Etymology section and got sent to the 2nd English Etymology section?! The English entry has three Etymology sections (each titled "Etymology X"). And there are three plain "Etymology" sections under three other languages entries that MediaWiki auto-numbers to "Etymology", "Etymology 2", "Etymology 3". That's duplicate anchor tags for #2 and #3. Any thoughts on fixes? We could auto-"letter" the Etymology sections:) (FYI the preference "Auto-number headings" doesn't help as it doesn't change the linking/anchor tags, just the display of the headers). Also, see possibly related Mediawiki bug ticket. --Bequw¢τ 16:14, 31 January 2008 (UTC)[reply]

Just in case you thought this would only happen a few times, it's currently a problem on at least 180 pages. We could also solve this problem by introducing punctuation in your hand-numbered etymology sections (eg "Etymology-X" instead of "Etymology X"). --Bequw¢τ 19:30, 9 February 2008 (UTC)[reply]
In the bus entry I noted that the problem exists for the "Tok Pisin" and "Swedish" sections but not for the "Old Irish" section. DCDuring TALK 20:13, 9 February 2008 (UTC)[reply]
That looks like a bug in the way the software resolves the anchor names of identical headers. Since there's already an "Etymology", it gives the Swedish Etymology the anchor "Etymology 2" which it doesn't realize conflicts with English Etymology 2. Instead it should give the Swedish Etymology the next unused anchor name, in this case "Etymology 4". DAVilla 06:42, 12 February 2008 (UTC)[reply]
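
The behaviour described in this thread, and the "next unused anchor name" fix DAVilla suggests, can be sketched in Python as follows. This is a model for illustration only, not the actual MediaWiki code: the function names are hypothetical, the spaces-to-underscores step is a simplification, and the heading order used for the bus entry is an assumption.

```python
from collections import Counter

def naive_anchors(headings):
    """Number repeated heading texts independently (models the reported bug)."""
    seen = Counter()
    anchors = []
    for heading in headings:
        base = heading.replace(" ", "_")
        seen[base] += 1
        # The 2nd "Etymology" becomes "Etymology_2" even if that anchor already exists.
        anchors.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return anchors

def unique_anchors(headings):
    """Give each heading the next anchor name that is not already in use."""
    used = set()
    anchors = []
    for heading in headings:
        base = heading.replace(" ", "_")
        anchor, n = base, 2
        while anchor in used:
            anchor = f"{base}_{n}"
            n += 1
        used.add(anchor)
        anchors.append(anchor)
    return anchors

# Assumed section order for the "bus" entry described above.
bus = ["English", "Etymology 1", "Etymology 2", "Etymology 3",
       "Old Irish", "Etymology", "Swedish", "Etymology",
       "Tok Pisin", "Etymology"]

naive = naive_anchors(bus)
print([a for a, count in Counter(naive).items() if count > 1])  # colliding anchors
print(unique_anchors(bus))
```

With this input the naive scheme leaves Old Irish alone but collides for Swedish and Tok Pisin, matching what was observed, while the second function gives the Swedish section "Etymology_4" and the Tok Pisin section "Etymology_5".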