
Wiktionary:Capitalization transition

From Wiktionary, the free dictionary

The vote at Wiktionary:Beer parlour/First letter capitalization was 13-2 in favour of proceeding. Below are a few topics that need to be considered to make the transition as smooth as possible.

I don't know the extent of this difficulty. We have probably all been lax in our practices (myself included). Where the grammatical context requires that a linked word appear capitalized, the format [[word|Word]] should be used.

I think most of them are correct. Back when I was more active, I was certainly checking whether this happened correctly. The only ones that might be wrong are the ones at the start of sentences. If we are going to check all the entries anyway, it seems better to me to check it all at once. Polyglot 08:56, 11 Mar 2005 (UTC)
That is most likely so with the links on the more recent material, but we should be ready for surprises, especially in the endless index pages for various languages. (It might also be a good time to rid ourselves of the "Romanica" and "Espresso" language pages completely. I've mentioned these before but could not take the deafening silence as a mandate to go ahead.) Eclecticology 00:59, 12 Mar 2005 (UTC)
I was using the random page function to see what would turn up. On our many CJK pages "Kanji" and other names for writing forms are written with a capital letter; that should be changed to lower case. Eclecticology 18:56, 12 Mar 2005 (UTC)
I'm sure Hippietrail can launch some regular expressions on the dumps to accomplish this. Polyglot 08:56, 11 Mar 2005 (UTC)

Yes, this can be done but I think my proposal makes more sense.

If he could do this it would give us an idea of the scope of the problem. Eclecticology 00:49, 12 Mar 2005 (UTC)
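To get an idea of the scope, a script only needs to pull the link targets out of each page's wikitext and count those that start with a capital letter. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the regular expression is a simplification of MediaWiki link syntax (it ignores templates and nested links), and reading the dump itself is left out.

```python
import re
from collections import Counter

# Simplified link pattern: matches [[target]], [[target|label]] and
# [[target#section|label]]; group 1 captures the target only.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def capitalized_link_targets(wikitext):
    """Return link targets whose first letter is uppercase.

    Links containing a colon (e.g. "Category:..." or interwiki "fr:...")
    are skipped because they are not dictionary entries.
    """
    targets = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if not target or ":" in target:
            continue
        if target[0].isupper():
            targets.append(target)
    return targets


def count_capitalized_links(pages):
    """Tally capitalized link targets across an iterable of page texts."""
    counts = Counter()
    for text in pages:
        counts.update(capitalized_link_targets(text))
    return counts
```

A link already written as [[word|Word]] is not reported, since its target is lowercase; `counts.most_common()` would list the capitalized targets that are linked most often.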

Proposal to use a bot to do a lot of the changing

When the en:wiktionary is changed, a bot CAN check the interwiki links. When the interwiki links are ALL lowercase, with the exception of pl and other Wiktionaries that still capitalise, the bot CAN decide that the page should be renamed and that the old redirect should be deleted.

I will ask Andre Engels to write this functionality. When we agree that this is a valid proposal, I ask that my RobotGMtest be given temporary admin status to execute it as I propose.

Congratulations on a brave decision. GerardM 06:21, 11 Mar 2005 (UTC)
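The decision rule in this proposal can be sketched as below. This is a hypothetical illustration, not the bot's actual code: `should_lowercase` and its parameters are made up for the example, and since only pl is named as still capitalizing, the `still_capitalizing` set is an assumption to be completed. Moving the page and deleting the old redirect are not shown.

```python
def should_lowercase(title, interwikis, still_capitalizing=frozenset({"pl"})):
    """Decide whether an en: entry should be moved to a lowercase title.

    interwikis maps a language code to the linked title on that Wiktionary,
    e.g. {"fr": "eau", "pl": "Eau"}. Links from Wiktionaries that still force
    a capital first letter carry no information and are ignored; the default
    set ({"pl"}) is an assumption, not a complete list.
    """
    if not title or not title[0].isupper():
        return False  # already lowercase, or nothing to move
    lowered = title[0].lower() + title[1:]
    informative = [target for lang, target in interwikis.items()
                   if lang not in still_capitalizing]
    if not informative:
        return False  # no evidence either way; leave it for a human
    # Rename only if every informative link uses the lowercase spelling.
    return all(target == lowered for target in informative)
```

Mixed evidence, such as de:Kind next to fr:kind, returns False, which leaves exactly the pages that may need splitting to be handled by hand.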

As far as I'm concerned, if people want to use a bot to help out, they can. Checking the other Wiktionaries' capitalization probably helps, but those other Wiktionaries will often not contain the terms, so its usefulness is relatively limited. I think it makes more sense to check "What links here" and look at the capitalization of those links. I know I have been paying attention to whether they were correct or not.
What matters is that the person who is changing an article checks the result of his/her changes and that they comply with the standards we set for our entries. Polyglot 08:56, 11 Mar 2005 (UTC)
Have a look at the number of edits by User:RobotGMwikt: almost 13,000 edits :), and many words have interwiki links. It will not do ALL words, but it will lighten the load. GerardM 21:23, 11 Mar 2005 (UTC)
I would guess that the lower cased links are mostly correct since that is the usual default for English words. That should give us a good starting point. Then
  1. Generate a list of all capitalized links.
  2. Clean those up manually where needed. Temporary category markers could be placed at this time on those pages that will need to be split; this may reduce the duplication of efforts.
  3. Depending how long the clean up takes, the list may need to be generated a second time.
  4. Toggle the switch, and run a bot that changes the title for all pages that do not have any capitalized links. This is the key step. Will we need to be off-line while it is running?
  5. Manually create split articles for words that need it.
Eclecticology 00:50, 12 Mar 2005 (UTC)
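Step 4 amounts to splitting the page titles into automatic moves and pages left for a human. A minimal sketch, assuming the list from steps 1 to 3 is available as a set of link targets still written with a capital first letter; `plan_moves` is a hypothetical name, and no pages are actually moved.

```python
def plan_moves(titles, capitalized_targets):
    """Split page titles into automatic moves and pages needing review.

    titles: current page titles (first letter forced to uppercase under the
    old setting).
    capitalized_targets: link targets still written with a capital first
    letter after the manual clean-up.
    Returns (moves, review): moves maps old title -> lowercase-first title,
    review lists titles that still receive capitalized links.
    """
    moves, review = {}, []
    for title in titles:
        if not title or not title[0].isupper():
            continue  # nothing to change, e.g. digits, symbols or CJK
        if title in capitalized_targets:
            review.append(title)  # still linked capitalized: a human decides
        else:
            moves[title] = title[0].lower() + title[1:]
    return moves, review
```

Pages that land in `review` are the candidates for step 5, where split articles are created manually.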

Some articles will need to be divided

There's no particular immediacy to this. It can probably be done at people's leisure.

Funnily enough it's the thing I would tackle first, but I probably have my priorities mixed up :-) Polyglot 08:56, 11 Mar 2005 (UTC)
Sounds more like practicalities than priorities. It's impossible for it to work until after the switch is made. Eclecticology 01:03, 12 Mar 2005 (UTC)

Do we include cross references when a word has articles for both forms?

I think it is very important to do this. As far as I can remember, the most valid argument against the switch was that this might not get done. Polyglot 08:56, 11 Mar 2005 (UTC)

I do not see this as a problem; one word can be French and the other Italian, so why should they be cross-referenced? They are separate words. If you look at them alphabetically within a language, a different manner of sorting would do the trick. GerardM 22:00, 11 Mar 2005 (UTC)
You're perhaps right that in many cases it will not be necessary, but the English month names March, May and August have uncapitalized counterparts with different meanings, and Polyglot's observations about the previous vote are significant too. Doing this right will give the kind of extra touch that makes Wiktionary more user-friendly. The cross-references can probably be created at the same time that we are doing article splits. Eclecticology 01:15, 12 Mar 2005 (UTC)

How should the "Go" and "Search" functions perform?

Search should find both forms and let the user decide where he wants to go. Go could be implemented to go to the existing entry, regardless of the case the user typed in, if only one version exists. Or it could propose an intermediary screen if both capitalizations exist. Polyglot 08:56, 11 Mar 2005 (UTC)
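Polyglot's proposal for "Go" could work from an index that groups titles differing only in case. A rough sketch: `build_case_index` and `go_proposed` are hypothetical names, and the returned pairs merely stand in for the three outcomes (go to the page, show an intermediary screen, run a search).

```python
from collections import defaultdict


def build_case_index(titles):
    """Map each case-folded title to every existing title with that folding."""
    index = defaultdict(list)
    for title in titles:
        index[title.casefold()].append(title)
    return index


def go_proposed(query, index):
    """Resolve "Go" as proposed; returns an (action, payload) pair."""
    matches = index.get(query.casefold(), [])
    if len(matches) == 1:
        return ("go", matches[0])           # only one capitalization exists
    if matches:
        return ("choose", sorted(matches))  # intermediary screen
    return ("search", query)                # fall back to the full search
```

For example, with both kind and Kind present, typing "KIND" would offer both pages, while "Water" would go straight to water.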

It would be helpful to know what the wiki* built-in options are, in this regard. --Connel MacKenzie 17:07, 11 Mar 2005 (UTC)
That is, in my experience, having a checkbox that says "Match case" has always been the most intuitive user interface for handling this. Is this option available? --Connel MacKenzie 19:02, 11 Mar 2005 (UTC)
According to the developers, the "Go" search is already case-sensitive. On MediaWiki-L a while back, Brion Vibber wrote: "Since the fields are not case-insensitive, 'go' can't do a case-insensitive exact match check. Instead, it checks for several variants (exact, exact all lowercase, exact all uppercase, etc). On a multi-word title such as a name where the second word is capitalized, typing in all lowercase won't get a 'go' match and the search will be engaged instead." I know from experience that if several case variants exist, it will simply go to the first one it finds. —Muke Tever 17:11, 15 Mar 2005 (UTC)
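The behaviour Brion describes can be approximated as follows, which also shows why a multi-word title like "New York" typed in lowercase falls through to the search. This sketch checks only the three variants he names explicitly; his "etc." means the real list is longer, and it is not reproduced here. `go_current` is a hypothetical name, and the set membership test stands in for the database lookup.

```python
def go_variants(query):
    """Candidate titles in checking order (only the variants Brion named)."""
    candidates = [query, query.lower(), query.upper()]
    # dict.fromkeys drops duplicates while keeping the checking order.
    return list(dict.fromkeys(candidates))


def go_current(query, titles):
    """Return ("go", title) for the first existing variant, else ("search", query)."""
    for candidate in go_variants(query):
        if candidate in titles:
            return ("go", candidate)  # first variant found wins
    return ("search", query)
```

With kind and KIND both present, typing "Kind" lands on kind simply because the lowercase variant is checked first, matching the "first one it finds" behaviour.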

I think that Polyglot's suggestion is sound. Connel does bring up an interesting issue, which could be rephrased to state that the sophistication and extent of the search function have not kept up with the enormous scope of the Wikimedia family of projects. I know that our developers are overworked, so I can't blame them. Nevertheless, advanced search features would be an asset in many areas. Clearly documented Boolean operators, searching only titles or only categories, case matching and partial-word search could all be very useful. Title-only searches should go much faster than full-text searches. Eclecticology 01:31, 12 Mar 2005 (UTC)

I wasn't trying to place blame; I'm grateful for the work the devs have done to date! I was just tossing out an idea that I found useful in the past. A "match case" or "exact case" checkbox, whatever it is called, comes in handy: a search for my name, "MacKenzie", should still find it if you misspell it as "Mackenzie". Sometimes you want that to work; other times you'd rather have an exact match.
I very much agree that Wiki* searching can be refined tremendously, to match the enormous scope, as you said. I'm curious what you mean by "searching only titles" - do you mean article titles, headwords also, or the line of text at headword(s)+1? --Connel MacKenzie 05:10, 16 Mar 2005 (UTC)
I was thinking of article titles, to answer the simple question "Do we have an article on this?" That would be a much faster search than one that looks at everything. Your other suggestions represent other possibilities that could be accommodated. Eclecticology 09:17, 16 Mar 2005 (UTC)

Existing templates

Some of our articles include templates and links in templates. How these will interact with the capitalization change seems uncertain for now. Eclecticology 01:31, 12 Mar 2005 (UTC)

There are fewer than 1,000 templates, though. I expect any vagaries we find with these can be fixed manually in short order: hours, not days. --Connel MacKenzie 05:15, 16 Mar 2005 (UTC)

Date - April 1 2005 at the latest

The decision has been made; it is reasonable to discuss the implications before the change is made. However, our community has a tradition of talking on and on. I propose that we make the change on April 1st at the latest. If we have not decided earlier how to do things, it will simply happen on that day. GerardM 06:31, 12 Mar 2005 (UTC)

It took long enough to get to this point. Let's not spoil things by setting artificial deadlines that would only rush things. Getting it done right is far more important than getting it done quickly. Eclecticology 18:52, 12 Mar 2005 (UTC)
OK, how about May 1st then? In my opinion, having an arbitrary deadline is a very good idea in this case. If two or more regular contributors complain, it can be moved back further. But I agree with the sentiment that the sooner the change is made, the sooner the (unforeseen) mess it causes can be cleaned up. --Connel MacKenzie 04:59, 16 Mar 2005 (UTC)
Certainly I would hope that we would be done by then, even without setting a deadline. What we need now is someone technically savvy who can run the scripts to give us the information we need to carry on. Eclecticology 09:22, 16 Mar 2005 (UTC)

Although I was critical of Gerard's proposed deadline, this does not mean that I consider doing nothing as a valid option. There are things that I would like to do, but I unfortunately consider myself to be a technical zero, and the one person who has shown himself most capable to do the preliminary analysis is also the one regular contributor who has taken the strongest stand against this proposal. So unless somebody is both willing and able to do the technical work needed before the conversion, this ain't gonna happen. Eclecticology 20:55, 24 Mar 2005 (UTC)

Reservations

Now that I understand 1) all German nouns have the first letter capitalized, 2) there are a LOT of German nouns that have a corresponding word in English, I seem to be having the same reservations as HT. --Connel MacKenzie 22:58, 28 Mar 2005 (UTC)
Yes, all German nouns are capitalized, but I take that as an argument for distinct articles, and the German Wiktionarians are quite happy to have the ability to make the distinction. Many dictionaries use lower case for a word entry unless there is some special reason to capitalize the word. And what do you do with abbreviations where capitalization can make a difference? There are also special cases like "pH" and "eBay" where a first capital letter looks strange. We really do need the flexibility, though of course appropriate cross-references will be essential. Eclecticology 03:35, 29 Mar 2005 (UTC)
Having read the extended argument history, I get a sense that perhaps ALL database entries should be lower case (specifying what the title should appear as within the article, not automatically), and all cases/forms combined into one article (under separate headings). That way, someone new to Wiktionary looking something up for the first time may actually find what they are looking for. Having the search default to the (often undesired) German word seems like a recipe for disaster; that makes Wiktionary less usable, not more usable.
OTOH, it does seem that case sensitivity is ultimately the "more correct" thing to do. Having case sensitivity turned off for only the first character is inconsistent. But without some way to exclude non-English words by default, or a better way to return a group of case-insensitive search results, that darned "Go" button will become less useful than it is even now.
I can't think of a way to make a change of this magnitude work without a software enhancement of some sort. It would be nice to have other capitalization variations listed automatically at the top of an article. Without that, the likelihood of newcomers entering the English definition of kind on the German page (again and again) seems kind of high.
One thing that was not clear from the earlier discussions: the find algorithm presents the first page found. Does this mean the oldest entry? The most recently modified entry? Or does it really always default to the uppercase "match" first?
--Connel MacKenzie 04:10, 29 Mar 2005 (UTC)
On la:, as an experiment, I modified it (for those using Monobook) so that the title appears within the article, not automatically (with two templates to keep the format uniform: one to use the 'default' page title, la:Template:caput, and one to use a customized one, la:Template:caput2). (This was one of the suggestions for alternate solutions when the issue first came up; see User talk:Muke#Decapitalisation issue.) This allows for:
  • special page titles, such as on Z ("Z, or z")
  • language marking, such as on 海馬 (the title is specifically marked as Japanese, and a browser such as Mozilla will use the user's specified Japanese kanji font instead of a Chinese hanzi font for it, and the reverse is also possible; hopefully in the future capabilities like this will be extended in browsers for other languages with script variants, such as Arabic-script–using languages and Cyrillic vs. Old Church Slavonic)
  • pages where MediaWiki is unable to render the proper title at all (e.g., , which has no Unicode uppercase--has Wiktionary tried to make an entry for C++ yet?).
  • words which are never to be capitalized at all, e.g. quy'Ip
Best of all it doesn't fiddle with index listings such as categories and special:allpages, or titles in other namespaces than the main article one. —Muke Tever 17:50, 29 Mar 2005 (UTC)
The vote was essentially about the principle of getting rid of forced first-letter capitalization. If you look at the explanations for the no votes there, you will see that some of the objections were directed at how this would happen, or at whether the vote was intended to apply across all Wiktionaries. Events have shown that the second point is no longer an issue. The first, about how to implement it, was treated this time as a separate issue. Once the vote passed we could go on to implementation discussions, and that's what this page is all about. The issue of how the "go" function should work is mentioned further up on this page. Eclecticology 06:48, 29 Mar 2005 (UTC)