Wiktionary:Beer parlour/2004/October-December

From Wiktionary, the free dictionary
This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Open Letter to Eclecticology

Like all Wikimedia, Wiktionary is an advertisement for the Wiki process and the ideal that a dynamic and decentralized group of contributors can produce a result every bit as good as, if not better than, that of a closed group working under the supervision of an editor. A Wiki, the theory goes, is not subject to the biases and caprices of a single person, but is pluralistic, admitting more than one view of what sort of contribution is appropriate or useful. It is by nature inclusive.

Making this work requires restraint all around. Speaking for myself, I am finding my restraint steadily fraying as what should be an easy and pleasant experience degenerates into an exercise in tedium. I don't like this, and I am reaching the conclusion that the best remedy for me may be to take a deep breath and at least a brief break.

Before I go, though, I would like to explain what has brought me to this point. In brief, I have found myself spending entirely too much time in counterproductive effort centered on the RFD page. As far as I can tell, this page should mainly be devoted to obviously malformed or vandalistic entries which require a sysop's intervention to delete. Even then, this tool should be used with restraint. For example, You are an idiot! had been skillfully co-opted as a page of translations, thus blunting its effectiveness as a prank and even adding moderately useful information in the process. Actually removing it led to a round of editing skirmishes and its current enshrinement as an advertisement that Wiktionary is easily pranked.

Instead, it seems RFD has become a breeding ground for Scrabble ®-style "challenges". If an entry is well-formed but doesn't suit someone's notion of a proper word, up it comes on RFD. Some of these challenges are legitimate: no one has heard the term, there are no likely looking hits in BNC, Google or other sources, and so forth. But as far as I'm concerned, anyone adding an entry to RFD has the responsibility of doing a bit of homework first. "I don't like it" is no justification for RFD, and "I haven't heard of it" even less so.

Further, as soon as someone with no manifest axe to grind is willing to stand up and defend an entry, that should be the end of the RFD. Sure, it's good to have quotations, and it's always good to critique the assumptions an article may be making, but there is already a venue for this, namely the Talk: namespace. RFD should be a last resort, not a first.

Far too many times have I contributed an article, or polished up someone else's garbled attempt, only to find that, no, that doesn't seem right to me, can't possibly be a word, RFD. I then take about two minutes to hunt up suitable evidence on the web, and another few minutes writing it up, and then maybe a couple more rounds of this until the discussion dies down.

I can think of quite a few times this has happened, but few or none when such a term has been legitimately removed. Often an existing entry has been strengthened from the additional evidence, but again there is a forum for this, and one of Wikimedia's main tenets is that a partial first attempt is just fine — articles need not be born fully formed from the head of Zeus.

The proverbial straw on the camel's back is pr0n. Originally this was thought to be a random misspelling, but it should be abundantly clear by now that it's not. This spelling, and this spelling particularly, is widely used for a desired effect. Maybe it's not a proper English word (I'd argue that it is), but so what? Wiktionary is vigorously multilingual. Surely there's a place for it somewhere.

The history of this entry is instructive. Someone created the article, it was soon RFD'd, and a couple of us explained why it could be considered legitimate. You then said you weren't convinced and deleted it.

This is not the Wiki process as I understand it.

The presumption as I understand it is that, absent some compelling reason to the contrary, a well-formed and properly NPOV entry should stand, and even an ill-formed or POV entry should be amended, not deleted. Someone placing an article on RFD does not change this. It does not create a presumption that the entry is bad and that it will be removed if not defended. It especially does not create the presumption that the entry will be removed unless some particular jury of sysops — I use the plural generously — is convinced it should stay.

My advice: Please lighten up. Neither English nor Wiktionary is in serious danger of corruption from the presence of entries like stunod and monkeys humping a football, or from putative translations into Romantica or Ekspresso or whatever they are. The effort spent contesting these issues would be much better spent coordinating efforts and filling in the many remaining gaps in the lexicon. Filling them in, that is, with modern, readable, community-contributed definitions; but that's a separate source of exasperation.

All frustration and irony aside, I actually believe that additory and re-search have every bit as much right to be in Wiktionary as pr0n. To my mind, neither ever had any business showing up on RFD. I apologize, but not abjectly, for losing patience and putting the issue in such a backhanded way. Frankly, I'm more than tired of the whole flap. I would just like to be able to make contributions in peace without having to contest every one that doesn't suit your sensibilities. I certainly don't mind the occasional call for better supporting documentation on the Talk: page. That's normal and healthy. But I would like to know the entry isn't going to go away entirely, and my work with it, if I don't step up to the plate on RFD. And especially even if I do. -dmh 20:36, 15 Nov 2004 (UTC)

I have no problem with the philosophy of decentralization and inclusivity that characterizes a wiki, and the value of the product that can be derived therefrom. Thus far, Wiktionary has indeed followed a pluralistic approach that allows a wide diversity of contributions, and I certainly have tried to avoid the worst of Wikipedia's VfD offences. Visions of just what a dictionary does are bound to clash, and the validity of some entries is bound to be challenged, often vigorously; that too is a healthy part of the process.
I don't believe that the RfD page has been overused, and I don't think that it needs to be limited to the obvious vandalisms and malformations. There are candidates there that do qualify for immediate deletion, and others that require discussion. In either case, if the item is deleted there still needs to be room for discussion. There are many articles mentioned there that are a pure product of some individual's fantasy, or which reflect an attempt to promote some idiosyncratic or local application of a word. The former entry on Jamesing was such an example that reflected certain behaviours associated with a local individual. That kind of word formation is common, but its application is ephemeral. A James in another community would likely be associated with a completely different activity.
"I don't like it," is certainly not a valid excuse for seeking to have something deleted. But not having heard of something fares a little better. Asking, "Where does this come from?" is perfectly valid. It's consistent with the principle of verifiability. If no-one can produce an answer to that question a decision needs to be made. Many of these questionable items are from anonymous IPs, and could as easily be from someone trying to game the system. We should always be prepared to authenticate our contributions.
The simple fact alone that someone may be willing to defend an article is not reason enough to keep it. That decision should be based on the entire discussion, and the result of these discussions can go either way. Confining this kind of discussion to the article's talk page virtually guarantees that the issue won't be noticed, and that the article will be saved by default.
That you should be willing to spend the few minutes seeking evidence for a word's existence is commendable. I certainly hope that more people will be inspired to do the same thing. If, as you say, this process most often results in the article being saved that only proves that the system is working. The issue is not about partial first attempts; if there is any evidence that the word is legitimate it will likely survive, even if it is short. This project differs from Wikipedia in that there is no general hostility to stubs. An article on children that said nothing more than that it is the plural of child would be a perfectly acceptable beginning.
The issue of pr0n is really the legitimacy of the status of any kind of leet entry. They are not words; they are wilful typographical distortions of words. From what I can see, any English word can be transformed into a leet counterpart. Like pig latin, they are the product of infantile word games intended to mask their meaning from the uninitiated. Fortunately nobody is trying to promote pig latin. As with pig latin, formulas can be applied to English words to produce new leet words.
Being poorly formatted is not alone a criterion for deleting an article, and NPOV issues are just not a big problem in this project. Verifiability and the legitimacy of a word are. Remember that a lot of these entries are from anon IPs who aren't around to answer the questions that have been raised.
The problem with Romanica and Espresso is they are not legitimate languages; they are at best dialects of yet another artificial language, Interlingua. Contesting these issues is an integral part of the wiki process. The fact that a regular contributor is upset when some of these articles are contested is no reason to stop contesting them.
On the whole, no-one is stopping you from adding articles in peace. Those that are challenged are a fraction of what you have contributed. And only a part of these end up deleted. It's not just my sensibilities that are involved; there are others who put far more things on RfD than I. Eclecticology 09:43, 16 Nov 2004 (UTC)
I will simply quote from Wiktionary:Page deletion guidelines
In case of serious uncertainty about an article it is usually better to give the benefit of the doubt to keeping the article. This has a positive effect on the overall communal mental health.
And from the "Notes to administrators" on the same page:
If you are uncertain about whether a page should be deleted, it is safer to leave it in place.
Am I correct in inferring from the page history that you wrote these? -dmh 16:27, 16 Nov 2004 (UTC)
Yes, I did write these. I still agree with myself. The problem now is what constitutes uncertainty? :-) Eclecticology 18:50, 16 Nov 2004 (UTC)
That's to be decided by the community. Your call as a sysop is not "Do I (with my contributor's hat on) feel certain that this term should be deleted?" but "Is there a clear consensus that this term should be deleted?" Naturally, in cases where the contributor is clearly associated with the term (e.g., vanity terms, vandalism and joke/test entries) the contributor is automatically recused, and in these cases there is nearly always a clear consensus in the rest of the community.
This is why I say that once someone clearly without an axe to grind steps up for a term, the presumption is that the term should be kept. Naturally, the advocate may withdraw support either spontaneously or after discussion (this has happened — I've certainly withdrawn support on occasion). Until then the article should stand. Naturally, an anonymous advocate may be presumed to have an axe to grind. Conversely, a regular contributor willing to publicly defend a term should be presumed impartial unless proven otherwise.
In short, what I would like to hear, but haven't heard yet, is that deleting pr0n in the face of clear public objection was overstepping. Those who spoke up for the article would have to have been convinced to withdraw support before it could be deleted. They were not. It is not anyone's task to convince anyone to keep an article. -dmh 06:17, 17 Nov 2004 (UTC)
There is no reason to suppose that a regular contributor has any less of an "axe to grind" than an anonymous contributor. To presume that an anonymous contributor has an "axe to grind" attacks the fundamental right of a Wiki contributor to anonymity. The disadvantage that an anonymous contributor puts himself into lies in his being unable to be contacted when his contribution is questioned. Even if a significantly higher proportion of indefensible articles come from these anonymous contributors, it does not warrant the presumption that they have an "axe to grind". Similarly there is no automatic presumption that a regular contributor is impartial; when he stands up to defend an article he most certainly is partial. Ultimately, the content should be the deciding factor in what is kept, not the status of the contributors, opponents or other participating editors.
Your position is tantamount to saying that a regular contributor has the right to have his articles retained no matter what they contain. His word is law without any need to verify his contributions; his inventive misunderstandings of words must be presumed accurate. I don't think so. There is no broad consensus about leet words. This debate is really more about leet words generally than about the particular one, pr0n. Once you have convinced the developers to establish a sub-domain for a leet pedia on an equal footing with Klingon, it may be easier to sell your typographical monstrosity to the rest of us. Eclecticology 11:04, 17 Nov 2004 (UTC)
Sorry, I didn't mean to use "axe to grind" so broadly. I was referring to self-promotion and such. For my part, I don't stand to gain in advertising for my business or philosophical movement if (say) pr0n is rightfully reinstated. I do of course have a general bias towards including terms which are demonstrably used in running text and are included in other dictionaries, particularly a carefully-researched one such as The Hacker's Dictionary, but I don't have any particular bias toward pr0n.
Conversely, the anonymous user who has just posted Blargology with a philosophical tract as "definition" and a link to blargology.com can be presumed to have an interest in promoting Blargology. You're quite right in pointing out that the content should be more of a factor than the anonymity, but really there are plenty of cases like this in RFD. Anonymous entries which smell of self-promotion tend to get quickly gunned on the assumption that the particular anon user is up to no good. There is, of course, no shortage of useful entries initiated by anons. The formatting and style tend to leave something to be desired, but the headwords are often good. I wasn't talking about those.
But what are the main issues here?
  • Unilaterally deleting an article in the face of specific and reasoned arguments in its favor is overstepping.
  • Typographical ugliness is no reason for excluding a term (or should we remove A4 and refuse to admit TelePromTer and pH?)
  • Membership in a suspect class is no reason for excluding a particular term (see the "Clean well-water act" below). If I haven't made it completely pellucid yet, I do not support inclusion of all L33T words.
  • "I haven't heard of it" is only worth considering if the person claiming this has done basic homework first and still hasn't heard of it.
Have you actually consulted THD, slashdot or anything at all to support your position? From what I've seen, I tend to doubt it. As far as I can tell, you simply have some deep-seated conviction that a three-letter-one-number word for pornography is an offense to the nostrils of some lexicographic god and so must be smitten down. That's fine, but not relevant to the discussion. -dmh 16:08, 17 Nov 2004 (UTC)

Do I have to register in each wiktionary?

I tried to log in to the Spanish wiktionary and my user id was not recognized. Do I really need to register in each wiktionary I want to contribute? --Opa 02:00, 16 Nov 2004 (UTC)

'fraid so (this is especially annoying if you often change computers and have to login again). I wish there was a common account for all wikimedia. That could also prevent name conflicts (two users registering the same name on two wikis), though I haven't heard of them yet. Flammifer 09:20, 16 Nov 2004 (UTC)
There are currently discussions going on in Meta about single log-ins. It is the thing most requested of the developers. The difficulty in implementing it relates to how we would resolve existing name use conflicts. Eclecticology 09:47, 16 Nov 2004 (UTC)

The clean well-water act of 2004

From time to time the argument is raised that a particular entry shouldn't be included in Wiktionary because it belongs to some class that clearly should not be included in Wiktionary wholesale. I realize that this is technically not a case of "poisoning the well", but it's certainly similar in form. In any case, it's clearly fallacious. Some cases in which this reasoning has been applied, but ultimately rejected, include

  • misunderestimate Certainly we don't want to include all random productions of the presidential Markov chain, but this particular one has been used independently in print, as it neatly expresses a particular thought.
  • develope We shouldn't include every possible mis-spelling of every word, but this one is much more common than chance and has a historical precedent.
  • monkeys humping a football Anyone can make up a cute image, but this particular one has caught on and been applied independently by various parties far, far too often to be explained by chance.
  • w00t, which unlike almost all other L33T-isms has passed into general usage (and so should be marked as English — see the discussion page). pr0n would be here too if not for a pissing match currently in progress.

For that matter, do we mark, say, all German or Italian words as English? No, but blitz, presto and many others have been borrowed into English quite comfortably.

Pig-Latin looks to be the next battleground. It would be silly to include Pig-Latin versions of every English word, but ixnay, amscray and possibly a few others are understood and used idiomatically on their own.

In short, the useful question to ask in deciding whether to include an entry is not "where does it come from?" (that's for the etymology), but "how is it used?" -dmh 15:33, 17 Nov 2004 (UTC)

All words in all languages in all wiktionaries?!?

I am extremely confused here. Currently all wiktionaries in all languages try to have all words in all other existing languages. What is the rationale behind that? That just makes it very hard to find any information, and you have tons of different versions of the same word to keep updated.

For example: the Swedish Wiktionary has the Swedish word kärlek. It translates this to English as love, and links to the English word love, but on the Swedish site. There is almost no information there, but of course, on the English site, the word love has loads of information. It also has translations to many other languages, including the one to Swedish, which of course links to the Swedish word kärlek on the English Wiktionary, which does not exist. But it exists on the Swedish Wiktionary, so why not just link there? Anybody interested in the Swedish translation of the word would probably be much happier with some information in Swedish than none at all.

And the word Korean is correctly translated to Swedish as koreanska and korean. The first, female form of course does not exist in the English dictionary. But it does in the Swedish: http://sv.wiktionary.org/wiki/Koreanska. With loads of grammar and info. The second, male form exists, but as an English word!

I just don't get it. This makes absolutely no sense at all. This must be the most immense attempt at a double effort in history. If a hundred languages all try to cover all other 100 languages we'll end up with 10.000 separate definitions of every word! Regebro 20:57, 21 Sep 2004 (UTC)

It's all very simple. The English Wiktionary is directed to an English speaking audience, and the Swedish one is for a Swedish speaking audience. The primary function of each is as a dictionary of its own language. The translations are an added bonus. If you feel that the Swedish should show both the masculine and feminine form, feel free to add the one that's missing.
The link from love to kärlek is, as you say, to something that does not exist. What would happen there when the article is written is that it would give a brief explanation of the Swedish word in English. It would also link to the Swedish Wiktionary, where there should be an even more thorough discussion of the word in Swedish. If the same word existed in a language other than Swedish that would also be there. The pages on the English Wiktionary for foreign language words would not include the links for all the translations to third languages.
I would hope that the Swedish Wiktionary would work in a similar way, but whether it in fact does depends on those working on that project.
Translation lists should never be taken as the last word on anything. Following your examples, for Korean the translation lists will work reasonably well, but with abstract words like love the results can be unpredictable. The connotations of the word "love" can vary considerably between one language and another. I feel that it's very important to avoid a naive approach to language. Eclecticology 23:15, 21 Sep 2004 (UTC)
I'm not taking a naive approach to language, but maybe a naive approach to humanity. ;) I just don't think it is actually possible to do it! There are thousands of words in even the smallest language. There are currently, I think, 176 languages in Wiktionary. We are talking HUNDREDS of MILLIONS of separate articles, and upwards of a million in each language. The result of this approach is not only immensely confusing (people will expect an English dictionary to have English words, not Japanese) but also severely limiting to the usability of Wiktionary, since of course most of these articles will never be written, and most translations will never be done. It is better to collect all information about one word in one place, not 176 different places.
And obviously, I am not saying that a one-to-one link should be done between one word and translations. As in any dictionary, there should be a list of words that are possible translations, or when that is not possible, why not simply use the whole explanation as translation? Nothing stops that.
The current approach still makes absolutely no sense to me. It seems to be the most impractical solution possible to a problem that never existed. :) Regebro 23:46, 21 Sep 2004 (UTC)
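The round numbers in this exchange can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 100 and 176 figures come from the posts above, while the 10,000 words per language is an assumed stand-in for "thousands of words in even the smallest language" and the function name is made up for this example.

```python
def total_entries(editions, languages_covered, words_per_language=1):
    """Entries needed if every edition covers every word of every language."""
    return editions * languages_covered * words_per_language


# "A hundred languages all try to cover all other 100 languages":
# one word ends up defined once per (edition, language) pair.
per_word = total_entries(100, 100)

# ASSUMPTION: 10,000 words per language, a deliberately small round number.
WORDS_PER_LANGUAGE = 10_000

# A single edition covering all 176 languages.
per_edition = total_entries(1, 176, WORDS_PER_LANGUAGE)

# All 176 editions covering all 176 languages.
overall = total_entries(176, 176, WORDS_PER_LANGUAGE)

print(f"definitions per word, 100 editions: {per_word:,}")
print(f"articles in one edition:            {per_edition:,}")
print(f"articles across all editions:       {overall:,}")
```

Under that assumption one edition needs about 1.76 million articles and the whole project roughly 310 million, which is consistent with "upwards of a million in each language" and "hundreds of millions" overall.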
You are spot on, one hundred percent correct. However, having 10.000 different definitions is no problem: they define a word in a language. The waste is in all the things that are the same in all languages, including the identification of the word type, the gender, the language a translation is in, the language the word is in etc.
This is why I propose to have ONE database with all words in all languages. The proposal is on META.
Consider however the dead wood dictionaries; I have an nl dictionary, a en-nl and a nl-en, a de-nl and a nl-de dictionary. The nl words are in all nl based dictionaries the same so having something to all other languages is not that radically different. GerardM 21:38, 21 Sep 2004 (UTC)
Ah, found it! http://meta.wikimedia.org/wiki/The_Ultimate_wikitonary Regebro 22:48, 21 Sep 2004 (UTC)
(the stuff below is a reaction to Regebro 23:46, 21 Sep 2004 (UTC); by posting it directly underneath it the other older threads were broken)
But it's not an English dictionary; It's an English-French, English-Chinese, English-Greek, English-etceterian dictionary. It is good that English speakers can look up Japanese words here.
It's true that most of the articles won't be written anytime soon in all languages (which is part of why interwiki linkage should be automatic between extant articles, to make them easier to find...) It may be better to collect all information about a word in one place, but then, it's not always practical. On en: we can say that foo is Bazian for bar, but on baz: the definition will have to be more extensive; further, there's no guarantee at all that the Baz translation will be a direct translation of the English definition of bar, nor even that on, say French, that the word that translates foo will be the French translation of bar. So translations and definitions (the meat of our articles) can't be so easily and honestly commonized this way.
Even things like parts of speech don't translate well. "Latin" in Latin (to give an example I'm familiar with) is what we in English would call an adjective (Latinus), but what the Romans would call a noun (nomen), the difference between what we call adjectives and nouns not amounting to a whole different part of speech. Indeed, the Latin word meaning "adjective" is itself an adjective; an adjective is a kind of noun, or nomen adiectivum. So there is really sort of a lot being translated; one might just as well ask why all the Wikipedias are being written in different languages.
The only things that can really be easily carried over are ancillary things relevant to the word itself and not its meaning or classification, such as the spelling/capitalization, the pronunciation, and the etymology. —Muke Tever 16:31, 22 Sep 2004 (UTC)
Etymology cannot easily be carried over; it needs translation in every case. Pronunciation may be a better option certainly when sound is used.
The en:wiktionary is an en:everything else dictionary. All words in other languages exist if they are defined in English. In the totality of all wiktionaries, a word may be defined in en: and it should be defined in its native wiktionary. The meanings of a word should be the same for both en: and any other wiktionary that carries that word.
The question of why there is a need for all those "other" wiktionaries is simple: it has to do with what language you are comfortable with. The same question asked of the Wikipedias is answered the same way, but it also depends on what culture you are comfortable with. What is said to be NPOV in an English setting is not necessarily NPOV in another language/culture.
The "ancillary" things make the glue that binds all these wiktionaries together. By sharing the things that are the same in a database, a new meaning defined for an existing word will open a space to translate that word or definition in any other language. The result is that all these different translations/definitions will find their place eventually. When in the en:wiktionary someone defines some Dutch word as a translation for an English word, would it not make sense to have it seen by all people interested in Dutch? Would you not want to have it corrected by Dutch speakers when it is subtly wrong? GerardM 10:55, 4 Oct 2004 (UTC)
I think this issue should be approached from the perspective of: is redundant work being done, and is there a way to limit it? For instance, the translation table for car should be on the Polish car definition pretty much as is. Translation tables can't just be copied automatically from language to language, but if there was some way to indicate which parts were language-independent, perhaps they could.--Eean 20:54, 11 Nov 2004 (UTC)
Seems to me that the difficulty is in finding the right tradeoff between avoiding redundant work, and having something that's fairly easy to understand and edit - the current setup seems fairly easy to understand, maybe some huge database would be more powerful, but would diminish casual contributions. (Well. Maybe not) Flammifer 09:11, 16 Nov 2004 (UTC)

Let's write translation with the links to proper languages, then problem of word multiplication is solved.

example: be

And what about writing transcription next to translation? Is it a good idea?

example: be

wikigs 16 Nov 2004

I don't think so. The link in the translation list should continue to be to the page in this Wiktionary. It's there that we can show the pronunciation, and have the link to the Wiktionary for that language. The importance of such pages in this Wiktionary is that they explain the term for the benefit of English speakers. A person with an extremely limited knowledge of the other language would be completely lost if he had to read the explanations in that language. Eclecticology 19:18, 16 Nov 2004 (UTC)
I think that we should link to the proper language and to this Wiktionary. How do you like this syntax, that is suggested in help for the German Wiktionary template. -- Notan Frag 20:18, 27 Nov 2004 (UTC)

SQL dump

Recently I have been downloading the SQL dump files and playing with Perl scripts to parse them. Tonight I've just got something basic but useful happening. I have two aims:

  1. Incorporate translation info from Wiktionary into my own GPL project, Linguaphile.
  2. Automatically tidy up or list articles with problems.

Well for number 1 I'm obviously on my own, but for number 2 that would mean a bot. It looks like nobody's used a bot here officially though the CJK characters a while back and many basic English definitions recently seem to have been bot-driven.

Before I just run off and do it, I'd like to find out if we have or should have a system as exists on Wikipedia. I'll set up an account for my bot and do initial small test-runs in batches of about 20 and then ask for feedback, reverting all changes if there's a negative reaction, proceeding with more if there's a positive reaction.

So where do I ask permission to run a bot here? Who do I ask to flag my bot's changes so they don't flood Recent changes?

And lastly, I would like to ask a favour. Since I only have internet access through work's Windows computers I would prefer not to install Perl etc here as my boss may not approve. Would anybody here be able to provide me an SSH account on a *nix box?

Apologies for any repeats from the above conversation but that seems to have gone in its own direction. — Hippietrail 13:08, 28 Oct 2004 (UTC)

Two questions: 1) What do you mean by "Automatically tidy up or list articles with problems"? and 2) What do you mean by "a system as exists on Wikipedia"? Eclecticology 22:49, 29 Oct 2004 (UTC)
  1. Automatic tidying/listing
    • Tidying:
      1. Removing empty sections, empty translations.
      2. Fixing heading levels.
      3. Putting SAMPA pronunciations inside <tt> tags.
      4. Changing "g" to "ɡ" within IPA pronunciation sections.
      5. Dewikifying common language names in translation sections.
      6. Relegating optional Arabic and Hebrew marks to pipelinks.
      7. Other good suggestions given to me.
    • Listing:
      1. Articles which have been deleted and come back.
      2. Articles which have two entries in the database with identical titles.
      3. Other interesting things I find whilst poking through the SQL dumps.
  2. System as exists on Wikipedia
    On Wikipedia there are set rules for how to set up a bot. I'm sorry I'll have to find the exact article. It might be on MetaWiki. Basically I believe you must have a separate account just for your bot. You must list your bot on a certain page. Maybe you have to state its goals and get feedback or permission. If it's going to make lots of changes there is a person or persons who can set that account as a bot account which flags its changes so that they do not clutter the Recent Changes.
Hippietrail 12:53, 30 Oct 2004 (UTC)
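Three of the tidying items above (empty sections, SAMPA inside <tt> tags, and "g" to "ɡ" in IPA) can be sketched roughly as follows. This is not the bot's actual code, which the thread says was written in Perl; Python is used here only for illustration. The function names are invented, and the assumed entry layout (wiki headings written with "=" signs, pronunciations written between slashes on lines labelled "IPA" or "SAMPA") is a guess at the page format of the time.

```python
import re

# ASSUMPTION: headings look like "==English==" or "=== Noun ===".
HEADING = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")
# ASSUMPTION: transcriptions are written between slashes, e.g. /gIv/.
TRANSCRIPTION = re.compile(r"/[^/\n]+/")


def heading_level(line):
    """Return the heading depth of a line, or 0 if it is not a heading."""
    match = HEADING.match(line)
    return len(match.group(1)) if match else 0


def remove_empty_sections(wikitext):
    """Drop headings followed only by blank lines before the next heading
    of the same or a higher level (or the end of the text). Single pass:
    a parent left empty by this step is not removed."""
    lines = wikitext.split("\n")
    kept = []
    for i, line in enumerate(lines):
        level = heading_level(line)
        if level:
            following = next((l for l in lines[i + 1:] if l.strip()), None)
            if following is None or 0 < heading_level(following) <= level:
                continue  # nothing under this heading, so skip it
        kept.append(line)
    return "\n".join(kept)


def fix_ipa_g(line):
    """Replace ASCII 'g' with IPA 'ɡ' (U+0261) inside /.../ on IPA lines."""
    if "IPA" not in line:
        return line
    return TRANSCRIPTION.sub(
        lambda m: m.group(0).replace("g", "\u0261"), line)


def wrap_sampa(line):
    """Put the /.../ transcription on a SAMPA line inside <tt> tags."""
    if "SAMPA" not in line or "<tt>" in line:
        return line  # not a SAMPA line, or already wrapped
    return TRANSCRIPTION.sub(lambda m: "<tt>" + m.group(0) + "</tt>", line)


sample = "\n".join([
    "==English==",
    "===Pronunciation===",
    "*IPA: /gɪv/",
    "*SAMPA: /gIv/",
    "===Translations===",
    "",
    "===Synonyms===",
    "*[[donate]]",
])
tidied = remove_empty_sections(sample)
tidied = "\n".join(wrap_sampa(fix_ipa_g(l)) for l in tidied.split("\n"))
print(tidied)
```

Each step is a separate function so that a change the community objects to (removing all empty sections, for instance) could be left out without affecting the others.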
Thanks for answering. I very much agree with the second part. Developing our own set of rules would be a very good thing. I've already lamented that we don't have any way to approach these.
Other good suggestions or interesting things would each need to be judged on their own merits; I presume that as long as they aren't identified there are probably few enough to allow them to be maintained manually. Are the other two items that much of a problem? The very few that keep coming back as common vandalism can probably already be watched through RfD. Two entries with identical titles sounds like a software bug unless you're thinking of capitalization variants.
What I'm planning to do is to have people ask for things on User:HippieBot's talk page. Other people can comment on them there. Then I'll try to work out a script to list the affected pages. Then I'll post the resulting list on a subpage of the bot's page. People will be able to check that the results are satisfactory and to follow the links to fix the offending pages manually.
At the moment I'm concentrating on the parsing so I'll be able to find the problems. I haven't made the actual bot part yet which means auto-submitting changes. That's a 2nd step but very useful when people have decided changes should go through but affect a large number of entries. — Hippietrail 03:32, 3 Nov 2004 (UTC)
I looked at your lists in detail, and even fixed a few of the problems that they discovered. Very good!!! A bot that merely searches for problems and dumps the result in a file for people to look at is seldom a problem unless it manages to grind the whole system to a halt. Eclecticology 21:47, 3 Nov 2004 (UTC)
I don't know if we should remove all empty sections, though many are pretty useless. I agree with removing empty translations. I think that heading levels will vary from one article to another, so I'll reserve opinion until I've seen how it works. I have no problems with either point that you make about pronunciations. I also agree with dewikifying the language names; whenever I've tried this manually in a long list the job is pretty tedious. I am not sufficiently familiar with the Arabic/Hebrew issue to make a comment. Eclecticology 19:30, 31 Oct 2004 (UTC)
I'll make some new sections below to ask opinions on empty sections and heading levels though some of these have been discussed and agreed upon on various places. — Hippietrail 03:32, 3 Nov 2004 (UTC)
Hi Hippietrail,
Your bot-embryo seems to be doing some useful stuff! Congratulations. To ask for bot-status for your Hippiebot account, I think you will have to talk to Andre Engels. He will probably want you to get consensus from the contributors over here before he will want to assign the status. So maybe a vote is in order. We might even need a new page to do that kind of voting...
I think the approach you are taking to prove the usefulness of your bot is a good one. I was already editing some of the entries of your list as well.
As for the UNIX account. Wouldn't it be possible to set up a Linux box under your desk? Installing ActivePerl on Windows is not a big deal, but I can understand your boss has a policy forbidding the installation of additional software. What you could also do is set up a box at home that you can access through ssh. To solve the dynamic IP address problem, you could use dyndns.org. Another way to get access to a shell is to enroll at a university. I know the university in my city provides it. Polyglot 02:42, 4 Nov 2004 (UTC)

Categories on the nl:wiktionary

Following Muke Tever's suggestion, I have implemented categories on the nl:wiktionary. As a consequence, every time the {{-xx-}} templates indicate that an article has a section for a language, a template is inserted for that word.

Andre Engels was really helpful in running a bot over the nl:wiktionary that saved each record without changes; as a result, ALL articles can now be found in the categories. This works for most languages. Some languages that have no words yet do not have categories.

Thanks again to Muke and Andre, GerardM 08:03, 13 Oct 2004 (UTC)

Sound (OGG) files to show pronunciation

Hello, in the German Wiktionary we are considering using sound files (.ogg) to show how a word is pronounced. The problem is finding people who will speak those words into a microphone, record them, and cut them. In the German Wikipedia I have already found some people who would do so ("Project Spoken Wikipedia") for German words. I think words should be spoken by native speakers (maybe by a man and also by a woman, to hear it twice if necessary?). This means that every language Wiktionary has to create its own sound files. Other Wiktionaries could link to the corresponding file location (eo.wiktionary.org/upload/.../xyz.ogg). What do you think about that? Are there any volunteers who want to record some English words for the whole Wiktionary community, as some on de.wiktionary will do for German words? -- Have a nice day, Melancholie

It sounds like a great idea. I might record some words in Dutch. Polyglot 21:44, 13 Oct 2004 (UTC)

What about accents? I could record some, but my accent is nowhere near the standard pronunciation. --Vladisdead 03:55, 14 Oct 2004 (UTC)
To inform you of our content and intent, the nl:wiktionary has uploaded some 100 sound files (ogg) to Commons. It will not be long before we are able to link internally to the Commons data, probably in the next full release of the MediaWiki software. We have also worked on the explanations of the use of sound. When the nl: version is "ready" I will translate it to English and post it on META.
Oscar, of the nl: crowd, thinks my recordings are not technically advanced enough (he would, because he is a professional sound man) and promised that he will regularly pronounce some words and record them for this purpose. :) GerardM 05:20, 14 Oct 2004 (UTC)
The next version of the mediawiki software will allow for using Commons data in the projects :) GerardM 16:13, 28 Oct 2004 (UTC)

Vocabulary list

I'm wondering if the list found at http://www.freevocabulary.com/ is compatible with the GFDL. On a similar note, are there any existing dictionaries that are GFDL-compatible? Poccil 03:46, 19 Oct 2004 (UTC)

Princeton's WordNet, and 1913 Webster.
So, can we just copy glosses from WordNet to Wiktionary? I'm wondering, because we are missing many words that are found in WordNet, and it would work well to just copy them, assuming we could. Eh? JesseW 08:40, 5 Nov 2004 (UTC)

The "Morobashi" dictionary?

I am a first time user of the Wiktionary when I was checking out the pronunciation of a Chinese character via Google. I was brought to the following page in your dictionary:

http://en.wiktionary.org/wiki/%E9%9B%BA

Under "dictionary information" there is a reference to

"Morobashi : 42250"

I am assuming this is an error for the Dai KanWa Jiten edited by Morohashi Tetsuji. The reading Morobashi is incorrect and ONLY seems to appear in your dictionary. There needs to be a global change to this dictionary information entry.

Morobashi = Morohashi.

It's your dictionary too. I guess this would be a job for someone with a bot, if it's decided to be the case. --Eean 02:54, 23 Oct 2004 (UTC)
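For what a bot pass over this might look like, here is a minimal Python sketch. The function name `fix_morohashi` is hypothetical, and fetching and saving pages is left out entirely; it only shows the per-page text change and reports how many replacements were made, so that a page can be skipped when the count is zero.

```python
import re

def fix_morohashi(wikitext: str) -> tuple[str, int]:
    """Return the corrected text and the number of replacements made.

    The word boundaries keep the pattern from touching longer strings
    that merely contain the misspelling.
    """
    return re.subn(r"\bMorobashi\b", "Morohashi", wikitext)
```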

Fixing up Albanian defs

I noticed that a lot of the Albanian defs are not in the suggested format. I'm going to write an Emacs function to fix them up, and get cracking. This is just to let people know. I'll post the Emacs defun on my User page. JesseW 02:35, 23 Oct 2004 (UTC)

I've gotten the necessary tools together so I can do this easily, but I haven't hacked it up to handle logging on, so the changes will be from 134.10.21.179. It's me. JesseW 06:32, 23 Oct 2004 (UTC)

Done. JesseW 07:37, 23 Oct 2004 (UTC)

We have some subject categories, but there are two different ways they are arranged. Some are called "<language> <subject>" while others are just "<subject>". I suggest we follow the model of Category:Colors and have "<subject>" categories be purely parent categories for "<language> <subject>" categories, which is where all the actual articles should go. If there is no objection, I will do this in a few days. Please let me know what you think... JesseW 04:54, 24 Oct 2004 (UTC)

Since nobody has commented on this for more than a week, I'm going to go ahead. I hope nobody feels the need to revert. JesseW 08:07, 5 Nov 2004 (UTC)
I've realized this is too big a job without a bot; Hippietrail, are you listening? ;-) Also, we'll end up with quite a few more categories of the form "English <subject>" than we have now... I guess that's alright. JesseW
I try to listen but it's easy to miss stuff in the Beer parlour. Please take a look at User:HippieBot. There's a subpage for ideas so please add to it and I'll do a scan for the kinds of Categories you're interested in. — Hippietrail 12:34, 5 Nov 2004 (UTC)

In the Main Graphic on the Title Page of all things, Wiktionary is phonetically transcribed in such a way that the last (read: furthest from first) is a small capital i. It is my understanding -- and, God help me, I may (and, for the love of self, probably am, by the Dog) be wrong -- that this symbol is usually phonetically rendered as i in bird or some similar (including that word just there and this word just here!) word. It seems to me I pronounce Wiktionary (insofar as I pronounce it) as a word ending in e as in see.

What's going on here?

In fact the small capital I (ɪ) represents the sound in "pip". The other "i sound", the one in "peep", is represented by a lower-case i with a lengthening mark which looks like a colon (iː). The sound in "bird" is represented by what looks like a small "3" with the same lengthening mark (ɜː). American versions of IPA use somewhat different variations.
Now the final "-y" in English has a range of pronunciations all the way from the sound in "pip" to the sound in "peep" so some dictionaries have chosen to use a regular small "i" with no lengthening mark for this special case. This is the usage I recommend for Wiktionary but the logo has been around longer than I have (-: — Hippietrail 12:29, 24 Oct 2004 (UTC)
I thought that IPA was "International Phonetic Alphabet". How can there be an American International Phonetic Alphabet?--81.157.101.101 23:05, 25 Oct 2004 (UTC)
That's what many people think when they find out. But it was never intended to dictate how pronunciation should be spelled in each given language, but to be flexible enough to be usable by anybody in the way(s) they find useful. In a phonetic alphabet each letter represents a phoneme - British and American English do not have the same phonemes. It should be as little surprise as the fact that British and American English can spell the same word differently. More surprising perhaps is the fact that even when used for British English, different dictionaries use the IPA differently, even different editions of the same dictionary. — Hippietrail 09:25, 26 Oct 2004 (UTC)

Setting aside vowel disagreements for the moment, the 'r' is just plain wrong. An 'r' is trilled (ref: [1]). I don't think any dialect of English would use a trilled 'r' in "wiktionary". I believe 'ɹ' would be correct in this case.

My dialect of English would also include an 'ɛ' between the 'n' and 'ɹ' but that's another case.

Darrien 07:53, 30 Oct 2004 (UTC)

Again, there's no "plain wrong" when it comes to using the IPA. Since most dialects of English don't include a trilled r, and I'm not aware of any which contrast trilled and non-trilled r, standard linguistic practice is to use the "most romanic" form of a letter to represent each basic phoneme. Thus, since most broad transcriptions of English into IPA don't distinguish varieties of r, the "most romanic" is "r". When the transcriber wants to illustrate differing r's, then two narrower symbols are likely to be chosen.
I only learned this a few months ago after finding different IPA in different dictionaries and I spent a lot of time digging through phonetics textbooks in libraries. — Hippietrail 14:04, 30 Oct 2004 (UTC)

Pronunciation of English

In what accent or dialect is the pronunciation guide for English?

Are words guided to the Received Pronunciation (RP), for example, or US West Coast English, Trinidadian English, or is it in a UK Geordie accent?

I think there ought to be an official choice about which accent is used, as otherwise this could lead to quite extensive confusion. I think also that this should be very clearly mentioned somewhere noticeable, for example on the front page, so that people with other accents do not

(a) mistakenly pronounce words in a manner not fitting their own accent, or (b) change the pronunciation for an entry to fit their accent, thinking the entry wrong.

Any comments?--81.157.101.101 22:44, 25 Oct 2004 (UTC)

I'd assume the standard US English and the standard British English that the national news announcers use, like any other dictionary. But regardless, any unabridged dictionary has multiple pronunciations, so folks shouldn't delete pronunciations unless they think someone made a real mistake. --Eean 07:50, 26 Oct 2004 (UTC)

-ly and -ing?

What is the policy (if any) on adding words with common suffixes such as -ly and -ing? Are those generally added in a different page, such as blooming or are they generally linked to the root word? On the blooming page, there are links to flowering and flowering. Which of these is the proper way to do it? I'd like to fix links to words such as (for example) biting by either creating biting as a different page, or creating a referral page from biting to bite. I've tried to read up on this, but there does not seem to be any information in the help documents, nor does there appear to be an obvious, consistent usage within the wiktionary. Please advise! —RJNFC 23:30, 25 Oct 2004 (UTC)

Personally I think wiktionary should work like pulp dictionaries and just give the derivations but not give them separate definitions. So, flower would be correct, not flowering. --Eean 07:55, 26 Oct 2004 (UTC)
Firstly, -ly and -ing can be very different depending on what you mean. The former can be interpreted as either a derivational or inflectional morpheme depending on whether you analyze adverbs of manner as a special use of an adjective or as a completely separate part of speech. -ing is always an inflectional morpheme being used to make the present participle/gerund from a verb.
Secondly, there are differing opinions on what should be included in Wiktionary. I think it's not a bad thing to include inflectional forms and would never bother deleting one to save space, though I would seldom bother adding one unless I thought it worthy. The opposing viewpoint is that they are always a waste of space and people have been known to delete them. Of course all deletions stay in the history so deleting an article actually makes the database file a tiny bit bigger - not smaller!
Thirdly, there are differing opinions on how to make these articles if you do choose to make them. My view is to format them like any other article but define them simply as "present participle of to bloom". Others have made redirects but many of us don't like to use redirects on Wiktionary except for rare special conditions.
I hope this provides more help than uncertainty (: — Hippietrail 09:17, 26 Oct 2004 (UTC)
I tend to agree with Hippietrail. Print dictionaries list adverbs (-ly words) and present participles/gerunds/verbal nouns (-ing words), and so Wiktionary should too, in my opinion. While some give them full treatment (the OED, for example), most tack them on to the uninflected entry (eg, "happy" [definition], "happily", "happiness"; "smile" [definition], "smiling").
An "-ing" word can have an adjectival or noun meaning distinct from its present participle, so this is grounds for including it. Where the meaning is simply "present participle of <verb>", I think it should still be included, because that may be the word a person is looking up. Furthermore, the user might not know the uninflected form (is the uninflected form of "travelling" (UK spelling) "travel" or "travell"? A trickier one: is the uninflected form of "lying" "ly" or "lye"? Neither - it is "lie") and so an entry for the word would enable them to see that the word is an inflexion as well as providing them with a link to the grammatically related word.
The few bytes that these entries take up mean that there is little problem with including these entries. I say, let's include them as they are beneficial. — Paul G 09:08, 28 Oct 2004 (UTC)
I agree. We should remember that not only native-speakers will be using this wiktionary, therefore definitions and examples of usage of all the forms and variants of a word, and all the derived words are of significant value. I also think distinct entries for plurals are justified for this reason. — DavidL 09:38, 28 Oct 2004 (UTC)
I have no problem with including them, although I've never been very diligent about it unless there is a point to be made. Same thing with past participles, which weren't mentioned above. Eclecticology 23:16, 29 Oct 2004 (UTC)

Removal from categories

I recently added budgerigar to Category:English animals and then corrected it to Category:English birds, but "budgerigar" still shows up on the page for the "English animals" category, even when I refresh that page. How come it is not updated? How do I remove it from that category? — Paul G 13:45, 28 Oct 2004 (UTC)

The category pages are not updated immediately, to avoid stress on the hardware, so there will be a delay of some time before the change appears. While we're at it I would tend to reduce those categories to simply "animals" and "birds". This is the English Wiktionary so terms should be presumed to be in English unless otherwise stated. Eclecticology 23:41, 29 Oct 2004 (UTC)
I am in favour of this reduction. I asked the user who set up most of these categories if they could be changed from "Category:English XXX" to "Category:English:XXX" in line with other namespaces (which also makes them read better: for example, a budgerigar is not an English bird), but the user turned down my request. I suspect that making this change would now involve a fair amount of work. — Paul G 10:57, 1 Nov 2004 (UTC)
It certainly would be a lot of work, but it would be even more work if we waited until there were more entries. I did change the items in "Category:English elements" to "Category:Chemical elements", and was surprised that nobody complained. Eclecticology 13:12, 1 Nov 2004 (UTC)

Please have a look at http://commons.wikimedia.org/wiki/Commons:Pronunciation_files_requests

Putting categories into templates...

Since we request that people make articles according to Help:Template, why don't we request templates in the template, for example Template:en, and put the appropriate category within the template? Then any articles that are created properly will automagically be in some of their proper categories. Sounds good? If I don't hear of any problems in a few days, I'll just do it... JesseW 05:53, 24 Oct 2004 (UTC)

What you would accomplish with this is unclear. Template:en only returns the result "English", and Category:English is a major redundancy for the English Wiktionary, even if we accept that language names are proper categories for foreign words. Eclecticology 00:11, 25 Oct 2004 (UTC)
I didn't modify Template:en because it is used in a lot of places, and I didn't want to modify it without some support. Regarding redundancy, as I understood (which is probably wrong), Category:English language should list only English words defined in the English Wiktionary, while Category:Spanish language should list Spanish words defined (in English) in the English Wiktionary. It doesn't seem redundant. And we could also use the template idea for putting parts of speech in their proper category; i.e. a Template:enAdj (or Template:enAdjective for searching reasons) could contain: "Adjective [[Category:English adjectives]]", thereby automatically putting all the English adjectives into that category. Category:English adjectives currently contains 29 entries, while there are hundreds of English adjectives in the wiktionary. If the categories are worth using at all, this seems like a good way to populate them... JesseW 08:54, 25 Oct 2004 (UTC)
The practice of adding categories to templates has been practised for some time now on the nl:wiktionary and others that use the same structures.
It works great for new articles; the category is filled and it gives an always complete list of all words in that language. Older articles need to be saved without changes before the categories show the existing words that use a template.
Because the en:wiktionary does not distinguish between templates that indicate a translation and templates that indicate that a word exists in a language, all the non-lexicological information that you want to put into it is hard to put in the "right" place. On nl: we use xx and -xx-; compare nl:Sjabloon:-en- and nl:Sjabloon:en. GerardM 09:01, 25 Oct 2004 (UTC)
So, what do people think of this; Eclecticology - did I explain what I wanted to do better? Can I do as GerardM suggested? JesseW 08:05, 5 Nov 2004 (UTC)
Hmmm! It seems that part of this confusion arises from using the term "template" in two different ways. We do not ask people to create articles according to Help:Template but according to Wiktionary:Entry layout explained. The latter article has existed here since 2002-12-13, Wiktionary's 2nd day of existence. The former only became a part of this project on 2004-09-24 by the actions of one person, and I admit to finding it thoroughly confusing. Typically, in the course of editing, when I encounter Template:en I replace it with "English" in a context appropriate manner. Category:English adjective is mostly useless since looking up that category will only give a list of otherwise unconnected adjectives. Category:English language is even more useless unless it is restricted to words about the English language.
We want the en:wiktionary to be easily editable by newcomers, without requiring them to put their heads around a template system that can confuse even the oldtimers like me. The content is key. Although I support the idea of categories, and argued strongly for it on the Wikipedia mailing list two years ago, I also recognize that there is a certain art to assigning categories. Categories that are too broad are as good as no categories. Categories that are too narrow risk not being noticed.
GerardM has a unique software based view toward the use of templates and categories which I do not share. He is welcome to continue his experimentation on the Dutch Wiktionary; I won't interfere with him at all there. Eclecticology 22:58, 5 Nov 2004 (UTC)
By changing content like {{en}} to English, you make the information less usable outside of the English wiktionary. We need cooperation to make better quantity and quality in ALL the wiktionaries. The experimentation is well under way outside of the nl:wiktionary as well. Your changing stuff backwards is from a cooperation point of view, a step backwards as well.
By having a list of ALL the words in a language as can be provided by a category that hides within the template that indicates that a word is in a language, I provide a tool for a speaker of a language to check if the content is correct. THAT has proven to be extremely valuable.
When you have a look at the statistics of nl:wiktionary, you will find that the amount of people active is growing. The amount of edits is growing, so having this awkward system of templates proves not to be a real handicap. With a chosen name like yours, I am amazed that you are not more open to new doctrines, methods or styles. Thanks GerardM 08:01, 9 Nov 2004 (UTC)
It is not a primary obligation of any Wiktionary to make itself usable to the others; it is more important that it better serve its own readers. The material is just as usable if the "English" heading is there instead of the template when you use interwiki links. Since I have accepted that such a heading should be allowed instead of nothing for the English, I don't see why a person wanting a list of all words in the English language can't simply put "==English==" in the search function. Any editing is always easier when it favours plain language instead of templates or other technical solutions. Growth statistics on nl or my choice of names have nothing to do with the issue. Eclecticology 10:16, 9 Nov 2004 (UTC)
I definitely seem to have run into a long-standing disagreement here, but it doesn't exactly seem to relate to what I'm asking about. I'm suggesting we change Wikipedia:Template to suggest (only suggest) the following sort of pattern for articles.
	 	
=={{English}}==	 	
==={{Noun}}===	 	
...	 	
At least to me, requesting that the people who take enough care to look at Wiktionary:Entry layout explained before making an article type 4 more characters per heading is negligible, both in terms of mental energy and finger fatigue. They don't need to understand Help:Template; all they need to do is remember to put double curly brackets around some of the section headings.
It appears that there is some idea that most of the words in the English Wiktionary will be English words; but from various places, I understood that the English (and other) Wiktionaries were trying to include all the words from all languages in each one, defined in the main language (i.e. all Spanish, French, Chinese words would be in the English Wiktionary, but they would be defined in English). If this is so, eventually there will be many more non-English words than English words in the English Wiktionary. So having a Category:English words would be useful (or at least, quite a small fraction of the total articles on the English Wiktionary), right? JesseW 10:35, 9 Nov 2004 (UTC)
Actually I'm not a huge fan of the template system myself. Though I'm quite a technical person and would love an XML Wiktionary, I know that "regular people" who might enjoy contributing to a site like ours are likely to find them off-putting.
That said, I doubt the search function is currently useful. Take a look at User:HippieBot:all_headings and see all the variations actually in use which are equivalent to "==English=="! Then again if my script could find them all I'm sure there'll be some way in the future to automagically convert between a technical representation and a readable English representation.
Just something to think about... — Hippietrail 10:38, 9 Nov 2004 (UTC)
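To make the point about heading variants concrete, here is a minimal Python sketch of how a script might fold several spellings of the same heading into one key. It is not the HippieBot code; the function name `normalize_heading` and the particular variants handled (extra spaces, wiki links, template braces, bold quotes, letter case) are assumptions chosen for illustration. Language-code templates such as {{en}} would still need a separate lookup table.

```python
import re

# A heading line: one or more "=", the title, one or more "=".
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")

def normalize_heading(line: str):
    """Return (level, normalized title), or None if the line is not a heading."""
    m = HEADING_RE.match(line)
    if not m:
        return None
    # If the two sides disagree, take the shorter run of "=" as the level.
    level = min(len(m.group(1)), len(m.group(3)))
    title = m.group(2)
    # Unwrap [[link]] and [[target|label]] to the visible text.
    title = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", title)
    # Drop template braces, bold/italic quotes and stray spaces at the edges.
    title = title.strip("{}' ")
    return level, title.casefold()
```

With this, "==English==", "== English ==", "==[[English]]==" and "=={{English}}==" all map to the same key, so they can be counted or searched together.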

Standardising article formats

In the discussion of my bot above, I've mentioned removing empty headings and empty translations, and fixing wrong heading levels. I'd like to ask other opinions on these subjects.

Removing empty headings

Please comment here
Perhaps we need to distinguish between required and optional headings. Eclecticology 23:37, 3 Nov 2004 (UTC)
In the past I have come across articles with all the headings from some template, but with no information under any but one of them. I'll try to add an empty-heading scanner to see what's out there. — Hippietrail 02:14, 4 Nov 2004 (UTC)
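An empty-heading scanner along these lines could be sketched in Python as follows. This is only an illustration under stated assumptions: the name `empty_headings` is hypothetical, and "empty" is taken to mean that no text appears before the next heading of the same or a shallower level (or the end of the page). A heading that is followed directly by a deeper subheading is treated as non-empty, and the ---- divider does not count as content.

```python
import re

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")

def empty_headings(wikitext: str) -> list:
    """List the titles of headings that have nothing under them."""
    empty = []
    pending = None  # (level, title) of the latest heading with no content yet
    for line in wikitext.splitlines():
        m = HEADING_RE.match(line)
        if m:
            level = len(m.group(1))
            # A same-level or shallower heading closes the pending one.
            if pending and level <= pending[0]:
                empty.append(pending[1])
            pending = (level, m.group(2))
        elif line.strip() and line.strip() != "----":
            pending = None  # real content found under the heading
    if pending:
        empty.append(pending[1])
    return empty
```

Listing the titles rather than deleting anything fits the distinction raised above: the output can be reviewed to decide which headings are required and which are optional.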

Removing empty translations

Please comment here
I have no problem with getting rid of these. Eclecticology 23:37, 3 Nov 2004 (UTC)
I haven't made an empty translation scanner as such yet, but I have an empty bracket scanner which amounts to the same thing. I'll try to post some output later. — Hippietrail 02:14, 4 Nov 2004 (UTC)
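For reference, an empty bracket scanner can be very small. The following Python sketch is an assumption about what such a scanner might look like, not the actual script; the name `empty_bracket_lines` is hypothetical. It reports the line number and text of every line containing a link with nothing (or only whitespace) between the double brackets.

```python
import re

EMPTY_LINK_RE = re.compile(r"\[\[\s*\]\]")

def empty_bracket_lines(wikitext: str) -> list:
    """Return (line number, line) pairs for lines containing an empty [[ ]] link."""
    return [
        (number, line)
        for number, line in enumerate(wikitext.splitlines(), start=1)
        if EMPTY_LINK_RE.search(line)
    ]
```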

Heading levels

  • Currently we do not use level-1 headings at all.
  • Level-2 headings start off each section of an article.
  • Article sections are divided by the ---- marker.
  • The level-2 (or section heading) is used to indicate the language of each section of each article.
  • A missing section heading generally means the section is dealing with an English word.
  • There are some special section headings including "Translingual" used for things such as the 2-letter symbols for the elements. There are also ones for letters of various alphabets and the Chinese Character articles should probably have a heading "CJKV character".
  • All other headings should be level 3 or greater if a hierarchy of levels is possible:
    1. Pronunciation since it is usually the same for all senses. Several formats are common currently.
    2. Etymology since it is usually the same for all senses.
    3. Part of speech if possible, but only one per heading. Below this is the full orthographic version of the word including capital letters, optional accents, diacritics, vowels, macrons, etc.
    4. Next is a series of numbered definitions without a "Definition" heading.
    5. Subordinate to the part of speech a few optional headings:
      1. Synonyms (always plural for consistency)
      2. Antonyms (always plural for consistency)
      3. Derived terms, Related terms. These two are often blurred but it should always be "terms" and not "words". Only terms genuinely related to the superordinate sense and part-of-speech should go here. And it should always be plural for consistency.
      4. Translations (always plural for consistency) There are two main current formats: single-column and multi-column. Some use a macro for the name of the language, some do not. Articles which use multi-column translations have a separate subsection for each sense and sometimes use.
        Translations should come last in this list because most people are interested in the functions of a single-language dictionary. People looking for bilingual dictionary functions, being a minority, should be ok with looking a tiny bit further down.
      5. See also. Normally there is only one such heading at the bottom of the section. A few articles have stuff worth linking to which is clearly related to only one sense or part-of-speech.
    6. Now we can begin again with the next Part of speech.
    7. After all parts-of-speech we have one final level-3 heading, See also. This is at the bottom because it is the least like any traditional dictionary content and can be a bit of a grab bag. Typically it contains external links, links to Wikipedia articles, and links to other words which are semantically related, but not etymologically or derivationally related to the title word.

I'll leave off here due to my propensity to ramble, I'll endeavour to improve it in the next few days when I see what a mess I have made. (-:

Now please discuss these points. I'm using these observations with a view to my bot and now perhaps also to making a "wiktionary to xml" tool. — Hippietrail 13:42, 3 Nov 2004 (UTC)
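The first few rules in the list above lend themselves to a mechanical check. Here is a minimal Python sketch, assuming that sections are separated by a line containing only ----, that each section should open with a level-2 heading, and that every other heading should be level 3 or deeper. The function name `check_heading_levels` is hypothetical. Note that it also flags sections with a missing language heading, which current practice tolerates for English words, so its output is a list to review, not a list of errors.

```python
import re

HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*=+\s*$")

def check_heading_levels(wikitext: str) -> list:
    """Return (line number, message) pairs for headings at an unexpected level."""
    problems = []
    first_in_section = True
    for number, line in enumerate(wikitext.splitlines(), start=1):
        if line.strip() == "----":
            first_in_section = True  # a new language section begins
            continue
        m = HEADING_RE.match(line)
        if not m:
            continue
        level = len(m.group(1))
        if first_in_section:
            if level != 2:
                problems.append((number, "section does not start with a level-2 heading"))
            first_in_section = False
        elif level < 3:
            problems.append((number, "heading inside a section is shallower than level 3"))
    return problems
```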

What makes most of this workable is that it is reasonably consistent with current practice. I prefer maintaining the hierarchical approach, but there will be times when it should be allowed to vary from the standard format to reflect, for example, multiple pronunciations and etymologies.
Yes I agree in allowing flexibility. However I don't think we've really answered the multiple pronunciation/etymology question yet. Maybe I'll be able to scan & analyze these cases soon and we can take a fuller look at the issues. — Hippietrail 02:14, 4 Nov 2004 (UTC)
Why CJKV rather than simply CJK?
Because the same characters were formerly used for Vietnamese as well; the Vietnamese even invented some characters of their own. See w:Chu Nom, though that's a surprisingly weak article. I do feel the V makes it a bit cumbersome, but leaving it out would make articles with Vietnamese info into exceptions. — Hippietrail 02:14, 4 Nov 2004 (UTC)
This could become an issue with some people in the future, but for now I don't really have strong feelings either way on this. Eclecticology 19:40, 5 Nov 2004 (UTC)
Interwiki links and categories should be moved to the bottom of the page. When I reviewed a few of the links that came up on your list of "level zero" pages many got on there because these items were at the very top of the page.
Excellent idea! I used to move a lot of these on Wikipedia before I discovered Wiktionary. I will make a scanner for them. Later on this is a perfect candidate for an auto-editing bot.
I have seen both Categories and Interwikis also linked inline but haven't made up my mind on whether this is a good thing. Any comments on that? — Hippietrail 02:14, 4 Nov 2004 (UTC)
One other observation that came out of your headings list was that a distinction was made between "===Noun===" without spaces and "=== Noun ===" with spaces. This doesn't really show up in what the general public sees, but I wonder whether we should migrate to a standard. I tend to omit the spaces, but others consistently do the opposite. Eclecticology 23:37, 3 Nov 2004 (UTC)
Indeed. I always go without the spaces except when editing an article which always uses them I tend to make my edits fit in. Many users also insert a space after the "*" or "#" used for list items. I might make a scanner to compare the frequencies so we can see what's more popular. — Hippietrail 02:14, 4 Nov 2004 (UTC)
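
A frequency scanner of the kind mentioned above could look something like the following. This is only a minimal sketch, not the scanner Hippietrail actually wrote: the function name `heading_style_counts` and the three labels are made up for illustration, and it assumes a heading is a line whose text is wrapped in the same number of "=" signs on each side.

```python
import re
from collections import Counter

# A heading line such as "===Noun===" or "=== Noun ===":
# group 1 = the run of "=" signs, groups 2 and 4 = optional padding,
# group 3 = the heading text.
HEADING = re.compile(r"^(=+)(\s*)(.*?)(\s*)\1\s*$")


def heading_style_counts(wikitext):
    """Count headings written with spaces, without spaces, or lopsidedly."""
    counts = Counter()
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        if not m or not m.group(3):
            continue  # not a heading, or a line made only of "=" signs
        left, right = bool(m.group(2)), bool(m.group(4))
        if left and right:
            counts["spaced"] += 1
        elif not left and not right:
            counts["unspaced"] += 1
        else:
            counts["mixed"] += 1
    return counts


sample = "==English==\n===Noun===\n=== Verb ===\n# a definition\n=== Adjective==="
print(heading_style_counts(sample))
```

Run over a full dump, the totals would show which style is more popular; the same approach could presumably be extended to the space after "*" or "#" in list items.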

What should be done with Maculare?

It's a list of Latin conjugations of "to stain", I think. It looks well done. What should be done with it? JesseW 08:45, 5 Nov 2004 (UTC)

Although I support the idea of having a place for Latin conjugations, I don't think that this is it. I also don't think that it is very helpful to have a detailed conjugation of every Latin verb, or, for that matter, in any other language.
More appropriate would be to have certain pages that give model conjugations, and to provide links to that page with a statement like, "For conjugation see ..." Other solutions are also possible. Eclecticology 18:44, 5 Nov 2004 (UTC)

Examples - quote from famous sources?

An anon has added various examples that are (unattributed) quotes from Roger Ebert. This made me wonder - what do we think about example sentences? Should they be newly made up sentences, and preferably bland; or are named quotes OK, or even better? Are attributed quotes even better, or link spam? And finally, how exactly are examples to be included in a properly formatted article? The Template page is not clear on this. These are some questions off the top of my head; I'm sure more will come up. I look forward to all your thoughts... JesseW 09:02, 5 Nov 2004 (UTC)

I think quotes from famous sources, quotes from literature, early quotes, are all better than just making something up. I've done all of these but lately I try to find something real using Google. When I'm reading I often think "this would be a good quote for Wiktionary" but it's gone by the time I'm on the computer again... — Hippietrail 12:56, 5 Nov 2004 (UTC)
I agree. The OED was originally based on a hoard of little slips of paper on which people wrote quotations from a multitude of sources, all properly documented. At issue here is the root question of what is a dictionary about. Is it descriptive or prescriptive? Does it describe how a word has been used in the past, or does it tell the reader how to use it? The quotes are more evidence than example. Made up examples can inject an unwitting bias into an article. Good quotations help to illustrate the subtle variations that can accompany a word. Quotes should virtually always be attributed; it seems that it may only have been a matter of luck that you were able to recognize the quotes as coming from Roger Ebert. Quotes don't need to be limited to famous sources; identifiable and verifiable contemporary sources are excellent for showing the evolution of the language. Sometimes the quote needs to be found before the word.
I just reached out and grabbed a copy of Arthur Conan Doyle's The Land of Mist (1926 edition), and browsed until I could find a good example of an illustrative quote. I came up with this: "My client is not a vagrant, but a respectable member of the community, living in his own house, paying rates and taxes, and on the same footing as every other citizen." (p. 122) This sentence helps us to understand the highlighted word, but would do little for any other word in that sentence. Or how about from the same page: "The elder policewoman popped up in the box with the alacrity of one who is used to it." Eclecticology 19:19, 5 Nov 2004 (UTC)
There is some material on formatting quotations at Wiktionary:Quotations, but this may need some revision. Eclecticology 23:07, 5 Nov 2004 (UTC)

Prank pages that persistently reappear

I noticed that an Alert section has been added to the Deletion page for Starfrosch and You are an idiot!

As the operator of my own Wiki, I think you're going about this wrong. The "joke" is to keep reinstating the page. It has nothing to do with content, it's the reinstatement they think is funny. Every time you delete it, you just start a new round in the game.

Instead, I suggest this: Create the pages. Make them say something like "This page was created by pranksters and has now been locked to prevent further abuse". Then protect it. If the page exists, they can't create it. If it's locked, they can't edit it. If anyone finds it, they know why it's there. End of game.

Pinkfud 22:22, 6 Nov 2004 (UTC)

I agree; that makes sense. JesseW 00:26, 7 Nov 2004 (UTC)
OK, I'll try it. Eclecticology 20:35, 7 Nov 2004 (UTC)
I think the pranksters would then move on to creating other pages if they have their toy pages taken away from them. This method won't stop them from going on to vandalise genuine pages, of course, but then we can always ban the miscreants. That's not to say we shouldn't try this - I think it is a good idea. — Paul G 10:03, 8 Nov 2004 (UTC)

Cleanup - reminder

Can I remind everyone about the cleanup page? I add the occasional page when there is a lot of work to be done that I don't have time to do at the time, but I rarely see anything coming off it.

There is quite a long list of pages needing attention there. If anyone has some time to spare, could they have a look and see what they can do, please.

Thanks. — Paul G 18:32, 9 Nov 2004 (UTC)

An excellent reminder. I think that if all the "regulars" commit themselves to clearing up a few the list could be shortened very quickly. Beyond that you may need to re-post the reminder every few months. Eclecticology 18:21, 10 Nov 2004 (UTC)
Another suggestion would be not to add Webster 1913 material without editing it into shape first . . . -dmh 21:21, 10 Nov 2004 (UTC)

Common mis-spellings

What do we do about common mis-spellings? Should I add "superceed" with an explanation and a link to "supersede"? -- SGBailey 14:12, 10 Nov 2004 (UTC)

The difficulty here is in deciding just what is "common", and also what is a mere typo. I would hate to superseed the database with a lot of misspellings that are unique to a handful of individuals. Perhaps a minimum number of Google hits for the word in question might be in order, assuming that Googling does not turn up another legitimate use of the misspelling. Typos tend to reflect a failing of hand-eye co-ordination rather than any lack of knowledge; they will happen even to people who are otherwise meticulous about spelling.
"Superceed" gives me over 3000 Google hits, and is probably a good candidate. On the other hand there could be a lot of argument over whether "supercede" is an acceptable variant or outright error. Eclecticology 18:04, 10 Nov 2004 (UTC)
See the history and discussion of develope. Just where to draw the line is somewhat subjective, but here are the informal criteria I've used (recognizing that these rules may not be quite rigorous or self-consistent):
  • Well-known commonwealth/US and similar variations automatically get separate notice. In most cases the pattern followed by womaniser/womanizer/womaniser womanizer should do just fine, though some cases like check/cheque require full separate entries as the variants are used differently in compounds and phrases.
  • Spellings with roughly equal google hits (perhaps weighting for regionality) are marked as variants, most common variant used for entry, others redirected. An example might be the variation in hyphenation among lady killer/lady-killer/ladykiller.
  • If a spelling is significantly less common than the most common:
    • If the less-common spelling can be shown to occur in print, it should qualify as a variant. Example miniscule/minuscule.
    • Otherwise, it's a common misspelling.
  • Simple typos tend to be three or more orders of magnitude less common than the main variant.
I'm leaning towards redirecting to a single entry with all variants as opposed to entries of the form Less common spelling of foo, but I'm not going to change existing entries (OK, I just did for womaniser, but only because Paul G wrote it. No, not really. I just seem to be stepping on Paul's toes quite a bit lately, for no significant reason :-).
I would, however, give common misspellings separate entries. In the case in point, I would
  • Enter supersede as the primary entry, with the two spellings as "supersede, less commonly supercede"
  • Enter supercede as a redirect to supersede
  • Enter superceed separately as a common misspelling (wiki link, no redirect).
If supersede and supercede were US/commonwealth variants, I'd instead make supercede supersede (terms in alphabetical order) the primary entry and redirect supercede and supersede both to it. -dmh 19:06, 10 Nov 2004 (UTC)
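
The informal criteria above can be sketched as a small decision function. This is an illustration only: the name `classify_spelling`, the 0.5 cut-off for "roughly equal", and the order in which the checks run are all assumptions made here, since dmh gives no exact numbers beyond "three or more orders of magnitude" for typos. The well-known regional variants in the first bullet are left out because hit counts alone cannot identify them.

```python
def classify_spelling(hits, main_hits, seen_in_print=False,
                      equal_ratio=0.5, typo_ratio=0.001):
    """Classify a spelling by its search-hit count relative to the main spelling.

    equal_ratio is a hypothetical threshold for "roughly equal" hits;
    typo_ratio reflects "three or more orders of magnitude less common".
    """
    if main_hits <= 0:
        raise ValueError("main_hits must be positive")
    ratio = hits / main_hits
    if ratio <= typo_ratio:
        return "typo"                # vanishingly rare next to the main form
    if ratio >= equal_ratio:
        return "variant"             # roughly as common as the main form
    # Significantly less common: print evidence decides.
    return "variant" if seen_in_print else "common misspelling"


# Made-up hit counts, purely to exercise each branch.
print(classify_spelling(900, 1000))
print(classify_spelling(100, 1000, seen_in_print=True))
print(classify_spelling(100, 1000))
print(classify_spelling(1000, 2_000_000))
```

Where to set the thresholds is exactly the subjective part of the discussion, so they are left as parameters rather than fixed values.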
"Supersede" is the correct spelling, because it comes from the Latin for "seat" or "sit" (if I recall correctly) rather than the common "-cede" root, as in "precede", etc. "Supercede" is very often seen in print. I would suggest Wiktionary takes this as the criterion for inclusion of misspellings (as with "miniscule", which already has an entry).
I hadn't noticed my toes were being trodden on - fortunately I have my hobnail boots on rather than my blue suede shoes :) — Paul G 10:06, 11 Nov 2004 (UTC)
Glad you're wearing your boots. I just noticed that a couple of times after harshly editing an entry I looked at the history to find that it was your work. (I think the examples were right-angled triangle et al. and false cognate, which you should of course feel free to tone down). I made a mental note to be less harsh, regardless of whose work it was.
Anyway, Merriam-Webster online, along with another source I can't locate at the moment, points out that Middle English had superceden (infinitive form), giving etymological support to that spelling. In other words, it's an old spelling mistake :-). I agree that etymological evidence is important in English spelling, and that supersede wins on that account. Print dictionaries and google hits concur. I believe this all supports the breakdown above (supersede primary noting both spellings, supercede as a redirect, superceed as misspelling). -dmh 16:11, 11 Nov 2004 (UTC)

In my experience, American (at least M-W) dictionaries regard supercede as acceptable, while British dictionaries don't. I am not sure if this indicates a genuine difference between US and UK English or if it's just that M-W is more descriptivist/permissive. -217.44.206.191 14:01, 15 Jan 2005 (UTC)

Webster 1913

The more Webster 1913 material I see in Wiktionary, the less I like its wholesale importation. Some entries are not too bad, but I see less than no value in adding egregious examples like slight. While Webster is well-researched, it fails on NPOV (what's a "barbarous corruption"?) and readability (what word is defined as "To be; to become; to betide;"? What about "To cause to spread to extend; to impel or continue forward in space") and out of date (here's the original entry for punk).

In short, while the material in Webster 1913 is useful, it should not be brought in unedited. If it is, I would strongly prefer that the person doing so mark any items needing cleanup as such, using the {{webster}} template to make them easier to locate and to warn potential readers. -dmh 21:14, 10 Nov 2004 (UTC)

I agree - I think these entries look awful and archaic and don't sit well with the entries started from scratch. The use of the "webster" template is an excellent idea. — Paul G 10:09, 11 Nov 2004 (UTC)

To the extent that I use it I leave the unedited material on the Webster 1913 pages. I couldn't see what was wrong with the slight entry except that all the quotations were stripped out, and these are the most valuable part of the Webster material. I saw no reference there to "barbarous corruption" or to your readability example guessing game. I find nothing wrong with the "punk" entry, except that again the quotes are missing. The lack of examples for the modern addition to that word leaves it suspect on POV grounds. The only POV problem with Webster is in its tendency to Americanise the language. Eclecticology 11:38, 12 Nov 2004 (UTC)

Point by point:
  • The only usage I've heard of "slight" as a verb is "to insult". None of the three definitions given comes even close to that. If I say "I'm sorry to have slighted you", I don't mean "I'm sorry to have overthrown or demolished you", or "I'm sorry to have made you even or level" or "I'm sorry to have thrown you heedlessly." This seems worse than useless.
  • "Barbarous corruption" was from the original entry of a, taken from Webster's
  • "To be; to become; to betide;" is from worth (first definition given; the second is the adjectival sense "valuable", which I've never heard of, though of course worthy works just fine.) In other words, the first two definitions given are woefully out of date. Looking further, the last two senses listed under "adjective" are actually noun senses.
  • "To cause to spread out ..." is from propagate.
  • NPOV is a bit subjective, but Websters tends to throw around words like "corruption of" a little loosely.
As I've said, many of the Websters entries are just fine, but many are of slight worth. -dmh 15:15, 12 Nov 2004 (UTC)
Sorry, forgot "punk". The four senses are decayed wood, a fungus, an artificial tender, and a "prostitute; strumpet". The senses that I would give if starting from scratch would be the punk rock related senses, to pull a prank on, a small stick used for lighting fireworks, and prison slang more commonly heard as "bitch". I doubt that the prostitute sense is current, but then, neither is "strumpet". This is another case where the 1913 entry is so different from a useful modern entry that it's literally less work to create an entry from scratch.
What people should do is consult Webster (and other extant dictionaries) for senses that might have been missed, and for etymological insight. But I see no point whatsoever in just bringing in Webster entries wholesale and undigested. If I were coming to Wiktionary cold, I would rather see a missing entry and think "they haven't got to it yet" than see a weird obsolete entry and think "what planet are they from"? -dmh 15:26, 12 Nov 2004 (UTC)
Thanks for clarifying the details. I still believe that the old entries should be kept, but some of them do need to be marked "obsolete". Properly identifying the quoted sources with its date will also help in that direction. Referring to a word as a corruption is perfectly correct when we are talking about a word whose morphology has significantly migrated from what it originally was. I admit that "barbarous" may be somewhat unnecessary.
I assume you're referring to "The act of changing, or of being changed, for the worse; departure from what is pure, simple, or correct; as, a corruption of style; corruption in language." (italics mine). I'm hard-pressed to see where this is ever appropriate in an NPOV descriptive dictionary. -dmh 07:05, 13 Nov 2004 (UTC)
"To cause to spread" ("out" is not there in Webster) continues to be in use for "propagate". That meaning has even expanded beyond light and sound, and now includes radio frequency waves and ideas.
Some of the obscure meanings for "slight" are already marked as obsolete in Webster. Your selected examples don't do justice to the situation. Webster uses a quote from Shakespeare for the third one: "The rogue slighted me into the river." (from Merry Wives of Windsor) For the second, what boxer would ever be sorry for having slighted his opponent?
The more relevant question is what boxer would ever say he'd slighted his opponent? -dmh
"Punk" is interesting in that its late 16th century origins may in turn have come from a pun on the Latin "puncta", a woman, who is easily punctured.
Anyway, we should not avoid meanings just because they are obsolete or old. The word in a historical context may be the most fascinating part of lexicography. Eclecticology 01:28, 13 Nov 2004 (UTC)
That's all well and good. But we're not talking about the 1913 edition of Websters. We're talking about the Wiktionary entries that were brought in en masse and, I might add in light of the color/colour flap, over the strong objections of at least one frequent contributor. These entries miss out all of the valuable information you note above. Again, if one is new to Wiktionary and wants to look up punk, slight, worth or most likely a significant proportion of the Webster 1913 entries I didn't spot-check, one will see what appears to be a random assortment of archaic senses; oddly punctuated; larded with fusty expressions; hence, more obfuscatory than revelatory.
Why this is preferable to simply inserting a link to the more informative original Webster 1913 entry, along with a note that the article is a stub, is and always has been beyond me. But I didn't make the world. -dmh 06:36, 13 Nov 2004 (UTC)

Banning users

How do I ban users? I see one of the other sysops (I don't remember who off the top of my head) has been banning vandals and spammers for various periods, but don't see a "Ban this user" link anywhere. The latest miscreant is User:221.197.16.90, who has been posting long lists of links to Chinese websites to various pages (all now cleaned up, at the last check). — Paul G 10:12, 11 Nov 2004 (UTC)

Go to the "For sysops only" section of the "Special pages" of the sidebar. Eclecticology 04:51, 12 Nov 2004 (UTC)

Where should spammers and other abuse be reported? I would like to report User:Woolbinkle; just look at Template:GuZ, which he is trying to spam all over the site. -- Adam Katz 11:22, 17 Nov 2004 (UTC)

Thanks for the heads up. I noticed that all his efforts were done within a 10 minute period. Many of these people have a short attention span, and soon go away. Unless they come back a second time banning is pointless. I'll watch for further efforts by him, which may involve a new user name. He is not completely naïve since he knows about categories and templates. Eclecticology 18:25, 17 Nov 2004 (UTC)

Template linking to Wikipedia

What is the Wiktionary equivalent of w:Wikipedia:Wikitionary#Wiktionary? I was unable to find info on it anywhere (nor relevant policies/discussions). If it exists, I think it should be moved to Wiktionary:Entry layout explained or some other easy to find place. For example, I linked w:Elite and w:General to their Wiktionary entries, but I'd like to know what is the best way to link them back except [[w:article name]]? Please copy your response to my User talk:Piotrus page, tnx. --Piotrus 11:29, 13 Nov 2004 (UTC)

Can we have random pages limited to a preferred language

The Random Page link is a good idea, but a good idea gone wrong. I clicked on it 4 times, and 4 times I got a Chinese or Japanese character. Not a lot of use / interest to this strictly monolingual Australian (well, I can get by in French !)

My suggestion / request is to have some way of choosing if you want your random pages limited to a certain language - perhaps a language you have specified in a profile or something.--Richardb 15:09, 14 Nov 2004 (UTC)

You are not alone in wanting this but there is a problem. The wiki software grew up with Wikipedia. Wikipedia has a "do not force structure" policy. Therefore all articles on all wikis are pure content with only as much structure as the writers of individual articles happen to agree upon. The upshot of this is that the wiki software has no idea in which languages (plural) the term for each page is.
I am currently working on a parser to make the most out of the structure which happens to have evolved. So far I've had quite good success at extracting which language(s) an article is for. But at this point there is no way I know of to incorporate this work into the wiki software. — Hippietrail 01:57, 15 Nov 2004 (UTC)
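
For illustration, one rough way to pull the language(s) out of an article's text is sketched below. It is not Hippietrail's parser, which is not shown here. It assumes that each language section is introduced by a level-2 heading such as "==English==", which is a convention of the writers rather than anything the software enforces, and the function name `languages_in` is made up.

```python
import re

# A level-2 heading: exactly two "=" on each side, optional padding.
# The [^=] guard keeps "===Noun===" and deeper headings from matching.
LEVEL2 = re.compile(r"^==[ \t]*([^=].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def languages_in(wikitext):
    """Return the level-2 heading names of a page, in order, without repeats."""
    seen = []
    for name in LEVEL2.findall(wikitext):
        name = name.strip("[]")  # tolerate linked headings like ==[[English]]==
        if name not in seen:
            seen.append(name)
    return seen


page = "==English==\n===Noun===\n# a pace\n== Spanish ==\n===Noun===\n"
print(languages_in(page))
```

A random-page filter would still need this information stored somewhere the wiki software can query, which is the part that has no obvious solution yet.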
By the way, the Chinese or Japanese characters come up so frequently because a huge number of these were entered fairly early on the project. The number of English pages has clearly not caught up yet. — Paul G 14:59, 15 Nov 2004 (UTC)

SAMPA pronunciation in sans serif fonts

Anyone else spotted the problem with SAMPA in sans serif fonts? Capital "i" (representing the "short i" sound in "bit") looks identical to lower-case "l"; this means, for example, that the pronunciations of "bay" and "bell" look identical. Oh dear. What, if anything, can we do about this, short of the drastic step of abandoning SAMPA? — Paul G 14:57, 15 Nov 2004 (UTC)

...except, of course, "bay" uses lower-case "e" and "bell" uses upper-case, but the SAMPA for "bay" can still be misread as bee-ee-ell. — Paul G 15:01, 15 Nov 2004 (UTC)
The way I've been fixing this is by changing thus:
/beI/ → /<tt>beI</tt>/ = /beI/
The "tt" tag specifies a typewriter font, which means serifed. Also since SAMPA was designed for "typewriter era" font technology, this seems appropriate. By the way, this is just the sort of thing a bot could fix wholesale. My Wiktionary parser has been able to find these since 2 days ago now. — Hippietrail 11:36, 16 Nov 2004 (UTC)
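
A bot pass for this fix might be sketched as follows. This is a guess at the general shape, not the actual bot: it assumes a SAMPA transcription is any run of non-space characters between two slashes, and the name `wrap_sampa` is hypothetical. That pattern would also catch things like "and/or/" or parts of URLs, so a real bot would presumably restrict itself to the pronunciation section.

```python
import re

# Assumption: SAMPA is written between slashes, e.g. /beI/, with no spaces,
# slashes or angle brackets inside. The lookahead skips text already wrapped.
SAMPA = re.compile(r"/(?!<tt>)([^/\s<>]+)/")


def wrap_sampa(wikitext):
    """Wrap each slash-delimited transcription in <tt> tags."""
    return SAMPA.sub(r"/<tt>\1</tt>/", wikitext)


print(wrap_sampa("bay /beI/ and bell /bEl/"))
```

Because already wrapped transcriptions are skipped, running the function twice over the same page leaves it unchanged.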
Not much of an argument from me against using a bot on this one, since I've never been a fan of SAMPA to start with. I usually keep out of pronunciation issues, and this will be no exception. Eclecticology 19:00, 16 Nov 2004 (UTC)
I've just remembered about "tt", and realised that this is the solution. — Paul G 14:07, 17 Nov 2004 (UTC)

Flotsam and Jetsam

From time to time some random person makes a fragmentary (or completely bogus) entry for a valid headword. Some examples that have started life this way are stunod, islamonazism and norge. The recently deleted chicken and egg would also fall in this category. While simply deleting such entries causes no great harm and flushes the history, it also loses the reminder that the actual headword needs a definition. A simple remedy is to add the deleted term to the request list, but the request list is quite long as it is.

Another option would be to mark the item for cleanup after removing any clearly inappropriate content. This has the advantage that anyone can do it and there is no need to clutter RFD. Instead we clutter cleanup, but in the process we also draw attention to cleanup and perhaps entice people to tackle more items on it.

I was originally going to suggest a dedicated "flotsam and jetsam" page for items with legitimate headwords but unhelpful content, but I'm not sure what sort of entries would not already be covered by cleanup, remove/request or both. -dmh 16:25, 17 Nov 2004 (UTC)

Plural of the word "rum"

Is the word "rum" like "deer", which is both singular and plural? I don't think you ever use the word "rums" do you? I would think you would say just "rum" with no such word as "rums" when talking about the alcohol. We play Scrabble and have this controversy going!! Please respond. PB

In normal use, "rum" would be an "uncountable" or "mass" noun like "information" or "smoke". However, most uncountable nouns that refer to physical things, and drinks especially, can have a regular plural. Consider: "Bundaberg is one of my favourite rums". On Google there are 188,000 hits for "rums" so I think it would count in this sense. — Hippietrail 17:41, 18 Nov 2004 (UTC)
All drinks have a regular plural in the sense of "a glass of the drink". You might order two rums, for example. As Hippietrail says, many uncountable nouns have countable senses meaning "a type of...". "Cheese" is uncountable in "Do you like cheese?" but countable in "France produces many different cheeses" (that is, many different types of cheese).

Main_Page is not correctly named !

Hate to point this out, but to comply with the general structure of Wiktionary as it seems to have developed, the page called "Main_Page" should actually be named "Wiktionary:Main_Page", since it is a Wiktionary Internals page, not a content entry !--Richardb 14:23, 20 Nov 2004 (UTC)

This is a standard wiki custom. The only non-article page in the article namespace is Main page. See wikipedia:Main page. It's way too late to change it now. ;-) JesseW 03:05, 25 Nov 2004 (UTC)
Actually Richardb makes a good point and it's not too late at all. As a proof-of-concept I have changed en.wiktionary's main page to Wiktionary:Main page - currently it merely redirects to Main page but if the test is a success I will move the page, talk, history etc to the new location. If anybody finds a gaping problem, alert any sysop and they can revert my change. I think this is an important step in moving all non-dictionary-articles, meta-articles, appendices, indices, whatever, out of the generic default namespace. Please comment if you have any strong feelings etc. — Hippietrail 05:36, 26 Nov 2004 (UTC)
Okay so it's been another day and nobody's commented so I've now actually moved "Main page" to "Wiktionary:Main page". Everything seems to work fine. The only tic I notice is that the namespace shows up in the "navigation" section at the left of the page and causes the text to wrap. Comments are still welcome and it can still go back if it has to. — Hippietrail 12:08, 27 Nov 2004 (UTC)

It was a shock, after being away from Wiktionary for just ten days, to come back to such a major change made without any significant discussion. Making such a proposal and acting on it barely more than a day later does not give much opportunity for discussion. The existing naming convention for the main page is pretty well the same across all the projects, and I see no reason for this one to be different. It's helpful for newcomers to feel that there is consistency in the way that a person enters any Wikimedia project. In theory at least it should be the one place to which everything in this project is linked. Eclecticology 10:38, 7 Dec 2004 (UTC)

[[Wiktionary:Index]] now Wiktionary:Index to Internals

(I've now moved/renamed [[Wiktionary:Index]] to Wiktionary:Index to Internals)
I find it confusing to look for pages about Wiktionary internals, and decided to build an index to simplify the matter. If you think this is worthwhile, please add to it. If you think it's a waste of time, well, this is a Wiki, and I'm free to waste my time on it if I want to. If you are opposed to it existing, then please argue your case - not here, but on the Talk / Discussion page associated with Wiktionary:Index to Internals--Richardb 14:54, 20 Nov 2004 (UTC)

One problem - that talk page seems to have disappeared ???--Richardb 15:58, 20 Nov 2004 (UTC)

Having now discovered the Wiktionary:Utilities page, I propose to move/merge that page with the newly created Wiktionary:Index to Internals which covers the same sort of area, but is intended to be a little more comprehensive as it intends to be a full index to all things Internal to Wiktionary, not just a list of Utilities.

If/when I do move/merge this page, I will change all the links to this page to refer to Wiktionary:Index to Internals instead, suitably changing their context too.--Richardb 15:21, 20 Nov 2004 (UTC)

I think that your new title is somewhat verbose. Perhaps "Wiktionary:General index" might have done as well. Eclecticology 21:30, 21 Nov 2004 (UTC)
Now changed it to Wiktionary:Index to Internals--Richardb 07:36, 28 Nov 2004 (UTC)


Can we kill with kindness - Wiktionary that is

I'm feeling that maybe people can add too much to Wiktionary, and end up killing it with just too much content. Adding Rodger was a mistake of mine. I was not adding it as a personal name, I was adding it for a specific meaning. When someone pointed out that there was already an entry for this meaning under Roger I deleted the Rodger page content. But Paul G added it back as just simply a Personal Name. Does that mean we have every single known first name as an entry in the Wiktionary ? My vote is NO. Especially in this day and age when people invent given names. I'm even worried by someone adding pages such as Wiktionary Appendix:Surnames begin with m. But I think it would be better to have a page Wiktionary Appendix:Personal names beginning with R, and only having Rodger in that list, not as an entry.--Richardb 01:53, 21 Nov 2004 (UTC)


How to deal with local usages?

Being born in the UK, living in Australia, watching American TV, having travelled through India, I know there really is not just one language English. There are many English-es. Some words are used in one locality, not in another, eg: nesh. Some have different meanings in different places. In the UK it would be fine to ask if you can borrow someone's rubber, promising to return it after use. Wouldn't suggest you try that in Australia. Vice-Versa for borrowing someone's Durex. Then, even within Australia there are major differences. What you wear for going swimming can vary from swimmers to bathers to cossies to speedos, all meaning exactly the same thing in different states. And the same size beer glass is called different things in different states, while I'm still confused over whether a middie is a different size glass in different states.

Anyone got a standard way, or good examples, of indicating local usages ?--Richardb 02:25, 21 Nov 2004 (UTC)

Funny, I'm an Aussie and growing up in Melbourne we always said "rubber" for "eraser". I didn't hear it used in the "condom" sense until I saw "The World According to Garp"!
But for regional stuff, in the article defining the word, put the location in parentheses and italics before the def, after the #. If you add the term to the "translations" or "synonyms" sections of other articles, do the same but put it after the word:

English

Noun

durex®

  1. (Britain) condom
  2. (Australia) sticky tape

and

English

Noun

condom

  1. blah blah blah
Synonyms

Hippietrail 11:17, 21 Nov 2004 (UTC)

It's never that easy. "Durex" is used in Canadian French to mean cellophane tape, where many people in English Canada tend to use "Scotch tape" (a genericised brand name). "Sticky tape" would not normally be used, and although it might personally evoke an image of some kind of double-sided tape that would still be an attempt to make sense of an unfamiliar term. I would likely ask the user what kind of tape he means. The term "adhesive tape" tends to refer to the white tape that is used to put dressings on wounds. To make sense of local usages you need to have a good grasp of the general terminology. Eclecticology 21:22, 21 Nov 2004 (UTC)


Lots of Esoterics, but lacking some real basics, eg Step

People seem to be doing some very esoteric things with Wiktionary - Concordances, Lists of Surnames, English Animals - you name it, someone is doing it. But then, I look up a very basic, simple, everyday word like step and find no definition. I've added just one rudimentary effort.
Who will do the basics? Will anyone ? How can people be encouraged to do not just the exciting stuff, the esoteric stuff, but also ensuring the basic stuff is done, and done fairly well--Richardb 14:39, 21 Nov 2004 (UTC)

You notice step already had translations, since that is the main reason anyone would look up step. Of course, it's good that it actually has a definition now. But it goes to show, people will work on what they see as a need. So, lots of esoteric definitions and translations, less in the way of basic definitions. If there's demand for it, you could start a project similar to Wikipedia's project to counter systemic bias to work on the basic defs. --Eean 17:11, 21 Nov 2004 (UTC)
I very much agree with what Richard says. Doing the basics should be the primary objective. The translations are nice to have, but still secondary in importance. I've defended the concordances elsewhere. They weren't mine, but they have great potential for someone who wants to take that further. I view the lists of surnames with benign tolerance, and contrary to my intuitive interpretation "English Animals" is not about animals that are native to England. I do disagree with Richard when he says that the basic stuff is not exciting.
Systemic bias has nothing to do with this issue unless you propose that an attack on leet and other invented nonsense represents a systemic bias. Eclecticology 22:30, 21 Nov 2004 (UTC)
Re-read what I said. I wasn't saying the lack of basic words is a systemic bias, but that Wikipedia's project to combat systemic bias could serve as a model.
I disagree about the definitions; there are some good English dictionaries out there already. The public domain ones aren't as woefully out of date as public domain encyclopedias, so the fact that Wiktionary is Free isn't really enough. However, Wiktionary is attempting to do something new with regard to translations. But everyone has their own itch to scratch.--Eean 00:39, 22 Nov 2004 (UTC)
Hey, when I got here, we didn't even have definitions for the whole Swadesh List, let alone Basic English. There are still significant holes in the top 1000 by frequency (see user:dmh/playpen), particularly when you count words that just have auto-generated Webster 1913 entries (some of those are marked with stars in the list, but I didn't start doing that until it became clear that Poccil was going ahead and doing it because it was pronounced a "great idea"). The top 100 or so need special attention for completeness and consistency.
Along with others, I've spent quite a bit of time on such projects, and I'll probably get back to it. Lately, though, I've been taking a break and mainly working on interesting words as they come up, either in print or speech, or when someone adds an intriguing but incomplete entry. This is mainly easier than defining basic prepositions (which goes right to the core of some very interesting linguistics), though occasionally you'll see a perfectly good entry gunned down because some sysop believes it to be invention despite overwhelming evidence to the contrary. C'est la vie. -dmh 04:00, 22 Nov 2004 (UTC)
Any chance you could make a checklist of the Basic English list, so that we can check off those words that have a reasonable entry, and then attend to those that don't have any decent entry, as a COLLECTIVE effort, rather than putting it into your own Playpen?

That way, I'd be happy to tackle one word a day from the list, and tick it off. But right now, I have no way of knowing which words on the list are OK, and which words are lacking. (Oh F!*K. Have I just got another stupid cleanup idea into my head!!!!!! Damn!) Or can dmh tackle setting this up as a shared project ? Please.--Richardb 12:08, 12 Dec 2004 (UTC)

Basque to Spanish in en?

Was wondering if anyone knew what the story with Wiktionary:Basque_index_a is. It's a list of Basque words and their Spanish meanings. This is the English Wiktionary; shouldn't these pages be on the Spanish Wiktionary? --Eean 02:20, 22 Nov 2004 (UTC)

You're right, of course. It also applies to the rest of the alphabet. The language indexes are really nothing more than a want-list for new words. Ultimately, there should be no definitions at all on that list, just the bare names. Eclecticology 10:16, 22 Nov 2004 (UTC)
I see, the Spanish just came along for the ride. Makes sense. --Eean 01:06, 23 Nov 2004 (UTC)

Appendices versus categories

Dennis Valeev has set up a number of appendices that are lists of words with a common theme. I think their content really belongs in categories, as this is the function of categories rather than appendices. If they are left as appendices, we risk ending up with a huge list of appendices on the front page that will be nigh on impossible to find one's way through. — Paul G 11:07, 25 Nov 2004 (UTC)

Is this project new? There are lots of adjectives missing in English!

I notice that hordes of adjectives that someone might want to look up, like avaricious, noctilucent, vivacious, etc., are not present. Isn't there a faster way to get these listed from some public dictionary?

It is a new project, as stated on the frontpage. And yes, you can take from works in the public domain (which generally means from before 1929) or which are released freely like Wordnet. The OED took a few decades to release (23 years to get A-Ant... that wikipedia article is a good read BTW). And they weren't even bothering to define all languages (imagine that!). (BTW, where is a public domain OED?) So, yes, we're in it for the long haul. Do not take from works such as the American Heritage Dictionary, as I noticed someone did recently with noctilucent. Which leads to the next topic... --Eean 04:01, 28 Nov 2004 (UTC)

Adverbs in wiktionary

Question moved from my user talk by--Richardb 04:42, 28 Nov 2004 (UTC) Hello, I've found you in 'recent changes' and want to ask a couple of questions, for I don't have time to leaf through the rules of wiktionary. What do we have to do about adverbs? Do we create a separate article for each adverb, or should we incorporate all related words with the same root into one article and then make a redirect page for every related word? Thanks for your attention. Is there an irc channel for this purpose [relegation of difficult questions to someone for an immediate answer]? --Dennis Valeev 14:50, 21 Nov 2004 (UTC)

Personally I think if you see the need for a separate adverb page (translations are a good reason, for example), go ahead. I don't see the point in having pages which are nothing but "the adverb form of x". Basically going on the rule-of-thumb that if it isn't in a pulp dictionary, there should be some reason for it. --Eean 19:48, 29 Nov 2004 (UTC)
I'm not sure I understand what's special about adverbs here. I'm guessing you mean adverbial forms that are easily derived from other words, e.g., happy/happily or whatever. If so, this is a more general question.
The general rule, informally, is that we tend to leave out "predictable" terms. If you could figure out what a word or phrase means from its parts, then it will only get in if someone happened to put it in anyway. So if blarg is a verb, there probably won't be an article for blargs, blarging, blarged, or blarger, unless one of those has a particular idiomatic sense (e.g., we do have an entry for blues). Similarly, if blargy is an adjective, we probably won't have an entry for blargily. On the other hand, we probably should have an entry for swimmingly (as in "they got on swimmingly").
Unlike a print dictionary, we give related words — if they are mentioned at all — their own page. That is, if you think blargily is worth its own mention, you should create a page for it and list it under "Related Terms" with a line like *[[blargily]], and not as *blargily — in a blargy manner or such. One reason for this is so that someone looking up blargily can find it directly.
Words derived by affixes like un-, re-, -ment, -tion probably should be listed, because these aren't applied completely uniformly, and we need to document that it's, say, incapacitated and not uncapacitated or commutativity and not commutativeness. These are unpredictable in the sense that you have to just know which ones are used.
Finally, attestation matters. Logically speaking, re-uncollation (the act or process of taking out of collated order something which had been restored to collated order) is a word, but I doubt anyone has ever used it (and I'm not using it here, I'm mentioning it :-). We are under no obligation to include every possible derivation, alternate spelling, coinage, typo or nonce. On the flip side, we should include anything that has found its way into consistent, current usage, no matter how shady its pedigree.
An extreme example would be the multi-thousand-letter chemical names that people like to trot out as the "longest word in English." In theory, there are infinitely many such, but in practice only a relative few are published anywhere, and only a very few are seen outside specialized texts. I'm of two minds. On the one hand, I doubt that any of these sees use in real text or speech ("Hey Ralph, pass me that vial of acetylseryl..........."). On the other hand, if someone really wants to take the time to make an entry for such a term and proofread it, do we really care? -dmh 21:33, 29 Nov 2004 (UTC)
I've been reading some books on lexicography. It seems that:
  • easily derived forms like adverbs don't get full entries because of space reasons. (most failings of dictionaries are a consequence of lack of space: luckily Wiki is not paper)
  • the practice of adding run-on entries (e.g. the list of words in -ly, -ment, -ness, etc. at the end) is used mainly to swell the entry count (they're "implicitly defined" just by being thus related to the root).
However, a lot of regularly formed adverbs do pick up additional senses beyond those of their root, but many dictionaries (again, mainly because of space) leave them to be picked up by context. If the only definition is going to be "adverb form of x", then it probably won't need its own entry, but if there's actually going to be stuff on xly that isn't going to be on x, it should have its own page. —Muke Tever 16:26, 1 Dec 2004 (UTC)
Yes, exactly. --Eean 02:28, 3 Dec 2004 (UTC)
If people want to add separate pages for adverbs they should feel free to do so without fear of being flamed or having their work deleted. It may be a waste of their own time but it's not wrong. The interesting thing about the "blargy" example is that it is not clear that it should be "blargily" rather than "blargyly". :-) Eclecticology 01:18, 6 Dec 2004 (UTC)
Right, though if it's not formatted correctly I don't bother fixing it and just turn it into a redirect. --Eean 04:32, 6 Dec 2004 (UTC)
Why do you need to do that? If you can't be bothered to fix it, there's no point being bothered to change it into a redirect. If formatting is the only problem leave it for someone else to fix. Eclecticology 04:44, 6 Dec 2004 (UTC)


Any thought to having stats on how many times a page is viewed ?

Would be interesting to have a view counter on each page, to get stats on how often a word / an article is viewed. Or something providing say the 100 most popular words, or searches.

Anyone had any ideas on this ? Anything in other wiki projects ? --Richardb 12:42, 30 Nov 2004 (UTC)

It's a feature already built into the mediawiki software (in fact it's on by default, see e.g. my wiki at wiki.frath.net along the bottom of the page). There's also a [[Special:Popularpages]] specialpage that lists the most visited pages. But these are turned off in wikimedia projects for performance reasons. —Muke Tever 16:48, 30 Nov 2004 (UTC)
Thanks for the info--Richardb 10:46, 1 Dec 2004 (UTC)

Please bear with me as I try to engineer an improved Cleanup and Deletion process

Based on my many years as a business process analyst, I think we can have a much more understandable, simpler to operate and administer Cleanup and Deletion process.

I've started work on it, but can only give an hour or so to it each day.

So, please bear with me if you see my stuff all over the place. Just give me a few days to complete it before rolling it into operation.

It's just a little difficult to do this when

  • everyone can check in on your work as you hack it together - including when I made a couple of false starts.
  • trying to keep some consistency with the old process, for a while at least, while providing an improved process.
  • trying not to step on the toes of whoever developed the processes as they are at the moment. I hope they don't mind the Wiki improvement process being applied to their processes, as well as to their words, entries etc.

Please bear with me till it's a bit more complete. Thanks--Richardb 13:25, 1 Dec 2004 (UTC)

If you want to see something of how the new process is developing, go to Wiktionary:Cleanup and deletion process/index--Richardb 13:15, 2 Dec 2004 (UTC)
The new Cleanup process is now in place. Nothing earth shattering, but hopefully a little more consistent, easier for users to grasp. I hope!--Richardb 15:14, 6 Dec 2004 (UTC)

In trying to improve the Cleanup process I wanted to make more use of Help pages. The "Editing help" link opens a new window. What is the syntax for that?--Richardb 10:33, 2 Dec 2004 (UTC)

Not doable. The Editing help link is hard-coded in HTML, but the wiki doesn't allow HTML links, and has no syntax for generating new window links. This is probably because new window links are considered harmful. If a user wants a new window, they probably already know how to do it manually with their browser. —Muke Tever 12:37, 3 Dec 2004 (UTC)
Basically disagree. Getting help on something is a specific reason for opening a new window. If you go out of the context of what you are already reading then it becomes too hard to juggle - for most users. The other alternative is to treat the page as paper, trying to include all the detail into one long page.
As to "not Do-able", well, I think that it probably is do-able, someone somewhere probably does know how to do it, the Wiki engine being so powerful, but not so well documented!--Richardb 02:28, 5 Dec 2004 (UTC)
I agree, the behavior that people expect is for it to open a new window when they click for editing help, as this is the standard. But yeah, judging by the email, it's not currently possible in MediaWiki. --Eean 04:36, 6 Dec 2004 (UTC)


What happens when you delete all the content of a page ?

I might turn my attention to marginally cleaning up/ tidying up/ improving documentation on the "Deletion Process", as I've just done for the Wiktionary:Cleanup process. But there is one area I don't know enough about, so I wonder if someone in the know could explain, please: what happens when you completely erase the contents of a page?

For example, I've just renamed/moved a page [[Wiktionary:Abbr's in Webster]], to Wiktionary:Abbreviations in Webster, and then deleted the content of the resulting Redirect page (and corresponding Talk: page). What do I then have to do to get that now useless page fully deleted? Or will it be somehow "automatically" deleted some time in the future, it having no content?

If the case of a deleted redirect page is somehow special, what happens for general pages? Those pages cannot automatically be deleted if they have history. So does some administrator periodically review all Empty pages, and then delete them fully? If you know where this is documented, please point me in the right direction. If you know how the process works, please give me a brief explanation. Thanks. --Richardb 11:34, 8 Dec 2004 (UTC)

You probably shouldn't delete redirects. For people that have bookmarks or links from outside sites, suddenly they find that their links or bookmarks are broken and don't know where to find the original page. The only good reason for removing a redirect that I know of is if you plan to put a different page at the same title.
No pages are automatically deleted. The power to delete pages is reserved for sysops and bureaucrats. There's no utility to specifically display a list of empty pages for a sysop's convenience: there is a Special:shortpages, which empty pages would rise to the top of, but, just like many of the other special pages, its functionality is disabled for performance reasons, displaying cached data instead (which is apparently updated only whenever the software that runs the website is upgraded).
The case of a redirect page is no different than the case of a regular article. A redirect page is just an ordinary article whose text consists of the redirect directive (and optionally a comment explaining the redirection). —Muke Tever 16:41, 10 Dec 2004 (UTC)
I amend that bit about redirects being no different a little — a redirect is specifically not counted in the number-of-article counts by the software. There may be other differences ... I think they don't come up as random pages either, but I'm not sure of that. —Muke Tever 17:06, 10 Dec 2004 (UTC)
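To illustrate the point that a redirect is just an ordinary page whose text is the redirect directive, here is a minimal Python sketch. It assumes the directive has the usual `#REDIRECT [[Target]]` form; the function names `redirect_target` and `count_articles` are made up for this example, and the counting rule is a simplification, not the software's actual logic.

```python
import re
from typing import Dict, Optional

# Assumed shape of the redirect directive: "#REDIRECT [[Target]]" (case-insensitive).
_REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(page_text: str) -> Optional[str]:
    """Return the target title if the text begins with a redirect directive, else None."""
    match = _REDIRECT_RE.match(page_text)
    return match.group(1).strip() if match else None


def count_articles(pages: Dict[str, str]) -> int:
    """Count pages that are not redirects.

    Simplified on purpose: the real article count may apply further
    criteria that are not modelled here.
    """
    return sum(1 for text in pages.values() if redirect_target(text) is None)


if __name__ == "__main__":
    pages = {
        "Wiktionary:Abbr's in Webster": "#REDIRECT [[Wiktionary:Abbreviations in Webster]]",
        "Wiktionary:Abbreviations in Webster": "A list of abbreviations...",
    }
    for title, text in pages.items():
        print(title, "->", redirect_target(text))
    print("articles:", count_articles(pages))
```

Note that blanking the redirect page removes the directive, so under this sketch the page would stop being treated as a redirect and become an ordinary (empty) page; it would not be deleted.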

ie:

Well, I was going to create an entry ie: (just a redirect to i.e. ), but Wiktionary won't let me, as it appears ie: has a special meaning to Wiktionary. How do we get around that ?--Richardb 14:02, 8 Dec 2004 (UTC)

You can't. ie is the ISO 639 code for Interlingue, and thus links starting with ie: are interpreted as interwiki links to the Interlingue Wiki-Vocabulárium (notice how ie: by itself here is a link to the main page). Probably you should just put the link at "ie", not ie:. —Muke Tever 16:36, 10 Dec 2004 (UTC)
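As a rough illustration of why a title such as ie: cannot be created, the following Python sketch splits a link on its first colon and checks the part before it against a set of language prefixes. The prefix set and the function name `split_interwiki` are hypothetical and chosen only for this example; the real software's prefix table and parsing rules are not reproduced here.

```python
from typing import Optional, Set, Tuple

# Hypothetical, deliberately tiny set of language prefixes for illustration only.
INTERWIKI_PREFIXES: Set[str] = {"ie", "fr", "de", "es"}


def split_interwiki(
    link: str, prefixes: Set[str] = INTERWIKI_PREFIXES
) -> Tuple[Optional[str], str]:
    """Split a link into (interwiki prefix, remaining title).

    If the text before the first colon is a known prefix, the link is
    treated as pointing at another wiki; otherwise the whole link is a
    local title and the prefix is None.
    """
    prefix, sep, rest = link.partition(":")
    if sep and prefix.strip().lower() in prefixes:
        return prefix.strip().lower(), rest.strip()
    return None, link.strip()


if __name__ == "__main__":
    for link in ["ie:", "ie", "i.e.", "Wiktionary:Appendices"]:
        print(repr(link), "->", split_interwiki(link))
```

Under these assumptions "ie:" resolves to the other wiki with an empty title (its main page), while "ie" and "i.e." remain ordinary local titles, which matches the suggestion to place the entry at "ie".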


List of All Appendices

In the Main Page section Appendices, the last two entries are (View All) and (Edit). But both link to Wiktionary:Appendices--Richardb 13:01, 10 Dec 2004 (UTC)

Wiktionary:Appendices was actually a list of Writing Systems and Alphabets, and I moved/renamed it to Wiktionary_Appendix:Writing Systems & Alphabets.
We certainly do need a list of all Appendices, preferably some form of automated list. Any ideas?
This effort appears to have created a complex series of recursions. Linking the problem in several places has tended to make it even more complicated. I'll see what I can do to clean this up. Based on the contents of the page, moving Wiktionary:Appendices to Wiktionary Appendix:Writing Systems & Alphabets was the correct thing to do. The former, however, should not have been kept as a redirect. Instead it should contain a link to the template and nothing more. The template can then be amended to eliminate potentially recursive links. I will also be checking the "What links here" to eliminate other difficulties that have arisen from this effort. No blame is attached to what has happened. Eclecticology 19:49, 12 Dec 2004 (UTC)
Reviewing some of these appendix pages leaves me with the impression that some (but not all) might be better treated through the category process. Eclecticology 08:40, 13 Dec 2004 (UTC)

Sub-Project to rename a lot of Template pages.

I propose to move/rename the [[Wiktionary:Basic English template]] (and all other related Template pages), as this is not a template at all. I will rename it to "Basic English list", which is what it is.

It confused the hell out of me for a while when I started. I looked for a template to create an English entry, and I got this strange so-called "template".

It's going to take me a time to rename all those pages, and refix all the links. But, unless basic concepts are upheld, Wiktionary will remain confused.

I'm giving notice now, and will do it in the next few weeks. Unless anyone has a good argument against it.

I've set up a page Wiktionary:Sub-Project -Template renaming to conduct any discussion around this sub-project.--Richardb 05:43, 12 Dec 2004 (UTC)

Subsequent discussion here moved to the project discussion page Wiktionary:Sub-Project -Template renaming

Handling Neologisms through category code instead of lists

To stub or not to stub

Page Wiktionary:Find or fix a stub seems to be rather aggressive on the need to delete stubs. Elsewhere we seem to support the idea that stubs are to be kept, if they convey any useful information at all. I would suggest that page needs updating to keep in line with current Wiktionary thinking, rather than the historical Wikipedia desire to reduce stubs. What do you think?--Richardb 12:58, 20 Dec 2004 (UTC)

{{sofixit}} as they say. I brought the article more in line with the active policy I think. --Eean 19:33, 21 Dec 2004 (UTC)

Proposal for Project to move to more structured policies and guidelines

I am proposing a project to move toward the kind of structured policies and guidelines of the more mature project Wikipedia. Please see User:Richardb/Project - Structuring Policies and Guidelines and add your comments, discussion there.--Richardb 13:53, 20 Dec 2004 (UTC)

Let's try to wrap up this Protologisms argument - please "Vote"

Can we please try to make some sort of collective decision on Protologisms? Please go to Wiktionary:Beer parlour/Protologisms to see the condensed arguments, and then make a decision and "vote", unless you

    • are not interested in this topic, or
    • still think there is more to say on the topic, in which case please say it.
Thanks--Richardb 05:46, 24 Dec 2004 (UTC)