
Wiktionary:Beer parlour/2007/November

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.


Old English script template?

Should we use a script template for Old English terms? I've seen some entries let the default fonts apply (which I prefer), but some use {{unicode}}, presumably to maximize the likelihood of showing letters like þ, ð, and ȝ properly, but the "unicode" font set seems too generic. Should we use a specific font set/script template for Old English? Rod (A. Smith) 23:54, 1 November 2007 (UTC)[reply]

Originally I was using {{unicode}}, but then someone (I forget who) asked me to use {{Latinx}} instead. They both look the same to me. You need to use one of them though, otherwise the macrons look horrible. Widsith 10:35, 3 November 2007 (UTC)
Excellent. Thanks, Widsith. Wiktionary:About Old English now mentions {{Latinx}}. Rod (A. Smith) 19:07, 3 November 2007 (UTC)[reply]

Greek derivations.

I think {{Gr.}} (Greek derivations) should be deprecated in favor of {{AGr.}} (Ancient Greek derivations) and {{†MGr.}} (Modern Greek derivations). For everything else here, we use "Greek" to mean "Modern Greek", and that makes sense for most purposes, but for etymologies, editors here frequently use "Greek" to mean "Ancient Greek", because this is how other works do it and they don't know that we do otherwise. This results in essentially erroneous content.

So, I'm not proposing any change to the templates or anything, just seeking agreement that this is how we want to do it. (Greek derivations will then essentially be a cleanup category, in that it shouldn't contain individual entries, only subcategories, and any entries in it will be for-fixing.)

Thoughts?

RuakhTALK 00:15, 2 November 2007 (UTC)[reply]

Seems good to me. Am I right in thinking that about 95% of the {{Gr.}} entries should be {{AGr.}}? If someone could locate the handful of modern-Greek derivations in there, the rest of the cleanup could be automated. -- Visviva 14:10, 2 November 2007 (UTC)
Good to me as well, and 95% could well be 99%! —SaltmarshTalk 14:37, 2 November 2007 (UTC)[reply]

Yeah, the mistake of using {{Gr.}} when {{AGr.}} is meant is made all the time by numerous editors (myself included, probably). This is an excellent solution. However, I’m a little wary of the calls for automation. (Though I understand that the clean-up task is formidably large if we don’t allow at least some automation; oh, what to do?)  (u):Raifʻhār (t):Doremítzwr﴿ 23:41, 3 November 2007 (UTC)[reply]

I think the confusion stems from an import from a public domain dictionary that uses "Gk." to mean "Ancient Greek". In any event, I support the proposed cleanup. Rod (A. Smith) 00:05, 4 November 2007 (UTC)[reply]
Can we redirect {{Gr.}} to {{MGr.}}? This would allow us to tell people to use MGr. but provide a workaround for the mistake. (And yes, it's about 99%; I started on them some time back but it's a dreadful task.) However, I think doing the cleanup by hand is wise, as most of the ones I touched needed formatting changes, corrections of the word, the transliteration, etc. ArielGlenn 04:52, 6 November 2007 (UTC)

1,024 pages link to {{Gr.}}; the vast majority of those will be due to inclusion in entries.  (u):Raifʻhār (t):Doremítzwr﴿ 14:05, 6 November 2007 (UTC)[reply]
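
The automated part of the cleanup discussed above could look roughly like the following Python sketch. It assumes someone has already compiled, by hand, the short list of entries whose {{Gr.}} really means Modern Greek; the `MODERN_GREEK_ENTRIES` set and the `retag_greek` function are hypothetical names, not an existing bot. It only renames the template, so the formatting and transliteration fixes mentioned above would still need a human.

```python
import re

# Hypothetical, hand-compiled list of entries whose {{Gr.}} means Modern Greek.
MODERN_GREEK_ENTRIES = {"moussaka"}

# Matches {{Gr.}} and {{Gr.|xx}}, but not {{AGr.}} or {{MGr.}}.
GR_TEMPLATE = re.compile(r"\{\{Gr\.(\|[^{}]*)?\}\}")

def retag_greek(title: str, wikitext: str) -> str:
    """Rename {{Gr.}} to {{AGr.}}, or to {{MGr.}} for hand-listed entries."""
    new_name = "MGr." if title in MODERN_GREEK_ENTRIES else "AGr."
    return GR_TEMPLATE.sub(
        lambda m: "{{" + new_name + (m.group(1) or "") + "}}",
        wikitext,
    )
```

Keeping the Modern Greek exceptions in an explicit list means every automated edit defaults to {{AGr.}}, which matches the estimate above that almost all existing uses are Ancient Greek.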

Is there some reason that (modern) English seems to be about the only major language without its own etymology category? cuanto gets {{L.|es}} but equine, an English derivative of Latin, just uses {{L.}} and puts it in the general category. This seems both less informative and less consistent, but I can't believe no one has thought of it before now. Was I wrong to want to create Category:en:Latin derivations? As a side note, why do the etymology-related categories use the xx: prefixes? I thought those were for topical or meaning-related categories. Dmcdevit·t 07:21, 2 November 2007 (UTC)

It's confusing, but Category:Latin derivations is supposed to be only Modern English words with Latin derivations. ":en:" is not used in our category naming system. Rod (A. Smith) 07:38, 2 November 2007 (UTC)[reply]
Even though I don't really like the inconsistent category tree levels this creates between English and others, I guess that makes sense to me for something like Category:Mammals, but here, it really means something different grammatically to be a Latin derivation in Arabic or English. It's not like there are English forms, and then translations of them in all the other subcategories, like in Category:Mammals. Maybe it would be better if we weren't using the "xx:" categories at all for these etymology categories. Dmcdevit·t 09:19, 2 November 2007 (UTC)[reply]
So you are proposing we rename all the categories that exist and rework all the current uses of the templates and pages that use them? --EncycloPetey 13:12, 2 November 2007 (UTC)[reply]
We wouldn't need to rename anything or rework any pages proper; we'd just need to create en: counterparts for these categories, inside the main categories, and edit the templates to infer en as their first parameter if none is specified. (I'm neutral on whether we should do this, but if we do decide to do it, it's an easy bot job with just a few hundred edits — though some of those templates should only be modified at off-peak hours and not all at once, to avoid slamming the job queue — I could handle it.) —RuakhTALK 16:29, 2 November 2007 (UTC)[reply]
That's the opposite of what Dmcdevit was saying: "Maybe it would be better if we weren't using the "xx:" categories at all for these etymology categories". That would mean eliminating all the "xx:" prefixes and renaming all the categories. --EncycloPetey 12:14, 3 November 2007 (UTC)[reply]
Sorry, he was saying two different things, and somehow I went crazy and I decided you were "asking" about one when you were actually "asking" about the other. My bad. —RuakhTALK 17:30, 3 November 2007 (UTC)[reply]
Although the xx: form clearly isn't ideal here, using the long name for both languages creates confusion: what language would entries in the category "English Latin derivations" be? ("Latin derivations in English" would be better, but it's kind of wordy and still not perfectly clear.) -- Visviva 14:07, 2 November 2007 (UTC)[reply]
I'm not sure if I'm making that specific proposal, but it seems to me that "English mammals" (or whatever) doesn't make sense, because that's about meaning, and a mammal is a mammal in any language. "Latin derivations," however, does not mean the same across languages, and carries less information (and makes navigation more difficult) when we don't specify if we mean English or not. We should come up with a solution for that. The current Category:Latin derivations is actually probably a mix between English entries and non-English entries where no one gave the template a parameter. Dmcdevit·t 18:22, 2 November 2007 (UTC)
The distinction between "Category:[Language]..." and "Category:xx:..." is not made based on whether it "makes sense". The distinction is that "Category:[Language]..." is used for the language as a whole and its appendices, and for categorizing entries by part of speech. We use "Category:xx:..." for all other categories; these primarily include topical, etymology, and usage categories. --EncycloPetey 12:19, 3 November 2007 (UTC)
I've been going through Category:Latin derivations for the last couple days and nearly all the entries I've seen are English language entries. I've corrected a few dozen cases where non-English entries were in the category, creating a few new derivation categories in the process. The one thing that is a little confusing that I haven't done anything about yet are the "Translingual" entries like Scomber. Currently, they are mixed in with the English entries and account for many of the capitalized entries at the beginning of the category. Mike Dillon 19:53, 5 November 2007 (UTC)[reply]
Translingual derivations from Latin are a problem. Does anyone have a means to deal with these? It will affect the etymology for many of the scientific names of organisms. (which is to say, it will affect thousands of entries) --EncycloPetey 00:03, 6 November 2007 (UTC)[reply]
I think we can treat it like any specific language, with mul as the language code. (I'm not sure what the reasoning behind that code is — I'm guessing it stands for "multiple" or something — but we already have {{mul}}, and it's already in use in a few dozen entries.) —RuakhTALK 01:14, 6 November 2007 (UTC)[reply]
"mul" is the ISO 639-3 code for "multiple". Pretty cool, huh? Rod (A. Smith) 01:40, 6 November 2007 (UTC)[reply]
This all sounds fine to me. The funny thing is that there are no categories for anything "Translingual" as far as I can see (cf. Special:Prefixindex/Category:Translingual). We'd have to create Category:Translingual, Category:mul:Etymology, and Category:mul:Latin derivations. Mike Dillon 04:12, 6 November 2007 (UTC)[reply]
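
To make the two naming schemes in this thread concrete, here is a small Python sketch of how a derivation template might choose its category. The function name `derivation_category` and the `infer_english` switch are assumptions for illustration only; the real templates are wikitext, not Python. With `infer_english` off it mirrors the current behaviour described above (no language parameter means the general category); with it on it mirrors the suggestion of inferring en when no parameter is given.

```python
from typing import Optional

def derivation_category(source: str,
                        lang: Optional[str] = None,
                        infer_english: bool = False) -> str:
    """Return the category name for a derivation from `source` in language `lang`."""
    # Proposed behaviour: treat a missing language code as English.
    if lang is None and infer_english:
        lang = "en"
    # Current behaviour: no language code puts the entry in the general category.
    if lang is None:
        return f"Category:{source} derivations"
    return f"Category:{lang}:{source} derivations"
```

Under this sketch, Translingual entries such as Scomber would simply pass `mul` as the language code and land in Category:mul:Latin derivations.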

Template:wikipedia above English header

Why? Links to other language wikis go under the language header, and in articles such as Timişoara with large tables of contents, the {{wikipedia}} box is separated a great deal from the other relevant information. It's a little box floating in the middle of nowhere and it looks kinda strange. — [ ric | opiaterein ] — 16:46, 2 November 2007 (UTC)[reply]

I believe that's simply a mistake. The {{wikipedia}} box should go within a language section — and other language sections should have links to the article in the relevant-language Wikipedia, if it exists. —RuakhTALK 03:29, 3 November 2007 (UTC)[reply]
It's not a mistake; it's a common but inconsistent practice. Since we often have an image and a {{wikipedia}} link and because these can interfere with the Translations section, we often have the English {{wikipedia}} link or the image preceding the English header to correct formatting. We've discussed standardization of this halfheartedly in the past, but never set a specific policy. Likewise, we've never decided whether to allow, encourage, disallow, or discourage links to the other language Wikipedias within appropriate sections. Though, when such non-English links are present, it would make sense to include them within the appropriate language section. --EncycloPetey 12:25, 3 November 2007 (UTC)[reply]
There’s a way of adding some code which prevents large images, Wikipedia link boxes, and the like from pushing trans-tables and rel-tables to their underneaths; see the Image & box “floating” table within the main table at User:Doremítzwr#Useful fragments of code.  (u):Raifʻhār (t):Doremítzwr﴿ 17:34, 5 November 2007 (UTC)[reply]
They really ought to go consistently within the English section; any program extracting sections needs to know from the structure what section it belongs to; this includes bots, mirror conversions, and things like Hippietrail's extension to display selected languages in different tabs (colours/whatever). The RHS formatting, including the (awful) default "[edit]" links, is a design issue that could use some work. Trying to fix that by pushing the WP template around on individual entries is not terribly useful ;-) Robert Ullmann 14:07, 3 November 2007 (UTC)
The only all-encompassing solution to these issues I've seen has been to use {{projectlinks}} in a ===Further reading=== section. There is no way to use {{wikipedia}} that satisfies all concerns. When there are more than three headings, it comes closest to making sense, if placed above the ==English== heading. Every other combination is problematic. Bots generally ignore these; even when they don't, they do know the assumption that English language sections come before other languages. While the Wikipedia boxes seemed clever at first, in the long run, they've only been a hassle. --Connel MacKenzie 17:18, 5 November 2007 (UTC)

{{ambitransitive}}

I see this on a few pages. My heart just sinks. Is this really a commonly-used term? I've never come across it before. I wonder whether it is really appropriate to use unnecessarily technical tags. Transitive or intransitive seems better to me – for English at least. Widsith 10:42, 3 November 2007 (UTC)

I've never come across this term either. I'm not sure whether or not it would be useful, since transitive and intransitive grammar of usage will differ in English, even if the morphology of the verb does not. I suspect this is also true for the various Romance languages. --EncycloPetey 12:12, 3 November 2007 (UTC)[reply]
If there's a Wikipedia article on it, I wouldn't even worry about it. (The article also talks about Romance languages, it was pretty interesting.) — [ ric | opiaterein ] — 15:43, 3 November 2007 (UTC)[reply]
While I understand what you mean, I'll add a caution that Wikipedia also has an article on "Proper adjectives" and other abominations that should never appear here. Their article on Proper nouns is very poor (it is a minuscule, incomplete, and misleading section of the Noun article!). --EncycloPetey 22:00, 3 November 2007 (UTC)
The meaning is very obvious, though — ambi- is a pretty common prefix known to most fairly well-educated people to mean “either” or “both”, as in ambiguous and ambidextrous; the “transitive” part needs no explanation. If there is anyone who can’t figure it out, there’s always the link to a definition.  (u):Raifʻhār (t):Doremítzwr﴿ 00:02, 4 November 2007 (UTC)
Very obviously notaword, you mean. This is exactly the sort of disruption that can't be construed as helpful no matter how optimistic one tries to be. How does your invention help readers understand what you are trying to say? The default format is to include neither {{transitive}} nor {{intransitive}} when the verb sense functions as both. Items that have been reworded to make such distinctions are supposed to be recombined. Adding contrived jargon is pointedly obfuscating. I see no possible justification for the addition of this nonsense. --Connel MacKenzie 17:29, 5 November 2007 (UTC)[reply]
The link was provided in the off-chance that a reader were to become confused. {{ambitransitive}} is a far neater and more elegant solution than {{transitive|or|intransitive}}. However, if a verb is meant to be left untagged if it is ambitransitive, then that means that neither {{ambitransitive}} nor {{transitive|or|intransitive}} need ever be used, meaning that there is no bone of contention and no disruption. Only your writing “[t]he default format is to include neither {{transitive}} nor {{intransitive}} when the verb sense functions as both” was necessary.  (u):Raifʻhār (t):Doremítzwr﴿ 17:40, 5 November 2007 (UTC)[reply]

My point is only that we should try and use terminology which conforms to most other authorities on the subject. This term doesn't seem common in any grammar or linguistics books that I own. Widsith 10:47, 4 November 2007 (UTC)[reply]

Tbot flag vote

Tbot updates and adds the {{t}} templates in Translations sections. It has been extensively discussed in the WT:GP.

I've gotten it to the point where it does most of what it should, and has been tested. An earlier version was run in April, and again in August, when there were many fewer templates.

See documentation at User:Tbot.

Vote is Wiktionary:Votes/bt-2007-11/User:Tbot for bot status.

Comments, always good, User talk:Tbot. Robert Ullmann 08:59, 5 November 2007 (UTC)[reply]

I am running short tests, getting the regex worked out. Want to catch things like explicit links to FL wikts. Robert Ullmann 23:23, 6 November 2007 (UTC)[reply]

You may note Tbot creating some translation entries. This is experimental; all such entries are in Category:Tbot entries. This function is not part of the present vote. When I have something reasonable set up to look at, it can be discussed and then voted on separately. Robert Ullmann 01:24, 9 November 2007 (UTC)[reply]
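
For readers unfamiliar with what converting a Translations line to {{t}} involves, the Python sketch below shows the general idea. It is not Tbot's actual code, which is not reproduced on this page; the `LANG_CODES` table, both regexes, and the `templatize` function are assumptions made for illustration. Plain links such as [[chat]] are wrapped in {{t}}, while links containing a pipe, colon, or anchor (for example explicit links to foreign-language Wiktionaries) are deliberately left untouched for separate handling.

```python
import re

# Hypothetical subset of language-name-to-code mappings.
LANG_CODES = {"French": "fr", "German": "de", "Spanish": "es"}

# A translation line of the form "* Language: ...".
LINE = re.compile(r"^\*\s*(?P<lang>[^:]+):\s*(?P<rest>.*)$")
# A plain wikilink with no pipe, anchor, or interwiki/namespace colon.
PLAIN_LINK = re.compile(r"\[\[([^\[\]|#:]+)\]\]")

def templatize(line: str) -> str:
    """Wrap plain links on a translation line in {{t|code|word}}."""
    m = LINE.match(line)
    if not m:
        return line  # not a translation line
    lang = m.group("lang").strip()
    code = LANG_CODES.get(lang)
    if code is None:
        return line  # unknown language: leave for a human
    rest = PLAIN_LINK.sub(
        lambda link: "{{t|%s|%s}}" % (code, link.group(1)),
        m.group("rest"),
    )
    return "* %s: %s" % (lang, rest)
```

Returning unrecognised lines unchanged is the conservative choice here: anything the patterns do not clearly understand is left as it was.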

Smooth out Main Page

I know that the main page was revamped in mid-2006, but I feel that it still has to evolve a long way before it is good. The current design has several issues, with the content and the style. Firstly, it is very dark and blocky, and the off-white backgrounds really give a very dingy look to our front page. Some of the links, particularly Selected Entries, point to places which, although they sound good, don't actually have much value.

I have, as seems to have been done in the past, made a few alterations and saved them at User:Conrad.Irwin/Main Page; I think we should settle on something a little more professional looking. I am certainly not a graphic designer or even a passable writer, so feel free to utterly change my layout and text, but I do think we need something lighter and more open. One of the major problems, which I have not looked much at resolving, is the ugliness of the list of wiktionaries; I think that a more neatly aligned one would be better, even if it took up a little more space. Perhaps we could wrap some of the list in auto-hiding tables.

(Incidentally, I have only tested in Firefox (800x600, 1024x768, 1680x1050), Konqueror and Opera.) Conrad.Irwin 00:22, 6 November 2007 (UTC)

I think the boxes are ugly. :-D other than that, I think it's decent. I almost never look at the main page anyway. — [ ric | opiaterein ] — 00:37, 6 November 2007 (UTC)[reply]
It's lost some of the useful searching techniques in removing the various additional alphabets. It looks a little too simple, but that might not be a bad thing. It does need some formatting tweaks, and maybe some icons to direct visual attention, but I'm not sure how best to do that. --EncycloPetey 05:47, 6 November 2007 (UTC)
This conversation should probably move to User talk:Conrad.Irwin/Main Page Conrad.Irwin 10:34, 6 November 2007 (UTC)[reply]
I think that Main Page redesign efforts have faltered in the past for lack of exposure/transparency. Because smaller, incremental changes can't really be done with the current protected setup, an annual redesign effort is probably a good thing. I think most regular contributors forget that Main Page is even there, much of the time. (I confess, I got a little sick of it, when trying to stay on top of the WOTD entries.) As always, it is healthy to review what other language Wiktionaries have done with their main pages in the last year. Fr: is eye-candy, de: is quite pretty, vi: has the search input box repeated at the top, prominently (someone recently requested that here, didn't they?) I think this is a good start; I'd like to see where it goes. --Connel MacKenzie 15:49, 6 November 2007 (UTC)
As lack of exposure does seem to be a problem, can a link to User:Conrad.Irwin/Main_Page be added to the top of this page, near where the policies are? Would it be better if the page were moved to Wiktionary:Main Page/Redesign 2007? That would allow a "once a year" revamp reasonably easily. Conrad.Irwin 13:45, 7 November 2007 (UTC)

Indo-European v. Proto-Indo-European derivations

Can anyone say why there is a split between Category:Indo-European derivations and Category:Proto-Indo-European derivations? Mike Dillon 04:33, 6 November 2007 (UTC)[reply]

Quotations formatting.

Please see User:Ruakh/quotations and {{User:Ruakh/quotations-top}}.

Firstly: is this (or something like it) something that we want to do sometimes? Secondly: if so, then when do we want to do this, and how do we want to format it?

I'm thinking that we'll want a three-option scheme:

  • uncollapsed quotation after definition, as currently (if just one)
  • collapsed quotations after definitions, as at User:Ruakh/quotations (if 2–5 or so)
  • quotations on a citations page, as currently (if >5 or so)

with possibility for blending (e.g., the after-definition quotations can end with a link to the relevant part of the citations page), but I know that at least one editor doesn't like this approach and wants to use "Quotations" sections broken down by sense. (Personally, I think "Quotations" sections should be reserved for the case where it's not clear what sense a quotation belongs to, since the collapse-y box obviates any other use.)

RuakhTALK 06:45, 6 November 2007 (UTC)[reply]

I love it personally. In fact to my mind it makes the citations sub-page all but redundant. Widsith 10:20, 6 November 2007 (UTC)[reply]
As do I. Regarding when we’d want to use it: I don’t think we need to stick numerical limits on its use. It will depend a lot on the actual amount of space a given sense’s quotations take up. As soon as quotations become distracting or unæsthetic, move them to the quote-tables; as soon as the sum of quotations in an entry causes load time to become noticeably longer, move them to a subpage (a quotations subsection won’t help with this one), leaving only copies of a few particularly illuminating quotations in the entry itself. I don’t see that general rule causing many arguments.  (u):Raifʻhār (t):Doremítzwr﴿ 14:30, 6 November 2007 (UTC)[reply]
The Citations namespace was already voted on. They are in no way redundant. We may very well have citations collected which cannot be placed into an existing definition sense, or whose use in one or another sense is contentious.
As far as I'm concerned, the goal should always be towards a citations page loaded with reference citations, since that forms the data that underpins dictionary definitions. On the entry itself, we should select one (or two) quality citations to place under a single definition sense. Collapsibility works best if limited to the Quotations section, and even there it is limited use, since a long list of them should be moved to a Citations page. I don't think any of this fuss is really worth the effort. --EncycloPetey 14:10, 6 November 2007 (UTC)[reply]
Both quotations subsections and citations subpages have the problem of introducing “redundant user-maintained data fields” into entries (in database construction, a cardinal sin, apparently). Until a means is found to program the software to copy definitions to glosses, this problem with quotations subsections and citations subpages alike will be a major one — of particular concern to highly polysemic words. (I personally like the subpages, but this shortcoming cannot be denied.) Of course, none of this applies to quotations of words which do not yet satisfy the CFI, which therefore do not have senses written for them, and can in any case only be kept in the Citations: namespace. I am more than slightly surprised that you cannot see the brilliant utility of this new innovation of Ruakh’s.  (u):Raifʻhār (t):Doremítzwr﴿ 14:21, 6 November 2007 (UTC)[reply]
Obviously, you are too new to remember the various wheel wars over using templates at all. This is horrific. The over-complexity of the collapsible sections within definitions is no simplification at all. The addition of HTML elements is likewise horrific. The interspersing of complex HTML constructions within a numbered list is guaranteed to break with the next random HTMLTidy update. The Citations: namespace is where these belong - having a small number of very relevant citations near the definitions is a good thing, but most of the time, is better when arranged in a ===Quotations=== section, "disambiguated" by "gloss." Note that "gloss" in this context is an un-ambiguous reduction of the definition, not the definition repeated. Going through astronomical gyrations to achieve interspersed collapsible tables leaves our already difficult (for newcomers) syntax utterly incomprehensible. While limiting all input to three or four people who understand the syntax, you (inadvertently) raise the barrier to entry, no longer a matter of minor formatting, newcomers would no longer have the ability to provide the citations they have found, with such a completely incomprehensible counter-intuitive syntax. On the other hand, the collapsible sections within the ===Quotations=== sections have been discussed and agreed on, as obvious improvements. I do not comprehend the rationale of purposefully working against the ===Quotations=== heading by interspersing large volumes of quotations within definitions. Perhaps it was simply a mistake to allow quotations to be placed between definitions at all. But the farther from plain-text we get, the worse it is. The more template levels of opacity we add, the worse it is. --Connel MacKenzie 15:18, 6 November 2007 (UTC)[reply]
I disagree that this adds unwieldy complexity. The proposed templates are only as complex as rel-tables and trans-tables (assuming, of course, that the various parameters such as examples=, nopadding=, and nocollapse= are not present in the final version of the template) — and they, like this one, add a great deal of functionality to an entry. Conversely, the problem of “redundant user-maintained data fields” has still not been addressed. Whatever issues there were with using templates in the beginning, I think it’s safe to say that there has been a major change of opinion since then. BTW, I don’t understand what you mean by “the next random HTMLTidy update” — please elaborate.  (u):Raifʻhār (t):Doremítzwr﴿ 15:41, 6 November 2007 (UTC)
One of the largest influencers of the W3C is Microsoft, who mandated (not so long ago) that <DIV>s can't traverse other count elements. HTMLtidy is a utility used by the MediaWiki software that forces compliance with the W3C regulations. When it was first turned on, it broke many pages (on this site and more publicly on Wikipedia.) With each quarterly update, more things seem to break with basic page rendering, particularly when the DIV restriction was added (again, breaking pages here and more publicly, pages on Wikipedia.) --Connel MacKenzie 16:02, 6 November 2007 (UTC)[reply]
I think the problem is that there is a need to link quotations to specific senses. To me that is absolutely crucial. Without that, citations are (to me) all but useless on some complicated words. I voted for the Citations namespace as well; I am a fan of it. But if we reiterate the definitions there, we need to reproduce the definitions wholesale on the citations page, which makes you wonder what the point of the original page is... Widsith 15:36, 6 November 2007 (UTC)
That relates to the misconception that a gloss is a definition - it is not. A gloss is supposed to be just part of the definition - just enough so that the glosses themselves are not ambiguous about which definition they refer to. That allows the definition to be reworded as needed, without confounding the "disambiguations" in other sections (e.g. Translations.) A "Citations:" page, as I understood it, was to join real-world lexicographers in providing ample examples of use. The ===Quotations=== sections were instead to highlight the most relevant quotations that complement the definitions very well. --Connel MacKenzie 16:02, 6 November 2007 (UTC)[reply]
Fine by me, as long as the quotations are not all lumped together I'll be happy. Widsith 16:17, 6 November 2007 (UTC)[reply]
You're overreacting. This will not make for overly-complicated wikitext; it involves the addition of two lines, one with {{quotations-top}} before the quotations and one with {{quotations-bottom}} after them — and thanks to HTMLTidy, any mistakes people might make can only affect the display of quotations after the specific sense(s) they're messing with. As for your claims about Microsoft/W3 specs/HTMLTidy changing: the XHTML 1.0 spec hasn't changed since August 2002, and has no errata. So, I really don't know what you might be thinking of. —RuakhTALK 16:29, 6 November 2007 (UTC)[reply]
WTF? I was answering a question, not overreacting. HTMLTidy had a painful integration into the MW software (I guess that was before your time?) and had other horrific problems when it was revised, last. So, I don't know what you might be thinking of...I was talking about MediaWiki.
Your two additional templates are two too many. Those would be interspersed on definition lines before and after actual definition information, within the raw "#" definition lines. Sorry, no, that is not good. Your rendered output itself is bizarrely complicated as well. Kept where they belong, in a =Quotations= section or sub-page, none of the bizarre syntax is needed and the result is coherent. --Connel MacKenzie 00:39, 11 November 2007 (UTC)[reply]

Preferential treatment of one spelling over the others

Are we encouraging this sort of thing now or are we still against it: http://en.wiktionary.org/w/index.php?title=facade&diff=1214883&oldid=1030374

Who gets to decide which one gets to be the full article when even different dictionaries don't agree? — Hippietrail 06:52, 6 November 2007 (UTC)

I think we should have only one full entry, but we should stop viewing this as a "preferential treatment" of one over another; for example, butterknives vs. butter knives (an apparent attempt to denigrate one spelling by calling it an "alternative" of the other) is a bad idea. —RuakhTALK 06:56, 6 November 2007 (UTC)[reply]
Well that would be nice, but while the format of the dictionary is under our control, I doubt that the psychology of people using it is. Also standardisation and consistency is good. It would seem unprofessional to have color point to colour but coloured point to colored. And since those very articles have been contentious here for years I think you've got a tough task ahead of you if you're volunteering to be the one to stop caring about UK vs US spellings etc... )-: — Hippietrail 08:06, 6 November 2007 (UTC)[reply]
We can draw up a fair few criteria which can help us decide these issues…
In the case of façade vs. facade, façade is preferable because:
  1. It is identical in spelling to the word in French whence it derives, thus allowing users to follow its etymology whilst needing to click one fewer link; and
  2. The cedilla indicates that the ‘c’ is correctly pronounced as [s], and not [k] as would be implied if it were omitted (the general rule being that an “unadorned” ‘c’ is pronounced as [k] before consonants and the vowels ‘a’, ‘o’, and ‘u’, whereas it is pronounced as [s] before the vowels ‘e’ and ‘i’).
In the case of butterknives vs. butter knives, butterknives is slightly preferable (on technical grounds) because a “butter knife” may refer to a knife made of butter (not that such confusion would ever arise outside of comedy).
Actually, scrap that; consider butterfingers.  (u):Raifʻhār (t):Doremítzwr﴿ 15:47, 6 November 2007 (UTC)[reply]
Google shows that butter knife is clearly (and by a vast margin) the preferred use - 500,000+ web hits (compared to 60,000+ for butterknife), over 700 book hits (compared to 169 for butterknife). Also, the term doubtless originated as a phrase (like carving knife, hunting knife, and steak knife), from which some people have gradually dropped the space. That is enough to make it acceptable and not a typo, but hardly enough to make it other than an alternative to the norm. Cheers! bd2412 T 08:29, 14 November 2007 (UTC)
Yes, I agree with your analysis.  (u):Raifʻhār (t):Doremítzwr﴿ 11:32, 14 November 2007 (UTC)[reply]
I share Ruakh’s sentiments that these alternative spellings should not be assumed to be “inferior” just for being so listed; however, as experience and Hippietrail note, that seems to be an inexorable assumption made by a great many users and editors here.  (u):Raifʻhār (t):Doremítzwr﴿ 14:43, 6 November 2007 (UTC)[reply]
Well Hippietrail, AFAIK, yes, we are still vehemently against that sort of edit. They aren't all caught/noticed, though. There are numerous reasons to have complete entries for "alternative spellings" with the only exception being the ===Translations=== subsection. (Non-native English speaking contributors are not expected to understand subtle differences in meaning, I guess.) Particularly for regional spellings, the morphology itself is different; while a new sense may make sense in one morphology, it may not in another (confer: color/colour.) That is, the separate regional spellings should be given full wiki-freedom to diverge from each other (where appropriate.) FWIW, I never spell facade incorrectly as façade. The foreign-language spelling should be listed as an inferior/offensive spelling. (That is, precisely opposite how it is currently listed, here.) --Connel MacKenzie 15:33, 6 November 2007 (UTC)[reply]
Whoa, whoa, whoa. There is nothing incorrect, inferior and certainly not offensive about it. What are your grounds for saying this stuff? Widsith 15:49, 6 November 2007 (UTC)[reply]
It most certainly is pompous and offensive to use characters that aren't even in our alphabet to spell a fully-naturalized borrowed term. The French word is façade, the English word is facade. That's why "façade" gets the magic red-squiggle under it here, with the spelling-corrections suggestion being "facade." Is that different in your dialect? --Connel MacKenzie 16:07, 6 November 2007 (UTC)[reply]
Façade is listed as a valid alternative spelling at M-W and in Chambers, and as the only spelling at the OED. It gets more than 3,500 Google Books hits. So I must ask again what your authority is for saying it's wrong. Widsith 16:15, 6 November 2007 (UTC)[reply]
I believe dictionary.com lists it as an "other" spelling because it is abused in marketing and advertisements here. As a native English speaker, the basic rules of morphology are obvious to me, so I'll have to visit the library to dig up several usage guides' discussions for it. I presume that you are assuming that because it is a French word, the Google hits mean something? You did a comparison right? At best, the adorned spelling could be considered "rare", but that would miss the important distinction that spelling has: that it is #1) impossible in normal communication (typewriter/computer/e-mail/handwritten notes) #2) utterly ridiculous in the context of a Wild-West ghost town, #3) pejorative in both meaning and hauteur, #4) sometimes abused by marketeers. I ask again: is that spelling considered valid in your dialect? More to the point, is that considered the primary spelling (in English, not French) anywhere? --Connel MacKenzie 16:54, 6 November 2007 (UTC)[reply]
Lol. Preferential spellings. Some spellings are more widely used. That's basically a fact.
I've never seen the word "facade" spelled without the cedilla "c" in a novel or any other book. As a native English speaker, façade is more accurate. Without the cedilla, there is no indication (as stated by Dor above) that the c should be pronounced as [s] and not [k]. Whether it has been "fully-naturalized" or not does not seem at all relevant to me. Both spellings are recognized and accepted and I don't see why you'd make such a big deal out of it (or any other alternative spellings). — [ ric | opiaterein ] — 17:25, 6 November 2007 (UTC)[reply]
Note: in the interest of WikiLove, part of the preceding comment has been removed by another editor.
The French word remains French. The process of naturalizing the word in English changed its spelling. So it is no longer italicized and no longer written in a foreign script. That's why the adorned spelling is kicked out by most (all?) English spell checkers. By "more accurate" perhaps you mean "more alien" and therefore closer to its etymological roots? Sorry, but the word has been naturalized into English using our alphabet, not the French alphabet. --Connel MacKenzie 17:51, 6 November 2007 (UTC)[reply]
Note: in the interest of WikiLove, part of the preceding comment has been removed by another editor.
What do Wikt and your other dictionaries say about déjà vu, née, mañana. A lot of borrowed words keep their original form in English. Façade is one of them. Algrif 17:46, 6 November 2007 (UTC)[reply]
There is an enormous difference between accents and characters in a different script. But even assuming your premise is correct, facade is not one of them; it has been fully naturalized. --Connel MacKenzie 17:51, 6 November 2007 (UTC)[reply]
I always thought the cedilla was an accent. Ç doesn't appear in the French alphabet does it? OTOH Ñ does appear in the Spanish alphabet, and as such it is a different script. Or not? Algrif 18:00, 6 November 2007 (UTC)[reply]
You are correct. —RuakhTALK 18:35, 6 November 2007 (UTC)[reply]
Is he? Arguing semantics about characters vs. letters with accents (rendered as characters) seems more than slightly inane. And very much beside the point. English morphology does not include the borrowed diacritics, especially when there is no existing English word in conflict. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
But you're the one who made the claim "There is an enormous difference between accents and characters in a different script"; Algrif was addressing your point on your terms. If you think that's "more than slightly inane" and "very much beside the point", then that's your fault for bringing it up. (N.B. If anyone else pulled the about-face you just did, you'd claim it was an intentional deception, you'd use it as proof that they were trolling, and you'd request a CheckUser and block. Fortunately for you, I'm not you, and won't do any of those things.)RuakhTALK 06:28, 7 November 2007 (UTC)[reply]
Ouch. OK, re-reading what I wrote, indeed, I share some blame for mixing inappropriate terminology. However, I was taught that "c" and "ç" were separate letters in the French alphabet...but that was so long ago, I could be mixing that up with the computing interpretation of what a character is. Nevertheless, the cedilla is not an "accent" used in English...it is only an accent used in other languages (such as French, where the word comes from.) But all that is beside the point - the word facade is fully naturalized. --Connel MacKenzie 07:13, 7 November 2007 (UTC)[reply]
For what it's worth, my spell checker (MS Office Word 2007) corrects *facade to façade. Rod (A. Smith) 18:49, 6 November 2007 (UTC)[reply]

Nice trolls, all around. All very amusing. I see that people are taking liberties to quash comments they don't agree with - very nice indeed. Ric: this is a discussion about the treatment of those alternate spellings. If that makes your butt hurt, I have to wonder why you joined in. Algrif: interesting twist of semantics - so, if you consider the cedilla only an accent, surely you realize it is used in French, not English? Rod: OK, good spell checkers (rather than "all?" spell-checkers.) --Connel MacKenzie 19:22, 6 November 2007 (UTC)[reply]

What "twist" of semantics do you refer to? Basic English used for basic logical reasoning. Accents are used in imported words. French not AND English. déjà vu, née, fiancée and, I'm afraid, façade. Algrif 00:08, 7 November 2007 (UTC)[reply]
Well, gee, I can hardly guess at what you are smoking now. Which of the insults were removed from the above comments this time? I was talking about a fully naturalized word - a word that has existed in the English language for a very long time. You and Doremitzwr may think that words in italics aren't actually using foreign terms...sorry. But that is not a naturalized term - that is a foreign borrowing. When naturalized into English, the foreign markings are not retained. There are a tiny number of counter examples which you have not named. That doesn't, however, negate normal rules that govern the very natural morphology as a term is naturalized. The English terms are deja vu, nee, not the borrowings you identified. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
You seem to be confused and angry, and some other things I'm not going to try to put into words.
é, à and ñ are English? I wonder, when did this happen?
Alternative spellings should be treated as such. Alternatives. Being an alternative doesn't make something bad. It makes it something different. — [ ric | opiaterein ] — 19:33, 6 November 2007 (UTC)[reply]
Gee, your comments were bowdlerized again. Imagine that. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
Connel, which spell checkers have you checked? Rod (A. Smith) 20:00, 6 November 2007 (UTC)[reply]
Several variants of @(#) International Ispell Version 3.1.20 (but really Aspell 0.60.3) (sorry, didn't realize they were all using the same starting point at first,) plus the various flavors of FireFox, etc. I can't recheck OOo right now, but I think that is ispell based, too. Let's see, there's also W1913, Wordnet, m-w.com, dictionary.com (find main entries at facade), Cambridge (Hey, didn't someone lie and say it was there?) and Bartleby. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
It doesn't even matter. Both are completely correct, though façade is better. Connel just seems...slightly crazy. :-( — [ ric | opiaterein ] — 20:09, 6 November 2007 (UTC)[reply]
Ohhh, nice troll again. Bravo. You must've studied up: personal attacks against me are generally rewarded here. Spelling it with the cedilla may be better in French, but not in English. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
Better, yes... if one is 1) writing in French or 2) a prissy pedant wishing to demonstrate superior intellect. -- Thisis0 22:53, 6 November 2007 (UTC)[reply]
Well, I'll be damned. My Microsoft Word just auto-corrected to façade without even asking. -- Thisis0 22:55, 6 November 2007 (UTC)[reply]
It does that with common "misspellings". — [ ric | opiaterein ] — 23:02, 6 November 2007 (UTC)[reply]
It is also known for being error prone, inflexible and incapable of correcting its own errors. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
You know, technology fixes everything. I bet the people who wrote the 4700 books scanned by Google wish they had that particular modern advancement. Boy, especially to have so many misspellings emblazoned right on the front of so many books in huge print? They must all be kicking themselves for that. -- Thisis0 23:22, 6 November 2007 (UTC)[reply]
Most of those Google hits seem to be scanos. Rod (A. Smith) 23:35, 6 November 2007 (UTC)[reply]
quite. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
Hmm... 'tis true; glancing at the "Full view" books, 'tis true. I just feel bad for those poor saps who have it emblazoned on their covers. Guess they didn't know how to type %C3%A7, cause neither did I. Well, I guess when you borrow a word you shouldn't Wal-mart-ize it just 'cause your language doesn't have fancy c's. -- Thisis0 23:54, 6 November 2007 (UTC)[reply]
This, if you'll notice the quotation marks. You're pretty unpleasant. :-( Makes me sad. See the sad face? Your Microsoft did correct it (automatically, too) after all.
Not all fonts contain the ç character, and a lot of people are too lazy to actually look for it, so do you think it might be possible that using "facade" might just be easier? — [ ric | opiaterein ] — 23:44, 6 November 2007 (UTC)[reply]
Sorry, but that is absurd. The word facade was naturalized into the English language long before any computer was invented. The rare exception of a pompous publisher adding the French spelling does not change the English word. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
Glad you agree.-- Thisis0 00:00, 7 November 2007 (UTC)[reply]
So now s/o is going to construct a Bot to search and destroy all words with accents in an English L2 header. I suppose I could see it coming deja vu. Fare thee well fair fiancee. Adios manana. (does this rhyme with banana now?) Algrif 00:16, 7 November 2007 (UTC)[reply]
As we say over here, hasta banana.  :-)   --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]

I hope that you’ll all agree that this entire discussion has hitherto been pretty pointless. Now, can anyone offer any functional reason why the “primary” English entry ought to be housed at facade?  (u):Raifʻhār (t):Doremítzwr﴿ 00:33, 7 November 2007 (UTC)[reply]

You still haven't demonstrated any evidence that shows the primacy of your spelling, in your dialect. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]
That wasn’t really my intention. My intention was to provide functional bases for prescription. Unfortunately, I think this discussion is too polemical for that. FWIW, I’ve listed two dictionaries at the entry which only list façade. IMO, façade is preferred generally, especially in published writs, but that the cedilla is often omitted due to ignorance, laziness, or some similar human vice.  (u):Raifʻhār (t):Doremítzwr﴿ 13:44, 7 November 2007 (UTC)[reply]
What a spectacular revelation, User:Doremitzwr. Because the "discussion is too polemical" you resort to idiotic attacks? You personally feel that foreign diacritics belong in English, so you push your POV with insults like that? It is not laziness - that character combination doesn't exist in English. That's why it is stupid for the OED to ignorantly use the French diacritics in their spelling. From the references above, it is pretty clear that most American references aren't stupid enough to think that English is French. --Connel MacKenzie 19:44, 10 November 2007 (UTC)[reply]
“Because the ‘discussion is too polemical’ you resort to idiotic attacks?” — Do you disagree that there has been much pointless flaming from editors on both “sides” of this “discussion”?
“You personally feel that foreign diacritics belong in English, so you push your POV with insults like that? It is not laziness - that character combination doesn’t exist in English.” — I agree that the cedilla is a foreign diacritic (in that it is not productive in English, as far as I’m aware of). Nonetheless, “that character combination” (presumably ‘ç’) does exist in English — there is overwhelming verification thereof. Whether it should or not, is up for discussion; the fact that it is, is not. Whether you call it laziness or a lack of pedantry, most people cannot be bothered going to the effort of inserting characters like ‘ç’ if they do not appear on their keyboards. As for ignorance, both the widespread use of the facade spelling and the absence of understanding of what a cedilla does to the pronunciation of a ‘c’ mean that people are less likely to use the façade spelling. (Conversely, the fact that Microsoft Word™ and other word processors autocorrect facade to façade make it more likely for a person to use the latter spelling, as most people don’t question most of the autocorrecting that Word does.)
Nota that this discussion is over 42 Kb long.  (u):Raifʻhār (t):Doremítzwr﴿ 13:20, 13 November 2007 (UTC)[reply]
As you point out, any laziness is from people not questioning Microsoft Word's incorrect "autocorrection." What other spell-checker that you know of makes that mistake? Does Microsoft Word have that behavior for historical reasons, such as to break compatibility with raw-text editors? While that seems very likely, it cannot be proven easily. (It is much more difficult to prove now, since even plain-text editors today support Unicode.) I'm not sure what you are trying to suggest by saying people don't use characters that aren't on their keyboards. Characters that aren't part of a language, aren't put on a keyboard for that language. If I were talking about the French word façade, I'd simply italicize it and use the French spelling. If I wish to use the English word, I'd simply type facade. Have you done even preliminary web searches (e.g. "façade -facade" vs. "facade -façade"?) How then, even with that "popular" editor "autocorrecting" everyone's English words to French, is there such a gigantic margin in favor of the unadorned spelling? Could it be, that most people were taught, as I was, to use English spellings for naturalized English words; French spellings for French words? --Connel MacKenzie 17:00, 13 November 2007 (UTC)[reply]
  1. I am aware of the workings of very few spell-checkers.
  2. “Does Microsoft Word have that behavior for historical reasons, such as to break compatibility with raw-text editors? While that seems very likely, it cannot be proven easily.” — Proof or not, I would be more likely to believe you if you explained what motive they would have for doing so.
  3. Most internet text is not prepared on Microsoft Word.
  4. People are not usually taught what diacritics do in English, and with the exception of the acute accent, most people cannot infer what pronunciatory changes they denote; as such, most people don’t realise that the cedilla in façade indicates that that word is pronounced as /fəˈsɑːd/ (and not as /fəˈkɑːd/ or somesuch).
 (u):Raifʻhār (t):Doremítzwr﴿ 21:49, 13 November 2007 (UTC)[reply]
Connel, when you say that his preferred spelling (which is mine as well, BTW) is incorrect, inferior, and offensive, and that you as a native speaker consider it obvious that it violates basic rules of "morphology" (a word that you haven't looked up), you can't feign high-mindedness when he puts forth his contrary opinions. And when you go on to resume your insults toward his preferred spelling (“it is stupid for the OED to ignorantly […]”), you just paint yourself as a hypocrite. —RuakhTALK 22:32, 10 November 2007 (UTC)[reply]
Please, go look up the word morphology in your favorite dictionary. The word facade itself changed when it was naturalized into English - the word's pattern of formation is using characters in the alphabet used in the English language, not some other alphabet, replete with that script's diacritics. Please - if you are so stupid as to suggest I misused a word (when I did not) then at least do the world a favor and look the word up yourself, first. --Connel MacKenzie 17:00, 13 November 2007 (UTC)[reply]
No, Connel, you are using morphology incorrectly; you mean orthography.  (u):Raifʻhār (t):Doremítzwr﴿ 21:49, 13 November 2007 (UTC)[reply]
See, what you have to understand is that Connel is a native English speaker, born (like me) in the United States of America. All the rest of you can sit around and armchairily speculate about hypothetical abstract justifications for retaining the odiously foreign spelling "façade", but Connel doesn't have to speculate, he knows. "Facade" has been "fully naturalized" into the English language, so it doesn't have that funny squiggle that's not even on his keyboard. It's as simple as that. If you spell it "façade", you're wrong. If the New Yorker spells it "façade", they're being pretentious. If the OED spells it "façade", who cares; what do they know about English?
I hope this clears things up. —scs 03:57, 13 November 2007 (UTC)[reply]
I didn't realize that the selective bowdlerizing above could still be so greatly misunderstood. Numerous attacks were reworded deceptively. Yes, the trolls had their enormous troll-fest here. One slams a troll in, another waits for a response then rewords the troll in the name of "wikilove." Great. Then they slam in a counter-troll to the un-bowdlerized response? Oh please. They were asserting that the French spelling is the only spelling in English. My responses were at times rude, to be sure, but nowhere near the virulence of the attacks. The primary spelling here, is without a doubt, without the cedilla. That's why I provided evidence in the form of links, above. For another example, take a look at this - note that searches there are normalized, so you have to actually view some sample articles...I didn't see any with the French spelling. Gosh and be-golly - I just even checked my typewriters...not a single one has the "funny C." I don't think I said that BrEng prohibits it - in fact, I don't recall saying anything about that dialect. See, I asked if it was different...once someone said (I haven't verified this yet) that their OED lists only the French spelling, I don't believe I asked anything more, in that vein. Not that the OED is a paragon of usability, anyhow. But since you mention it, in terms of usable dictionaries, the OED is one of the poorest (in my humble opinion.) Their British-English bias is not denied nor deniable (but I'm not sure what relevance you are trying to assert by that, anyhow.) --Connel MacKenzie 07:17, 13 November 2007 (UTC)[reply]
Checking Oxford Reference Online, the listing in COED is facade ... - ORIGIN C17: from Fr. façade... So wha'dya know - even Oxford understands the difference between English and French. I would never have guessed, from the assertions given above. --Connel MacKenzie 07:43, 13 November 2007 (UTC)[reply]
But Connel, what you too seem to misunderstand is that this discussion is not (or, at least, should not be) about whether facade or façade is "more valid". It is abundantly clear that, at some significant level, they are both valid. (Maybe not equally valid in everyone's eyes, but both valid.) So if we can have both color and colour in our dictionary without our heads exploding, it ought to be a simple matter to also have both facade and façade.
We don't call the British wrong for spelling color with a u, and we shouldn't call the francophiles and the New Yorker readers and the OED editors wrong for spelling facade with a ç. Okay? —scs 17:01, 13 November 2007 (UTC)[reply]
I'm not sure what you mean by "functional" here. I search Groups for "facade" (210,000), "façade" (230,000), and "facade -façade" (20,000). Some points from this:
  • 10:1 preference among those groups of "ordinary" folks
  • many things can be accomplished with character maps
  • 20,000 is a lot of uses of "façade"

In my life I doubt that I have used extended character sets for non-graphical use on more than twenty occasions. That would be a very low annual rate. I would think that it would be very useful to include the extended-character-set renderings and several forms of pronunciation guides for those entries where the lack of extended characters makes a difference. But I would expect that most users don't notice the orthographic difference between "façade" and "facade". DCDuring 01:00, 7 November 2007 (UTC)[reply]

  • The point of this discussion, by the way, is to justify keeping the misspelled entry façade, instead of converting it back to the correct "{{alternative spelling of|facade}}" type entry. The trolls that wish to promote façade as a valid spelling at all haven't yet demonstrated just how militant their language-corruption / spelling-reform POV actions are, but they have demonstrated that they routinely thwack valid spellings in favor of their (rare/very rare) preferred spellings. Of course such actions are in direct conflict with Wiktionary. Of course, such nonsense is in direct conflict with "all words in all languages." Of course such actions undermine (if not directly conflict) with supporting all dialects. Of course, the original vandal that thwacked the facade entry's content should be indef'ed. --Connel MacKenzie 05:20, 7 November 2007 (UTC)[reply]

FWIW, the OED has only façade -- facade is not even listed as a variant. And it is treated as naturalized in this form (non-naturalized words are marked with a || symbol.) --Ptcamn 06:28, 7 November 2007 (UTC)[reply]

Hah, it's even spelled "façade" in Dutch. Why is this such a big deal. I vaguely remember a huge discussion about not using "Alternative spelling" because it showed "preference" or something to one spelling or another. Connel seems to be fighting pretty hard to see one spelling preferred, or am I just reading him incorrectly? — [ ric | opiaterein ] — 16:16, 7 November 2007 (UTC)[reply]
I believe you are. This thread started in response to an editor erasing the entry at facade. It should be restored to equal footing with its erudite alternative. It is the squiggle-philes who have been promoting a pedantic preference. Don't think for a second that someone will come to this dictionary and type façade. Please. 99.9% of folks wouldn't know how to try. WordNet wouldn't even let you search for it if you knew how. Follow that link and tell me how you feel about the word invalid in your "sad" face :-(
To this moment, I don't know how to type a ç and make it appear here. Honestly, who here in this entire thread actually typed it without the keys [Ctrl-C] and [Ctrl-V]? -- Thisis0 20:08, 7 November 2007 (UTC)[reply]
It's funny how if you search on wiktionary for "facade" they BOTH come up! You don't need to search with the special characters to get the page you want. You can search for "xingyugaochao" and even though an entry written exactly like that doesn't exist, the results of the search are still pretty damn useful. — [ ric | opiaterein ] — 20:38, 7 November 2007 (UTC)[reply]
It's just stricken me that Thisis, you're even more confused here than I am. Nope. -- Thisis0 21:50, 7 November 2007 (UTC) The "squiggle-philes" seem to be the ones who want both entries to remain intact with English sections, while it seems to me that Connel is saying that that (copyxpaste) façade is not correct in English and should be removed? So if I'm not still all kinds of confused, which side are you on exactly? — [ ric | opiaterein ] — 21:02, 7 November 2007 (UTC)[reply]
Side? Now I know why you're so contentious. Side. As for my opinion, I think 1) facade was massacred and should be restored to equal status; 2) I'm glad Wiktionary's search is more intuitive than WordNet's and leads you to both entries; 3) I still don't know how to type ç. Though I am not confused about the issues here, I certainly am perplexed at how you can call facade a "misspelling" and at the same time claim you have been advocating its equality. I realize Connel seems to be arguing to raise facade as the preferred spelling, but to circumvent further confusion, stop putting me on a "side". The task to be done is to restore facade and co-identify the alternates, avoiding any sense of preference lest we become prescriptive, while also ensuring what a user is likely to type will lead him to the right place. If nothing else, what are we still yammering about? -- Thisis0 21:50, 7 November 2007 (UTC)[reply]
Exactly why I was bidding a sad, although jocular, goodbye (above) to accented entries in EnglishL2. The entries should have more or less equal standing. In this particular case, I'm not over concerned where the main entry for fac(ç)ade is housed. You can find one from the other v. easily, even if you are one of the poor 30% (odd) who do not possess the Ç key on their keyboard, or do not know how to access the symbols from the control panel. But if you remove an entry like this, well, it won't be found. Making the dictionary not what it was intended to be - usable and useful. Algrif 21:22, 7 November 2007 (UTC)[reply]
You have Ç on your keyboard?? Is it on all British keyboards? What makes you say that 70% of people have this key? In America (vast majority of internet and Wiktionary users) this key does not exist. Access characters from control panel? Please explain this to an apparently clueless guy who thought he was pretty tech-savvy. And in any case, i thought you said the capital Ç didn't exist in French. (guess i am confused now, ric) -- Thisis0 21:55, 8 November 2007 (UTC)[reply]
This argument is a bit silly, but since you're asking: (1) <Ç/ç> does exist in French, but is not a separate letter in the French alphabet — it's just a <C/c> with a cedilla attached — unlike, say, Spanish <Ñ/ñ>, which is a separate letter coming between <N/n> and <O/o>. But that's kind of irrelevant; one editor brought it up as a potentially relevant distinction, but as far as I'm aware neither he nor anyone else is still considering it relevant. (2) "Most people don't have this diacritic on their keyboards" is not, in and of itself, any sort of argument. Certainly the French entry should be at façade no matter what we decide here, even though we're the English Wiktionary and most of our readers can't type that; but that's how the word is spelled in French, and that's that. Similarly, in English, the word is spelled both facade and façade, so clearly we need entries at both spellings. Personally, I think it would make sense for one spelling to be a simple "alternative spelling" entry so we don't need to keep them synchronized, but there's one editor who will presumably freak out if make facade the "alternative spelling" entry, and he might have a counterpart who will freak out if we do the reverse, so I guess we'll need to keep both of them. (I don't know how someone can be offended by an alternative spelling they don't use — I mean, aside from alternative spellings of their name or something — but obviously people are capable of being offended by absolutely anything, so what can you do?) —RuakhTALK 23:23, 8 November 2007 (UTC)[reply]
Entirely agree, Ruakh. But just to answer Thisis0, please remember that only a small percentage of the world and internet users actually live in the USA. The rest mostly have Ç on their keyboard, including many UK citizens. And if not, then it is not at all difficult to find it in the symbols character set. (ñ however is another kettle of fish!). Algrif 00:31, 9 November 2007 (UTC)[reply]
Appreciate the responses, but I want to say that I didn't think it was an argument any longer and apologize for it appearing so. As far as the entry goes (and other examples of alternate spellings), equal status is important so we avoid the appearance of prescription. I wouldn't want you to think the reason is to keep an editor from freaking out. There should be better reasons, and there are. I feel that the editor's reactions (actually getting offended), and any pushing for displaying a preferred spelling one way or the other are not appropriate.
Re: "Ç" -- even with kind responses I still don't know how to create this character from scratch.
And re: internet users -- I understand fully what a small part of the world population the US is, but I was submitting that it makes up the largest percentage of internet users, especially users of en.wiktionary. I also still don't get the reference to 30% of people who don't have the Ç key. This doesn't make sense in any analysis of world population. -- Thisis0 02:53, 9 November 2007 (UTC)[reply]
Post script: I looked on amazon.co.uk and amazon.fr for keyboards (and clavier) and couldn't find one sporting a Ç key. I'd really love to see one. -- Thisis0 03:14, 9 November 2007 (UTC)[reply]
Hi. 30% guesstimate (US + UK), 30% guesstimate (rest of Europe, and Russia, where Ç is needed), 40% guesstimate (Rest of world, where SA uses a Spanish keyboard, and most African & Asian countries use a French or Spanish keyboard). You can ask any manufacturer/supplier to put the keys you want, and instal the keyboard for a particular character set. I use a Benq with Ñ and Ç keys along with universal accent keys. It gives me the $, but if I reset the character set, I can get the £ sign (which I just inserted in 10 secs from the symbols set). Algrif 10:56, 10 November 2007 (UTC) p.s. http://academic.cuesta.edu/rsutter/keyboard.html Algrif 11:11, 10 November 2007 (UTC)[reply]
Ruakh, sorry, but you are completely wrong. It isn't about "being offended." And it certainly isn't silly. The POV that removed content to show preference for their spelling (which isn't a word in my dialect of English) is very much in direct conflict with most everything we do here at Wiktionary. As a multi-lingual dictionary, the spelling is of utmost importance. That's why we have separate entries for word forms. That is also why we have different language sections included on any given page for a particular spelling. That is also why we had to have elaborate statistics generated, just to discount the form-of entries. That is why we don't have redirects (neither for valid spellings, nor for misspellings.) That is why interwiki links work the way they do. That is why translation tables are laid out as they are. That is why we have 'alternative spellings,' 'alternative forms,' {{see}}, 'synonyms,' 'related terms,' 'derived terms,' 'see also' and other similar sections. That is why we have the multiple etymology structure. If you wish to assert that spelling is now secondary, I suggest you garner support from everyone that has built en.wiktionary.org these last few years, first. While I have my own opinions about why you pursue abominations like a "Root" heading, I had no inclination that you don't understand the most-basic principles here. I don't understand how you could be contributing here at all, with such a fundamental misconception hampering you. It is an error to think that Wiktionary can function like other (monolingual or bilingual) dictionaries - our scope is much larger. --Connel MacKenzie 19:34, 10 November 2007 (UTC)[reply]
You know, this is one of the reasons I stopped contributing here as much. The bombast inherent in statements such as "you are completely wrong" and "the misspelled entry façade" and "The trolls that wish to promote façade as a valid spelling" is just terribly offputting (not to mention completely wrong). —scs 00:05, 11 November 2007 (UTC)[reply]
I'm sure it must seem that way, when you arrive at a particular discussion that fits into part of a larger campaign by some to incorporate exotic spellings as valid as the only valid, primary spelling. That seems the main point trying to be pushed, at any rate. I'm sorry you find that off-putting, but it is a small part of a larger debate. It is impossible to keep rewording things neutrally, while under a constant barrage. Ruakh's comment that "[it is about] being offended" is 100% completely and totally wrong - that is just another underhanded attack. Such attacks, I label as trolls because they are. He is pushing to assert something he knows is wrong, and against conventions here. To not identify his comments as the trolls they are, is to permit him to continue in the same vein. --Connel MacKenzie 17:20, 12 November 2007 (UTC) (edit) 17:59, 12 November 2007 (UTC)[reply]
Nonsense, Connel. Most editors here are asserting something they know to be right. There is no Académie to protect an English character set. Instead, each letter, character, and accent mark is English to whatever extent English speakers use it. There is no need to assume anything here other than good faith. Besides, Ruakh is not pushing to format either one as an alternative spelling, but to keep both entries so that you won't freak out. Rod (A. Smith) 18:09, 12 November 2007 (UTC)[reply]
Rod, you seem to be confused. What you said is factually incorrect. He is pushing to replace content with a soft redirect, not to "keep both entries." --Connel MacKenzie 21:15, 12 November 2007 (UTC)[reply]
No, the most recent comment from Ruakh above says, "... the word is spelled both facade and façade, so clearly we need entries at both spellings. Personally, I think it would make sense for one spelling to be a simple "alternative spelling" entry so we don't need to keep them synchronized, but there's one editor who will presumably freak out if make facade the "alternative spelling" entry, and he might have a counterpart who will freak out if we do the reverse, so I guess we'll need to keep both of them." Rod (A. Smith) 21:21, 12 November 2007 (UTC)[reply]
The relevant portion of what you quoted expresses the heretical statement he made: "I think it would make sense for one spelling to be a simple "alternative spelling" entry so we don't need to keep them synchronized." That, by the way, is exactly what we call a "soft redirect" here. If you look way, waaay up at the very first link of this section, you'll see that that is the vandalism that prompted Hippietrail's initial inquiry. --Connel MacKenzie 06:48, 13 November 2007 (UTC)[reply]
[belated reply to Connel four paragraphs up]
I haven't followed the debate between you and Ruakh, so if you're sure he's completely wrong, I'll take your word for it, and retract 1/3 of my complaint. But the other two thirds stand: I just don't understand how/why you're still saying things like "the misspelled entry façade" and "The trolls that wish to promote façade as a valid spelling". It's not misspelled; it is a valid spelling -- in fact it's the spelling *I* use, and I'm an Amurrican, too. —scs 03:10, 13 November 2007 (UTC)[reply]
But we have plenty of entries that say simply "alternative spelling of" and link to a different entry. If you think that's a bad idea, that's your right, and you can say so (that's what this discussion was supposed to be about, anyway); but don't go around pretending that your way is one of “the most-basic principles here”, nor that it's a necessary consequence of the other things you mentioned. (And bringing up an unrelated discussion, one where everyone else disagreed with you and you ended up resorting to trolling, wins you no points.) —RuakhTALK 22:32, 10 November 2007 (UTC)[reply]
Your continued personal attacks here (again, in an attempt to sidestep the issues) are not productive. You trolled with that other discussion. Your rhetoric has lots of fans here, whom you were able to sway by being ruthless in your argumentation. But your premise remains wrong on that topic. As far as the very well established principle of not removing content, WTF are you talking about? Existence of entries that use the stop-gap template "alternative spellings of" does not preclude actual definitions from being entered. It never has; any exceptions to that have either been missed or rolled back. --Connel MacKenzie 17:20, 12 November 2007 (UTC)[reply]

Maybe this is simplistic, but I think a good common-sense answer is to list both (or all) spellings as separate words, with each entry saying "alternate spelling of: xxx" and then the definition (each entry having the same def. unless an alternate spelling can have a separate def.). If particular entries are typically regional, that can be mentioned as well. Just my thoughts. sewnmouthsecret 21:27, 12 November 2007 (UTC)[reply]

my own conclusion

This facade/façade debate has gone on for far too long, and I'm sure everyone is more than sick of it, but for what it's worth, here's my own take on the matter (formed after I spent 'way too much time arguing this topic a year or so ago).

For contentious dual-spelled words -- the canonical example is of course color/colour -- it turns out there's really only one solution. It's not the best solution, in fact it's almost the worst solution, but it's the only one everyone can (barely) agree on and tolerate; every other solution is, in at least one person's mind who matters, even worse.

On today's Wiktionary, using today's Mediawiki code, and with Wiktionary's current crop of editors, you have to have two completely separate entries, one for color and one for colour. The spelling is different, the etymology is different, the definitions are all different, so you might as well give up and have two separate entries. There have been various valiant efforts to salvage some remnants of commonality by transcluding a certain amount of shared content via templates, but for the most part it ends up not being worth the nuisance. (Thankfully, it looks like for color/colour the translations, at least, are shared.) Maintaining two separate-but-equal pages is a nuisance, too, it's true, but in the end it's far less work than arguing about it over and over and over and over and over and over again.

I'm not saying we have to have two completely different separate-but-equal pages for facade and façade, but it could happen. (Perhaps it already has.) The problem with "alternative spelling" links (and all variants thereon) is that no matter how fervently some of us wish that those links could and should be construed as casting no negative connotations whatsoever on the merely linked-to spelling(s), there are enough people who imagine they do (or who imagine there are people who imagine they do) that the links can't stand forever; eventually we have to have two parallel pages, so that no one's favorite spelling gets even the appearance of being slighted in any way. —scs 03:38, 13 November 2007 (UTC)[reply]

I don't think anyone would object to having a full entry on both pages. No one has ever argued that facade is not valid; the problem has been that Connel doesn't think façade is valid. Widsith 10:18, 13 November 2007 (UTC)[reply]
Widsith, that is blatantly untrue. Not only is the vandalism mentioned in the very first link of all of the above in direct conflict with your assertion, the freakish next several responses in support of that vandalism all suggest crippling the proper spelling facade (obviously, in direct conflict with long-established Wiktionary practice of treating the terms equally.) The most recent posts from those same antagonizers suggest only that they still support only the incorrect "soft redirect" POV methods. If behavior like that persists, perhaps I should just start systematically deleting terms entered with diacritics? That would certainly be more efficient than trying to discuss it. --Connel MacKenzie 16:16, 13 November 2007 (UTC)[reply]
Connel, are you really now issuing threats?! Rod (A. Smith) 16:25, 13 November 2007 (UTC)[reply]
What is a threat about what I wrote? There is an active POV contingent adding diacritics to spellings incorrectly - a simple solution would be to eliminate those spellings. Or, perhaps we could return to the discussion of how to accurately represent all spellings that are used, indicating primacy where possible (i.e. web search comparisons of "façade -facade" vs. "facade -façade") otherwise entering equivalent, cross-referenced items. But the vandalism that Hippietrail noticed cannot be permitted. Suggesting that that sort of vandalism ever had justification is a bit ridiculous. --Connel MacKenzie 17:09, 13 November 2007 (UTC)[reply]
Just so you know, though, the only reason I got into this discussion was that my eye was consistently caught by what looked like a whole lot of inappropriate POV pushing by you -- i.e. that the spelling "façade" is not only wrong but offensively so, and that anyone asserting validity of that spelling was a vandal or a troll. —scs 18:57, 13 November 2007 (UTC)[reply]
The initial vandalism itself wasn't caught as vandalism. When brought to people's attention by Hippietrail, it (the vandalism) was claimed (absurdly) as a preferred method. When I pointed out that error, I got vitriolic personal attacks (that were edited/removed after I replied.) And this one example is not an isolated error. Do I think those tactics were malicious and intentional? Absolutely. --Connel MacKenzie 19:44, 13 November 2007 (UTC)[reply]
Connel: “If behavior like that persists, perhaps I should just start systematically deleting terms entered with diacritics?” See our entry for the word threat. Rod (A. Smith) 19:02, 13 November 2007 (UTC)[reply]
What I suggested is not "retribution," rather, it is just cleanup. (Apparently, that is too controversial to seriously pursue.) --Connel MacKenzie 19:44, 13 November 2007 (UTC)[reply]
Perhaps, Widsith, you too misinterpreted all the above comments, due to the curious troll/bowdlerize/troll patterns used above to retroactively change their numerous insults? --Connel MacKenzie 16:18, 13 November 2007 (UTC)[reply]
Oh really? Maybe. I thought that was your position. Widsith 16:36, 13 November 2007 (UTC)[reply]
My position is that we should have an entry for façade but it should be indicated as a less-common/rare/stilted variant of facade, somehow. --Connel MacKenzie 17:09, 13 November 2007 (UTC) To clarify further: I mean as a usage note, not as a "soft redirect." --Connel MacKenzie 17:10, 13 November 2007 (UTC)[reply]
Note that the belief that "façade" is rare or stilted or a variant is a POV, too. (It certainly differs from mine.) —scs 18:57, 13 November 2007 (UTC)[reply]
That's why I provided evidence, above. I've also seen some indications (MS word, claims of OED listing) that the façade spelling is sometimes considered valid, so despite my gut reaction, I do accept it as an alternate. But a strict numeric analysis (choose your own tools: web search, concordance, whatnot) shows an overwhelming preference for the "normal" normalized spelling. Calling it "stilted" is me grasping for words, as I'm not sure how best to describe it. But that fits fairly well. --Connel MacKenzie 19:44, 13 November 2007 (UTC)[reply]
It is a shame that the software does not allow "transclusion" (if I am using the term properly) (with pari passu spelling adjustments) of the parts of entries that would not differ by their orthography. Alternate spellings seems to imply equality of status. Usage notes can convey nuance, but will not be noticed in long entries like "facade-façade" (copied, not entered). Descriptively, "façade" seems to be used in part as an hommage to the language that deserves credit for the great refinement it achieved at an earlier date than many other languages. DCDuring 17:25, 13 November 2007 (UTC)[reply]
The software does allow for that, but that prevents the entries from diverging. For example, the contents of facade would be {{:facade-façade|facade|façade}} and the contents of façade would be {{:facade-façade|façade|facade}}. The entry at facade-façade would use {{{1}}} and {{{2}}} where appropriate. --Connel MacKenzie 19:06, 13 November 2007 (UTC)[reply]
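As a rough illustration of the parameter swap described above, here is a minimal Python sketch. It is not MediaWiki's parser; the function name `expand_shared` and the sample shared text are hypothetical, and only the `{{{1}}}`/`{{{2}}}` placeholder convention is taken from the comment.

```python
import re


def expand_shared(shared: str, *args: str) -> str:
    """Replace {{{1}}}, {{{2}}}, ... in shared text with positional arguments."""
    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1
        # Leave the placeholder untouched when no argument was supplied for it.
        return args[index] if 0 <= index < len(args) else match.group(0)

    return re.sub(r"\{\{\{(\d+)\}\}\}", substitute, shared)


# Hypothetical content of the shared page "facade-façade".
SHARED = "'''{{{1}}}''' (alternative spelling: [[{{{2}}}]])"

# The two calling pages pass the same arguments in opposite order.
facade_page = expand_shared(SHARED, "facade", "façade")
cedilla_page = expand_shared(SHARED, "façade", "facade")
```

Both pages render from one shared source, which is also why neither can diverge from the other without adding conditionals to the shared text.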
Just so I understand, either:
  • the transwikification is total or
  • there is no transwikification and all parts of the entries can diverge without limit.
That is, there is no such thing in existing software as partial transwikification that would allow any divergence of selected headings or lines while maintaining pari passu identity of the rest. DCDuring 20:29, 13 November 2007 (UTC)[reply]
It's possible to allow conditionals in the shared content, but it would make editing the shared content overly complicated. Mike Dillon 05:00, 14 November 2007 (UTC)[reply]
Slightly simpler (but still somewhat complicated) is the translation sharing of color / colour. --Connel MacKenzie 07:47, 14 November 2007 (UTC)[reply]
Would it be possible to protect the shared portion of one definition (the one with complicated code) from change by editors? The idea would be four zones. Assuming "colour" is 'simple' and "color" has the complicated, protected code section:
  • "Color" would have two sections:
    • an editable section that allowed for durable divergence between it and "Colour"
    • an editable section that would be transwikified into "Colour" periodically
  • "Colour" would have two sections:
    • an editable section that allowed for divergence and
    • a non-editable, protected section that could not be altered except by a transwikification process that was periodically initiated whereby it would "get" the altered core section from "colour".
I am saying this from a position of blissful ignorance of the feasibility or difficulty of implementing anything like this and even of the number of applications it would have. DCDuring 11:57, 14 November 2007 (UTC)[reply]
It isn't possible to protect just a section of the page. My general opinion is that anything involving a bunch of subpages and complicated rules as to which content goes where raises the bar for non-technical editors too much. One of the great things about Wiktionary is that it's reasonably approachable for new editors by just looking at existing entries. I don't even think one level of transclusion is good, but for well-developed entries like color/colour, I guess it's better than constant bickering. Mike Dillon 04:47, 15 November 2007 (UTC)[reply]

Sister projects

There is still (WT:VOTE) a proposed vote on the display of sister projects in articles which appears to have been abandoned. Does this leave a decision for the future or had a consensus been reached? I am uncertain which method to use to indicate sister projects: display box (see Isaiah) or ====Further reading==== (see Proverbs) ? —SaltmarshTalk 15:55, 6 November 2007 (UTC)[reply]

There is currently no policy on this, either way. Wiktionary:Votes/2007-06/Wikipedia box template does seem to indicate to me that there is very little love for boxes though. I'd like to get back to work on the alternative some time, but I just got distracted and busy. If you ask me, {{projectlinks}} in a "See also" (which makes more sense to me than external links for sister projects) is the best option available. Dmcdevit·t 16:23, 6 November 2007 (UTC)[reply]
Thanks for that —SaltmarshTalk 06:14, 7 November 2007 (UTC)[reply]
Please don't use {{projectlinks}}! It is very poorly written, and the simplest call expands to hundreds of parser functions and a huge amount of text parsed only to be discarded. Use {{pedialite}} for WP, other things will need work. Robert Ullmann 06:29, 7 November 2007 (UTC)[reply]
Er, so? Are you saying this will cause problems for the servers, or speed problems for the viewers? If the former, I hardly think it's something we have to worry about. The number of uses this has is minuscule compared to other strains on the server (e.g., that other project with 2 million articles). If it's something that will slow browsers, only then should we worry about it, but I haven't noticed any issues. Dmcdevit·t 07:07, 7 November 2007 (UTC)[reply]
Sure, if it was doing something productive. But parsing 300 lines of template code for no effect is just wasting CPU. We have a lot, but none to waste. And it is not a good tradeoff: using {projectlinks} is harder than just using whatever set of xxlite templates you want. (why should someone have to figure out "lang4=" when they can just put "lang=" on the desired template?) Robert Ullmann 01:17, 9 November 2007 (UTC)[reply]
That makes no sense. Worrying about load is what we have developers for. I'm not being dismissive here. The concept that users should not worry about server load when they need something to be done is an established one on the Wikimedia projects, and one which was suggested by the developers. If something is truly an issue, they'll fix it, or prevent it with technical means. The vast majority of all sister project template uses are simple links with no parameters at all. I don't think your feelings about the template's opaqueness are really an issue, but even so, that's a far cry from "don't use it!" Projectlinks has advantages that outweigh the hardship of typing "{projectlinks|pedia}" most of the time. Dmcdevit·t 06:44, 9 November 2007 (UTC)[reply]
You don't get it: the way the template language works, it parses all branches of all switches and conditionals and template calls. So {{projectlinks}} generates 300 lines of parsing every single time. Brion would shoot it on sight. The "technical means" are what we should and must do: get rid of it. (I'm sorry, I know you wrote it, but it just isn't usable.) Robert Ullmann 08:14, 9 November 2007 (UTC)[reply]
E.g. use {{sourcelite}} Robert Ullmann 06:31, 7 November 2007 (UTC)[reply]

Request for bot flag: DerbethBot

I wish to start using my bot - DerbethBot - to add pronunciation files from Commons to entries here. I have already used it to add IPA pronunciation here. A bot is needed because very many pronunciation files have lately been added to Commons and, as far as I can see, English Wiktionary is not doing much to use them. At the moment I will be uploading pronunciations from the Shtooka project for Czech, French, Chinese and German - 4000 words each.

The bot is written in Perl and uses the Perlwikipedia framework. It's completely automatic.

I plan to use the bot occasionally - every three or four months. The same bot (with a modified entry parsing part) will be running on Polish and German Wiktionary. --Derbeth talk 20:58, 6 November 2007 (UTC)[reply]

Does the bot check first to see whether we have an entry in that language before adding the pronunciation? --EncycloPetey 03:40, 7 November 2007 (UTC)[reply]
Wasn't there a licensing conflict with the original Shtooka repositories? Dvortygirl had to dual-license everything I uploaded via User:Dvortybot. I'd be very cautious in that regard, before getting too far ahead of yourself.
That said, I haven't been able to recover some of the crucial components for Dvortybot, since my last hard-drive crash. (Transwiki's are back in action as of today, though, so not all was lost.) In general, I wish to say SUPPORT. But please reconfirm the licensing issue. It is very refreshing to see someone else making progress, albeit from a completely different (technical) starting point.
Lastly, is your code posted for review anywhere? GPL?
--Connel MacKenzie 05:32, 7 November 2007 (UTC)[reply]

Of course, the bot changes only the applicable language section and it checks whether there is already a pronunciation file there (by looking for the {{audio}} template; if you use any other methods of linking to Ogg files, please let me know). I'm going to do testing on a database dump on my computer and also use the dump to filter out entries which already have audio files (to avoid raising server load).
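The check described above could look roughly like the following Python sketch. It is not DerbethBot's actual code (the bot is written in Perl); the function names `language_section` and `needs_audio` are hypothetical, and the sketch assumes language sections are level-2 headings such as `==French==`.

```python
import re


def language_section(wikitext: str, language: str):
    """Return the body of the ==language== section, or None if it is absent."""
    pattern = re.compile(
        r"^==[ \t]*" + re.escape(language) + r"[ \t]*==[ \t]*\n"
        r"(.*?)(?=^==[^=]|\Z)",  # stop at the next level-2 heading or at the end
        re.MULTILINE | re.DOTALL,
    )
    match = pattern.search(wikitext)
    return match.group(1) if match else None


def needs_audio(wikitext: str, language: str) -> bool:
    """True only if the language section exists and has no {{audio}} call yet."""
    section = language_section(wikitext, language)
    if section is None:
        return False  # no entry in that language, so there is nothing to edit
    return re.search(r"\{\{\s*audio\s*[|}]", section) is None


# Illustrative page text, not a real entry.
page = (
    "==English==\n===Noun===\n# a thing\n\n"
    "==French==\n===Pronunciation===\n* {{audio|Fr-chat.ogg|audio}}\n"
)
```

Here `needs_audio(page, "English")` is true, while the French section (which already has audio) and a missing German section are both skipped.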

I don't see any problems with Shtooka licensing; I just download packs of files from the download page; there's a clear Creative Commons declaration for each package (README). I'll also send an e-mail to the website owner to let him know what I am doing. And yes, I'll post my source code, perhaps even today if I find some time. --Derbeth talk 16:18, 7 November 2007 (UTC)[reply]

Tagalog? What template? ;-) You mean {{audio}}? ({{tl}} is Tagalog; you wanted to write {{temp|audio}}.) Would you be adding the Pronunciation section if there is none at all? If so, be aware that you will need to handle the case of more than one etymology properly. Robert Ullmann 18:37, 7 November 2007 (UTC)[reply]

Sure, otherwise the whole thing wouldn't make any sense. I add a Pronunciation section when one does not exist. When an Etymology section is present, I add the new section just below the first Etymology section.

I have finished the bot and done extensive testing on my local copies of English, German and Polish Wiktionary. Everything seems to work fine. You can see the code here. It's under the MIT license. --Derbeth talk 01:48, 10 November 2007 (UTC)[reply]

That's a very interesting approach. (Just dumping in after the 1st etym.) I'd like to see how it works in practice. For Dvortybot, I had to use exception lists - which is about as undesirable as putting them in the wrong place. Can you run your bot on 10 or 20 or 30 or 100 entries, please, then start a WT:VOTE. I wonder if the ones you add after the 1st etym. (when there are multiple) could have a Robotic audio pronunciation placement unverified cleanup category of some sort. That might turn out to be an excellent way to finesse the issue, i.e. how to add image:En-us-lead-past.ogg (noun) and image:En-us-lead-present.ogg (verb) - just add both to the first pron section in the first etym., with the cleanup tag. (That's a bad example - it looks like lead has been mangled beyond belief, right now. But it already has audio links.) --Connel MacKenzie 18:07, 10 November 2007 (UTC)[reply]
That's why I asked: if there is one Etymology section, it should go after it; but if there is more than one (if the first one is Etymology 1, it (a new pronunciation section) should go above/before the first etymology section). ScsRhymeBot got this "wrong" (although it wasn't as clearly defined then) and there are still entries to be fixed. Robert Ullmann 12:27, 14 November 2007 (UTC)[reply]
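The placement rule stated above (after a single Etymology section, but before the first one when the etymologies are numbered) can be sketched in Python as follows. This is an illustration only, not the code of DerbethBot or ScsRhymeBot; the function name is hypothetical, and the handling of a section with no etymology at all is an assumption, since the discussion does not cover it.

```python
import re

# Matches level-3 headings only, e.g. "===Etymology 1===" but not "====Noun====".
HEADING = re.compile(r"^===[ \t]*([^=\n]+?)[ \t]*===[ \t]*$", re.MULTILINE)


def insert_pronunciation(section: str, pron_body: str) -> str:
    """Insert a Pronunciation section into one language section's wikitext."""
    headings = [(m.start(), m.group(1)) for m in HEADING.finditer(section)]
    if any(name == "Pronunciation" for _, name in headings):
        return section  # already present, leave the entry alone

    etyms = [(pos, name) for pos, name in headings if name.startswith("Etymology")]
    if not etyms:
        # Assumption: with no etymology, go before the first level-3 heading.
        pos = headings[0][0] if headings else len(section)
    elif len(etyms) > 1 or etyms[0][1] != "Etymology":
        # Numbered etymologies: go above the first one.
        pos = etyms[0][0]
    else:
        # Single etymology: go after it, i.e. before the next level-3 heading.
        later = [p for p, _ in headings if p > etyms[0][0]]
        pos = later[0] if later else len(section)

    before = section[:pos]
    if before and not before.endswith("\n"):
        before += "\n"
    block = "===Pronunciation===\n" + pron_body.rstrip("\n") + "\n\n"
    return before + block + section[pos:]


# Illustrative sections, not real entries.
single = "===Etymology===\nFrom Old English.\n\n===Noun===\n# A tempest.\n"
multi = "===Etymology 1===\n====Noun====\n# metal\n\n===Etymology 2===\n====Verb====\n# to guide\n"
audio = "* {{audio|En-us-storm.ogg|Audio (US)}}"
```

Entries with several pronunciations split across etymologies would still need human review, which is the case the cleanup-category idea above is meant to catch.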

Unfortunately my bot would not accept these two example files (or, more exactly, won't match them against any existing entry, so they won't be used). There are so many different standards for naming different pronunciations of the same spelling that it would be very hard to handle all of them. I think it's better not to add such files automatically than to risk messing something up.

I have made the test run; you can see the results. If you don't like the edit summary or want the bot to do some simple cosmetics on edited entries, please let me know. --Derbeth talk 00:29, 11 November 2007 (UTC)[reply]

I think storm is an excellent example. Support. Again, please start a WT:VOTE so it can be official. --Connel MacKenzie 16:49, 12 November 2007 (UTC)[reply]
Wiktionary:Votes/bt-2007-11/User:DerbethBot for bot status. --Derbeth talk 16:55, 12 November 2007 (UTC)[reply]

Transwiki on RFDO?

Should Transwiki articles be on RFD instead of RFDO, where most of them wind up now? After all, they are entries, rather than templates, categories, appendices, or cetera.—msh210 21:05, 6 November 2007 (UTC)[reply]

These are articles that just have yet to be vetted. I would say they go in the main RfD, if it really matters. Dmcdevit·t 00:34, 7 November 2007 (UTC)[reply]
Historically, they usually end up on RFDO, but there's no rule about where they go since (ideally) they should either be deleted or become a regular entry in most cases. --EncycloPetey 03:38, 7 November 2007 (UTC)[reply]
Maybe. One of the benefits of having all other namespaces on RFDO was the simple reduction in volume on RFD. Offhand, it seems like a very simple distinction: main namespace or non-main namespace. If that changes, please announce it prominently for a while - I promise I will forget which is which, the first few times I see it. --Connel MacKenzie 05:40, 7 November 2007 (UTC)[reply]
Word, on all counts. —RuakhTALK 06:57, 7 November 2007 (UTC)[reply]
Vote yes, the Transwiki RFD's should move to RFD with other words. That is where people go to review words, whether or not they are merged to main pages yet. The benefits are it would help to cull out bad words before they are really merged. Goldenrowley 01:45, 8 November 2007 (UTC)[reply]
There's a pretty simple change in Template:rfd that could facilitate this. DAVilla 03:40, 12 November 2007 (UTC)[reply]
Whatever you decide, please announce it somewhere. I imagine it will be a moderate pain to re-list all the current WT:RFDO "Transwiki:" entries that are on WT:RFD. And using the ParserFunctions #switch syntax should never be described as "a pretty simple change." But if you all agree that you really want to do that... --Connel MacKenzie 16:40, 12 November 2007 (UTC)[reply]
I never meant to relist listed items: I was referring to future listings.—msh210 14:33, 13 November 2007 (UTC)[reply]
Then how does one navigate from an entry to WT:RFD#PAGENAME? Moving them from RFDO to RFD would also give a quick indication of how much volume to expect, for future entries. Perhaps. --Connel MacKenzie 19:01, 13 November 2007 (UTC)[reply]
I wonder if imported pages will stay in the "transwiki" namespace forever while they have full edit history. I also have some great news. I just tested granting myself import rights here through Meta, and the import function could be expanded to allow importing from local disks. We should select a place to allow users, especially admins, to request expanded import rights here. Requests approved here would then be sent to Meta for activation. Thanks to those who supported my steward election with 73-1-4-99%.--Jusjih 04:21, 24 December 2007 (UTC) (admin here and new Meta steward)[reply]

zh-hans vs. zh-cn

A message was posted at Template talk:zh-forms. Opinions? -- A-cai 23:09, 6 November 2007 (UTC)[reply]

We need an admin to edit the protected page. Does a template exist to request this change, like the w:template:editprotected of the English Wikipedia? Thanks! surueña 11:49, 19 November 2007 (UTC)[reply]
zh-hans is wrong; it should be zh-Hans. ISO 15924 codes start with a capital letter; this one indicates simplified Chinese. zh-cn is ambiguous because it implies Chinese, and this can be any form of Chinese, including Wuu and Min Nan among many others. For purposes of dictionaries both codes are utterly useless in a Wiktionary context. GerardM 10:18, 21 November 2007 (UTC)[reply]
We know that ;-) We do use the codes inside script templates, in HTML font spans; I changed this one to zh-Hant/zh-Hans. Robert Ullmann 11:00, 21 November 2007 (UTC)[reply]
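For reference, the casing convention mentioned above (lowercase language, title-case four-letter script code, uppercase two-letter region) can be sketched in Python. The function `normalize_tag` is hypothetical and deliberately simplified: it ignores private-use and extension subtags, and it is not what the zh-forms template does.

```python
def normalize_tag(tag: str) -> str:
    """Apply conventional casing to a simple language[-script][-region] tag."""
    parts = tag.replace("_", "-").split("-")
    result = [parts[0].lower()]  # primary language subtag, e.g. "zh"
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result.append(part.title())  # ISO 15924 script code, e.g. "Hans"
        elif len(part) == 2 and part.isalpha():
            result.append(part.upper())  # two-letter region code, e.g. "CN"
        else:
            result.append(part.lower())  # anything else is left lowercase
    return "-".join(result)


simplified = normalize_tag("zh-hans")
traditional = normalize_tag("ZH-HANT")
regional = normalize_tag("zh-cn")
```

The casing is a convention only; the sketch does not address the separate point that a region code such as CN says nothing about which script or variety of Chinese is meant.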

Video display standard codes

In RfD one of the 30-or-so codes for video display standards came up. They are things like VGA, WUXSGA, QQVGA. Does this kind of thing meet CFI (always assuming that it could pass RfV)? It is certainly no fun to go through the RfV process if it won't meet the rest of CFI. At least they are not primarily trademarks, even though some or all of them may have trademark status.

Has this or something like it been discussed before? Would these things be acceptable? DCDuring 23:23, 7 November 2007 (UTC)[reply]

They should be fine, they are initialisms for technical standards someone may very well want to look up. The only dubious case would be one that is used as a trademark by a single vendor, not an industry standard or convention. Robert Ullmann 07:33, 8 November 2007 (UTC)[reply]
PALplus and WUQSXGA are in RfD now. I've cited PALPlus. The newer ones, like WUQSXGA, are hard to cite, but may be more useful. DCDuring 16:51, 8 November 2007 (UTC)[reply]

Using ISO language templates in translation tables

I would imagine this has been discussed here before, but in case it hasn't: Have we ever considered following the examples of several other wiktionaries that use the ISO language code templates in their translation tables? for example....

*{{fr}}: {{t|fr|chaleur}}
*French: chaleur

It would certainly be easier for people who add a lot to these tables using Wikipedia links and things like that. — [ ric | opiaterein ] — 04:43, 8 November 2007 (UTC)[reply]

People can do it if they want; it'll just get subst:'d. Is there some advantage to leaving them be? —RuakhTALK 05:20, 8 November 2007 (UTC)[reply]
We don't do that for several reasons, probably most importantly that they are alphabetized by language name, not by code (very common in the wikts that use codes), if you don't know the codes for the "nearby" languages it is harder to get correct in the edit window. Also, many people do not know the codes for less common languages; it would be helpful if they knew at least the code for what they are adding, but often not. Some of the languages used in tables are not coded, and some of them don't have the code templates yet. We have many more languages in translations tables than most wikts. (see User:Robert Ullmann/Trans languages)
There is no particular reason to use {t} (which needs the code anyway) beyond the 170 languages that have wikts. (If a wikt is added, Tbot will automatically convert them. ;-) So the code isn't needed anywhere on the line.
It is relatively harmless now if someone uses the code template, AF will replace it sooner or later. But the intent is that they subst: them when they use them. So "people adding a lot" etc. should just be routinely subst'ing them. Robert Ullmann 07:26, 8 November 2007 (UTC)[reply]
Lol Robert, that's what the previsualization is for. You put in the codes and edit everything to fit the translation table, hit preview and then organize them alphabetically. I've learned a lot of them, so even that's becoming less necessary and I can start organizing them immediately.
I've never seen 170 languages in one translation table... lol.
The thing is though, I don't see why they would need to be changed from the language code templates to the full names. The display is the same, and when you're editing, things line up better and are easier to see. If you don't know what language a code is, you can use the previsualization to see it (or even open a new tab/window and go to the template to see.) The French, Catalan, Romanian and Hungarian Wiktionaries use the ISO codes in their translation tables. Why not here? — [ ric | opiaterein ] — 14:01, 8 November 2007 (UTC)[reply]
You understand that, and that's cool. But unless you are willing to spend a major fraction of the next few years of your life explaining it to all the infrequent contributors and IP-anons that add most of the translations, it needs to be obvious that the tables are alpha by language name rather than code. Anyone adding a language knows the name (even if not quite the standard form); if we used codes, we would seriously discourage contributors who don't know the codes. (and would think they are required) And see the translations at butterfly and iron. Robert Ullmann 14:24, 8 November 2007 (UTC)[reply]
There are some translation tables that have grown very large. See butterfly, e.g. We long ago decided not to use the codes, in part because of the reasons above, and in part because transcluded items (such as templates) enormously slow down the server. Before rendering the page, all the content of all the transcluded templates is examined and dealt with. So, lots of transcluded templates whose sole purpose is to add one text word is a Bad IdeaTM. --EncycloPetey 14:28, 8 November 2007 (UTC)[reply]
Will AF order translations in a table alphabetically? (And if not, then can it?)—msh210 18:35, 8 November 2007 (UTC)[reply]
One of the things on my to-do list is to do some restructuring in the middle part of AF, which has become lengthy as it parses through lines. I intend to be able to break out each table and have a routine that will format it, then it can sort and re-balance columns. Note that it can't just sort lines, there are frequent uses of subtext. (Cyrillic/Roman, many other things, none of them described anywhere yet ;-) This isn't hard to do, just not there yet. Robert Ullmann 01:10, 9 November 2007 (UTC)[reply]
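A rough sketch of the sorting step described above, written in Python rather than taken from AF (whose code is not shown here). It assumes that top-level lines look like "* Language: ..." and that subtext lines start with "*:" or "**"; the function name and the sample lines are made up for illustration. Subtext lines stay attached to the line above them, which is why a plain line sort would not do.

```python
def sort_translation_lines(lines):
    """Sort translation lines by language name, keeping subtext with its parent.

    Assumed (hypothetical) format: '* Language: ...' for top-level lines,
    '*:' or '**' prefixes for subtext lines such as Cyrillic/Roman variants.
    """
    groups = []
    for line in lines:
        if line.startswith("*:") or line.startswith("**"):
            if not groups:
                raise ValueError("subtext line without a parent language line")
            groups[-1].append(line)  # keep subtext under its parent
        else:
            groups.append([line])    # start a new language group

    def language_name(group):
        # Text between the leading '* ' and the first colon, compared case-insensitively.
        return group[0].lstrip("* ").split(":")[0].casefold()

    groups.sort(key=language_name)
    return [line for group in groups for line in group]


table = [
    "* Serbian:",
    "*: Cyrillic: лептир",
    "*: Roman: leptir",
    "* Finnish: perhonen",
    "* French: papillon",
]
for line in sort_translation_lines(table):
    print(line)
```

Column re-balancing is left out; it would run on the sorted groups so that a language and its subtext never end up in different columns.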
This is off topic, but the Serbian Cyrillic/Roman translations should probably just be brought back up onto the "* Serbian: " line. Rod (A. Smith) 01:16, 9 November 2007 (UTC)[reply]
Agreed. Now, it's not a transliteration, so it shouldn't go in parentheses. But there could be a slash between Cyrillic and Roman, just as there is between traditional and simplified Chinese.
I've used the subtext for dialect (a.k.a. region as it's usually spelled out). Parens are okay when it's just one or two, but sometimes there's a whole slew of them.
Subtext has been used for "dialects" of Chinese, but I feel pretty strongly that anything with its own "language" header should get its own * translation line. DAVilla 03:32, 12 November 2007 (UTC)[reply]

WOTD Icon

Hi there, I started a vicious stabbing cycle on IRC by using the (infamous) tile icon at Wiktionary:Main_Page/Redesign 2007 for the new word of the day template. Is this a complete faux pas - I can't see the problem with it (obviously). Can someone find a better icon to use - or persuade the tile-burners to extinguish their blowtorches? Conrad.Irwin 01:03, 10 November 2007 (UTC)[reply]

I like it. Rod (A. Smith) 01:34, 10 November 2007 (UTC)[reply]
I dislike the Scrabble-style icons, and a number of other regular editors here despise them as well. Part of the reason is historical - the infamous tile icon redesign on Meta, where a new project icon was voted in for us by people who don't even contribute to the wiktionaries. Personally, I dislike them because it makes a connotative connection to a specific board game, which has nothing whatsoever to do with the function of a dictionary. I have been giving some thought to what icon might be suitable, but haven't thought of one yet. An open book is too generic for the use you want to put it to (the WOTD). A quill and inkpot might work, but that's a little archaic and also doesn't really say "here's an entry being featured today." --EncycloPetey 01:34, 10 November 2007 (UTC)[reply]
Not to restart an old debate, but as an opposing view: that redesign on Meta was not secret; it was hugely publicized here, and anyone was welcome to participate. Plenty of en.wikt regulars did. I never understood why there was a core of "other regular editors" who despised the new logo so strongly, or who asserted (wrongly, IMO) that the new logo was somehow "thrust upon us". It never appeared to me that the redesign was "infamous", or that it had widespread "very stern opposition here" (as Connel suggested earlier). —scs 23:42, 10 November 2007 (UTC)[reply]
My publicity of it can hardly be considered "hugely." The three Wiktionarians that participated (myself included) were swept aside during their bizarre voting phases. (Some of the votes and voting phases were started before announcements about a logo redesign project were even out...all the existing FL Wiktionaries that had cool alternate logos, like de:, were excluded before being notified.) Yes, later, all were welcome to participate, but almost no one did. At the first mention of adopting it here, it was rejected (with rather harsh words) then again the next time, then again the next time. It is no surprise that it is being raised from the dead again.
Take a look at the Wikipedia logo. The fusion of a globe and the old-style print-head was a unique, cool idea. Take a look at the scrabble pieces. It isn't unique, it isn't cool and it emphasizes games? How does that convey "DICTIONARY" to you? I suppose I should be happy that they didn't try an animated Rubik's Cube of letters. --Connel MacKenzie 00:27, 11 November 2007 (UTC)[reply]
As I said, I'm not trying to re-start the debate. (For which reason I'm tempted to <s> your second paragraph!) I do believe, though, that there were more than three of us participating, and in any case, you can't say all our inputs were swept aside, because mine sure weren't. (I also thought, but maybe this is just me, that all those who (indeed) didn't participate kinda lost their right to complain afterwards.) —scs 00:45, 11 November 2007 (UTC)[reply]

I am not trying to use the tile as a logo, <flame>though I do prefer it to the one we currently have</flame>, I want to find an icon that can be used to indicate Word of the day; the tile icon seems to fit quite nicely, mainly because it is a nice clear and simple shape, and as an added bonus it has connotations implying words. I feel that leaving even a hint of tile around is going to cause irritation to some, so I would like to find an alternative. Is anyone here graphical enough to make one, or is there one lurking in a shadowy corner of the Commons? (See Wiktionary:Main Page/Redesign 2007 for the context of this discussion). Conrad.Irwin 14:41, 11 November 2007 (UTC)[reply]

Would be an acceptable alternative? Conrad.Irwin 11:34, 12 November 2007 (UTC)[reply]

How is the picture on the right? Here is the same with white oval background if the star doesn't show on some backgrounds, but I managed to get my background grey and it looked quite ok. Best regards Rhanyeia 14:53, 12 November 2007 (UTC)[reply]

I like it and have put it at Wiktionary:Main Page/Redesign 2007 for the moment, though further comment would be appreciated. I am not sure if we should maybe seek something slightly simpler. Conrad.Irwin 15:28, 12 November 2007 (UTC)[reply]
The design looks good; I think it might need slightly bolder colors and stronger contrast to work as an icon, though. --EncycloPetey 15:49, 12 November 2007 (UTC)[reply]
It looks flat to me. I think it should look more "dynamic" or something. I don't like the colors used for the borders of the Word of the Day and Behind the Scenes boxes, and how the right edges of those boxes don't line up with the Wiktionary box above.
"578,606 words with English definitions from 401 languages" <- That's technically a lie lol — [ ric | opiaterein ] — 16:12, 12 November 2007 (UTC)[reply]
I agree. The most accurate numbers (as of the last XML dump) we have are at WT:STATS#Detail. "8,266,025 entries" would be correct. And counting languages only if they have more than 10 definitions, seems more reasonable. --Connel MacKenzie 16:24, 12 November 2007 (UTC)[reply]
Wow, User:Rhanyeia, the star with the parchment and quill, is magnificent. --Connel MacKenzie 16:24, 12 November 2007 (UTC)[reply]
I agree. Wow. :-) —RuakhTALK 16:31, 12 November 2007 (UTC)[reply]
I'm sorry that sharing an aesthetic opinion with you, offends you.  :-)   But that logo is really nice. --Connel MacKenzie 18:02, 12 November 2007 (UTC)[reply]
Sorry, I meant for my "wow" to agree with your "wow", not to express shock/offense at our agreeing, heh. :-) —RuakhTALK 20:18, 12 November 2007 (UTC)[reply]

Thank you for your great comments. :) I don't know now should I keep the colors of the picture or still try something else. :) I have one with a yellow star and some slight change, if someone wants to see it let me know and I'll upload it and link it from here. Best regards Rhanyeia 16:20, 13 November 2007 (UTC)[reply]

Per discussion above I have corrected the wording of the statistics, lined up the right hand border, and tried to make the borders slightly more defined. I would appreciate comments on the differences between [1] and [2]. (And would love it even more if someone who can actually do design would edit Wiktionary:Main Page/Redesign 2007) Conrad.Irwin 00:54, 13 November 2007 (UTC)[reply]

I experimented with the border colors a bit. :) Best regards Rhanyeia 16:49, 13 November 2007 (UTC)[reply]
I tried it with slightly darker colours, I think that then works better with the borders matching the boxes. Conrad.Irwin 20:28, 16 November 2007 (UTC)[reply]
I like the overall design a lot, and I think the top box looks good. :) But I think pink and turquoise don't look very good together on that page. Best regards Rhanyeia 10:06, 18 November 2007 (UTC)[reply]

Commit Changes

When this Main page redesign started it was suggested that we could aim to get a new main page for Wiktionary Day. That is in just under a month's time, and as votes (which I think we should have) take between a month and two weeks, I think we should start voting for it now. Are there any more changes that need to be made before this happens? Conrad.Irwin 20:28, 16 November 2007 (UTC)[reply]

I tried something again, I hope for a bit more time before voting. :) Best regards Rhanyeia 09:49, 18 November 2007 (UTC)[reply]
Your last edit looks good and as for me I think the page is ok. Best regards Rhanyeia 20:29, 19 November 2007 (UTC)[reply]

As a side effect of a discussion on the pronunciation markup guide for Greek (Wiktionary talk:About Greek/Pronunciation), User:Gilgamesh found (and fixed) some things in the general pronunciation key that were not in compliance with the IPA specs, for example the use of g (regular roman letter) instead of ɡ (IPA letter). Unfortunately the letters in the guide have been used in most of our transcriptions; I did a count for example and found > 9000 entries with g (roman letter) and only 3-400 with ɡ (IPA letter). Worse yet this chart has been copied to other wikts and undoubtedly so have our IPA transcriptions. Any thoughts about how we can clean up this mess? ArielGlenn 09:30, 10 November 2007 (UTC)[reply]

Well, I think that we should stick to the standard - it will make things easier in the long run. Is there a reasonably comprehensive list of the characters that are used incorrectly? In which case it should just be a matter of loosing a bot onto them. Otherwise, it will have to be done the hard way - though it is probably not time-critical so that isn't too much of a problem. Conrad.Irwin 10:47, 10 November 2007 (UTC)[reply]
It would certainly help to provide a shortcut to ɡ in the IPA panel below Edit boxes. In fact, it might not be a bad idea to throw all the IPA symbols in, in a logical table-style order, even if they already exist in ASCII, e.g. m p b ɸ β ʙ ʘ ɱ f v ʋ ѵ n t d ɗ θ ð t͡s s ˢ d͡z z t͡ɕ ɕ d͡ʑ ʑ t͡ʃ ʃ d͡ʒ ʒ ɹ ʴ r ʳ ɾ t͡ɬ ɬ d͡ɮ ɮ l ɫ ˡ ɺ ǀ ǃ etc. Also, wouldn't it be better ultimately for the {{IPA}} template not to specifically prescribe fonts in its source? It should just use the .IPA style class like the Wikipedia Template:IPA, and individual registered users can edit their account style sheets (User:Name/monobook.css) to use a font of their choice if they like.
What I really miss from the IPA edit box are the diacritics such as the dental [ ̪] and lowered [ ̞] marks.--Jyril 12:56, 10 November 2007 (UTC)[reply]
Apart from identifying pages to be changed by bot, which could be useful, something like w:User:Mike Dillon/Scripts/highlightNonIPA.js might be helpful. It highlights non-IPA characters inside of span elements with the class "IPA" (and the specific class is configurable). This allows editors to plainly see bad IPA characters when they go to a page to edit or preview, letting them fix them as they're found. It was written for Wikipedia, but I could probably adapt it for use here pretty easily. Mike Dillon 15:28, 10 November 2007 (UTC)[reply]
I just tried the script out and it works fine here. The only requirement is that it's currently dependent on w:User:Mike Dillon/Scripts/i18n.js, but it's not getting a lot of value from that script and could easily be changed into a standalone script. Mike Dillon 15:46, 10 November 2007 (UTC)[reply]
Would it change all non-IPA characters to the IPA ones? like a to ɑ etc? — [ ric | opiaterein ] — 16:26, 10 November 2007 (UTC)[reply]
It shouldn't change a to ɑ, since those are different IPA characters for two different sounds. --EncycloPetey 17:08, 10 November 2007 (UTC)[reply]
That script doesn't do that, it just highlights non-IPA chars. I wouldn't actually recommend an auto-converting script, since there are only a couple cases that can be done unambiguously (e.g. g to ɡ). This wouldn't work for something like "a" to "ɑ" since "a" is actually the valid IPA for "open front unrounded vowel" (which means it wouldn't be highlighted either). Mike Dillon 16:41, 10 November 2007 (UTC)[reply]
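For what the one unambiguous case might look like in a bot, here is a minimal Python sketch. It is not the highlighting script mentioned above and not any existing bot; the function name, the regular expression, and the assumption that transcriptions appear as plain `{{IPA|...}}` arguments are all illustrative. Only the ASCII g (U+0067) inside the template is changed to ɡ (U+0261), and "a" is deliberately left alone because it is a valid IPA symbol in its own right.

```python
import re

# Hypothetical pattern: a non-nested {{IPA|...}} template call.
IPA_TEMPLATE = re.compile(r"\{\{IPA\|([^{}]*)\}\}")


def fix_ipa_g(wikitext):
    """Replace ASCII 'g' with IPA 'ɡ' (U+0261) inside {{IPA|...}} only."""

    def repl(match):
        parts = match.group(1).split("|")
        # Leave named parameters (anything containing '=') untouched,
        # since they are not transcriptions.
        fixed = [p if "=" in p else p.replace("g", "\u0261") for p in parts]
        return "{{IPA|" + "|".join(fixed) + "}}"

    return IPA_TEMPLATE.sub(repl, wikitext)


sample = "A big word: {{IPA|/gɹiːk/}}"
print(fix_ipa_g(sample))
```

Text outside the template, such as the "g" in "big", is not touched, so ordinary prose keeps its normal spelling.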
Thinking about it, it doesn't really make that big of a difference whether we use g or ɡ, especially when I keep seeing "r" used to represent what's actually the ɹ sound in English. :-( — [ ric | opiaterein ] — 19:04, 10 November 2007 (UTC)[reply]
Exactly. —scs 23:50, 10 November 2007 (UTC)[reply]
ɡ (IPA letter) is a square on my computer. RJFJR 21:08, 10 November 2007 (UTC)[reply]
Are both of these a square?
Mike Dillon 23:18, 10 November 2007 (UTC)[reply]
Yes. (I'm probably missing some font set loaded. I don't know what I'm missing, how many other people are missing it as well?) RJFJR 02:33, 11 November 2007 (UTC)[reply]
I'm not RJFJR, but as I mention below, I can see the "plain" ɡ in my copy of Firefox just fine, but the one using the template not at all. —scs 02:53, 13 November 2007 (UTC)[reply]
Are you using IE? There's an issue with every version of IE at least up to IE 5 (I don't know about IE 6 as I've never used it) where, if an HTML calls for a specific font, IE will display all the text in only that font, and will display text that that font doesn't support as an unencoded character, appearing in most fonts as a box. But when no font is called for, IE uses the default font, and if that font doesn't support a given character, it will comb the default fonts starting with the most canonical ones until it finds a font with the character, and will use the alternative font inline with the rest of the text. Firefox and other Gecko-based browsers, however, will always comb alternative fonts for displayable characters, whether or not a specified font supports it. And if a list of fonts are specified, then the display engine will go through that list sequentially first. If a character isn't in the first font in the list, it will go to the second font, and then the third font, and so forth, and then comb the installed fonts and such. This issue in IE has long been accepted as one of the ways the browser is broken, and a perennial issue as Microsoft went for so many years not making any significant changes to IE—being the browser 99% of people used, they had zero practical incentive to fix any of its non-security-related bugs. In fact, they only felt compelled to start making changes again when Firefox became increasingly popular. - Gilgamesh 04:32, 11 November 2007 (UTC)[reply]
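The two fallback behaviours described above can be compared with a small Python model. This is only a sketch of the description in this thread, not of any browser's real code: the font names, the coverage sets, and the `strict` flag are invented for illustration, and the "strict" branch assumes the browser commits to the first declared font that is installed.

```python
def pick_font(char, declared, installed, coverage, strict):
    """Return the font that would draw `char`, or None if a box would be shown.

    declared:  fonts named by the page or template, in order
    installed: fonts present on the reader's system, in fallback order
    coverage:  hypothetical map of font name -> set of supported characters
    strict:    True models the old behaviour described above (no fallback
               once a declared font is in use); False models full fallback
    """
    available = [f for f in declared if f in installed]
    if strict and available:
        first = available[0]
        return first if char in coverage.get(first, set()) else None
    # Full fallback: declared fonts first, then everything else installed.
    for font in available + [f for f in installed if f not in available]:
        if char in coverage.get(font, set()):
            return font
    return None


coverage = {"FontA": set("ag"), "FontB": set("agɡ")}
installed = ["FontA", "FontB"]
print(pick_font("ɡ", ["FontA"], installed, coverage, strict=True))
print(pick_font("ɡ", ["FontA"], installed, coverage, strict=False))
print(pick_font("ɡ", [], installed, coverage, strict=True))
```

In this model, declaring no font at all lets even the strict browser find a usable font, which matches the point made above about leaving fonts unspecified.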

Perhaps I'm missing something, but does g mean something else in IPA? If not there doesn't really seem any point in making the distinction in font. Widsith 23:22, 10 November 2007 (UTC)[reply]

My question exactly. I know IPA says we're supposed to use 'ɡ' instead of 'g', but... why?!? What goes wrong if we use 'g'? —scs 23:50, 10 November 2007 (UTC)[reply]
Well, the appearance of <g> depends on font and context, with many fonts having a two-loop roman version and a one-loop italic version (see the table in the upper-right part of w:G if you can't picture what I mean); only the latter is the IPA symbol for a voiced velar plosive. Granted, the appearance of <ɡ> also depends on font (since some fonts render it as a box or a question mark), but that's more likely to improve over time. —RuakhTALK 00:41, 11 November 2007 (UTC)[reply]
Yes, I know all that, but again: what goes wrong if we use 'g'?
(Ironically, not only can the appearance of ɡ vary also, but on the computer I'm using, ɡ has a loop tail and g has an open one, which is backwards from the traditional explanation of the distinction between these symbols. That is, if you say it's important to use ɡ so that everyone will see it with an open tail, as opposed to the "wrong" closed-tail typographical g, I, for one, will see a closed tail anyway...) —scs 00:52, 11 November 2007 (UTC)[reply]
Well, that is really an issue of your display configuration (fonts, system, whatever), but standards encode IPA [ɡ] as the equivalent of &#X261;. There's something of an expression. 'Fix the system, don't "fix" the text.' Adhering to standards is most appropriate. - Gilgamesh 04:25, 11 November 2007 (UTC)[reply]
Of course it's an issue of my display configuration! That's why that part of my argument was in parentheses; it's quite secondary.
The primary part of my question still stands. I'm all for adherence to standards, but not blind obeisance. One more time, why is it important to use ɡ? What goes wrong if we use g instead? (And I'm sorry, but in this case, "because the standard says so" is not a good enough answer. Why does the standard say so?) —scs 04:40, 11 November 2007 (UTC)[reply]
Because it appears more polished and professional. It improves the quality of the data. Maybe g is optional, but there ought to be no problem with changing any given g to a polished ɡ, and putting ɡ in the references only helps that. Besides, I also like the idea of bot-automated IPA polishing. - Gilgamesh 07:46, 11 November 2007 (UTC)[reply]

According to w:G, "Generally, the two minuscule forms are interchangeable, but occasionally the difference has been exploited to make a contrast. The 1949 Principles of the International Phonetic Association recommends using [opentail g] for advanced voiced velar plosives and [looptail g] for regular ones where the two are contrasted, but this suggestion was never accepted by phoneticians in general, and today [opentail g] is the symbol used in the International Phonetic Alphabet, with [looptail g] acknowledged as an acceptable variant." So if it's an acceptable variant, there doesn't seem any point in changing them all in languages which don't distinguish between the two. Widsith 10:10, 11 November 2007 (UTC)[reply]

Thanks for confirming that. I knew that at some older (pre-Unicode) point in time, IPA had a barely-significant distinction between the two kinds of g's, but I didn't realize it was that long ago! —scs 02:50, 13 November 2007 (UTC)[reply]
The Handbook of the International Phonetic Association has a code table toward the back which provides a list of characters with IPA numbers, phonological descriptions and Unicode codepoints. The character used for voiced velar plosive in the charts is 110, "opentail g" which corresponds to U+0261. However, two rows down from that is 210, "looptail g", corresponding to U+0067. This is the ASCII-encoded g, but both entries have a note saying "Equivalent to [other number]". There are no qualifiers of any sort. —Leftmostcat 19:38, 11 November 2007 (UTC)[reply]
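The two code points cited from the Handbook can be checked against Unicode's own character names with Python's standard library. This is just a quick illustration; the helper name `describe` is made up here. The last line decodes the numeric character reference quoted earlier in this thread.

```python
import html
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# ASCII g, IPA opentail g, ASCII a, IPA alpha
for ch in ("g", "\u0261", "a", "\u0251"):
    print(describe(ch))

# The numeric character reference &#X261; decodes to the opentail g.
print(html.unescape("&#X261;") == "\u0261")
```

The two g characters may look alike or different depending on the font, but they are distinct code points, which is why a search or a bot has to treat them separately.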
In that case, it seems like we should actually be replacing /ɡ/ (the special character) with /g/ (the universally-supported character). —RuakhTALK 00:26, 12 November 2007 (UTC)[reply]
I agree. --EncycloPetey 03:51, 12 November 2007 (UTC)[reply]
I disagree. Maybe it's okay to use g if there is automated bot-assisted replacement to ɡ. - Gilgamesh 06:19, 12 November 2007 (UTC)[reply]
There is no reason not to use "g", but there are reasons not to use U+0261 in final rendered text and more importantly, within wikitext. I cannot understand why you wish to assert the opposite, User:Gilgamesh. Any bot replacement should be from "ɡ" to "g" (neither of which render as loop-tailed, even with the non-standard/rare IPA font loaded.) --Connel MacKenzie 06:51, 12 November 2007 (UTC)[reply]
Because why should we always use g when nearly every polished professional source uses ɡ? Remember? "but this suggestion was never accepted by phoneticians in general, and today [opentail g] is the symbol used in the International Phonetic Alphabet" For the sake of ease of editing, we could input g, but clearly ɡ is the polished and professional form in common use, and it seems logical to have bot-assisted substitution to aid in that. Why should Wiktionary intentionally do something that would make it appear less professionally academic? Since when is it Wiktionary's place to rewrite conventions for the wider academic world? Wiktionary should take cues from what is already considered well-formatted in the academic world, not the other way around. We have the technology. Let's use it. - Gilgamesh 08:21, 12 November 2007 (UTC)[reply]
Since when is a little square box "more professional looking?" --Connel MacKenzie 16:12, 12 November 2007 (UTC)[reply]
That's an IE CSS bug, and has remained very well known and unfixed for years because of apathy at Microsoft (during most of that time, IE was used by 99% of users, and Microsoft had no incentive to improve the browser even if bugs were well-known and caused nonstandard display behavior). You fix the browser, you don't "fix" the content. And to the best of my knowledge it appears perfectly fine in the other major modern browsers—Firefox, Safari, Konqueror, Opera, Camino... Besides, if there are square boxes, chances are you're seeing square boxes for other IPA characters too, which means you're either using IE and don't have the fonts installed that are specified in the template, or aren't using the {{IPA}} template where you should, in which case the IPA is not going to be useful to you. ɡ is part of the most commonly-used part of the Unicode IPA range, and will display equally as well as the other IPA symbols. Enclose all IPA texts with either {{IPA}} or {{IPAchar}}, and if the fonts are installed, even IE (at this time) should be able to display its contents. Arial Unicode MS and Lucida Sans Unicode (both currently in Template:IPA fonts) are two Microsoft fonts that support the IPA range, albeit with well-documented (but yet again still unfixed) bugs with some of its combining diacritic characters. There is also Code2000, the free and extensive serif font that accurately supports the entire IPA range and clearly distinguishes between g and ɡ. Another free font, Gentium (currently the first listed in Template:IPA fonts) also does this. Ultimately though, site behavior would allow all text to be viewed even in IE if absolutely no part of the website style prescribes fonts in its stylesheets at all—not in the default style, not in template styles, nothing. Then the site's font defaults to browser's defaults, and even IE is better conditioned to display a greater variety of characters in this circumstance. 
Wikipedia and Wiktionary IMO would be better to just not declare fonts (including any default Arial, Verdana, Tahoma, etc. in the website's default display), and let users' browsers display text according to the browser settings, and certainly IE wouldn't choke as often as it currently does. That could also make it easier for users to customize their account stylesheets at User:YourUserNameHere/monobook.css, which in some cases is difficult to configure because of hard-coded zealous font declarations in certain templates. - Gilgamesh 18:30, 12 November 2007 (UTC)[reply]
I didn't say I see a square box - I have to find other machines that I haven't used to see it. However, that remains the platform that most of our readers use. I ask again, what is professional looking about a little square box? --Connel MacKenzie 21:07, 12 November 2007 (UTC)[reply]

(cleared indentation) I am assuming, though it is mere conjecture, that those who do not have an IPA font installed will be unable to read any of the IPA characters, unless someone has taken the time to painstakingly create all of the characters except the ɡ. As such it really doesn't matter if a g is used instead as the rest of it will be illegible anyway. I feel that, as the standard says that the two characters are synonymous, there is no need to update what we have currently got - though, if there is a reason that ɡ is more correct, then the pronunciation key should reflect this. Also all the font references should be in the style-sheet. I think they should be included because there are some scripts which the default sans-serif is not good at displaying, even though it has a representation of the character; the style sheet is the place for them because that allows regular editors to use whichever fonts they prefer. Conrad.Irwin 23:35, 12 November 2007 (UTC)[reply]

Despite my querulousness above, I do agree that, everything else being equal, in an IPA transcription, we ought to use all IPA characters. But as is so often the case on this vale of tears, all is not, alas, equal.
One problem is that the IPA block in Unicode does not contain all of the IPA characters. The IPA characters that are identical to other characters aren't repeated there. So any IPA transcription will always contain a mix of characters from the IPA block and other blocks. At one level this is no problem at all, since Unicode characters are just characters; nobody (least of all Unicode) says that the different "blocks" are also different "fonts". And, of course, to deal with the issue of different browsers taking different characters selected from different blocks and displaying them in different fonts and generally making a hash out of things, we've got the {{IPAchar}} template. But that brings me to my second point.
That there {{IPAchar}} template isn't, I'm afraid, perfect. Oddly enough, in my own (otherwise Unicode- and IPA-compliant) environment, it breaks a few characters. And the two characters it breaks are probably the least obscure Unicode IPA characters there are: ɑ and ɡ. If I leave those two characters naked, as I did in the preceding sentence, I can see them just fine, but if I use the {{IPAchar}} template on them, like this: ɑ ɡ, they disappear! But everything else -- every other goofy character in the IPA table -- looks fine, whether "protected" by the {{IPAchar}} template or not.
I hasten to add that I am not complaining here; I am not asking that 'g' be used as opposed to 'ɡ' (or 'a' instead of 'ɑ') for my sake. I agree that we can't pander to every broken browser out there -- at some point, people have to upgrade to browsers that work. But at the same time, if there are a lot of people out there with broken browsers, we have to decide how harsh we want to be with them.
(If you're curious, my environment is Firefox 1.5.0.12 under Mac OS X 10.4.9. Safari on this machine can display the two characters in question just fine, whether they're in the {{IPAchar}} template or not. I plan on upgrading to Firefox 2 RSN. At some point I may also take a look at the {{IPAchar}} template and see if there isn't a 12th font I can add to its list that will work with this browser for ɑ and ɡ also.) —scs 02:46, 13 November 2007 (UTC)[reply]
That's pretty odd behavior. It sounds like one of the fonts specified by IPAchar is being found on your system and that it has a blank for those characters instead of omitting them. If so, that's a really badly behaved font. Mike Dillon
Yup, sure is. (But thanks for the analysis; it fits.) —scs 05:32, 13 November 2007 (UTC)[reply]

See a related discussion that has started at User talk:Gilgamesh#MS apathy of well known bug. It appears that, as of the latest IE 7, the glyph fallback bug we've mentioned has been fixed. As for the template fonts, User:Flyax told me at Wiktionary talk:IPA pronunciation key that for him, the Gentium font (first in the font list) especially at our usual font size, actually makes IPA look quite ugly. To be honest, I've got to agree. Gentium is fine at larger sizes, but is quite ugly at Wiktionary's default small font size. In contrast, however, I've noticed that Code2000 is the most omnidisplayable IPA font I've ever seen. It antialiases at this size (at least for me), and it also has some of the most reliable placement of diacritics (not so much for Windows before XP or one of its service packs or something though, as previous versions had many general problems when it came to combining diacritics of almost every kind), having additional logic to be able to place diacritics in logical positions and to even stack more than one of them elegantly in place above and below Latin and IPA characters. Code2000 is also cutting-edge in supporting more and more IPA characters that other fonts (including Gentium) do not yet properly support. Get Code2000, install it and try it out, and you'll see what I mean (the following text uses specifically Code2000 in style): ʲ j i ɪ e ɛ æ a ʲʷ ɥ y ʏ ø œ ɶ ɨ ɘ ə ɚ ɜ ɝ ɐ a ʷ ʉ ɵ ɞ ˠ ɰ ɯ ɤ ʌ ɑ ˠʷ w u ʊ o ɔ ɒ ˈ ˌ . 
ˑ ː ˥ ˦ ˧ ˨ ˩ ŋʷ kʷ ɡʷ gʷ ɠʷ k͡xʷ xʷ ɡ͡ɣʷ ɣʷ ˠʷ ʍ w ʟʷ ǁʷ ɥ ʸ m p b ɓ p͡ɸ ɸ b͜β β β̞ ʙ ʘ ɱ p͆ b̪ p͜f f b͡v v ʋ ѵ n̪ t̪ d̪ ɗ̪ t͜θ θ d͜ð ð n t d ɗ t͜s ʦ s ˢ d͜z ʣ z t͜ɕ ʨ ɕ d͜ʑ ʥ ʑ t͡ʃ ʧ ʃ d͡ʒ ʤ ʒ ɹ ʴ r ʳ ɾ t͜ɬ ɬ d͜ɮ ɮ l ɫ ˡ ɺ ǀ ǃ ɳ ʈ ɖ ʈ͡ʂ ʂ ɖ͡ʐ ʐ ɻ ʵ ɽ͡r ɽ ɭ ɲ c ɟ ʄ c͡ç ç ɟ͡ʝ ʝ j ʲ ʎ ǂ ɧ ŋ k ɡ g ɠ k͡x x ɡ͡ɣ ɣ ˠ ɰ ʟ ǁ ɴ q ɢ ʛ q͡χ χ ɢ͡ʁ ʁ ʶ χ͡ʀ ʀ ħ ʕ ˁ ʡ ʡ͜ʜ ʜ ʢ ʔ ˀ ʔ͜h h ʰ ɦ ʱ ò ó ô õ ō o̅ ŏ ȯ ö ỏ o̊ ő ǒ o̍ o̎ ȍ o̐ ȏ o̒ o̓ o̔ o̕ o̖ o̗ o̘ o̙ o̚ ơ o̜ o̝ o̞ o̟ o̠ o̡ o̢ ọ o̤ o̥ o̦ o̧ ǫ o̩ o̪ o̫ o̬ o̭ o̮ o̯ o̰ o̱ o̲ o̳ o̴ o̵ o̶ o̷ O̸ o̹ o̺ o̻ o̼ o̽ o̾ o̿ ò ó o͂ o̓ ö́ oͅ o͆ o͇ o͈ o͉ o͊ o͋ o͌ o͍ o͎ o͏ o͐ o͑ o͒ o͓ o͔ o͕ o͖ o͗ o͝o o͞o o͟o o͠o o͡o o͜o o͢o oͣ oͤ oͥ oͦ oͧ oͨ oͩ oͪ oͫ oͬ oͭ oͮ oͯ. - Gilgamesh 11:47, 13 November 2007 (UTC)[reply]

Oh, by the way, ɑ is actually a very specific vowel articulation quite separate from any articulation of a, and as far as I know, has no alternate equivalent. So if you can't see ɑ, you more or less have to fix the situation, because ɑ and a cannot be properly merged in transcription. - Gilgamesh 12:05, 13 November 2007 (UTC)[reply]

live mirror

Not sure where to report this, but: http://wikitionary.biz/ appears to be a live mirror of en.wikt, and I believe those are frowned upon. —scs 23:35, 10 November 2007 (UTC)[reply]

I mentioned it in #WM-tech which usually does the trick. --Connel MacKenzie 00:01, 11 November 2007 (UTC)[reply]

Excessive use of uncountability

I have noted an enormous number of abstract nouns presented as "uncountable", for example, capitalism. A quick run of g.b.c. will yield a significant number of books with "Capitalisms" in the title, as well as thousands of other hits. It is as if we had determined that comparing two versions of a concept is too rarefied an intellectual activity for Wiktionary. Anything to be done about this? DCDuring 22:32, 12 November 2007 (UTC)[reply]

Many abstract nouns like capitalism have a usual uncountable sense and a less common countable sense denoting a specific variation/version/enactment of the usual abstract meaning. By keeping that in mind, you'll probably understand the motivation the editor had for marking the noun as uncountable. When you do find an entry whose headword is regularly used in the plural but an earlier contributor claimed it is not used in the plural, remove "uncountable" if there really is only one countable sense or mark it as countable/uncountable, add the countable sense, and, if you have time, consider adding a citation showing the plural use. Rod (A. Smith) 22:49, 12 November 2007 (UTC)[reply]
I edited our capitalism entry showing how I treat such terms. Let me know if you have any suggestions for improvement. Rod (A. Smith) 22:54, 12 November 2007 (UTC)[reply]
I can't imagine any other way to handle it. I had done something like that once or twice, but the frequency of it is quite overwhelming. I suppose that we could consider ourselves blessed with so many opportunities to add more senses and complexify the headword line. DCDuring 23:32, 12 November 2007 (UTC)[reply]
I'm having conceptual trouble with this. Let me explain:
Each member of the class of "isms" can be viewed as:
  • a single core set of tenets (possibly with only one tenet)
  • any philosophy that includes the core set of tenets
  • a time- and place-specific implementation of either of the above
  • an entire time-sequence of the evolving tenets, philosophies, and implementations
  • the set of all such time sequences.
If this is true, then it seems to me that any "ism" that is not a proper noun can have a plural. And even in the proper noun cases, there will be writers who will insist on comparing today's proper noun with yesteryear's proper noun or the Western one and the Eastern one. I know that people use terms as if they were uncountable, but it does not seem to be inherent in the words themselves. It makes me wonder why a definition of a noun can ever begin with the word "the" if the word being defined is not a proper noun. Can someone direct me to a work that addresses this kind of issue or correct any obvious error I may have made? DCDuring 00:38, 13 November 2007 (UTC)[reply]
Yes, English is flexible. For English entries, our notes about countability of nouns, comparability of adjectives, and even inflections are really just helpful indicators about usual usage patterns, so I wouldn't worry about defining absolute rules, but fall back on citations. By the way, not that this helps, but your list above seems similar to a sequence Douglas Hofstadter used to show the fluidity of abstraction/instantiability: a publication; a newspaper; The San Francisco Chronicle; the May 18 edition of the Chronicle; my copy of the May 18 edition of the Chronicle; my copy of the May 18 edition of the Chronicle as it was when I first picked it up (as contrasted with my copy as it was a few days later: in my fireplace, burning). Rod (A. Smith) 01:01, 13 November 2007 (UTC)[reply]
Once more, facts (quotes) and gut reactions to the rescue. I have noticed that there are some entries that seem much more resistant to plurals than others, even though there is nothing formal to differentiate them from other entries that seem to beg for more pluralism than others might see. Thanks for cutting the Gordian knot for me. BTW, I have old Hofstadter (Goedel, Escher, Bach) in my library, but it wasn't a knowing copyvio or plagiarism. I barely even remember where the book is, let alone what it says. DCDuring 01:52, 13 November 2007 (UTC)[reply]
In English, "uncountable nouns" ("mass nouns") can generally be pluralized. Quoting George Tsoulas, Exceptions [...] can be found when the interpretation of mass nouns is coerced to that of (a) Standard servings, or typical units of measurement, (b) Type, or (c) Idiomatic expressions. Exemplified in (a) - (c) respectively: (a). We ordered three waters an hour ago (i.e. glasses, bottles etc. . . ), (b). Our restaurant serves only three waters (tap, still, and sparkling water), (c). Matilda’s waters broke. As sense (b) "Types" is broadly applicable, plurals of almost all "mass nouns" are used. Does that mean a generic countable sense should be added to all "uncountable nouns" as was just done to capitalism? Another alternative would be to add these pluralization cases to Appendix:Glossary#uncountable and edit {{en-noun}} to allow the plural form in the inflection line of an entry just marked as uncountable. Thoughts? --Bequw 21:22, 24 November 2007 (UTC)[reply]
The word used to illustrate uncountability in Wiktionary's glossary is "information". I wouldn't try to claim I could use it in a plural way, given its current meanings, but a new one could arise. The real problem I've had is mostly with words that describe states or conditions. For many of those words, editors seem not to see the potential for senses that are countable because one sense might be uncountable. In 1848 when Marx had just written the Manifesto, there was at most one "Marxism" (at least in practice). With its success, there came to be variants, and comparing them became possible. A student today could learn what Marxism "is" and not be aware of how many Marxisms there "are" or have been. DCDuring 22:53, 24 November 2007 (UTC)[reply]

Definitions of foreign terms

Wiktionary:Translations says, “In entries for foreign words, an English translation is given instead of a definition.” The intent is apparently to impose definitions like "# to talk" and to prohibit longer, more precise definitions. Frequently, though, a simple one-English-word semantic approximation seems inadequate to explain a specific sense of a foreign word. So, I'd like to change that restriction into a mere recommendation to include an English approximation in the definition. Thoughts? Rod (A. Smith) 00:36, 13 November 2007 (UTC)[reply]

For some words that works. I put extra notes on semantics and stuff like that under Usage notes. — [ ric | opiaterein ] — 00:54, 13 November 2007 (UTC)[reply]
The text must be in English, but I see no reason not to have a complete definition for foreign words. I am not sure that Wiktionary:Translations intended to prohibit longer definitions, but it could be interpreted as such. How about we change it to read "In entries for foreign words, an English translation should form the basis of the definition." Conrad.Irwin 01:00, 13 November 2007 (UTC)[reply]
Wikt historical note: Yes, that was worded that way to explicitly prohibit longer definitions, in lieu of minimal translations (i.e. an English word [+ "disambiguation" gloss].) The number of translation entries for, say, butterfly, was considered too impossible to maintain, otherwise. --Connel MacKenzie 06:26, 13 November 2007 (UTC)[reply]
The principle (as I understand it) is that if a single-word English translation will communicate what the word means (e.g. "butterfly"), then that is all that is required and all that should be used. If the single-word translation could lead to confusion, then a gloss should be added to tie it to the relevant English sense (e.g. "violet (flower)" ) or a list of two or three translations (e.g. "dim, faded, worn") for clarification. However, there will be some words that have no equivalent single-word translation in English, and those words should get a full definition (e.g. "for the last time"), with key words in the definition linked. That's how I've been proceeding with Latin. Like Opiaterein, I use Usage notes when even that is insufficient. --EncycloPetey 19:08, 13 November 2007 (UTC)[reply]
Agree entirely. The principle that a word should only be translated unless that's not possible is a good one though. Widsith 13:31, 14 November 2007 (UTC)[reply]
OK. So, I'll change it to say, “In entries for foreign words, one or more English translations should form the basis of the definition. If the translation has multiple senses and may lead to confusion, a gloss may be added to indicate a particular sense of the translation. If no precise translation is known, clarification may be given after the translation or in the Usage notes section.” Does that seem right? Rod (A. Smith) 20:17, 14 November 2007 (UTC)[reply]
I think it's useful to stress that we want by preference a translation not a definition. If you "define" a foreign word you end up (using a very simple example) saying that chien means "A dog." whereas in fact it means "dog". We should only be defining foreign words if there is no suitable English translation, or if a simple translation is in some way misleading or unsatisfactory. Widsith 09:35, 15 November 2007 (UTC)[reply]
Thanks for that, Widsith. I wasn't aware of that distinction, and I'm still not entirely sure I understand it. Am I right in thinking that "(to) talk" is a translation of French parler, but "to talk" is a definition? Rod (A. Smith) 02:20, 17 November 2007 (UTC)[reply]
I disagree. We can require everything to be hyper-labeled — {{countable|lang=fr}} [[dog]] (the animal) — or we can take advantage of the nature of English to clarify simple cases — A [[dog]].. I prefer the latter approach; the labels can be added on top of it, but it's just so much clearer to use real English. (Note: That's actually not a great example, because du chien does exist as well, so in this case we'd want two separate lines and one defined as [[dog]], dog [[meat]] but you see what I mean.)
For me, the point of emphasizing translation is (1) that the easiest way to learn and remember a word is by its translation into your native tongue and (2) many of our readers will have an actual text in front of them that they're trying to translate. We can tell our readers that a word is a vulgar term for a certain organ, or we can show them that by giving roughly equivalent English terms (provided they exist) that they can use in translating the term. However, this doesn't always stand in place of a brief definition, and by restricting ourselves to just a translation we open the door for ambiguities that we don't see, and thereby risk confusing or misleading our readers. (Anyone who's ever relied on a paper-bound bilingual dictionary knows what I'm talking about here.)
RuakhTALK 05:21, 17 November 2007 (UTC)[reply]
That's why we encourage the addition of a sample sentence, quotations, and a link to the word's entry in the native-language wiktionary for that word. As I pointed out above, we do include additional gloss or a full definition if there is a possibility of confusion, but for the majority of nouns and adjectives, and a fair number of verbs, there is little chance of that. It's not clear from your statements above (and below) what position you favor, because you seem to be arguing both sides. --EncycloPetey 03:26, 18 November 2007 (UTC)[reply]
Good question. I guess I don't know what these "both sides" are that you speak of. I'd have described myself as favoring translations, with definitions being helpful and desirable but totally secondary; but I read Widsith's explanation of a translation and don't at all favor that. Maybe I'm actually pro-definition, but feel that the best definition of a foreign-language word is generally one that emphasizes its English translations? (Well, you can judge for yourself. Some entries that I've created and am fairly happy with are cruauté and chauve, though both could benefit from quotations and/or example sentences. Feel free to tell me what side I'm on. :-P) —RuakhTALK 04:11, 18 November 2007 (UTC)[reply]
The point is that chien doesn't mean "a dog". To say "a dog" in French you must say "un chien". The reason all translating dictionaries that I own opt for translations rather than definitions is because of this phenomenon: that users expect where possible to be able to perform a simple substitution. We all know things aren't really that simple, but that is the objective. I agree with all of Ruakh's second paragraph. Sometimes there is no equivalent, and that's fine. Widsith 08:42, 17 November 2007 (UTC)[reply]
But I think sometimes chien does mean "a dog"; at least, the phrase "il est chien" is possible in some contexts (granted, usually when chien is the first word of a doggie job title, like chien de garde). And if we decide that's a rare edge case and irrelevant, then we have to decide where the line is; since Latin, as I recall, doesn't have articles, would we translate canis as "dog, a dog, the dog", or as "a/the dog", or what? I don't think carefully crafted translations, or even carefully crafted translations plus copious carefully crafted usage notes, can substitute for some baseline knowledge of the language and how it works; and I don't think we should try to make it substitute. —RuakhTALK 16:41, 17 November 2007 (UTC)[reply]
That's a matter for French grammar, not translation. Any translation dictionary requires that the user understands how to use the translation in context, and it is not the dictionary's job to do that for them, because the dictionary does not know what the context is. Hence "canis" and "chien" are both translated "dog". "A dog" and "the dog" are misleading and incorrect in most contexts.

I've been asked to contribute to this discussion. Here is what I feel is the best way of handling foreign words, and it is more or less what is usually done:

  • Wherever possible, give a translation, not a definition. Hence translate "chien" as "dog" and not as "an animal of the genus Canis". Why? Because "an animal of the genus Canis" could equally well be a wolf or a fox. Giving the definition expects the user to work backwards to get the word they are after. If they want the definition, they can click on the link to read one in the English entry.
  • If the word has multiple meanings in English (which will be true in most cases), follow the definition with an italicised gloss in parentheses to clarify which sense(s) is/are intended. Hence translate "chien" as "dog (animal; the male of this animal)" (as "chien" can also refer to a male dog, as opposed to a bitch) to distinguish it from "a man", "a morally reprehensible person", etc. This is important, because the translation alone may not be sufficient to give the sense of the word, even in context.
  • If there is no equivalent in English, then give a definition. For example, there is no equivalent to Italian Befana in English. It's a national holiday on Twelfth Night (and so "Twelfth Night" isn't sufficient as a translation, because that's not a national holiday in English-speaking countries) and also an old woman who is supposed to give children presents (a bit like Father Christmas/Santa Claus, but again, that is not a good enough translation). So the definitions would need to be given in these cases. This will be the case for many words in polysynthetic languages, where a single word (by which I mean a continuous sequence of characters without any spaces in it) is equivalent to an entire sentence in English.

Paul G 10:58, 8 December 2007 (UTC)[reply]

Inter-project links: "See also", or "External links"?

Currently a lot of entries put textual links to Wikipedia in the "See also" section, and a lot put them in the "External links" section; as far as I can tell, none of our policy pages says which of these is correct. Since there are already efforts toward standardizing our presentation of inter-project links, this seems like a fairly simple thing that might be worth standardizing, probably via ELE. So, do y'all think this should be standardized via ELE, and if so, do you prefer "See also", "External links", or something different? (This isn't about the boxes, which are really a discussion unto themselves — though obviously a decision about this might end up affecting that discussion.) —RuakhTALK 04:50, 13 November 2007 (UTC)[reply]

Personally, I tend to put them in "See also" when I think about it. I don't consider the sister projects external in the same way as other sites are, and I don't really love the idea of having a section titled "External links" anyway—instead of, say "References" or something that indicates what the links are doing there. I also kind of like the idea of a "Sister projects" section title, but that may be too cumbersome in the ToC. Dmcdevit·t 07:29, 13 November 2007 (UTC)[reply]
It seems like "See also" has the same defect of not indicating what the links are doing there, though. Conceptually I like the idea of a separate "sister project"-type header, but don't we try to encourage (properly attributed) mirroring of our content? Our mirrors might not like having Wikipedia et al described as sister projects … :-/ —RuakhTALK 19:59, 13 November 2007 (UTC)[reply]
I like putting them in "See also", for the same reason, they aren't "external" Robert Ullmann 07:36, 13 November 2007 (UTC)[reply]
I like putting them in the External links, to let people know they're leaving the English Wiktionary. This is how Wikipedia links to Wiktionary, Commons, Wikisource, etc. We already have people confusing us with Wikipedia without this adding to the problem. --EncycloPetey 19:02, 13 November 2007 (UTC)[reply]
Seeing as the links will presumably continue to say things like "Wikipedia has an article about ___", I don't think putting such links in "see also" would particularly engender confusion. (If we're worried about this issue, it seems like the links in initialism definitions, in quotation metadata, etc., are a bigger problem; but those seem so useful that I'd rather not dispense with them unless we really have to.) —RuakhTALK 19:59, 13 November 2007 (UTC)[reply]
From WT:ELE, the "See also" section seems to have been added in as an afterthought. I much prefer the "Related terms" heading for links to related definitions. This leaves the "See also" section to contain the links to indices and appendices. I feel that the rest of Wikimedia is essentially one massive appendix as it is stored on the same shelf, so the links should be in the "See also" section. It does make me wonder what would be put into an "External Links" section as presumably most of the possible candidates will either be references or usage material of some sort. Incidentally Wikipedia guidelines are unclear, but seem to support "See also" over "External Links". Conrad.Irwin 16:48, 16 November 2007 (UTC)[reply]
But Related terms is only for words that are morphologically related. In other words, the entry for warp can have warped or warpedness as Related terms, but not weft. The term weft is not morphologically related, even though its meaning is closely tied to warp. The See also section thus can function as a catch-all for those words that really do need to be mentioned on a page but which do not fit into any of the standard sections. --EncycloPetey 03:18, 18 November 2007 (UTC)[reply]
I thought we were moving away from "See also" towards "External links". Wow, guess I was wrong.
I don't want to mix up individual words with "Wikipedia has an article on..." so if you're going to put them in "See also" we have to exclude or find appropriate places for those other links.
I also think there's something to be said for consistency with other projects. And do we really anticipate having all that many external-to-wikimedia links? Edit: If not then best to populate this section with sister project links. DAVilla 05:12, 17 November 2007 (UTC)[reply]
Every translingual scientific name of an organism could have a link to both Wikipedia and to Wikispecies, as well as a link to images on Commons. That's more than a million potential entries with such links. I'd like to have See also reserved for internal linking and External links to non-Wiktionary material. --EncycloPetey 03:18, 18 November 2007 (UTC)[reply]

Whatever we decide, it'd be good to tear through this long ago proposed vote as well. DAVilla 03:49, 21 November 2007 (UTC)[reply]

Alternative spelling vs. Alternative spellings

I recently cleaned up an article that contained only one alternative spelling, yet AutoFormat changed the title to the incorrect Alternative spellings. Should the bot be updated to check for only one alternative spelling, or is there a reason to have the title unquestionably pluralised? Conrad.Irwin 23:58, 14 November 2007 (UTC)[reply]

It should be plural, just as ====Synonyms==== and ====Translations==== should be plural even when only one synonym is listed. The consistent plural allows applications to read our database, allows editors to add new translations and synonyms without also having to edit the section heading, and creates no grammatical error or possibility for reader confusion. Rod (A. Smith) 00:18, 15 November 2007 (UTC)[reply]
Agreed, always plural, so the same header can be used no matter how many there are. Cheers! bd2412 T 22:18, 24 November 2007 (UTC)[reply]
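To illustrate the point above about applications reading the database, here is a minimal Python sketch, assuming entries are available as raw wikitext. The helper name `section_items` and the sample entry are hypothetical and only for illustration; this is not how any existing Wiktionary tool is known to work. It shows that a reader keyed to one fixed header finds the section regardless of how many items it holds, and misses it when the singular form is used.

```python
import re

# Matches wikitext headers such as "===Alternative spellings===".
HEADER_RE = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def section_items(wikitext, header):
    """Return the bullet items under the section named exactly `header`."""
    matches = list(HEADER_RE.finditer(wikitext))
    for i, m in enumerate(matches):
        if m.group(2) == header:
            # The section body runs until the next header or the end of the text.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
            body = wikitext[m.end():end]
            return [line[1:].strip() for line in body.splitlines()
                    if line.startswith("*")]
    return []


# Hypothetical entry with a single alternative spelling.
SAMPLE = """==English==
===Alternative spellings===
* colour
===Noun===
# a hue
"""

print(section_items(SAMPLE, "Alternative spellings"))
print(section_items(SAMPLE, "Alternative spelling"))
```

With a consistent plural header the lookup needs only one string; if the header varied with the item count, every consumer would have to try both forms.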

{{en-verb}} wording

This has probably been mentioned before, but does anyone else find the rather excessive third-person singular simple present a little overkill for {{en-verb}}? I do realise that this is technically correct, however it would be really nice to shorten this. Would third-person present be a reasonable compromise between legibility and precision (note precision, they are both accurate)? Conrad.Irwin 14:14, 15 November 2007 (UTC)[reply]

No, the additional terms are required in order to be accurate because of the significant differences between the present tense in English and in other European languages. English has additional present tense constructions (e.g. He walks, He is walking) that do not exist in Romance languages, for example, but which are important in English for distinguishing subtleties of present tense action. Also, removing the word singular implies that they are acceptable for use with the plural, which they isn't. --EncycloPetey 14:23, 15 November 2007 (UTC)[reply]
For the record, at least in Spanish, there is a distinction between "he walks" and "he is walking". "él camina" vs "él está caminando" — [ ric | opiaterein ] — 01:30, 16 November 2007 (UTC)[reply]
As the other present tense is also included in the template, under present participle, it seems unlikely that removing the word "simple" would cause ambiguity. Incidentally, how many native English speakers know what the simple tense is, I didn't :)? I also feel that a vast majority of people (both learners by instruction and natives by intuition) will already know that the "singular" third person is the only one that is different, and so restating that for every verb in the dictionary is over the top. Conrad.Irwin 15:02, 15 November 2007 (UTC)[reply]
I share your opinion that the over-precise wording is off-putting. It seems to add more confusion than clarification...I still don't understand why "third person" is not sufficient here. Unfortunately, {{en-verb}} is used too frequently to experiment with complicated WT:PREFS hooks to (optionally) simplify that wording. --Connel MacKenzie 16:08, 15 November 2007 (UTC)[reply]
I'm not sure I like "third person" as a label, but since the -s forms and -ing forms are kind of obvious to anyone who speaks even a few words of English, I'm not sure we need to label these at all. We can also merge the presentation of the past tense and the past participle in cases where they're the same (-slash- only list the past participle in cases where it's different from the past tense). —RuakhTALK 20:01, 15 November 2007 (UTC)[reply]
I certainly think the word "simple" could be removed. Widsith 08:35, 16 November 2007 (UTC)[reply]
The problem with that is that English has more than one present tense. (Simple present, present continuous, etc) Although I think that the English Wiktionary should be a source mainly for speakers of English, I don't think we should try to over-simplify grammatical terms, just in case. — [ ric | opiaterein ] — 14:22, 16 November 2007 (UTC)[reply]
This level of precision is a little silly. Consider that, technically, we don't even list the base form, that is the plural or first or second person (i.e. non-third person singular) simple present tense form. Instead we list the infinitive, which is to + base form. The user has to know the grammar rules to be able to use a language, and whatever labels are adequate are adequate, I say. DAVilla 05:02, 17 November 2007 (UTC)[reply]
All persons besides the 3rd person singular are the same as the infinitive in English. I say, you say, he says, we say, you say, they say. I think that would be one of the first things about verbs that students of English learn... — [ ric | opiaterein ] — 03:44, 19 November 2007 (UTC)[reply]
The consensus here seems to be to change something, but what should be changed? I am still in favour of using "third-person present" for the first label (I know that the label was changed away from this before), and I also like the idea of combining the "past participle" and "simple past" if they are the same, but that may meet with less approval - and might be tricky to implement. Conrad.Irwin 20:58, 24 November 2007 (UTC)[reply]
Third-person is inaccurate, because We don't walks. See, I've told you guys I'm spacey :D — [ ric | opiaterein ] — 23:59, 24 November 2007 (UTC) — [ ric | opiaterein ] — 21:16, 24 November 2007 (UTC)[reply]
What does that have to do with the price of tea in China? (we is first person.) Rod (A. Smith) 23:45, 24 November 2007 (UTC)[reply]
They don't walks* My bad lol — [ ric | opiaterein ] — 23:56, 24 November 2007 (UTC)[reply]
As a stickler I would maintain that third person is accurate, though I am happy to agree it is imprecise. As mentioned above, it seems unlikely that anyone who can understand even a very small amount of English would think it implies third person plural, as the third person plural never differs from the first person singular. Perhaps it would be better to shorten it by abbreviating the words "3rd pers. sing. simp. pres." but that is a bit ugly. Conrad.Irwin 19:06, 25 November 2007 (UTC)[reply]
Having conversed with many non-native English speakers (in English), you might be surprised at how easy it is for them to mix up verb conjugations. So it's not hard for me to believe that someone who doesn't speak English as a first language might not realize that 3rd person is only referring to the 3rd person singular. I agree that abbreviating it doesn't look very nice :D But still, I really don't think it needs to be changed. — [ ric | opiaterein ] — 19:35, 25 November 2007 (UTC)[reply]
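The idea raised earlier in this thread, of listing the simple past and past participle once when they are the same, can be sketched as below. This is an illustration in Python only, not the {{en-verb}} template code; the function name `verb_form_labels` and the combined label wording are assumptions made for the example.

```python
def verb_form_labels(third_person, present_participle, past, past_participle):
    """Return (label, form) pairs for a headword line.

    The simple past and past participle are merged into one pair
    when the two forms are identical, and listed separately otherwise.
    """
    forms = [
        ("third-person singular simple present", third_person),
        ("present participle", present_participle),
    ]
    if past == past_participle:
        # Hypothetical combined label; the thread did not settle on wording.
        forms.append(("simple past and past participle", past))
    else:
        forms.append(("simple past", past))
        forms.append(("past participle", past_participle))
    return forms


for label, form in verb_form_labels("walks", "walking", "walked", "walked"):
    print(f"{label}: {form}")

for label, form in verb_form_labels("takes", "taking", "took", "taken"):
    print(f"{label}: {form}")
```

A regular verb such as walk yields three labelled forms, while an irregular verb such as take still yields four, so no information is lost by the merge.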

maternal and paternal family entries

I refactored our entry for grandmother to eliminate the artificial distinction in the definition. [3] Is it a big deal that there are more translation tables than there are definitions? Rod (A. Smith) 00:27, 17 November 2007 (UTC)[reply]

One definition is correct. More than one translation table is okay since it's noted that many languages split maternal and paternal. We've accepted that before. I'm not sure that having three tables is a good idea though. If the user doesn't know how a language deals with the maternal/paternal issue, where should they look? The definition should either be split, completely, or not. In this case I'm leaning towards not, and directing to maternal grandmother and paternal grandmother for each language that distinguishes them. DAVilla 04:55, 17 November 2007 (UTC)[reply]
Some languages, like Chinese, extend the paternal/maternal distinction to cousins, aunts and other stuff. I'm thinking the best way to format the translations would be to keep the specific ones at the entries like maternal grandmother and paternal grandmother (with the translation tables only at those entries, with a See also link). For languages that don't distinguish, the translations should be at grandmother. To avoid a mess, the non-parental specific translations should only be put under the "mother of someone’s parent" box. (Apparently, there are languages like Norwegian which have a different translation for all 3, which would obviously be an exception.)
For the maternal grandmother and paternal grandmother entries, it wouldn't be a horrible idea to include translations like the Italian "nonna materna". Thoughts? — [ ric | opiaterein ] — 20:48, 20 November 2007 (UTC)[reply]
Yes, with individual see links, that's basically my suggestion taken to much greater detail. But see also the split tables at uncle. DAVilla 05:31, 21 November 2007 (UTC)[reply]
If there aren't individual articles, I guess there's no choice but to keep them at the main one. My main issue was that the same stuff isn't kept in two different places. Already, there's stuff that's at grandmother in the "maternal grandmother" translation table that isn't at the maternal grandmother entry. (I'm going to try to fix that right now, though.) — [ ric | opiaterein ] — 13:17, 21 November 2007 (UTC)[reply]

Citations: and Citations talk: finally created

Note Bug 11981. Thanks to JeLuF, the two namespaces, Citations: (NS:114) and Citations talk: (NS:115), have been created. Can all the /Citations subpages now be moved to the Citations: namespace?  (u):Raifʻhār (t):Doremítzwr﴿ 01:03, 18 November 2007 (UTC)[reply]

First we should write Wiktionary:Citations so we don't wind up doing things twice. DAVilla 06:15, 18 November 2007 (UTC)[reply]
I don't see why they can't be moved. No, we are not going to use the horribly ugly uppercase like Citations:ABIDINGPLACE. They should be the same as the pagename (or what the pagename would/will be if/when it is created). If the capitalization differs WM will take you to the page anyway; spelling variants can be redirected (e.g. in cases where we would not redirect in NS:0) Robert Ullmann 10:31, 18 November 2007 (UTC)[reply]
I agree with Robert in re naming. I’d add that, in some cases (chiefly those where we have tonnes of citations), it may be a good idea to split the citations of alternative spellings over different pages. In cases where that isn’t necessary, then the citations can be housed at the same spelling as the one which houses the “primary” entry, and at the lemma for multiple plurals, adjectival comparatives and superlatives, and verb forms (very much like what is done already).
Shall I make a start at the moving then?  (u):Raifʻhār (t):Doremítzwr﴿ 13:00, 18 November 2007 (UTC)[reply]
Okay, we'll use "primary". But I still think it would be a good idea to have all the same case. Sometimes the spelling difference is just in the case. DAVilla 17:01, 18 November 2007 (UTC)[reply]
By the way, wasn’t the idea that there would be an extra quotations tab, sandwiched up there with the article, discussion, edit, history, move, and watch tabs? (Thereby obviating the need for template-created interlinks.)  (u):Raifʻhār (t):Doremítzwr﴿ 13:13, 18 November 2007 (UTC)[reply]
This would require altering all of the MediaWiki skins to include the tab if it were to be done in the cleanest way. Barring that, the tab could be added using JavaScript, which would make it visible to the majority of editors; however, I'm not sure it would be possible to have the link be red/blue depending on whether the page exists yet. The red/blue thing could be done, but it would require an additional page load behind the scenes to check for the page's existence (or two if "Citations talk" is included too). I guess something tricky could be done using one of the system messages in MediaWiki: to get the backend to check the existence of the page(s) and CSS/JS tricks to move the link(s) into a tab, but that could be fragile... Mike Dillon 15:35, 18 November 2007 (UTC)[reply]
There's an "Add Citations tab" checkbox in WT:PREFS, which goes to /Citations right now. Cynewulf 16:04, 18 November 2007 (UTC)[reply]
We could just scratch that now, since the citations page is not necessarily at the same spelling, given that there are often alternate spellings that would be best to collect on the same page. DAVilla 17:01, 18 November 2007 (UTC)[reply]
You are creating the unnecessary problem of it not being at the same spelling. If the word exists in the main namespace, the Citations: page should be at the exact same title; if there are variants, they can redirect in the Citations: namespace. If we mod the code for the existing "Add Citations tab" it will work nicely. Robert Ullmann 12:14, 19 November 2007 (UTC)[reply]
Let me be clear that I don't care if you criticize my brainstorming in the original proposal. You shouldn't expect the same from others, but I can take it without feeling ashamed. I think outside of the box there's no way for me to turn that off.
However, the problem stated immediately above does not lean on that proposal. The existence of that is independent of my own. The pages are simply not 1-to-1 as you had first envisioned. Did I invent the aspect of language that produces variant spellings? Do not lambast me for highlighting problems. If you see ways around them then great, we can find ways to fix them. On your advice I had already incorporated redirects into the skeletal proposal by the time you read my response. Note that on the tabs issue you have not addressed the Citations talk: space, and that Citations: can be arrived at differently than by tabbing, and that they can exist independently of the main entry if the latter fails RFV by a margin, for instance, and that this isn't of general use since most people don't even visit the prefs pages. I value your time enough to say that in my opinion it's worth scrapping. DAVilla 21:05, 20 November 2007 (UTC)[reply]
I have to agree with Robert that not having the citations at the exact spelling presents some problems (not to imply that having them at the same spelling doesn't also have problems, of course). I think I would like to see citations always at the exact spelling, but wonder if there is a simple way to duplicate those citations at a standard spelling or lemma spelling page, without having to manually synch them. --EncycloPetey 21:20, 20 November 2007 (UTC)[reply]
Yes there are issues, but unless you want separate pages for Citations:storm in a teakettle, Citations:storm in a tea-kettle, and Citations:storm in a tea kettle (and please be clear if that's what you're advocating) then my point is the issues are unavoidable. DAVilla 21:35, 20 November 2007 (UTC)[reply]
Yes, I mean just that, and likewise for other spelling variations. However, I also want to know whether we have some means of dynamically (?) collecting citations from related forms to a single page as well. --EncycloPetey 22:01, 20 November 2007 (UTC)[reply]
Wow, you would? In your opinion then, is the Chaucer quotation at therefor misplaced? DAVilla 18:48, 21 November 2007 (UTC)[reply]
Yes, not only because of the spelling, but also because it should be in a section for Middle English, not Modern English. Understand that I'm not against, say, the citations listed at parrot, which clearly have obsolete spellings listed among the modern one, but only because we don't yet have a way to separate them to other pages and simultaneously get them to appear in the standardized lemma page. I would like to see a separate Citations: page for each and every spelling, but with some sort of cross-communication between spellings for the same "word". --EncycloPetey 17:08, 22 November 2007 (UTC)[reply]
Does the quotation, "Deism made sense in its day as a reaction against religious wrangling and warfare" belong at Citations:deism or Citations:Deism? (In other places in the work the author uses the lowercase, but what if he had only used the word once?) DAVilla 05:06, 26 November 2007 (UTC)[reply]
"unavoidable" Why? We use redirects in the Citations namespace exactly as in the main namespace for exactly that reason. No one is talking about separate pages when there clearly would be (or is) one entry. This "issue" not only is avoidable, it doesn't exist. Robert Ullmann 10:55, 21 November 2007 (UTC)[reply]
We've got a communication gap. I'm starting not to understand you at all, and I'm not sure if you understand me.
We don't use redirects in the main namespace. "...when there clearly would be" one entry in which namespace? Is "storm in a tea(-)kettle" an example of that? At least now I know where EncycloPetey is coming from. What's your angle? Would you always put a citation on the exact same page as the spelling? DAVilla 18:48, 21 November 2007 (UTC)[reply]
We sure do have a comms gap; you are trying to "solve" a problem that you haven't shown exists at all. Of course the citations go on the exact same spelling. Cites for facade go on Citations:facade, cites for façade go on Citations:façade. When there is a redirect instead in the mainspace, there is a corresponding redirect in the Citations space. When the mainspace entry(ies) do not yet exist, the Citations: space entries are set up as if it/they did. (And this works now with the tab and {seeCites}). What the devil do you think is the "problem"? Robert Ullmann 07:43, 22 November 2007 (UTC)[reply]
Unless you want separate pages for Citations:facade, Citations:facades, Citations:façades, and Citations:façade then my point is the issues are unavoidable. Everything I have said is based on the premise that we would want to group such entries together, making the spelling distinction on the page itself. The communication gap arises from my assumption that notion was already understood, that there was no argument to have to win there.
I see carrying this policy of spelling separations over from the main space as complicating things much more than all but the most left of left field ideas I've brought up. Redirects without the silly tabs are a more sensible solution. Even redirects with the silly tabs are better. But I guess our policy in the main space has momentum now, and it's hard to argue against momentum. DAVilla 16:22, 23 November 2007 (UTC)[reply]
It would be nice to extend MediaWiki that way. For instance, I would like to see templates divided into code, documentation, and talk. But it isn't absolutely necessary, or even near that, and not any sort of prerequisite for this restructuring. DAVilla 17:01, 18 November 2007 (UTC)[reply]

(dropping indent, as I'm not sure where it would be ;-) I've fixed WT:PREFS so the Citations tab works with the namespace. Citations: should (of course!) be at the spelling/capitalization being cited. Redirects to be used for variants that are not usefully separated, separate pages when useful with see also links. Go to (for example) Vogue (not vogue) and you will see the create page form with a citations tab (do turn it on in WT:PREFS first!). So if you are creating the page, you can easily access the citations that have been collected.

(DAVilla, if you feel I am lambasting you, it is because you always take something very simple and straightforward and complexify it. This is simple and easy, and does not need or want the creation of new mechanism or method!)

Of course Citations: pages may be arrived at by various routes, all the more reason to have them at the form being cited. Robert Ullmann 11:25, 21 November 2007 (UTC)[reply]

Splendid. That is exactly as I envisioned it.  (u):Raifʻhār (t):Doremítzwr﴿ 12:31, 21 November 2007 (UTC)[reply]

{{seeCites}} is also fixed, it will take you to the Citations: page if it exists, else the /Citations page. When we've moved them all it can be simplified again. Robert Ullmann 07:43, 22 November 2007 (UTC)[reply]
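
The fallback described for {{seeCites}} can be sketched roughly as follows. This is only an illustration in Python, not the template's actual wikitext: the function name citations_target and the existing_pages set are hypothetical stand-ins for the template and for the wiki's page-existence check.

```python
def citations_target(title: str, existing_pages: set[str]) -> str:
    """Pick the page a citations link should point to.

    Hypothetical helper: prefer the page in the Citations: namespace
    at the exact same title, and fall back to the older /Citations
    subpage when the namespaced page does not exist yet.
    """
    namespaced = f"Citations:{title}"
    if namespaced in existing_pages:
        return namespaced
    return f"{title}/Citations"


# Illustrative data only; these are not claims about which pages exist.
existing = {"Citations:Vogue", "parrot/Citations"}
print(citations_target("Vogue", existing))
print(citations_target("parrot", existing))
```

Once every /Citations subpage has been moved, the fallback branch would no longer be needed, which matches the remark that the template can then be simplified again.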

Straw poll

Should the placement of quotations and references in the Citations: namespace depend on the exact spelling of the term, including diacritics and inflection? For instance, should there be separate pages for Citations:storm in a teakettle, Citations:storm in a tea-kettle, and Citations:storm in a tea kettle, and for Citations:facade, Citations:facades, Citations:façades, and Citations:façade? Or should all related citations rather be placed within the same, primary entry, say Citations:storm in a teakettle and Citations:facade, and the others redirect? DAVilla 16:22, 23 November 2007 (UTC)[reply]

Both, we should accumulate a list at the exact spelling but direct to a central "lemma" as well. --EncycloPetey 16:45, 23 November 2007 (UTC)[reply]
These are mutually exclusive (assuming each quotation belongs on a specific page). I'm only asking about placement of the quotations. I haven't prohibited linking or any other aspect of presentation on a page. Or do you mean that the quotations should be duplicated, belonging on both pages? DAVilla 16:54, 23 November 2007 (UTC)[reply]
Duplication would be one way to accomplish it, but it wouldn't be the ideal way to go about it. I just don't know enough about the technical possibilities and limitations of Wikimedia and wiki code to propose a particular "avatar" for my idea. --EncycloPetey 19:03, 23 November 2007 (UTC)[reply]
Citations for all variant spellings should be at the same central place. Or at both, as per EncycloPetey above. Widsith 16:51, 23 November 2007 (UTC)[reply]
Is this a for-all-time decision, or "just" for the initial move operation, correctable for individual pages?
For English, I think that each Etymology-PoS should get its own page, with all its inflected forms on that page. Adjectives that are spelled identically to participles and have related meaning ought to be on the verb page as well. Nouns used as adjectives should be on the noun page. For compound words differing only in spacing and hyphenation, same page. For phrases, same page also for those that differ only in pronouns (his, one's, etc.). There are many times when upper- and lower-case ought to be combined also. DCDuring 18:52, 23 November 2007 (UTC)[reply]
Separate pages for different parts of speech presents problems, because it isn't always clear (or agreed upon) what the POS is. Consider words like one, where the POS is highly ambiguous. I think it would be better to have the various POS uses on a single page, rather than separate ones, for ease of comparison and discussion. --EncycloPetey 19:01, 23 November 2007 (UTC)[reply]
I agree. Widsith 10:07, 24 November 2007 (UTC)[reply]

Suggested alteration to CFI on citation

I would like to change the WT:CFI Attestation section to:

“Attested” means a term is in clearly widespread use or is verified—through usage in permanently recorded media, conveying meaning, and optionally appearance in a refereed academic journal—in at least three independent instances spanning at least three years. If no quotations or references are in print then two additional quotations of use are required, in permanent media, conveying the same or similar very closely related meaning.

"Same or similar" means that a different part of speech with parallel meaning, an inflection, or a spelling variant would suffice for the two additional citations.

The time span is broadened to three years with little residual effect. This is in other ways a little stricter than before and will clearly require a vote. I know there are people who would say that it doesn't go far enough, but this at least closes what I consider to be a few loopholes, including nonces in (however we gage) well-known works and references in journals that are never substantiated.

Any opinions? DAVilla 05:34, 18 November 2007 (UTC)[reply]

  1. I think we ought to stick to “durably archived”, rather than switch to “permanently recorded”. The distinction may seem pedantic, but permanent implies eternality, which noöne can ensure.
    CFI currently says "permanently recorded" although it also uses the phrase "durably archived" twice immediately below. DAVilla
    The preferable term is definitely “durably archived”. Are there other such inconsistencies of terminology in the CFI?  Raifʻhār﴿
    No, that's the only one. :-P DAVilla
  2. We need to define what we mean by “clearly widespread use” somehow; is 1,000+ Google Book Hits and/or 1,000,000+ Google Web Hits a good start?
    I don't see how drawing a line is so important. A word in widespread use, if anyone were to contest its ubiquity, would be easily cited. DAVilla
    I’d always interpreted the “clearly widespread use” criterion as existing to preclude editors’ adding terms pointlessly to RFV without actually challenging them. It would be nice to have at least a guideline of what we meant by that. (If only to give peace of mind to editors, knowing that the existence of the most common words they add won’t be challenged.)  Raifʻhār﴿
    An anon or new user flooding RFV would be blocked. An admin you have to poke fun at. Look what I did to the first definition of attender. DAVilla
    Sorry to be obtuse, but I don’t get the joke intended by those four citations. I have trouble with blocking a user for actions that, in the long-run, are probably of benefit to Wiktionary (as a number of entries will end up with citations — in isolation, a good thing). I also dislike the idea of treating differently users who do the same thing, simply because of personal status.  Raifʻhār﴿
    The quotations are all half a century apart. It might take a few trips to the library, but I could probably dig up a citation for that definition (originally the only one on the page when questioned) for each and every year from 1850 to the present.
    I'm saying that the block, at least a temporary one, would happen. I'm not saying I would be the one to block an anon or new user, not before giving an explanation of why that can't be done. A regular contributor would not be blocked because, on account of the volume of their contributions, they are given more leeway to do clean-up work. I could show you instances of both, except I'm sure you believe me. To be clear, this is observation, not a position I'm advocating. DAVilla 20:38, 20 November 2007 (UTC)[reply]
  3. What do you mean by “optionally” in “and optionally appearance in a refereed academic journal”?
    I was trying to get the wording so that at least one of the three citations would convey meaning. Maybe it's better to just say that. After more input I'll try to rephrase. DAVilla
    Yes, it isn’t very clear as it is. Remember that such a restriction may still bar words in some of the world’s smaller languages.  Raifʻhār﴿
  4. A strict one-year limit is as unintelligent as a strict three-year limit. That criterion ought to be variable, as it makes very little sense as it is, particularly for recent historic events (consider that your three-year criterion would mean that 7/7 would not satisfy the CFI).
    You're right. We should allow shorter timeframes if there is enough good evidence. Maybe three years should be more of a guideline. DAVilla
    It’s tempting to argue that all these rules ought to be guidelines (but I won’t — the arguments caused thereby would never end). However, it’s probably better to try to define our terms in relation to “case studies”, creating useful distinctions along the way as much as is practical. For the sake of example: whereas names given to historic events, words and inflected forms precedented by similar patterns elsewhere and which fulfil some use not fulfilled by another, and so on should need to span a shorter timespan, words and phrases coined after eponyms (“do as [Celebrity X]”), vulgar slang, and so on should need to span a longer timespan. IMHO, that is.  Raifʻhār﴿
  5. I suggest changing “same or similar” to “same or very closely related”. I think that very sensible allowance should be extended to all quotations, as for some rare words, the lack of such an allowance means that for, say, a verb, we could have two citations for the lemma, two for the third-person singular present-tense form, two for the present participle form, and two each for the past forms, and it still wouldn’t satisfy the CFI as presently worded, even with ten citations for what is obviously the same word. (Granted, we’ve never so absurdly judged an RfV, but obeying the literal rule in our reading of the CFI would require that.)
    Ha! See confuzzle, confuzzled, confuzzles. DAVilla
    Are you referring to the lack of attested countability of the nounal sense? (I assume you aren’t talking about the lack of an entry for the present participle, given all the hits a Google Groups Search yields.) That does seem a little absurd (and wrong considering that most hits obviously use it as a count noun, as evidenced by a plural use and the two uses with the indefinite article).  Raifʻhār﴿
    I'm referring to the fact that each of the pages above has three citations and wouldn't exist otherwise. Yes, we have so absurdly judged the RFV, obeying the literal rule. DAVilla
    It’s a good thing that they’re all separately cited. All I’m saying is that that should not be a necessary prerequisite for their inclusion. But that should not stop such citations being added nevertheless.  Raifʻhār﴿
    Well don't argue that with me. I wasn't all too pleased about having to cite each one. But neither am I optimistic about a plan that weakens the criteria in any way. DAVilla 20:38, 20 November 2007 (UTC)[reply]
  6. Well-known works are those read by many people, therefore, it is highly likely that a reader would come across, say, one of Shakespeare’s nonces, and want to know what it means. Nota bene that that is the general rule governing our criteria — that a word ought to be included if it is likely that someone would come across it and want to know what it means. Therefore, even the nonces in well-known works ought to be defined herein, precisely because the works in which they appear are well-known. (How would you judge sillion, presently listed on RFV, which is clearly a nonce?)
    I propose that a well-known work is one that has been reviewed in academic journals. In that case there isn't really a need to make an exception. DAVilla
    I don’t know whether The Windhover has been reviewed in an academic journal, but what about the 100+ books which discuss that specific word?  Raifʻhār﴿
    I'm sorry, which word would that be? If it's got 100+ discussions, someone is bound to have actually used it somewhere. DAVilla
    I’m talking about the noun sillion. It was coined by Gerard Manley Hopkins in his 1877 poem The Windhover. The line in which sillion appears has been quoted in over a hundred books accessible via Google Book Search; however, only three other works I’ve found actually use the term, and they all fail the independence criterion. I looked through all the hits, and I don’t think that I missed anything.  Raifʻhār﴿
  7. The “appearance in a peer-reviewed academic journal” criterion was intended for languages which, though documented and analyzed in (chiefly linguistics) journals, have very small corpora. Perhaps that criterion should be qualified that it mainly applies to such cases, and not to, say, popular Indo-European languages with vast literatures. Though this is in no way backed up by the general rule which I invoked in my previous point, it is backed up instead by our mandate to include “all words in all languages”.
    I wish that the criteria were the same across all language Wiktionaries so that we could defer all decisions on foreign terms. For more minor languages, we are already very lenient. I am not too worried about an admin trying to destroy the project by challenging those terms, but if there is some way of codifying that then all the better. DAVilla 16:39, 18 November 2007 (UTC)[reply]
    I don’t really understand what you’re saying here: Are you willing to treat languages differently or not?  (u):Raifʻhār (t):Doremítzwr﴿ 19:39, 18 November 2007 (UTC)[reply]
    I am. Minor languages are the ones that don't have Wiktionaries. There's no point writing a dictionary in a language that only twelve people speak. How do other Wiktionaries handle requests for verification? I hope they're not deferring to a language authority. DAVilla 21:45, 18 November 2007 (UTC)[reply]
    No idea. (Although I believe Y Wiciadur (the Welsh Wiktionary) basically doesn’t have one, which isn’t surprising, considering what a sorry state it’s in.)  (u):Raifʻhār (t):Doremítzwr﴿ 14:18, 19 November 2007 (UTC)[reply]
    What about the Wiktionaries that are in good shape? Anyone? DAVilla 20:38, 20 November 2007 (UTC)[reply]
    The Hebrew Wiktionary — which isn't exactly "in good shape", but most of the entries it does have are of really high quality — is as old-fashioned and grammatically tight-assed as any of my Hebrew dictionaries. I don't think it includes slang (though shockingly, it does include ungrammatical loanwords like פיזיקה). —RuakhTALK 02:01, 21 November 2007 (UTC)[reply]
Sorry to be mostly negative in my response. ;-)  (u):Raifʻhār (t):Doremítzwr﴿ 13:42, 18 November 2007 (UTC)[reply]
I dislike the removal of "well-known work". The proposed change makes it far more difficult to get attestation for Old English words and words in other ancient languages known from a small corpus of "well-known" works (e.g. Beowulf) where a word may appear only once. It has a similar impact on including words from Shakespeare or Chaucer. I would not support a change to CFI that eliminated the "well-known work" criterion. --EncycloPetey 15:44, 18 November 2007 (UTC)[reply]
Could you give an example of such a word? One that you think might be very difficult to cite otherwise. DAVilla 16:39, 18 November 2007 (UTC)[reply]
The talk about Shakespeare nonces (which I think we should include) got me thinking: what about languages with very little in the way of native durable media? I've always interpreted "appearance in a peer-reviewed academic journal" to mean use, such as the cite in mesolect, but what about adding a citation to lah from a journal article that is explaining the use of the word in Singlish rather than using it itself? "lah" is probably the most common Singlish word, but it's hard to cite since (I assume) people tend to write in Standard English instead. Cynewulf 17:18, 18 November 2007 (UTC)[reply]
Actually lah [4] isn't at all difficult to cite. DAVilla 18:51, 18 November 2007 (UTC)[reply]
Not the music word wat, the w:Singlish particle lah! This sort of thing. Cynewulf 19:39, 18 November 2007 (UTC)[reply]
Oops. Okay, one easy try is sorry lah on Usenet. With it being so common, I'm sure there are many other ways to find cites. DAVilla 20:06, 18 November 2007 (UTC)[reply]
Actually, you can get sorry lah in print, and a few other combinations (okay lah, sure lah) if you play around a little. DAVilla 20:11, 18 November 2007 (UTC)[reply]
Two comments:
  • I would remove the time span: it is not consistent with the general principle (users are very likely to search for the meaning of the newest words). Requiring a number of independent citations should be sufficient.
  • It is very common to look for other information than a definition: etymology, plural, pronunciation, translation, etc. I would change the basic principle accordingly. This would have huge (but good) consequences. Lmaltier 21:42, 20 November 2007 (UTC)[reply]
I'm not sure what you mean by the second one. Do you mean that each part of an entry should be verifiable? Or how would the principle of attestation change? DAVilla 23:16, 20 November 2007 (UTC)[reply]
Sorry, I wasn't clear. Actually, this was not related to the proposed changes. The current principle is that a word should be present if users are likely to find the word and look for its meaning in a dictionary. What I state is that, even when the meaning is already explained in the text or clear from the context, a user may still want to know its etymology, its pronunciation, etc. Lmaltier 06:11, 21 November 2007 (UTC)[reply]
I think that has more to do with idiomaticity than attestation. For instance, an unusual pronunciation pattern could legitimize a phrase that is otherwise sum-of-parts. Etymologies (indirectly and tentatively) permit certain company names and should probably permit placenames. Translations find an exception in phrasebook entries.
When CFI is written correctly, terms that can't be attested are ones that won't be run across, in which case no one would need to know anything about them. DAVilla 04:01, 22 November 2007 (UTC)[reply]
My reason was that, sometimes, some citations are rejected just because the meaning is clear from the text. This should not happen, in my opinion. Lmaltier 21:05, 22 November 2007 (UTC)[reply]
Maybe if they were defined in the text without usage, but not if they were just used in a clear context. That shouldn't be the case for common words. Brand names yes, acronyms likewise probably, and maybe proper nouns, but not lah or other fringe words or even the majority of entries. Please give me an example of where this happened. Who pulled one over on you? DAVilla 15:54, 23 November 2007 (UTC)[reply]
Yes, you are right, it was probably about brand names. This does not change my reasoning on the principle: we need dictionaries sometimes for the meanings, sometimes for other information. Lmaltier 20:30, 25 November 2007 (UTC)[reply]

Sorry, but I disagree with most of your proposed changes, for reasons people have already mentioned. I do think our current CFI are a bit too lax with scholarly-journal uses/mentions, but your proposal seems to go too far in removing that laxity; if a lexicon of an obscure language gets published in a peer-reviewed academic journal, and the compiler of the lexicon gives us the O.K. to do so, I think we should bleed it dry. —RuakhTALK 02:01, 21 November 2007 (UTC)[reply]

Revision

Okay, this is getting very technical. I'm not proposing the following terse language, but I believe it conveys (perhaps too) precisely what I am trying to embody and hopefully takes into account the above comments.

“Attested” means a term is verified through independent instances in durably archived media. A term with clearly widespread use or with usage in a well-known work is automatically deemed verifiable. Otherwise three citations are required for verification:
  1. A quotation from an edited, printed work that conveys meaning.
  2. A reference in a refereed academic journal, or a quotation as above.
  3. A quotation from any work that conveys meaning, or a reference as above.
If these span fewer than three years, a corresponding number of additional citations are required.
When lacking instances in print, a citation of the last type may be substituted if there is an additional uncounted quotation that conveys the same or very closely related meaning.
For terms of a language where the Wiktionary is unestablished, the time-span and the first citation are waived.

Thus, at most, a one-year-old neologism in English not appearing in print would require seven citations. At minimum, a journal mention and a single non-print usage would permit a term in an obscure language. If anyone would like to tear this down and build it up again, please be my guest. DAVilla 05:13, 21 November 2007 (UTC)[reply]
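
The two figures above (seven citations at most, two at minimum) can be checked with a small worked sketch. It encodes only one possible reading of the draft wording, and every name in it (citations_required and its parameters) is hypothetical. In particular it assumes that "a corresponding number of additional citations" means one extra citation per year short of three, and that each print-dependent citation replaced by a non-print one costs one additional uncounted quotation.

```python
def citations_required(span_years: int, in_print: bool,
                       has_journal_reference: bool,
                       wiktionary_established: bool) -> int:
    """Count citations needed under one reading of the draft criteria.

    Assumptions (not stated in the draft itself): a slot that cannot
    be filled as written is substituted by a citation "of the last
    type" plus one uncounted quotation, so it costs 2 instead of 1.
    """
    def slot_cost(fillable: bool) -> int:
        return 1 if fillable else 2

    total = 0
    if wiktionary_established:
        # Slot 1: quotation from an edited, printed work.
        total += slot_cost(in_print)
    # Slot 2: refereed-journal reference, or a print quotation.
    total += slot_cost(in_print or has_journal_reference)
    # Slot 3: any quotation conveying meaning, or a reference.
    total += 1
    if wiktionary_established and span_years < 3:
        # Assumed: one extra citation per year short of three.
        total += 3 - span_years
    return total


# One-year-old English neologism, never in print, no journal reference.
print(citations_required(1, False, False, True))
# Term in a language without an established Wiktionary:
# one journal mention plus one non-print use.
print(citations_required(1, False, True, False))
```

Under these assumptions the first call gives seven and the second gives two, agreeing with the summary; a different reading of "corresponding number" would change the first figure.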

(Edited to be less mathematical and more lingualistical.)

I take "established" to mean that the foreign language Wiktionary has its own criteria for inclusion and maybe even a running verification process. DAVilla 03:07, 22 November 2007 (UTC)[reply]

'X' or "X"?

Quick question. I've made a number of categories styled as Category:English plurals ending in "-ies"; and a series of appendices styled as Appendix:Variations of 'a'. It occurs to me that I should render all such schemes to consistently use either a single quote or a double quote. Any preference? Cheers! bd2412 T 05:02, 21 November 2007 (UTC)[reply]

Are those the only examples, suffixes and single letters? DAVilla 05:26, 21 November 2007 (UTC)[reply]
Perhaps you ought to use < and > instead, thereby avoiding that issue, and the one of whether the quotation marks ought to be directed (as ‘ and ’ or “ and ”) or not.  (u):Raifʻhār (t):Doremítzwr﴿ 12:37, 21 November 2007 (UTC)[reply]
DAVilla, suffixes and letter appendices (some are double letters) are all I'm concerned with. Doremítzwr, that's a good suggestion, but I think it might confuse some people. Also, half the things mentioned above already contain either single or double quotes, so switching all to one or the other will be half as much work as switching all to a different scheme. However, if the community prefers that, I'll go with it. Cheers! bd2412 T 14:02, 21 November 2007 (UTC)[reply]
I prefer doubled-up single quotes, fore and aft, for Latin/Roman script only. :-P
Seriously, if you keep it as is I don't see much harm. DAVilla 04:11, 22 November 2007 (UTC)[reply]
I really do think we should lay down a consistent rule. Double-quotes are fine with me (straight, not directed). Cheers! bd2412 T 07:08, 22 November 2007 (UTC)[reply]
I think it's perfectly natural for characters to be delimited by single-quotes and strings (including, granted, single-character strings) to be delimited by double-quotes, but perhaps I've been brainwashed by C and C++. ;-)   Seriously, though, I think this might be one of those things that no one will care about as much as the person who invested all the effort in them; heck, I'd even be happy with Category:English plurals in -ies and Appendix:Characters related to English's letter A, but. So, consistent-ify at your leisure. :-)   —RuakhTALK 09:29, 22 November 2007 (UTC)[reply]

RE: Donation quotes at top of page...

In reply to one quote stating "Knowledge at our fingertips, what more could we ask for?"

I say...

Knowledge in our brains.

There was a discussion somewhere about the deletion of a template that created links only if the page existed. It was used in inflection templates to make them look nicer. However that discussion didn't (I think) establish a consensus about whether inflection tables should contain links to other words. There are two major opinions, either the links are useful or they are ugly. So I would like to ask if there is a preference for having links or not having links? Conrad.Irwin 13:31, 22 November 2007 (UTC)[reply]

The discussion was about "hiding" red links if the page didn't exist. That is not the way we do things; we want the red links and we want the pages. If it offends someone that a table is a mixture of black and blue and red, go create the missing pages. (Discussion was at RFD/O for template:notred.) We definitely want the links. Preventing them from being edit links is contrary to everything we do. Robert Ullmann 13:45, 22 November 2007 (UTC)[reply]
Creating articles just to get rid of the ugly red links isn't much of an incentive to make good articles. :p Especially since we still can't decide how we want to format form-of entries. Right now, all of the Romanian adjective, noun and verb templates have that 'only link if it exists' thing going on. — [ ric | opiaterein ] — 15:10, 22 November 2007 (UTC)[reply]
Psychologically speaking, color differences distract from the more important morphological differences between table entries. In other words, red links in large inflectional tables distract from the very information the table is designed to convey. Yes, we want red links but we don't want to put them in places where it damages the usefulness of the information to readers. --EncycloPetey 16:42, 22 November 2007 (UTC)[reply]
It's also easy to use the ctrl+c/v keyboard commands to copy and paste words to start entries for, so I really don't think having red links is that important at all. In inflection lines it's fine, but in tables that can have many forms, not so much. — [ ric | opiaterein ] — 03:12, 23 November 2007 (UTC)[reply]
Personally, I don't mind redlinks, and think that if you're allergic to the color red, then all the more reason to create articles. However, I'm sympathetic to the difficulty of trying to read through a fruit salad inflection table. So how about a compromise? For instance, this is probably technically infeasible without serious CSS hacking, but what about making a table that has links that are black until you mouse over them? Or something like t+/t-, where the table only becomes linked after a bot adds all the inflected forms, so it's always monocolor? Just some ideas. We may just have to bite the bullet and decide that one choice outweighs the other. Cynewulf 05:56, 23 November 2007 (UTC)[reply]
Agreed. —RuakhTALK 23:12, 23 November 2007 (UTC)[reply]
I have put links in inflection tables as I've created them. But rethinking this - Why do we need links in inflection tables? For example "amabimus" (actually "amābimus") in the table at amo#Latin: the table tells me that it is 1p pl future active, if the word had a link the def would tell me exactly the same thing, or have I missed something? —SaltmarshTalk 07:32, 23 November 2007 (UTC)[reply]
Because while most of those are barely minimal stubs now, they should have definitions, examples, and translations of the examples, and then are very useful to someone trying to figure out how to properly use the form of the word. Robert Ullmann 10:01, 23 November 2007 (UTC)[reply]
Moreover, someone reading a work in Latin who comes across the word "amābimus" and does not know what it means is likely to look up "amābimus", and not "amo". bd2412 T 00:27, 25 November 2007 (UTC)[reply]
They won't come across "amābimus" as Latin is not written with macrons, they are an affectation of dictionaries and "readers" for people learning Latin. They might very well look up amabimus (at least we have the entry at the correct form, even if we can't for some reason present Latin as it is written. ;-) Robert Ullmann 14:08, 26 November 2007 (UTC)[reply]
I don't think that was quite the point of what bd was saying. Either way, the macrons mark vowel length in Latin, don't they? Useful, I'd say. Especially for differentiation. — [ ric | opiaterein ] — 12:58, 30 November 2007 (UTC)[reply]

About Lojban

Wiktionary:About Lojban and Wiktionary talk:About Lojban. About getting the Lojban entries to use legitimate headers in English, and having actual definitions. Robert Ullmann 13:40, 22 November 2007 (UTC)[reply]

Fitting usual POS headers to Lojban may not be possible. Lojban just isn't structured the way natural languages are. It has three Parts of speech, one of which is a noun-pronoun-substantive class. --EncycloPetey 16:37, 22 November 2007 (UTC)[reply]
Oh, we may very much need new headers. But they must be English. Robert Ullmann 09:52, 23 November 2007 (UTC)[reply]
You are presupposing that English words exist that equate to these ideas, but I don't think there are. You'd have to use a whole lengthy definition in place of one of Lojban's POS headers, and that's not a likely solution. We might have to go with the Lojban names for the POS, but template link them the way we do for Initialism, Acronym, etc., so that users can click to go to an explanation. --EncycloPetey 16:40, 23 November 2007 (UTC)[reply]
I'm not sure anyone actually understands it; what PierreAbatt wrote on the talk page about "brivla" seems to have little or no relationship to what is on the about page (that he wrote a day or two earlier)? Of course, since the about page fails to define any of the terms in English, but only completely circularly in Lojban, it is hard to be sure ... ;-) Robert Ullmann 10:14, 23 November 2007 (UTC)[reply]
From reading (admittedly briefly) about Lojban it seems that the language was constructed to try and get away from typical language and thought patterns. This means that the words are not divided by the mundane Noun Verb etc. but instead by the level of concept in the word. Compare the list of cmavo with the list of gismu to see how much these appear to overlap if you try and apply normal grammatical terms. That is also why the grammatical terms are so hard to define, the types of words themselves are concepts that English doesn't have - though I do agree we can probably do better than we are at the moment. Conrad.Irwin 11:40, 23 November 2007 (UTC)[reply]

Declension vs. Inflection in headers

(This is another one of my topics that I can't imagine hasn't been discussed before, but) So far it looks like we're using something different for certain languages, which seems strange to me because, until today, I'd never seen a difference in definition between the two anywhere I'd looked. So I go to Wikipedia to try again, and it says:

Inflection — the modification or marking of a word to reflect grammatical information, such as gender, tense, number or person
Declension — the inflection of nouns, pronouns and adjectives to indicate such features as number (typically singular vs. plural), case (subject, object, and so on), or gender.

So now it seems like inflection is the more general term, while Declension refers to nouns, pronouns and adjectives, while conjugation refers to verbs. That said, should we differentiate with our headers? I've been using ====Declension==== and ====Conjugation====, but it seems for some languages the (seemingly more general) term ====Inflection==== is used.
Thoughts? — [ ric | opiaterein ] — 14:20, 23 November 2007 (UTC)[reply]

The general header we use is Inflection, but users are welcome to apply Conjugation to verb entries and Declension to (usually) noun entries. The choice is often language specific, but probably ought to be consistent within a language. Thus far, I haven't seen anyone propose that we standardize these headers across the entire project, though we've had discussions about their use. In any case, you are correct in summarizing the meanings of those terms and their application. --EncycloPetey 16:35, 23 November 2007 (UTC)[reply]
So they're not plural? Apart from that, of two possibilities here:
  1. Use more specific term, and reserve "Inflection" for cases where POS doesn't align with either (no doubt we have such cases somewhere).
  2. Always use "Inflection" as the header.
I slightly prefer the former. DAVilla 16:39, 23 November 2007 (UTC)[reply]
Me, too. If inflection can refer to either, it seems more logical to use the more specific term to avoid confusion. The average person isn't going to know that inflection can mean either declension or conjugation, and to have all three of them floating around to me seems messy. Maybe that's just my OCD, but y'know. — [ ric | opiaterein ] — 17:29, 23 November 2007 (UTC)[reply]
I would prefer to use (and do myself use) only Inflection for Latin entries, when I think of it. I always found "declension" to be a confusing idea when I was first learning a language that had declensional patterns. --EncycloPetey 18:58, 23 November 2007 (UTC)[reply]
It's basically the noun/adjective/pronoun equivalent of conjugation, which I think is a well enough known word. It just makes more sense to me to use the more specific terms — [ ric | opiaterein ] — 20:34, 23 November 2007 (UTC)[reply]
For added perspective, Category:Korean adjectives (e.g. 좁다 (jopda, narrow)) use ====Conjugation==== because the inflections indicate things like tense and mood. Also, note that some editors here call the headword line the "inflection line" because it shows a few key inflections of the headword. Rod (A. Smith) 01:06, 24 November 2007 (UTC)[reply]
Also, a lot of people (including EP, seemingly) associate "declension" with "morphological case", while a lot do not (including me), so even though "declension" is a more widely recognized term than "inflection", we won't necessarily escape confusion by using it. (Incidentally, my Hebrew-English dictionary gives what it calls the "declension" of various prepositions: the list of their pronoun-including forms. I doubt Latin-speaking editors would even recognize that use of the term, much less agree with it.) I think "Conjugation", on the other hand, is widely understood with a fairly constant meaning (even if in Korean and Japanese it doesn't only apply to words we classify as verbs). —RuakhTALK 07:42, 24 November 2007 (UTC)[reply]

I think that the new version of the main page has now reached a stage where any further alterations will be minor tidy-ups, and so I would like to propose that we open the vote. Conrad.Irwin 01:08, 24 November 2007 (UTC)[reply]

Can we get the right edge of the Behind the Scenes box in line with the rest of them first? :D — [ ric | opiaterein ] — 01:44, 24 November 2007 (UTC)[reply]
Now fixed, any further bugs anyone? Conrad.Irwin 12:48, 24 November 2007 (UTC)[reply]
describe ... by sounds odd ("it aims to describe all words of all languages, by definitions and descriptions in English"), but I guess you're trying not to imply "all words that have English descriptions". Also "an encyclopedia Wikipedia" -- "the" instead of "an" maybe. Looks good apart from these minor things. Cynewulf 16:17, 24 November 2007 (UTC)[reply]
Enough work on the main page. It's all the secondary pages that need attention, e.g. Wiktionary:Topics. DAVilla 18:36, 24 November 2007 (UTC)[reply]
There are a lot of secondary pages that need looking at - I may move on and look at them - but I felt the main page would be easier to improve (laziness basically). Presumably Wiktionary:Topics should become more like w:Category:Categories ? Conrad.Irwin 18:47, 24 November 2007 (UTC)[reply]
After a while of no discussion, I decided to start the WT:VOTE. I hope this isn't presumptive. Conrad.Irwin 11:21, 30 November 2007 (UTC)[reply]
Being presumptuous (obnoxious) I've paused it again, suggesting that we vote on using the design Dec 10-31 (Wiktionary Day is Dec 12), and then see where we are? Even if people haven't taken the time to look at it yet in detail, this should be generally supported? And then we will (?) have much more feedback. Robert Ullmann 12:14, 30 November 2007 (UTC)[reply]
Unfortunately it is too rushed now. It cannot be decided by Wiktionary Day, but it can still go through on a later timeframe, as Robert suggests. DAVilla 02:49, 1 December 2007 (UTC)[reply]

Brilliant idea I had

In my infinite dorkiness, it came to me that it would be cool to have audio recordings of example sentences instead of just written ones. Thoughts? — [ ric | opiaterein ] — 15:55, 24 November 2007 (UTC)[reply]

For phrasebook type stuff, sure. "Thank you very much" and "I'm sorry" in five bazillion languages. I'm not sure about everything else. Cynewulf 16:13, 24 November 2007 (UTC)[reply]
I don't think most people would take the time to go through even every word, let alone every example sentences. But for major languages, I think it'd be pretty neato. Not saying that it should be mandatory or anything, but as an option I think it would work okay. — [ ric | opiaterein ] — 16:31, 24 November 2007 (UTC)[reply]
Yeah, I wouldn't prohibit it right now just because it's new, let's see where it goes. Cynewulf 16:35, 24 November 2007 (UTC)[reply]

Bot flag request

The vote concerning giving a bot flag to my bot has been finished, but the flag has not been given yet. I'm just writing to remind the bureaucrats about the vote; if you set the flag, please leave me a message on my talk page. Thanks. --Derbeth talk 01:45, 25 November 2007 (UTC)[reply]

Ordering of definition

Question: what order should definitions be in? Should it be in rough order of frequency? Of relationship to each other? In chronological order? Seems like a pretty important thing to know, for example, when deciding whether, at drop, "A place where items may be left anonymously" should go before or after "A small mass of liquid". Circeus 05:08, 25 November 2007 (UTC)[reply]

I don't know the right answer, but I believe that there is some correlation among concreteness, intelligibility, earliness, and frequency of use (not necessarily of look up), and breadth of geographic scope and context. That might mean that selecting which senses in an entry are first should be easy, even without good dates of first attested use for each sense. Wiktionary starts by grouping meanings by etymology. I would guess that date of earliest attested use of the earliest sense within any PoS should determine which etymology is first. Grouping senses by their relatedness of meaning would seem like the next step. Within each relatedness grouping, you would again look for the earliest, most concrete, intelligible, common, and broad senses (unless there were another relatedness sub-grouping). I think you would find that most dictionaries do it like that, giving primacy to date of first use if that information is available. Many Wiktionary entries do need help in this area. DCDuring 05:38, 25 November 2007 (UTC)[reply]
I'm of an overall differing opinion. To me definitions should go in rough order of decreasing commonness (so that anything with a topical label drops below stuff without), and then, if practical, rough relatedness could be used. They should not (I'd even say never) be organized by datation in a synchronic dictionary (which I believe we are), because older meanings are actually not that likely to be as common today as they used to be. In my mind, the major reason for having relatedness groups is to allow the definitions to rely on each other, which in the context of a wiki, is a bad idea, as stuff changes constantly and reordering would destroy that organization. Besides wikitext is currently simply not advanced enough to carry the complex hierarchical definition groups found in well-developed dictionaries. Circeus 05:51, 25 November 2007 (UTC)[reply]
This can of worms gets opened every now and then. There is neither clear consensus, nor clear guidelines. Some (myself included) believe most common usage to least common usage would be most logical, others believe that age comes before beauty, while still others believe that messing around with the order just makes life difficult all round and so first come, first served. There are some other opinions that I don't even understand, but the discussions can be found on various pages and dates. - Algrif 10:06, 25 November 2007 (UTC)[reply]
Yeah, we talk about this a lot. I find anything other than ordering by date a massive waste of time. I think there has been some talk of using templates, with a view, one distant day, to allowing users to decide between different arrangements. That seems a long way in the future though, and for now we are just sticking with common sense and not messing with what's already there unless it's really nonsensical. Widsith 12:57, 25 November 2007 (UTC)[reply]
There is a third way, namely ordering by increasing deviation from etymology. Regarding the ordering of the nounal senses of drop, I’d say the “small mass of liquid” sense should precede the “place where items may be left anonymously” sense, as the former is more common, earlier-established, and can be used in a broader context (that is, any context) than the latter (the nearest sense whereto my COED tags with “informal”).  (u):Raifʻhār (t):Doremítzwr﴿ 14:32, 25 November 2007 (UTC)[reply]
Part of the problem with implementing some of the approaches is that they require either a lot of data (frequency, earliest attestation date) or a great deal of knowledge and experience (etymological deviation). With the data now available, most contributors couldn't implement the data-intensive approaches. I know that I don't have the knowledge or experience to reliably implement the etymological deviation approach. That is one reason that I like the idea of focusing on the most basic physical sense, which would tend to be the oldest use, a common use, close to the etymological origins, as well as providing a sound start for grasping more figurative, metaphorical, and abstract applications or senses of a word. DCDuring 15:01, 25 November 2007 (UTC)[reply]
Yeah, none of them seem like easy solutions. Your system has problems too - what is the "most basic" sense of a word like set? Or mark? I wish we were allowed to subdivide words like other dictionaries do, which I think would help us please more people in a feasible "third-way" system. I have a particular soft spot for the way I had organised mark in this revision, which survived for a long time till another editor got rid of it.. Widsith 14:45, 26 November 2007 (UTC)[reply]
The only senses that I often find easy to place are the most physical ones that involve no tools. In the case of set (verb) the sense listed first involves a general placing of an object. The archaic sense of "sit" would get a high placement. Setting a task would immediately precede compiling a crossword puzzle. I certainly like the idea of subdividing senses to support grouping of related senses with subsequences that are virtual etymologies. I'll take a look at mark. DCDuring 15:03, 26 November 2007 (UTC)[reply]
Yea, like that. Your mark is a lot more intelligible to me. And if folks have disagreement about specifics of it, we could have factual determinations to base changes on. Entries would become more complex, but would convey more meaning. There are probably thousands, certainly not hundreds of thousands, of potential entries that would benefit from such structure. Why is it against the rules? DCDuring 15:13, 26 November 2007 (UTC)[reply]
(moving left) In my experience, the other Wiktionaries, when they want to create an English entry, pull only the first definition from the en:wikt as the sole definition they place on their page. In other words, the other wiktionaries are assuming that the first definition is the most common and most useful. This is also standard practice for most dictionaries in any language. That is the strongest reason I prefer to (roughly) put the common definitions first, then jargon, then archaic and obsolete uses. --EncycloPetey 00:11, 3 December 2007 (UTC)[reply]
Couldn't we influence them to do otherwise? We could provide a marker of some kind for the single best sense which they could read. DCDuring 11:20, 3 December 2007 (UTC)[reply]
Well, the Latin Wiktionary does that already (or at least they used to...I can't find an example), but I'm not sure how receptive this community would be to the idea. --EncycloPetey 15:15, 3 December 2007 (UTC)[reply]

The Role of Wiktionary in the Formation of New Language

Does common usage warrant inclusion in a dictionary or does a modern online dictionary guide the formation of new language? In a world that is quickly running out of words and expressions that have been hijacked by corporate marketers for disposable goods and services, how does society regain control of the evolution of the English language? A case in point is the ESB or Enterprise Service Bus. Although commonly used it has been acknowledged that no agreed definition exists. Would a Wiktionary entry with community debate solve this dilemma? Who would be considered qualified to establish and evolve the definition? Should Wiktionary acknowledge emerging language using a specific convention and policy? Timhibberd 11:11, 25 November 2007 (UTC)[reply]

I think dictionaries from good publishers should be given even more to say here than they do now. This may go too far into original research. I'm not very well aware of all the earlier discussions though. :) Best regards Rhanyeia 11:45, 25 November 2007 (UTC)[reply]
Is there some guideline when creating a definition is original research? Rhanyeia 12:07, 25 November 2007 (UTC)[reply]
Copyright violation is more of a problem at Wiktionary. If Wikipedia uses cited definitions from various dictionaries in articles that really need to clarify a term, that is arguably fair use. For Wiktionary to be taking thousands of definitions from dictionaries that are arguably competitors would probably not be fair use under U.S. copyright law. I am not an attorney and not an administrator, so please take the above with a grain of salt or two. DCDuring 12:28, 25 November 2007 (UTC)[reply]
It's not a copyright violation if the definition is reworded (but it can't be reworded so much that the meaning changes), and sometimes it's possible to combine information from two sources to create one brief definition. I think it's more a copyright violation if sources are in fact used and then their names are not mentioned. The problem with the current verification system is that only the usage is verified, and I think it's not enough. It often doesn't tell the exact definition. Best regards Rhanyeia 14:44, 25 November 2007 (UTC)[reply]
  • Rhanyeia is on the right track with the question regarding creation of definitions. The question I pose is one of the scope of the role of Wiktionary. The ESB example was not raised as a problem to solve but to illustrate what happens in a modern society where expressions that should become a useful element of our language become an element of confusion. Should Wiktionary sit back and wait for the marketplace to try and fail to agree on a definition or step in and provide the rallying point to complete the refinement process once an expression gains some significance in some reasonable form? Timhibberd 19:48, 25 November 2007 (UTC)[reply]
I think that if no credible source has defined some word, or provides clear enough information to have it defined, that word shouldn't be on Wiktionary. That would be original research, and I think Wiktionary should wait until someone else defines the word. I also think that if there's any divisiveness about how to define some word then the definition would need a source. And there are many words where the definition can't be taken from the usage and would need its own verification. I hope a guideline about these things will be created in the course of time. Best regards Rhanyeia 20:09, 25 November 2007 (UTC)[reply]
Personally, I'm foursquare against passivity, confusion, corporate hijacking, and failure and equally in favor of usefulness and modernity. But I am having trouble grasping what is being suggested. You had advanced Enterprise Service Bus (ESB) as an example. What is illustrated thereby? Is this a question about some kind of standard-setting effort and Wiktionary's role in it? DCDuring 20:17, 25 November 2007 (UTC)[reply]
You are close. The question I pose is where language formation can best be guided in a modern online world. Wiktionary could be a powerful rallying point to assist society in consistent language formation and help to avoid the confusion that traditional societal mechanisms can often produce (ESB was an example). Wiktionary requires printed source in traditional media for validation! Why! Who says that in a modern world Wiktionary is not best positioned to be the credible source quoted by the print media and not the reverse! Is Wiktionary to be a passive observer of the chaos which is traditional societal language formation or an active participant? Following on from Rhanyeia's observation above... what might be the definition of credible source for a submission of new / emerging language to Wiktionary? Government, corporations, universities? As a democratic community dictionary I would suggest that all individuals could be credible sources subject to a credible proposal and a democratic vote by the Wiktionary community. Should Wiktionary have a separate category for emerging vs. commonly accepted language and a standard voting policy on new language submissions (with appropriate pre-spam filters, of course, to weed out the purposely counterproductive submissions and an archival process for proposed language that fails to take hold in society at large within, say, 12 months of submission)! Timhibberd 20:43, 25 November 2007 (UTC)[reply]
I support passive, detailed observation: we inform our readers about who uses a term, in what situation(s), with what sense(s), and to what effect, and let our readers decide what to do with that information. —RuakhTALK 21:25, 25 November 2007 (UTC)[reply]
Timhibberd, the problem with your idea comes from pretty much the same place as a lot of Wikipedia's problems. Anyone can edit, which is good, but at the same time it means that editors may pull in their own directions. If definitions can be created without sources that'll lead to a situation where people could try to create definitions which they want to see, and I don't mean with simple words like a dog but for example scientific terms and other difficult concepts. I believe the reason why Wiktionary is not yet having this difficulty so much is that it doesn't have nearly as many readers as Wikipedia. If it gets more popular, the number of editors will grow. That's going to require clear basic policies, and sources are essential if there's going to be quality in the long run. Best regards Rhanyeia 22:19, 25 November 2007 (UTC)[reply]
I have introduced terms and senses in psychology that might not have been introduced otherwise. I stuck close to the meanings in the Amer. Psych. Assn.'s APA Dict. of Psychology (2006) and either checked in advance to make sure the usage was there or responded to RfVs. As much as I might have wished to, I did not remove any sense that could be viewed as conflicting.
If a new term emerges out of a public debate, there should be plenty of citations supporting its inclusion. The fact that a given term has some kind of "official" status can be noted in References or in Usage Notes. This would seem to provide room for competing terminology systems. I think Wiktionary can play its best role by helping more folks educate themselves to participate in whatever public debate there might be and in the competition of different systems. DCDuring 22:39, 25 November 2007 (UTC)[reply]
  • I would like to thank everyone for their constructive comments so far. This is getting interesting :) So far everyone has provided very reasonable positions common to traditional dictionaries. Is the consensus that passive observation will be the role of Wiktionary? Should we not consider embracing the community consensus aspect that makes this unique from a simple online dictionary? I would be disappointed if Wiktionary became yet another passive online dictionary and the Wiktionary community simply warm bodies that copy all the entries from existing dictionaries into its database. Does the answer lie somewhere between the extremes of total passivity and benign dictatorship! Would taking the small step to minimal participation in the evolution of language start Wiktionary down a slippery slope! Would a voting mechanism for adding entries to the database become unwieldy! The key question in a world that is rapidly moving away from traditional print to massively online, is... who are the credible online sources for language going to be in determining if an entry belongs in Wiktionary or not if there are no "print" references? Timhibberd 22:46, 25 November 2007 (UTC)[reply]
Individual initiative is corrected by review processes in accordance with rules. Durably archived media include usenet and google news. Other documents can be submitted to WikiSource, I believe, which can provide another means for an organization or group to attempt to influence the vocabulary included in Wiktionary. DCDuring 22:59, 25 November 2007 (UTC)[reply]
Thanks DC. I agree that, where language has evolved consistently and is multiply sourced, there is no issue in acceptance to Wiktionary. With the example of the Enterprise Service Bus, however, there are numerous references but no agreed definition. This leaves it to the Wiktionary submitter to hypothesise a common definition from the contradictory sources in retrospect and raises the question of the qualifications and / or bias (or lack of) of the Wiktionary submitter. Should a submitter require credentials in the field of expertise that the submitted terminology relates to! Should the inventor of the terminology have the tie-breaking vote or the final say! Should invented terminology be submitted to Wiktionary when created with a proposal as to why it should be included and be subject to democratic vote for inclusion! Perhaps ESB might have evolved to be a more useful and consistent definition had it found a home, such as wiktionary, in which to be socially incubated.
What triggered my interest in this topic was the premature submission from our organisation (Smart Internet Cooperative Research Centre) to Wiktionary of new terminology for an emerging network concept that has evolved from years of research and interaction with a number of vendors, government, and universities called "Collaborative Service Networks". The rejection of the submission by a Wiktionary editor was on a sound basis as the Wiktionary submission preceded the publishing rollout that is only now starting to occur. Please note that I am not here "soap-boxing" to have that ruling overturned and our organisation will re-submit after the terminology starts to appear in respected journals as per current Wiktionary rules. It simply triggered the question in my mind as to the role Wiktionary might play in providing an environment for incubating new terminology that is based on a sound premise. I consider Enterprise Service Bus as an example of "what might have been" had the original inventor of the terminology been able to submit it to Wiktionary for sensible debate and evolution in a democratic forum. I thought that the idea that language might start in an active dictionary forum rather than have dictionaries simply passively record the past worthy of discussion. I hope I was not out of line in submitting it to the Beer Parlour. It seemed like an "over a beer" kind of conversation :) Timhibberd 00:24, 26 November 2007 (UTC)[reply]

Protologistic definitions, as (I infer) you refer to, can be added to the list of protologisms, where such usage can be prescribed (however unauthoritatively). Words with a number of different meanings are given entries with multiple defined senses. Although words’ definitions are generally inferred from their use in a number of non-identical contexts, other dictionaries can be consulted to provide the detailed meaning intended in the use of a word which is not immediately evident from context (though the original coiner’s opus wherein a term is defined would be preferable). Democratic votes are irrelevant to Wiktionary. Have I addressed all the issues you have raised?  (u):Raifʻhār (t):Doremítzwr﴿ 01:27, 26 November 2007 (UTC)[reply]

It is an interesting subject, but I would expect Wiktionary to seek to have policies and procedures designed to protect its integrity against a wide set of challenges. It may be difficult for Wiktionary to accommodate many desirable objectives while keeping true to what it is and is trying to be.
The LOP may be a start, but I'm sure that you would be seeking more visibility. As I see it, Wiktionary could play an early role in a process that was conducted in a completely transparent way. If early discussion of ESB and CSN had been conducted on a usenet-type forum (I don't know whether usenet rules would allow the membership restrictions such a process might need.), there might have been the potential for generating citations at an early stage in the process. Some editors have questioned usenet citations, perhaps partially because of the potential for manipulation. Well-written usenet posts might be welcomed by some editors here, though I haven't seen too many well-written posts there in my searches. DCDuring 01:43, 26 November 2007 (UTC)[reply]
Protologistic definitions (thanks for that recommendation) would definitely be the place to put Collaborative Service Networks (CSN) at present but I would suggest Enterprise Service Bus (ESB) has graduated. My previous question regarding who submits the definition for ESB, what should the submitter's qualifications be and how should they best resolve the confusion of ESB definition variants still remains. The big question remaining... is the management of protologistic definitions a fundamental unique mission for Wiktionary or simply a catch-all for those pesky words no one knows what to do with (put them in the closet when company comes over until we can figure out what to do with them). Wiktionary seems like a wonderful opportunity to explore a new way for society to evolve language that was just not available in the pre-internet days. If nothing else, perhaps protologistic definitions that are being widely and actively socialised on a proactive basis need to have a discussion page available to debate their evolution as adoption progresses! It may be too late for ESB... but it's not too late for CSN :) Note: Thanks to all for contributing. Perhaps this conversation thread is best moved to the Protologistic area of the Beer Parlour. This has been interesting. Timhibberd 03:36, 26 November 2007 (UTC)[reply]

Identifying abbreviations.

When a CFI-meeting term — say hypothetically, functional organization and orientation — has a CFI-meeting abbreviation (in this case FOO), I've noticed that different editors link differently from the entry for the term to the entry for the abbreviation. Approaches I've used or seen used include:

  1. Mention in a usage note.
  2. Listing as a derived term, or sometimes as a related term.
  3. Mentioning in the etymology.
  4. Mentioning in the inflection line.
  5. Mentioning in a sense line.
  6. Listing in the "See also" section.

My preference is for either of the first two, though the fourth makes sense to me as well. The third and fifth both strike me as misguided (granted, a sense-line approach might make sense in a case where the abbreviation is very widely understood and might genuinely be useful in actually defining the term, but I've never seen that done), and the sixth seems pointlessly uninformative.

What do y'all think? How do you do it? How should we do it?

RuakhTALK 21:34, 25 November 2007 (UTC)[reply]

The derived-term approach fits with the facts that the abbreviation is, in fact, usually derived from the name, (although there are occasions where the abbrev. influences word selection for the long term, making etymology section the right place to go). Usage note is also a good place because the appropriate context and relative frequency of use of the abbrev. vs. the full term is a usage question. A possible problem with both of these is that the abbrev. can be missed where the entry be long. The inflection line hadn't occurred to me as a possible location, but seems like a very good place for all abbrevs. to go. The kinds of words and phrases that would have an abbrev. wouldn't usually have much on the inflection line. I would vote inflection line for now. DCDuring 22:22, 25 November 2007 (UTC)[reply]
3 and 6 are out, and I would argue against 2 as the primary vehicle for the same reason that you would consider 6 pointless. Among 1, 4, and 5 I can only state slight preferences. DAVilla 04:01, 26 November 2007 (UTC)[reply]
I don't suppose you could explain why #5 makes sense to you? (To be honest, I'm a bit surprised that someone could think about it and support #5. I had been assuming that people were putting it there without thinking about whether that made sense, but obviously I was assuming wrongly.) Relatedly, how would you format this information to put it in the sense line? There seems to be a lot of variation within this approach. —RuakhTALK 08:43, 26 November 2007 (UTC)[reply]
I vote for an option not given above: Alternative Spellings. These are after all just shorter ways of writing words, not distinct terms derived from them. Inflection line would be a bad idea if there are multiple possible abbreviations, particularly if you want to give more information (e.g. Christmas has the abbreviations xmas/Xmas/X-mas (now informal, but formerly less so), xpmas/Xpmas (obsolete), xtmas/Xtmas (also obsolete), Chtmas, maybe others.) --Ptcamn 08:49, 26 November 2007 (UTC)[reply]
I think Derived terms (per DCDuring) or Alternative forms makes sense, or possibly Alternative spellings, per Ptcamn.—msh210 14:37, 27 November 2007 (UTC)[reply]
Alternative Spellings. My thinking is that there are many users who have a lot of info in their brains which can be made accessible by reminding them what the acronym is. They see something's full name and fail to connect to the acronym. We would make their use of WT rewarding by giving them that instant connection. Other users would probably also appreciate knowing a shortcut for many keystrokes. Nobody loses. Alternative spellings is conceptually close enough to abbreviation so that it is unlikely to cause too many users to hesitate after their first encounter with the placement. DCDuring 16:03, 3 December 2007 (UTC)[reply]

Minor pedantic RFC's

Right now bots are going around putting RFCs all over the place for minor pedantic reasons. I think the RFC box as it is, is intended more for cases where a page is in serious need of cleanup, ie, where the page by itself is a mess. Bots are putting big clunky RFC boxes in entries which look fine to all but the most discriminating editor-- and certainly look fine to our main userbase. The RFC boxes, usually placed prominently at the top, end up being a far greater eyesore in the entry than the minor errors they talk about. As an example, look at [5]. In this example if AutoFormat really thought it was that big of a problem to have a "See also" at level 2, AutoFormat could have helped us out a lot by just fixing it himself. If a casual user looks at that page in the revision # linked, the RFC box is bigger than the entry itself. And a casual user will just be confused by the blurb about "See also as level 2 header".

If people are really adamant on flagging minor things like this, an alternative to the big RFC boxes should be used. Those big RFC boxes should be reserved for entries which are literally a mess, for example [6]. Instead they've become an all-purpose tag for pointing out very picky deviations from WT:ELE.

Take a look for example at [7]. "Personal pronoun unknown"? What does that even mean? And which language is it even referring to? The RFC box floats above the languages and makes no attempt to indicate which language it's talking about. It's probably complaining because someone used "Transitive verb" as a header, but that's not obvious, and still doesn't explain the "Personal pronoun unknown" nonsense.

I propose a much smaller, less intrusive box which would go at the BOTTOM of the page and wouldn't glare out at our casual readerbase. It could be called rfc-minor, or something like that. Basically, it'll take a long time before all these tiny pedantic RFCs get cleaned, and in the meanwhile the big ugly RFC boxes are more of an eyesore than the things they're meant to fix. Language Lover 08:14, 26 November 2007 (UTC)[reply]

I completely agree, except for the thing about going at the bottom of the page; I think the ideal would be an inline tag along the lines of "cleanup is requested here (these quotations are in the wrong format)" placed right by the problem. (BTW, AutoFormat is not a "he" so much as an "it". And the problem with han is that it uses the unknown/invalid header "Personal pronoun" in the Swedish language section.) —RuakhTALK 08:39, 26 November 2007 (UTC)[reply]
AF has spent so much time changing gender tags that he/she/it is no longer quite sure of he/she/its gender identity. Robert Ullmann 13:39, 26 November 2007 (UTC)[reply]

Besides making pages look bad, the promiscuous bots are also distracting us from pages that really do sincerely, urgently need work. For example [8], a mess in a fairly major entry which needs some serious cleaning and which has been that way since before May 2006. No one would see or clean this because the requests for cleaning category is littered with "See also at level 2" junk.

Furthermore, if you actually *read* the content of the RFC box, it very clearly is intended to be manually placed by a human, who then *is supposed to initiate a discussion at Wiktionary:Requests for cleanup, or at the entry's discussion page*. Bots like AutoFormat *are not doing this*. Language Lover 09:02, 26 November 2007 (UTC)[reply]

The RFC box for "Personal Pronoun unknown/invalid header" was placed last March; AF was changed thereafter to use the (invisible) {{rfc-header}} tag. One does wonder why the humans that have edited the entry since have not fixed it? Perhaps it needs something BIGGER? ;-)
The RFC tag for "See also" (or whatever) at level 2 has to be placed; in the general case AF can't fix it (which language section does it belong to?) I know you got annoyed because it tagged a number in a row; the solution is to fix them? Having headers out of language sections is a serious structural problem with the entry that requires human attention. Shall we hide them in the rfc-level cat? Have you ever fixed a single entry in that cat?
AF could do that; but they will never get fixed. In the meantime, any mirror or program reading the DB will produce output for language "See also". They do not "look fine": they are seriously broken. Robert Ullmann 09:58, 26 November 2007 (UTC)[reply]
I've gone through and fixed most of them, also fixing missing headwords, typos, and adding the required # before {{substub}}. That's the other reason for the big RFC box tag; usually entries with improper L2 headers have lots of other things wrong with them. Robert Ullmann 13:39, 26 November 2007 (UTC)[reply]
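
For anyone following the "See also at level 2" argument above, the check being discussed can be illustrated with a minimal sketch. This is not AutoFormat's actual code; the function name `misplaced_l2_headers` and the header list are hypothetical, and the list is deliberately incomplete. It only flags level-2 headers that are known section names rather than language names, and it cannot say which language section a flagged header belongs to, which is the part that needs a human.

```python
import re

# Hypothetical, deliberately incomplete list of section names that belong
# inside a language section (level 3 or deeper), never at level 2.
NON_LANGUAGE_HEADERS = {"See also", "Noun", "Verb", "Etymology", "Translations"}

# Matches a level-2 header such as "==Name==" but not "===Name===".
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$")


def misplaced_l2_headers(wikitext):
    """Return (line_number, header) pairs for level-2 headers that are not languages."""
    found = []
    for number, line in enumerate(wikitext.splitlines(), start=1):
        match = L2_HEADER.match(line)
        if match and match.group(1).strip() in NON_LANGUAGE_HEADERS:
            found.append((number, match.group(1).strip()))
    return found


sample = "==English==\n===Noun===\n# thing\n==See also==\n* [[other]]\n"
print(misplaced_l2_headers(sample))
```

A program that treats every level-2 header as a language name would list "See also" as a language for the sample entry, which is the breakage described above.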

Language identification

What do we do when it's not clear what language a word or text belongs to? For example, could/should a text printed when Serbo-Croatian was officially considered a language be used to verify words in Serbian/Croatian/Bosnian? --Ptcamn 09:14, 26 November 2007 (UTC)[reply]

If a work claims to be in Serbocroatian then we should treat it as a word in a language called Serbocroatian. There are many cases where one language or dialects thereof go under several names and few people mind (Swedish, Norwegian, Danish; Hindi, Urdu). There are also cases where most people think of one language whereas ISO 639 language codes force us to instead choose between several languages. There are many ways to group and differentiate languages and dialects, linguistic ways and political ways. There was a time when Serbocroatian was generally accepted as a single language and as we document former languages as well as current languages we should document Serbocroatian as a one-time language just as we should document Croatian, Bosnian, Serbian, and any other languages which may declare themselves to be separate forms in the future.
Even if people want to argue over whether Serbocroatian was ever a language or not, documenting it here will help people on both sides of the debate to find examples to substantiate or refute claims. If we refuse to document it that will leave in the dark any people wishing to study it. — Hippietrail 09:34, 26 November 2007 (UTC)[reply]
There isn't a single word of Serbo-Croatian/Croato-Serbian that cannot be found in their respective successors. The only ones who would be left in the dark are those unable to confront the truth, i.e. the death of artificially constructed pancentralsouthslavic linguistic unity. Hopefully next time voting is organized to shut down permanently all the "Serbo-Croatian" (sh) wikiprojects, which are all dead anyway and live out of 1-2 people copy/pasting articles from their bs/hr/sr neighbours, more people will be mobilized and "Serbo-Croatian" will finally gain as much usage on wikiprojects as it does in the real world (i.e. none). --Ivan Štambuk 10:40, 26 November 2007 (UTC)[reply]
So you are against all dead languages including Latin and Old Church Slavonic, all artificially constructed languages such as Esperanto and Ido, or just the dead and artificially constructed languages for which you hold a passionate point of view? Do you also wish to erase all mentions of the dead and artificially constructed Yugoslavia just because it no longer exists? — Hippietrail 13:15, 26 November 2007 (UTC)[reply]
I'm against 1) needless redundancy and 2) the political agenda behind the term Serbo-Croatian, most importantly inventing some kind of "reasonable argumentation" just to promote it. There is nothing left in "Serbo-Croatian" that isn't covered by individual standards of today. There is nothing left in the "darkness" for people wishing to study it that isn't covered in standard Bosnian/Serbian alone.
"Serbo-Croatian" is not dead/extinct as the languages you describe, just disintegrated and the term deprecated. When it was still being officially used in Serbia, Bosnia and Croatia had their own standards. Since the term "Serbo-Croatian" by definition also refers to lexis of its "western variant", that would also partly include lexical treasure of Croatia and Bosnia Serbs had no right to claim. And I already mentioned that since 1974 federal states of Yugoslavia had, by constitutional changes, the right to their so-called "standard language idiom", which Croats of course embraced wholeheartedly. Anyone who would, for example, include Katičić, Radoslav: Sintaksa hrvatskoga književnog jezika, (Syntax of Croatian literary language, printed in 1986 when Yugoslavia was still very much alive) as part of the Serbo-Croatian "heritage" that needs to be "documented" just because the communist regime protected the term "Serbo-Croatian" by law, would be... I don't know, just terribly wrong. --Ivan Štambuk 14:30, 26 November 2007 (UTC)[reply]
If you're not a native speaker, you shouldn't be checking/adding translations in the first place, especially not with obsolete reference literature.
Do you claim to be a native speaker of Serbian, Croatian, and Bosnian? And what about Montenegrin? Since you appear not to claim to be a native speaker of Serbocroatian you could just take advice as you give it and have nothing to do with obsolete languages in the first place.
Please, spare me the pointless sarcasm. A "native" speaker, as the term is commonly understood, of those Ausbausprachen should be born in the moment their respective standards were already formulated and actively used. Mutual intelligibility has nothing to do with Ausbau languages. Stating your question in such an absolute manner without regard to broader geopolitical context reflects a dogmatic viewpoint that one cannot productively hold a discussion with. If you're so passionate about the Serbo-Croatian language, go to the recently activated Serbo-Croatian wiktionary. With 12 articles in the main namespace, I'm sure your contributions would be much appreciated. Comrade DCabrilo is there also, apparently engaging in discussion of how to massively enter thousands of lemmata.
As for the adding of new words - just add them under Serbian, unless you happen to have a rare copy of "Croato-Serbian". Croats stopped using the term "Serbo-Croatian", basically since 1974, Serbs continued up until 1997, when they silently switched it in constitution to just Serbian after accepting the newly promoted orthography.
We should add new Croatian words under Serbian also?? Nobody calls my country Terra Australis or New Holland any more but certainly nobody would suggest erasing them from reference materials.
You probably haven't read a book written in official "Serbo-Croatian" in your life. They're filled with "tačno", "prevashodno" and similar serbianisms. Your parallel is out of place. Serbs continued to name their language Serbo-Croatian for a very simple reason - being so vaguely defined (2 scripts x 2 pronunciation variants), approval of any other nations of the Central South Slavic continuum wasn't necessarily needed. --Ivan Štambuk 14:30, 26 November 2007 (UTC)[reply]
"Bosnian" language, or Bosniak language, as it was originally supposed to be and should be called (as an Ausbau language of Muslims in Bosnia, i.e. Bosniaks) is a mixture of both, and can accept both Latin/Cyrillic, and characteristic Croatian/Serbian lexeme oppositions (suradnik/saradnik, kisik/kiseonik, verbs in -irati/-isati etc.). So you can use Bosnian as a general container for words you're not sure where to put. --Ivan Štambuk
  • We should document all the unpopular stuff, the stuff we'd like to forget, the artificial, the antiquated, the obsolete, the stupid, the offensive, the political, everything. We should document dead languages, archaic spellings, terms related to Nazism, communism, terrorism, racism, whatever. Languages change naturally and they also change politically. All the slavic languages sprang from Old Church Slavonic or Old Bulgarian. Sometimes two dialects of one language are more different than what are called two languages. Sometimes governments pervert spelling or writing system or try to force languages to merge or separate. Sometimes such movements are popular, sometimes they are not. I'm not a fan of silly American terms like freedom fries and I don't support the Lao government publishing official spellings that nobody wants to use, but I support documenting them all in Wiktionary. This is the nature of language and we should deal with it. — Hippietrail 13:15, 26 November 2007 (UTC)[reply]
Hear, hear. Widsith 13:22, 26 November 2007 (UTC)[reply]
I don't know much about this topic, so maybe I'm missing something obvious; but your analogy seems flawed. Obviously we're going to include an entry for Serbo-Croatian, just as we do for Aryan, Mongoloid, unicorn, and vomitorium; but there doesn't seem to be any need for us to list Serbian words as both "Serbian" and "Serbo-Croatian", nor Croatian words as both "Croatian" and "Serbo-Croatian", any more than to list all Old English words as "English" whether or not they made it, or all words in all languages as "Turkish" or "Sun Language", just because there exist strange political ideologies that would so classify such words. —RuakhTALK 15:56, 26 November 2007 (UTC)[reply]
A word like god means "god" in both Old English and modern English. Obviously, it's just exactly the same word - we just decided arbitrarily to call words before 1100 "Old English". It means we have to list it twice. Seems a bit pointless, but that's the way language names are - arbitrary. Widsith 16:05, 26 November 2007 (UTC)[reply]
The problem is that all the lexis of "Serbo-Croatian" (or of what that term tries to encompass) can be found in its successors. Moreover, almost every standard B/C/S word would have its corresponding "Serbo-Croatian" entry by definition of that term. It's not just a matter of correspondence of lexical stock synchronically, but also diachronically. You cannot just say that SC "died" in 1997 and all words coined afterwards are not SC. Building, OTOH, SC entries on the basis of dictionaries and grammars of SC that were printed up until 1991 makes no sense either, as they are only marginally different from its descendant languages, if different at all. If this discussion were being held in year 2150, it would make sense. But not in 2007. --Ivan Štambuk 16:49, 26 November 2007 (UTC)[reply]
It would be very nice if there were a way to objectively differentiate languages rather than to rely on political influence. I am told it is a futile cause. Nonetheless it would be nice. DAVilla 03:57, 27 November 2007 (UTC)[reply]
I probably should have used a less controversial example than Serbo-Croatian. My problem is with Nahuatl: I'm not sure where Classical Nahuatl is supposed to end and the other varieties begin. But we should probably have a general guideline on what to do in situations like this. --Ptcamn 08:15, 28 November 2007 (UTC)[reply]

Add [[fr:Catégorie:serbo-croate]] please --Henri Pidoux 15:04, 26 November 2007 (UTC)[reply]

Done. Conrad.Irwin 15:54, 26 November 2007 (UTC)[reply]

Requesting citations

I consider myself to be a casual Wiktionary user. I use the site frequently to look up words, and I occasionally take an interest in things like RFVs and WOTD nominations. One thing I couldn't help but notice is that a vast number of entries are uncited. I believe I may have irritated a number of people by putting in a RFV of "concordat". I knew the word existed, but I hadn't heard it used in one of the senses and wanted to read some citations. I was advised to use {{rfquote-sense}} in such instances. So, my question is, am I allowed to add that template to every article I come across without citations, and, if I did so, would it do any good or just be a nuisance? RobbieG 15:58, 26 November 2007 (UTC)[reply]

The more entries have {{rfquote-sense}}, the less useful it will be, and given how poorly cited we are, I think it would be absolutely useless if added to every single sense without quotes. I'd say, add it in places where you'd particularly like to see a quote; say, places where you don't think the definition makes terribly clear how to use the word, or how two senses differ. —RuakhTALK 17:57, 26 November 2007 (UTC)[reply]
OK, thanks for the advice! RobbieG 21:13, 26 November 2007 (UTC)[reply]

Harry Potter

Apparently, citations from a "well known work" will allow a word to be entered. Doesn't this allow all the words exclusive to the Harry Potter novels? Given the sales, they're probably the best known books of modern times. RobbieG 16:01, 26 November 2007 (UTC)[reply]

It depends on the term, some like Hogwarts have apparently come into the language proper, others like apparate haven't. I personally think that we should include all of these words - but they would have to be clearly marked as fictional - often words like this get taken up by other fiction authors, see w:Encyclopaedia Galactica, and so it is useful and interesting to have the etymologies. A normal dictionary would probably not include fictional words, and so I expect that the CFI will unfortunately be updated to close this apparent loophole. Conrad.Irwin 17:24, 26 November 2007 (UTC)[reply]
I don't think that they qualify as well-known works just yet. Let's see what's what in fifty years, say. Although the CFI don't indicate what's meant by a "well-known work", the standard examples people give are works by Shakespeare, Milton, and Dickens, and the Bible. The Harry Potter books, though currently "in", aren't Oliver Twist, Romeo and Juliet, or Ecclesiastes.—msh210 17:38, 26 November 2007 (UTC)[reply]
Word (though I hope I'm not present for the argument over what constitutes "the Bible"!). —RuakhTALK 17:54, 26 November 2007 (UTC)[reply]
Don't qualify? If you can find me one person who's never heard of Harry Potter, I'll... err. Do something gross that I haven't thought of yet. Harry Potter is probably one of the best known series since (insert something big here). That shit is all the rage. — [ ric | opiaterein ] — 20:26, 26 November 2007 (UTC)[reply]
Well, that's what I was thinking. As far as the phrase "well known work" can be interpreted, the Harry Potter books are one of the most obvious examples - as Opiaterein points out, you'd be hard pressed to find anyone who hasn't heard of them. Now, if it is acceptable, I can happily add some Harry Potter words, but if it is not, might I tentatively suggest that the wording be changed to make your intentions more clear? If they do count, but nevertheless aren't wanted in the main site, maybe we could start an appendix? I figured people might not want expressions like "house-elf" and "Quaffle" in Wiktionary, which the current wording leaves us open to. I rather like Harry Potter, so I'm not fussed whatever the verdict; I'd just like to see what the thinking is on this. RobbieG 21:31, 26 November 2007 (UTC)[reply]
Well, perhaps it is well-known. Actually, it is well-known: I'll grant it. But it may be briefly well-known. The wording of the CFI seems to indicate that that doesn't matter; on the other hand, do we want to start RFVing all these words in ten years, when the only work that cites them (besides dependent works) is no longer considered well-known? I'm not sure what people think, and perhaps this should be straw-polled, but I suspect that "well-known-for-a-long-time work" is what was meant, or what should be.—msh210 21:52, 26 November 2007 (UTC)[reply]
I started reading the series about 10 years ago. It's stayed well-known and grown over that time, and they still have 3 movies left to finish. That's a fairly long time, I'd say. — [ ric | opiaterein ] — 22:19, 26 November 2007 (UTC)[reply]
If we start adding Harry Potter words which haven't really entered the standard lexicon, then someone should restore my entwife entry. See [9] , as well as the entry directly above it. Tolkien, and particularly LoTR, should be allowed if Harry Potter is. Language Lover 23:18, 26 November 2007 (UTC)[reply]
Agreed. Likewise any word appearing in Star Wars, Star Trek (original or The Next Generation, at least), Ogden Nash, and L. Frank Baum, among many others. (Actually, I'm not sure if Baum had odd words. But you get the point.) I seem to recall a bunch of Star Trek words deleted because of independence issues: but Star Trek is well-known if Harry Potter is, in which case they should be allowed.—msh210 23:41, 26 November 2007 (UTC)[reply]
The fantasy and sci-fi novels that have made a point of inventing novel words are a challenge to the attestation rules when they become well-known. Though Shakespeare may have truly invented words, it doesn't seem to have been quite as self-consciously done as Carroll, Tolkien, and Rowling (or Gerard Manley Hopkins cf. sillion). Maybe the "well-known work" exception to the general rules is the problem. It pains me to say it but perhaps some words of Shakespeare that weren't taken up into general use might not belong in a modern dictionary, not to mention sillion. DCDuring 00:37, 27 November 2007 (UTC)[reply]
I would disagree, words are invented by authors to express things more elegantly than they feel they are able to do in current language, I would (although it is probably blasphemy) lump Shakespeare with the fantasy authors and science fiction authors, all of whom were just trying to find a nice way to express something. I feel that these words should be included in Wiktionary main namespace, but as they are not technically English, perhaps we could use a new first level heading such as Fictional. The reason I think they should be in the main namespace is that that is the one in which words are put - the appendices should be reserved for grammar and groups of related words, etc. is all there, (I don't know to what extent these words get translated between languages when the books are translated [10]- but I doubt we would need "Fictional English" and "Fictional French" etc.) Conrad.Irwin 01:22, 27 November 2007 (UTC)[reply]
My own criterion for whether a "not quite 'real'" word should be in an allegedly-unabridged dictionary is: how likely is someone going to want to look it up? So made-up words in a popular work of fiction which end up being frequently used by other writers (if only in homage to the original work, i.e. whether or not the words "enter the language"), definitely belong. (But to be sure, it's still hard to decide when to apply this rule, and the floodgates could really open in unwanted ways if we started allowing this.) —scs 02:48, 27 November 2007 (UTC)[reply]
An interesting example of this is "slithy" from Carroll's Jabberwocky. Our entry includes a Translations section with the word used in the French translation of the poem: "lubricilleux". Jabberwocky is probably exceptional, though, because there are dozens of essays examining Carroll's use of nonce words in this poem. It seems to me that coinages should have some citations besides the original work (and fan fiction) before they are qualified for inclusion. Mike Dillon 03:13, 27 November 2007 (UTC)[reply]
The solution is to revise the criteria. "Well-known work" means that verification isn't necessary in the same vein that "widespread use" makes it unnecessary. DAVilla 03:46, 27 November 2007 (UTC)[reply]
Perhaps "well known work that is X years old?" RobbieG 15:41, 27 November 2007 (UTC)[reply]
I would think that, in order to effectively compete and survive, Wiktionary ought to be more open to the new than print-based dictionaries, while being much more selective than "urbandictionary". Wiktionary also needs to attract contributors whose contribution nets out as positive. Can a fictional words namespace serve as a place to attract more young contributors and still allow the maintenance of strict standards for entries in the main namespace? DCDuring 17:28, 27 November 2007 (UTC)[reply]
In general I think that's the right idea. As to your specific proposal I would be worried because some "real" words have senses that are protologismatic, which would ultimately just result in a mess. How would one patrol or even maintain the namespace if there were no rules? Ultimately I think we just have to agree to be bound by community consensus, which at this point is both liberal and specific, apart from gray areas and discrepancies such as this. Harry Potter may be well-known, but nonces are regularly turned down when addressed individually. DAVilla 07:57, 28 November 2007 (UTC)[reply]
Or "classic" would nail it down just enough. DAVilla 07:49, 28 November 2007 (UTC)[reply]
Ah, but then you get into the debate about what constitutes a "classic". It's a very subjective term. RobbieG 08:16, 28 November 2007 (UTC)[reply]
It's not a problem if the CFI were to permit the nonces in these works via references (academic journals being an existing example). Then "classic" would extend these criteria the same way "widespread use" extends the current. If the label of "classic" for a work were under contention then a few references would settle the issue. DAVilla 07:48, 30 November 2007 (UTC)[reply]
As to an alternative namespace, I would definitely not propose that it have no rules, but possibly fewer and certainly different rules. In the case of fictional characters, RfV challenges would be met by citations from the "official" corpus of works. The people policing it would be those with a long-term interest in such a namespace, with oversight during start-up by others. It could be that separate user accounts would be appropriate. To me it just seems like a way for WMF to capture the enthusiasm of a group of younger potential users. Haven't there been some prior proposals along these lines? What is the right forum for suggesting and discussing this? DCDuring 11:57, 28 November 2007 (UTC)[reply]
The problem with having an alternative namespace would be that it would create a split in the dictionary. It would mean that someone wanting to look up a word would have to know in advance that the word was specific to the work of fiction and had not 'graduated' into mainstream usage - something which is unlikely to be known (and people may possibly want to look up using wiktionary :). The Wikisaurus namespace, I would argue, suffers badly from being separated in this fashion - but that is a discussion for another time. I feel that as long as a word is clearly marked as fictional - which would be relatively easy to do - it could be added to the main namespace. This would create a (possibly unique) place to document the formation of new words from fiction. I suppose there is a worry that by including the word we would be increasing its audience and thus the possibility of it entering the language, however I do not think that that is a bad thing (or particularly likely). As a fairly poor example, but the first one I could think of which had another language definition, here is how I would format such an entry... Conrad.Irwin 15:46, 28 November 2007 (UTC)[reply]
==Neapolitan==
===Alternative spellings===
* [[àccio]]
===Noun===
'''accio'''
# [[celery]]
----
==In Fiction==
===Etymology===
 <edit>From the Latin [[accio]] I summon. Coined</edit> in [[w:Harry Potter and the Prisoner of Azkaban|]], [[w:J. K. Rowling|]] 1999.
===Noun===
# A magic spell causing a named object to fly towards the caster.

Um... that second one should just be Latin, for "I summon". The word was not invented for the Harry Potter novels; it is simply a Latin word in an otherwise English text. It "first appeared" in Classical Latin. --EncycloPetey 00:05, 3 December 2007 (UTC)[reply]
I think that that should be in the Etymology, but the word in fiction does not mean "I summon", in the same way that 'real' English words can be strongly based on Latin words. (I think I should have chosen a better example word :). Conrad.Irwin 00:15, 3 December 2007 (UTC)[reply]
Well, if you're going to make corrections, you should note that the word is used as an interjective command, not as a name for the spell itself. --EncycloPetey 06:32, 3 December 2007 (UTC)[reply]
I take your point about the splitting of the user base. Something like what you suggest might work although I'm not sure about "In Fiction" at level 2. I like the idea of the drawing of a line between normal "English" and "Fictional English" (or whatever nomenclature works). Is the idea of different CFI/RfV standards within the namespace likely to confuse? DCDuring 16:03, 28 November 2007 (UTC)[reply]
(IFYPFY.) Yeah: "In Fiction" (imo) doesn't work, as the word is in English in fiction, but not, let's say, in Arabic in fiction. (Some words from fiction may be in numerous languages, but I imagine very few.) If we're going the level-2-header route (which I do not recommend), then I suggest instead ==English (in fiction)== or ==English (fictional media)==.—msh210 17:33, 29 November 2007 (UTC)[reply]
Would something like ==English Literary Coinage== work? sewnmouthsecret 19:43, 29 November 2007 (UTC)[reply]
Ah, that's better. (Again, this is assuming we have a level-2 header, which idea I do not like.)—msh210 20:04, 29 November 2007 (UTC)[reply]
I remember someone having the idea of an appendix with made-up words from specific works. I think that would work better than a level-2 header, which just seems weird to me. The same words aren't always used translingually, of course. — [ ric | opiaterein ] — 22:28, 29 November 2007 (UTC)[reply]
That could get ugly. As long as we limit what is allowed (via some arbitrary method), allowing terms in well-known works as regular entries shouldn't be a problem. sewnmouthsecret 22:51, 29 November 2007 (UTC)[reply]
There may be more of this kind of thing than we want without some way for users to exclude it when they want. Is Vonnegut's Slaughterhouse Five well known? The Dune novels? Matrix? Winnie the Pooh? SpongeBob? Raymond Chandler? Damon Runyon? Mark Twain? DCDuring 23:10, 29 November 2007 (UTC)[reply]
Any ideas for parameters without being too exclusionary or inclusionary? sewnmouthsecret 00:07, 30 November 2007 (UTC)[reply]
How about something like Any word with a clearly fictional referent [thus excluding those that are fictional according to non-adherents of a particular religion, for example] is an exception to the "one-well-known work" rule.?—msh210 07:58, 30 November 2007 (UTC)[reply]
<devil's advocate>Jedi? Anyone non-jedi-ist would call them "fictional", and presumably outside of a "religion" doctrine, but what about those self-proclaimed believers in "jedism"? Would their opinion on the matter, matter?</devil's advocate> \Mike 09:55, 30 November 2007 (UTC)[reply]

(cleared indent) I think the heading would have to be ==In literature==, or ==Literary coinage== as above, which would solve the problem with classing "true" texts as fiction. Where to draw the line for inclusion is much harder; it is probable that we only want (reasonably) well known works, but even within novels, there seems to be a large difference in ghits for similar words. I think that a few simple criteria should be added, and then individual cases wrangled over if necessary. For example, we should not include fictional names or place names, and we should, in general, only include words that have featured in a piece of writing (possibly a critique, blurb or review) about the work itself. Conrad.Irwin 10:43, 30 November 2007 (UTC)[reply]

  • Um, aren't these all supposed to simply appear on WT:LOP? Why all the fuss? --Connel MacKenzie 06:20, 3 December 2007 (UTC)[reply]
    • Problem being (from my view, at least) that words used to describe a certain fictional spell in Harry Potter or a fictional race of beings in Star Wars or Star Trek are not "protologisms" in the sense that they are not words that someone thinks should actually be introduced into everyday language to describe real things. I think an appendix styled as a "concordance" would work fine, as it will basically end up being a glossary of words found in this one mythical universe that are not, and will not be, found elsewhere. bd2412 T 16:35, 3 December 2007 (UTC)[reply]

How about adding the following, or something like it, to the CFI? Any word with a clearly fictional referent, and any word invented as a humorous play on another word or words, is an exception to the one-well-known-work rule and requires three citations, as mentioned above. This sentence would be referring to Star Wars characters, words Ogden Nash invented to rhyme his poems (see w:Ogden Nash#Poetry_style), and Carroll's chortled, et al.—msh210 17:44, 3 December 2007 (UTC)[reply]

So would the three different books in the Harry Potter series that mention confundo as a spell causing confusion, or expelliarmus as a disarming spell, be sufficient to have an entry on that sense? bd2412 T 17:53, 3 December 2007 (UTC)[reply]
They're not independent: you know the rules.—msh210 18:07, 3 December 2007 (UTC)[reply]
Is that the case covered by independence? I thought it was mirrors, not different uses in different books by the same author. Although I certainly think that for a Harry Potter term to enter the lexicon, it would have to be used by someone other than Rowling, and in a context other than simply describing a spell or other such thing in Rowling's book universe. bd2412 T 18:32, 3 December 2007 (UTC)[reply]
That case should be covered by independence. DAVilla 03:34, 4 December 2007 (UTC)[reply]
Would said 3 citations have to be independent? The wording overall sounds good to me. Here is my general opinion on all of this: As an online dictionary trying to not be every other dictionary, I think it would be smart to stay ahead of the curve and allow literary coinages. I am not for every single word ever put into print, just major or well-known words that can see future usage or have seen usage, are translatable, are generally linguistically or etymologically interesting, or are in well-known works (Potter books, e.g.). Things like sith, Horcrux, outgrabe, alohomora, lumos, nox, auror, crucio, etc. I understand trying to keep integrity in Wiktionary; however, some of these terms are already appearing elsewhere and given time likely will continue to do so. Many are also deserving of being in the main dictionary, as opposed to being relegated to an appendix or somesuch. sewnmouthsecret 18:03, 3 December 2007 (UTC)[reply]
I say again that most of the magical words Rowling uses are straight uses of Latin, and the rest are hack Latin coinages. Maybe some of them will enter English, but I rather doubt most will have staying power since they don't have wide application to anything outside of their fictional context. That's where a principle like WP:CRYSTAL would apply. Wiktionary is not in the business of predicting which words will become part of the English lexicon. Rather, we document those words that already have by applying a set of criteria on which new words are considered. --EncycloPetey 01:26, 4 December 2007 (UTC)[reply]
[EDIT CONFLICT] Shouldn't there be an independence clause to prevent that? That A Clockwork Orange Nadsat concordance is awesome though. I suggest that fictional works with similarly large invented vocabularies could have similar pages, and a note should be added to the CFI stating that words which only appear in three works within the same series or universe (e.g. three Harry Potter books) should not have their own pages. RobbieG 18:06, 3 December 2007 (UTC)[reply]
The reason I suggested using a level 2 heading for "in literature" or "fictional" is that these words are not "English" until they have entered the lexicon; however, they are legitimate words which people may well come across and wish to look up, hence they should be in a dictionary that aims to define every word. I would think that people are far more likely to come across many of these words than they are to come across words in, say, Lojban or Aragonese, and so it would make Wiktionary a much better resource to include them. The reason that including these terms in Appendices, or other namespaces, won't work is that "Look this up on wiktionary" boxes/widgets use Special:Search/term which does not (and rightly so) search those namespaces. Conrad.Irwin 01:52, 4 December 2007 (UTC)[reply]
Well, they are English in that they are used in English literature. They may (in few cases) be used in one corpus (although many have entered the lexicon), but I agree, Wiktionary would be better to include them. A level 2 header of "Literary coinage", IMHO, would work wonderfully; but this seems to go round and round. I'd prefer taking this to a vote to either change the rules to reflect this or keeping them as they are, in which case I could cite most of those terms anyway. I think changing them would make it easier, though. sewnmouthsecret 03:50, 4 December 2007 (UTC)[reply]
I would strongly object to the inclusion of coined names of fictional items (and particularly fictional people) unless it can be shown that the term has entered the lexicon through usage completely independent of the source material. Consider this example: 2004: Rob N. Hood, Beyond the Wind, p. 1: "Wielding his flashlight like a lightsaber, Kyle sent golden shafts slicing through the swirling vapors". This quote is not from a Star Wars book, and does not even mention Star Wars - it simply presumes that the analogy will be understood without explanation. That is a word that has entered the lexicon. Cheers! bd2412 T 04:39, 4 December 2007 (UTC)[reply]
Yes, precisely. Agreed. Hence my proposal, above: to avoid adding, on the basis of one well-known work like the Harry Potter series, words that have not entered the lexicon.—msh210 16:51, 5 December 2007 (UTC)[reply]
I would likewise strongly object. That floodgate should never be opened. We do not need a flood of entries for all the names of planets and moons ever visited by Doctor Who, every Klingon dish and drink, every piece of fictional technology to ever appear in the corpus of Robert Heinlein's fiction, or all the myriad other words that appear in the vast collection of unsold Dungeons & Dragons fantasy novels. It would be even worse to create a whole new "language header" to accommodate them. Especially so since the concept doesn't even consider the possibility that the same phenomenon occurs in French, German, Spanish, Latin, Japanese, etc. Our fundamental and basic division within a page is the language header. The idea being touted here strips that away and replaces it with a L2 etymology header. This would mean a complete overturn of the data structure. --EncycloPetey 05:53, 4 December 2007 (UTC)[reply]
Whatever the verdict, Doctor Who planets shouldn't have entries, unless people start attributively using words like "Mondas" (which doesn't seem very likely), because proper nouns don't get entries. As for words like skrewt and Sith, I would oppose the inclusion of these words in the main languages section, but I would support the creation of special concordances for fictional words (similar to our existing Nadsat concordance). RobbieG 16:51, 4 December 2007 (UTC)[reply]
Points taken. I will work on a Potter concordance; any other terms mentioned I will leave to someone else as I don't know them all that well. sewnmouthsecret 16:57, 4 December 2007 (UTC)[reply]

More is needed

Having the concordances is very useful, but Wiktionary provides no way of looking up words that are hidden in them. I think that we should create a template, that we can put on pages to direct people to the correct place - I would be happy with something as ugly as the sister project floaty box things, more happy with a sister project inline like thing, and most happy with something (as yet undecided) that is completely new. There is no point in gathering up this data, and leaving it inaccessible.

The template should be worded something like "For {PAGENAME} in [[w:{Book Name}]] go to our Concordance page", and it should be possible to create an entry that contains nothing but the template, for words that are not used anywhere else. Conrad.Irwin 02:32, 26 December 2007 (UTC)[reply]

Showing other Semantic relationships

There are many standard ways that words are semantically related. For example, abstract nouns generally have associated adjectives and adverbs (beauty, beautiful, beautifully), and many verbs have associated terms for the agent and object (employ, employer, employee). Aside from synonyms and antonyms (and the more obscure Wiktionary:Semantic relations), these words generally go in Related Terms or Derived Terms w/o any relationship info (though often w/ gender and perhaps POS). The relationship info is quite useful, especially for words that vary from a language's standard pattern for these relationships. Has this been tried for the more prevalent types of relationships? --Bequw 14:00, 27 November 2007 (UTC)[reply]

I would assume that most of that information would be in the article being linked to... But that's just an assumption. — [ ric | opiaterein ] — 14:14, 27 November 2007 (UTC)[reply]
Unfortunately[?] this is a hypertext dictionary rather than a wordnet. The etymology, derived terms, related terms, see also, and other semantic relations sections, categories, "what links here" provide for many relationships but not all. The examples you provide above are etymological/derived-term relationships, which are accommodated although not differentiated. The prefix and suffix (and infix ?) entries give some room for more detail for that kind of morphological relationship. I wonder whether there are some existing categories that might be a useful foundation for the kind of more elaborate structure that you suggest. There might be some creative wiki-programming that could accomplish what you seem to seek. Have you been to the Grease Pit ? DCDuring 15:16, 27 November 2007 (UTC)[reply]
I don't see any reason why we can't do Wordnet one better, and find ways of incorporating all relevant information on lexical relationships... although so far there has been a massive lack of initiative on this front. (I tried to stir the waters on this a bit a few months back, but I didn't end up showing much initiative either.) I think that having navboxes for closed sets would not be a bad idea, though AFAIK it has not been experimented with yet. -- Visviva 13:13, 10 December 2007 (UTC)[reply]
It's nice when the Derived terms are distinguished between compound terms (like phrasal verbs) and morphologically related terms like you mention. Otherwise the latter more relevant terms get mixed up with the others. Unfortunately, I've rarely seen this done. DAVilla 15:06, 10 December 2007 (UTC)[reply]
What would be a good example of one that was done as you suggest? What's a good format? Separate rel-tables? What heading/gloss? DCDuring 15:24, 10 December 2007 (UTC)[reply]

Endangered and Extinct language categories

I'd like to create these categories (Category:Endangered languages and Category:Extinct languages Heeeey that exists. Fancy :D — [ ric | opiaterein ] — 18:54, 27 November 2007 (UTC)) and start populating them. I find these kinds of things interesting, and don't think I'm the only one. I've started this section for any comments etc. :-) — [ ric | opiaterein ] — 18:51, 27 November 2007 (UTC)[reply]

What are your criteria for inclusion therefor?  (u):Raifʻhār (t):Doremítzwr﴿ 21:33, 27 November 2007 (UTC)[reply]
I was just looking at Wikipedia's w:List of endangered languages and w:Extinct languages. Endangered isn't clearly defined, but Extinct is a language with no more native speakers, so that's pretty simple.
No objection from me. Wikipedia could probably do a better job, though. —RuakhTALK 21:35, 27 November 2007 (UTC)[reply]
We already had our category of Extinct languages, so one for endangered languages can't be that bad. (There are a lot of them, but not -that- many out of all of them have categories on here.) — [ ric | opiaterein ] — 00:41, 28 November 2007 (UTC)[reply]

Yeah. That idea sounds like a good one. Could you note a little about the languages’ endangered statūs in their entries too?  (u):Raifʻhār (t):Doremítzwr﴿ 19:30, 29 November 2007 (UTC)[reply]

Not a bad idea :) — [ ric | opiaterein ] — 23:07, 29 November 2007 (UTC)[reply]

UserBoxes

From what I understand, Userboxes other than Babels aren't allowed under NoPOV or something, but what about userboxes for linguistic interests? Specific languages, parts of language, language families, you know. I think that would be interesting. — [ ric | opiaterein ] — 00:45, 28 November 2007 (UTC)[reply]

There's been some stuff out there like that. Anything specific you have in mind? DAVilla 07:39, 30 November 2007 (UTC)[reply]
Mostly stuff like "This user is interested in Romance/Slavic/x languages", "This user is interested in morphology", "This user is interested in the evolution of languages", things of that nature. Interests related to languages and things like that, nothing too irrelevant. — [ ric | opiaterein ] — 12:56, 30 November 2007 (UTC)[reply]
The wording was to allow specific exceptions. Those do seem reasonable at first blush, but please be more specific. Do you have these as in-line (text) categories on your userpage yet? If they (the categories) get other adherents after a month or two, it would then seem reasonable to contemplate allowing them as userboxes. --Connel MacKenzie 17:20, 30 November 2007 (UTC)[reply]
My point wasn't for categories. Just little boxes to show "Hey, this is what I'm interested in. You can talk to me about this." Categories could be included, it wouldn't hurt. But that wasn't what was on my mind.
I thought my examples were pretty specific. Looky.
Б This user is interested in Slavic languages.
A This user is interested in Romance languages.
[ʑ] This user is interested in phonology.
ΣکԱ This user is interested in writing systems.
Just things like that. Nothing too "controversial", I don't think. — [ ric | opiaterein ] — 22:34, 30 November 2007 (UTC)[reply]
A few of those might not be so useful, as they don't convey a lot of information. They don't seem too harmful either, and I said I've seen similar in the past. But if you're looking for advice, it would be something else to say how proficient you were at IPA, or what you were most interested in. Just my two cents. Whatever you come up with, I'd be interested in trying it out myself. DAVilla 03:51, 1 December 2007 (UTC)[reply]
I don't think saying what you're interested in (keeping it related to language) through userboxes is really such a horrible idea. I was just using those as examples. Something like you're talking about (IPA levels and stuff) would be good, too. So I'm not just talking about interests, but levels other than languages and just whatever else related to languages. (I'm tired and probably not making enough sense. oh well. =\ ) — [ ric | opiaterein ] — 15:02, 1 December 2007 (UTC)[reply]
This is where I should probably put my hand up and say, I have userboxes without permission. I put them there (twice) before I found out about the ban, and arrogantly left them there after I knew. They are, in my opinion, useful - in the same way as the IPA box mentioned above - as they allow people who are seeking someone to lend a hand with templates etc. to stumble across me. A lot of the argument given against userboxes was based around template-space clutter, which as they aren't templates on my page isn't a problem; though I do appreciate that lots of people just dislike them out of principle. Conrad.Irwin 15:22, 2 December 2007 (UTC)[reply]
Personally, I'd rather look at a few userboxes to see what a person is interested in and what they like to contribute mostly, than reading through 3 paragraphs that say the same thing. They're concise and (sometimes) attractive. I've never understood how some dislike them out of principle, and what principle, etc. — [ ric | opiaterein ] — 16:46, 2 December 2007 (UTC)[reply]
On Wikipedia, they've just gotten totally out of hand. Some people have literally hundreds of userboxes advertising every opinion they hold, like or dislike, sports teams they follow, food allergies, and all kinds of other information of questionable value to the construct of an encyclopedia. However, an encyclopedia is a wide-open thing, so it is arguable that a Led Zeppelin fan can be trusted to write on the band, or that someone who loves cheese can write about cheese. A dictionary requires a far narrower set of skills, i.e. knowledge about words, in various languages. Wiktionary userboxes should therefore be limited to conveying information about the user's knowledge of words (or of a language, or of a jargon), along with other information that is relevant to their ability to edit, such as what time zone they live in, and maybe what editing tools they are able to work with. bd2412 T 05:04, 3 December 2007 (UTC)[reply]
My point with this wasn't a userbox freeforall like they have at Wikipedia. I was aiming for userboxes with some relevance to the project. :) — [ ric | opiaterein ] — 18:19, 4 December 2007 (UTC)[reply]

Wikisaurus idea

As I mentioned in one of my posts above, the Wikisaurus namespace seems to suffer from lack of TLC. In order to remedy the situation, and to make Wikisaurus more useful I would like to propose a new way of doing things. Instead of having Wikisaurus only on a separate page, it should be integrated into the main namespace, using transclusion. See User:Conrad.Irwin/anger for an example of how this may work. (You should note that the 'Wikisaurus' entry is at User:Conrad.Irwin/ms). This would let the same synonyms and antonyms be easily added to each of the articles to which they refer. There are a lot of technical issues that would need discussing - but as a pre-alpha idea, how does this strike people? Conrad.Irwin 17:01, 28 November 2007 (UTC)[reply]

In place of a "See WikiSaurus" link or box? That would be great! I should probably be more cautiously optimistic though. How prevalent will this be, ultimately? Do we want to relegate so many synonyms to a thesaurus? The main problem I'm worried about, potentially, is transitivity.
As to your design itself, does it require transcluding entire Wikisaurus pages within each dictionary entry? DAVilla 07:33, 30 November 2007 (UTC)[reply]
Wait, I would prefer not to see all of the synonyms (the idioms in particular) except on the Wikisaurus page itself, following a link. Is there a way to transclude only part of a page? Maybe push it off to a subpage? DAVilla 07:36, 30 November 2007 (UTC)[reply]
It is possible using <includeonly> tags, but the author of the Wikisaurus page would have to put these around the parts of the page they did not want transcluded. I would see this as hopefully being used on most pages with more than one or two synonyms, as it would allow the synonyms section of each word to be updated in tandem. The main problem is that it is unfortunately not so friendly for newcomers to edit (though it is fine if they use the edit section links). Conrad.Irwin 10:13, 30 November 2007 (UTC)[reply]
Second thoughts - it is easier to do this using templates, so it would be reasonably transparent for editors. Conrad.Irwin 13:01, 30 November 2007 (UTC)[reply]
Sorry, but I can't tell from the text above or the sample page exactly what it is you're proposing. Could you give a more careful description? There is also a real problem in trying to transclude a single list of synonyms to multiple pages because of shades of meaning and the fact that we want the synonyms to match a particular definition sense. For example, how would the synonyms of "set" be transcluded onto the entry for set? I'm sorry, but I just don't see how this could be feasible. --EncycloPetey 23:46, 2 December 2007 (UTC)[reply]
Set is a perfect example of a title that shouldn't be a Wikisaurus entry since it has so many meanings. I couldn't be sure if any is the primary, and even if so it would probably be better to have Wikisaurus:collection or Wikisaurus:permanent. From what was on the page before, I'm sure Conrad has assumed there would be a single meaning per thesaurus entry. DAVilla 07:59, 3 December 2007 (UTC)[reply]
But we're still talking about setting up transclusion of synonyms to appear in multiple pages, yes? So how would the various senses be transcluded for set, and would it make for an improvement? I don't like the idea of implementing a new formatting for a section if it can't be applied to all entries, but rather only to select pages. --EncycloPetey 15:10, 3 December 2007 (UTC)[reply]
Ah, I see. Then the page set would transclude synonyms from Wikisaurus:collection, Wikisaurus:permanent, Wikisaurus:prearranged corresponding to individual senses, and any others for which there is no Wikisaurus page would simply be listed as done now. DAVilla 17:15, 3 December 2007 (UTC)[reply]
To be honest, I have been having serious second thoughts about the application of this, but if someone else can see a way to proceed I would be very interested in hearing it. It has meant that I have had second thoughts about the usefulness of the Wikisaurus namespace, see the wikisaurus box on run for example. The only advantage I can see to having the separate namespace is to provide a page to put slang terms for things, which should probably be in a subpage anyway. Conrad.Irwin 16:09, 3 December 2007 (UTC)[reply]
That's because Wikisaurus:walk doesn't seem to have any focus. I'm worried about this problem more generally, but that one might be fixable. DAVilla 17:15, 3 December 2007 (UTC)[reply]

Unattested forms of extinct languages

See for example the declension of глава (glava). Not all of these derived forms are attested in the OCS canon (the set of manuscripts that define OCS). A similar point can be made for most other highly inflected extinct languages, whose records preserve only particular forms of a word (sometimes not even the lemma).

Some of the options are:

  1. Mark all derived forms as speculative with * by default, and provide a special override parameter for every case in all the inflection templates for such a language. Hopefully, some diligent dude would spend enough time painstakingly searching the manuscripts to mark the truly attested forms as such, leaving the unattested ones marked by *. The drawback of this approach is that forms marked by * would potentially be left as such for a long period of time, leaving the poor readers who happen to see them under the delusion that all of them are hypothetical, when in fact only some are and the others are just "waiting" to be confirmed.
  2. Provide an empty inflection table that editors would populate with attested forms.
  3. Leave it as it is now, as for most of the words there is 99.9% chance that the unattested forms are the real ones, possibly adding a special note to the declension table that some of the forms are possibly not attested AND prohibit generating derived form pages from such tables. --Ivan Štambuk 17:30, 28 November 2007 (UTC)[reply]
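The first option above (asterisk by default, with a per-form override) can be sketched in Python. This is only an illustration of the marking logic, not an actual Wiktionary template; the class name, the slot names, and the choice of which slot counts as attested are all hypothetical, and the transliterated forms are placeholders rather than claims about the manuscripts.

```python
from dataclasses import dataclass, field


@dataclass
class InflectionTable:
    """Hypothetical stand-in for an inflection template's output."""

    # Maps a slot name (e.g. "nom_sg") to the generated form.
    forms: dict[str, str]
    # Slots an editor has confirmed in the manuscripts (the override parameter).
    attested: set[str] = field(default_factory=set)

    def render(self) -> dict[str, str]:
        # Every form gets a leading "*" unless its slot is marked as attested.
        return {
            slot: form if slot in self.attested else "*" + form
            for slot, form in self.forms.items()
        }


# Placeholder data: which slots are "attested" here is an assumption.
table = InflectionTable(
    forms={"nom_sg": "glava", "gen_sg": "glavy", "dat_sg": "glavě"},
    attested={"nom_sg"},
)
print(table.render())
```

The drawback noted in option 1 is visible here: until someone fills in `attested`, every form is shown as hypothetical, including those that are in fact attested.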
Also for Tocharian - a great many lemmata are reconstructed (i.e. attested only in a derived form). Some dictionaries include the derived form as the lemma and make a special note of it; some include a reconstructed lemma, which sometimes even proves to be false (almost all OCS dictionaries/grammars even today list the accusative form krъvь as nominative as well (these two cases had a tendency to equalize), instead of kry (кръі (kry)) as attested in Psalterium Sinaiticum).
Since there are not so many of them (the whole OCS canon, which represents a fairly well preserved language compared with some others, has < 10k words), I think it would be a good idea to step back from the usual doctrine of "only attestable, citable lexemes" and even provide reconstructed lemmas, under the condition that they appear in the usual dictionaries and that a note is made of their being reconstructed. --Ivan Štambuk 17:28, 29 November 2007 (UTC)[reply]

manual vs. bot-performed indexing

Why do we index by hand? I'd think a bot could very simply add to an index any page with the appropriate level-2 header. (Or any page at least yea old and not containing {{rfd}}, {{d}}, {{rfv}}, or synonyms, say.) The result would be, of course, more complete indices, with fewer man-hours spent having built them, and sans redlinks.—msh210 18:53, 29 November 2007 (UTC)[reply]

  1. No one has automated anything quite like that yet.
  2. Each language/script has separate concerns.
  3. The existing indexes' red links help skew Special:Wantedpages (in a helpful manner.)
If you want to do something about it, you are welcome to! (But please retain most redlinks.) --Connel MacKenzie 18:12, 30 November 2007 (UTC)[reply]
As far as I'm aware, no one has done much to maintain the indices lately. They mostly existed early on when the word count was small. As the project has rapidly grown, the associated indices fell into disuse. I tend to use categories more than an index, or create my own word lists. I find that the old Latin index is full of words that aren't in my Latin dictionaries. It's just such a morass that I don't want to mess with it. --EncycloPetey 23:42, 2 December 2007 (UTC)[reply]
I have given some thought to it, and I generated some index-like things for Hungarian and Greek, though in the process I borked the tools I was using. If there is a particular language that wants doing then I may be able to do it - though no instant promises. For anything longer than Greek it will definitely need splitting into short pages, but my preference is for languages the length of Greek to have one uber page like that. Anyway, kind of tell me what you want and I'll try to do something vaguely like it :). I don't like the idea of redlink maintenance either, they should be in a "requested entries" list not in the index, but apparently lots of people do like them there. Conrad.Irwin 13:07, 2 March 2008 (UTC)[reply]
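As a rough illustration of the bot-indexing idea raised at the top of this thread, the selection step might look like the Python sketch below. It assumes the page texts are already available as a mapping from title to wikitext (how a bot would fetch them is left out), and the function names are hypothetical. It only checks for the level-2 language header and for the {{rfd}}, {{d}} and {{rfv}} templates named in the proposal; the page-age condition and the preservation of existing redlinks are not modelled.

```python
import re

# Templates named in the proposal as reasons to leave a page out of the index.
EXCLUDE_TEMPLATES = ("rfd", "d", "rfv")


def has_language_section(wikitext: str, language: str) -> bool:
    """True if the page has a level-2 header such as ==English==."""
    pattern = r"^==\s*" + re.escape(language) + r"\s*==\s*$"
    return re.search(pattern, wikitext, flags=re.MULTILINE) is not None


def is_flagged(wikitext: str) -> bool:
    """True if the page uses any excluded template, with or without arguments."""
    return any(
        re.search(r"\{\{\s*" + re.escape(name) + r"\s*(\||\}\})", wikitext)
        for name in EXCLUDE_TEMPLATES
    )


def build_index(pages: dict[str, str], language: str) -> list[str]:
    """Return the sorted titles that qualify for the given language's index."""
    return sorted(
        title
        for title, text in pages.items()
        if has_language_section(text, language) and not is_flagged(text)
    )
```

Sorting here is by plain code point order, which would not be adequate for every language or script; as noted above, each language has its own concerns.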