Wiktionary:Beer parlour/2011/September

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

adding-translation script

Discussion on de.Wikt: de:Wiktionary:Teestube#.C3.9Cbersetzung-Hinzuf.C3.BCgen-Skript.

Hello English Wiktionary-Users,

in the German Wiktionary we would like to add the function that allows people to add translations without manually editing the code section. Could anyone explain how to do it? That would be great. Thanks in advance! Kampy 08:11, 11 September 2011 (UTC)[reply]

The German Wiktionary seems to structure translation sections completely differently from the English Wiktionary: translations show which senses they correspond to by having numbers next to them, rather than each sense's translations being put in a separate box, so simply copying the code wouldn't really work. --Yair rand 20:17, 11 September 2011 (UTC)[reply]

The code would have to be modified, yes, but that shouldn't be too complex a task. One option: the de.Wikt programmers could code another box between the ISO box and the translation box, which would take the sense number(s) as input. Is all of the code that operates the function contained in User:Conrad.Irwin/editor.js? - -sche (discuss) 05:48, 12 September 2011 (UTC)[reply]

No, it also uses the newNode function in MediaWiki:Common.js#Dom_creation. Another issue is that the script seems to make use of the translation table glosses, which dewikt doesn't have, for locating tables. (Not completely sure about that.) --Yair rand 06:08, 12 September 2011 (UTC)[reply]

I think if we (on de.Wikt) changed

(values.qual? '{'+'{qualifier|' + values.qual + '}} ' : '') +

to

(values.qual? '[' + values.qual + '] ' : '') +

, we could use the code as-is (apart from the problem of glosses, which we could add), with users adding the sense-numbers (1, 1–2) in the "qualifier" field. - -sche (discuss) 01:06, 13 September 2011 (UTC)[reply]
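The one-expression change proposed above can be sketched in isolation. This is only an illustration: `qualifierPrefix` and its `useSenseNumbers` flag are hypothetical names, not part of editor.js, and the `values.qual` field is assumed to hold whatever the user typed into the "qualifier" box.

```javascript
// Hypothetical helper, not taken from editor.js: it isolates the single
// expression discussed above so both variants can be compared.
function qualifierPrefix(values, useSenseNumbers) {
  if (!values.qual) {
    return '';
  }
  // en.Wikt style wraps the text in the qualifier template;
  // the proposed de.Wikt style prints it as a bracketed sense number.
  return useSenseNumbers
    ? '[' + values.qual + '] '
    : '{' + '{qualifier|' + values.qual + '}} ';
}

console.log(qualifierPrefix({ qual: 'informal' }, false));
console.log(qualifierPrefix({ qual: '1–2' }, true));
```

The split `'{' + '{qualifier|'` is kept from the original snippet; presumably it stops the wiki from expanding the template when the script page itself is saved.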

About the numbers, I think it shouldn't be too much of a problem. We don't use headlines that repeat the definition; instead we use those numbers. So there will only be one box at all times. Anything added to this box just needs an additional input box for the number it relates to. Can anyone code this? Kampy 00:05, 14 September 2011 (UTC)[reply]

There will be more than one box (and there will be no numbers) once the translations are (per the vote) separated by sense, though...

I have copied the code to my de:Benutzer:-sche/common.js, and have copied the en.Wikt and de.Wikt translation tables into subpages of my userspace for testing (de:Benutzer:-sche/sw4), but even with classes and gloss-support added to the German translation tables (de:Benutzer:-sche/sw1c), I haven't got it to work yet. - -sche (discuss) 01:01, 14 September 2011 (UTC)[reply]

Oh, I didn't know a vote was going on. I agree that the English version is more practical. I will support a change. Kampy 10:57, 14 September 2011 (UTC)[reply]

Is it possible that the code in my de.Wikt .js isn't considering itself enabled, Yair rand? - -sche (discuss) 01:08, 14 September 2011 (UTC)[reply]

[[de:Benutzer:-sche/common.js]] has some syntax errors that will cause browsers to stop processing it. [[de:Benutzer:Ruakh/common.js]] fixes the most severe errors — you can take it as a starting point for further debugging — but it still doesn't create the form for adding translations, so there's still something wrong. :-/ —Ruakh_TALK 02:30, 14 September 2011 (UTC)[reply]

Thank you for catching that! I'm guessing that (among other things) I should add \ to all of the other instances of sche/, like you did to sche/sw1c (ie sche/sw2b ⇒ sche\/sw2b etc), yes? Or not to all of them? - -sche (discuss) 03:05, 14 September 2011 (UTC)[reply]

Doesn't make a difference, it's only necessary inside the regexps, not simple strings (/.../, not "..."). --Yair rand 03:22, 14 September 2011 (UTC)[reply]
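The point about escaping can be shown with a short sketch; the page name used here is just the test page mentioned in this thread.

```javascript
// In a regular-expression literal, "/" ends the pattern, so a literal
// slash must be written as "\/".
const asRegex = /sche\/sw1c/;

// In an ordinary string, "/" has no special meaning and needs no escape.
const asString = 'sche/sw1c';

const pageName = 'Benutzer:-sche/sw1c';
console.log(asRegex.test(pageName));      // true
console.log(pageName.includes(asString)); // true
console.log('sche\/sw1c' === asString);   // true: the backslash is simply dropped
```

So adding the backslash inside quoted strings is harmless but unnecessary, whereas leaving it out of a `/.../` literal ends the pattern early and usually produces a syntax error.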

It seems that makes the script work! The "±" sign displays atop the gloss, but that's a relatively minor problem. - -sche (discuss) 03:09, 14 September 2011 (UTC)[reply]

That's because the dewikt tables don't have the show/hide button as the first node in the NavHead, and the script places the "±" after the first node. Can be fixed by replacing insertDiv.insertBefore(edit_button, insertDiv.firstChild.nextSibling); with insertDiv.insertBefore(edit_button, insertDiv.firstChild);, so that it's placed before the first node.--Yair rand 03:22, 14 September 2011 (UTC)[reply]

Other issues: The dewikt language templates leave parserfunction residue when substed. This could be fixed by modifying the language templates to have {{{|safesubst:}}} before the #if: ({{ {{{|safesubst:}}}#if:{{{nolink|}}}|Französisch|[[Französisch]]}}). Also, the use of {{t}} needs to be replaced with whatever template dewikt uses. --Yair rand 03:32, 14 September 2011 (UTC)[reply]

Thanks; that change puts the "±" in the right place! :) I'm working on replacing {{t}} with the de.Wikt counterpart {{Ü}}. I am also considering that certain functions, like "Page name:", may not be applicable to de.Wikt. (In fact, I replaced the code to input qualifiers with code to input sense numbers; it may be that I should undo that and instead use the AFAICT-unneeded-on-de.Wikt pagename-with-diacritics code as the vessel for adding sense numbers.) (No, that wouldn't work at all.) - -sche (discuss) 04:01, 14 September 2011 (UTC)[reply]

Re safesubst: actually, the necessary change isn't to the templates (although having templates that subst safely is probably a good idea); the necessary change is to the code: we don't use "Französisch" in translation tables on de.Wikt, we use {{fr}}. - -sche (discuss) 04:35, 14 September 2011 (UTC)[reply]

I've changed the code so that it does not subst language codes. However, changing {{t}} to {{Ü}} caused the function to display "Could not find translation entry for 'pt:worde'. Please reformat" when I tried to add worde (with ISO code pt given) to a section containing other translations. However, it added correctly to an otherwise empty section. I thought residual "{{t"s or a "{{Üxx" I added might be confusing the script's sorting mechanism, but it was also confused by this version of the page. (That version also shows that I/we/de.Wikt-programmers need to change how/where gender information is added.) - -sche (discuss) 05:03, 14 September 2011 (UTC)[reply]

The function getEditFunction might be the problem. It's built to look through the translation table wikitext for the translation to insert the new translation before (I think), but it's searching by first looking for * [[langname]]:, then for * langname:, and then for {{subst:langcode}}:, in case it's a newly added translation, but dewikt doesn't format translations like any of these. --Yair rand 21:21, 14 September 2011 (UTC)[reply]
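The lookup order described above can be sketched roughly as follows. This is not the actual getEditFunction code: `findLanguageLine` and `escapeRegExp` are hypothetical names, and the fourth pattern (`*{{fr}}:`) is only an assumption about how de.Wikt formats its translation lines.

```javascript
// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the index of the first line that starts a translation for the
// given language, trying each known format in turn; -1 if none matches.
function findLanguageLine(wikitext, langName, langCode) {
  const name = escapeRegExp(langName);
  const code = escapeRegExp(langCode);
  const patterns = [
    new RegExp('^\\*\\s*\\[\\[' + name + '\\]\\]:'),       // * [[French]]:
    new RegExp('^\\*\\s*' + name + ':'),                   // * French:
    new RegExp('^\\*\\s*\\{\\{subst:' + code + '\\}\\}:'), // * {{subst:fr}}:
    new RegExp('^\\*\\s*\\{\\{' + code + '\\}\\}:'),       // *{{fr}}: (assumed de.Wikt form)
  ];
  const lines = wikitext.split('\n');
  for (const pattern of patterns) {
    const index = lines.findIndex(line => pattern.test(line));
    if (index !== -1) {
      return index;
    }
  }
  return -1;
}

const table = '*{{en}}: {{Ü|en|cat}}\n*{{fr}}: {{Ü|fr|chat}}';
console.log(findLanguageLine(table, 'Französisch', 'fr')); // 1
```

Without the fourth pattern, the same call would return -1, which matches the "Could not find translation entry" symptom reported above.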

You're right; removing subst: (so that it only looks for the language code) makes it work. Now to remove cruft... - -sche (discuss) 21:55, 14 September 2011 (UTC)[reply]

I have adapted the code to work with de.Wikt's Ü-templates. It even nests nb and nn correctly, when neither no nor nb nor nn is already in the table. However, de:Benutzer:Yoursmile gave me feedback that the adder appears but doesn't work (Could not find translation table for 'fr:reg'. Glosses should be unique) when other scripts are around, e.g. de:Benutzer:Yair rand/TabbedLanguages.js. I thought that might be because DOM-node code is redundantly present in both scripts, but when I separated the DOM-node and translations code, and imported both, I found that the trans-adder no longer appeared. (If it had appeared, I would have imported only the trans-code and TabbedLanguages, to see if the absence of redundant DOM-code allowed the two to work together.) Any idea why splitting the two seemingly discrete scripts causes them to cease functioning (ie causes the trans-adder to cease appearing)? Any idea why the trans-adder appears but does not work when TabbedLanguages are around?

Separate issue: any idea what I did wrong when I tried to remove the "script" bit? That edit caused the adder to cease appearing. That edit also removed a bit of "gender"-code, but that wasn't problematic; I successfully removed it later. I rendered the "script" bit harmless (we don't use script templates on de.Wikt) by causing it to input nothing and removing the interface, but that leaves a lot of cruft. - -sche (discuss) 23:56, 17 September 2011 (UTC)[reply]

The top four lines of that edit are actually removing part of something completely unrelated to the "script" bit, but that part isn't what actually broke it. The edit contained an extra comma (ota:{,wsc:"ota-Arab"}) which caused a syntax error. --Yair rand 07:32, 19 September 2011 (UTC)[reply]
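A small sketch of why a single stray comma stops the whole script: the parser rejects the file before any of it runs. The `parses` helper and the `scripts` variable are illustrative names only, and the "fixed" line is just one plausible correction, not necessarily what the original code contained.

```javascript
// Check whether a snippet parses as JavaScript without executing it;
// new Function throws a SyntaxError if the source cannot be parsed.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (error) {
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

const broken = 'var scripts = { ota: {, wsc: "ota-Arab" } };';
const fixed = 'var scripts = { ota: { wsc: "ota-Arab" } };';

console.log(parses(broken)); // false
console.log(parses(fixed));  // true
```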

Thank you! I have got the script to work with Tabbed Languages; with your syntax-fix, now the unneeded "script" part has been successfully removed. I tested Tabbed Languages and the Trans-Adder together in the main namespace on de:Katze. I moved the code to de:Benutzer:-sche/uebersetzung.js, if anyone wants to see for themselves (remember that at the moment it is still oriented to {{Benutzer:-sche/sw1c}} and therefore only works on test pages or modified pages). The only issue I note now is that it wouldn't add more than one translation without me refreshing or navigating away from and back to the page; I wondered if I just didn't wait long enough (de:Katze had a lot of translations for it to sort through), but it displayed the same behaviour on my simple test page. (A minor problem I remind myself to fix is the unneeded space between * {{langcode}}.) de.Wikt will have to adopt glosses for this to work. - -sche (discuss) 09:27, 19 September 2011 (UTC)[reply]

Of note: the code works differently in different browsers and in the main vs the user namespace. (Those interested can temporarily restore this version of Katze and try using the code on it.) - -sche (discuss) 22:54, 19 September 2011 (UTC)[reply]

Idiomatic translations

I've been wondering for a while how to add translations that are not strictly idiomatic in English or in the target language, but for which the translation itself is idiomatic and not obvious. An example I came across was 'I have a nosebleed', which is translated more or less word for word into Dutch as 'Ik heb een bloedneus', but in Catalan it is translated as 'Em sagna el nas' - literally 'The (my) nose bleeds to me'. Any translations given for nosebleed are only useful for Dutch, but they would not cover the Catalan case at all. The literal translation of 'nosebleed' in Catalan is 'hemorragia nasal', which is not helpful in this case, and is not even idiomatic itself so it can't be included. Cases like this are quite common between languages, and it seems like a rather big gap in Wiktionary to leave it out... —CodeCat 13:04, 1 September 2011 (UTC)[reply]

People keep talking about a phrasebook, maybe this would be a good use for it. Fugyoo 14:02, 1 September 2011 (UTC)[reply]

But there should be something directly under [[nosebleed]] too. In a paper English-Catalan dictionary you would expect to see something like "nosebleed - hemorragia nasal. I have a ~ : Em sagna el nas". So why not do something like that here: when an English term is best translated with a phrase in the target language, we give the phrase in addition to the straightforward noun=noun translation. —Angr 15:40, 1 September 2011 (UTC)[reply]

I would agree with this way but there is some overlap with entries that are idiomatic, which have their own entries. We might end up with a situation where the entry give contains translations for 'give up', while give up has its own translations as well. We would need to be careful that translations are not duplicated like this. —CodeCat 15:52, 1 September 2011 (UTC)[reply]

Not addressing your question, which is general, but, rather, only the specific example: Do other symptoms not translate into Catalan similarly? Is "I have a headache" in Catalan not literally "The head hurts to me"? If so (and, not knowing any Catalan, I have no idea whether it's so), then I don't think we should include such translations in any entry at all: they belong in a grammar, perhaps, but are not relevant to any one word of the language.—msh210℠ (talk) 15:47, 1 September 2011 (UTC)[reply]

We already have some grammar in Wiktionary's entries, and I don't think it's much of a problem if we include things like this. They are very useful to someone who wants to say 'I have a nosebleed' in Catalan and looks at the translation table, and then notices immediately that what he wants to say is said differently. It's very user friendly that way. —CodeCat 15:52, 1 September 2011 (UTC)[reply]

On a tangential note, google:"I have a nosebleed" seems much less common than google:"my nose bleeds" and google:"my nose is bleeding". "my nose bleeds" seems like a fairly good candidate for a phrasebook entry, one that can be linked to from "nosebleed". --Dan Polansky 16:05, 1 September 2011 (UTC)[reply]

'My nose bleeds' seems very awkward to me. It sounds like you are saying it bleeds habitually rather than that it is bleeding right now. —CodeCat 16:06, 1 September 2011 (UTC)[reply]

Where I live in Northern England, people would say "my nose is bleeding" or possibly "I have a nosebleed" but never "my nose bleeds". I imagine most of the ghits for "my nose bleeds" would be part of phrases such as "my nose bleeds when..." or such like. BigDom 16:10, 1 September 2011 (UTC)[reply]

google:"My head hurts" and google:"my stomach hurts" also seem very common in spite of not using the present continuous tense. Also check the two phrases in Google books to see how common they are there too. --Dan Polansky 16:12, 1 September 2011 (UTC)[reply]

This American agrees with Codecat and BigDom: my nose bleeds sounds like it does so habitually, not now. My X hurts OTOH means now. Go figure.—msh210℠ (talk) 16:28, 1 September 2011 (UTC)[reply]

I agree. But to me it seems similar to the Americanism "Do you have" instead of "Have you got" (I was once asked "Do you have children?" I replied "Not very often." and she was very confused!) SemperBlotto 06:59, 2 September 2011 (UTC)[reply]

@ CodeCat, we often just split the link, [[hemorragia]] [[nasal]]. Mglovesfun (talk) 07:08, 2 September 2011 (UTC)[reply]

More generally, the translation table should include help when needed. Lmaltier 17:03, 2 September 2011 (UTC)[reply]

Question

I was directed here from a discussion section. So what is the "acceptability" of signatures? An editor since 8.28.2011. 06:36, 3 September 2011 (UTC)[reply]

Any signature is probably going to be acceptable unless someone takes exception to it. If somebody has a problem with it, they will explain and then you will know how to improve its acceptability. Or you can ignore the complaint and advice and choose instead to burn your bridges with that editor. If you burn too many bridges, you may find it difficult or impossible to function effectively here. —Stephen ^(Talk) 06:44, 3 September 2011 (UTC)[reply]

~~I have no clue how that is related to my comment? (Never mind.) Okay... what exactly are you trying to say?~~ Oh, I see! I'm stupid when I'm tired. Thanks! An editor since 8.28.2011. 06:48, 3 September 2011 (UTC)[reply]

FWIW I find colorful signatures annoying... but I'd rather be annoyed than limit others' freedom to edit their own signature, unless the signature is really really silly. Mglovesfun (talk) 09:55, 3 September 2011 (UTC)[reply]

Thank you! An editor since 8.28.2011. 17:08, 3 September 2011 (UTC)[reply]

However: my signature is uni-colored. An editor since 8.28.2011. 17:11, 3 September 2011 (UTC)[reply]

Colorful doesn't necessarily imply more than one color. --Mglovesfun (talk) 14:18, 7 September 2011 (UTC)[reply]

Differently-coloured signatures might cause problems for people using different skins (colour schemes), perhaps because of poor eyesight. Fugyoo 14:26, 7 September 2011 (UTC)[reply]

User:Yair rand/uncategorized language sections/English

Just want to ask for a few volunteers to fix these entries, using templates such as {{en-noun}}, {{en-verb}} or just {{infl}}. No obligation of course, but even fixing one entry at this late stage is a help. Thank you, Mglovesfun (talk) 09:54, 3 September 2011 (UTC)[reply]

community's opinion on bot format

I received this message on my talk page: Hi there. It is a bit late now, but I have been meaning to ask you for some time if the form P.officer was a mistake. Also, in your subpages, we like to use {{conjugation of}} these days rather than {{form of}} e.g. {{conjugation of|pellettizzare||2|s|past historic|lang=it}} (Italian example).

The thing that I'm asking about is when it says "we like to use {{conjugation of}}...rather than {{form of}}". Is it important which template to use, if both produce identical results? To me, it looks unnecessary to change to {{conjugation of}}, if not a waste of time, but I'm eager to hear the voices of other users. --Pofficer 17:53, 4 September 2011 (UTC)[reply]

If they produce the same thing, I don't see the point in switching.—msh210℠ (talk) 20:22, 4 September 2011 (UTC)[reply]

Conjugation of is more uniform: there are many minor variations on how to write "first-person singular present indicative" using form of, while conjugation of only allows one of these. Mglovesfun (talk) 21:34, 4 September 2011 (UTC)[reply]

I agree. Hence my "If..." clause.—msh210℠ (talk) 21:37, 4 September 2011 (UTC)[reply]

OK, I shall continue the bot. If there are any problems, don't hesitate to leave me a message and I'll put a clamp on the bot. --Pofficer 09:56, 5 September 2011 (UTC)[reply]

I got a message just now about a bot flag, the "small formality of requesting permission to run as a bot, and then getting a sysop to set the bot-flag on your user id". Can I request permission to run P.officer (talk • contribs) as a bot? Instead, perhaps, I could change the name of the bot to Officebot (talk • contribs) as it could avoid confusion. --Pofficer 10:20, 5 September 2011 (UTC)[reply]

We do have a couple of bots without -bot or -Bot in the name but, if you don't really mind, I'll change it to PofficerBot before setting the bot flag (It seems to be functioning OK). SemperBlotto 10:31, 5 September 2011 (UTC) p.s. You would need to edit your user-config.py file to reflect the name change.[reply]
- I certainly will change user-config.py. "Pofficerbot" is fine as a name. Thanks --Pofficer 10:40, 5 September 2011 (UTC)[reply]

OK. Changed to "Pofficerbot" and bot-flag now set. SemperBlotto 10:44, 5 September 2011 (UTC)[reply]
- Still needs a vote technically, no? Not that I object. Does anyone actually object to this bot? Seems a bit of a waste of time to have a vote if nobody would actually oppose it anyway. Mglovesfun (talk) 10:46, 5 September 2011 (UTC)[reply]
  - It takes a second to remove the flag (I'm going to keep an eye on it for a while). SemperBlotto 10:48, 5 September 2011 (UTC)[reply]
    - Thanks again SemperBlotto. As if by magic, the edits have been removed from RecentChanges. --Pofficer 10:59, 5 September 2011 (UTC)[reply]

WT:About_Japanese

Calling all 日本語能力のある方...

Following comments in various other threads, it appears that the WT:AJA page needs some work. The issues I'm immediately aware of:

Quasi-adjectives (な adjectives): WT:AJA insists on including the な in the headword, which does not appear to be the current consensus.
の adjectives: WT:AJA does not include any clear guidelines for these. (Relatedly, {{ja-adj}} doesn't include any way of handling these either.)
Suru compound verbs: WT:AJA calls for using the {{ja-suru}} template. However, する is a standalone verb, so including the する conjugation on each and every compound verb page seems excessive.
{{ja-kanjitab}}: WT:AJA describes including this under an === Etymology === section if there is one, but including under the main == Japanese == section produces largely identical results, unless there are multiple etymology sections, in which case repeating the kanjitab seems excessive.
The Transliteration subpage could also use some work, particularly with regard to spacing and what constitutes a single word in Japanese (i.e., particles should be separate, suru should be separate, etc. etc.).
連体詞: WT:AJA states that this should be given a POS of "prefix", but that is really not what these words are -- a prefix is part of a word, whereas 連体詞 are clearly standalone words. They are less prefixes and more like true adjectives, in that they must precede a noun.
Single-kanji entries: WT:AJA has no clear instructions on how to specify okurigana in kun'yomi listings, nor any clear instructions on how to format these to link to verb forms. For instance, 食 shows one way of clarifying okurigana and linking to kanji+okurigana entries, but is a bit visually messy; ja:食#日本語 looks a bit cleaner with the use of hyphens to show the break between the kanji and the okurigana, and this roughly matches the format I've most often seen in dead-tree dictionaries, but the entry doesn't link to any kanji+okurigana entries, just to the hiragana entries; and 飲 doesn't show okurigana or link to any kanji+okurigana entries.

This post is really just meant to get the ball rolling. Many of these changes listed above are a departure from what WT:AJA currently says, so I'm hoping to spark a bit of discussion before making any edits. -- TIA, Eiríkr Útlendi | Tala við mig 17:41, 6 September 2011 (UTC)[reply]

Please keep discussion in the fora here in English where possible. For the record, 日本語能力のある方 seems to mean "those skilled in Japanese" or similar (based only on Google Translate, not that I know any Japanese, myself).—msh210℠ (talk) 18:25, 6 September 2011 (UTC)[reply]

Also, you might want to continue this discussion at Wiktionary talk:About Japanese, since it may wind up taking up a lot of screen space and is specific to Japanese (and indeed the AJA page!).—msh210℠ (talk) 18:28, 6 September 2011 (UTC)[reply]

Fair enough. I've tried posting there a few times and got the overwhelming impression of crickets chirping, which led me to try posting on a more-trafficked page. I'll copy this thread over to there shortly. -- Eiríkr Útlendi | Tala við mig 19:16, 6 September 2011 (UTC)[reply]

I think it is a good idea to have this post here, directing everyone to the Wiktionary talk:About Japanese page (where the discussion can take place). If no-one adds to the discussion, contact other active editors of Japanese directly on their talk pages. If there are none, or you have done that and they have not replied, then you (as the only active editor of the language) should make whatever changes you deem necessary. - -sche (discuss) 20:06, 6 September 2011 (UTC)[reply]

I agree: keep this here, but continue discussion there. (That's what I meant in the first place: sorry I wasn't clear.)—msh210℠ (talk) 23:41, 6 September 2011 (UTC)[reply]

Sure, no worries. :) I copied my initial post over to Wiktionary_talk:About_Japanese#Work_Needed. I hope to get into the nitty gritty over there. -- Cheers, Eiríkr Útlendi | Tala við mig 23:45, 6 September 2011 (UTC)[reply]

I've created a list of the 1000 most common species epithets

Hi Latin lovers and barflies,

User:Pengo/Latin/Top_1000

Based on the Encyclopedia of Life database, I've compiled a list of the most common species epithets. I'm hoping this will help those who want to create new Latin/Translingual entries.

There's more details on the page. --Pengo 14:14, 7 September 2011 (UTC)[reply]

Here's the top 5 words that are missing Latin/Translingual entries:

fasciata (banded)
apicalis (apex)
africana
nana (dwarf)
variegata

--Pengo 03:22, 8 September 2011 (UTC)[reply]

In many cases it is just the inflected form that is missing, eg, nana, nanus (“a dwarf”), but in some cases lemmata are missing, even classical ones, eg, variegatus, variego. DCDuring TALK 14:21, 8 September 2011 (UTC)[reply]

Looks like it would help if I grouped words with the same stem. I'm going to attempt to make another list that does that (at least crudely).

I, myself, don't know an inflection from a declension, so until I learn some Latin grammar and work out all the templates and formatting here, this list is really for you and other editors. So let me know if there's anything else that would be useful. --Pengo 02:53, 9 September 2011 (UTC)[reply]

Grouping by stems is less helpful IMO for speeding entry creation than grouping by inflectional ending and suffix. Ie, the forms ending in "ata" have a very similar Latin section structure. That structure will have links to the participle lemma ending in "atus", which will have links to the lemma verb. Some of those links may be red. The entries for the red link lemmas should probably be added by an editor familiar with Latin with access to multiple Latin references, including some for Medieval Latin. Purely New Latin terms are much less interesting to most Latinists, however important they may be to taxonomists and to Wiktionary. DCDuring TALK 12:25, 9 September 2011 (UTC)[reply]
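As a rough sketch of grouping by ending rather than by stem, the snippet below buckets the five epithets listed earlier in this thread. The function name and the list of endings are assumptions made for illustration; a real list would need endings chosen by someone who knows Latin morphology.

```javascript
// Group epithets by the first matching ending; endings are checked in
// the order given, so list longer endings before shorter ones.
function groupByEnding(words, endings) {
  const groups = new Map();
  for (const word of words) {
    const ending = endings.find(e => word.endsWith(e)) || '(other)';
    if (!groups.has(ending)) {
      groups.set(ending, []);
    }
    groups.get(ending).push(word);
  }
  return groups;
}

const epithets = ['fasciata', 'apicalis', 'africana', 'nana', 'variegata'];
const grouped = groupByEnding(epithets, ['ata', 'alis', 'ana']);
console.log(grouped.get('ata')); // [ 'fasciata', 'variegata' ]
console.log(grouped.get('ana')); // [ 'africana', 'nana' ]
```

Pure string matching is crude: "nana" lands beside "africana" even though it is a form of nanus, so the groups would still need a human check.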

Thanks for the feedback. Working on it. Will add some extra features too. --Pengo 04:56, 10 September 2011 (UTC)[reply]

abuse filter

As of recently, we have an abuse filter. It allows us to create rules against which edits (and moves and other things) are filtered; if an edit matches such a rule, it can — at our option for each rule — tag the edit with a little note in special:recentchanges, not allow the edit to go through until the editor first sees a warning that the edit might not be wise (which warning can be customized for each filter rule), block the edit altogether, or remove the editor's "autoconfirmed" flag. (Or combinations of those.) It can also do these things only after the editor in question makes too many rule-matching edits in a short period of time (which rate, too, is customizable per filter rule). For more on the abuse filter, see the MediaWiki extension page and/or the Wikipedia abuse filter page (except that they call it the "edit filter").

I've set up some rules that I thought would be helpful.

One of them actually blocks an edit from going through: this filter checks that the user is not an autopatroller, admin, or bot; that the edit is in the main (entry) namespace; that the entry had a level-three header before the edit; that the entry has no level-three header after the edit; and that the entry (after the edit) doesn't have a speedy-deletion template or {{only in}}. It blocks that edit from going through. That filter has (in its current incarnation) caught scores of edits, with no false positives (i.e., it has not blocked any edit that we wouldn't have manually rolled back had it gone through).
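The conditions of that blocking rule can be sketched as a plain predicate. This is not the abuse filter's own rule syntax; the `edit` object, its field names, the group names other than "autopatroller", and the use of {{delete}} as the speedy-deletion template are all assumptions made for the sake of the example.

```javascript
// Matches a level-three header such as "===Noun===",
// but not a level-four one such as "====Translations====".
const LEVEL_THREE_HEADER = /^===[^=\n].*?===\s*$/m;

// Return true if the edit would be blocked under the rule described above.
function removesAllLevelThreeHeaders(edit) {
  const privileged = edit.userGroups.some(
    group => ['autopatroller', 'sysop', 'bot'].includes(group)
  );
  if (privileged || edit.namespace !== 0) {
    return false;
  }
  const hadHeader = LEVEL_THREE_HEADER.test(edit.oldText);
  const hasHeader = LEVEL_THREE_HEADER.test(edit.newText);
  // Assumed exemptions: {{delete}} (speedy deletion) and {{only in}}.
  const exempt = /\{\{\s*(delete|only in)\s*[|}]/i.test(edit.newText);
  return hadHeader && !hasHeader && !exempt;
}

const blanking = {
  userGroups: [],
  namespace: 0,
  oldText: '==English==\n===Noun===\n# a feline',
  newText: 'lol',
};
console.log(removesAllLevelThreeHeaders(blanking)); // true
```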

No other rule currently does more than tag an entry on special:recentchanges. I propose, though, that three do.

One of them is a copy of a filter at enWP. These filters look for an edit that adds a single bad word and nothing else. (Approximately. The actual workings of the filter are hidden on enWP, so I've hidden our copy also. Admins and "edit filter managers" over there can see their copy, and our admins can see ours.) On enWP, it prevents the edit from going through, and has done so for months. (I don't, however, know how fastidious they are in looking for false positives.) Here, it does nothing; so far we've had only a handful of matches, with no false positives. I propose it prevent edits from going through here also. I also ask admins to edit it to enwikt purposes (testing it well of course, especially if it disallows the edit from going through).

Update: Now we've had a false positive.—msh210℠ (talk) 15:27, 8 September 2011 (UTC)[reply]

Another rule I think should do more than tag is one that checks whether a new (main namespace) entry is created by a non-autopatroller (non-admin, non-bot), lacks a level-three header, and either {has both a capital letter and a space in its title} or {has a right-parenthesis ) at the end of its title}. It's only had a handful of hits, with no false positives. Again, please improve it; and I think perhaps it should also block edits from going through.
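A minimal sketch of that title-based check, again as an ordinary predicate rather than real filter syntax. The `page` object and its fields (`isNew`, `privileged`, `title`, `text`) are hypothetical.

```javascript
// Return true for a new, headerless entry whose title looks suspicious:
// either a capital letter plus a space, or a trailing right-parenthesis.
function looksLikeMisplacedNewEntry(page) {
  if (!page.isNew || page.privileged) {
    return false;
  }
  const hasLevelThreeHeader = /^===[^=\n].*?===\s*$/m.test(page.text);
  const capitalAndSpace = /\p{Lu}/u.test(page.title) && page.title.includes(' ');
  const endsWithParen = page.title.endsWith(')');
  return !hasLevelThreeHeader && (capitalAndSpace || endsWithParen);
}

console.log(looksLikeMisplacedNewEntry({
  isNew: true,
  privileged: false,
  title: 'John Smith',
  text: 'He is a cool guy.',
})); // true
```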

The third rule I think should prevent edits from going through currently also just tags. It checks whether an entry is not new, is being edited by a non-autopatroller (non-bot, non-admin), and has its after-this-edit text the same (but for capitalization and other normalizations) as its pagetitle.
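The comparison in this third rule might look something like the sketch below. Which "other normalizations" the real filter applies is not stated here, so trimming, whitespace collapsing, Unicode NFC and lower-casing are assumptions.

```javascript
// Assumed normalization: Unicode NFC, trimmed, single spaces, lower case.
function normalizeForComparison(text) {
  return text.normalize('NFC').trim().replace(/\s+/g, ' ').toLowerCase();
}

// True if the page text is nothing more than its own title.
function contentMatchesTitle(title, text) {
  return normalizeForComparison(title) === normalizeForComparison(text);
}

console.log(contentMatchesTitle('nosebleed', '  Nosebleed\n')); // true
console.log(contentMatchesTitle('nosebleed', '==English=='));   // false
```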

Thoughts?

(Of course, edits to improve the other filters are sought, too. And new ones.)—msh210℠ (talk) 20:05, 7 September 2011 (UTC)[reply]

This is a really neat tool and I applaud your initiative in creating a few filters to start out. - [The]DaveRoss 20:08, 7 September 2011 (UTC)[reply]

Yeah, it's an excellent thing to have. I think I saw a rule to block edits that create a page whose content is identical to its title, which (for some reason) is a very common useless edit. Equinox ◑ 22:58, 7 September 2011 (UTC)[reply]

We have it for existing pages: it checks whether the page content was reduced to its pagetitle. We could easily have it for new pages also (even by editing the existing filter rule).—msh210℠ (talk) 15:19, 8 September 2011 (UTC)[reply]

I've updated that rule. We can watch and see if it picks up false positives.—msh210℠ (talk) 15:30, 8 September 2011 (UTC)[reply]

I think we could disallow creating pages in the main namespace if the first character is a letter. All existing pages begin with either a header or with a template like {{also}} or {{wikipedia}}. —CodeCat 23:26, 7 September 2011 (UTC)[reply]

(You mean if the first char is alphanumeric?) Most people don't come here knowing the formatting rules, so if we did do that, we would need extra-prominent links to those and to places they might want, like WT:REE. Equinox ◑ 23:30, 7 September 2011 (UTC)[reply]

I think we should tag those but not disallow 'em. There might be some usable content. Mglovesfun (talk) 11:45, 8 September 2011 (UTC)[reply]

Yeah most people don't know ELE, so we shouldn't disallow them, but I think it might be wise to give the editors a notice before allowing them to save, which notice can outline the format, or something. (And tag the edit.)—msh210℠ (talk) 15:19, 8 September 2011 (UTC)[reply]

I've created a filter along these lines. It checks whether the first character is anything but { or =. It does nothing for now (so we can check for false positives), but can warn the user.—msh210℠ (talk) 18:07, 18 September 2011 (UTC)[reply]
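The first-character test is simple enough to sketch in a few lines; `startsLikeAnEntry` is a hypothetical name, and whether the real filter ignores leading whitespace is not stated, so this version does not.

```javascript
// An entry is expected to start with a template ("{") or a header ("=").
function startsLikeAnEntry(text) {
  const first = text.charAt(0);
  return first === '{' || first === '=';
}

// The filter would fire (tag or warn) when this returns false.
console.log(startsLikeAnEntry('==English=='));  // true
console.log(startsLikeAnEntry('{{also|Cat}}')); // true
console.log(startsLikeAnEntry('a small cat'));  // false
```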

Awesome! —Ruakh_TALK 00:29, 8 September 2011 (UTC)[reply]

Could we write a filter that shows editors a warning before allowing them to put their edit through, if their edit introduces <ref> (and does not introduce <references/>) to an entry that does not contain <references/>? I sometimes forget, on both en. and de.Wikt, to add <references/> when adding <ref>s. The warning would remind the editors to add the <references/> tag. - -sche (discuss) 18:50, 8 September 2011 (UTC)[reply]
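The requested check could be sketched like this, as a plain function rather than in the filter's own rule language. `needsReferencesWarning` is a hypothetical name, and counting `<ref` tags before and after the edit is just one way to approximate "the edit introduces a ref".

```javascript
// Count opening <ref> tags (including <ref name="...">), but not <references/>.
function countRefTags(text) {
  return (text.match(/<ref[\s>]/gi) || []).length;
}

// Warn when an edit adds a <ref> and the resulting page has no <references/>.
function needsReferencesWarning(oldText, newText) {
  const addsRef = countRefTags(newText) > countRefTags(oldText);
  const hasReferencesTag = /<references\s*\/?>/i.test(newText);
  return addsRef && !hasReferencesTag;
}

console.log(needsReferencesWarning('# a cat', '# a cat<ref>Some source</ref>')); // true
console.log(needsReferencesWarning('# a cat', '# a cat<ref>Some source</ref>\n<references/>')); // false
```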

I've created it but have not yet tested it (or checked how expensive it is).—msh210℠ (talk) 21:34, 8 September 2011 (UTC)[reply]

It works well, as far as I can tell, and has caught a couple of users. I think we need to update the location of the message, though (either move MediaWiki:Abusefilter-warning/ref-no-references back to MediaWiki:Abusefilter-warning/ref-no-reference or change the link, whichever is easier; at the moment it displays a default message rather than the nicer and more informative custom one). - -sche (discuss) 05:53, 10 September 2011 (UTC)[reply]

I've fixed it, I think (not just now).—msh210℠ (talk) 17:30, 11 September 2011 (UTC)[reply]

So (to repeat myself) we have a filter rule that catches edits that result in a page whose content matches its title (in the main namespace, and except for whitelisted folks, admins, and bots). Any objection to having that rule block the edit from going through? As of now we've had only about ten hits, but no false positives, and I can't think how there would be any.—msh210℠ (talk) 17:30, 11 September 2011 (UTC)[reply]

Done.—msh210℠ (talk) 15:18, 13 September 2011 (UTC)[reply]

A lot of anon users seem to create pages that just contain one or more instances of the text "[[File:Example.jpg]]" (perhaps they are accidentally clicking the delayed-loading JavaScript toolbar?). A filter for this might be worthwhile. Equinox ◑ 13:02, 17 September 2011 (UTC)[reply]
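A filter of the kind suggested above might test something like the following. This is only a Python sketch of the idea; no such filter is confirmed to exist, and the names `ONLY_PLACEHOLDER` and `is_placeholder_only` are made up for the example.

```python
import re

# One or more copies of the toolbar's placeholder image link, with optional
# whitespace around them, and nothing else on the page.
ONLY_PLACEHOLDER = re.compile(r"(?:\s*\[\[File:Example\.jpg\]\]\s*)+")


def is_placeholder_only(text: str) -> bool:
    """Return True if the page consists solely of [[File:Example.jpg]] links."""
    return ONLY_PLACEHOLDER.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_placeholder_only("[[File:Example.jpg]][[File:Example.jpg]]\n"))
    print(is_placeholder_only("[[File:Example.jpg]] plus a real definition"))
```

Matching the whole text, rather than searching for the link anywhere, keeps the check from firing on entries that merely mention the placeholder.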

Alternatively, we could push to have the toolbar fixed. ;-) Personally I have it turned off, because it's just too annoying to try to click in the textarea and suddenly have inserted something random. —Ruakh_TALK 14:49, 17 September 2011 (UTC)[reply]

I would like there to be a fixed-sized empty space on the page until the toolbar loads and replaces it. Whom do we nag? Equinox ◑ 14:54, 17 September 2011 (UTC)[reply]

Lemma entries for Japanese na type adjectives (形容動詞)

I've noticed that the policy for な-type adjectives, or keiyodoshi, is to include the な as part of the entry. This is not, as far as I know, standard practice in any Japanese dictionary or even the Japanese Wiktionary.

For example, both 元気 and 元気な are treated as lemma entries. I believe users would be better served to have the 元気な entry read: "Attributive (連体形) form of 元気", and have both noun and adjective lemma entries listed on 元気.

The -な suffix is merely a conjugation of form and should be treated as such. The most egregious example, and the one that brought this issue to my attention, is たくさんな. There is a page for the kanji version of this word, 沢山, but there isn't even a link to it from たくさんな; instead there is a broken link to 沢山な. But all of this is beside the point: the real issue is that たくさんな is a much less often used form than either たくさんの or even just たくさん. All of these forms would be better served by the lemma entry たくさん, which I would be happy to write tonight after work, but that doesn't solve the system-wide problem of な-type adjectives being written with the な as part of the lemma.

The only policy on this I can find, Wiktionary:About Japanese#Quasi-adjectives_.28.E5.BD.A2.E5.AE.B9.E5.8B.95.E8.A9.9E.29, is not very clear on the issue. I propose that it be changed to include the ideas I've put forth, but I'm not sure exactly how to do so. Entries would still need to acknowledge that these are な-type adjectives, but this could easily be done in a header or something, right?

Also, perhaps a bot of some sort to change all of the entries made in the way I clearly find so offensive. *^_^*

MichaelLau 19:04, 9 September 2011 (UTC)[reply]

Hello Michael, thanks for chiming in --

Those of us dealing with Japanese here on the English Wiktionary have been chewing on some of these issues recently, cf. WT:BP#Preferred forms for Japanese lemmata, WT:BP#WT:About_Japanese, and a number of posts starting at Wiktionary_talk:About_Japanese#Lemma forms for keiyōdōshi and continuing further down that page. The emerging consensus is largely in line with what you describe. I'd really appreciate it if you could have a look at the other posts I've linked to here to get up to speed with what has already been discussed of late, and then it'd be great if you'd add to the discussion over at Wiktionary_talk:About_Japanese#Work_Needed. -- Cheers, Eiríkr Útlendi | Tala við mig 20:00, 9 September 2011 (UTC)[reply]

Template:ja-kanji

I'd like to update this template to handle shinjitai / kyūjitai, much as the Japanese POS templates already do (see {{ja-noun}}, {{ja-adj}}, {{ja-verb}}, etc.).

Some kanji don't get used as words on their own, and thus the individual kanji entry won't have anywhere graceful to put shinjitai / kyūjitai information. It would seem most appropriate for that information to go in the {{ja-kanji}} template itself, rather than (or possibly as well as - removing would take work) in the POS templates.

Are there any admins who could either implement this change, or change the protection level of {{ja-kanji}} to allow me to do so? -- Eiríkr Útlendi | Tala við mig 20:40, 9 September 2011 (UTC)[reply]

Looked at this again and realized I can indeed edit the template, so I did. I'll update the template documentation later to account for the new args. -- Eiríkr Útlendi | Tala við mig 21:51, 20 September 2011 (UTC)[reply]

Classical/Literary Chinese entries

Is there a correct way to add a definition of a Classical or Literary Chinese word? I've seen information about noting an etymology, but I'm not talking about an etymology for a modern word, I'm talking about defining a word as used in Classical Chinese texts. Such an entry might have the same meaning in modern Chinese, or might have a different meaning, or might not be used at all any more. I've looked for a list of "official" wiktionary languages, and found the "random entry" list. It has Old Chinese, Middle Chinese, and Late Middle Chinese, which are names of reconstructed languages (mostly phonology) from different periods. Those were spoken languages, and Classical Chinese was the most common written language used during all of those periods. How about Early Vernacular Chinese, for example words used in the novels 红楼梦 or 金瓶梅? There are entire dictionaries devoted to this language, but is the distinction appropriate on wiktionary? If so, can I just enter Early Vernacular Chinese as the language? Craig Baker 06:26, 11 September 2011 (UTC)[reply]

Such distinctions can be a bit arbitrary; I edit Old French, Middle French and French, so I'm familiar with the issue. One important note: please don't remove Mandarin headers. Mandarin is standard here, and it's also a widely accepted language name. As you say, depending on date it could be Old Chinese, Middle Chinese, or Late Middle Chinese. There's no reason not to create an ad hoc code for Classical Chinese if editors want it. But only if editors want it. For example we have 'ad hoc' codes {{roa-jer}} for Jèrriais and {{roa-leo}} for Leonese. Mglovesfun (talk) 13:38, 11 September 2011 (UTC)[reply]

There is already a code {{lzh}} for Literary Chinese. —CodeCat 13:53, 11 September 2011 (UTC)[reply]

Right then, in which case definitely don't replace Mandarin with Literary Chinese, as Mandarin is a language. We don't replace English with Middle English; we include both when the word/term is used in both languages. Mglovesfun (talk) 14:00, 11 September 2011 (UTC)[reply]

Please see here for your references. Engirst 15:16, 11 September 2011 (UTC)[reply]

Regarding "replacing" Mandarin, what about in entries I've added where the words are not found in Mandarin? Or, where I have no evidence that the word is found in Mandarin? Is there an expectation that when I add a Literary Chinese definition, I will also research whether the word is found in Mandarin? Craig Baker 15:52, 11 September 2011 (UTC)[reply]

Could we enter Classical Chinese like this? Engirst 16:25, 11 September 2011 (UTC)[reply]

The transliteration in this entry is based on Mandarin, the way Classical Chinese is taught in China. There really can't be another way, as they teach the words, grammar, sentence structure but not the pronunciation. So, in short it's not 100% accurate. --Anatoli 00:09, 12 September 2011 (UTC)[reply]

Speedy deletion is only for patently wrong entries. Unlike Wikipedia, deleting one language section of an entry with more than one language section would be considered a speedy deletion, about equivalent to blanking a whole Wikipedia entry. You should likely be going to WT:RFV with these, though unless there's a pretty robust answer to the question 'what's the difference between Classical Chinese and Mandarin?' then a lot of these debates will be a waste of time. Mglovesfun (talk) 19:41, 11 September 2011 (UTC)[reply]

From what I understand from Wikipedia, Literary Chinese is an obsolete writing standard based on the Middle Chinese spoken language that was used up till the early 20th century. It would be comparable to Ottoman Turkish, but in use for a lot longer. —CodeCat 21:45, 11 September 2011 (UTC)[reply]

Mglovesfun, just to be clear, the two entries I changed to "Classical Chinese", which you reverted, were new entries that I had added earlier that day and initially categorized as Mandarin because I didn't know that Literary Chinese was an option. I otherwise wouldn't have considered changing the language of an existing entry, which is what I assume you mean by "speedy deletion". What I'm more curious about is new entries for which I can provide Classical Chinese definitions, but don't have any information about Mandarin or other modern varieties. Craig Baker 03:26, 12 September 2011 (UTC)[reply]

I don't think we have contributors in Classical Chinese and in my opinion, we don't need to split Mandarin and Classical Chinese if a specific pronunciation for a specific period is not chosen. Also, the way Classical Chinese is used in Modern Mandarin, Cantonese, etc, the words can be classified as simply Mandarin, Cantonese, etc with some {{qualifier}}. The reason is that they are borrowed into modern Chinese varieties and adjusted to the appropriate pronunciation, used in quotes quite often. The few words that are NEVER or SELDOM used in modern languages, like classical pronouns, prepositions, have a modern usage, anyway, e.g. 伊 (yī), 其 (qí), 之 (zhī), etc. and the modern pronunciation. Numerous Mandarin chengyu are an example of how Classical Chinese is used in modern Mandarin. To understand their meaning, some knowledge of the Classical Chinese grammar and vocabulary is required but I don't think their components should have a separate entry as Classical Chinese. In any case, hanzi are such a complicated component, which is hard to classify as a part of speech; they often convey a meaning and only in combination become nouns, verbs, etc. --Anatoli 00:09, 12 September 2011 (UTC)[reply]

The {{qualifier}} idea sounds ok to me. As long as there is a way to note that they are Classical Chinese words, the information will not be lost, and it will be possible to use the dictionary when reading Classical Chinese texts for example. I'm curious why choosing a pronunciation is related to splitting the languages; pronunciations are not necessary to write a dictionary, though maybe some technical limitation of Wiktionary requires it? To me, the written form seems most important in a language like Classical Chinese where the pronunciation was not really even recorded, although I do think reconstructions can be interesting and useful in some ways. I agree that a good number of Classical Chinese words are Mandarin words too, but in general I don't agree with your example of chengyu; in most cases I think chengyu should be considered to be a single word in Mandarin (etc.), but just an ordinary phrase or sentence in Classical Chinese. In such chengyu, what used to be Classical Chinese words are no longer free to act like words in Mandarin sentences, and the meaning of the chengyu has fossilized and often shifted. In the terms used on the "Criteria for inclusion" page, the chengyu is idiomatic in Mandarin, but not in Classical Chinese; and the words it is composed of are not attested in Mandarin outside of that chengyu. In the end, I suppose the "language status" is not very important to me, as long as the two can be separated in some way by the reader or perhaps by an automatic script for the reader's use, so that the dictionary is useful for reading both Classical Chinese and Mandarin texts. Craig Baker 03:26, 12 September 2011 (UTC)[reply]

I only said that chengyu in Modern Mandarin demonstrate the grammar and syntax of Classical Chinese; I didn't say that one can use its components as they were then.

As a dictionary, Wiktionary deals less with stylistics and syntax, and it would be really hard to define each hanzi for both modern Mandarin and Classical Chinese. 文言文 (Wényánwén) (Classical Chinese), unlike 白話 / 白话 (Báihuà) (Vernacular Chinese), was almost 100% monosyllabic, each word consisting of only one hanzi, and defining the classical sense and usage of hanzi would require major work on these entries. At the moment, most definitions for hanzi are under the Han character heading. The specific CJKV language sections mainly deal with the READINGS of those characters. --Anatoli 03:47, 12 September 2011 (UTC)[reply]

I see your point about definitions for single characters currently being under the "Translingual" section. Of course it would require major work, but it's hard for the work to even begin without a language category, or to attract anyone capable of doing the work. I notice that many (most?) single-character entries already have definitions in the Japanese section, as well as etymologies (while the Translingual section has just a character etymology, not a word etymology). I would assume that the eventual goal is for definitions to be provided in the other languages/dialects too, so that we have information about how the word is used in those languages (or how it is not used—one of the most difficult things about reading Classical Chinese with a dictionary that includes both modern and ancient definitions is filtering out the modern definitions). I will continue reading around the Community Portal to try to understand the plan for this. Perhaps it would also help to note that there are already many large, good dictionaries devoted to just Classical Chinese, so they are definitely useful. Craig Baker 03:08, 14 September 2011 (UTC)[reply]

Sorry to have not seen this sooner. I have created thousands of Classical Chinese words on Wiktionary over the last several years. I have created a number of translations at Wikisource that link words back to Wiktionary definitions. My long term project is s:Romance of the Three Kingdoms. So far, the format has been largely decided by me, since I haven't come across anyone knowledgeable in the subject who wanted to contribute entries. My approach has been to view the problem through the lens of Mandarin. I'm not suggesting that this is the ideal approach, merely the most practical. Since Classical Chinese can be read in modern Mandarin, it made sense to create Mandarin entries that used either the {{literary}} or {{archaic}} labels. The {{obsolete}} label might be another potential option, although I haven't used it all that much. These context labels recently underwent a minor change. They now put the words into categories called: Category:Mandarin archaic terms in traditional script, Category:Mandarin archaic terms in simplified script, Category:Mandarin literary terms in traditional script and Category:Mandarin literary terms in simplified script. These categories should gradually replace Category:zh-tw:Archaic, Category:zh-cn:Archaic, Category:zh-tw:Literary and Category:zh-cn:Literary as well as Category:Traditional Chinese archaic terms, Category:Traditional Chinese archaic terms, Category:Traditional Chinese literary terms and Category:Simplified Chinese literary terms. See 飲酒 and 征東將軍 for some typical examples of how I format entries. Also, I used 字 as a model of how we could do it if time and people were not limitations. Thanks. -- A-cai 01:37, 29 September 2011 (UTC)[reply]

P.S. Other pieces that I've done in this way on wikisource: s:Departing from Baidi in the Morning, s:Preface to the Poems Composed at the Orchid Pavilion, s:Song of Everlasting Regret, s:The Peach Blossom Spring and s:Touring Shanxi Village -- A-cai 01:43, 29 September 2011 (UTC)[reply]