User talk:Dan Polansky/2012
Blurbzone's Babel
Thank you for the Wiktionary welcome, and for your inquiry about the Babel feature. My response is that I'm not comfortable with adding the template yet. The codes/levels don't provide the right classification for my account, e.g., level 5+ for en. My 2nd item is a request that you please notify your Wiktionary co-administrators not to randomly delete entries (etiquette rule #4), since they are wasting volunteer time and energy and causing [in some cases] loss of relevant information over extremely petty, unexplained issues on their part. Blurbzone 12:14, 3 January 2012 (UTC)
- Regarding your second point, Special:DeletedContributions/Blurbzone reveals no deleted edits; also, there are no Wiktionary:Etiquette rules. Are you confusing us with Wikipedia? We get that a lot. Mglovesfun (talk) 12:19, 3 January 2012 (UTC)
- Re Babel: If you are unsure of your Babel levels, you should just add
{{Babel|en}}
: that will do, provided you are a native English speaker. - Re admins such as SemperBlotto: It seems that you are new to wikis, not only to Wiktionary. SemperBlotto is one of the most prolific contributors to English Wiktionary, so if anyone is wasting volunteers' time, it is you: you are wasting time of SemperBlotto. SemperBlotto is sometimes hasty in deleting suspect, poorly formatted additions by newbies, but, OTOH, he does a lot of patrolling and is a huge contributor. If you have any more questions about the SOS page, you'd better talk to someone other than SemperBlotto, such as me. --Dan Polansky 12:21, 3 January 2012 (UTC)
- Will ignore your outburst below. Information and language are universal (as far as encyclopaedia matters go), so I don't appreciate that particular editor's approach or actions. There are reasons for the entry I found deleted, then later modified for reduction of relevant input...I don't appreciate that, nor the blatantly false statement you made, "you are wasting time of SemperBlotto." Where????? Nor the fact that my references (e.g., to the etiquette matters) are taken right out of Wiki pages that neither one of you has read. You can both assign my reviews elsewhere, since neither one of you seems to understand the points of formal and valid communication included in the messages left. Sorry for being short with you, but I hate wasting time.
- Re: Babel template --- I am not interested in listing my speaking level, since I'm not interested in advertising nativity to that or any other language. And nativity is not the point when it comes to materials written for publication in any language. Blurbzone 01:34, 4 January 2012 (UTC)
Flood flag
Hi, would you accept a nomination at WT:RFFF? Mglovesfun (talk) 11:53, 21 January 2012 (UTC)
- Sure, good idea. --Dan Polansky 11:54, 21 January 2012 (UTC)
- Does there need to be a request? Can't you just add the flood flag to my account? --Dan Polansky 11:58, 21 January 2012 (UTC)
- There are supposed to be two approvals, and now there are. BTW please don't make other edits whilst using the flood flag (though it's not a disaster of course) and leave a message on my talk page, or another admin's, when you want the FF off. Mglovesfun (talk) 12:02, 21 January 2012 (UTC)
- Thank you. I see you have set the flood flag for me, but I still see my edits in recent changes: http://en.wiktionary.org/w/index.php?title=Special:RecentChanges&hidebots=1. What is the consequence of the flood flag? --Dan Polansky 12:07, 21 January 2012 (UTC)
Some stats about the replacement batch made using WT:AWB:
- The date of the dump: 2012-01-09
- The number of pages matching the regexp ", from {{(term|etyl)" before the start of the replacement batch: 17690
- The number of edits: 5717
--Dan Polansky 16:32, 21 January 2012 (UTC)
I have made a follow-up, covering the previously forgotten search for "< {{proto" in AWB.
- The number of edits: 1147
- Updated regexp for creating worklist: "< {{(term|etyl|proto)"
- Regexp table used in AWB:
( < (from False False False False False False True
(< (from False False False False False False True
< , from False False False False False False True
^< From False True True False False False True
--Dan Polansky 18:55, 22 January 2012 (UTC)
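To reproduce the batch above outside AWB, here is a minimal Python sketch that applies the four rows in table order to a page's wikitext. The flattened table has lost its spacing, so the find strings used here ("( < ", "(< ", " < " and the multiline regex "^< ") are only an assumed reading. The name apply_lt_table is hypothetical.

```python
import re

# Assumed reading of the flattened table; the exact spacing is a guess.
LITERAL_ROWS = [
    ("( < ", "(from "),
    ("(< ", "(from "),
    (" < ", ", from "),
]
# The last row is a regex with the multiline flag, so ^ anchors at each line start.
LINE_START_ROW = (re.compile(r"^< ", re.MULTILINE), "From ")


def apply_lt_table(text: str) -> str:
    """Apply the rows in table order to one page's wikitext."""
    for find, replace in LITERAL_ROWS:
        text = text.replace(find, replace)
    pattern, replace = LINE_START_ROW
    return pattern.sub(replace, text)
```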
As what is hopefully the last follow-up, I have searched all the pages containing " < ", without checking its context.
- The number of edits: 895.
- False positives: I have managed to skip or correct a couple of false positives. I am afraid I have left some false positives uncorrected, though.
--Dan Polansky 22:53, 22 January 2012 (UTC)
I have fixed doubled "from" occurrences, such as "from from"; #edits: 46.
Regexp table used for this:
from from from True False False False False False
From from From True False False False False False
From From From True False False False False False
from From from True False False False False False
--Dan Polansky 11:54, 24 January 2012 (UTC)
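All four literal rows above keep the case of the first "from", so they can be folded into a single Python regex. The \b word boundaries are an addition that is not in the table: without them, the literal rows would also have turned a phrase like "from fromage" into "fromage". The name fix_double_from is hypothetical.

```python
import re

# Keep the first word (with its case) and drop the second "from"/"From".
DOUBLE_FROM = re.compile(r"\b([Ff]rom) [Ff]rom\b")


def fix_double_from(text: str) -> str:
    return DOUBLE_FROM.sub(r"\1", text)
```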
- I dunno why you were still in the recent changes; it's clearly a bug of some sort, no doubt due to the update of the MediaWiki software. Mglovesfun (talk) 11:58, 24 January 2012 (UTC)
- On Saturday, when I had the flood flag, I failed to mark my edits as minor. Could this be the cause? --Dan Polansky 12:09, 24 January 2012 (UTC)
Babel
No, no chance I would voluntarily add Babel to my userpage. The underlying theory of the Babel presentation is wrong. The templates have little actual usage beyond vanity. And I subscribe to the original purpose for user namespace on Wikimedia projects (circa 2004, so mostly irrelevant at the current time): User pages are for the over-all improvement of the wiki. - Amgine/ t·e 15:34, 22 January 2012 (UTC)
- I'm sure, then, with your strong opinions on the subject, that you can appreciate when others have strong opinions on issues as well. - Amgine/ t·e 17:18, 22 January 2012 (UTC)
About Wikisaurus
In 2010 and 2011 I made poorly thought-out, immature and harmful edits to Wikisaurus, including edits in the history of WS:person, the use and documentation of certain templates, and a flood of WS pages for proper nouns. You had to undo them and/or discuss them with me, while I defended my edits stubbornly and disregarded your experience in contributing to the project and your knowledge of semantic relations.
I am sorry. --Daniel 17:53, 24 January 2012 (UTC)
- I appreciate and accept your apology. --Dan Polansky 11:58, 25 January 2012 (UTC)
Czech entries for Czech diacritics
Hi Dan. Could you create Czech entries for kroužek (the English already exists) and the diminutive suffix -ek (that page has Breton, Hungarian, Kurdish, and Serbo-Croatian entries) please? Also, could you tell me what diminutive suffix was involved in the derivation of čárka from čára please? I ask you because you are the member of Category:User cs-N with whom I am best acquainted. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:19, 5 February 2012 (UTC)
- I've created "kroužek", but at least two senses are missing. I have not created an entry for the Czech suffix -ek, as I am unsure that the formation of diminutives in Czech is best described as suffixing, and even if it is, this requires an amount of research that I do not want to go into right now. Some examples, nonetheless: kruh -> kroužek, pruh -> proužek, samec -> sameček, dům -> domek & domeček, strom -> stromek & stromeček, chlap -> chlapec -> chlapeček, zvon -> zvonec -> zvoneček, čepec -> čepeček, hrob -> hrobeček, kopec -> kopeček, dar -> dárek -> dáreček, myslivec -> mysliveček, trávník -> trávníček, kůň -> koník -> koníček, Vašek -> Vašík -> Vašíček.
- For čára and čárka, I am not sure this is best described as suffixing either. Furthermore, I find it hard to find a set of examples that follows the pattern of čára -> čárka. Terms that sound similar to čárka include bárka and várka, but these are not obvious diminutives of other terms. --Dan Polansky 10:54, 6 February 2012 (UTC)
- Thank you. If it's all right with you, I'll copy that information to Talk:-ek, so that it may serve as a basis for someone else to do that research in future, if it occurs to anyone to do so.
- I created the Czech pronunciatory transcriptions for čárky, háčky, kroužek, and kroužky based upon the information on Czech phonology that I found on Wikipedia. Could you verify that they are correct, please?
- — Raifʻhār Doremítzwr ~ (U · T · C) ~ 17:40, 6 February 2012 (UTC)
- I am not very enthusiastic about your copying the above post to Talk:-ek, but I do not oppose it either.
- I avoid doing Czech pronunciation, so my verification is going to be very superficial, following W:Czech phonology. Furthermore, I am going to ignore SAMPA, as I do not intend to learn it any time soon.
- Here are the IPA pronunciations that you have given:
- čárky: /ˈʧaːrkɪ/
- háčky: /ˈɦaːʧkɪ/
- kroužek: /ˈkroʊʒɛk/
- kroužky: /ˈkroʊʒkɪ/
- Comments:
- "oʊ" should be "ou̯" following W:Czech phonology and http://www.phil.muni.cz/jazyk/files/fonetika/ipa-pro-cestinu.pdf, but some folks from Czech Wiktionary use oʊ̯ there, following W:cs:Fonologie_češtiny. See also User_talk:Dan_Polansky/2011#IPA_and_Czech_u.
- ˈ is possibly only used in [] and not in //; to be verified.
- --Dan Polansky 18:12, 6 February 2012 (UTC)
- See Talk:-ek#Czech diminutive suffix for the information I've copied; is what I've done OK? Thanks for fixing the pronunciatory transcriptions for kroužek and kroužky (I've updated the SAMPA transcriptions accordingly); my use of <oʊ> was prompted by the information on diphthongs given in w:Czech language (I omitted the combining inverted breve below, which marks non-syllabicity, because I supposed that, were the /oʊ/ not a diphthong, that fact would be noted with a syllabic break as /o.ʊ/). Finally, re the use of <ˈ>, is stress phonemic in Czech? (I assume not, given that every word receives stress on its initial syllable as standard.) — Raifʻhār Doremítzwr ~ (U · T · C) ~ 22:06, 6 February 2012 (UTC)
- The pronunciation of "kroužky" in IPA is /ˈkrou̯ʃkɪ/ and not /ˈkrouʒkɪ/. If a voiced consonant comes before a voiceless one, the voiced one also becomes voiceless, so both are voiceless. --Istafe (talk) 17:51, 30 March 2012 (UTC)
- Re: "Is stress phonemic in Czech?": I don't know. --Dan Polansky 07:07, 7 February 2012 (UTC)
- OK, well, if not, the stress mark shouldn't be used in phonemic IPA transcriptions (i.e., those bound by slashes: /…/) for Czech. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 19:54, 7 February 2012 (UTC)
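The devoicing rule Istafe states above can be sketched in Python as follows. It is a simplified illustration and not a full model of Czech phonology. The voiced/voiceless pairs are an assumption based on common descriptions of Czech. Only single-symbol IPA consonants are handled, so affricates and ř are ignored, and word-final devoicing is not modelled. The name assimilate_voicing is hypothetical.

```python
# Voiced obstruent -> voiceless counterpart (simplified: single-symbol IPA only).
DEVOICE = {"b": "p", "d": "t", "ɟ": "c", "g": "k", "ɡ": "k",
           "v": "f", "z": "s", "ʒ": "ʃ", "ɦ": "x"}
VOICELESS = {"p", "t", "c", "k", "f", "s", "ʃ", "x"}


def assimilate_voicing(ipa: str) -> str:
    """Devoice a voiced obstruent that stands directly before a voiceless one."""
    chars = list(ipa)
    # Scan right to left so devoicing can spread back through a whole cluster.
    for i in range(len(chars) - 2, -1, -1):
        if chars[i] in DEVOICE and chars[i + 1] in VOICELESS:
            chars[i] = DEVOICE[chars[i]]
    return "".join(chars)
```

Applied to the incorrect transcription of "kroužky", it turns ʒ before k into ʃ. The /ʒ/ in "kroužek" is followed by a vowel and stays voiced.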
Wikisaurus links standardization
I have run a batch of edits that standardized linking to Wikisaurus from the mainspace, using AWB. This required a lot of manual editing and supervision.
Worklist identification: search for "[[WikiSaurus" without regexp in a case-sensitive manner.
The number of edits: 175.
Replacements used in AWB:
\* *\[\[WikiSaurus * See also [[WikiSaurus True True False False False False
^''See'' \[\[WikiSaurus * See also [[Wikisaurus True True True False False False
''[S|s]ee'' \[\[WikiSaurus See also [[Wikisaurus True True False False False False
[[WikiSaurus [[Wikisaurus True False False False False False
--Dan Polansky 10:36, 6 February 2012 (UTC)
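Here is a Python sketch of the same replacements. It assumes the flattened table splits into four rows as read here. For example, the first row is read as find "\* *\[\[WikiSaurus" with replacement "* See also [[WikiSaurus", and the second row's replacement is taken to start with "* ". The row boundaries and the name normalize_ws_links are assumptions. The original third row's [S|s] also matches a literal "|", so [Ss] is used instead.

```python
import re

# Assumed reading of the flattened table: (pattern, replacement, flags).
REGEX_ROWS = [
    (r"\* *\[\[WikiSaurus", "* See also [[WikiSaurus", 0),
    (r"^''See'' \[\[WikiSaurus", "* See also [[Wikisaurus", re.MULTILINE),
    (r"''[Ss]ee'' \[\[WikiSaurus", "See also [[Wikisaurus", 0),
]
# Last row: a plain, case-sensitive (non-regex) replacement.
LITERAL_ROW = ("[[WikiSaurus", "[[Wikisaurus")


def normalize_ws_links(text: str) -> str:
    for pattern, replacement, flags in REGEX_ROWS:
        text = re.sub(pattern, replacement, text, flags=flags)
    find, replace = LITERAL_ROW
    return text.replace(find, replace)
```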
Then please make clear which of the sources cited claims that the etymology is unknown. "Etymology unknown" is a claim as much as any other, and it must be clear whose claim this is. Otherwise you are just replacing one unsourced claim by another. --Dbachmann 10:33, 7 February 2012 (UTC)
Etymology cleanup
A personal log: I have done a partial etymology cleanup using AWB, above all by putting commas before "from" in etymology chains. Beyond that, I applied the term template to some easily matched parts that were not yet formatted with it; I did so incompletely, mainly as preparation for adding commas before "from". The task of adding the "term" template is not perfectly suited to AWB; ideally, one would loop over a comprehensive list of languages and, for each language, perform a couple of regexp replacements.
- Worklist search term: }} from {{(term|etyl|proto|prefix)
- Regexp table:
''\[\[([^][#]*)#Middle English\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=enm}} False True False False False False
''\[\[([^][#]*)#Old English\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=ang}} False True False False False False
''\[\[([^][#]*)#Anglo-Norman\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=xno}} False True False False False False
''\[\[([^][#]*)#Italian\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=it}} False True False False False False
''\[\[([^][#]*)#Old French\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=fro}} False True False False False False
''\[\[([^][#]*)#Old Norse\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=non}} False True False False False False
''\[\[([^][#]*)#Middle French\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=frm}} False True False False False False
''\[\[([^][#]*)#French\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=fr}} False True False False False False
''\[\[([^][#]*)#Latin\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=la}} False True False False False False
''\[\[([^][#]*)#Old High German\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=goh}} False True False False False False
''\[\[([^][#]*)#Middle High German\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=gmh}} False True False False False False
''\[\[([^][#]*)#Middle Low German\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=gml}} False True False False False False
''\[\[([^][#]*)#German\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=de}} False True False False False False
''\[\[([^][#]*)#Old Frisian\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=ofs}} False True False False False False
''\[\[([^][#]*)#East Frisian\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=frs}} False True False False False False
''\[\[([^][#]*)#Middle Dutch\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=dum}} False True False False False False
''\[\[([^][#]*)#Dutch\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=nl}} False True False False False False
''\[\[([^][#]*)#Old Saxon\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=osx}} False True False False False False
''\[\[([^][#]*)#Spanish\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=es}} False True False False False False
''\[\[([^][#]*)#Gothic\|([^][#]*)\]\]''\s*["']([^"']*)["'] {{term|$1|$2|$3|lang=got}} False True False False False False
''\[\[([^][#]*)#Middle English\|([^][#]*)\]\]'' {{term|$1|$2|lang=enm}} False True False False False False
''\[\[([^][#]*)#Old English\|([^][#]*)\]\]'' {{term|$1|$2|lang=ang}} False True False False False False
''\[\[([^][#]*)#Anglo-Norman\|([^][#]*)\]\]'' {{term|$1|$2|lang=xno}} False True False False False False
''\[\[([^][#]*)#Italian\|([^][#]*)\]\]'' {{term|$1|$2|lang=it}} False True False False False False
''\[\[([^][#]*)#Old French\|([^][#]*)\]\]'' {{term|$1|$2|lang=fro}} False True False False False False
''\[\[([^][#]*)#Old Norse\|([^][#]*)\]\]'' {{term|$1|$2|lang=non}} False True False False False False
''\[\[([^][#]*)#Middle French\|([^][#]*)\]\]'' {{term|$1|$2|lang=frm}} False True False False False False
''\[\[([^][#]*)#French\|([^][#]*)\]\]'' {{term|$1|$2|lang=fr}} False True False False False False
''\[\[([^][#]*)#Latin\|([^][#]*)\]\]'' {{term|$1|$2|lang=la}} False True False False False False
''\[\[([^][#]*)#Old High German\|([^][#]*)\]\]'' {{term|$1|$2|lang=goh}} False True False False False False
''\[\[([^][#]*)#Middle High German\|([^][#]*)\]\]'' {{term|$1|$2|lang=gmh}} False True False False False False
''\[\[([^][#]*)#German\|([^][#]*)\]\]'' {{term|$1|$2|lang=de}} False True False False False False
''\[\[([^][#]*)#Old Frisian\|([^][#]*)\]\]'' {{term|$1|$2|lang=ofs}} False True False False False False
''\[\[([^][#]*)#Old Saxon\|([^][#]*)\]\]'' {{term|$1|$2|lang=osx}} False True False False False False
''\[\[([^][#]*)#Spanish\|([^][#]*)\]\]'' {{term|$1|$2|lang=es}} False True False False False False
''\[\[([^][#]*)#Gothic\|([^][#]*)\]\]'' {{term|$1|$2|lang=got}} False True False False False False
{{etyl\|la}}\s*''\[\[([^][']*)\]\]''\s*"([^"]*)" {{etyl|la}} {{term|$1||$2|lang=la}} False True False False False False
{{etyl\|la}}\s*''\[\[([^][']*)\]\]'' {{etyl|la}} {{term|$1|lang=la}} False True False False False False
{{etyl\|enm}}\s*''\[\[([^][']*)\]\]''\s*"([^"]*)" {{etyl|enm}} {{term|$1||$2|lang=enm}} False True False False False False
{{etyl\|enm}}\s*''\[\[([^][']*)\]\]'' {{etyl|enm}} {{term|$1|lang=enm}} False True False False False False
{{etyl\|fro}}\s*''\[\[([^][']*)\]\]''\s*"([^"]*)" {{etyl|fro}} {{term|$1||$2|lang=fro}} False True False False False False
{{etyl\|fro}}\s*''\[\[([^][']*)\]\]'' {{etyl|fro}} {{term|$1|lang=fro}} False True False False False False
{{term\|([^\|]*)\|\1\| {{term|$1|| True True False False False False
}} from {{(term|etyl|proto|prefix) }}, from {{$1 True True False False False False
^Via {{(term|etyl|proto|prefix) From {{$1 True True True False False False
- #edits: 2371
--Dan Polansky 14:46, 12 February 2012 (UTC)
Reliability: The regexp replacements above are imperfect. Their use requires close monitoring in AWB. I have caught several bad edits, but I may have overlooked others. An example of a bad edit, caught by Ruakh: http://en.wiktionary.org/wiki/בן־דוד?diff=16253121. --Dan Polansky 15:00, 12 February 2012 (UTC)
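The log above notes that the ideal approach would loop over a comprehensive list of languages. Here is a minimal Python sketch of that idea. It generates the two per-language rules (with and without a quoted gloss) from a name-to-code map, then applies two of the general rules from the table. The three-language LANGS map and the function names are hypothetical. Any output would still need the close review described above.

```python
import re

# Hypothetical subset; a real run would use a comprehensive name -> code list.
LANGS = {"Middle English": "enm", "Old French": "fro", "Latin": "la"}

NOT_LINK = r"[^\]\[#]*"  # same class as AWB's [^][#], written unambiguously


def build_rules(langs):
    """Per language: first the rule with a quoted gloss, then the one without."""
    rules = []
    for name, code in langs.items():
        link = (r"''\[\[(" + NOT_LINK + r")#" + re.escape(name)
                + r"\|(" + NOT_LINK + r")\]\]''")
        rules.append((re.compile(link + r"""\s*["']([^"']*)["']"""),
                      r"{{term|\1|\2|\3|lang=" + code + "}}"))
        rules.append((re.compile(link), r"{{term|\1|\2|lang=" + code + "}}"))
    return rules


FINAL_RULES = [
    # Drop a display form that merely repeats the linked term.
    (re.compile(r"\{\{term\|([^|]*)\|\1\|"), r"{{term|\1||"),
    # Put a comma before "from" in etymology chains.
    (re.compile(r"\}\} from \{\{(term|etyl|proto|prefix)"), r"}}, from {{\1"),
]


def clean_etymology(text, langs=LANGS):
    for pattern, repl in build_rules(langs) + FINAL_RULES:
        text = pattern.sub(repl, text)
    return text
```

For example, "From ''[[fenestre#Old French|fenestre]]'' from ''[[fenestra#Latin|fenestra]]'' "window"." becomes "From {{term|fenestre||lang=fro}}, from {{term|fenestra||window|lang=la}}.".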
- Could you also look for '>' in etymologies and replace them with 'from'? I recently found entries which wrongly used '>' instead of '<'. --Vahag (talk) 12:29, 1 March 2012 (UTC)
- Do you have an example entry? --Dan Polansky (talk) 12:31, 1 March 2012 (UTC)
How do you interpret this vote? Is it implementable in any meaningful way? Mglovesfun (talk) 13:55, 23 February 2012 (UTC)
- For reference, the text of the vote: "For the pronunciation of English terms, agreement to use the specific IPA character /ɹ/ instead of /r/ for the r phoneme in words like red, green and orange."
- What the vote says I think is this: 'In the IPA markup of pronunciation of English terms, for words that contain the r phoneme that is exemplified by (but not limited to) its occurrences in "red", "green" and "orange", the IPA character /ɹ/ rather than /r/ should be used.'
- What I don't know is whether there are more r-phonemes in English, as I know close to nothing about English IPA; a struck-out passage of the vote speaks of /ɹ/ and /ɾ/, implying there are at least two r-phonemes in English.
- In conclusion, my guess would be, yes, the vote can be implemented by placing /ɹ/ into all English IPA pronunciations that have the same r-phoneme as "red", "green" and "orange". --Dan Polansky 14:11, 23 February 2012 (UTC)
- "the r phoneme in words like red, green and orange" says to me there are words which aren't like red, green and orange where the IPA symbol ɹ should not be used. But the vote doesn't try to explain what a word like red, green or orange is. I think DAVilla may have meant 'the r phoneme in all English words', but it's a weird way to express that, as the wording leads me to believe the exact opposite, that is to say not in all English words. Maybe we should have a 2012 version of the vote saying exactly that: the rhotic r in all English words, where any r is considered rhotic unless it is not pronounced at all (as in British English mark, where there is no rhotic r at all). Mglovesfun (talk) 14:16, 23 February 2012 (UTC)
- I don't know. I am the wrong person to talk to about English IPA, I am afraid. You can find out what DAVilla meant by asking him; he's still around. Thereafter, a clarifying vote could be in order, but I don't really know. --Dan Polansky 14:27, 23 February 2012 (UTC)
Czech translations by Zdeněk Brož
I have added Czech translations donated by Zdeněk Brož via ZBroz (talk • contribs). See the user page and the user talk page for details. --Dan Polansky (talk) 23:17, 26 February 2012 (UTC)
Numbers
Thank you for checking on the numbers Wikisaurus entry. I checked each and every number that I put up. All of them are from Wikipedia, and I checked all of them except, I think, the twins/triplets series. I cannot understand why you deleted that work. Please restore. BenjaminBarrett12 (talk) 16:05, 29 February 2012 (UTC)
- The words you have added do not refer to numbers, and thus are not hyponyms of "number". You have added "monad", "pair", "duo"; "primary", "secondary", "tertiary"; "single", "double", "triple"; numeral systems "quaternary", "quinary", "sexal"; you have added all of them as hyponyms of "number", but these are not numbers but rather words relating to numbers in various ways. There is a terminological confusion by which number words are often called "number", but that is not what WS:number is about. You have expanded WS:number based on W:Numeral (linguistics), an article that looks like original research and contains rare terms such as "multiplicative numerals". You have used terms in the semicolon headings that have very few Google hits, such as "multiplicative numbers"; you have used "composite numbers" to refer to "unary", "binary", and "ternary", although the term usually refers to positive integers greater than 1 that are not prime numbers. What I think could meaningfully be done is to place these terms into the "Various" section rather than the "Hyponyms" section, and separate the groups using {{ws ----}} rather than using headings with rare phrases. Here is your revision of WS:number. --Dan Polansky (talk) 18:46, 29 February 2012 (UTC)
- Thank you for the discussion. I agree that the terms did not seem to be very sensible. I think it would be useful to find terms, perhaps with a query to the tea room, for these types of numbers. I myself have wanted a page to find numbers like this in the past, and I think it is useful information. In the meantime, I did what I think you are suggesting, though I put ordinal numbers in a section immediately below the cardinal numbers, as I think that makes sense to people looking for that information. BenjaminBarrett12 (talk) 20:18, 29 February 2012 (UTC)
- Thank you for the clarification. I've kept the removed items in a list for future attestation. BenjaminBarrett12 (talk) 20:44, 29 February 2012 (UTC)
Restricting translations to lemma
I have restricted adjectival translations to their lemma forms, removing feminine, neuter, and sometimes plural forms listed alongside the masculine form in the translation table. Done using AWB. Minor supervision was necessary, as some Icelandic terms were found outside of translation tables. These could probably have been reduced to the masculine form too, but I have skipped them to be on the safe side.
- #edits: 360
- Worklist search term: \[\[(...)(.*?)]] {{m}}, \[\[\1.*?]] {{f}}, \[\[\1.*?]] {{n}}
- Regexp table:
\[\[(...)(.*?)]] {{m}}, \[\[\1.*?]] {{f}}, \[\[\1.*?]] {{n}}, \[\[\1.*?]] {{p}} [[$1$2]] False True True False False False True
\[\[(...)(.*?)]] {{m}}, \[\[\1.*?]] {{f}}, \[\[\1.*?]] {{n}} [[$1$2]] False True True False False False True
- An example edit: diff
--Dan Polansky (talk) 20:07, 29 February 2012 (UTC)
- Thank you for doing this! --EncycloPetey (talk) 04:21, 1 March 2012 (UTC)
- You're welcome!
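Here are the two rows of the table above as a Python sketch that works on a single translation line. Braces are escaped, and $1$2 becomes \1\2. The name keep_masculine is hypothetical. As in the table, the longer pattern (with the plural) runs first. The (...) prefix test is the same three-character heuristic as in AWB, so false positives are still possible.

```python
import re

# Python renderings of the two AWB rows; braces escaped for clarity.
WITH_PLURAL = re.compile(
    r"\[\[(...)(.*?)]] \{\{m\}\}, \[\[\1.*?]] \{\{f\}\}, "
    r"\[\[\1.*?]] \{\{n\}\}, \[\[\1.*?]] \{\{p\}\}"
)
WITHOUT_PLURAL = re.compile(
    r"\[\[(...)(.*?)]] \{\{m\}\}, \[\[\1.*?]] \{\{f\}\}, \[\[\1.*?]] \{\{n\}\}"
)


def keep_masculine(line: str) -> str:
    # The longer (plural) pattern must run first, as in the table.
    for pattern in (WITH_PLURAL, WITHOUT_PLURAL):
        line = pattern.sub(r"[[\1\2]]", line)
    return line
```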
I have performed another batch, this time for translations using the t template rather than the plain [[...]] markup. Supervision is needed; there were some false positives in noun entries, such as in "gangue" or "pupil". In fact, it is surprising that the lame heuristic used in the regexp table below works so well for a range of languages. The batch sometimes left forms for m|p, f|p and n|p behind, in those few entries where they were used.
- #edits: 272
- Worklist search term: pick one from the regexp table below
- Regexp table:
{{(t-?\+?)\|(..)\|(...)([^\|]*?)\|m(\|?.*?)}}, {{t-?\+?\|\2\|\3([^\|]*?)\|f\|?.*?}}, {{t-?\+?\|\2\|\3([^\|]*?)\|n\|?.*?}}, {{t-?\+?\|\2\|\3([^\|]*?)\|p\|?.*?}} {{$1|$2|$3$4$5}}
{{(t-?\+?)\|(..)\|(...)([^\|]*?)\|m(\|?.*?)}}, {{t-?\+?\|\2\|\3([^\|]*?)\|f\|?.*?}}, {{t-?\+?\|\2\|\3([^\|]*?)\|n\|?.*?}} {{$1|$2|$3$4$5}}
- An example edit: diff.
Above, I have been working with languages that have the three genders m, f, and n. What remains to be done is the same for languages with the two genders m and f, such as, probably, Italian, French and Spanish. Using the same technique is likely to generate a significant number of false positives. Matching on all three genders almost always selects adjectival translations; a similar match on only masculine and feminine would probably select many nouns, such as the analogues of English "actor" and "actress" in these languages. To avoid this, one would have to make sure that the matched translation is within an adjective section, which does not seem possible using AWB regexp replacements. --Dan Polansky (talk) 10:44, 1 March 2012 (UTC)
- I have manually fixed the few items that were of the form m|p, f|p and n|p. --Dan Polansky (talk) 11:22, 1 March 2012 (UTC)
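AWB cannot tell which part-of-speech section a match lies in, but a script can. The following Python sketch applies an m/f pair rule in the style of the t-template rows only while the current part-of-speech heading is Adjective. A Translations heading does not reset the current part of speech. POS_HEADINGS is a hypothetical subset, and restrict_adjective_pairs is a hypothetical name. The noun pair acteur/actrice in the check below shows the kind of false positive this avoids, because both words begin with "act".

```python
import re

# A heading line such as ===Adjective=== or ====Translations====.
HEADING = re.compile(r"^(=+)[ \t]*([^=\n]+?)[ \t]*\1[ \t]*$", re.MULTILINE)
# Hypothetical subset of part-of-speech headings; Translations is not one of them.
POS_HEADINGS = {"Adjective", "Noun", "Verb", "Adverb"}
# Masculine/feminine pair in the style of the t-template rows above.
MF_PAIR = re.compile(
    r"\{\{(t-?\+?)\|(..)\|(...)([^|]*?)\|m(\|?.*?)\}\}, "
    r"\{\{t-?\+?\|\2\|\3[^|]*?\|f\|?.*?\}\}"
)


def _restrict_chunk(chunk: str, pos) -> str:
    if pos != "Adjective":
        return chunk
    return MF_PAIR.sub(r"{{\1|\2|\3\4\5}}", chunk)


def restrict_adjective_pairs(wikitext: str) -> str:
    """Drop the feminine form only where the current part of speech is Adjective."""
    out, pos, last = [], None, 0
    for m in HEADING.finditer(wikitext):
        out.append(_restrict_chunk(wikitext[last:m.start()], pos))
        out.append(m.group(0))
        if m.group(2) in POS_HEADINGS:
            pos = m.group(2)
        last = m.end()
    out.append(_restrict_chunk(wikitext[last:], pos))
    return "".join(out)
```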
A further batch, heavily supervised with several skips and manual corrections, for the likes of {{t|...|m|f}}, {{t|...|n}}:
- #edits: 32
- Edit summary: restrict adjective translation to lemma
- Regexp table:
{{(t-?\+?)\|(..)\|(...)([^\|]*?)\|m\|f(\|?.*?)}}, {{t-?\+?\|\2\|\3([^\|]*?)\|n\|?.*?}}, {{t-?\+?\|\2\|\3([^\|]*?)\|p\|?.*?}} {{$1|$2|$3$4$5}} False True True False False False True
{{(t-?\+?)\|(..)\|(...)([^\|]*?)\|m\|f(\|?.*?)}}, {{t-?\+?\|\2\|\3([^\|]*?)\|n\|?.*?}} {{$1|$2|$3$4$5}} False True True False False False True
--Dan Polansky (talk) 11:51, 1 March 2012 (UTC)
Etymology and reverted less-than sign
Responding to a query by Vahag several sections above: The search in AWB for "> {{(term|etyl|proto|prefix)" finds 387 pages. Unfortunately, in these entries, ">" means either "from" or "whence", depending on whether it goes in the right direction. For ">" meaning "from", an example page is "drachma"; for ">" meaning "whence", an example page is "saksalaistaa".
I have performed a batch of edits in AWB nonetheless.
- #edits: 46
- #skips: 341
- Edit summary: etymology: > to from, where > goes in the wrong direction,
- Worklist regexp: > {{(term|etyl|proto|prefix)
- Skip term: ==Finnish== (skip Finnish entries, as there ">" usually went in the right direction)
- Regexp table:
> {{(term|etyl|proto|prefix) , from {{$1 False True
- Supervision: massive; there were very many skips, as the sign ">" often went in the right direction, and I often could not tell whether it did, as I did not speak or know the language in question. Cases where I could confirm that ">" went in the wrong direction include Old French being derived from Latin and not vice versa. Cases where I could not tell included various Arabic items.
--Dan Polansky (talk) 13:59, 1 March 2012 (UTC)
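Because nearly every page needed a human decision, a script version of this batch should only propose edits. The following Python sketch rests on two assumptions. First, the worklist regex is taken to have a leading space (" > {{"), which the flattened log does not show. Second, the function returns None for pages that should be skipped. The name propose_fix is hypothetical.

```python
import re
from typing import Optional

# Assumes a leading space before ">" that the flattened log no longer shows.
REVERSED = re.compile(r" > \{\{(term|etyl|proto|prefix)")


def propose_fix(page_text: str) -> Optional[str]:
    """Return proposed wikitext for human review, or None to skip the page."""
    if "==Finnish==" in page_text:  # the batch's skip term
        return None
    new_text = REVERSED.sub(r", from {{\1", page_text)
    return new_text if new_text != page_text else None
```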
Changing forms
Thank you for all your help so far. I see you are involved in etymology, and I have a related question I hope you can help me with. The word "pasta" used to be "paste" in English when referring to the Italian food. In the pasta entry, I put up the earliest quotation I could find of "pasta", and now I would like to put up some earlier instances of "paste" (where the word clearly means "pasta"). Is it acceptable to also add an obsolete entry to the paste entry? And can I put both "paste" and "pasta" in the etymology section on the pasta entry? I hope my questions make sense. BenjaminBarrett12 (talk) 22:17, 1 March 2012 (UTC)
- Your questions:
- "Is it acceptable to also add an obsolete entry to the paste entry?"
- Do you mean to add an obsolete sense to the "paste" entry? We include all attested senses, even the obsolete ones, so the answer to my phrasing of the question is yes. If you add the sense together with an attestation, even better.
- 'Can I put both "paste" and "pasta" in the etymology section on the pasta entry?'
- If "pasta" stems from "paste", then "paste" is naturally part of the etymology of "pasta", but "pasta" seems to be directly borrowed from Italian, so it does not stem from English "paste". If "paste" is merely cognate with "pasta" (by having some shared ancestors), then "paste" can be listed as a cognate, although English-English cognates are usually listed in "Related terms" section, so there would be "paste" in "Related terms" of "pasta".
- 'The word "pasta" used to be "paste" in English when referring to the Italian food.'
- That is incorrect or misleading, I think. The English word "pasta" is not descended from the English word "paste".
- --Dan Polansky (talk) 22:32, 1 March 2012 (UTC)
- Thank you for the responses. Clearly you're right that "pasta" is not descended from "paste." Instead what probably happened is that "pasta" replaced "paste." So I'll add an obsolete meaning to "paste" with citations and then cross-reference pasta and paste to each other. BenjaminBarrett12 (talk) 22:52, 1 March 2012 (UTC)
Wikisaurus statistics
Some Wikisaurus statistics for January 2012 follow, based on stats.grok.se, such as "http://stats.grok.se/en.d/201201/Wikisaurus:fatty_acid".
The number of Wikisaurus pages: 1209 (inaccurate, actually a bit more; a list extracted from Wiktionary:All_Wikisaurus_pages)
The total number of page hits in Wikisaurus in January 2012: 85,983
The total number of page hits in Wikisaurus in January 2012, without top 100 pages: 23,513
Median page hits per Wikisaurus page in January 2012: 17
Average page hits per Wikisaurus page in January 2012: 71
Top 100 Wikisaurus pages in January 2012, with page hits in the month:
WS:sexual intercourse | 5448 |
WS:masturbate | 5383 |
WS:vagina | 5004 |
WS:vulva | 4793 |
WS:breasts | 4191 |
WS:erection | 3234 |
WS:semen | 2959 |
WS:penis | 2523 |
WS:promiscuous woman | 2440 |
WS:nude | 2402 |
WS:prostitute | 2027 |
WS:promiscuous man | 1769 |
WS:mistress | 1564 |
WS:sexual partner | 1289 |
WS:homosexual | 1218 |
WS:bisexual | 1068 |
WS:pregnant | 891 |
WS:labia | 869 |
WS:testicles | 745 |
WS:heterosexual | 739 |
WS:money | 638 |
WS:beautiful woman | 582 |
WS:nonsense | 441 |
WS:penis/translations | 365 |
WS:buttocks | 337 |
WS:fool | 299 |
WS:clitoris | 296 |
WS:marijuana | 265 |
WS:arrogant | 264 |
WS:bathroom | 258 |
WS:drunk | 258 |
WS:oral sex | 226 |
WS:sexual activity | 209 |
WS:libertine | 203 |
WS:beer | 193 |
WS:ear | 192 |
WS:anus | 191 |
WS:obstinate | 187 |
WS:female genitalia | 180 |
WS:fastidious | 177 |
WS:pubic hair | 169 |
WS:cheeky | 167 |
WS:destroy | 158 |
WS:male homosexual | 148 |
WS:sexy | 147 |
WS:excellent | 143 |
WS:marijuana cigarette | 140 |
WS:copulate | 134 |
WS:ejaculate | 132 |
WS:naive | 131 |
WS:insane | 129 |
WS:woman | 127 |
WS:girl | 126 |
WS:masturbation | 126 |
WS:wow | 122 |
WS:idiot | 121 |
WS:thingy | 118 |
WS:calm | 112 |
WS:die | 109 |
WS:witty | 109 |
WS:characteristic | 108 |
WS:villain | 108 |
WS:anal sex | 102 |
WS:abandon | 101 |
WS:strange | 101 |
WS:sycophant | 101 |
WS:beautiful | 99 |
WS:joke | 99 |
WS:ghost | 98 |
WS:chav | 97 |
WS:covert | 97 |
WS:water | 94 |
WS:hinder | 93 |
WS:steal | 91 |
WS:supposition | 91 |
WS:gigantic | 90 |
WS:give head | 90 |
WS:ejaculation | 89 |
WS:intelligent | 89 |
WS:reprehend | 89 |
WS:index finger | 88 |
WS:saying | 88 |
WS:dork | 87 |
WS:humble | 86 |
WS:ephemeral | 84 |
WS:fake | 83 |
WS:beginner | 82 |
WS:combative | 82 |
WS:happy | 82 |
WS:dammit | 81 |
WS:advise | 80 |
WS:apex | 80 |
WS:evil | 80 |
WS:skillful | 80 |
WS:stingy | 80 |
WS:disorder | 79 |
WS:model | 79 |
WS:scrawny | 79 |
WS:mad person | 78 |
--Dan Polansky (talk) 20:17, 12 March 2012 (UTC)
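Summary figures like the ones above can be computed from per-page monthly hit counts with a few lines of Python. This is a minimal sketch, assuming the counts have already been gathered (for example by hand from stats.grok.se) into a dict that maps page title to hits; fetching is not shown, and the function name `summarize` is hypothetical. As a sanity check on the reported figures, 85,983 hits over 1,209 pages gives an average of about 71.1.

```python
from statistics import median


def summarize(hits, top_n=100):
    """Summarize monthly page hits; `hits` maps page title -> hit count."""
    counts = sorted(hits.values(), reverse=True)  # most visited first
    total = sum(counts)
    return {
        "pages": len(counts),
        "total": total,
        # Hits left after dropping the top_n most visited pages.
        "total_without_top": total - sum(counts[:top_n]),
        "median": median(counts),
        "average": total / len(counts),
    }
```

For example, `summarize({"WS:a": 10, "WS:b": 20, "WS:c": 30, "WS:d": 1000}, top_n=1)` (made-up numbers) reports a total of 1060, 60 hits outside the top page, a median of 25 and an average of 265; the gap between median and average shows how a few very popular pages skew the mean.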
c. and cf. in etymologies
[edit]You're good at regex and AWB, from what I can tell. (Thank you for replacing those less-than signs!) Could you also replace c., C, c, [[c.|C.]], C, C., c and c. (and possibly other variants of "c") (both with and without spaces between the c, dot or bracket and the following word) in etymology sections — replace them with "attested circa" or "circa"? And could you replace cf. / Cf. with "compare", since Wiktionary is not paper?
Entries with etymologies that begin with "c."-variants are: discord gangster donate authoritarian disability numinous adaptitude heliostat virescent sycomore sycamore disbelieve iridescent acetification heteronym inalienable anonymous indigent guzzle confection burger legion availability hieroglyph centenarian immanent humanitarian anonym equalitarian putrescent centenary beautification rubescent astronomical lickety-split reification prose offing notability dollar scissors realize anthroponym opalescent squeeze split dwarf arborescent (there may be more entries that have "c" somewhere else).
Entries with etymologies that begin with "cf." or "Cf." are: el nana celo civet baste mope squander brou actionable winkel Wallachia boira gibe glans gebied (other entries may have "cf" elsewhere in them).
- -sche (discuss) 20:44, 14 March 2012 (UTC)
- I'll see if I can find time and enthusiasm for this task. --Dan Polansky (talk) 10:01, 16 March 2012 (UTC)
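For offline processing of page text, the requested replacements could be sketched in Python roughly as follows. This is a minimal sketch and not an AWB setting: it assumes a page's wikitext is available as a string, only touches text under an Etymology heading (up to the next heading), rewrites "cf."/"Cf." as "compare"/"Compare" with or without a following space, and, to stay conservative, rewrites "c."/"C." as "circa"/"Circa" only when a digit follows. The wikilinked and formatted variants of "c" listed in the request are not handled, and all names below are hypothetical.

```python
import re

# An Etymology heading (e.g. "===Etymology===" or "===Etymology 2===") and the
# body of that section, up to the next heading line or the end of the page.
ETYMOLOGY_SECTION = re.compile(
    r"(^=+[ \t]*Etymology[^=\n]*=+[ \t]*\n)(.*?)(?=^=|\Z)", re.M | re.S
)
# "cf." or "Cf.", with or without a following space, before a non-space character.
CF = re.compile(r"\b([Cc])f\.[ \t]*(?=\S)")
# "c." or "C." only when a number follows, as in "c. 1600".
CIRCA = re.compile(r"\b([Cc])\.[ \t]*(?=\d)")


def _match_case(word, original_letter):
    # Keep "Cf." -> "Compare" and "cf." -> "compare".
    return word.capitalize() if original_letter.isupper() else word


def expand_abbreviations(text):
    text = CF.sub(lambda m: _match_case("compare", m.group(1)) + " ", text)
    return CIRCA.sub(lambda m: _match_case("circa", m.group(1)) + " ", text)


def fix_etymologies(wikitext):
    """Expand cf. and c. only inside Etymology sections of a page's wikitext."""
    return ETYMOLOGY_SECTION.sub(
        lambda m: m.group(1) + expand_abbreviations(m.group(2)), wikitext
    )
```

Restricting the substitution to the Etymology section leaves "cf." in usage notes or definitions alone, which a plain page-wide replacement would not.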
RFD
[edit]Are you also harassing everyone else who simply votes "delete", "tosh" and "nah" on a regular basis with no "bearing", or just me? Lucifer (talk) 22:12, 6 April 2012 (UTC)
- I have not seen anyone else consistently giving blatant non-reasons in RFD. "Tosh" is not my favorite reason, but it at least makes sense as a quick shortcut; "Since I am one" is pure nonsense as a reason for keeping. --Dan Polansky (talk) 06:44, 7 April 2012 (UTC)
- In case you remove inconvenient threads from your talk page again, your recent poor editing is documented in this revision. --Dan Polansky (talk) 10:34, 7 April 2012 (UTC)
- For my record, the accounts of this person include Gtroy (talk • contribs), Acdcrocks (talk • contribs), Catch22 (talk • contribs), and Luciferwildcat (talk • contribs). I have already had a conversation with him, in User_talk:Acdcrocks#Quotations., in which his responses were unforthcoming and rude. --Dan Polansky (talk) 10:56, 7 April 2012 (UTC)
- Re: "I have edited on Wikipedia for years with little issues that could not be resolved" (a statement made by the person on his talk page): his account W:User_talk:Luciferwildcat made its first edit on 10 November 2011. W:User:Luciferwildcat/archive1 shows that the user causes trouble on Wikipedia no less than on Wiktionary. On Wikipedia, W:User:Purplebackpack89 (who also edits Wiktionary) had the pleasure of repeatedly dealing with the user. By the way, I am posting here on my talk page, as the user has a history of removing inconvenient comments from his talk page. --Dan Polansky (talk) 06:43, 8 April 2012 (UTC)
- Good to know. Thanks for the research. -Atelaes λάλει ἐμοί 10:53, 9 April 2012 (UTC)
- Here is more on some Wikipedia accounts that may belong to this person: W:User:Gtroy made its first edit on 18 August 2011 and its last edit on 30 October 2011; it appears to be his account. W:User:Acdcrocks made its first edit on 10 October 2011 and its last edit on 2 November 2011; it appears to be his account. W:User:Catch22 does not appear to be an account of the person; W:User:Catch22 has made almost no contributions anyway.
- The person has been discussed at Wiktionary:Beer_parlour_archive/2011/October#Gtroy_sockpuppets. There, the person's having made one legal threat is mentioned. There, many admins seem to see the person as more useful than harmful.
- A sockpuppet that I have omitted: User:Totallynotfairbro.
- Blocking the person indefinitely is likely to lead to his creation of further sockpuppets. His strategy of being very independent, sloppy ("I'll dump whatever comes to mind to Wiktionary, and you guys find citations, format my contributions, and delete the entry if needed; I am too lazy to do a proper attestation myself, but I can always cry 'wiki' and 'collaboration'"), of being rude, and of creating new accounts if blocked seems to work well for him, and seems hard to act against. --Dan Polansky (talk) 19:13, 9 April 2012 (UTC)
statistics on translations
[edit]Hi,
Is it hard to produce regular statistics on translations into a language (Russian, Chinese, etc.)? Where can I find this info or does it require some coding? --Anatoli (обсудить) 07:35, 13 April 2012 (UTC)
- Hi, if what you are after is the number of entries per language, WT:STAT shows you what you want; I especially like the "gloss definitions" column, as it does not include inflected forms. If what you are looking for is the number of translations in translation tables of English entries, per language, then I do not know of any such statistic being published. Nonetheless, the number of English pages that have at least one Russian translation can be obtained using WT:AWB by searching for "\* Russian:" with regexp; this gives me 26,734 entries. You can also search for "{{t.?\|ru" using regexp; ".?" means "optionally one more character"; "\|" makes sure "|" is read literally by AWB rather than as a magic character; the search gives me 23,021 entries. These are numbers of English entries rather than numbers of Russian translations, as one English entry can have more than one Russian translation.
- To get the number of Russian translations in English entries, you can run "grep" on a Wiktionary dump from a command line:
- grep "\* Russian:" enwiktionary-20120109-pages-articles.xml | wc -l
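- This gives me 39,570 lines with Russian translations. Since grep counts matching lines, and one "* Russian:" line can list several translations, this is a count of translation lines rather than of individual translations.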
- The tools that you need to get these numbers are AWB (Windows-only) or "grep", which is installed on most or all Linux machines and can be had for Windows from http://gnuwin32.sourceforge.net/packages/grep.htm (my favorite, one that I actually used) and from other places. The command above also pipes into "wc", which "grep -c" makes unnecessary. No coding or script writing is needed.
- You can get a dump by following the instructions at Help:FAQ#Downloading_Wiktionary; the latest dump is at http://dumps.wikimedia.org/enwiktionary/latest/; the latest dump without revision histories, which I recommend, is http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-pages-articles.xml.bz2. The dumps by publication date can be had from http://dumps.wikimedia.org/enwiktionary/. Before using the dump, you need to unpack the bz2 archive, which can be done using 7-zip on Windows.
- Sorry if you already knew a lot of what I have described. --Dan Polansky (talk) 08:31, 13 April 2012 (UTC)
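As an alternative to unpacking the archive and running grep, Python's standard bz2 module can read the compressed dump directly. The sketch below mirrors the two searches described above: it counts lines containing "* Russian:" (as grep piped into wc -l does) and, on those lines, {{t}}-style templates with the language code ru. The trailing "\|" after "ru" is an addition in this sketch, so that codes that merely start with "ru", such as rue (Rusyn), are not counted; the function names are hypothetical, and the file name is the one from the dump link above.

```python
import bz2
import re

# A {{t}}, {{t+}} or {{t-}} template whose language code is exactly "ru".
RU_TEMPLATE = re.compile(r"\{\{t.?\|ru\|")


def count_russian(lines):
    """Return (lines containing '* Russian:', ru templates on those lines)."""
    n_lines = n_templates = 0
    for line in lines:
        if "* Russian:" in line:
            n_lines += 1
            n_templates += len(RU_TEMPLATE.findall(line))
    return n_lines, n_templates


def count_in_dump(path):
    # Stream-decompress the .bz2 dump line by line; no need to unpack it first.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return count_russian(f)
```

Usage: `count_in_dump("enwiktionary-latest-pages-articles.xml.bz2")`. The second number counts individual translations, which the line count alone cannot give.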
- I do not know of any such statistic being published - There were: User:Robert Ullmann/Trans languages, but the last update is from 2010 (28560 Russian translations). Maro 14:15, 13 April 2012 (UTC)
- Thank you very much for the long description. I thought the statistics were readily available somewhere. It seems a bit complicated; I don't write code as a hobby, only for work :) I may address it later though. Maro, I saw that dump; I was hoping it could be refreshed. User:Matthias_Buchmeier has generated lists of translations from English into some FLs. It's quite impressive and useful. --Anatoli (обсудить) 10:48, 14 April 2012 (UTC)
I'd disagree that this is all that includable. It's easily derived from the sum of its parts, if you can find the sense at mood, granted. Furthermore, it's much, much more commonly called the subjunctive, so this and indicative mood, nominative case (etc. etc.) should really just point to the more common forms: subjunctive, indicative and nominative. Mglovesfun (talk) 10:21, 9 May 2012 (UTC)
- I'd keep it per my arguments from Talk:nominative case. --Dan Polansky (talk) 18:35, 9 May 2012 (UTC)
Why names of languages are not proper nouns
[edit]A blog post, as it were, relating to Wiktionary:
I do not know whether names of languages are proper nouns or not, but I will try to develop a position according to which names of languages are not proper nouns.
In English, there are two characteristics of names of languages that point to their being proper nouns: capitalization and their having a single referent.
Capitalization of names of languages can be declared an accident. Terms referring to people by nationality ("Englishman", "Spaniard") are also capitalized and yet are not proper nouns. In names of nationalities and languages, capitalization seems to confer honor on the referent ("Spanish") or on the manner of reference ("Spaniard").
Having a single referent does not guarantee being a proper noun either. An analogy can be drawn between terms referring to materials and masses on the one hand and terms referring to languages on the other. Terms referring to materials such as "gold" and "wood" also have a single referent, even if a spatially distributed one. A chair can be made of wood or metal. By a bit of a stretch, a sentence can be made of English (as if it were a sort of material from which sentences are made) or of Spanish. The hypothesis would be that terms that have a single referent that is an abstract object are not considered proper nouns; abstract objects include chemical elements, chemical compounds, colors, numbers, etc. If this hypothesis is accepted, it remains to be seen whether a language is a concrete object or an abstract object.
A language has no spatial extension, no location and no mass, so it does not belong to the ranks of such concrete objects as rocks, rivers, cities, plants, animals, people, stars, comets, etc. The analogy to materials suggested above points to the possibility that a language is an abstract object; similar abstract objects seem to be musical styles, such as "rock" and "jazz", and names of dances (Wikisaurus:dance).
A key property of abstract objects is that they can be instantiated or quasi-instantiated, even though they already are instances. Thus, gold can be instantiated in a particular brick, rock can be instantiated in a particular song (which again is an abstract object, instantiated in a particular performance of that song), the number five is instantiated in my right hand in the number of fingers, the color black is instantiated in the color of my laptop, and the English language is instantiated in this particular sentence. Only a fraction of the whole of gold is instantiated in a particular brick; only a fraction of the whole of English is instantiated in any particular sentence. By contrast, concrete objects do not seem to show anything like this ability of being instantiated.
This consideration is complicated by the fact that information artifacts (such as this sentence) seem to be abstract objects, yet their names are considered proper names, and so capitalized. To deal with this, we could introduce degrees of abstractness. Thus, a particular sentence is still an abstract object, instantiated in utterances of the sentence, but it is less abstract than a language, which is instantiated in its sentences. Furthermore, if this consideration creates a problem, the problem so created does not concern names of languages alone. Why are names of styles of music and dances considered common nouns, while names of particular artistic works are considered proper nouns? Are languages more like styles of music and dances or more like particular artistic works?
To close the discussion, if a term that refers to a single referent that is an abstract object is not considered a proper noun regardless of the singularity of reference, and if a language is an abstract object, then the name of a language is not a proper noun, regardless of capitalization.
--Dan Polansky (talk) 22:03, 14 July 2012 (UTC)
When you create a new term in a language with limited documentation, especially one you don't know, be sure to add a citation (either a quotation or a ===References=== section) and add {{LDL}} with it. Thanks --Μετάknowledgediscuss/deeds 08:35, 21 July 2012 (UTC)
- Why exactly? The term has one mention in Freeman, 1835. --Dan Polansky (talk) 08:39, 21 July 2012 (UTC)
- Then give the full citation in a References section. If you're asking why, then you might as well ask why we cite terms to begin with. --Μετάknowledgediscuss/deeds 08:42, 21 July 2012 (UTC)
- The common practice is not to cite every single term. A term should be requested for citation only if it is in doubt that it is attested. A term can be attested even if it has no quotation in Wiktionary. The term "teny" is pseudo-attested in Freeman's 1835 dictionary. I have added many Czech terms with no quotations and no references into Wiktionary, and plan to continue doing so. --Dan Polansky (talk) 08:49, 21 July 2012 (UTC)
- I only ask because it would be rather hard for someone else to cite it (easier now because you left a note on the talk page). --Μετάknowledgediscuss/deeds 08:52, 21 July 2012 (UTC)
- I have entered the source in the original edit summary. It should not be hard for people to verify the entry by the pseudo-attestation criteria; all they have to do is find the term in a single dictionary. --Dan Polansky (talk) 08:57, 21 July 2012 (UTC)
- If you really want to fight this hard not to cite it, I won't force you to. I was just hoping that you'd help out. --Μετάknowledgediscuss/deeds 09:00, 21 July 2012 (UTC)
- I now think you are actually right; providing a reference is not all that much work, and it makes it much easier for people to verify the entry. --Dan Polansky (talk) 09:06, 21 July 2012 (UTC)
Thanks :) --Μετάknowledgediscuss/deeds 09:07, 21 July 2012 (UTC)
- Are you now going to add {{LDL}} to all members of Category:Malagasy nouns, after you have added it to teny? Are you a bureaucrat who hates to do real work? --Dan Polansky (talk) 09:08, 21 July 2012 (UTC)
- No, because I don't know any Malagasy and it's not an interest of mine. Conversely, I am at tpi-1 in Tok Pisin and am actively learning it, and you will notice that I have cited and LDL'd more than a hundred Tok Pisin entries so far, and intend to do many more, as I'm just beginning a project of using the Tok Pisin Bible to cite entries (by just beginning, I mean that I'm in Chapter 2 of Genesis). --Μετάknowledgediscuss/deeds 09:15, 21 July 2012 (UTC)
- So what is {{LDL}} good for? Why have you placed it on teny? By what criteria do you decide which entries of poorly documented languages should have the template? --Dan Polansky (talk) 09:17, 21 July 2012 (UTC)
- To answer your questions in turn: to alert readers (just read it and you'll understand); because of policy (i.e., the vote you opposed) and because of the answer to your previous question; and the criteria are the same as those that determine whether terms get added to Wiktionary at all: whether or not someone cares enough to add them. --Μετάknowledgediscuss/deeds 09:22, 21 July 2012 (UTC)
- Then again, why have you added {{LDL}} to teny but not to Category:Malagasy nouns? Does the policy force you to add {{LDL}} to teny? If so, what sentence of the policy? Why does the same policy not force you to add {{LDL}} to all members of Category:Malagasy nouns? Why are readers not alerted on almost all pages of Wiktionary that three quotations are missing? --Dan Polansky (talk) 09:27, 21 July 2012 (UTC)
- Answered individually: the template is designed for entries in the main namespace (not categories); yes, to quote the CFI: "a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{LDL}} template)" (this is described as a "requirement" a few lines above); I have already answered that; because most entries on Wiktionary are easily citable with three quotations and can be presumed to be reliable. --Μετάknowledgediscuss/deeds 09:32, 21 July 2012 (UTC)
- To repeat myself: Why does the same policy not force you to add {{LDL}} to all members of Category:Malagasy nouns? --Dan Polansky (talk) 09:35, 21 July 2012 (UTC)
- Sorry, I misread that one. Some of them may be fully citable and fully trustworthy. Thus, the template is added on a case-by-case basis. --Μετάknowledgediscuss/deeds 15:30, 21 July 2012 (UTC)
- Disingenuous. Thanks for self-disclosure. --Dan Polansky (talk) 11:05, 22 July 2012 (UTC)
- No problem. Er, um, *clears throat* mwahahahaha! --Μετάknowledgediscuss/deeds 19:34, 22 July 2012 (UTC)
Inflected forms
[edit](A Wiktionary blog-like post) Wiktionary has the practice of including inflected forms rather than restricting itself to lemmas. One advantage of doing so is that the reader of a written text can take any inflected form found in a sentence and find it in Wiktionary, even when he does not know the regularities ("smile" --> "smiled") and irregularities ("buy" --> "bought") of inflection of the language. Thus, to the extent to which inflection is regular, form-of entries can be thought of as a tabulated or buffered result of an inflectional analyzer, something like an addition table replacing a compact algorithm for addition.
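The analogy of form-of entries as the tabulated output of an inflectional analyzer can be illustrated with a toy example: a table from inflected form to lemma, produced by one regular rule plus a list of exceptions. This is a deliberately naive sketch covering English past tenses only; the lemma list and function names are made up for illustration, and real English inflection (consonant doubling, "-y" to "-ied", etc.) is not handled.

```python
# Irregular past tenses, which the simple rule below cannot produce.
IRREGULAR_PAST = {"buy": "bought", "go": "went"}


def regular_past(lemma):
    # Naive regular rule: "smile" -> "smiled", "walk" -> "walked".
    return lemma + "d" if lemma.endswith("e") else lemma + "ed"


def form_of_table(lemmas):
    """Map each past-tense form to its lemma, like a set of form-of entries."""
    table = {}
    for lemma in lemmas:
        past = IRREGULAR_PAST.get(lemma) or regular_past(lemma)
        table[past] = lemma
    return table
```

`form_of_table(["smile", "walk", "buy"])` yields `{"smiled": "smile", "walked": "walk", "bought": "buy"}`: a reader who meets "bought" can look it up directly, without knowing the rule or the exception.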
A consequence that may be disliked is that, in highly inflected languages, entries for inflected forms massively outnumber lemma entries. Based on WT:STATS made using the dump of 2012-07-24, here are some statistics for some of the languages with the highest numbers of inflected forms. Let me highlight that column E has the number of form-of definitions rather than form-of entries, and column D has the number of gloss definitions rather than the number of gloss-having entries. Furthermore, note that C = D + E. Column B stands in no direct relation to the other columns other than that B < C; it involves both entries with form-of definitions and entries with gloss definitions. B-D comes close to being a lower bound on the number of pure form-of entries.
Language (A) | Number of entries (B) | Number of definitions (C) | Gloss definitions (D) | Form-of definitions (E) | E/D (rounded) | B-D |
---|---|---|---|---|---|---|
Latin | 613023 | 992531 | 44653 | 947878 | 21 | 568370 |
Italian | 487007 | 613087 | 129759 | 483328 | 4 | 357248 |
Spanish | 242918 | 357840 | 38284 | 319556 | 8 | 204634 |
French | 254629 | 333948 | 53346 | 280602 | 5 | 201283 |
Esperanto | 100720 | 101803 | 12254 | 89549 | 7 | 88466 |
German | 69501 | 113797 | 31781 | 82016 | 3 | 37720 |
Swedish | 89768 | 100972 | 20954 | 80018 | 4 | 68814 |
Finnish | 107180 | 133946 | 63074 | 70872 | 1 | 44106 |
Catalan | 56049 | 72196 | 9761 | 62435 | 6 | 46288 |
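The derived columns can be recomputed from columns B to E as a consistency check. The sketch below copies the rows from the table above, confirms that C = D + E for every language, and recomputes E/D and B-D. The printed E/D values match Python's round(E / D) (for example, German 82016 / 31781 ≈ 2.58 appears as 3), so that column appears to be rounded to the nearest whole number rather than truncated. The function name is hypothetical.

```python
# (language, B entries, C definitions, D gloss definitions, E form-of definitions)
ROWS = [
    ("Latin", 613023, 992531, 44653, 947878),
    ("Italian", 487007, 613087, 129759, 483328),
    ("Spanish", 242918, 357840, 38284, 319556),
    ("French", 254629, 333948, 53346, 280602),
    ("Esperanto", 100720, 101803, 12254, 89549),
    ("German", 69501, 113797, 31781, 82016),
    ("Swedish", 89768, 100972, 20954, 80018),
    ("Finnish", 107180, 133946, 63074, 70872),
    ("Catalan", 56049, 72196, 9761, 62435),
]


def derived_columns(rows):
    """Return {language: (rounded E/D, B - D)}, checking C = D + E on the way."""
    result = {}
    for lang, b, c, d, e in rows:
        if c != d + e:
            raise ValueError(f"{lang}: C != D + E")
        result[lang] = (round(e / d), b - d)
    return result
```

Running `derived_columns(ROWS)` raises no error, and its output agrees with the last two columns of the table.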
The claim of Wiktionary:Main_Page that Wiktionary has "3,065,335 entries with English definitions" has to be read with the inflected forms in mind. By summing the column D from WT:STATS, we get 1,490,000 gloss definitions; the number of gloss entries is even lower than that.
There is a discussion in Wiktionary:Requests_for_verification#vuvuzela about whether attestation requirements should apply to inflected forms. Some people seem to think that whenever Wiktionary has an attested lemma entry, it should also have all regularly formed inflected forms of the lemma regardless of their attestation. By contrast, I think that Wiktionary should avoid hosting unattested inflected forms regardless of the attestation of the lemma. In particular, when an inflected form is challenged in RFV, it should be deleted unless attested. The use of bots to create a complete set of inflected forms where there is a suspicion that some of them are unattested seems tolerable, provided the inflected forms are deleted once they are challenged and left unattested. --Dan Polansky (talk) 09:44, 4 August 2012 (UTC)
Luciferwildcat
[edit]For some notes on Luciferwildcat (talk • contribs), see #RFD above.
Users of the person:
- Gtroy (talk • contribs)
- Acdcrocks (talk • contribs)
- Catch22 (talk • contribs)
- Totallynotfairbro (talk • contribs)
- Luciferwildcat (talk • contribs)
- AVerSiMeDejan (talk • contribs) -- added in 2013
Unattested entries that he has hastily added:
- fatbitch; November 2011
- wifeboy; November 2011
- fatcow; November 2011
- nocturnavore -- "a person who eats the night"; see Talk:nocturnavore later; August 2012
Incidents:
- Making one legal threat: diff; September 2011
- Using unduly familiar terms of address: "sunshine", and "hon": diff, diff; October 2011
- Lying about how long he has edited Wikipedia: "I have edited on Wikipedia for years with little issues that could not be resolved": diff; April 2012
- Edit warring at niggerling; August 2012
Edits showing lack of lexicographical skill:
Vote:
Editing pattern:
- Creating shoddy entries, more of which are unattested than is usual with other editors.
--Dan Polansky (talk) 10:03, 12 August 2012 (UTC); updated --Dan Polansky (talk) 16:48, 12 August 2012 (UTC); updated --Dan Polansky (talk) 11:32, 1 October 2012 (UTC)
- If you want more, just search on "Lucifer" or "LW" at WT:RFC, WT:RFV, and WT:RFD. You also missed his IP address, which he has used like an alternate account. There are additionally more incidents and older blocks, including one given a while ago by Atelaes (see his talk page for more). --Μετάknowledgediscuss/deeds 15:42, 12 August 2012 (UTC)
- I don't find anything all too interesting using the searches you have proposed. --Dan Polansky (talk) 16:44, 12 August 2012 (UTC)
English definitions from Wiktionary in a relational form
[edit]Today, I have discovered that an export of English Wiktionary is being published that contains only terms and their definitions, in a relational format of four tab-separated columns. This is so nice! The file is much smaller than a full dump (50 MB after unpacking), can be copied and pasted into Excel and then filtered using Excel filtering tools, can be grepped while you see both the term and the definition in the result line, can be copied to Excel after grepping, etc.
The definition files location:
- http://toolserver.org/~enwikt/definitions/
- enwikt-defs-20120821-all.tsv.gz - English Wiktionary, all languages
- enwikt-defs-20120821-en.tsv.gz - English Wiktionary, English terms only
The result of 'grep "irrationality" enwikt-defs-20120821-en.tsv' (on Windows, you may use "findstr" instead of "grep"):
English Dada Noun # A cultural movement that began in [[w:Zürich, Switzerland|Zürich, Switzerland]] during [[w:World War I|World War I]] and peaked from 1916 to 1920. The movement primarily involved visual arts, literature (mainly [[poetry]]), [[theatre]], and graphic design, and was characterized by [[nihilism]], [[deliberate]] [[irrationality]], [[disillusionment]], [[cynicism]], chance, [[randomness]], and the [[rejection]] of the [[prevail]]ing standards in [[art]].
English irrationalities Noun # {{plural of|irrationality}}
English irrationality Noun # The quality or state of being [[irrational]]; want of the [[faculty]] or the quality of reason; [[fatuity]].
English irrationality Noun # Something which is irrational or brought forth by irrational action, judgement, idea or thought.
English unreason Noun # Lack of [[reason]] or [[rationality]]; [[unreasonableness]]; [[irrationality]].
English woolly-headedness Noun # The quality of being [[woolly-headed]]: [[illogicalness]], [[irrationality]].
From what I can see, this has been around since March of 2010. It appears to have been created by Conrad.Irwin (talk • contribs).
The relational format is also great for all languages. It is hard to filter on language and definition at the same time using AWB or the Wiktionary online search function.
--Dan Polansky (talk) 20:46, 26 August 2012 (UTC)
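Filtering on language and definition at the same time is straightforward with the relational file. A minimal Python sketch, assuming the four columns are language, term, part of speech and definition (as the grep output above suggests) and that fields contain no tab characters; the function names are hypothetical. Lines are split on tabs directly, so quotation marks inside definitions are left alone.

```python
import gzip
import re


def read_definitions(path):
    """Yield (language, term, pos, definition) rows from a gzipped TSV export."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 4:  # skip malformed lines
                yield tuple(fields)


def query(rows, language, pattern):
    """Yield (term, pos, definition) for rows of `language` whose definition matches `pattern`."""
    regex = re.compile(pattern)
    for lang, term, pos, definition in rows:
        if lang == language and regex.search(definition):
            yield term, pos, definition
```

For example, `query(read_definitions("enwikt-defs-20120821-all.tsv.gz"), "English", r"(arrogant|conceited|proud).*person")` restricts a regex search to English definitions while reading the all-languages file.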
Hi! Is this expression idiomatic/proverbial (entry-worth) in Czech? If so, would you like to move the transwiki into the main namespace? Or if it's not idiomatic, let me know and I'll delete it. - -sche (discuss) 21:05, 26 August 2012 (UTC)
- It would be "dej Bůh štěstí". It seems to be a salutation, and a literary one at that; I have never used it myself and never heard anyone say it. A Google Books search for "dej Bůh štěstí" finds a couple of hits, including in Sebrané spisy Boženy Němcové, 1906, "V tom přišli už hosté do koliby. — „Dej bůh štěstí!" pozdravili. — „No, daj bůh, daj bůh," — odpověděl ..." (roughly: "Just then the guests came into the hut. 'God give luck!' they greeted. 'Well, God give, God give,' he answered ..."). It reminds me of grüß Gott. Similar salutations that come to mind include "pozdrav Pánbůh", "Bůh s tebou", and "s Bohem" (actually "sbohem" in modern Czech). It seems borderline idiomatic, or at least phrasebook-worthy, although it would be a poor phrasebook user who would try to use this phrasebook gem in a Prague pub :). --Dan Polansky (talk) 21:26, 26 August 2012 (UTC)
Question on guideline
[edit]Please help me to understand, as I am obviously unaware of the proper protocol for making edits here on Wiktionary. Do I need to create an RFV or RF(whatever) for any changes that I deem necessary, or just for the ones that you don't agree with? You actually don't need to answer that, as it is a rhetorical question to demonstrate the absurdity of your perceived thought process. I just want to clarify: the guidelines, as I understand them, are to make constructive edits as I see fit. If I am verifying a definition and I see an incorrect sense, I am supposed to change and fix it. If we put out an RFV for every change, nothing would get done. I honestly hope I do not upset you with my responses, as I merely wish to edit peacefully and be corrected amicably when my editing is incorrect. Speednat (talk) 00:00, 28 August 2012 (UTC)
- Your post probably relates to User_talk:Speednat#Abderian and your edit in diff.
- If you want to remove definitions, you typically need to use RFV or RFD. There are rare cases when it is obvious that a definition can be removed, but, most often, when an editor cannot attest a definition, he sends it to RFV via {{rfv-sense}} to see whether other editors can attest it. Just recently, I could not attest English "angulus", so I sent it to RFV, and, soon enough, an editor linked to attesting quotations. --Dan Polansky (talk) 19:20, 28 August 2012 (UTC)
Restricting translations to lemma 2
[edit]Another batch of AWB editing, this time a tiny one. See also #Restricting translations to lemma.
- #edits: 8
- Worklist search term: see the first regex in the replacement pair
- Replacement pair:
- Find: {{(t-?\+?)\|(..)\|(...)([^\|]*?)\|m(\|?.*?)}}, {{t-?\+?\|\2\|\3([^\|]*?)\|f\|?.*?}}, {{t-?\+?\|\2\|\3([^\|]*?)\|p\|?.*?}}
- Replace with: {{$1|$2|$3$4$5}}
- Edit summary: restrict adjective translation to lemma
- An example edit: diff
--Dan Polansky (talk) 19:14, 6 September 2012 (UTC)
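To try the replacement pair outside AWB, it can be transcribed into Python's re module. This is a sketch under stated assumptions: AWB is a .NET application, and its $1-style replacement references become \1 in Python; the braces are escaped here for readability; the sample translation line used in the usage note is made up. In the pattern, (...) captures the first three characters of the masculine form, and the back-references \2 and \3 require the feminine and plural forms to share the language code and those three characters, which keeps unrelated translations from being merged.

```python
import re

# Masculine, feminine and plural {{t}} translations of one adjective, in that order.
PATTERN = re.compile(
    r"\{\{(t-?\+?)\|(..)\|(...)([^|]*?)\|m(\|?.*?)\}\}, "
    r"\{\{t-?\+?\|\2\|\3([^|]*?)\|f\|?.*?\}\}, "
    r"\{\{t-?\+?\|\2\|\3([^|]*?)\|p\|?.*?\}\}"
)


def restrict_to_lemma(line):
    # Keep only the masculine form (the lemma), dropping its gender marker;
    # any extra parameters after "|m" are kept via group 5.
    return PATTERN.sub(r"{{\1|\2|\3\4\5}}", line)
```

For example, `restrict_to_lemma("* Italian: {{t+|it|rosso|m}}, {{t+|it|rossa|f}}, {{t+|it|rossi|p}}")` returns `"* Italian: {{t+|it|rosso}}"`, while a line without the m/f/p triple is returned unchanged.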
Queries on English definitions
[edit]Look what you can do with #English definitions from Wiktionary in a relational form
grep "\(arrogant\|conceited\|proud\).*person" enwikt-defs-20120821-en.tsv | cut -f2-
arriviste Noun # [[upstart]], [[newcomer]], late arrival, [[nouveau riche]], [[parvenu]], generally characterised as an ambitious, brash or arrogant person who has yet to integrate with his or her new social group.
bashaw Noun # {{archaic}} A [[grandee]]; a self-important or arrogant person. {{defdate|from 16th c.}}
bastard Noun # {{vulgar|referring to a man}} A [[contemptible]], [[inconsiderate]], overly or [[arrogantly]] [[rude]] or [[spiteful]] person. See [[asshole]], [[sod]].
bighead Noun # {{colloquial}} A person having an inflated opinion of himself; a [[conceit]]ed or [[arrogant]] person.
cock of the walk Noun # {{idiomatic}} A [[proud]] or [[conceited]] person.
cock of the roost Noun # {{idiomatic}} A [[proud]] or [[conceited]] person.
coxcomb Noun # A [[foolish]] or [[conceited]] person; a [[dandy]].
inflated Adjective # {{context|figuratively}} [[pompous|Pompous]]; [[arrogant]] (''of a person or ego'')
pajock Noun # A [[proud]] or [[ostentatious]] person.
proudling Noun # {{obsolete}} A [[proud]] or [[haughty]] person.
swellhead Noun # {{informal}} An [[arrogant]] or [[conceited]] person.
wisenheimer Noun # {{informal}} A [[self]]-[[assertive]] and [[arrogant]] person; a [[know-it-all]] or [[smart aleck]].
--Dan Polansky (talk) 19:32, 6 September 2012 (UTC)
- Cool, but I wish we had this kind of stuff built into Wiktionary, rather than having to mess about with downloads and "dumps" and command lines. There are few Web sites that would benefit more from some kind of semantic database format. Equinox ◑ 20:51, 6 September 2012 (UTC)
- Agree. Unfortunately, I don't see a straightforward way of achieving this in Wiktionary. Restricting a plain text search (not regex) to one language would already help a lot. At least, a couple of Wiktionary editors are not too shy with the command line, so I find it good to know the option is there. I have downloaded the dump once, and now I can query it as often as I wish. --Dan Polansky (talk) 21:02, 6 September 2012 (UTC)
Hi,
I have two questions.
- I suspect držitel has only a neutral meaning and doesn't mean a military occupant, see Talk:occupant.
- What's the singular of vepřové škvarky? Is it vepřová škvarka? Feminine or masculine? See pork rind. --Anatoli (обсудить/вклад) 04:26, 10 October 2012 (UTC)
- Re: 1: I don't think "držitel" is used in the sense of a military occupant. Frankly, though, I cannot think of any sentence in which I would translate "držitel" as "occupant", in any sense. "Držitel" was added in a diff by 87.245.91.33 (talk). If in doubt, "držitel" can simply be removed.
- Re: 2: The singular of "škvarky" is "škvarek".
- --Dan Polansky (talk) 17:38, 10 October 2012 (UTC)
- Thank you. There's a problem with vepřové škvarky: the gender is not displayed using {{cs-noun}}. (I didn't notice that you were the creator; otherwise I wouldn't have asked the second question.) --Anatoli (обсудить/вклад) 22:21, 10 October 2012 (UTC)
Thanks
Hullo, it is ‘Pilcrow’ (Seth) again. I am not a Slavicist myself, but I must say, your labours on Wikisaurus are generally quite good. In particular, I like Wikisaurus:burger because it is very thorough, and it is interesting to be aware of all of the many types of hamburgers. A little nitpick is that ‘gardenburger’ and ‘whaleburger’ are missing, but that isn’t a big deal. I appreciate how you want to enrich the project, even if I do not always show it.
So…would you like to be friends? --Æ&Œ (talk) 15:18, 15 October 2012 (UTC)
- I am puzzled. In any case, I wish you many good edits in Wiktionary. --Dan Polansky (talk) 20:41, 16 October 2012 (UTC)
Hi Dan. Can you revisit your vote at Wiktionary:Requests for deletion#Crouchy in light of my revisions to Crouchy? Cheers! bd2412 T 18:26, 15 October 2012 (UTC)
- Done. Thanks for citing the entry. --Dan Polansky (talk) 20:52, 16 October 2012 (UTC)
[1]: wtf: where did you get this from? Czechs use "infinitiv". Cheers. ;-) --Kusurija (talk) 06:08, 24 October 2012 (UTC)
- "neurčitek" is attested in Google books (google books:"neurčitek"), but "infinitiv" seems much more common in Czech indeed. Good catch. --Dan Polansky (talk) 18:09, 24 October 2012 (UTC)
- Feel free to consult with me about Czech terminology ;-). --Kusurija (talk) 07:45, 25 October 2012 (UTC)
What is one word
A word (aka "lexeme") is a collection of inflected forms. This is not a definition, merely a statement to clarify that a word can appear in printed text as different strings of characters. As an example, "does" and "did" are word forms belonging to one word. However, "doer" (noun) is not an inflected form of "do" and is a different word.
When do two uses of a particular string of characters belong to one word, and when do they belong to two words? My tentative answer is that two uses with different etymology belong to two words and that two uses with different part of speech belong to two words. Any two uses that share etymology and part of speech belong to one word. Thus, same-spelled words are distinguished by either etymology or part of speech. By way of example: "paper" in "made of paper" and "paper" in "papered" belong to two words per different part of speech; "sound" in "it made a loud sound" and "sound" in "The Sound of Denmark, where ships pay toll" belong to two words per different etymology. --Dan Polansky (talk) 11:42, 3 November 2012 (UTC)
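The tentative criterion above amounts to a grouping key: uses that agree on citation form, etymology and part of speech fall into one word, and a difference in etymology or part of speech yields a separate word. A minimal sketch in Python; the `Use` record, the `group_into_words` function and the "Etymology 1" / "Etymology 2" labels are hypothetical, chosen only to mirror the "paper" and "sound" examples.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Use:
    lemma: str      # citation form, e.g. "paper" for the use "papered"
    etymology: str  # hypothetical label such as "Etymology 1"
    pos: str        # part of speech, e.g. "Noun" or "Verb"
    context: str    # the text the use was taken from


WordKey = Tuple[str, str, str]


def group_into_words(uses: Iterable[Use]) -> Dict[WordKey, List[Use]]:
    """Group uses into words: same lemma, etymology and part of speech
    means the same word; a difference in any of them means another word."""
    words: Dict[WordKey, List[Use]] = defaultdict(list)
    for use in uses:
        words[(use.lemma, use.etymology, use.pos)].append(use)
    return dict(words)


examples = [
    Use("paper", "Etymology 1", "Noun", "made of paper"),
    Use("paper", "Etymology 1", "Verb", "papered"),
    Use("sound", "Etymology 1", "Noun", "it made a loud sound"),
    Use("sound", "Etymology 2", "Noun",
        "The Sound of Denmark, where ships pay toll"),
]
# Four distinct words: paper (noun), paper (verb), and two nouns "sound"
print(len(group_into_words(examples)))  # 4
```

Keying on the lemma rather than the surface string is what lets "papered" join the other forms of the verb "paper", in line with a word being a collection of inflected forms.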
Just in case you wonder ...
... about [2] and [3]: I accidentally clicked a rollback link instead of the link above it. -- Gauss (talk) 21:25, 12 November 2012 (UTC)
Do you support removing the phrasebook? —RuakhTALK 21:42, 5 December 2012 (UTC)
- I probably don't. But I want to find out how much the "nuke the whole phrasebook" thing expressed in WT:RFD is supported. --Dan Polansky (talk) 21:57, 5 December 2012 (UTC)
- I don't think that forcing a vote is the best way to do that. I think it would be better to just ask on a discussion-page. —RuakhTALK 22:01, 5 December 2012 (UTC)
- A vote is an excellent way of documenting what editors support, I think. The vote can be easily found later by searching for "phrasebook". The CFI passage that the vote proposes to remove entered CFI without a vote, so a voted attempt at removal seems worthwhile, whatever the result. A discussion page is too low-profile for getting input from as many people as possible. The phrasebook has repeatedly been a contentious issue, with too little clear indication of where opinion lies. ---Dan Polansky (talk) 22:09, 5 December 2012 (UTC)
- I agree with Dan's idea. It would also be good to have CFI, so that the phrasebook doesn't become a mockery. --Anatoli (обсудить/вклад) 23:42, 5 December 2012 (UTC)
- The problem with the vote is that I don't think it will determine what editors support: the specific question in the vote is so superficial as to be almost meaningless. I suspect that while many editors want to keep the phrasebook no matter what, many other editors' opinions will totally depend on what the options are. This vote doesn't give any options — there's no actual proposal to vote on — so editors can't meaningfully vote. —RuakhTALK 02:23, 6 December 2012 (UTC)
- I believe a lot of editors can meaningfully vote, either in support or in opposition. From what I can see, the editors who have left the following comments in RFD can vote in support:
- "just nuke the entire phrasebook"
- "If I'm asked, delete the whole g-mn phrasebook"
- 'Would any like to have my proxy to vote "delete" on RfDs about other entries that have been included solely because of the phrasebook "project"?'
- Re: "there's no actual proposal to vote on": This sentence seems wrong given that the vote states a proposal that the authors of the above-listed comments seem to support, if I understand their comments correctly. Please, if you think you can set up a better vote, do so, but in a separate vote. If other people ask that your vote be run before mine, I am okay with that, and my vote can be postponed. --Dan Polansky (talk) 18:38, 6 December 2012 (UTC)
Fête's block on French Wiktionary
Check here and you'll find the right answers to your questions. Ĉiuĵaŭde (talk) 17:59, 15 December 2012 (UTC)
Czech translation please
Hey Dan, I recently came across this Czech book with a long note in it that I'm very curious about. If you're willing to translate it for me, it's the third and fourth pages in this album. Ultimateria (talk) 16:53, 17 December 2012 (UTC)
- I'm disinclined to translate the Czech handwriting for you. By translating it, I would not improve Wiktionary in any way. --Dan Polansky (talk) 19:13, 19 December 2012 (UTC)
- Sometimes, people do things for other people just for the sake of being kind. —Μετάknowledgediscuss/deeds 00:50, 20 December 2012 (UTC)