User talk:Visviva/archive/2009
Hi, can you check 균 and maybe add a few derived terms/compounds? 24.29.228.33 21:21, 2 December 2008 (UTC)
Hi, shouldn't 翟 be included at 척? How did it get left out, if all the backlinks to 척 were checked? I'm aware, of course, that some Internet and print sources leave out some very rare, little-used, or unusual hanja. 131.123.121.146 15:54, 5 December 2008 (UTC)
Sorry, I seem to have mistaken one initial consonant for another. 131.123.121.146 15:59, 5 December 2008 (UTC)
- Actually, I don't normally check all the backlinks; for one thing, some of these seem to be incorrect (though it's difficult to prove a negative). Lately I've been focusing only on those hanja which have entries in multiple abridged okpyeon, and which are therefore likely to be of interest to the casual searcher ... I figure that's enough to get our coverage started, and then other folks can finish the job if they are so inclined. I wouldn't have the expertise to write a truly definitive entry for these in any case, and the more obscure a given character is, the higher are my odds of making a foolish and misleading blunder. -- Visviva 05:06, 6 December 2008 (UTC)
Hi, can you add definitions for the hanja at 적? 131.123.121.146 16:00, 5 December 2008 (UTC)
- I'll try to get to it eventually. There are scads of these, and honestly they aren't a big priority for me; just one large Syllable entry is enough to give me a splitting headache. -- Visviva 04:58, 6 December 2008 (UTC)
I saw your "notice" in that entry. So where should such a thing be discussed? 50 Xylophone Players talk 16:13, 6 December 2008 (UTC)
- The Beer parlour discussion started, more or less, at WT:BP#Final feedback sought on Hangul syllable entries and is continuing at WT:BP#Standard way to include technical character code and Unicode data. Please join in if you have any thoughts on the matter(s). No objections to what you're doing; it just seems like the more we do now, the more we're likely to have to go back and change when a decision is finally reached. Given that there are somewhat over 10,000 of these altogether, I'd personally prefer not to have to deal with them more than once. :-) -- Visviva 16:24, 6 December 2008 (UTC)
Citations buttons on crack
Concerning Wiktionary:Beer parlour#Dates etc, I was wondering if you had seen my most recent comment. Your citations buttons are the sort of thing that I envision, and I was wondering if you had any interest in the task. To recap and clarify, I'm seeing a set of buttons (not really sure where, perhaps to the left of the sense numbers?), which when pressed do a show/hide bit and display a set of information immediately underneath that sense (i.e. before the following sense). I suppose the best thing would be to have the buttons be a single, upper-case letter (Q for quotes, E for etymologies, T for translations, etc.). Would you have any interest in writing such a thing? I realize that this involves a lot of work (which, sadly, I cannot pay you for :-)), and so if it's not your cup of tea, I will in no way be offended. -Atelaes λάλει ἐμοί 08:39, 10 December 2008 (UTC)
- I have had something like this in the back of my mind for a while, though I had been thinking of it somewhat differently (in terms of a button that would actually "grab" sense-related content from other parts of the entry).
- I think this is quite doable, but I doubt if I am the right person for the job; I have never really progressed beyond the "blindly groping about" stage of JS literacy. If I were going to pick the editor most likely to have both the necessary skills and the necessary enthusiasm, I would suggest Conrad.Irwin. That said, if I stumble into something useful, I'll let you know. -- Visviva 09:21, 10 December 2008 (UTC)
- Well, User:Conrad.Irwin/parser.js does do this to some extent, though it needs a bit of work - particularly useful would be to have some indication of how reliable it thinks the sense-matching guesswork is. I've been quite distracted with other things recently, but if people want to hit me with ideas, I'll see if either they can be incorporated into the parser that already exists, or whether we need something a bit more heavyweight. It would certainly be a neat idea to grab the citations of the citations page on the way through. Conrad.Irwin 01:28, 14 December 2008 (UTC)
Wanted Hangul entries
Per Wiktionary:Beer parlour#Seeking final comment on Hangul syllable entries, I was wondering if you thought it a good idea to remove all the Hangul syllables from Wiktionary:Wanted entries. It seems like no one is working on these, and if you're planning on writing them all en masse, I think that wasting space on that banner is a bit silly. Your thoughts? -Atelaes λάλει ἐμοί 23:40, 10 December 2008 (UTC)
- I agree it's a bit silly; I don't think they need to be there. On the other hand, those particular syllables are all hanja readings, so there is a lot to be said about them apart from the purely technical stuff. On the third hand, I seem to be the only person with the inclination to actually write proper entries for these, and my inclination is not particularly strong (see three sections above). I can manage about one a day when I feel like it, which I'm sorry to say is not very often ATM. In consequence, the Hangul syllables just end up taking up space on the banner, space which would probably be more profitably filled with stuff from the Hotlist et al. So, yeah, IMO they should go. -- Visviva 01:58, 11 December 2008 (UTC)
Far East
India has always been in the Far East. --EncycloPetey 01:28, 16 December 2008 (UTC)
- Not according to that article: "The term Far East was popularized in the English language during the period of the British Empire as a blanket term for lands to the east of British India." (emphasis added) I also note that we currently define "East Asia" as "the Far East", though that should probably be replaced with something more precise in any case. -- Visviva 01:33, 16 December 2008 (UTC)
- Probably. The difficulty with South Asia as a term, though, is that it usually is understood to include just the Indian subcontinent, and not Southeast Asia where "pajamas" are also worn, IIRC. I note that the WP article on Pajamas says they're worn in South and West Asia, so there may be more work to be done on that definition than either of us initially thought. --EncycloPetey 01:38, 16 December 2008 (UTC)
- (ec) I was just thinking that... I can't find a source that gives me a straight answer on whether paijama are (or have been) traditionally worn in Iran or Afghanistan, as the OED entry cited by the pedia article would suggest, or in any other countries outside the subcontinent such as Burma. How about "various Asian countries including India"?-- Visviva 01:44, 16 December 2008 (UTC)
- I think we could even use the lowercase "southern Asian", as "various southern Asian countries including India". The UN definition of "Southern Asia" (note: not South Asia) includes Iran and Afghanistan. --EncycloPetey 01:47, 16 December 2008 (UTC)
NYT
I spent a minute or two trying to figure out why AF had apparently edited a user space page ... these are cool, very nice idea. As you noted, we have pretty good coverage of the everyday vocabulary of the NYT.
I was then thinking what other paper we might look at, and thought the Guardian might be a good choice, covering some UK and Commonwealth usage. I just came back to my laptop and looked, and see you are way ahead of me. (;-) Robert Ullmann 15:02, 20 December 2008 (UTC)
- That's good to hear. Lacking any British background, I wasn't sure if the Guardian was really a good choice. (My first thought had been the Times of London, but they don't seem to have a daily page.)
- Hopefully these will be a good way to fill in the chinks in our coverage of "normal English." I've already noticed a couple of people other than myself working on the lists -- presumably because they saw them on RC, since I haven't announced them anywhere -- which is encouraging. -- Visviva 15:22, 20 December 2008 (UTC)
- The Times daily is http://www.timesonline.co.uk/tol/news/ but I like the Guardian better. Robert Ullmann 15:31, 20 December 2008 (UTC)
- Thanks for this. That's not quite what I would need/want; the nice thing about the Guardian and NYT is that they actually provide a daily list of articles in the print edition, so I can scoop up all of the articles for one day in a single run. Most newspapers don't provide this kind of list, or provide it only to paying subscribers; the Times seems to be in that category. At any rate I'm happy with the Guardian; aside from its strange habit of changing its name on Sundays, it seems like a quite satisfactory newspaper. -- Visviva 03:23, 26 December 2008 (UTC)
- Yeah, like that, darn it. :-) -- Visviva 05:17, 26 December 2008 (UTC)
- FYI: The Guardian and the Observer are considered separate newspapers, the one having always been a daily (i.e. not Sundays) and the other having always been a Sunday paper, published since the 18th century. It was acquired in 1993, to become the sister paper. Robert Ullmann 05:31, 26 December 2008 (UTC)
- BTW, thanks for providing the daily XML dumps, and mentioning how to extract a list of titles from them. These make generating the lists much easier. -- Visviva 03:23, 26 December 2008 (UTC)
- Karibu sana. Robert Ullmann 05:12, 26 December 2008 (UTC)
- Oh, I don't know if you are writing in Python, but if so, User:Robert Ullmann/spork may be a useful source for code to steal. (;-) Robert Ullmann 06:03, 26 December 2008 (UTC)
- Indeed, thanks. I am using Python, which is great fun, though for now I am trying not to outsmart myself -- I just have the program dump a quick-and-dirty list, and then I work through it by hand. -- Visviva 09:30, 29 December 2008 (UTC)
Character set: the NYT pages use ISO 8859-1. You're writing it back to the wikt as if it is UTF-8. If you take the returned text from the get-url call you are using (urllib2? I'm still assuming Python, although you haven't said ;-), and do:
text = unicode(text, 'iso-8859-1', errors = 'ignore')
then you'll have a unicode string, and the wikipedia.put op will convert it to UTF-8 correctly to post. Likewise for other sites if they aren't UTF-8. (list of codings supported at http://docs.python.org/library/codecs.html#id3) Robert Ullmann 09:08, 29 December 2008 (UTC)
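For readers on present-day Python 3 (an assumption; the thread itself uses Python 2, where `unicode()` exists), the same decode-then-encode step can be sketched roughly as below. The sample bytes are made up for illustration and are not taken from an NYT page.

```python
# Bytes as they might arrive from a page served as ISO-8859-1 (illustrative sample).
raw = "café".encode("iso-8859-1")   # the accented letter is the single byte 0xE9

# Decode with the encoding the site actually uses to get a text string.
text = raw.decode("iso-8859-1", errors="ignore")

# Encoding that text as UTF-8 gives the bytes a UTF-8 wiki expects.
utf8_bytes = text.encode("utf-8")

# Decoding the original bytes as UTF-8 instead loses the accented letter
# when errors are ignored, because 0xE9 alone is not valid UTF-8.
wrong = raw.decode("utf-8", errors="ignore")
```

The last line is the failure mode being described: nothing crashes, the character just quietly disappears.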
- Thanks; I noticed that today, but didn't think it was worth fixing retroactively since it only affected the sentences, which are for heuristic purposes only ATM. I suspect this was due to my foolishly using the "encode" method (which apparently assumes a Unicode string) rather than unicode(). Tomorrow's batch should be a bit cleaner. -- Visviva 09:30, 29 December 2008 (UTC)
- That would have been fine, except that you wanted "decode" (;-). "encode" is from internal string (ASCII) or Unicode string to whatever codeset the file or other data is in; "decode" is from the external code to internal string or Unicode string. Get from NYT, decode 8859 to Unicode internal, do stuff, encode to UTF-8 (done by wp.put method), put to wikt.
- If you want to be really sophisticated:
import re

# "f" is the file-like object returned by urlopen:
try:
    m = re.search('charset=([^\'\";]+)', f.info()['Content-Type'])
    if m:
        text = unicode(text, m.group(1), errors = 'ignore')
    else:
        text = unicode(text, 'utf-8', errors = 'ignore')
except LookupError: # (KeyError if no 'Content-Type'; LookupError for an unknown codec name)
    text = unicode(text, 'utf-8', errors = 'ignore')
(;-) something like that ... I just wrote the above w/o testing it. Then you can read anything. Robert Ullmann 10:36, 29 December 2008 (UTC)
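A rough Python 3 rendering of the same idea, offered as a sketch only: `decode_body` is a hypothetical helper name, and it assumes the caller has already read the response bytes and the Content-Type header value. The charset pattern is simplified compared with the one above.

```python
import re

def decode_body(body, content_type, default="utf-8"):
    """Decode raw bytes using the charset named in a Content-Type header value.

    Falls back to `default` when no charset is given or the name is unknown.
    """
    m = re.search(r"charset=[\"']?([\w.:-]+)", content_type or "")
    charset = m.group(1) if m else default
    try:
        return body.decode(charset, errors="ignore")
    except LookupError:
        # Raised by the codec machinery for an unrecognised encoding name.
        return body.decode(default, errors="ignore")
```

Usage would look something like `decode_body(raw_bytes, "text/html; charset=iso-8859-1")`; passing `None` for the header simply uses the default.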
See User:Robert Ullmann/Mwananchi/30 December 2008. I'm stealing your idea. Right now working out how to prioritize a bit, and see what I can do, as most words are missing. (Very basic words ... I should have been adding all along ... ;-) Robert Ullmann 14:17, 30 December 2008 (UTC)
- Spiffy! I don't even want to think of the list I would get if I did something like this for a Korean newspaper (even if I could figure out how to lemmatize the input). -- Visviva 14:20, 30 December 2008 (UTC)
What (from your edit summary ;-) is the trouble you are having with Python dicts? Robert Ullmann 16:11, 1 January 2009 (UTC)
- Oy, I wish I knew... Essentially, contrary to my intention above, I managed to outsmart myself (it's not hard ;-). Basically, the code needs to be able to do the same things with files already on-disc as with files that it is downloading. I thought I could be clever and have it extract all of the information (author, title, text, URL) from the downloaded file before saving, and then pass all of that info directly to the word-count function in the form of a dictionary of dictionaries. I'm still not entirely sure what went wrong (the code is a godawful mess, not ready for public viewing), but somehow I ended up with a meta-dictionary that, for each article, provided either the filename only or all of the other information, causing the word-count routine to treat the corpus as empty. In the end, I brought things back down to my level by having the download routine put a URL stamp on each file, so that the word count function could just read all the info from the files in its usual way.
- And may I just say, the difference between the effects of "a=b" when a and b are dictionaries/lists and "a=b" when a and b are strings is just so effing clever it makes me want to shoot someone. >:-> I realize now that that is documented, but not in a way that a non-savvy person like myself would ever recognize in advance. (That was tying me in knots two days ago; I don't think it was related to yesterday's problems.)
- At any rate, I think all is well now (until the next time I try something clever :-). -- Visviva 03:45, 2 January 2009 (UTC)
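The "a=b" behaviour complained about above can be seen in a few lines. This is a minimal illustration with made-up values, not a reconstruction of the code under discussion.

```python
import copy

# Strings are immutable: "changing" b really rebinds b to a new string object.
a = "spam"
b = a
b += "!"            # a is untouched

# Dicts (and lists) are mutable: assignment just gives the same object a second name.
d1 = {"title": "x"}
d2 = d1
d2["title"] = "y"   # visible through d1 too, since d1 and d2 are one object

# An explicit copy produces an independent object.
d3 = copy.deepcopy(d1)
d3["title"] = "z"   # d1 is not affected by this
```

For a flat dict `dict(d1)` or `d1.copy()` is enough; `copy.deepcopy` matters for nested structures such as a dict of dicts.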
- Ah, it is certainly possible to use dicts-of-dicts, and I have; but very easy to confuse yourself. And copies and comparisons don't work as you might initially expect; although with understanding, they work precisely as they should ... In this case, a dict of tuples (author, title, text, URL) would work well, but then you have to remember which part is which (was URL [3]? ;-) and adding other attributes is painful. An object instance would be perfect, but then there is code to do the class definition and methods; in Java or c++ that would be another page of code. But in Python? Is easy ...
As a wizard I have and use many magickall objects and devices. I'll share one with you; but understand it might be troublesome if not grokked. Okay? (Wizards always have these nefarious conditions, even when trying to be helpful.) This one of my own invention, but I would expect others have similar:
class ufo:
    def __init__(self, **k):
        for a in k:
            setattr(self, a, k[a])
Got it? Um, probably not. It works like this:
thing = ufo(author = "(author)", title = "(title)", text = "whatever", URL = "http://...")
# now add it to a dict
d = { }
d[filename] = thing
# now use it:
print "url for filename", filename, "is", d[filename].URL
and of course you can change the attributes as you please, or part of the code can add another:
d[filename].date = "2008-01-02"
"ufo" evals to "Universal F---- Object", for whatever F-word pleases you. Robert Ullmann 13:33, 2 January 2009 (UTC)
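In present-day Python 3 (an assumption; it postdates this thread), the standard library's `types.SimpleNamespace` behaves much like the "ufo" class above. A sketch, with a hypothetical file name:

```python
from types import SimpleNamespace

filename = "article-001.html"   # hypothetical file name, for illustration only

d = {}
d[filename] = SimpleNamespace(
    author="(author)", title="(title)", text="whatever", URL="http://..."
)

# Attributes are read by name rather than by tuple position ...
url = d[filename].URL

# ... and new ones can be attached later without touching any class definition.
d[filename].date = "2008-01-02"
```

This keeps the main advantage described above: no need to remember whether the URL was element [3].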
- Very cool, thanks. I shall have to puzzle over that a bit. -- Visviva 14:04, 2 January 2009 (UTC)
corpera
Thanks for the research help, especially for reminding me that these things exist and are accessible for free. I've bookmarked them. It would be an interesting thing to look at the usage frequencies for the words in the vocabulary of Wiktionary entries (not the defining vocabulary, but headings, context tags, visible categories, etc.). "Bitransitive", "determiner", "ergative", and "pro-sentence" are no- or very-low-frequency words. Do you think this would just annoy our regulars? Holiday greetings, BTW. DCDuring TALK 16:35, 25 December 2008 (UTC)
- I can definitely see the point for bitransitive, ergative, and pro-sentence; if we use these at all, they shouldn't be taking a leading role. "Determiner" is more complicated, because there are many languages for which the term (or some equivalent) is indispensable, and it also does a reasonable job of covering some of the more imponderable aspects of English. Plus, in English, it mostly turns up in entries like "that", which people aren't likely to look up unless they are interested in linguistic obscurities. At any rate, these are both discussions worth having, but I would be inclined to separate the question of technical terminology (i.e. accessibility) from the question of English determiners (a structural issue which impacts accessibility). Merry New Super Thanksaweenmas, as my family has taken to saying. -- Visviva 03:10, 26 December 2008 (UTC)
Thanks for your help. It did remove the blank line. --Panda10 16:36, 26 December 2008 (UTC)
harr
I know the entry was a long time ago, but may I query your definition of harr as an easterly wind? I think the association comes from the fact that it is the wind from the east that brings the harr in sense 1, but do you have other evidence of this transferred meaning? Dbfirs 08:03, 27 December 2008 (UTC)
- I believe my source was the entry for "haar" in the Etymological Dictionary of the Scottish Language. The quote I attached is not really satisfactory, but this one (cited in the said EDSL entry) at least shows some sort of "seasonal wind" sense, and is IMO clearly English rather than Scots. ... of course, if that's the only clear-cut use in all of b.g.c., it'll probably need to be shuffled off to Citations:harr. -- Visviva 08:45, 27 December 2008 (UTC)
- Sorry I missed your reply in December. Yes, that's a clear use of the transferred meaning. I'll probably come across others now! Dbfirs 07:57, 11 January 2009 (UTC)
Agreed. I’ve tried to organise them properly; however, senses one and three are almost indistinct, whilst the second sense may need splitting, so I’d appreciate your review of my efforts. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 03:20, 28 December 2008 (UTC)
- Looks like other folks got to this before I did; I have to admit I was putting it off as the distinction between 1 and 3 was giving me a headache. :-) The current arrangement is fine IMO. -- Visviva 01:21, 31 December 2008 (UTC)
Guardian analysis
Hey, thank you for uploading these lists, they are very interesting. Would it be too much to ask you to include the total word count at the top (as is done on one of the dates, I forget which) to see if we can work out an average % of missing words? Don't bother if it's a hassle, not that important. Yours. Conrad.Irwin 18:43, 30 December 2008 (UTC)
- No problem; added now. However, please note that any mathematical claims originating from me, and particularly from any code I claim to have written, should be taken with a large grain of salt. :-/ (the previous numbers were actually from Antconc, back when I was using an unholy cocktail of Windows programs to generate the lists; they should be roughly comparable, though.)
- It's also worth noting that a) until I actually write a downloader, these will omit a certain chunk of the original newspaper content (though more for the NYT than the Guardian, I think), and b) absolutely no lemmatization is done, which means that no distinction is made between missing forms and missing lemmata. I plan to fix a) soon, possibly today; I wouldn't hold my breath on b). -- Visviva 02:30, 31 December 2008 (UTC)
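To make the two caveats concrete, here is a minimal sketch of a total word count plus a missing-word tally with no lemmatization. `missing_words`, the tokenizing pattern, and the tiny title set are all hypothetical; this is not the script used to generate the lists.

```python
import re
from collections import Counter

def missing_words(text, known_titles):
    """Return (total token count, Counter of tokens with no matching title)."""
    tokens = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)
    missing = Counter(
        t for t in tokens
        if t not in known_titles and t.lower() not in known_titles
    )
    return len(tokens), missing

# Toy data: "cat" has an entry, but the inflected form "cats" does not.
total, missing = missing_words(
    "The cats sat on the mat", {"the", "cat", "sat", "on", "mat"}
)
share = 100.0 * sum(missing.values()) / total   # percentage of tokens flagged
```

Because nothing maps "cats" back to "cat", the form is counted as missing even though the lemma exists, which is exactly the distinction point b) says is not being made.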
- Today's NYT list does reflect a custom Python downloader (although it doesn't seem to have added much text, which makes me wonder if I'm doing something wrong). I'm actually not sure if a custom downloader would make any difference for the Guardian -- they don't seem to do multi-page articles very much. -- Visviva 11:33, 31 December 2008 (UTC)
- No worries, the lemmatization isn't a problem - as it's already alerted me to lemmas not linking to their nonextant forms. Thank you again for this. Conrad.Irwin 11:39, 31 December 2008 (UTC)
- Hi there. Would you consider putting a "subpages" template on your user page - so we can find these pages more easily. My own analysis of Italian texts indicates that we are missing about 0.5 to 5 percent of ordinary everyday words - but it varies widely according to the nature of the text. Good work. SemperBlotto 10:51, 31 December 2008 (UTC)
- Done (if I understand the request correctly). -- Visviva 11:33, 31 December 2008 (UTC)
- I did take the liberty of creating User:Visviva/ a day or two back. Conrad.Irwin 11:39, 31 December 2008 (UTC)
- That's freaky. I had no idea Special: pages could be transcluded. -- Visviva 12:51, 31 December 2008 (UTC)
- I was expecting a simple {{subpages}} - but thanks. SemperBlotto 13:01, 31 December 2008 (UTC)
- One of us could do something similar for an Italian paper, if you would tell us which one(s). I have code I am now using to collect words from Mwananchi in Dar, in (Tanzanian) Swahili. Collecting the pages is fairly easy, dealing with the ridiculous HTML formats is what takes some time. Robert Ullmann 14:29, 31 December 2008 (UTC)
20071012
This subpage is now complete (except for two spelling mistakes and a sum-of-parts).
I assume that you would want to delete it yourself rather than let me do it. SemperBlotto 19:51, 2 January 2009 (UTC)
p.s. Time for a talk page archive?
- I'd rather like to keep these around in some form, if only for historical purposes (after all, even if deleted, they'll still be taking up space on the server). But I can see how that would be aggravating for people trying to work through the lists. Maybe I could move these to subpages of User:Visviva/Tracking/Done, and delete the redirects; or something... -- Visviva 03:52, 3 January 2009 (UTC)
tracking header
You might want to look at the #time function as used in User:Robert Ullmann/Italiano/header.
See [1] ... Robert Ullmann 14:35, 3 January 2009 (UTC)
- I've had rather discouraging results from #time, over the years. It's possible I'm doing something wrong, and in particular it seems to perform a bit worse with the sensible time formats I prefer (2008-12-30) than with those degenerate American time formats (December 30, 2008). But even using the US format, strange things happen: for example {{#time:Y-m-d|December 32 2008}} currently generates 2010-01-01 instead of 2009-01-01. That doesn't seem right, and the date arithmetic would be my primary reason for using this... All in all, it just seems simpler to put in the full dates for now; I'm not doing that many of these, and I'm already automating the dailies. -- Visviva 16:07, 3 January 2009 (UTC)
- Generating "December 32 2008" and then expecting #time to fix it seems a lot harder than generating "December 31 2008 +1 day" which it understands? {{#time:Y-m-d|2008-12-31 +1 day}} is 2009-01-01 But whatever. Cheers, Robert Ullmann 18:03, 3 January 2009 (UTC)
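The same "valid date plus an offset" approach, sketched in Python for comparison (this is an analogy only; it says nothing about how #time is implemented):

```python
from datetime import date, timedelta

# Start from a real calendar date and add the offset, instead of
# constructing an out-of-range day number and hoping it rolls over.
next_day = date(2008, 12, 31) + timedelta(days=1)
stamp = next_day.isoformat()   # ISO year-month-day string

# Note: date(2008, 12, 32) would raise ValueError rather than roll over.
```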
- Now that I am slightly more clueful, it is indeed quite useful. Thanks! -- Visviva 16:51, 5 January 2009 (UTC)
Don't you mean onside? SemperBlotto 10:00, 5 January 2009 (UTC)
- Seems so; that form is much more common. Still, this is common enough not to be a simple error, and is both spoken and written so it's not an "alternative form" in our usual sense. Not sure what kind of tag it needs... -- Visviva
-genic
Ummm... you do realize that once we add all the scientific usages, there will be a very, very long list? --EncycloPetey 01:13, 9 January 2009 (UTC)
- Well, that's why I had it at {{mediagenic terms}}. AFAIK these are all formed by analogy with photogenic, and IMO form a fairly well-defined (and interesting) set, distinct from other, more typical, uses of -genic. -- Visviva 01:18, 9 January 2009 (UTC)
Words from the press
I like your lists of words from newspapers and journals (especially the scientific ones, because science produces a ton of new words). They are a really good way to find words we don't have that are actually being used by people. Pity there isn't a way to "watch" pages in advance, really; I'd quite like to be able to see these as to-do lists. Equinox 19:37, 10 January 2009 (UTC)
- Thanks for the feedback, and especially for your efforts in blueing these lists. As it happens, you can add pages to your watchlist that don't exist yet, for example by adding the page titles to Special:Watchlist/raw. Not sure if that would really be worth the trouble in this case, but it can come in handy once in a while. (What we really need is a way to watchlist patterns (e.g. "User:Visviva/Guardian_*"), but AFAIK that's not possible.)-- Visviva 06:23, 11 January 2009 (UTC)
- OOuuuuah. I just refreshed the page with antsiness on it, and they suddenly went from red to blue! We must both be awake. Equinox 00:11, 23 January 2009 (UTC)
- Yeah, I figured I'd better get caught up on the NYT lists before they got away from me (I was running about 4 days behind). I'd feel silly if I wasn't keeping up with at least one of these lists. :-) -- Visviva 00:15, 23 January 2009 (UTC)
- Outstanding and fun. I noted that some of the entries created (or the lemmas) don't seem to have the quotes. Perhaps your template should explicitly encourage use of the quotes using words such as the following in the templates:
The quotes often provide good usage examples and attestation evidence and should be included in the entry or the citation page for the lemma. The quotation templates can be copied and need only template renaming.
- Even better might be examples. Great job. DCDuring TALK 13:07, 27 January 2009 (UTC)
- Thanks. :-) I've added your first sentence and a bit about how to activate the "add" links (which are a lot simpler than copying the templates, though that works too). I hadn't really expected such an influx of interest -- although I must say, it's very gratifying -- so I hadn't really fleshed out the documentation. Please feel free to edit anything that seems to need editing; most of the boilerplate lives at User:Visviva/tracking header. (that goes for all y'all.)
- I'm realizing now there are various other things I need to get around to, like moving the preload templates out of userspace so it doesn't look like I'm putting my stamp on others' work, but I'd better put all of that off until tomorrow if I want to keep my "job". (I'd hate to have to fire myself ;-). -- Visviva 13:47, 27 January 2009 (UTC)
en-noun
I can understand the functional edits, but why did you comment out the categorization of the template? --EncycloPetey 19:50, 12 January 2009 (UTC)
- Because I'm an idiot. ;-) Fixed now. -- Visviva 01:23, 13 January 2009 (UTC)
I have reverted your edits per your edit summary, where you say to do so in case of lack of backward compatibility. See discussion at the template talk page.—msh210℠ 19:48, 13 January 2009 (UTC)
- Thanks, think I have fixed this now but will leave it for a while. -- Visviva 02:22, 14 January 2009 (UTC)
Hi, for the ! and ? parameters, it would be nice to still display the forms (even though they may not exist) perhaps as [[Citations:plurals|*]]plurals or something similar. That way it would be obvious how to change the current state of affairs. Conrad.Irwin 00:25, 14 January 2009 (UTC)
- Yeah, I was thinking about that. Seems there are two issues: 1) How best to inform the reader of the problem -- that the plurals given may not be real; and 2) how best to inform the user of how to fix the problem. I'm not sure that pointing the user to Citations:<plural> is the best thing; I would say that the <singular> page itself, and the Citations:<singular> page, should be the first and second choices.
- Still, I have a hard time justifying putting something on the inflection line unless we're already reasonably sure it is true... inflection lines aren't really the best place for nuanced qualification or fancy symbols. That was my feeling behind having ? and ! automatically override any further forms or qualifications. But my thoughts are somewhat mixed. Hmm... can you give an example of a case where it would be beneficial to show the possible plurals? -- Visviva 02:22, 14 January 2009 (UTC)
If you can provide CFI-sufficient citations, this ought to be nominated for WOTD. I haven't seen the adjective form before, and I expect most people haven't. --EncycloPetey 02:20, 13 January 2009 (UTC)
- Citations added; there are a fair number more out there. Unfortunately, most sentences that use this word are also examples of it. :-) The medical and general senses could be separated, although I'm not sure if this is necessary; both simply mean "exhibiting logorrhea". -- Visviva 02:53, 13 January 2009 (UTC)
- I don't see evidence in the quotes on the page to make me want a separate sense (even though the associated noun distinguishes). However, I would change my mind if there were additional medical citations demonstrating an association with a specific and diagnosable condition or syndrome. --EncycloPetey 02:57, 13 January 2009 (UTC)
Re the "nationality" appendix, FYI/FWIW
OALD 2000: 1 [U,C] the legal right of belonging to a particular nation 2 [C] a group of people with the same language, culture and history who form part of a political nation
(OALD 1995: 1 [U,C] membership of a particular nation 2 [C] a national group forming part of a political nation) --Duncan 19:46, 13 January 2009 (UTC)
- Thanks, added. I'm not sure, in general, if we want to have learner's dictionaries for these pages, but I hate to see good data go to waste. :-) Meanwhile, I think I had better go back and check the other dictionaries to make sure I didn't miss this "subnational" nuance elsewhere. -- Visviva 02:46, 14 January 2009 (UTC)
Just a reminder. It seems as if you were pulled away in medias res. DCDuring TALK 01:54, 16 January 2009 (UTC)
- So I was, thanks... I need to fix up that template a bit. -- Visviva 03:10, 16 January 2009 (UTC)
Are you planning to do some of these right away? Many are biological terms I'm shocked we don't have. Well, not that shocked since I discovered in the past 24 hours that we didn't have nursing home or punch clock. Anywho, I'd be up for defining some of these, if that wouldn't be stealing your fun. If so, is there a way that you could convert "your" quote template into a more general one? --EncycloPetey 05:37, 16 January 2009 (UTC)
- I'm thinking especially about the taxonomic and systematics words: allopolyploidy, acanthoidian, gnathostome, ingroup, monophyly, ... --EncycloPetey 05:42, 16 January 2009 (UTC)
- Please go ahead. At the moment I'm generating far more of these lists than I can possibly keep up with. I've actually been wondering if I shouldn't perhaps move this whole project into Wiktionary-space, though it is still a bit rough around the edges. -- Visviva 05:44, 16 January 2009 (UTC)
- I see no problem with having a community-open project in one's personal user space. Although, it would help if there were a notice welcoming people to participate and perhaps an organizing link somewhere, if it is truly growing beyond you. --EncycloPetey 05:48, 16 January 2009 (UTC)
- Good point, I'll add something to the boilerplate. -- 05:54, 16 January 2009 (UTC)
- The idea with the template is that clicking on "add" will preload the target page with a framework entry including a wrapper template that subst's as {{quote-journal}}. That's if I haven't screwed anything up (the current setup is rather Goldbergian)... this also requires putting something like User:Visviva/monobook.js#Preload_text in your monobook. One can, of course, just cut and paste from the list, and add the formatting by hand; that may be more cost-effective in this case (only a few of the Nature/Science quotes are really citation material, I fear). -- Visviva 05:54, 16 January 2009 (UTC)
- As a refereed academic journal, any cite from Nature meets CFI, even if it is lexically uninformative. I've added what I could for now. There are a few others I could do with a bit more effort, but it's late here and I've been working on a WikiSource contribution I really want to finish, namely Tom Jones. It's one of the earliest works that can be described as an "English novel", and a book I've really enjoyed for its elegant comedy and rich vocabulary (especially many rare and wonderful adverbs). --EncycloPetey 07:25, 16 January 2009 (UTC)
- True... but I prefer to stick to cites that add significant value. On the other hand, there's something to be said for always adding a cite when one is readily available; if nothing else, it helps to reinforce the idea that all entries should be cited. I've been leaning more toward this point of view lately.
- Sounds like an interesting project. I guess you have already added most of the adverbs? I ran my script over the first four chapters, but it only came up with a couple (along with a big ol' pile of archaic spellings). -- Visviva 03:59, 17 January 2009 (UTC)
- I have added a "notemplates" link to the base template for journal articles, should you care to try it out. (I really think the template approach is best -- it took me only seconds to update hundreds of entries to match the recent revision of WT:QUOTE -- but I don't expect to convince you.:-)
- Also, the preload-text functionality (without which the link doesn't actually work) now requires only this line in Special:Mypage/monobook.js:
importScript('User:Visviva/pretext.js');
- (Works in FF3, haven't actually tested it on anything else yet.) -- Visviva 04:30, 17 January 2009 (UTC)
I've added a few entries using this script (it's great), but on User:Visviva/20090109 there seem to be a lot of abbreviations for authors- is this intentional? Should I just take the author out when I cite it? Nadando 04:37, 17 January 2009 (UTC)
- No, just sloppy coding. :-) Have fixed that (and a couple of other glitches) now, will upload cleaner version shortly. -- Visviva 05:00, 17 January 2009 (UTC) Now so uploaded. -- 05:21, 17 January 2009 (UTC)
This looks like mis-use of the specific epithet from a scientific name, which would make it non-English. --EncycloPetey 09:26, 19 January 2009 (UTC)
- I'm not sure of the origins of the name, but it's pretty well-established now in reference to this specific variety; I've even heard people in my own family talking about delicatas. Will add more cites. -- Visviva 09:30, 19 January 2009 (UTC) Now has three citations (the 1992 one was the earliest that turned up in a cursory search). Tentatively tagged "North American". -- 09:40, 19 January 2009 (UTC)
- I'm not sure of the origins either. It could be a varietal or cultivar name, in which case my resources wouldn't have a chance of having it listed. If it's entered common use as a vegetable name then blehh!! I hear professional biologists, especially microbiologists, do this sort of horrid name-mangling all the time. I sat through a research presentation once where the speaker constantly referred to Saccharomyces cerevisiae (brewer's yeast) as cerevisiae, which is a genitive description meaning "of the brewer". I cringed over and over and over, when what I really wanted to do was yell out a frustrated correction in the middle of the talk. --EncycloPetey 09:37, 19 January 2009 (UTC)
- I understand your pain. :-) -- Visviva 09:40, 19 January 2009 (UTC)
- NB: this, from early 1986, is the first use I can find anywhere on the Googles. It seems to indicate that the variety was first put on the market in 1985 by a specialty retailer in California, but unfortunately is cut off in mid-sentence. A person could almost be tempted to fork over the $3.50 ... but not me, I'm afraid. Anyway, it seems possible that somebody just picked the name out of the air, figuring they could sell more squash as "delicatas" than as "green-striped blockhead squash" or whatever the original name was. -- Visviva 14:01, 19 January 2009 (UTC)
your periodicals
Do you add specific publications to your, er, raping queue on request? I'm thinking specifically of Notices Amer. Math. Soc..—msh210℠ 06:34, 26 January 2009 (UTC)
- I would, but anything PDF-based is kind of on hold for now. I've been using PDFMiner, but -- possibly because it converts to HTML instead of plain text -- PDFMiner introduces all sorts of errors even in text that exports flawlessly using Acrobat's native "save as text" function (which isn't really a practical option, since it has to be done for each file individually). I have similar problems with a batch converter that I purchased for daywork (it works great on Korean patents, but for some reason underperforms on English journal articles). All very annoying, though it's possible there's a simple solution ... I've been keeping myself busy with HTML-based stuff, so I haven't had a lot of time to focus on this problem yet. I was thinking that mathematics would be a good field to have better coverage of.
- Wait, strike that. They have the whole issue as a single PDF file? Too cool! I should definitely be able to work with that, though it will (at least in the short term) lack any metadata for citations. -- Visviva 07:27, 26 January 2009 (UTC)
- Woo hoo! Come to think of it, even better than the Notices would be the Monthly, but AFAIK it's available by subscription only. :-/ —msh210℠ 08:06, 26 January 2009 (UTC)
- Looks that way. (And it looks like even if I did have access, it would be in multiple PDFs, which I'm not quite ready to deal with yet.)
- Anyway, see User:Visviva/NAMS_200901 and User:Visviva/NAMS_200902. I'm afraid that's the best I'm going to be able to do for these for a while.
- The citation data -- at least the author, title and page number -- is extractable in principle, but it will take some cogitating, and I'm not quite sure when I'll get around to it. How would one operationalize "title-like line(s) followed by author-like line(s) followed by article lines"? It has to be doable, at least fuzzily, but I'm not quite seeing it. If you happen upon a solution, let me know. :-) Cheers, -- Visviva 11:10, 26 January 2009 (UTC)
- Thanks so much for the Notices. I wouldn't know how to extract those data, but manual works too (if, natch, not as quickly).—msh210℠ 18:54, 26 January 2009 (UTC)
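One possible way to operationalize the "title-like line(s) followed by author-like line(s) followed by article lines" pattern raised above is a small line classifier plus a scan for that sequence. The sketch below is only an illustration: the function names, the 15-word limit, the 60% capitalisation threshold and the name pattern are all assumptions made here, not anything taken from the scripts discussed in this thread, and real journal text would need tuning.

```python
import re

# All thresholds and patterns below are illustrative guesses (assumptions).

def looks_like_title(line: str) -> bool:
    """Short line, no closing period, most words capitalised."""
    words = line.split()
    if not words or len(words) > 15 or line.rstrip().endswith("."):
        return False
    capitalised = sum(1 for w in words if w[0].isupper())
    return capitalised / len(words) >= 0.6

def looks_like_author(line: str) -> bool:
    """One or more 'Firstname [M.] Lastname' names joined by commas or 'and'."""
    name = r"[A-Z][\w'-]+(?: [A-Z]\.)*(?: [A-Z][\w'-]+)+"
    pattern = rf"^{name}(?:(?:, | and |, and ){name})*$"
    return re.match(pattern, line.strip()) is not None

def find_articles(lines):
    """Return (title, authors, index_of_first_body_line) for each matching run."""
    results = []
    i = 0
    while i < len(lines):
        start = i
        # 1. one or more title-like lines that do not read as author lines
        while (i < len(lines) and looks_like_title(lines[i])
               and not looks_like_author(lines[i])):
            i += 1
        title_end = i
        # 2. one or more author-like lines
        while i < len(lines) and looks_like_author(lines[i]):
            i += 1
        author_end = i
        # 3. the next line should read like running text, not another heading
        if (title_end > start and author_end > title_end
                and i < len(lines) and not looks_like_title(lines[i])):
            results.append((" ".join(lines[start:title_end]),
                            " ".join(lines[title_end:author_end]),
                            i))
        if i == start:  # nothing matched here; move on one line
            i += 1
    return results
```

This is deliberately fuzzy: it will miss single-word author names and will be fooled by capitalised section headings, so its output would still need a manual check before being used in a citation.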
Q re Philosophical Studies tracking list
Hello Visviva -- Ah, yes, I very much appreciate the new philosophical tracking lists, with all those great words we philosophers love, like pastness and substancehood. I just added a new countable sense in the relativism entry, using your tracking list quotation for relativisms. In the process I noticed one missing item in your citation line, but am not sure whether your software can extract it. For academic quotations, I always like to include the page number. Is it possible for you to pull out the page number of the quoted text and post it in the tracking list entry? -- WikiPedant 18:28, 26 January 2009 (UTC)
- And the same for the Notices of the AMS, actually.—msh210℠ 18:55, 26 January 2009 (UTC)
- This should be doable, since the page numbers are right there. I'll need to get a little fancier with how I'm handling the text, though. -- Visviva 01:38, 27 January 2009 (UTC)
- Thanks for the feedback. Regarding page numbers, the problem is that the page numbers aren't included in the HTML version. I could pull the page numbers for the article as a whole from the table of contents (with a bit more work) ... I suppose that would be better than nothing, so I'll try to add that soon. -- Visviva 01:38, 27 January 2009 (UTC)
- Is User:Visviva/Erkenntnis 200807 more satisfactory? -- Visviva 08:19, 27 January 2009 (UTC)
Nature pages
Hi there. Items in your Nature pages have "add" and "notemp" links. When I try them, my screen goes "funny" then I just get a normal edit screen with nothing filled in. What is supposed to happen? SemperBlotto 12:16, 27 January 2009 (UTC)
- That's interesting; I'm not sure what would cause the screen funniness. The "add" and "notemp" links won't actually work unless you add this (or something equivalent) to your monobook.js, and flush the cache:
importScript('User:Visviva/pretext.js');
- This works on IE7 and FF3; haven't tested other browsers. Once installed, the "add" link loads the edit screen with a blank entry (with POS defaulting to "Adjective") containing the citation; the "notemp" link loads a citation-template-free version.
- I should, um, probably mention all this in the boilerplate. :-o -- Visviva 12:56, 27 January 2009 (UTC)
- Well, I've managed all these years without a monobook.js - so I think I'll stick to using my sandbox (so much faster to load). SemperBlotto 14:50, 27 January 2009 (UTC)
- Yah, the links only add value if you want to add (some) citations; otherwise I suppose they just get in the way. Is it easy to scrape the data you want, or should I post a separate, words-only list? -- Visviva 15:09, 27 January 2009 (UTC)
- Don't worry. I regularly scrape data from various sources for my sandbox - it's no trouble to include your Nature pages. SemperBlotto 16:52, 27 January 2009 (UTC)
Thanks for the Tracking lists!
I just came across them, and find them very interesting and useful. The semi-automated citations are also pleasing, helping to satisfy my "citations! more citations!" thirst. Keep up the good work! JesseW 07:30, 31 January 2009 (UTC)
- That's great to hear, thanks. -- Visviva 14:20, 31 January 2009 (UTC)
=> Toronto Star? SemperBlotto 14:18, 31 January 2009 (UTC)
- Ack! Thanks. -- Visviva 14:19, 31 January 2009 (UTC)
Wow... I saw from User:Visviva/NYT 20070201 that we didn't have the word cardio! I've created it, but I still can't get the "create" link to bring up anything but a blank edit box. --EncycloPetey 05:45, 2 February 2009 (UTC)
- If you add this line:
importScript('User:Visviva/pretext.js');
- to your monobook.js file, and clear the cache in your browser, it should work (but I can't vouch for IE6, as I don't have a copy handy.)
- Thanks for creating the entry. I saw it go by, and thought "geez, I can't believe we don't have that", but then I got distracted by something else before I actually did anything about it. :-) -- Visviva 10:42, 2 February 2009 (UTC)
- I'll try that, thanks. I don't have to worry about IE6, since I don't use IE when I can avoid it. --EncycloPetey 08:24, 8 February 2009 (UTC)
TNYT
Your tracking lists are a great idea, and implementation. One thing (well, one thing further to the things I've already bugged you about), though. Your citations (in the 'add' links) to The New York Times should be just that, not to the New York Times, which is not its title. (It differs from most placename + word newspaper titles in this regard, I believe.)—msh210℠ 21:50, 2 February 2009 (UTC)
- I was a bit confused about that, and I'm afraid I still am. For example, the title of the main page reads "Today's Paper - New York Times". On the other hand, their iconic letterhead reads "The New York Times". In the "Member Center", we read "Please let the New York Times representative know..." [2], and the only uses of capital-TNYT on that page are sentence-initial. On the other hand, the copyright page refers both to "The New York Times Company" and "The New York Times".[3] The Wikipedia article uses "The", but the only actual discussion appears to be this brief mention.
- Turning to the Chicago Manual of Style, 15th ed., section 8.180, we read:
When newspapers and periodicals are mentioned in text, an initial the, even if part of the official title, is lowercased (unless it begins a sentence) and not italicized.
- Given that (the) (T)NYT itself can't seem to keep things straight, I'm inclined to follow the CMOS and use tNYT. But I haven't actually looked into this closely, so there may be more considerations that I'm missing. -- Visviva 02:38, 3 February 2009 (UTC)
- It does seem kind of inconsistent to use New York Times but The Guardian... Bah. It's just, to me, they both somehow look very wrong the other way. I think I'm going to deliberately not worry about this for now, on the principle that I can flip a switch in User:Visviva/quote-news-special to fix it (for those that haven't already been substed in, that is) if it does in fact need to be fixed. Do let me know if you have anything to add to the above data points, though; I'm still very much in the dark, and I'm wondering how the Wikipedia MOS arrived at its current formulation. -- Visviva 16:57, 3 February 2009 (UTC)
monkey jacket
I always used to hear my undergraduate mentor refer to his doctoral robes as a "monkey suit". I wonder if the striped cuffs of a doctoral robe were the original reason the moniker was used. --EncycloPetey 08:23, 8 February 2009 (UTC)
- That would make sense. Curiously, the OED doesn't have a doctoral-robes sense for "monkey suit", although they do have the more general "formal clothing" sense. Would be interesting if that could be cited... Brewer, my favorite lexicographer of all time, indicates that "monkey jacket" comes from the jacket's having "no more tail than ... an ape". The OED, oddly, just cites Brewer, so I guess that's the best guess anyone has come up with. -- Visviva 08:43, 8 February 2009 (UTC)
Optional bullets
Hmm, how does this work? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 11:13, 9 February 2009 (UTC)
- If you enter for example {{R:Dictionary.com|bullet=}}, then you get the text only, without the bullet: “word”, in Dictionary.com Unabridged, Dictionary.com, LLC, 1995–present. You could also substitute another indentation/bulleting symbol ("#:", ":*") as appropriate. There may be a better way to do this; I added it because I was annoyed with the broken lists at firstly and probably elsewhere. -- Visviva 11:28, 9 February 2009 (UTC)
- Aah yes, that makes perfect sense; thanks for the explanation. I agree that the in-built bullets are annoying; IMO they should all be removed from these R: templates, and IIRC, many already have. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 11:34, 9 February 2009 (UTC)
zombie
I personally would have put the zombie bank citation under a (new) sense at zombie.
Following are context-specific, probably attestable headwords using "zombie":
- (business): zombie S&L, zombie institution, zombie company, zombie business, zombie organization
- (philosophy): zombie hypothesis, zombie world, zombie thought experiment
- (social science): zombie effect
- (computing): zombie network, zombie process, zombie client, zombie system, zombie program, zombie computer, zombie state, zombie version, zombie host, zombie path, zombie user, zombie software
- (dance): zombie dance
- (cinema): zombie film, zombie genre
In each context there is arguably (I think rather definitely) a somewhat differentiated sense of the word zombie.
Is this where we want to go now? If so, this is my partial repayment for the wonderful newspaper lists you have been providing. If not, I still owe you. DCDuring TALK 19:17, 18 February 2009 (UTC)
- Well, I only created zombie bank because I hadn't encountered this sense of "zombie" in any other context. However, I think you have a point -- I see "zombie S&L" was in use well before 1997. You know the field better than I do, so feel free to redirect or delete or whatever seems appropriate. I'm not really attached to the entry. :-)
- For the terms you list, I don't think we would want most of them -- unless, for example, a "zombie film" is something much more specific than I would imagine. "Zombie dance", "zombie effect" and "zombie hypothesis" might be worthwhile though.
- More specifically, I think that we should have entries for any of these terms that we can say something useful about. Unfortunately just figuring out whether there is anything useful to say about a phrase is often a major research project in itself; and then depending on the phase of the moon the community may vote to delete it anyway.
- In terms of CFI, it would be nice to have the same sort of sliding scale that we have for brand names -- there are a large number of brand names that could pass the test, but the burden of proof is heavy enough that we run no risk of being flooded by them, and we can be confident that those we do have are providing some value. I've been thinking about this lately in terms of apoptosis; the entry effectively programming its own fate. Not sure where to go with that, except that it probably means we should focus more on the actual content of the entry as RFD'd -- does it present useful information that cannot be derived from the sum of parts? -- than on the word itself. -- Visviva 04:34, 19 February 2009 (UTC)
I guessed that the entry was citation-driven. It annoyed me enough to provoke the research on b.g.c. into the more common noun phrases it was involved in. (I forgot zombie master, who provides the will for the willless (!) zombies.) Some of my greatest efforts have been so motivated.
I'm just not too happy with the general state of our rules for multiword entries. I suppose that the lack of enthusiasm for additional rules reflects bad experience with developing, voting on, and applying them. A more "organic" approach might be better. What troubles me is the power of narrowly focused, tendentious advocates of individual entries and classes of entries. I feel that we have a lot of cellulite, warts, cysts, fibroids, and tumors in our future without a good immune response and other corrective processes. External fitness criteria are unavailable (no P&L, no knowledge of non-editorial usage, no patron to indulge).
I suppose that there is little cost to having SoP entries. Perhaps they are a harmless outlet for would-be contributors and we need no immune response. What happens if we just let, say, all noun-noun phrases survive subject only to RfV? What if all multi-words not in some other approved dictionary were "sequestered" until cited? DCDuring TALK 11:12, 19 February 2009 (UTC)
- My feeling is that multi-word entries are, at worst, like a mild hyperplasia or low-grade infection -- depending on whether the threat is cancer or infection :-) -- and that they pose no serious threat to the long-term health of the Wiktionary organism. I could, of course, be tragically mistaken. But the thing is, unlike for example fictional entities and brand names, AFAICS there is no plausible constituency of editors that would be motivated to flood the project with these. We do have somewhat more multi-word entries than other dictionaries, but this isn't necessarily a bad thing and they are still less than 30% of our total English coverage. As long as we maintain a baseline immune response -- requiring some showing of non-sum-of-partsness -- I don't think we have much to worry about. The risk of an autoimmune syndrome, in which we hunt down and delete entries that do serve a useful role in the dictionary, seems much greater.
- That said, our actual criteria for inclusion are in great need of codification and refinement. If you feel up to the job, by all means go to it. Changing the CFI themselves may be nearly impossible, but that's no reason why we can't set up additional, less formal pages which codify and inform practice. -- Visviva 04:09, 20 February 2009 (UTC)
- For a job like that I'd need at least twice the salary. DCDuring TALK 12:34, 20 February 2009 (UTC)
- Yeah, that's about where I'm at with it too. -- Visviva 13:04, 20 February 2009 (UTC)
- I might take a run at some relatively easy part of the problem and, as you suggest, guidelines. I have seen a couple of such efforts peter out. I need to make sure I'm not pushing the buttons of major contributors, despite the obvious and egregious errors of their ways.
- Noun-noun multiwords seem useful and specific enough, with no one dead-set in favor of total inclusion or total exclusion. Who might be able to ultimately produce a list of just the noun-noun multiwords (either with both English nouns in Wiktionary or using some other list of English nouns)? Perhaps the zombie multiwords would be an interesting test of what was felt to be worth including. (As I write this last sentence, I have the feeling of having written one much like it before in a very similar context, a deja ecrit experience. Do you recall any specific, limited-scope failed or low-impact effort to clarify the application of CFI to particular groups of words?) DCDuring TALK 15:44, 20 February 2009 (UTC)
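As a rough illustration of how a list of noun-noun multiwords might be produced from a title list and a noun list, here is a minimal sketch. The function name and both inputs are hypothetical; it assumes the entry titles and the list of English nouns have already been extracted from a dump or some other word list.

```python
def noun_noun_titles(titles, nouns):
    """Return the two-word titles whose parts both appear in the noun list.

    `titles` and `nouns` are plain iterables of strings (assumed inputs).
    """
    known = {n.lower() for n in nouns}
    hits = []
    for title in titles:
        parts = title.lower().split()
        if len(parts) == 2 and all(p in known for p in parts):
            hits.append(title)
    return hits
```

Membership in a noun list only shows that each word can be a noun, not that it is one in the phrase, so the result is a candidate list for human review rather than a finished answer.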
Hi there
Hi there Visviva. I've been creating a ton of plurals lately. Could you please tell me if this is something that you want people to do here? If it is, I will keep on doing it; however, if it isn't, I will stop doing it. Thanks for the help :). Razorflame 03:36, 20 February 2009 (UTC)
- It's a fine thing to do, but it's also a good idea to do a quick "sanity check" to make sure that the plural is correct and there isn't anything seriously wrong with the originating entry. I noticed one, for example, that had the plural of "perch" as "perchs" (!). Also, there is a bot -- User:TheCheatBot -- that adds plurals and other inflected forms automatically after a while, although its operator has not been very active lately. So I wouldn't focus on adding plurals unless you find this especially rewarding for some reason. There are other forms of "gnome-work", such as adding synonyms and examples, wikifying definition lines, or fixing the various problems in Category:Requests for cleanup, which would add more value overall, and would probably be more interesting as well.-- Visviva 03:49, 20 February 2009 (UTC)
- Thanks for the tips. I find, for some reason, that adding in the plurals of the definitions is very interesting and I feel like I am helping the project in some way, shape, or form. I have also found that I am relatively good at it. Today alone, I have made over 1,200 (yes, 1,200) plural forms :), so I feel like I helped the project in a big way :). I am also slightly good at wikifying, however, I find that less interesting, as that is all that I have done on the Simple English Wikipedia, so I am kinda bored of wikifying. I really like looking for plurals, because it exposes me to some interesting words that I never knew existed (cloudberry, for example). I hope to continue doing this for the project in the future :). Cheers, Razorflame 04:01, 20 February 2009 (UTC)
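The "sanity check" suggested above could be partly automated. The sketch below compares a listed plural against the plural predicted by a few regular English spelling rules and flags any mismatch for a human to look at. The rules and function names are assumptions chosen for illustration; they are not how TheCheatBot or any other existing tool works.

```python
VOWELS = "aeiou"

def regular_plural(noun: str) -> str:
    """Plural predicted by a few common English spelling rules (a simplification)."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"

def needs_review(noun: str, listed_plural: str) -> bool:
    """True when the listed plural differs from the rule-based guess."""
    return listed_plural != regular_plural(noun)
```

A flag does not mean the plural is wrong: irregular nouns such as "child" and exceptions such as "stomachs" will be flagged too, so this only narrows down what to check by eye.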
RC by language
Hi Visviva, I've noticed your RC by language invention and I can't thank you enough. It has already helped me notice things that I would have missed otherwise. One question. Would it be possible to show all changes by language for an entire day? It says it refreshes every 15 minutes, but how far does it go back in time? Thanks again for this great idea. --Panda10 13:09, 22 February 2009 (UTC)
- Currently it goes back through the last ~100 changes, however long that is. In the case of English and other high-traffic languages, that sometimes covers less than an hour; on the other hand, for many languages it may extend over weeks or months (not yet, of course; currently the script has been running for just ~ 7 days with some interruptions). 100 is a completely arbitrary cutoff; I was thinking maybe I should make it a little higher for English and Italian (though I'm not sure exactly *how* high; 24 hours can mean thousands of edits on a busy day for English, which could lead to page-load issues). Or maybe a separate "long" page and "short" page?
- You will probably notice a few bugs in the system, but I'm glad it's useful. :-) If you try the RSS, I'm finding Google Reader is pretty good (allows you to force a refresh whenever you want, just like viewing the HTML page). -- Visviva 13:25, 22 February 2009 (UTC)
- I've never tried the RSS, but I was happy with the HTML output you provided. Can you set the number of edits shown separately for each language? Hungarian edits are usually much less than the English ones. Can you reduce the information shown on the page in order to include more edits? The "talk" and "contribs" links bring up a page not found, so they could be excluded. It would be very helpful to be able to review all Hungarian edits on a day. Lately I go through all anon edits to find possible Hungarian changes, but registered users also edit Hungarian entries and it's almost impossible to find all. --Panda10 14:45, 22 February 2009 (UTC)
- Fixed the bad contribs links (thanks -- would have *sworn* I fixed that already), and set the threshold to 250 across the board for now. Will try to swing a more complete solution to this & some other issues soon, hopefully tomorrowish. Bit too pressed for time & sleep atm. :-) -- Visviva 16:00, 22 February 2009 (UTC)
- Hungarian (and English and various other languages) now has 4 overflow pages, so at any given time there should be a bit over 500 changes available for review (just click the "More changes" link at the top). The older parts of the list do contain some duplicates and other hiccups; let me know if you spot any bugs in changes more recent than ~ 0:00 UTC on the 26th. -- Visviva 04:17, 26 February 2009 (UTC)
Would you be OK with putting it on Wiktionary:News for editors? More people should know about this. Nadando 03:59, 26 February 2009 (UTC)
- I wouldn't object, but I'm not sure it's quite ready. There have been some "odd" things creeping in since I added translations to the mix. -- Visviva 04:02, 26 February 2009 (UTC)
- I've noticed a couple of days ago that English words are mixed in the Hungarian list, even though the changes are not related to Hungarian. --Panda10 19:56, 27 February 2009 (UTC)
- Yeah, I noticed that, and I *think* I've fixed it (was caused by a poorly-designed regex). Hopefully there won't be any false positives more recent than the "up in the air" edits. -- Visviva 03:36, 28 February 2009 (UTC)
- Hmm, this is interesting: [4]. Problem is the script is reading Conrad's edit as if it had created the translations section, which in a sense -- but not a very useful sense -- it did. Will need to cogitate on that a bit. -- Visviva 13:20, 28 February 2009 (UTC)
- Darnit, I actually made it worse. >:( Think it's fixed now; guess we'll see. -- Visviva 07:18, 1 March 2009 (UTC)
- Woops, I outsmarted myself yet again. I suppose that once this is running smoothly, I would do well to just zero out the slower-moving lists, so that this crud doesn't hang around indefinitely. Blah. -- Visviva 16:41, 2 March 2009 (UTC)
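For readers wondering how a capped list with overflow pages might be organized, here is a minimal sketch. It assumes the changes for one language are already held newest-first in a list; the function name and the default page size of 100 with 4 overflow pages are illustrative values, not the actual settings of the RC by language script.

```python
def paginate(changes, page_size=100, overflow_pages=4):
    """Split newest-first changes into a main page plus overflow pages.

    Anything older than the last overflow page is dropped.
    """
    limit = page_size * (1 + overflow_pages)
    kept = changes[:limit]
    return [kept[i:i + page_size] for i in range(0, len(kept), page_size)]
```

With these defaults a busy language keeps its 500 most recent changes across five pages, while a quiet language simply gets one short page.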
Minor bug in RC by language
Links to words with apostrophes don't seem to work, e.g. hockey d'antan [5] Equinox ◑ 13:15, 27 February 2009 (UTC)
- Good catch, thanks. Should be fixed on the next cycle, if I did it right. -- Visviva 13:34, 27 February 2009 (UTC)
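The cause of the apostrophe bug is not stated here, but a common source of this kind of breakage is putting a raw title into a link without escaping it. A minimal sketch of building an entry link with percent-encoding follows; the `entry_url` helper is hypothetical, and it assumes the standard `https://en.wiktionary.org/wiki/` path with underscores for spaces.

```python
from urllib.parse import quote

def entry_url(title: str, base: str = "https://en.wiktionary.org/wiki/") -> str:
    """Build a link to an entry, percent-encoding characters such as apostrophes."""
    # Spaces become underscores first; quote() then encodes the apostrophe as %27
    # and leaves "/" alone (its default safe character).
    return base + quote(title.replace(" ", "_"))
```

For example, `entry_url("hockey d'antan")` yields a link ending in `hockey_d%27antan`, which cannot terminate a single-quoted HTML attribute early.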
obnosis
Removing a rfv rfc tag from obnosis was done by you why? The required request pages have been noted? Please use the discussion page before wide sweeping edits and especially because this page is under edit war flags.
Also, a user lisakachold can edit their own comments on the discussion page.
You did this: (cur) (prev) 03:50, 2 March 2009 Visviva (Talk | contribs) (13,243 bytes) (→Who can claim ownership for a word? rfv sense: not durably archived.) (undo) THERE WAS A rfv sense on the request for verification page - check it.
Really these are basic respectful use best practices. Threatening a user [lisakachold] for editing their own comments, removing of rfv sense tags that have not been resolved by an editor and repeatedly reverting past edits that were originally authorized back in october for rfv to move the word obnosis to a scientology only definition while whittling away media, web site and common use references is censorship not editing and certainly not acceptable use under wiktionary. Lisakachold 10:22, 2 March 2009 (UTC)
- You removed my comments, among others. Editing your own comments is usually perfectly fine. Apart from that, I think you're confused, as I didn't remove the RFC tag, that was done -- appropriately -- by Ruakh. I didn't remove the RFV tag either, but since no verification of any sense was requested, and because the one sense present in the entry is fully cited from durable sources, the tag was obviously baseless and suitable for removal. Since no RFC posting was made, and since the entry had already been cleaned up, there was no particular need for the RFC tag either.
- If you plan to stay on Wiktionary, please find something to do that does not involve wasting other editors' time. -- Visviva 10:52, 2 March 2009 (UTC)
- I'd like to re-open discussion regarding protecting this term with "Protected "obnosis": Counter-productive edit warring". It appears, simple examination and not in-depth, the user was adding current senses and citations. This seems to refute your statement of counter-productive. I have a citation from 1875 which clearly predates Mssr Hubbard et al., but have no opinion as to the so-called etymology which claims ownership of the word.
- - Amgine/talk 23:38, 8 March 2009 (UTC)
- These are interesting, though they are completely unrelated to what the user in question was adding. The Yearbook of General Medicine cite isn't visible in the snippet, which makes me suspicious; there is no way to be sure it is not a scanno. The Doctor Dispachemquic cite, which appears to be a (depiction of) a dialectal variant of "diagnosis", is definitely worth adding to Citations:obnosis. Anyway, protection expires today, so hopefully things can go forward in a peaceful and wiki-like fashion. -- Visviva 01:42, 9 March 2009 (UTC)
- Via sneaking, the 'Yearbook of General Medicine' doesn't qualify - it found "ob-" and "nosis", but not at the end and beginning respectively of consecutive lines. Agree, the 'Doctor Dispachemquic' cite is for a completely unrelated dialectic representation use. There is, of course, the question as to the justification for protection, but I wasn't johnny-on-the-spot so won't quibble. - Amgine/talk 22:59, 9 March 2009 (UTC)
- Uhm, that's a really odd diff you posted there... <didn't know Mediawiki could *do* that, actually> - Amgine/talk 23:02, 9 March 2009 (UTC)
- That is pretty bizarre. This was the diff I meant. -- Visviva 23:50, 9 March 2009 (UTC)
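The check described above (whether "ob-" ends one line and "nosis" begins the next) can be sketched for plain-text page scans as follows. This is an illustration only: the function name is made up here, and it assumes the scanned text is available as a list of lines, which is often not the case for snippet-view sources.

```python
def split_across_lines(lines, word):
    """Return indexes i where lines[i] ends with a hyphenated start of `word`
    and lines[i + 1] begins with the rest of it."""
    hits = []
    target = word.lower()
    for i in range(len(lines) - 1):
        last = lines[i].split()[-1:]      # last token of this line, or []
        first = lines[i + 1].split()[:1]  # first token of the next line, or []
        if not last or not first or not last[0].endswith("-"):
            continue
        head = last[0][:-1].strip('.,;:!?"()').lower()
        tail = first[0].strip('.,;:!?"()').lower()
        if head and tail and head + tail == target:
            hits.append(i)
    return hits
```

An empty result means the two fragments, even if both occur on the page, do not form the word across a line break, which is the situation reported for the Yearbook cite.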
Coin945
All this user's entries need to be deleted (as you have pointed out). Are you going to do it? (I was hoping to get some work done this afternoon) SemperBlotto 14:33, 3 March 2009 (UTC)
- Sure, I'll take a run at them shortly (bit busy right now myself). -- Visviva 15:10, 3 March 2009 (UTC)
- I noticed that this user's contributions are to be removed. I presume that the main problem is definition copyvio. Would it address the problem to substitute {{defn|en}} for the WorldWideWords definitions? Should the References header be removed as well? DCDuring TALK 16:04, 3 March 2009 (UTC)
- When possible, I think it's preferable that copyvio be removed entirely from history (since the history is also part of Wiktionary, and past revisions are licensed under the GFDL just as much as current ones), and this can only be done by deletion. A list of the deleted entries could then be retrieved from Special:DeletedContributions/Coin945 and stub entries could then be created as you suggest, with a clean slate. -- 16:10, 3 March 2009 (UTC)
- OK. I had completely revised autohagiography before I thought to ask the question. I will delete and re-enter. Many of the words look like keepers. DCDuring TALK 16:40, 3 March 2009 (UTC)
- You can delete and then undelete only good versions instead of rewriting.—msh210℠ 16:57, 3 March 2009 (UTC)
- OK, I have dealt with all those for which Coin945 was the most recent editor. It would be great if somebody else wanted to check through the few remaining entries at Special:Contributions/Coin945, at their leisure. -- Visviva 17:05, 3 March 2009 (UTC)
Broken newspaper word pages?
What's going on at User:Visviva/Observer_header and User:Visviva/Toronto_Star_header? Equinox ◑ 20:19, 5 March 2009 (UTC)
- Not sure... what seems to be the problem? The transclusions I looked at seemed OK. (The "invalid time" messages on the template pages themselves are innocuous.) -- Visviva 02:07, 6 March 2009 (UTC)
Bot account
If you're going to be running massive tasks, could you do them with a bot flag please - otherwise RecentChanges is unusable. (marking the edits Minor helps to some extent). Conrad.Irwin 13:53, 7 March 2009 (UTC)
- OK, sorry. I was just doing this by hand in tabs (so I could see if there were any issues along the way), so it didn't occur to me. Maybe I'd better switch that "mark all edits minor" thing on. ... Been thinking about a bot account for walled-garden uploads anyway, guess I'd better get that process started. -- Visviva 13:58, 7 March 2009 (UTC)
- Started vote at Wiktionary:Votes/bt-2009-03/User:Walled gardener for bot status. -- Visviva 15:09, 7 March 2009 (UTC)
- And bureaucracy marches on. There's another 30 minutes of my life wasted. >:-( Well, if Connel runs automated scripts under his own account I suppose I can do the same. Makes it easier to add the pages to my watchlist anyway. I guess the moral of the story is that's what I should have done in the first place; the only reason the edits you were complaining about weren't marked as minor was that I wasn't automating them. -- Visviva 09:49, 9 March 2009 (UTC)
Recently I decided to create Citations:bridewell, but remarked that the main entry has been deleted by you. Wherefore, was there no meaningful content? If there was somethink like prison, a house of correction for the confinement of disorderly persons, then it was well writ. The uſer hight Bogorm converſation 15:26, 7 March 2009 (UTC)
- The one I deleted was a copyright violation. -- Visviva 15:31, 7 March 2009 (UTC)
- I've added an entry. SemperBlotto 15:30, 7 March 2009 (UTC)
NYT archives
How many years back can you extract word lists? It would be interesting to see if there is a large increase or decrease in words we don't have going back 50 or 100 years. Nadando 18:45, 8 March 2009 (UTC)
- Sadly, the free archive only goes back to mid-2006; after that you have to pay for each article, which would get pretty expensive. Similar issues apply to the Guardian and most others that I've looked at.
- One resource that could be used for this is Time, which has proofread text archives back to 1923. I've had some difficulties with setting up bulk downloads for that site (go figure), so I've sort of put it on the back burner for now. But it might be possible to get some non-case-sensitive info from the Time Corpus. -- Visviva 01:56, 9 March 2009 (UTC)
, recent changes per language
The random pages function seems to have been down for several days. From what I've read, it occurred to me that I should be able to use the above for basically the same purpose (I use it for studying Chinese and Taiwanese). Could you tell me how to access the function you created? --史凡 11:04, 14 March 2009 (UTC)
- Sure. The main page is here. There are lists for Mandarin and Chinese, but the latter is almost exclusively translation-section edits. The Translingual list is dominated by Han characters, so that might be of interest as well. The lists for Cantonese and Hakka are very quiet, and there is no list at all for Taiwanese, which probably means that either no one has edited a Taiwanese entry in the past couple of weeks, or it's called something else in the language header. Note that anything with a "t" flag is an edit to a translation section of an English entry.
- Not sure if these lists are quite suitable for your purposes, but let me know if there's anything I can help with. (I've been thinking of sprucing up the functionality somewhat, but that's unlikely to happen in the near term.) -- Visviva 11:39, 14 March 2009 (UTC)
- It finally occurs to my sluggish brain that you are probably looking for Min Nan. But you probably knew that already. ;-) -- Visviva 12:31, 14 March 2009 (UTC)
Discussion pages in RC by language
Hi Visviva, I use your invention every day to check Hungarian changes. I've noticed that changes to entry discussion pages are not included. It was just a coincidence that I found a note at -aid. Any plans to include it? Thanks again for this great tool. --Panda10 22:38, 16 March 2009 (UTC)
- Well, it gets problematic when there are multiple language sections, since they all share a single Talk page. But since we don't use Talk very much anyway, I suppose the amount of noise would be minimal. Or I could just track only those pages with one language section -- but that would still leave important things out. Hmm...
- Sorry about the outages today, btw. Should be back on track shortly. Glad you're finding it useful. -- Visviva 05:03, 19 March 2009 (UTC)
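The single-language-section idea mentioned above could be sketched roughly as follows. This is only an illustration, not the tool's actual code: the function names `languageSections` and `talkPageLanguage` are hypothetical, and it assumes language sections are marked by level-2 headings such as `==Hungarian==`.

```javascript
// Hypothetical helper: list the level-2 (==Language==) headings in an entry.
function languageSections(wikitext) {
  const headings = [];
  for (const line of wikitext.split("\n")) {
    // Exactly two "=" on each side; "===Noun===" must not match.
    const m = /^==([^=].*?)==\s*$/.exec(line);
    if (m) headings.push(m[1].trim());
  }
  return headings;
}

// Hypothetical rule: attribute a Talk page edit to a language only when the
// entry has exactly one language section; otherwise report it as ambiguous.
function talkPageLanguage(entryWikitext) {
  const sections = languageSections(entryWikitext);
  return sections.length === 1 ? sections[0] : null;
}
```

As noted in the reply, this would still leave out Talk edits for entries with several language sections, so it trades coverage for less noise.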
I dunno about this one. "womens team", "womens basketball", "womens fashion", "womens department", "womens clothing" all seem fairly common. (As are the versions with the apostrophe either before or after the "s".) It is a nuisance to cite. In terms of acceptability women's > womens > womens', IMHO.
As I recall we do not permit possessives as inflected forms. Or is it forms with apostrophes that we don't permit? It seems that womens might offer a good home for a constructive usage note on possessives. Thoughts? DCDuring TALK 23:14, 18 March 2009 (UTC)
- I would have just swatted it as a typo, except that it's on the Hotlist -- but that apparently is due to the OED's entry for "womens" as a US dialectal/nonstandard plural form ("them womens is gettin' me down" and so forth), not a possessive. If we include that, as I suppose we should, then I guess we have to mention the possessive as well. (And this and mens may indeed be common enough to merit inclusion in their own rights.) Main usage note at women, with a mention in womens, perhaps? -- Visviva 04:59, 19 March 2009 (UTC)
- As you say would be better. Surprisingly common. I think many people who would not write "Joes" as possessive would write "womens team". I wonder why. DCDuring TALK 11:29, 19 March 2009 (UTC)
Thanks for commenting on the category move proposal. There's a lot of cleanup to do on labels and categories, and your contribution has helped clarify my ideas. But I'd rather not get this simple misnaming of a category tied up in all those other questions.
Anyway, Carolina's comment has prompted me to rethink the proposal, and I've restarted it under a new heading. Please leave a note there. Thanks. —Michael Z. 2009-03-19 16:00 z
source code of the script for RC by language
Hello!
I find your tool [6] great! Is the source code available, so that I can adapt it to the French Wiktionary? Thanks!
Koxinga 16:29, 25 March 2009 (UTC)
- The code I'm using now is pretty sloppy, and I've been meaning to clean it up before posting. However, since I don't seem to have much time lately, I will just go ahead and dump it shortly, probably at User:Visviva/rclib.py. -- Visviva 05:51, 31 March 2009 (UTC)
- So, did you get around to doing it? It's not urgent or very important, but it is a very nice tool and I would like to see it used on other Wiktionaries. Koxinga 22:24, 11 May 2009 (UTC)
Welcome back again
Welcome back again! DCDuring TALK 15:51, 29 March 2009 (UTC)
- Thanks, it's good to be back again. :-) I'm still at a low level of activity, but will try to at least catch up on what I was doing when I disappeared. -- Visviva 05:51, 31 March 2009 (UTC)
- Nature and Science updates would be appreciated. Keep up the good work. SemperBlotto 10:57, 4 April 2009 (UTC)
Policy changes and voting
I seem to recall that you started a BP thread some time back proposing that we loosen the vote requirement on changes to policy pages, but I can't seem to find it. I think the time is ripe to create the vote to make such a thing a reality. Do you recall where that discussion is, so I can create a vote and link to it? Of course, if you'd like to create the vote yourself, you certainly have more of a right than I. It was your idea in the first place. -Atelaes λάλει ἐμοί 22:19, 30 March 2009 (UTC)
- Please go ahead; I meant to move on this some time ago but ended up getting swamped in RL. The old discussion is at WT:BP#Revising_Template:policy. Various points were raised regarding wording; personally I am in favor of anything that moves us toward greater flexibility and wikiness. -- Visviva 05:51, 31 March 2009 (UTC)
- Enacted at Wiktionary:Votes/pl-2009-03/Removing vote requirements for policy changes. Please go vote so that people will realize the futility of opposing such a clearly inevitable measure. :-) -Atelaes λάλει ἐμοί 06:23, 31 March 2009 (UTC)
American Illustrated Medical Dictionary
Do you have any use for http://dax.wustl.edu/~msh210/AIMD.pdf, or can it be deleted?—msh210℠ 02:33, 31 March 2009 (UTC)
- I've got a copy, so it can be deleted AFAIAC; thanks for asking. (I was kinda hoping BD or somebody would scrape the text, but until that happens there's not much to be done with the PDF anyway... come to think of it, I'll be back stateside in a few months & may take a stab at it then.) -- Visviva 05:51, 31 March 2009 (UTC)
- All right: gone: thanks.—msh210℠ 15:44, 31 March 2009 (UTC)
Hi Visviva,
Since you commented at Wiktionary:Beer parlour#Transwikis from other Wiktionaries., I wanted to make sure you were aware of the resultant vote, Wiktionary:Votes/2009-03/Transwikis from other Wiktionaries.
—RuakhTALK 13:42, 31 March 2009 (UTC)
I seem to remember creating this and don't remember violating copyright on it. Can you please reconsider your deletion, or userfy the page for me? Stifle 10:31, 14 April 2009 (UTC)
Tea Room discussion of the supernumerary plural forms of deus ex machina
Hi Visviva. You asked here for a Tea Room discussion of the plural forms of deus ex machina; though not in the Tea Room, there is such a discussion (of sorts) now taking place on the talk page of {{en-noun}}. Just FYI in case you want to contribute to the discussion. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 16:33, 14 April 2009 (UTC)
Notices of the AMS
I know I haven't been keeping up with adding the Notices redlinks you've posted thus far, but would you mind catching up on the issues anyway? I do hope to get to them sooner or later. (And, obviously, others, too, may wish to.) March, April, and May are now available, I believe. Thanks much.—msh210℠ 17:14, 14 April 2009 (UTC)
Hi, thanks for writing this - it's much nicer than abusing AutoEdit, which was my solution. If you get time, could you possibly update it so that it doesn't throw a JavaScript error? The following fix is from the slightly modified version I'm using in Simple Wiktionary's acceleration, which in addition to catching the error also adds a preload summary field. (If you want me to fix it I'm happy to, and I'd also be happy if you imported the preloadsummary ;). Thanks Conrad.Irwin 22:46, 19 April 2009 (UTC)
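function preloadText() {
  var pretext = "";
  var presummary = "";
  var parts; // declared locally so it does not leak into the global scope
  try {
    // Split the query string (everything after "?") into key=value pairs.
    parts = window.location.search.split("?")[1].split("&");
  } catch (e) {
    return; // not an edit page
  }
  // (the rest of the function is not included in this excerpt)
}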
- Thanks for taking care of that, and sorry for having been out to lunch for so long. Please feel free to make any other improvements that seem worthwhile; my coding "skills" are unlikely to match yours in the foreseeable future. :-) -- Visviva 14:37, 19 August 2009 (UTC)
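The error-catching idea in the snippet above can also be written as a standalone function that never throws. What follows is a minimal sketch, not the script actually used on either wiki: the function name `parsePreloadParams` and the query keys `preloadtext` and `preloadsummary` are assumptions made for illustration.

```javascript
// Hypothetical sketch: pull preload text and summary out of a query string.
// The key names "preloadtext" and "preloadsummary" are assumed, not confirmed.
function parsePreloadParams(search) {
  const result = { text: "", summary: "" };
  const query = search.split("?")[1];
  if (!query) return result; // no query string, so not an edit page
  for (const part of query.split("&")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip parameters without a value
    const key = part.slice(0, eq);
    let value;
    try {
      // "+" stands for a space in query strings.
      value = decodeURIComponent(part.slice(eq + 1).replace(/\+/g, " "));
    } catch (e) {
      continue; // malformed percent-encoding: skip rather than throw
    }
    if (key === "preloadtext") result.text = value;
    if (key === "preloadsummary") result.summary = value;
  }
  return result;
}
```

In a browser this would be called as `parsePreloadParams(window.location.search)`; checking for a missing query string up front avoids relying on an exception for the common non-edit-page case.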
RC by language
First of all I would like to thank you for this most valuable tool. And second of all, I wish to express my hope that you will be back soon and continue updating it. Cheers! --Vahagn Petrosyan 18:50, 20 April 2009 (UTC)
- Hope to get this running again soon. (On a remote server this time, so it won't be subject to the vagaries of my home internet connection.) -- Visviva 14:40, 19 August 2009 (UTC)
- Welcome back! Are RSSs going to be working soon? I miss those. Also, how about adding rarely edited languages like {{udi}} and {{oge}} to the list? --Vahagn Petrosyan 19:23, 28 August 2009 (UTC)
- Oops, I'm an idiot. RSS feeds are up (or should be), at http://www.fraktionary.com/rc/(Language).html Links should be fixed on the next cycle.
- The languages list is mostly from RU's L2 list as of early this year, so I'll just need to update it. Should be able to get that done later today. Glad you've found this useful! -- Visviva 02:42, 29 August 2009 (UTC)
- Visviva, are you coming back soon? I'm starting to miss RC by language again. :) --Vahagn Petrosyan 11:44, 15 November 2009 (UTC)
- Should be running again now. Sorry about that; looks like the server got a case of the hiccups as soon as I stopped paying attention to it. -- Visviva 14:44, 15 November 2009 (UTC)
- Thanks for fixing so soon. --Vahagn Petrosyan 17:06, 15 November 2009 (UTC)
A while ago you were working on the RfV senses of the word do. Rather than try and figure out what the results were myself I was wondering if you could take care of the 4-5 RfV senses. It seems like a lot of work took place outside of the RfV discussion and I don't want to make the wrong calls based on lack of information. Thanks, [The]DaveRoss 22:54, 1 June 2009 (UTC)
- I would keep all, as I know that there are 2 more citations out there for each of these. It seemed at the time that adding more than one would be overkill, but I think that was a lapse of judgment on my part. I should probably add those citations before removing the tags. -- Visviva 15:01, 19 August 2009 (UTC)
Walled gardener
What ever happened with Wiktionary:Votes/bt-2009-03/User:Walled gardener for bot status? Looks like it passed and wasn't effected.—msh210℠ 22:49, 18 June 2009 (UTC)
- It does indeed. Who should I needle, I wonder? -- Visviva 14:35, 19 August 2009 (UTC)
Where are you?
Hello. I trust you have found something more pressing or enjoyable to do, and haven't died or anything. I miss your newspaper additions! Equinox ◑ 23:32, 20 July 2009 (UTC)
- Fortunately I had the scripts running on my computer the whole time. (Ask me why I didn't just take a few minutes to set up automatic daily uploads.) Unfortunately the Guardian has been doing something odd with their daily archives (so that if one clicks on a subsection, one gets a page full of stories dated the previous day), so it will take some fiddling before that particular set of lists is back online. But NYT is going up now, and the Star and Herald Sun will follow shortly (though the Herald Sun is going to have some lacunae, I fear). -- Visviva 14:34, 19 August 2009 (UTC) UPDATE: Think I had better wait until that bot flag comes through, don't want to clog RC too much.
- Many welcomes. Perhaps you can talk some sense into me about some things. DCDuring TALK 15:16, 19 August 2009 (UTC)
- Well, I can try. :-) Anything particular in mind?
- You know, as much of an ass as I was for just dropping my tools and waltzing away for months on end, it has lent me a new perspective on things. It is particularly striking just how eerily unchanged the community discussion is from 5 months ago -- even extremely minor reforms are still stalled (or abandoned). We have some serious problems here. -- Visviva 15:49, 20 August 2009 (UTC)
- There was some significant unpleasantness. Your thoughts are usually most especially valuable shortly after the heat of the moment. (Nothing does much good in the heat of the moment.)
- But as to the absence of progress, I can conjecture about causes. One is that folks just don't like policies. That might be due in part to the fact that we don't have good draftsmen. We also don't seem to have a good sense of the consequences of our rule-making. The combination of the latter two with the general bad experience in getting policies/rules passed means that a policy can have unintended consequences of unknown magnitude which can be corrected only with work.
- Another cause is an absence of consensus about goals, even at the most basic level. Our slogans and other writings, though fine sounding, are not wholly adequate. Recently a new contributor took a "strict constructionist" reading of the first line of WT:CFI: "A term should be included if it's likely that someone would run across it and want to know what it means." He felt he qualified as a "someone" and added a noun phrase from a song lyric which he had run across, wanted to know what it meant, did some research to find out, and, wanting to share in a wiki spirit, made an entry for it. (Recall that just below the sentence quoted is the suggestion that term is to be interpreted broadly.)
- Even "All words in all languages" turns out not to be what we mean, unless we footnote "all", "words", "languages", and possibly "in" (See WT:RfD#avant la lettre for a case where "in" might be better read as "between".)
- There are fundamental questions as to which needs of which users are to be taken into account in designing entries and including headwords. There are questions as to who should vote in our largely invisible votes. We are already far removed from the idea of encouraging interested users to make contributions.
- But such conjecture about root causes doesn't help very much with how to fix things. Perhaps we need to get a bunch of the most "technical" of changes to WT:ELE (or possibly WT:CFI) through so that voting isn't felt to be so dreadful. DCDuring TALK 16:57, 20 August 2009 (UTC)
- Hi again. You know how your lists say "List status: open"? A while ago (I haven't kept it up, unfortunately), I started going through the lists from the beginning, doing every word I could, and not moving on to the next list until I'd completed (or given up on) the previous one. Is there a procedure for "closing" a list when only sequestered words remain? Equinox ◑ 22:17, 7 September 2009 (UTC)
- You can just add "|status=closed" to the header template. This doesn't actually do anything at present (apart from changing the header to say "This list is closed"), though it probably should. Thanks for your work! -- Visviva 10:18, 8 September 2009 (UTC)
- I did it at User:Visviva/NYT 20070207, but the list is still marked open. Did I make a mistake? (Note: the following are also closable/closeable: User:Visviva/NYT_20070120, User:Visviva/NYT_20070125, User:Visviva/NYT_20070202, User:Visviva/NYT_20070205.) Equinox ◑ 22:50, 9 September 2009 (UTC)
- Sorry, I meant specifically the "foo_header" template; {{wordhunt stats}} just makes the little numbers at the bottom. I can bottify this when I have time, so no need to bother if it's a hassle. (Speaking of little bits of code that I put off for way too long, finally got my Guardian tracker fixed. Will probably only post the more recent lists here, and put the rest on Fraktionary where they can add constructively to the wordpile. I wot it's probably about time to clean up my userspace a bit. ;-) ) -- Visviva 02:39, 10 September 2009 (UTC)
- Heh. I can see your footprints. Good work! But I wonder if there is anything I can do with the templates that would make it easier to include the citations when creating entries? My goal with the "add" links was to make it so that it would be as easy to create an entry with at least one citation, as it would be to create one without (or ideally, even easier). If there's anything I can do to make this more of a reality, please let me know. -- Visviva 04:08, 10 September 2009 (UTC)
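The "bottify" step for closing a list, as discussed above, might look something like the sketch below. It is an illustration under stated assumptions, not an existing bot: the function name `closeList` is hypothetical, the header is assumed to be transcluded as a template whose name ends in `_header`, and the `date` parameter in the usage example is invented.

```javascript
// Hypothetical sketch: set |status=closed on the first "..._header" template
// call in a list page, leaving other templates (e.g. wordhunt stats) alone.
function closeList(wikitext, headerSuffix = "_header") {
  // Template name ending in the suffix, optional parameters, closing braces.
  const pattern = new RegExp(
    "\\{\\{([^{}|]*" + headerSuffix + ")((?:\\|[^{}]*)?)\\}\\}"
  );
  return wikitext.replace(pattern, (whole, name, params) => {
    if (/\|\s*status\s*=/.test(params)) {
      // A status parameter already exists: overwrite its value.
      const updated = params.replace(/(\|\s*status\s*=\s*)[^|]*/, "$1closed");
      return "{{" + name + updated + "}}";
    }
    // Otherwise append the parameter.
    return "{{" + name + params + "|status=closed}}";
  });
}
```

For example, `closeList("{{User:Visviva/NYT_header|date=2007-02-07}}")` would yield the same call with `|status=closed` appended, while a page containing only `{{wordhunt stats}}` comes back unchanged.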
Minor formatting quibble with {{quote-news}}
Hello Visviva -- This is pretty picayune, but it has bugged me a bit for a while. In the quote-news template, the title of the quoted article is placed in quotation marks, as it should be, and followed by a comma. However, the template places the comma outside the closing quotation mark. In most style guides, and in professional printing, the comma is generally placed inside the closing quotation mark. (The same is usually done with a period when a sentence ends with a quoted passage or the title of a minor work, but I digress.) Any chance you could change it? -- WikiPedant 16:04, 5 September 2009 (UTC)
- If I recall correctly, this was explained to me a long while back -- probably by EP -- as being the preferred Wiktionary style. I believe the explanation was something like: a) preferred usage differs between North American and British English regarding the placement of the closing quotation mark; and b) because the British usage is less ambiguous -- i.e. there is no question of whether the comma/period/semicolon was part of the quoted text -- it is preferred here. As a Leftpondian myself, this always gives me problems; I have to reset my internal language filter whenever I switch between Wiktionary and working on some RL document. But it does make some sense as a broad rule -- for example, it means that templates like {{term}} can be placed at the end of a sentence without inconsistency -- and if we're going to follow this practice consistently elsewhere in entries, it surely makes sense to follow it in citation lines as well. Will try to find past discussion of this; I fear this is yet another lacuna in our documentation. -- Visviva 03:52, 6 September 2009 (UTC)
- I never could find that discussion, so I've started a new one at: Wiktionary:Beer_parlour#Quotation_marks_and_terminal_punctuation. -- Visviva 07:40, 26 September 2009 (UTC)
3 at once
- POS tests: great idea. I'd been thinking about such a thing for a while, especially in the weeks since my CGEL (awe-inspiring) arrived. I'm also interested in sufficiency conditions for idioms in hopes of making sure that we (I !!!) don't waste too much time on RfDs.
I was reading Jespersen and found two items of possible interest to you:
- Apparently in EME verbs written as ending in -eth were often pronounced as if they ended in -s, which was apparently the norm for much colloquial English.
- The current state of affairs with respect to -ing forms is a jumble that has emerged over years. The first -ing forms were from nouns. It is because verbs came to be spelled the same as some of those nouns that English speakers came to think of the -ing as a normal inflected form of all verbs.
I don't know what modern scholarship thinks of these conclusions, but Jespersen seems fairly sound. I do know that CGEL makes a point of being ahistorical. They specifically say they don't care about the diverse origins of the -ing suffix. DCDuring TALK 18:50, 5 September 2009 (UTC)
- 1. Thanks, it seemed like an idea whose time had come. Once I have it filled out a bit, I'm hoping to projectify the page to Wiktionary:POS testing or wherever. That way, if other editors have sources with useful takes on these issues, those can also be added. I'm thinking the basic areas that need to be covered are: participle vs. noun (doneish), participle vs. adjective, and noun-adjunct vs. noun.
- 2. Interesting. I don't have a particular interest in this entry group; I was only editing that batch because they were among the 100 or so English entries still using "Verb form" as a POS header, which I thought had been banished from the wiki long ago. (Many of those, though not overtly vandalistic, were created by a well-known vandal, and had not been edited since. I've now created a full list of entries from the dump that still have his UID on them.) Still, it's something that will need to be addressed as people add more Pronunciation sections to these.
- 3. Looks like we have the very beginnings of coverage of this issue at -ing. Definitely need more depth, either there or maybe at a separate appendix that could be linked widely. I do wonder about that etymological breakdown in -ing, given Huddleston & Pullum's (IMO well-considered) stance. ELE doesn't really provide for convergent etymologies... -- Visviva 05:45, 6 September 2009 (UTC)
- 2. I have put a little something at -eth#Pronunciation. It is fairly arcane. I wouldn't have mentioned it if I hadn't noted your -eth work.
- 3. We sometimes have etymology language "influenced by", but that seems insufficient for this case.
- It is very hard to make Wiktionary all the things we want it to be at the same time. Open and consistent; inclusive and high quality; monolingual and translating; modern and historical; multi-generational; complete and not redundant; complete and compact; complete and not confusing. DCDuring TALK 15:17, 6 September 2009 (UTC)
- 2. Looks great. If we can cite it up a bit, then all of the -eth forms that have "Pronunciation" sections can have a note linking back to that.
- 3. I would be inclined to discuss both etymologies under a single heading, but I'm afraid our resident etymologists wouldn't care for this. Shall have to marshal my sources before I attempt anything. :-)
- 4. I like to think of Wiktionary as a sort of meta-dictionary rather than a single work; an enormous bookshelf of dictionaries, if you will. At the moment they are rather difficult to separate, but there is no reason in the future that we could not make it easy for someone to pull out, say, a "Wiktionary of Legal and Business English" or an "Italian-English Scientific and Technical Wiktionary", just as if they were in some well-appointed university library, without having to deal with the other aspects of the project at all.
- So my primary concerns now are a) how to get more and better information in, and b) how to get more, better and cleaner information out. http://fraktionary.com/index.php/Special:Random is where I'm at on the first. (It's a bit messy now, and due for a big update, which will get the total number of wordpages to about 80,000 if I don't crash the server ... hopefully will get that done this week). I haven't made much progress on the second yet, but see User:Visviva/Page of the day -- not actually updating daily yet -- for a sort of notional markup of how such a slice might look. (this would be the "English-English, all words, definitions only" slice). -- Visviva 02:11, 8 September 2009 (UTC)
- It would be easier to think of it as a meta-dictionary if the data structure were better. I'm used to relational data structures (in the abstract anyway) and I can more readily imagine views built on such structures than on what we have. Our data seems a mess.
- I hope that our PoS headers are sufficient to support such efforts. In English, I suppose that we are not too bad if we look at single-word entries, except for Interjections and both participles. I don't think that the same is true for multiword entries, where we have had Proverbs and several approaches to other terms (Idiom as L3 header, Phrase as L3 header, assignment to basic PoS), none of which are completely satisfactory. Idiom is fast disappearing in English entries (though it is fairly common in CJK). I had thought the assignment-to-basic-PoS approach was best and would cover all cases. I don't think it covers all cases and my assignments to PoS would not pass muster with CGEL. (H&P do seem rather contentious.) CGEL's analyses are fairly compelling, but I don't think that we can dispense with any of the traditional PoS lexical categories that we use as L3 headers.
- I have been thinking of adding more modern grammatical categories to the English multi-word entries, especially to clauses, elliptical expressions, sentences that are not proverbs, sentence adverbs, imperative expressions. I have begun all except for the ellipticals. Category:English sentences is reasonably well populated. I may eventually get to some of the non-typical and marginal members of PoS categories like prepositions.
- The non-uniformity of entries, especially non-English, is pretty distressing. The number of English entries that are grievously incomplete upsets me often. How do those who reuse this information deal with the mess? DCDuring TALK 03:25, 8 September 2009 (UTC)
- User:Visviva/GSL coverage might be a useful basis for triage, if you're concerned about incomplete entries for core vocabulary. "Shortfall" is the difference between the number of full senses in Dictionary.com's version of the Random House Unabridged and the number in the corresponding English section on Wiktionary. As you can see if you sort by "shortfall," our median shortfall is 4 definitions (almost 5), which isn't that bad all things considered; but it does make one realize why we aren't anyone's go-to reference just yet. (It's somewhat more discouraging if you realize that RHU number excludes subsenses, while any Wiktionary subsenses are included in the count.) I'm thinking of marching through the list in reverse alphabetical order until I get bored, but would be interested in something more systematic/collaborative.
- I have started doing something similar. I was interested in using Longman's DCE, MW3, and McGraw-Hill's Dictionary of American Idioms and Phrasal verbs to estimate headword shortage and sense shortage. I was hoping to use DCE for a first cut at quality, because it is innovative and closer to us in spirit than many of the others, our de facto target audience is not too dissimilar from theirs, and it is the most basic senses that concern me most. On the headcount estimation score you won't be surprised to know that we have all of DCE's entries on one sample page and that our raw count of senses is above theirs. In comparison with MW3 over the same range of entries, by far the largest source of headword deficiency was in taxonomic names. Other technical vocabulary was also a source of deficiency. I have not completed the quantitative analysis.
- I'm not sure of the approach to doing this. Attempting to make this an insiders' project seems unlikely to succeed. We need to enlist a significant number of contributors and provide meaningful help to them: explicit, up-to-date help on the art of defining; model entries of various types (each PoS + some meaningful semantic categories); some sort of hints about finding citations. The "open" PoS classes seem like the ones where we do not necessarily need the most senior contributors to do all the work. The closed PoS classes are not particularly well done either AFAICT, without much enthusiasm for working on them among those who I would seek to rely on. It seems to me that we don't have nearly enough committed EN users to make any class of words systematically of high quality.
- Because folks contribute in such small units it seems that we would need to have an approach of creating lists of entries that need specific kinds of review and then improvement. We would need some kind of protocol for review of entries and a set of sense-focused tags. I have regretted the premature removal of the Webster's tags from many of our entries because those tags provided a good indication of the presence of obsolete senses and dated wording.
- Well, I can do targeted cleanup lists. My general perception is that these lists don't attract a whole lot of attention, but here are a few I've cooked up from the definitions dump: User:Visviva/Cobwebs. Those cover the three issues that I'm most familiar with from ex-Webster's entries: "as" followed by comma, semicolons, and citing an authority by name only on the definition line. The /dashes list is of almost manageable size; the other two are in the thousands. If there are other problematic patterns you've noticed, I can try to run a list for them as well. -- Visviva 10:53, 11 September 2009 (UTC)
- There are precious few reusers of Wiktionary at this point, AFAIK, for a number of excellent reasons (stiff competition, no marketing, difficult to extract data; plus most of our content is not really ready for showtime yet in any case). But the thing about the slice model is that it doesn't really matter if, say, our Yiddish and Vietnamese entries suck -- or even if 90% of our content sucks. Thus, as long as minimal, AutoFormat-type standards are met, it makes more sense to focus on the specific areas where we can do well, so that there are some aspects of Wiktionary content that people will go out of their way for. Someone using the Italian-English Technical Wiktionary isn't going to care about our German entries, or even our English entries for that matter. And if we can develop some areas of flagship content, attracting and directing the energies of new contributors will become much easier. -- Visviva 05:17, 10 September 2009 (UTC)
- Do you have any ideas about the selection of target areas or target users? I would like to think that we could be clever and develop a framework for working on entries that would have us simultaneously recruit contributors and create model content. Would updating the Webster's tagged entries be a way to get started? That might allow us to develop the skills in crafting definitions, to make explicit some criteria for good definitions, and to develop some processes of application to other lists of target entries. DCDuring TALK 11:03, 10 September 2009 (UTC)
In re your suggested technical fix to accommodate the variable display of different spellings
Hiya. I responded in that discussion. Just FYI, in case you’d’ve missed it! Regards, † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 01:10, 10 September 2009 (UTC)
Any chance of getting this updated, or were you simply waiting for more lists to accumulate first? Nadando 02:38, 10 September 2009 (UTC)
- I've been working on something slightly more insane, which is still not quite ready for showtime, but from which that page could draw a random sample. See e.g. [7]. But in the meantime, I can still update it in the usual way (there is some junk in the mid-July NYT lists that was causing some problems, but I should be able to take care of that somehow). Will try to get to this in the next couple of days. -- Visviva 02:47, 10 September 2009 (UTC)
Cobwebs
There seem to be a great number of fairly common words in the "dash" list, judging from the "a"s. I have "marked" them by inserting {{R:OneLook}}. I have only completed one, which I have removed from the cobweb list. There are 26 more, which suggests that my assessment of useful words on the list would yield low hundreds of entries from the entire dash list. If you have the chance and it isn't too hard, the intersection of (the union of the three cobwebs lists and entries using {{webster}}) and the GSL list would be an excellent definition cleanup list. I hope that it isn't too close to being the entire GSL list.
These can be very time-consuming and draining. Perhaps we should have a "heavy-lifting" definition cleanup list recommended for EN-3s and up.
I was thinking of adopting the practice of moving quotations of obsolete senses to the citations page for longish entries. And probably also moving many of the older quotes for current senses, especially if there are more than one for the sense.
My plan is to work through each of the 26 as completely as I can (leaving the actual removal of the last cobweb item to the end). I hope I will learn something about how to speed up the process. DCDuring TALK 15:37, 11 September 2009 (UTC)
- Wasn't too hard -- Python's "set" objects did pretty much all of the actual work for me. See User:Visviva/Cobwebs/GSL. At 277 words, it's only a little over 10% of our total GSL coverage, which is really not that bad (or at least, it wouldn't be that bad if these were the only problems such entries suffer from).
- It's true that the closer one gets to the core of the lexicon, the more brutally difficult (and unenjoyable) the work becomes. But I think we could usefully point any eager newbies to the lower-frequency words on the semicolon list.
- I have very mixed feelings about removing quotations from entry pages under almost any circumstances. The problem is that once on the citation page they lose that direct connection to a specific sense which is their only real reason for being here in the first place. This is particularly true for highly-polysemous core vocabulary. On the other hand, getting these old quotes into proper shape is an enormous pain, and one does it knowing that the quotation is purely an import artifact; no Wiktionarian has come along and said "hey, that line from Sonnet 30 would be perfect here!" Plus a lot of the Webster's quotations lack any real illustrative value anyway. I have occasionally been tempted to just throw them out, but have only done so in extreme cases (e.g. a sentence fragment that proved to come from a long and turgid sentence with nothing to recommend it). -- Visviva 17:54, 11 September 2009 (UTC)
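For anyone wanting to reproduce the list, the set arithmetic described above (the union of the three cobweb lists and the {{webster}} entries, intersected with the GSL list) can be sketched roughly as follows. This is only an illustration, not the script actually used: the function name, variable names, and toy word lists are all made up, and the real inputs would come from the definitions dump.

```python
def cleanup_targets(cobweb_lists, webster_entries, gsl_words):
    """Return GSL words that are on any cobweb list or use {{webster}}."""
    # Union of the cobweb lists and the {{webster}}-tagged entries.
    flagged = set().union(*cobweb_lists, webster_entries)
    # Intersect with the GSL list to keep only core vocabulary.
    return flagged & set(gsl_words)


# Toy data (hypothetical); real lists would be read from the dump.
dashes = {"abandon", "zax"}
semicolons = {"able", "abaft"}
authorities = {"about"}
webster = {"above", "abandon"}
gsl = {"abandon", "able", "about", "above", "accept"}

targets = cleanup_targets([dashes, semicolons, authorities], webster, gsl)
print(sorted(targets))
```

Words that are flagged but not in the GSL (here "zax" and "abaft") drop out, as do GSL words with no flags (here "accept").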
- Thanks. Glad it wasn't too hard and glad that there aren't too many.
- The combination of obsolescence of so many senses, the obsolescence of the quotes, and the need to find the quotes again to format them to our standard makes me think that I will go ahead. Had your reservations been stronger I wouldn't. Wouldn't it be nice if we could build a complete formatted citation from one of these fragments? Our existing templates do speed the job a bit, but....
- I look forward to the day when a click on something at a sense unlocks the riches associated with that sense. Until that day, I'm going to just try to implement the best compromise that gains consensus, cautiously exploiting lack of consensus, and just slightly pushing the envelope.
- BTW, Dan Polansky has started Appendix:English gerund, now ambitiously renamed Appendix:English grammar, which is a focal point at the moment for trying to get the issues involved in line. Because he is not En-N, but is very good in English, and is quite thoughtful, he is interesting for me to work with on this matter. I am open to any solution that those who understand the issues can come up with or even the status quo plus a good Appendix or section thereof. DCDuring TALK 18:41, 11 September 2009 (UTC)
Huh? A b.g.c. turns up a lot of evidence for the adjective:
- "Certainly the cornified envelope is a constant feature of mammalian epidermis..."
- "Frequently a plug of cornified material protrudes from the opening..."
- "Epidermal keratinocytes undergo characteristic changes as they are progressively moved upward from the basal layer of the epidermis to the cornified layer..."
--EncycloPetey 04:41, 13 September 2009 (UTC)
- I'm not seeing how that would be other than a standard use of the past participle of the verb cornify (which we were missing until just now). A cornified layer is a layer that has (or has been) cornified, just as a baked cookie is a cookie that has (or has been) baked. -- Visviva 04:47, 13 September 2009 (UTC)
- Because the element appears between a determiner and the noun. Also, a past participle does not have "more" or "most" forms. See the specific quotes I've added to the page. --EncycloPetey 04:51, 13 September 2009 (UTC)
- Looks good. I had just found some uses of "appeared cornified" and was about to reverse myself. :-) -- Visviva 04:56, 13 September 2009 (UTC)
If you want some challenging fun, you can look for synonyms / coordinate terms for corkscrew, which is WOTD today. We don't have an entry for wine key, which I think has the synonym of waiter's friend (not sure about that or the form). I seem to recall a number of other names for the device as well, and am uncertain whether the object can be considered synonymous with corkscrew or is merely a similar object. Unfortunately, I have to go to bed now. --EncycloPetey 05:03, 13 September 2009 (UTC)
- Will do when I have a chance (must get some work done before my Monday deadlines catch up with me). You know, I've always wanted to make 'nym sections that look like this: [8] ... but I doubt if it would fly in mainspace. (I certainly won't pollute the WOTD with my attempts.) -- Visviva 05:43, 13 September 2009 (UTC)
FYI: My expansion project for the day was curl, which has been greatly expanded. I'm short citations for a few definitions, though, and am running out of steam. Thought I'd mention it, in case you'd care to take a stab at it. --EncycloPetey 03:18, 14 September 2009 (UTC)
- Wasn't able to get to it today -- Monday has, as is its wont, kicked my ass clear into Tuesday morning -- but hope to have a go at it shortly. -- Visviva 15:05, 14 September 2009 (UTC)
- No rush. I usually don't have that kind of time these days, except on weekends. Ruakh and Equinox have handled the calculus senses. --64.175.231.30 21:35, 14 September 2009 (UTC)
Today's Guardian
Hi there. The entries for today's Guardian are almost all spelling mistakes and typos. I haven't bothered to create misspelling entries for most of them as they seem to be uncommon. SemperBlotto 09:23, 17 September 2009 (UTC)
- Yeah, I need to do some rehab on this. Throwing out anything with "blog" in the URL (which I should have been doing anyway) will take care of a bunch of it, and throwing out any "sentence" that lacks final punctuation should do for about half of the rest. While I'm at it, I'd better fix it so it's uploading to the correct day (if I can figure out which day that would be, now that they've gone all postmodern with their date assignments). -- Visviva 09:38, 17 September 2009 (UTC)
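The two filters mentioned above (dropping anything with "blog" in the URL, and dropping any "sentence" that lacks final punctuation) might look something like this. It is a minimal sketch under assumptions: the function name and the exact set of accepted punctuation marks are invented here, and nothing is known about how the actual script represents its citations.

```python
def keep_citation(url, sentence):
    """Return True if a candidate citation passes both cleanup filters."""
    # Filter 1: discard anything whose URL mentions "blog".
    if "blog" in url.lower():
        return False
    # Filter 2: discard "sentences" without final punctuation.
    # Trailing whitespace, closing quotes, and brackets are ignored first.
    stripped = sentence.rstrip().rstrip("\"'”’)")
    return stripped.endswith((".", "!", "?"))


# Hypothetical candidates: (URL, sentence) pairs.
candidates = [
    ("http://example.org/news/story", "The minister resigned yesterday."),
    ("http://example.org/blog/post", "This one comes from a blog."),
    ("http://example.org/news/other", "Related stories and links"),
]
kept = [c for c in candidates if keep_citation(*c)]
print(len(kept))
```

The substring test is deliberately crude; it would also throw out a legitimate URL that merely contains "blog" somewhere in its path.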
Splitting Appendix:Words found only in dictionaries by letter
This was definitely a good idea. However, I wonder whether we should go a step further; do you think we should split the appendix into subpages (again, by letter)? –It may very well comprise hundreds of words in a few months… † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 09:22, 18 September 2009 (UTC)
- This might be necessary in the future, but I would prefer to put it off until then, since it makes maintenance much more headache-inducing.
- OED entries that cite as their earliest source Cockeram, Blount, Phillips, Cole, the various editions of Johnson and Webster, or Crabb, appear to amount to about 2500 altogether. So if current trends hold, that would make the final list a bit over 1000 words long. Of course, this is still leaving out quite a few dictionaries, to say nothing of slang dictionaries (which may warrant separate treatment). Plus the various dictionary words that aren't included in the OED -- I think they have most of the significant ones, but the C-K and S-Z ranges may need further investigation if this list is to be truly comprehensive. -- Visviva 10:37, 18 September 2009 (UTC)
- Yeah, that’s fine; let’s cross that bridge when we come to it. BTW, I share DCDuring’s concern that the list of “authoritative dictionaries” be restricted to genuinely authoritative lexicographical works; there are many philaveries whose gems, though interesting, are to be found nowhere else. IMO, the list should comprise only words who have some lexicographical staying power (even if, as DCDuring quaintly puts it, they are “zombie words”); consequently, let’s start with just the OED and Webster’s, and then consider expansion once those two are pretty well covered. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 12:55, 18 September 2009 (UTC)
- Well, you don't need to worry about me going beyond the OED. If I ever make it to the end of the current list (plus whatever additions there are once I have added entries that cite Century, Imperial et al.), no one will be more astonished than I -- or happier to go off and do something very different for a while. At any rate, per the description, the idea of the list was only to include words in multiple English dictionaries, though a couple of ghost words have been added that don't seem to strictly meet that requirement. -- Visviva 13:12, 18 September 2009 (UTC)
- I assume you mean dord and zzxjoanw; yeah, the latter especially doesn’t seem too worthy of inclusion, IMO… † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 13:19, 18 September 2009 (UTC)
- Yeah, I'm not too fussed about them for now, since AFAIK there aren't very many; esquivalience is the only other one that comes to mind. Maybe I'll just put an asterisk next to them or something. -- Visviva 14:56, 18 September 2009 (UTC)
- Yeah, an asterisk sounds good. And what does esquivalience mean when it’s at home? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 04:02, 20 September 2009 (UTC)
- BTW, kudos for the key to dictionaries. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 13:20, 18 September 2009 (UTC)
- Thanks. Seemed easier than typing the full title over and over again. Sad that there don't seem to be any online versions of Cockeram or Blount, though. -- Visviva 14:56, 18 September 2009 (UTC)
- That would be an excellent task for Wikisource to undertake. It would be of incalculable value to have the ability to perform full-text searches of all those dictionaries you’re using… † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 04:02, 20 September 2009 (UTC)
- Well, as I just discovered yesterday, that incalculable value is already provided (for dictionaries through 1702) by LEME. Unfortunately, if you want to make more than about 10 queries per day you have to buy a subscription ... which one can scarcely hold a grudge about, I suppose, given the work that must have been involved in building the database. -- Visviva 04:33, 20 September 2009 (UTC)
- Thanks for that page Visviva, I find it very amusing (and a warning to us all I suppose). Jcwf 15:09, 21 September 2009 (UTC)
Why do this, may I ask? Including that information in the {{qualifier}} ensures it won’t be missed and it takes up no additional vertical space. OK if I revert that bit? † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 08:22, 23 September 2009 (UTC)
- It seemed really needlessly distracting to me. Visually, it means that the first sentence-like thing that catches the eye is not the definition, but an extremely extraneous bit of information about spelling. On reflection, I'm not actually sure that this information needs to be in thalweg at all; it seems like it would be much more at home in either talweg#Etymology or talweg#Usage notes. Organizationally, it breaks up what should be a simple delimited list of alternative forms with information pertaining to the usage of those forms, when the normal practice is to put usage information into the usage notes.
- All that said, I was just going by gut feeling, and I wouldn't trust my gut feeling on design further than I can kick it. So do what seems right. :-) -- Visviva 08:39, 23 September 2009 (UTC)
At your convenience, could you please take a look at what I have left at Wiktionary_talk:Links? DCDuring TALK 17:10, 25 September 2009 (UTC)
Technical implementation of spelling toggling
Hi Visviva. Could you comment in the Beer Parlour on the technical aspects of the implementation of spelling toggling please? Thanks. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 20:08, 28 September 2009 (UTC)
- Thanks for your comments. † ﴾(u):Raifʻhār (t):Doremítzwr﴿ 03:11, 30 September 2009 (UTC)
ta4ll!:0)
--史凡>voice-MSN/skypeme!RSI>typin=hard! 05:07, 30 September 2009 (UTC)
An area specific to Translingual has been started. It would benefit from your input. It might finally lead to some consistency within the various classes of Translingual entries. Letters and symbols are IMO the messiest ones that I see regularly. I have few (no!) constructive suggestions, but some questions. You probably have more constructive thoughts, and about CJKV Translingual, too.
Unrelatedly, do you know whether the 2000 or so Idiom headings in Chinese are the best/necessary way to handle such entries? Who is the best person to ask about this? I dislike the Idiom header and find it unnecessary (Phrase handles multiword entries that aren't handled by other PoS headers) in languages that I have a little knowledge in, but I recognize the PoS categories of my youth are not quite as applicable as they seemed even in English and certainly can't be assumed to all be useful in other languages. I do realize that we retain some PoS headers for backward compatibility with users. I have had thoughts that we might want to add and expand grammatical categories for multiword entries, along the lines of Category:English phrasal verbs and Category:English sentences (hidden). Thoughts? DCDuring TALK 12:03, 1 October 2009 (UTC)
- Yes, I've been eyeballing that page, without having any particularly brilliant ideas. It obviously needs to be expanded, but each of the areas in question is so fraught that one hesitates to plunge in. (On a tangentially related matter, I've been wondering if the adoption of hangul as the writing system for Cia-Cia finally gives me full justification to treat the zillion-and-one hangul "syllables" as translingual symbols. Alas, I have a feeling my fellow Wiktionarians won't see it that way... Some days I am strongly tempted to just start creating Translingual entries for every Unicode character and see how far I can get before the uproar/carping becomes deafening.)
- Such entries have value, albeit for a limited population of users or use-case scenarios. How can we help users find Unicode characters they come across to make such wikt entries more useful? DCDuring TALK 16:19, 1 October 2009 (UTC)
- I think it would be very difficult to get rid of the Idiom header for the various Chineses. It's a very well-established category, and not just at Wiktionary. But it could definitely use to be better documented. I associate it mostly with the "four-character proverbs", but I expect it has a more extensive and precise definition for those who know what they're doing. Perhaps one could stir the pot at Wiktionary talk:About Chinese. -- Visviva 12:28, 1 October 2009 (UTC)
- I suspected as much. I would like to see whether I can convince folks in some other languages to use the idiom context tag instead of the Idiom header and infl parameter. An eventual across-the-board prohibition would have been nice, but working this out language by language is probably feasible. I may find out why my notion is unworkable. DCDuring TALK 16:19, 1 October 2009 (UTC)
- Our handling of multiword terms could definitely be much more nuanced. Things that come to mind include headedness (noun-headed, verb-headed), or composition (adjective+noun, noun+noun). I haven't focused much on this area, mostly because single-word terms seem more critical/interesting and we're not exactly running out of them yet -- but there is certainly plenty of work that could be done. -- Visviva 12:35, 1 October 2009 (UTC)
- We seem to be on a path toward including more and more multiword terms, especially if we let any one of the Pawley criteria make such terms includable. I would like to head off a major need to retroactively clean things up by testing out such categories now. Headedness is a good basic category-set definer and is quite compatible with most of what we already have and is clearly defensible. That is more or less what I have been doing by moving English items from Idiom and Phrase to Noun, Verb and other head-PoS categories. (Also out of Interjection.) But some notions seem wrong-headed (!!!), too bleeding-edge unstable to me. I don't think we can follow CGEL wherever it goes even when it seems to make a good case!
- However, there are also some items that seem lexically useful but don't really seem to fit into a category of X-headed phrase. A term like in order to is an example, but not necessarily the best. Some idiomatic constructions that are not set phrases (X me no X's, uncle me no uncles; and some of the proposed X and X/Y type constructions generally, for example) are another set that comes to mind. Some of these are in Category:English phrases, which has entries I could not confidently assign to other PoS headers on my first pass(es) through.
- I started editing the new CFI and realized that some of what I would want to write is too detailed or has more to do with usability and search considerations. I will probably start subsidiary pages as the spirit moves me. DCDuring TALK 16:19, 1 October 2009 (UTC)
- Sounds like a plan. For stuff that's related to layout/style, please also consider adding it to Wiktionary:Style guide, which I just started (or re-started, apparently). That's going to need a lot of work, but I think we are finally approaching a point in the life of the project (and the community) where it can be useful. -- Visviva 18:45, 1 October 2009 (UTC)
actional
Hi, the history page lists you as the editor for the entry defining it as "gestural". I came to the page via a link and was wondering where you found this def. Would you be so kind as to put a source below here? I often found the two mentioned side by side, but have so far failed to find them mentioned as synonyms. Thks. for any help you can give. (Lisa4edit) --71.236.26.74 03:45, 2 October 2009 (UTC)
- I must confess I have no specific memory of creating this definition. :-o From a quick look around, I think you are right, and the definition is in error. I will update it accordingly. Thanks! -- Visviva 04:31, 2 October 2009 (UTC)
Yes, feel free to move the lists to an appendix. Take heed of the note at the top, though: some characters are missing from some categories, even some characters that were included in the original list (that big official text file) from which I generated the pages. I noticed at the time that some categories were not being generated correctly — perhaps a text-encoding bug in my program, which I didn't keep, or something to do with pasting into the text field — so I removed the dodgy ones. Plus Unicode might have had a revision and new characters since that time! Equinox ◑ 15:01, 2 October 2009 (UTC)
- Hi there. Do you think it would be a good idea to add some sort of instructions to users on how to view these characters? i.e. what various fonts to install (if there are any) (I just get squares for all of them) SemperBlotto 14:01, 4 October 2009 (UTC)
- Yes, I think so. Template:unicode header takes an {{{about}}} parameter for this sort of info, although maybe a separate {{{fonts}}} parameter would be better. Unfortunately, now that I've installed about twenty kajillion fonts, I'm having a hard time figuring out which one is actually rendering in the browser... (I suppose there must be some sort of tool for this.)
- Code 2000 and Code 2001 cover an awful lot of these. However, there are some ranges that don't appear to have any font support at all -- the Enclosed Alphanumerics/Ideographics extensions, and even Bamum (for which a font has been developed, but doesn't seem to be available online). I haven't found anything for Egyptian yet either -- the "Unicode" Aegyptus font turned out to be using the Private Use Area. Given that Unicode 5.2 is very new indeed, hopefully these gaps will be filled in soon. -- Visviva 14:12, 4 October 2009 (UTC)
General comment
I'm very impressed with the arguments you bring to debates and discussions. Keep it up, I find them enormously helpful. Mglovesfun (talk) 12:24, 3 October 2009 (UTC)
- Aw, shucks. :-) Thanks. And thanks for your work in closing and archiving! -- Visviva 04:36, 4 October 2009 (UTC)
- Widespread admiration and respect is not the same as worship. Don't let it go to your head! DCDuring TALK 14:14, 4 October 2009 (UTC)
- "Widespread admiration and respect", huh? Excellent! >:-) -- Visviva 14:29, 4 October 2009 (UTC) FTR, I have serious doubts about the widespreadness of any respect, let alone admiration, for someone (i.e. me) who mostly drifts aimlessly from one half-finished project to another. But all the same, my head is now swollen to at least twice its normal size. Perhaps you'd better stop now.
- You've gotten some wonderful things done and, perhaps more importantly, you have an ability to raise the plane of discussion consistent with both the values and procedures of the place and to create a high standard for entries. DCDuring TALK 14:36, 4 October 2009 (UTC)
- Word. Heck, mega-word. —RuakhTALK 14:51, 4 October 2009 (UTC)
{{ko-usage-unicode}}
Do you think we can convert these transclusions to use {{character info}}? --Bequw → ¢ • τ 16:01, 14 October 2009 (UTC)
- Yes, all the information should be compatible with the structure and purpose of {{character info}}. That icky navigation template can be merged in as well, I think, with only moderate pain and suffering. I am now glad that I never did more than a few test cases. :-) Will draft {{Hangul Syllables character info}} shortly. (Is it worth creating an SVG for each syllabic block, I wonder? We have the technology...) -- Visviva 16:14, 14 October 2009 (UTC)
- That is some ick. I'd say if the syllabic blocks are too big then they should be separate since {{character info}} (focusing on encoding details) is above the language sections. I'll let you do it though since you know more about this :) --Bequw → ¢ • τ 00:23, 15 October 2009 (UTC)
- Relatedly, do you know why the Unicode appendices are missing some characters? For instance, U+2150 is missing from Appendix:Unicode/Number Forms and U+214E is missing from Appendix:Unicode/Letterlike Symbols. --Bequw → ¢ • τ 19:10, 15 October 2009 (UTC)
- Just need to be updated, I think. The ones with a non-fullwidth table and titlecase are pre-5.2, and were prepared in a different way that allowed some characters to escape. I've updated these two; I'll see if I can't get to the remaining ones todayishly. -- Visviva 02:56, 16 October 2009 (UTC)
- All should be updated now. -- Visviva 12:35, 16 October 2009 (UTC)
MI calculation on COCA
What is the formula for this as applied to adj + noun? I am interested in applying it to snappy comeback, which I suspect would have a fairly high score on a measure like that. I didn't see from WP that we had the info for a formula that respected the grammatical restrictions.
10 occurrences of "snappy" + "comeback(s)"; 482 of "snappy"; 3182 of "comeback(s)"; 400MM total words
10/400MM × log(10/400MM / (482/400MM × 3182/400MM))? Base 2? What is a good score? DCDuring TALK 03:24, 20 October 2009 (UTC)
- To be honest, I have just been using the MI as output by COCA directly. So for this, I would just search for "snappy" with context "comeback" (settings 0 left, 1 right), sorting by mutual information. That gives me an MI here of 10.44, which meshes with my gut impression of setphraseishness. But you raise a good point; the MI with respect to all adjectives (or all nouns, going the other way) would likely be more relevant than the MI with respect to all words. I'm not sure I trust myself to figure it out, though. -- Visviva 05:29, 20 October 2009 (UTC)
- As much fun as it might have been to figure out, I'm relieved that it is something I can look up. That was a COCA feature I'd missed. I ought to look up their calculation and compare it with my crude one. I'm hoping the difference is just due to their use of base e vs my 2, or something, but they may have taken into account lexical classes (not that I necessarily think their work is perfect, based on the duplicates I see and the sloppy classification in particular instances). I found a conversation on MI on CM's talk page archives with a contributor who was a computational linguist. The technicalities of "doing it right" are daunting, but comparisons within PoS-type pairs are probably good. If it is that easy, maybe I should do the MI for clear and some marginal RfD cases and some of our existing idioms, so we can determine whether it is a helpful inclusion tool for things like snappy comeback, which I find hard to justify otherwise. DCDuring TALK 12:06, 20 October 2009 (UTC)
- BTW, MI for the plural N=5 is higher: 15.41. (N=5 for singular, too) DCDuring TALK 12:17, 20 October 2009 (UTC)
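For comparison with the figures above, here is a rough sketch of the textbook pointwise mutual information score, log2(p(x,y) / (p(x) p(y))), applied to the counts quoted in this thread (10 co-occurrences, 482 for "snappy", 3182 for "comeback(s)", 400 million words). It is not COCA's own calculation, which is not spelled out here and evidently differs since COCA reports 10.44; the function name is made up, and the score is taken against all words rather than against adjective + noun pairs only. Note that the leading p(x,y) factor in the hand calculation above belongs to the averaged mutual information over all pairs and is not part of the pointwise score.

```python
import math


def pointwise_mi(pair_count, word1_count, word2_count, corpus_size):
    """Pointwise mutual information (base 2) from raw corpus counts."""
    p_pair = pair_count / corpus_size   # probability of the word pair
    p1 = word1_count / corpus_size      # probability of the first word
    p2 = word2_count / corpus_size      # probability of the second word
    return math.log2(p_pair / (p1 * p2))


# Counts quoted in the discussion above.
score = pointwise_mi(10, 482, 3182, 400_000_000)
print(round(score, 2))
```

With these counts the score comes out a little above 11. Restricting the comparison to adjective + noun pairs would mean replacing the corpus size and the two word counts with counts taken over that pair type only.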
NYT lists
Are we supposed to do anything with pages where all the entries exist now? —RuakhTALK 15:02, 20 October 2009 (UTC)
- Originally I was planning to keep them around indefinitely, but now I'm thinking deletion is the way to go. I have a different approach in mind for long-term citation banking. I really need to unscrew the current situation (weird bugginess) and finish getting Fraktionary up to snuff. With all of the things I'm trying to avoid in RL right now, you'd think I'd be a bit more on-the-ball here. *sigh* I will try to get on top of these shortly; but if you are inclined to go through them with a buzz-saw in the meantime, feel free. -- Visviva 14:19, 21 October 2009 (UTC)
ylem, wikt, please help
The Etymology of Modern English ylem now looks as meant, however some usual links are missing as I could not manage (am beginner) some templates. Sorry, please tell me where to find exactly all sorts of templates to copy. Thanks. -- Jacob van Straten 19:06, 20 October 2009 (UTC)
In light of your participation in Wiktionary:Beer parlour archive/2009/September#SI units and abbreviations, please contribute your thoughts to Wiktionary:Votes/2009-12/Proposed CFI exception for SI Units. Cheers! bd2412 T 21:01, 18 December 2009 (UTC)