Wiktionary talk:Statistics
"good" and "bad" entries
Can we come up with different adjectives here? I think "good" and "bad" may be significantly misleading to the uninitiated. I had proposed "interesting" and "uninteresting", but Dangherous didn't like them. —scs 00:08, 30 June 2006 (UTC)
- How about describing them as "entries with wikilinks" and "entries without wikilinks".
- Also, how about removing the "mostly redirects" comment. It used to be true, but I'm not so sure that it is, now. --Connel MacKenzie T C 05:36, 30 June 2006 (UTC)
- It's not about wikilinks... is it? They are mostly redirects I guess, but up to now I haven't found any decent description of what is considered an entry and what not. Since we do have a huge amount of redirects, I expect them to be the majority. Now, "good" and "bad" are the terms that have always been used. Never mind, though, it's just a detail. Call them "empyreal" and "purgatorial" if you like. — Vildricianus 08:46, 30 June 2006 (UTC)
scrunch up a little
For all namespaces after NS:1=Talk, why not just combine two rows into one, and add a "Talk" column for them? (I'm tempted to suggest that all except the subtotals should have Show/hide type auto-hiding.) --Connel MacKenzie T C 05:38, 30 June 2006 (UTC)
- The show/hide crap will make it complicated, but feel free to play around with the table. — Vildricianus 08:46, 30 June 2006 (UTC)
Take after the French
This page would be more interesting/helpful if it contained more information in an easier to use format, such as fr:Wiktionnaire:Statistiques. Jade Knight 19:43, 23 October 2006 (UTC)
Spanish and English statistics
Curiouser and curiouser... we now have more Spanish words than English ones. Beobach972 03:15, 24 October 2006 (UTC)
- Right - on this iteration I did not exclude the "form of" templates. --Connel MacKenzie 03:16, 24 October 2006 (UTC)
Detail
I'm surprised to have gotten so little feedback on the "Detail" section. Perhaps the explanation is clear enough? Honestly, I expected somebody to ask why the numbers (say, for English) don't add up to get "Total definitions." (The answer is that "real definitions" is exclusive of the others, but something can count as an "inflected form" and as "slang" while actually being only one definition line.) I also kind of expected someone to ask why "total language sections" is so much higher than "real definitions" and so very much lower than "total definitions." I guess that is self-evident? --Connel MacKenzie 20:48, 23 May 2007 (UTC)
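To illustrate the overlap described above, here is a minimal sketch. The definition lines and tag names are made up, and nothing here is taken from the script that actually generates the page; it only shows how per-category counts can sum to more than the number of definition lines.

```python
def count_categories(definitions):
    """Count definition lines per category; a tagged line is counted once per tag."""
    counts = {"total": len(definitions), "real": 0}
    for tags in definitions:
        if not tags:
            # An untagged line is a "real" definition, exclusive of the others.
            counts["real"] += 1
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1
    return counts


# Hypothetical data: each entry is the set of tags on one definition line.
sample = [
    set(),                        # plain definition
    {"inflected form"},
    {"inflected form", "slang"},  # one line, counted in two categories
]
counts = count_categories(sample)
```

With this sample, the category counts (1 real, 2 inflected forms, 1 slang) add up to 4, while there are only 3 definition lines in total.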
Translingual
What does it refer to exactly? Does translingual mean words which are used in more than one language? DaGizza 23:03, 10 January 2008 (UTC)
- It refers to two main groups of things. 1) Symbols that don't really belong to any language at all (see %). 2) Taxonomic names (some people call them New Latin) that are used across all languages (that use the Roman script) (see Homininae). SemperBlotto 23:11, 10 January 2008 (UTC)
I also used it on CCC, which is an initialism for Chaos Computer Club, which works in English and German, not sure if that was right. Mutante 23:16, 10 January 2008 (UTC)
Language codes
Could we add language codes to this data? I'm going to do so manually right now but it ought to be added to the script that generates this page too. — hippietrail 05:07, 3 February 2008 (UTC)
PAGESINCATEGORY:
I've converted vi:Wiktionary:Thống kê to use {{PAGESINCATEGORY:}} for the language breakdown. It'd be a bit more difficult to do that here; for instance, Category:English language doesn't directly contain all English words, so you'd have to add up all the parts of speech. In any event, it'd be a nice extension to the automatically-updated Special:Statistics page. – Minh Nguyễn (talk, contribs) 22:08, 21 May 2008 (UTC)
Statistics update
Is this supposed to be updated so rarely? The last dump is 50 days old. --Vahagn Petrosyan 20:49, 4 March 2009 (UTC)
- I could be wrong, but I think the question is one of responsibility. Connel took care of this page for a long time, but he has been mostly absent as of late, and not doing the updates. Conrad did it a few times, and certainly has access to fresh dumps. I suggest you nag him. -Atelaes λάλει ἐμοί 20:55, 4 March 2009 (UTC)
please tell me...
What are "Form-of" definitions, and why has Mandarin only got 80 of them? Can someone please leave a message for me on my talk page about it? Cheers Tooironic 13:45, 21 November 2009 (UTC)
- A "form of" definition consists of an entry that is defined solely as being a "form" of another word. For example, each English noun has a plural "form", and each English verb has a past, past participle, and present participle "form". A Latin verb may have over 100 "forms" (see the links in the inflection table at amō, for example). I suspect Mandarin doesn't have very many "form-of" entries because Mandarin verbs have only a single form, which is the main entry form. "Form-of" entries exist primarily in languages that conjugate their verbs or inflect their nouns and adjectives. --EncycloPetey 16:58, 21 November 2009 (UTC)
Gloss definitions
What is meant by "gloss definitions"? - -sche (discuss) 10:21, 8 February 2013 (UTC)
- I think it's a definition that is not a "form-of" definition. Maro 18:46, 15 February 2013 (UTC)
- See here: gloss. It'd be good to add this link to the table header: [[gloss#Noun 2|gloss]]
Fix grammar
"requests for definitions, this may divide things incorrectly"
This is a comma splice. Please change the comma to a semicolon or add "and" before "this." 2001:18E8:2:1020:1463:E53C:61CD:5659 15:37, 13 June 2013 (UTC)
English lemmata
In June of 2012, Ruakh counted how many English lemmata Wiktionary covered in three different ways. See here. "Approach 1 gave 298,322; Approach 2 gave 299,516" and approach 3 (which lumped different parts of speech together, rather than considering them separate lemmata) gave 133,470. - -sche (discuss) 04:51, 30 August 2013 (UTC)
How does Latin have more entries and definitions than English?
How does a long-dead foreign language get more stuff here than the current, wider used, actual language of this wiktionary?-47.20.162.183 00:20, 17 June 2014 (UTC)
- Latin words have loads of inflected forms. — Ungoliant (falai) 00:21, 17 June 2014 (UTC)
- Thanks! :)-47.20.162.183 01:30, 17 June 2014 (UTC)
- I prefer using the gloss definitions column as a measure of how much content we have in a given language. The entries and definitions columns are heavily biased towards languages with complex inflection. Poor English, with its 4~5 inflected verb forms, stands no chance against Latin, which has over 100. — Ungoliant (falai) 01:36, 17 June 2014 (UTC)
- Maybe the gloss definitions column should be first one or should be given prominence in some other way. --Vahag (talk) 08:23, 17 June 2014 (UTC)
- I support that idea. If no one objects I’ll change the format for the next dump. — Ungoliant (falai) 13:10, 17 June 2014 (UTC)
- No objection, but if "gloss definitions" is moved to come after "definitions", the latter should probably be renamed "total definitions" in the interest of clarity. Actually, as long as things are being changed around, could you also put a 1 or something after gloss definitions, so it can be linked to an explanation like this? Given that even I who edit this dictionary had to ask what the term meant, the number of passersby who know what it means is probably small enough to make it worth a footnote. - -sche (discuss) 15:23, 17 June 2014 (UTC)
- While we’re at it, if there is any other layout change anyone wants to propose, speak up. I’m thinking of moving the data of appendix defs/entries to the same columns as the non-appendix data, since most languages have 0 anyway. — Ungoliant (falai) 15:43, 17 June 2014 (UTC)
- Now that we have categories for every language called "Foo lemmas" and "Foo non-lemma forms", maybe the number of pages in each of those categories for each language could be added to the table. —Aɴɢʀ (talk) 20:35, 21 December 2014 (UTC)
Translation statistics
I’ll be keeping translation statistics at this page. — Ungoliant (falai) 15:54, 28 July 2015 (UTC)
- I'm gonna bookmark that :) —Aryamanarora (मुझसे बात करो) 22:05, 8 December 2015 (UTC)
- Good stats, thanks! Russian at #2, after Finnish (60,823 translations). Not bad at all! --Anatoli T. (обсудить/вклад) 23:13, 8 December 2015 (UTC)
- Finnish is a surprise to me - and then there's Hindi, somewhere in the 40's. —Aryamanarora (मुझसे बात करो) 21:39, 3 January 2016 (UTC)
Statistics on Sindhi language
The information on Sindhi language is NOT correct even as of 2-12-2015. There were more than 1000 definitions in Sindhi wiktionary on that date. Please fix the error.
Aursani (talk) 09:57, 21 December 2015 (UTC)
- This information is about English Wiktionary only. — Ungoliant (falai) 13:50, 21 December 2015 (UTC)
Statistics on lemmas and non-lemmas
I think it would be useful if the statistics included measures on how many lemma and non-lemma entries have been created or removed. Right now there is only a generic "entries" column, but that includes all entries, and I don't know if it distinguishes cases where a new lemma POS section has been added to a page that already has a section for the current language. That is what I would consider an "entry"; a single page can have multiple entries in one language. —CodeCat 21:35, 22 February 2016 (UTC)
Lemmas pie chart
Numbers from subcategories of Category:Lemmas by language, code copied from mw:Extension:Graph/Demo/CategoryPie:
The chart updates automatically. Would it make sense to add this to the page? --Yair rand (talk) 04:52, 24 February 2016 (UTC)
- Why does, eg, Spanish have 47,817 lemmas, German have 42,014, but Spanish doesn't show up on the chart? DTLHS (talk) 04:57, 24 February 2016 (UTC)
- Hm. Might be an API limitation. It seems to be ignoring all languages past the first 500 in the list. I'll go ask the author of the chart template if there's any way to fix it. --Yair rand (talk) 05:07, 24 February 2016 (UTC)
- Apparently it can't find more than 500 subcategories at a time, and it can't automatically just get the largest categories. I've changed it to a manual list of the largest 150. Unfortunately, this won't automatically add in new languages that enter the top 150. --Yair rand (talk) (not logged in) 14:34, 24 February 2016 (UTC)
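The workaround discussed above (fetch subcategories in batches of at most 500, then keep only the largest 150) can be sketched roughly as follows. This is only an illustration under stated assumptions: `make_stub` is a hypothetical in-memory stand-in for whatever paged API the chart uses, and the function names and category data are invented for the example.

```python
import heapq


def iter_all(fetch_page):
    """Yield every (name, size) pair by following continuation tokens."""
    token = None
    while True:
        batch, token = fetch_page(token)
        yield from batch
        if token is None:
            break


def largest(fetch_page, n=150):
    """Return the n largest categories by size, largest first."""
    return heapq.nlargest(n, iter_all(fetch_page), key=lambda pair: pair[1])


def make_stub(categories, limit=500):
    """Hypothetical paged source: returns at most `limit` items per call."""
    def fetch_page(token):
        start = token or 0
        end = start + limit
        next_token = end if end < len(categories) else None
        return categories[start:end], next_token
    return fetch_page


# Made-up data: 1200 categories whose sizes equal their index.
categories = [(f"Lang{i} lemmas", i) for i in range(1200)]
top = largest(make_stub(categories), 150)
```

Unlike a manual list, a selection computed this way would pick up new languages that enter the top 150, at the cost of walking every batch on each update.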
How do these column headers correspond to "etymologies"?
gloss definitions, entries, gloss entries, form definitions, total definitions - which of them is "etymologies"? --Qdinar (talk) 12:58, 6 February 2020 (UTC)
Pageview stats
@Ungoliant MMDCCLXIV I added some links to Wikimedia's pageview stats in Special:Diff/51268749/51268782, but it looks like they got removed (by a script?) in Special:Diff/57992037/58655295. – Jberkel 18:10, 17 February 2020 (UTC)
- That was my fault. I accidentally edited WT:Statistics instead of WT:Statistics/generated when I added this month’s stats. — Ungoliant (falai) 00:52, 18 February 2020 (UTC)
Amharic Wiktionary counter
Just in the first page of the list of words starting with "a" there are 345 words (look here). But the Amharic counter still says 384 content! What is this madness! Abreham97 (talk) 00:34, 16 November 2021 (UTC)
Update?
How can we update the stats on Wiktionary:Statistics/generated (currently from the 2022-01-01 dump)? A455bcd9 (talk) 12:44, 13 March 2022 (UTC)
- Same issue for May :) A455bcd9 (talk) 07:22, 29 May 2022 (UTC)
- poke @Ungoliant MMDCCLXIV. Would be amazing to have the code on GitHub or GitLab so that anyone can generate and update this page. A455bcd9 (talk) 07:23, 29 May 2022 (UTC)
- @Ungoliant MMDCCLXIV Hi, I hope all is well. Could you please update the statistics or create a document explaining how to generate them so that anyone can run them in your absence? Thanks for any help you can provide. A455bcd9 (talk) 07:34, 14 July 2022 (UTC)
I must confess that I, too, am becoming slightly impatient. On the other hand, Ungoliant may just have quit, and we can force no user to stay active and keep things up to date. Maybe raise the issue centrally? Steinbach (talk) 14:36, 17 October 2022 (UTC)
- I'm working on a replacement for Ungoliant's stats, the code will be hosted on gitlab/toolforge, to avoid this situation. However, it's not quite ready yet. – Jberkel 15:15, 17 October 2022 (UTC)
- Thanks for your help @Jberkel. FYI the French Wiktionary has detailed statistics and they would be happy to help. A455bcd9 (talk) 13:54, 1 November 2022 (UTC)
- How is it going, @Jberkel? Steinbach (talk) 18:03, 3 January 2023 (UTC)
- First iteration is now done. Jberkel 04:23, 11 March 2023 (UTC)
- Thanks! A455bcd9 (talk) 10:50, 11 March 2023 (UTC)
- Thanks indeed! Btw, what explains the apparent drop in the number of languages? Steinbach (talk) 14:56, 11 March 2023 (UTC) O, and can you provide the gitlab link? Steinbach (talk) 14:58, 11 March 2023 (UTC)
- The repo: gitlab. It contains a lot more code than just the stats. The drop in languages is probably because reconstruction and appendix namespaces are not included. This is a limitation of the HTML dumps, see Wiktionary:Statistics#cite_note-1. Jberkel 21:57, 11 March 2023 (UTC)
- Thank you. I hope someone (either you or someone else) can fix that. Appendix-only languages were already hard to find, this makes them even less visible. Steinbach (talk) 11:25, 12 March 2023 (UTC)
- I have another question for you. Is it right that there are no new languages? After sorting the table for "change in number of gloss definitions" I noticed that several languages had gone up from very few (often enough one or two) to a decent number, but none were entirely new (that is, change in number of gloss definitions equals number of gloss definitions). How does your script handle any new language headers, @Jberkel? Steinbach (talk) 12:26, 16 March 2023 (UTC)
- I didn't include a diff of new language headers in the output, but there were probably new headers. The stats generation tool reads all L2 headers and transforms them into language codes based on the data of Module:languages. Languages not listed in this module are ignored (usually typos or errors). For the next run I can include a diff. Jberkel 07:47, 24 March 2023 (UTC)
- Regarding the missing Appendix/Reconstruction languages, it sometimes helps to signal your interest on phabricator (subscribing to the task etc) in order to get things moving a bit faster there. Jberkel 21:52, 25 March 2023 (UTC)
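For readers wondering what "reads all L2 headers and transforms them into language codes" might look like in practice, here is a rough sketch. It is not the code from the repo mentioned above: the regular expression, the function name, and the small `LANGUAGE_CODES` dictionary (standing in for the data of Module:languages) are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the name-to-code data in Module:languages.
LANGUAGE_CODES = {"English": "en", "Spanish": "es", "Latin": "la"}

# A level-2 (L2) header has exactly two equals signs on each side: ==English==
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def language_codes(wikitext, known=LANGUAGE_CODES):
    """Map each L2 header to a language code, ignoring unknown names."""
    codes = []
    for match in L2_HEADER.finditer(wikitext):
        name = match.group(1).strip()
        code = known.get(name)
        if code is not None:  # unlisted headers (e.g. typos) are skipped
            codes.append(code)
    return codes


page = "==English==\n===Noun===\n# a definition\n==Englsh==\n==Latin==\n"
codes = language_codes(page)
```

In this sample, the misspelled "Englsh" header is dropped because it is not in the lookup table, and the level-3 "Noun" header is never treated as a language.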
October update
Would it be possible to get this page updated for the October database dump? It's still showing the information for the July dump. Thank you! Vergencescattered (talk) 16:37, 16 October 2024 (UTC)
December update
@Jberkel Could you update the statistics for December? Thanks in advance. — Fenakhay (حيطي · مساهماتي) 07:04, 17 December 2024 (UTC)