Wiktionary talk:Frequency lists/PG/2006/04/1-10000

Latest comment: 7 years ago by Bcent1234 in topic Summary

Hmmm... I notice that "def" is at number 111. This looks just a little strange. Has it been vandalized, or should it say something else? Tobycek 10:58, 17 January 2008 (UTC)

No, not vandalized. —Stephen 11:17, 17 January 2008 (UTC)
It is a frequency list of what appear to be words in the Project Gutenberg corpus. It contains artefacts, such as words hyphenated at line breaks, among other things. Robert Ullmann 11:46, 17 January 2008 (UTC)
I noticed that some proper words are not capitalized and updated some of them. I hope this is not an issue, as I got a message about edits. Rick7425 08:30, 18 November 2008 (UTC)
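
For illustration, here is a minimal sketch of how line-break hyphenation can leave fragments in a word count. It is not how this list was actually generated; the function name `word_frequencies`, the `join_hyphenated` flag, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text, join_hyphenated=True):
    """Count lowercase word tokens in a plain-text string (hypothetical helper)."""
    if join_hyphenated:
        # Rejoin words split across a line break, e.g. "def-\ninition".
        # Caveat: a genuinely hyphenated compound that happens to break
        # at the end of a line would lose its hyphen here.
        text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Treat runs of letters and apostrophes as words; digits are ignored.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Made-up sample with a word split at a line break and a "p." abbreviation.
sample = "The def-\ninition of the word is on p. 12 of the book.\n"

raw = word_frequencies(sample, join_hyphenated=False)
fixed = word_frequencies(sample)

print(raw.most_common(3))
print(fixed.most_common(3))
```

Without the rejoining step, the fragments "def" and "inition" are counted as separate words. With it, "definition" is counted once, while the abbreviation "p" is still counted as a word either way.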


Word 214 is "p" (short for "page," I imagine) and "Gutenberg" is around 250. One would think that Gutenberg would run its frequency analysis on the original texts rather than its own edited versions of them. --98.208.141.126 02:24, 1 June 2009 (UTC)

These aren't run by anyone affiliated with Project Gutenberg though; they're under a free license and don't have copyright issues associated with them (afaik). I wonder though if there are copyright issues in using modern, copyrighted texts for these frequency lists. Since it would be storing their text in a database for a short period, I tend to assume there are. -- 124.171.169.189 07:06, 6 March 2010 (UTC)

Word #2781 is missing.

-bwg 207.122.227.66 14:56, 23 October 2015 (UTC)


Summary


In summary: this sucks. "Blockquote" in position 551? Come on...

It's now at 641; surely this is a little better ... (GRIN) Bcent1234 (talk) 13:48, 7 November 2016 (UTC)