Wiktionary:Frequency lists/English/Project Gutenberg
These lists give the most frequent words found by a simple, straightforward frequency count of all the books on Project Gutenberg. Most are English words, with other languages represented to a lesser extent. Many Project Gutenberg books are scanned once their copyright expires, typically editions published before 1923, so the language does not always reflect current usage. For example, "thy" is listed as the 280th most common word. Also, the Project Gutenberg boilerplate warning appears in each of the 24,000+ books, so its text is counted once per book.
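A simple frequency count of this kind can be sketched in a few lines of Python. This is only an illustration, not the script used to build these lists: the tokenizer pattern, the lowercasing step, and the function names `count_words` and `top_words` are all assumptions. The published list keeps "I" and "Mr" capitalized, so the original count probably handled case differently.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of ASCII letters, allowing internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def count_words(texts):
    """Return a Counter of lowercased word frequencies across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercasing is an assumption; it merges "The" and "the".
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return counts

def top_words(counts, n):
    """Return the n most frequent words, sorted alphabetically for display."""
    return sorted(word for word, _ in counts.most_common(n))

# Tiny made-up sample standing in for the book texts.
sample = ["The cat and the dog.", "Thy cat saw the dog and the bird."]
counts = count_words(sample)
print(counts.most_common(3))
print(top_words(counts, 3))
```

Text that is repeated in every file, such as the boilerplate warning, raises the counts of its words in a tally like this unless it is stripped first.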
Here are the top 100 words from Project Gutenberg texts in alphabetical order:
- a
- about
- after
- all
- an
- and
- any
- are
- as
- at
- be
- been
- before
- but
- by
- can
- could
- did
- do
- down
- first
- for
- from
- good
- great
- had
- has
- have
- he
- her
- him
- his
- I
- if
- in
- into
- is
- it
- its
- know
- like
- little
- made
- man
- may
- me
- men
- more
- Mr
- much
- must
- my
- no
- not
- now
- of
- on
- one
- only
- or
- other
- our
- out
- over
- said
- see
- she
- should
- so
- some
- such
- than
- that
- the
- their
- them
- then
- there
- these
- they
- this
- time
- to
- two
- up
- upon
- us
- very
- was
- we
- were
- what
- when
- which
- who
- will
- with
- would
- you
- your
These wikified terms can be copied to other-language Wiktionaries, which is what they are intended for. If you copy them, please add an interwiki link on this page.
Lists by date
16 April 2006
10 October 2005
16 August 2005
Approximately 24,197 files, 1,712,082,956 words, 70,756.0 average words per file, from which were gleaned about 9,053,310 unique "words".
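The average quoted above follows directly from the other two figures. A quick check, using only the numbers stated on this page (the variable names are illustrative):

```python
files = 24_197
total_words = 1_712_082_956

# Average words per file, rounded to one decimal place as in the text.
average = total_words / files
print(f"{average:,.1f}")  # prints 70,756.0
```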