Wiktionary:Frequency lists/English/TV and Movie Scripts (2006)
Here are frequency lists comparable to the Gutenberg ones, but based on 29,213,800 words from TV and movie scripts and transcripts.
Here's a fuller explanation of how the list was generated and its limitations: Wiktionary:Frequency lists/TV/2006/explanation.
Here are the top hundred words (from TV scripts) in alphabetical order:
- a
- about
- all
- and
- are
- as
- at
- back
- be
- because
- been
- but
- can
- can't
- come
- could
- did
- didn't
- do
- don't
- for
- from
- get
- go
- going
- good
- got
- had
- have
- he
- her
- here
- he's
- hey
- him
- his
- how
- I
- if
- I'll
- I'm
- in
- is
- it
- it's
- just
- know
- like
- look
- me
- mean
- my
- no
- not
- now
- of
- oh
- OK
- okay
- on
- one
- or
- out
- really
- right
- say
- see
- she
- so
- some
- something
- tell
- that
- that's
- the
- then
- there
- they
- think
- this
- time
- to
- up
- want
- was
- we
- well
- were
- what
- when
- who
- why
- will
- with
- would
- yeah
- yes
- you
- your
- you're
Here they are in frequency order:
- Top 1,000 words cover 85.5% of all words (24,981,922 / 29,213,800).
- Top 10,000 words cover 97.2% of all words (28,398,152 / 29,213,800).
- The top 10,000 words are a third of all the unique words. Each of the remaining words was used 5 or fewer times.
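
The coverage figures above are cumulative counts divided by the total word count. As a rough illustration only, the sketch below shows one way such a figure could be computed from a table of word counts. The `coverage` helper and the sample sentence are hypothetical and are not taken from the script that produced these lists; the last two lines simply redo the division using the totals quoted above.

```python
from collections import Counter


def coverage(counts: Counter, top_n: int) -> float:
    """Return the fraction of all word tokens covered by the top_n most frequent words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / total


# Made-up sample text (an assumption for illustration, not the real corpus).
sample = "you know I know you don't know what I know".split()
counts = Counter(word.lower() for word in sample)

print(f"Top 1 word covers {coverage(counts, 1):.1%} of the sample")
print(f"Top 3 words cover {coverage(counts, 3):.1%} of the sample")

# Re-deriving the percentages from the totals quoted on this page.
print(f"Top 1,000: {24_981_922 / 29_213_800:.1%}")
print(f"Top 10,000: {28_398_152 / 29_213_800:.1%}")
```

Lowercasing before counting is a choice made for this sketch. The real lists keep some capitalised entries such as "I" and "OK", so the original counting may have handled case differently.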