User:Pengo/2gram-species
Binomials found in books
The one hundred most common binomial names found in English-language books.
- Homo sapiens (Animalia, Chordata)
- Escherichia coli (Bacteria, Proteobacteria)
- Staphylococcus aureus (Bacteria, Firmicutes) Staphylococcus
- Candida albicans (Fungi, Ascomycota)
- Pseudomonas aeruginosa (Bacteria, Proteobacteria) Pseudomonas
- Mycobacterium tuberculosis (Bacteria, Actinobacteria) Mycobacterium
- Saccharomyces cerevisiae (Fungi, Ascomycota)
- Drosophila melanogaster (Animalia)
- Zea mays (Plantae, Tracheophyta)
- Bacillus subtilis (Bacteria, Firmicutes) Bacillus
- Haemophilus influenzae (Bacteria, Proteobacteria) Haemophilus
- Pneumocystis carinii (Fungi, Ascomycota) Pneumocystis
- Salmonella typhimurium (Bacteria) Salmonella
- Treponema pallidum (Bacteria, Spirochaetes)
- Streptococcus pneumoniae (Bacteria, Firmicutes) Streptococcus
- Phaseolus vulgaris (Plantae)
- Clostridium botulinum (Bacteria, Firmicutes)
- Listeria monocytogenes (Bacteria, Firmicutes) Listeria
- Klebsiella pneumoniae (Bacteria, Proteobacteria)
- Xenopus laevis - African clawed frog (Animalia, Chordata) Xenopus
- Helicobacter pylori (Bacteria, Proteobacteria)
- Neisseria gonorrhoeae (Bacteria, Proteobacteria) Neisseria
- Vibrio cholerae - epidemic cholera (Bacteria, Proteobacteria) Vibrio
- Pisum sativum - pea (Plantae, Tracheophyta)
- Clostridium perfringens (Bacteria, Firmicutes) Clostridium
- Entamoeba histolytica (Protozoa, Not assigned) Entamoeba
- Chlamydia trachomatis (Bacteria, Chlamydiae) Chlamydia
- Streptococcus pyogenes (Bacteria, Firmicutes) *
- Aspergillus niger (Fungi, Ascomycota) Aspergillus
- Mus musculus - house mouse (Animalia, Chordata)
- Nicotiana tabacum - tobacco (Plantae, Tracheophyta)
- Giardia lamblia (Protozoa, Sarcomastigophora)
- Cannabis sativa - marijuana (Plantae, Tracheophyta)
- Salmonella Typhi ⇒ Salmonella enterica subsp. enterica, serovar Typhi, Salmonella enterica, Salmonella enterica subsp. enterica (Bacteria) *
- Bacillus thuringiensis (Bacteria, Firmicutes) *
- Oryza sativa (Plantae, Tracheophyta)
- Serratia marcescens (Bacteria, Proteobacteria) Serratia
- Vicia faba - broad bean (Plantae, Tracheophyta)
- Neisseria meningitidis (Bacteria, Proteobacteria) *
- Triticum aestivum (Plantae, Tracheophyta)
- Glycine max - soya bean (Plantae, Tracheophyta)
- Bacillus cereus (Bacteria, Firmicutes) *
- Bacillus anthracis (Bacteria, Firmicutes) *
- Hordeum vulgare (Plantae, Tracheophyta)
- Caenorhabditis elegans (Animalia, Nematoda)
- Pinus sylvestris - Scots pine (Plantae, Tracheophyta)
- Staphylococcus epidermidis (Bacteria, Firmicutes) *
- Ricinus communis (Plantae, Tracheophyta) Ricinus
- Aedes aegypti (Animalia, Arthropoda)
- Cryptococcus neoformans (Fungi, Basidiomycota) Cryptococcus
- Neurospora crassa (Fungi, Ascomycota) Neurospora
- Medicago sativa - alfalfa (Plantae, Tracheophyta)
- Solanum tuberosum (Plantae)
- Ginkgo biloba (Plantae, Tracheophyta)
- Streptococcus faecalis ⇒ Enterococcus faecalis (Bacteria) *
- Clostridium tetani (Bacteria, Firmicutes) *
- Allium cepa (Plantae, Tracheophyta)
- Mycobacterium avium (Bacteria, Actinobacteria) *
- Mycoplasma pneumoniae (Bacteria, Firmicutes) Mycoplasma
- Macaca mulatta - rhesus monkey (Animalia, Chordata)
- Clostridium difficile (Bacteria, Firmicutes)
- Aspergillus fumigatus (Fungi, Ascomycota) *
- Brassica oleracea (Plantae, Tracheophyta)
- Histoplasma capsulatum (Fungi, Ascomycota) Histoplasma
- Rattus norvegicus - Norway rat (Animalia, Chordata) Rattus
- Rana pipiens (Animalia) Rana
- Daucus carota (Plantae, Tracheophyta)
- Arabidopsis thaliana (Plantae, Tracheophyta)
- Mytilus edulis - blue mussel (Animalia, Mollusca)
- Beta vulgaris - fodder beet (Plantae, Tracheophyta) Beta
- Proteus mirabilis (Bacteria, Proteobacteria) Proteus
- Corynebacterium diphtheriae (Bacteria, Actinobacteria) Corynebacterium
- Schistosoma mansoni (Animalia, Platyhelminthes) Schistosoma
- Helianthus annuus (Plantae, Tracheophyta)
- Aspergillus flavus (Fungi, Ascomycota) *
- Picea abies - Norway spruce (Plantae, Tracheophyta)
- Trichinella spiralis (Animalia, Nematoda) Trichinella
- Bordetella pertussis (Bacteria, Proteobacteria) Bordetella
- Bombyx mori (Animalia, Arthropoda)
- Proteus vulgaris (Bacteria, Proteobacteria) *
- Mycobacterium leprae (Bacteria, Actinobacteria) *
- Borrelia burgdorferi (Bacteria, Spirochaetes) Borrelia
- Canis lupus - gray wolf (Animalia, Chordata)
- Vitis vinifera (Plantae)
- Cyprinus carpio (Animalia, Chordata)
- Lycopersicon esculentum (Plantae) Lycopersicon
- Apis mellifera - honey bee (Animalia, Arthropoda)
- Agrobacterium tumefaciens (Bacteria, Proteobacteria) Agrobacterium
- Papaver somniferum - opium poppy (Plantae, Tracheophyta)
- Sus scrofa - pig (Animalia, Chordata)
- Datura stramonium - thorn apple (Plantae, Tracheophyta)
- Trifolium repens (Plantae) Trifolium
- Avena sativa (Plantae, Tracheophyta)
- Yersinia pestis - bubonic plague (Bacteria, Proteobacteria)
- Coccidioides immitis (Fungi, Ascomycota) Coccidioides
- Brucella abortus (Bacteria, Proteobacteria) Brucella
- Pinus strobus (Plantae)
- Brassica napus (Plantae, Tracheophyta)
- Lolium perenne (Plantae, Tracheophyta) Lolium
- Pseudomonas fluorescens (Bacteria, Proteobacteria) *
Notes
- Sorted by the total number of books (or volumes) in which the term is found, with the highest count first.
- Uses Google's 2012 "English (All)" Ngram corpus. This includes scanned English-language fiction and non-fiction books.
- Homo sapiens (#1) was found in 87,380 volumes. Pseudomonas fluorescens (#100) in 6,562.
- 30,019,350,634 lines of Google's 2gram data were parsed to create the list. 0.009% of lines were relevant (i.e. contained a species name). The script completed after running continuously for 6.5 days and downloaded around 120 GB of compressed data.
- If anyone's interested in further processing the generated data or in some variant of it, let me know what you'd like to do so I can send you the relevant file(s) or see what I can do.
- 51 of the 100 listed scientific names are red links on en.wikt (at time of this posting). Twelve of the top 1000 are red links on English Wikipedia, but this does not include any of the top 100 (this list).
- I've generated more specific lists which show only plants or vertebrates. When I have time, I may attempt to generate lists for other categories such as invertebrates, or a list just of genera, or perhaps an auto-updating list of only those with missing entries or etymologies, or a list restricted to fiction.
Inclusion criteria for species found in books:
- Had to be found already capitalized according to the modern convention used for binomial names (i.e. the only capital being the first letter of the genus)
- Had to also be found in the Catalogue of Life. Of CoL's approximately 2,485,495 binomial species and synonyms, only 52,669 were found in books.
- Had to appear in a minimum of 40 books (or volumes). This appears to be the minimum to be included in Google's 2gram data.
- Trinomial names (e.g. subspecies) would have appeared as binomials to the search, and were counted as such.
- Only books published after 1950 were included in the tally. This is partly to keep it current and partly to keep it fair: some binomials were capitalized differently before this date, so they would have been left out of the final tally. (See the note on "botanical works published before the 1950s"). However, the volume counts in these notes include the totals for all years available.
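The criteria above could be applied to the raw 2gram files roughly as follows. This is a minimal sketch, not the script actually used. It assumes each line of the 2012 Ngram files is tab-separated as ngram, year, match count, volume count. The names `tally_binomials` and `known_names` (a stand-in for the set of Catalogue of Life binomials and synonyms) are hypothetical. The 40-volume minimum is treated here as a property of Google's data rather than as a separate filter.

```python
import re
from collections import defaultdict

# Modern binomial convention: capitalized genus, lower-case epithet
# (an optional hyphenated epithet is allowed; this is an assumption).
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z]+(?:-[a-z]+)?")


def tally_binomials(lines, known_names, after_year=1950):
    """Return (name, volumes) pairs sorted by volume count, highest first.

    lines       -- iterable of raw 2gram lines (assumed format:
                   ngram<TAB>year<TAB>match_count<TAB>volume_count)
    known_names -- set of accepted binomials (hypothetical stand-in
                   for the Catalogue of Life names)
    after_year  -- only years strictly later than this are counted
    """
    totals = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip lines that do not match the assumed format
        ngram, year, _match_count, volume_count = fields
        if not BINOMIAL.fullmatch(ngram):
            continue  # wrong capitalization or not two plain words
        if ngram not in known_names:
            continue  # not a catalogued species name
        if int(year) <= after_year:
            continue  # published in or before the cutoff year
        totals[ngram] += int(volume_count)
    # Highest volume count first, as in the list above.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

Summing the per-year volume counts gives the figure used for ranking. Passing an open file object as `lines` keeps memory use low, since only the running totals for matching names are held.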