Template:R:GNV/documentation

	Documentation for Template:R:GNV.

Use this template to link to the Google Books Ngram Viewer (GNV), which shows a time-dependent graph of word-form or spelling frequencies.

Parameters

The following parameters are used by this template:

|1=: The term or terms (comma-separated) to be graphed.
|2=: A display override for the term or terms.
|corpus=: The index of the corpus to be shown (see Available corpora). Defaults to 26, i.e. English.
|startyear=, |start=: The year to begin the graph at. Defaults to 1800.
|endyear=, |end=: The year to end the graph at. Defaults to the newest available (see available corpora).
|caseinsensitive=: Whether to search case-insensitively. Any value is taken to mean yes. Defaults to no.
|nodot=: By default, the template adds a full stop (period) at the end of the citation. To suppress this punctuation, use |nodot=1 or |nodot=yes.
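
The parameter list above can be illustrated with a small sketch that assembles the wikitext of a template call. The helper name gnv_call is hypothetical and not part of the template; it only joins the documented positional and named parameters and does not escape pipes or equals signs inside terms.

```python
# Hypothetical helper: builds the wikitext for a {{R:GNV}} call.
# It knows only the parameter names listed in this documentation.
DOCUMENTED_PARAMS = {
    "corpus", "startyear", "start", "endyear", "end",
    "caseinsensitive", "nodot",
}


def gnv_call(terms, display=None, **params):
    """Return a {{R:GNV|...}} call for the given terms and named parameters."""
    unknown = set(params) - DOCUMENTED_PARAMS
    if unknown:
        raise ValueError(f"undocumented parameter(s): {sorted(unknown)}")
    # |1= is the comma-separated term list; |2= is the optional display override.
    parts = ["R:GNV", ", ".join(terms)]
    if display is not None:
        parts.append(display)
    # Named parameters keep the order in which they were passed.
    parts.extend(f"{name}={value}" for name, value in params.items())
    return "{{" + "|".join(parts) + "}}"


print(gnv_call(["malen", "streichen"], corpus=31))
print(gnv_call(["croissanterie"], corpus=30, start=1900))
```

The two printed calls match the German and French examples in the Examples section.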

Examples

Here are some:

* {{R:GNV|indecipherable, undecipherable}}

indecipherable, undecipherable at the Google Books Ngram Viewer.

* {{R:GNV|ad lib, extemporal, extemporary, extemporaneous, extempore, extemporized, impromptu, improvised, improviso, off-the-cuff, offhand|some of the synonyms}}

some of the synonyms at the Google Books Ngram Viewer.

* {{R:GNV|телепрогра́мма, телепереда́ча, телешо́у|corpus=36}}

телепрогра́мма, телепереда́ча, телешо́у at the Google Books Ngram Viewer.

* {{R:GNV|malen, streichen|corpus=31}}

malen, streichen at the Google Books Ngram Viewer.

* {{R:GNV|colour:eng_gb_2019,colour:eng_us_2019}}

colour:eng_gb_2019,colour:eng_us_2019 at the Google Books Ngram Viewer.

* {{R:GNV|croissanterie|corpus=30|start=1900}}

croissanterie at the Google Books Ngram Viewer.

* {{R:GNV|color/colour}}

color/colour at the Google Books Ngram Viewer.

* {{R:GNV|states of *}}

states of * at the Google Books Ngram Viewer.

* {{R:GNV|states of *_NOUN}}

states of *_NOUN at the Google Books Ngram Viewer.

* {{R:GNV|*_ADJ argument}}

*_ADJ argument at the Google Books Ngram Viewer.

* {{R:GNV|cook_NOUN,cook_VERB}}

cook_NOUN,cook_VERB at the Google Books Ngram Viewer.

* {{R:GNV|cook_INF a meal}}

cook_INF a meal at the Google Books Ngram Viewer.

* {{R:GNV|cook_INF *_NOUN}} -- does not work

cook_INF *_NOUN at the Google Books Ngram Viewer.

Available corpora

A list (with descriptions) is also available at https://books.google.com/ngrams/info.

Corpus	2019 index	2012 index	2009 index	Shorthand (followed by _ and year)
American English	28	17	5	eng_us
British English	29	18	6	eng_gb
Chinese (simplified)	34	23	11	chi_sim
English	26	15	0	eng
English Fiction	27	16	4	eng_fiction
English One Million	N/A	N/A	1	eng_1m
French	30	19	7	fre
German	31	20	8	ger
Hebrew	35	24	9	heb
Italian	33	22	N/A	ita
Russian	36	25	12	rus
Spanish	32	21	10	spa
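
As a rough sketch of how the shorthand column is used, the snippet below copies the 2019 indices from the table and builds per-corpus terms of the form term:shorthand_year, as in the colour:eng_gb_2019 example. The names CORPORA_2019, per_corpus_term and compare_across are made up for illustration; English One Million is left out because the table lists no 2019 index for it.

```python
# 2019 corpus indices keyed by shorthand, copied from the table above.
CORPORA_2019 = {
    "eng_us": 28, "eng_gb": 29, "chi_sim": 34, "eng": 26,
    "eng_fiction": 27, "fre": 30, "ger": 31, "heb": 35,
    "ita": 33, "rus": 36, "spa": 32,
}


def per_corpus_term(term, shorthand):
    """Return 'term:shorthand_2019', restricting one term to one 2019 corpus."""
    if shorthand not in CORPORA_2019:
        raise KeyError(f"no 2019 corpus with shorthand {shorthand!r}")
    return f"{term}:{shorthand}_2019"


def compare_across(term, shorthands):
    """Return a comma-separated query comparing one term across corpora."""
    return ",".join(per_corpus_term(term, s) for s in shorthands)


print(compare_across("colour", ["eng_gb", "eng_us"]))
```

The printed string can be passed as |1= to compare British and American usage in a single graph.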

Limitations

Google Ngram Viewer suffers from some limitations: (1) scanning errors (scannos); (2) the corpus increasingly becoming biased toward academic publications with the passage of time; (3) each book has the same weight regardless of popularity; and (4) wrong assignment of the year of publication. Some of the problems are covered below. The scanno problem does not seem to completely invalidate the results, especially for English and longer words. The severity of the problems depends on what we want to measure, whether cultural change over time or relative frequencies of word forms.

Bias toward academic publication

The search figure, Figure at the Google Books Ngram Viewer reveals the problem: capitalized Figure rises to the top during the 20th century, which suggests use in the captions of academic literature. When we restrict the corpus to English Fiction, the problem disappears: figure, Figure at the Google Books Ngram Viewer.

Long s vs. f

The search fuck at the Google Books Ngram Viewer shows the problem: fuck cannot plausibly have been that frequent in print before 1800; rather, these hits are likely scannos of suck caused by the use of the long s (ſ). On the other hand, this problem does not occur after 1820.
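
To check whether a pre-1820 search might be contaminated in this way, one can generate the form that results when every long s is misread as f. The sketch below is a simplification under stated assumptions: it treats every lowercase s except a word-final one as a long s and assumes it is always misread, whereas real typographic practice and scanning behaviour were more varied. The function name is hypothetical.

```python
def long_s_misreading(word):
    """Return the form produced if every non-final lowercase s is read as f."""
    if len(word) < 2:
        return word
    # Assumption: the long s was not used word-finally, so the last letter
    # is left untouched; capital S is never affected.
    head, last = word[:-1], word[-1]
    return head.replace("s", "f") + last


for w in ["suck", "best", "cats"]:
    print(w, "->", long_s_misreading(w))
```

If the generated form is itself a word of interest, as with suck and fuck, its early frequencies should be treated with suspicion.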

Dropping hyphens

The searches anti-American, (antiAmerican*10) at the Google Books Ngram Viewer and google books:"antiAmerican" show the problem: scanning sometimes drops the hyphen. antiAmerican cannot plausibly occur that often, and the Google Books search confirms this. Other examples: (exteacher*10),ex-teacher at the Google Books Ngram Viewer, (nonEnglish*10),non-English at the Google Books Ngram Viewer.

Some hyphens are dropped within an unbroken line; others are dropped at a line break, where it is ambiguous whether a hyphen is present.

Dropping spaces

The searches thebook, nonchocolate at the Google Books Ngram Viewer and google books:"thebook" show the problem: scanning dropped the space, and the resulting thebook appears about as common as the legitimate word nonchocolate. On the other hand, the book,(thebook*5000) at the Google Books Ngram Viewer shows that this happens relatively rarely.

Joining different columns

The search google books:"misargument" shows the scanning problem: there are very few occurrences of misargument, and some of the items found result from joining parts of different columns in multi-column publications.^[1] This particular example does not make it into GNV statistics, though. It is unclear whether this could significantly affect the frequencies of common words.

Changes in capitalization

There is no reason to think there are spurious changes in capitalization. anti-American,(antiamerican*1000) at the Google Books Ngram Viewer looks plausible, unlike anti-American, antiAmerican at the Google Books Ngram Viewer.

Links

Hyphens

As of Oct 2022:

To search for hyphenated phrases, do one of the following:
- Hope that GNV will continue working like before, e.g., non-standard, nonstandard at the Google Books Ngram Viewer.
- Make sure to enter spaces around the hyphen and use [ ] around the term, e.g. [non - standard,nonstandard] at the Google Books Ngram Viewer.
Google will often pick up a hyphenated phrase as non-hyphenated, whether in the middle of the text or at a line break.
- Thus, comparisons like exteacher, ex-teacher at the Google Books Ngram Viewer show results much more favorable to exteacher than is warranted. One needs to check in Google Books what actually appears on the scanned pages. The graph still shows convincingly that exteacher is rare; in fact, it is much rarer than the graph suggests.
- anti-American, antiAmerican at the Google Books Ngram Viewer shows too many hits for antiAmerican. A similar result appears for anti-German, antiGerman at the Google Books Ngram Viewer.
You can plot the frequency ratio: {{R:GNV|nonstandard/[non - standard]|nodot=1}}: nonstandard/[non - standard] at the Google Books Ngram Viewer.
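
The bracketed form and the ratio query can be produced mechanically. The following is a minimal sketch with made-up helper names; it only rewrites the query string as described above and makes no claim about how GNV interprets it beyond what this page states.

```python
import re


def bracket_hyphenated(term):
    """Rewrite 'non-standard' as '[non - standard]'; leave other terms alone."""
    if "-" not in term:
        return term
    # Put exactly one space on each side of every hyphen, then add brackets.
    spaced = re.sub(r"\s*-\s*", " - ", term)
    return f"[{spaced}]"


def ratio_query(numerator, denominator):
    """Return a 'numerator/denominator' query with hyphenated terms bracketed."""
    return f"{bracket_hyphenated(numerator)}/{bracket_hyphenated(denominator)}"


print(bracket_hyphenated("non-standard"))
print(ratio_query("nonstandard", "non-standard"))
```

The second printed string is the same query used in the ratio example above and can be passed as |1=.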