Thursday, 26 January 2012

When pufh comef to fhove

I just saw an interesting example of the kind of analysis that needs caution when using the Google Books Ngram Viewer.

On Yahoo! Answers there was a question asking what word people used instead of "push" before 1800. This slightly odd query was prompted by the Ngram Viewer graph, which appears to show virtually no use of the word before the very late 1700s, then a sudden rise into significant use around 1800.

"push" 1750-2008

The explanation is simple, though the precise timing is hard to explain. Prior to 1800 or so, printed texts used the "long s" character "ſ" (a.k.a. "medial s" or "descending s"), which Google's OCR algorithm interprets as "f". So if you look instead for "pufh", you find all those missing pre-1800s examples of "push". The transition between the two is striking ...

"push" / "pufh" 1750-2008
"push" / "pufh" 1750-2008 (detail)
... and I'm not sure if anyone knows what was happening in the publishing/printing world to account for such a rapid shift.  As the Wikipedia article describes, it happened at different times in different countries, but just as rapidly as in English.

This phenomenon also explains the strange bipolar Ngram Viewer graph for "fuck".

"fuck" 1750-1830

The post-1960 hits are real. The pre-1800s ones don't represent some robust pre-prudish age, but occurrences of "suck" printed as "ſuck".
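If you want to check other words for the same effect, the substitution is easy to sketch in Python. This is only a rough approximation: it assumes every lowercase "s" except a word-final one was printed as a long s and then read as "f" by the OCR, which ignores the finer typographic rules printers actually followed. The function names are my own inventions for illustration, not anything provided by Google.

```python
def ocr_long_s_variant(word: str) -> str:
    """Guess the spelling OCR would produce if each long s were read as 'f'.

    Simplifying assumption: every lowercase 's' except the last letter of
    the word was printed as a long s. Real printing conventions had more
    exceptions than this.
    """
    if not word:
        return word
    body, last = word[:-1], word[-1]
    # Only non-final lowercase 's' is replaced; a final 's' stayed round.
    return body.replace("s", "f") + last


def ngram_query(word: str) -> str:
    """Build a comma-separated pair to paste into the Ngram Viewer search box."""
    variant = ocr_long_s_variant(word)
    # If nothing changed, there is no second spelling to compare against.
    return word if variant == word else f"{word},{variant}"


if __name__ == "__main__":
    for w in ["push", "suck", "philosophy"]:
        print(ngram_query(w))
```

Pasting the resulting pair into the Ngram Viewer plots both spellings together, which makes the handover between them easy to see.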

- Ray

1 comment:

  1. Oh no! You've given me a new toy to play with when I'm supposed to be doing something else! This is great - you can have hours of fun with it. The transition in American English is much less clear cut than English English. Also, "philosophy, philofophy" between 1600 and 2000 gives some very odd results (s high, both low, f high, s high). But unlike push, there is no overall growth trend -- obviously people today are pushier but no more philosophical than their forebears...