Google Books broken

The Oxford English Dictionary currently has the first citation for "spaceship" (aka "space-ship" or "space ship") as

1894 J. J. ASTOR Journey in Other Worlds I. vi. 93 ‘What sort of space-ship do you propose to have?’ asked the vice-president.

Suppose I want to investigate online texts to see if I can antedate this. It looks straightforward: go to Google Books, use Advanced Search, and search for spaceship OR "space-ship" OR "space ship" with a date bracket of 1800-1900. A few of the results:

  • Fodor's Walt Disney World With Kids 2008, Kim Wright Wiley, 1899 [?]
  • Gravity from the ground up, Bernard Schutz, 1899 [No, this is a 2003 book]
  • Soviet Literature, Vols 1-4, 1870 [except the found text refers to cosmonauts and astronauts, so the date is clearly wrong]
  • Sky Doll Spaceship 1, Canepa / Barbucci, 1899 [a modern comic]
  • The publishers weekly, Volume 162, 1873 [except the result refers to The Space Ship Under the Apple Tree (1952)]

And so on. Some errors are so systematic that you can see what's happened. Hits for journal articles commonly give the journal's founding date rather than that of the actual article, and 1899 appears to be a standard placeholder for unknown dates. The problem, however, is sporadic. For example, when I was researching the post on Victorian waterbeds last year, a similar search appeared fine; but, knowing the possibility of error, I didn't trust the dates unless I could click through and confirm them in the texts themselves. Granted, this is what one should do anyway for rigour when chasing citations, but date errors shouldn't contaminate the initial search so badly that it becomes hard even to find candidate texts.

I've been noticing problems like this for some time, so I'm pleased to see Geoff Nunberg at Language Log tackling the issue: Google Books: A Metadata Train Wreck. Nunberg produces evidence (see his PDF presentation for the Google Books Settlement conference) that Google has been considerably slack in its processes for gathering metadata about its online historical texts, i.e. dates, categories, authorship and so on. For me, it's merely a nuisance; but it makes Google Books, despite being a marvellous resource if you're just seeking a particular text, pretty unfit for purpose for serious researchers such as historians, philologists and linguists. Nunberg is concerned that this should be the case for

what will probably wind up being the universal library for a long time to come, with no contractual obligation, and only limited commercial incentives, to get it right.

Addendum: The comments on Geoff Nunberg's LL post are worth reading. Jon Orwant of Google has just added a detailed analysis of Nunberg's examples which, along with a generous admission that the metadata is indeed very faulty, shows that in the majority of cases the bad data was already present in the source information.

Addendum 2: I did find an earlier citation for "spaceship", but via the 19th Century British Library Newspapers database rather than Google Books.

"The Apergy once mastered, it was comparatively easy to anticipate and improve upon the ideas of a trifler like Jules Verne, and build a space-ship".
- A STRANGE JOURNEY, The Pall Mall Gazette, London, England, Tuesday, January 20, 1880; Issue 4652

This is from a review of Percy Greg's 1880 novel Across the Zodiac, concerning a trip to Mars. "The Apergy" is an energy source. See the update Early spaceships and Percy Greg.

  1. This is a wider problem than Google Books.

    I frequently find the original publication date (though not, admittedly, a dummy placeholder date) used as the edition date in, of all places, British Library and Library of Congress records.

    When I was recently referencing Professor Branestawm, for instance, a search of the BL catalogue produced the following short-form record:

    Hunter, N. and W.H. Robinson,
    The incredible adventures of Professor Branestawm.
    1933, Harmondsworth: Puffin.
    ISBN 0140367764 (pbk)

    The actual edition date (1999) was given in the notes field.

    The original edition was also listed, under the same date:

    Hunter, N. and W.H. Robinson,
    The Incredible Adventures of Professor Branestawm. With 76 illustrations by W. Heath Robinson.
    1933, London: John Lane.

  2. Indeed. Discussion of this is ongoing in the comments on Geoff Nunberg's post.