Saturday 21 February 2015

Ask A Foolish Question

From Project Gutenberg: Robert Sheckley's 1953 short SF story Ask A Foolish Question, which is still worth reading and has quite a bit of modern pertinence. The story concerns various species, including humans, who make pilgrimage to a device called Answerer, "Because Answerer knows everything".


Original illustration, Science Fiction Stories, December 1953

It's well established now that the way you put a question often determines not only the answer you'll get, but the type of answer possible. So ... a mechanical answerer, geared to produce the ultimate revelations in reference to anything you want to know, might have unsuspected limitations.
Despite the date, the story is out of copyright due to a quirk in US copyright law, which prior to 1992 required explicit renewal of copyright in a work. This requirement was abolished in 1992, but under current US law, pre-1992 works whose copyright had lapsed didn't go back into copyright after the change. Gutenberg notes: "This etext was produced from Science Fiction Stories 1953. Extensive research did not uncover any evidence that the U.S. copyright on this publication was renewed".

It's a good story (except for the built-in sexism of its era - all the characters are "he", even the mechanical Answerer, with one exception of collective unspecified gender) and I'm not the only one to wonder if it was Douglas Adams' inspiration for the computer Deep Thought in The Hitchhiker's Guide to the Galaxy. Anyhow, here it is:

Ask a Foolish Question (Robert Sheckley, Project Gutenberg EBook #33854)

The thrust of the story is that Answerer has a limitation that makes it virtually useless to askers: it's unable to answer outside the asker's assumptions in framing the question. If those assumptions are wrong or incomplete through the asker's primitive mindset, as they generally are, Answerer is not permitted to explain the wider picture. This brings me to a problem that has recently become significant with a more familiar Answerer: Google, as presently implemented. Programmed to be 'smart' in aid of guiding you to what it thinks you want to find, it over-rides the asker's framing of the question.

This works fine if the intended question is obvious, and the framing obviously faulty. If you put in something like hsow me pitchers of kitons, it recognises and/or corrects the mis-spellings, so you'll get links to pictures of kittens. What's perhaps less known is how disruptive this smartness can be in over-riding your input, and even hiding results, when you're perfectly clear about what you want to search for.

I ran into this problem a few days ago while researching the post ATURFUQIL: philanthropy funded by snake oil. A 1903 media journal mentioned that the cough medicine Liqufruta originally came with the story that its recipe was invented by a folksy character called "Mother Job", and I tried Googling to see if there was anything online about this yarn.
  • "Mother Job" gives 143,000 hits - made useless by Google's assumption, despite my use of inverted commas, that I'm looking for phrases like "a mother's job".
  • +"Mother Job" gives the same 143,000 hits. I knew this would be the case. A while back Google abolished the feature that a "+" prefix forced a specific expression.
  • "Mother Job" in Verbatim mode (at the result, choose Search Tools / All results / Verbatim). No count given, but the same results again.
  • "Mother Job" -"mother's job" gives 40,800 hits. Finally, a search string that rejects the unwanted "mother's job"! As it happens, the phrase doesn't turn out to be sufficiently specific. But that's a side issue. The point is that if I specify that I want to search for "Mother Job", Google should not try to out-think me and assume I might want "mother's job".
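For anyone who wants to script this kind of search string rather than retype it, here is a minimal Python sketch of the last workaround above: the wanted phrase in inverted commas, followed by each unwanted phrase quoted and prefixed with a minus sign. The function names build_query and search_url are my own hypothetical helpers, and the base URL with its q parameter is an assumption for illustration only. It only builds the string; it can't make Google honour it.

```python
from urllib.parse import urlencode


def build_query(phrase, exclude_phrases=()):
    """Build a search string: the exact phrase in double quotes,
    then each unwanted phrase quoted and prefixed with '-'."""
    parts = ['"{}"'.format(phrase)]
    parts.extend('-"{}"'.format(p) for p in exclude_phrases)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    # Assumption: the search page takes the query in a 'q' parameter.
    # urlencode percent-escapes the quotes and turns spaces into '+'.
    return base + "?" + urlencode({"q": query})


query = build_query("Mother Job", ["mother's job"])
print(query)
print(search_url(query))
```

The same helper takes several exclusions at once, for example build_query("Mother Job", ["mother's job", "mothers job"]), which is handy when near-hits come in more than one spelling.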
That problem was soluble, but it gets more troubling. As it happened, I had a strong suspicion that "Mother Job" might have been inspired by a historical character nicknamed "Good Mother Job", whose name I found in the Nineteenth Century British Library Newspapers archive. In this case, it being a historical topic, Google Books looked the best option.
  • "good mother job" at Google Books gives 7 hits. Most, again, either introduce irrelevant hits for "good mother's job", or give topic-related hits unrelated to the input string. One - here - gave what I was looking for, a piece in The Gentleman's Magazine in 1823 about Eleanor Job, a lady who lived to a great age, and had acquired the nickname "Good Mother Job" as an army wife who attended the wounded at the Battle of Quebec.
But this is the nasty part:
  • "good mother job" "eleanor" at Google Books gives 28 hits, all about this Eleanor Job, in 1800s publications. With the previous search, Google Books actively hid from me a large number of pertinent hits that should have shown up for the input "good mother job".
When I used Google and Google Books as research tools - alongside many others - while researching A Wren-like Note two or three years ago, Google could generally be relied on to search for what I told it to. Now that reliability has weakened, with search results either padded out with irrelevant near-hits, or else thinned out by omitting hits that are findable only if you tighten up the criteria with some specific keyword (as Sheckley's story concludes, "In order to ask a question you must already know most of the answer"). Google has become a kind of 'fly-by-wire' setup imposing its own decisions on the search.

Of course, only a fool would use Google alone for research, and all database searching has its faults. All archives made text-searchable by OCR of page scans suffer from OCR errors to some extent. For instance, despite their massive usefulness, the Nineteenth Century British Library Newspapers archive and the Isle of Wight County Press archive have rather poor OCR. But errors resulting from deliberate policy to over-ride what the searcher specifies are a different matter.

A good article at medium.com, Never trust a corporation to do a library's job, summarises the problem: Google's increasing focus on the new and sexy. This is happening both in search results, which are skewed toward newer material, and in a general policy of abandoning archiving interests. When your search interests lie strongly in historical materials, this is an annoying and depressing trend.

- Ray
