Big Data Part 4: Google’s Ngram and the Illusion of Knowledge

If you haven’t heard about Google’s Ngram Viewer yet, you are in for a treat! Google has made it possible to search the frequency of any given word in books published from 1500 to 2008. The corpus includes roughly 500 billion words from the 5.2 million books Google has included in this database. Ngram graphs how common your search query was in a given period. It’s a word nerd’s delight (or maybe a student’s best friend).
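For the curious, here is a minimal sketch of what "frequency" means in a graph like this: the share of all words printed in a given year that match your query. This is a simplified illustration, not Google's actual pipeline. The function name `yearly_frequency` and the tiny `toy_books` list are made up for the example.

```python
from collections import Counter


def yearly_frequency(books, word):
    """Return {year: share of that year's words equal to `word`}.

    `books` is a list of (year, text) pairs. Matching is case-insensitive
    and splits on whitespace only, which is far cruder than a real tokenizer.
    """
    hits = Counter()
    totals = Counter()
    target = word.lower()
    for year, text in books:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical miniature "corpus", invented purely for illustration.
toy_books = [
    (1800, "mary had a little lamb"),
    (1800, "mary went to town"),
    (1900, "john and mary sat down"),
    (1900, "the town was quiet that day"),
]

freq = yearly_frequency(toy_books, "Mary")
for year, share in freq.items():
    print(year, f"{share:.3f}")
```

Dividing by the total word count for each year is what makes different periods comparable, since far more books were printed in 1900 than in 1800.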

I can’t think of many important uses for this tool in the ordinary world, but one that came to mind was for parents trying to find a name for their baby. Now you have a sweet tool to quickly check how popular the name you are considering was, both in recent times and in the distant past. That tells you something useful, sort of…

The graph below shows a comparison of my name with my wife’s and my daughter’s.

It’s fun to know that my wife’s name was super popular around 1800, but that fact is pretty irrelevant. It’s an interesting curiosity, but this information is probably not going to change anything. Apart from possible academic uses, the average person has little use for such an incredible tool.

Amusing Ourselves to Death

Long before the Internet became what it is, an author named Neil Postman wrote a book called Amusing Ourselves to Death. This is one of my favorite books, both for its prophetic insight and its strong message. In his book, Postman traced the way technology has made it possible for information to be removed from its context. When information is stripped of its context it loses its potency, and so we try to create new environments to make it useful again. Some examples include crossword puzzles, TV game shows, and the ultimate example, Trivial Pursuit.

In the age of Big Data, does Google’s Ngram amplify the problem of context-free information, or does it help alleviate it? I suppose that when used properly it might help answer difficult research questions, since you can mine the context of a word for whichever period you are interested in. At the end of the day, it will probably only provide the illusion of knowledge, but you can bet we are going to be seeing quite a few more fancy graphs on blogs and in student papers!
