Books have long been one of the primary resources available to historians studying the past. A book can describe an event or a person, or provide insight into a historical figure’s mind. Since the invention of the printing press, the number of published books has risen sharply with every century, which makes it harder for historians to sort through all the available information. The study “Quantitative Analysis of Culture Using Millions of Digitized Books” works with a corpus of 5,195,769 books, only about 4% of all books ever published, and notes that reading even the entries from the year 2000 alone would take a person about 80 years. With modern technology driving the cost of publishing to record lows, how can current and, more importantly, future historians sort through the mounds of information?

The Google N-Gram tool is one method of sorting through this data. It allows researchers to look for patterns in the way our culture uses words at certain periods in time. It does not let historians actually read the books, but it can serve as a cultural map for them to use when examining a particular issue, such as slavery.

Slavery N-Gram

By searching for the word “slavery” in the Google N-Gram Viewer, we can see that from the 1500s to about 1740 the word was used periodically but was not common. Then, around the time of the American Revolution in 1776, the word becomes much more common, and the trend continues to climb. After 1920 the word starts to decline, but it still remains prominent. This simple analysis can suggest to historians which years were important for the anti-slavery movement, based on the amount of discussion of the topic in a given year. As the amount of information grows, pattern recognition will help historians narrow their focus to a handful of potential dates relevant to their research. Furthermore, pattern recognition paints a larger picture of our global society as a whole. It shows us how our views, values and ambitions change through time.
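To make the idea of a word's popularity in a given year concrete, here is a minimal sketch in Python. It assumes a tiny made-up corpus (the texts, the years and the function name are all hypothetical and are not taken from Google's data or code) and computes what share of all the words published in a year is the word we are looking for. The real viewer presumably works on a far larger scale, but the basic ratio is the same kind of number that the chart plots.

```python
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs. Not real data.
TOY_CORPUS = [
    (1850, "slavery divides the nation and slavery must end"),
    (1850, "the harvest was good this year"),
    (1920, "the war is over and trade resumes"),
    (1920, "memories of slavery remain"),
]


def relative_frequency_by_year(corpus, word):
    """Return {year: occurrences of word / total words published that year}."""
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[year] += len(tokens)
        hits[year] += tokens.count(word.lower())
    # Skip years with no words at all to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


trend = relative_frequency_by_year(TOY_CORPUS, "slavery")
for year, share in trend.items():
    print(year, round(share, 3))
```

Dividing by the total number of words in each year matters because far more books were published in later centuries. A raw count would rise over time for almost any word, while a share shows whether the word itself became more or less prominent.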

The main limitation of this tool is that it is largely English-based: the corpus holds 361 billion English words, and the next largest language represented is French, with 45 billion. Considering how much of the world’s population speaks languages such as Mandarin, Cantonese and Hindi, we are still missing a huge chunk of information. The hope is that, with time, access to information will increase in all languages, not just English. As mentioned before, the N-Gram Viewer will not show any detail; it simply identifies a pattern for one n-gram or a combination of n-grams.

Another useful tool is Mining the Dispatch, created by Robert K. Nelson of the University of Richmond. It uses an archive of the Daily Dispatch to create, or model, “topics”: essentially, groups of words or phrases that are likely to appear together in the same document. Like the N-Gram Viewer, topic modeling is useful on a macro scale for looking at larger patterns. Continuing with the topic of slavery, one of the topics available from Mining the Dispatch is a graphical representation of fugitive slave posters by year. What is particularly great about this tool is that, unlike the N-Gram Viewer, it lets you examine the actual documents while looking at the chart. A historian could use the N-Gram Viewer to identify a particular period of interest, for example the mid-1800s, and then use Mining the Dispatch to take a closer look at what was happening during those years.
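The intuition behind “words that are likely to appear together” can be sketched in a few lines of Python. This is not how Mining the Dispatch is built, and it is not a real topic model, which would rely on a statistical model rather than simple counting. It is only a simplified stand-in that counts how many documents each pair of words shares. The documents, the stopword list and the function name are hypothetical and are not taken from the Daily Dispatch.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy documents, not actual Daily Dispatch text.
TOY_DOCUMENTS = [
    "reward offered for runaway",
    "runaway reward paid by owner",
    "cotton prices rise at market",
    "market prices for cotton fall",
]

# Small illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"for", "by", "at", "the", "a"}


def cooccurrence_counts(documents):
    """Count how many documents each unordered word pair appears in together."""
    pairs = Counter()
    for doc in documents:
        # Unique, non-stopword words, sorted so each pair has one canonical order.
        words = sorted(set(doc.lower().split()) - STOPWORDS)
        pairs.update(combinations(words, 2))
    return pairs


pairs = cooccurrence_counts(TOY_DOCUMENTS)
for (first, second), count in pairs.most_common(3):
    print(first, second, count)
```

Even in this toy example, “reward” and “runaway” end up paired with each other, as do “cotton”, “prices” and “market”, while words from the two groups never meet. Clusters like these are the kind of pattern that a topic model surfaces across many thousands of articles.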

I think both of these tools are a great start for historians in beginning to solve the problem of information overabundance, but they are by no means the complete solution. The amount of information will never decrease; it will only continue to increase, and historians need to begin adopting tools such as the N-Gram Viewer and Mining the Dispatch as part of their regular tool kit.
