Textual analysis is the study of what, when, and how often certain words appear in certain contexts, and, through that, the drawing of conclusions about the politics, culture, language, or social norms of a period, among many other topics. The scope of textual analysis used to be limited by how long it took a person to conduct it, but now, using computers, textual analysis can reach a scope and scale beyond anything ever fathomed before. Tools such as the Google Ngram Viewer and Mining the Dispatch take advantage of these abilities and allow us to explore history in a unique way, but are they really all that useful? Let’s explore them to find out.
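At its simplest, that kind of counting is easy to demonstrate. Here is a minimal Python sketch of the basic operation, tallying how often each word appears in a text (the sample sentence is invented for illustration):

```python
# Minimal word-frequency count, the basic operation behind textual
# analysis. The sample sentence is invented for illustration.
import re
from collections import Counter

text = "The Union army marched. The army camped near the river."

# Lowercase the text and pull out the words, ignoring punctuation.
words = re.findall(r"[a-z']+", text.lower())

counts = Counter(words)
print(counts.most_common(3))  # [('the', 3), ('army', 2), ('union', 1)]
```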

Mining the Dispatch takes digitized versions of the Richmond Daily Dispatch, a daily paper published in Richmond during the American Civil War, and uses topic modeling to try to sort its articles into topics based on which words each one uses. Once the articles are divided into topics, you can graph how common each topic was over the paper’s entire run. The problem with how Mining the Dispatch is set up is that, as stated in the intro, the software dictated what topics were used rather than the historian, resulting in some fairly broad topics. It was also fairly difficult to find the purpose of the project and what its creators set out to do, and the graphs weren’t labeled very well either.
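To make the idea concrete, here is a toy sketch of topic modeling using scikit-learn’s LatentDirichletAllocation (Mining the Dispatch itself used a different toolkit and a real corpus; the four miniature “articles” below are invented purely for illustration):

```python
# Toy topic-modeling sketch in the spirit of Mining the Dispatch, using
# scikit-learn's LDA. The four miniature "articles" are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "runaway reward negro jail committed",
    "regiment soldiers battle enemy wounded",
    "reward runaway jail committed negro",
    "troops battle killed regiment enemy",
]

# Turn the articles into a word-count matrix, then fit a two-topic model.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(articles)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # each row: one article's topic mixture

# Print the most characteristic words of each discovered topic. Graphing
# the per-article mixtures against publication dates gives the kind of
# trend lines Mining the Dispatch displays.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[-3:][::-1]]
    print(f"topic {i}: {top}")
```

Note that the model, not the historian, decides what the topics are, which is exactly the limitation described above.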

The other issue I had with it is that it often took a while to find more than a couple of non-duplicate entries for each topic. For example, the “Fugitive Slave Ads” page was entirely dominated by ads offering a $10 reward for returning a slave named Parthena and a $100 reward for returning a slave named Sam. The topic assignments were also fairly odd and inconsistent: for some reason, each ad looking for Sam was given different topic assignments, despite their content being identical.

[Screenshots: Mining the Dispatch topic assignments for identical Sam ads. Why are the topics so different than the first one?]

The Google Ngram Viewer shows how often a given phrase appears in Google’s corpus of digitized books over a specified time period. It is pretty well laid out and easy to use, which was nice. To play around with it a little, I decided to input “Germany, France, Britain, United States”. This is the result I got.

[Graph: Google Ngram Viewer results for “Germany, France, Britain, United States”]
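Incidentally, the numbers behind a chart like this can be pulled programmatically. The viewer has no official API, but its own page fetches JSON from an endpoint that can be queried directly; this is unofficial and undocumented, so the parameters below (taken from the viewer’s URLs) may change without notice:

```python
# Fetch Ngram Viewer data from the unofficial JSON endpoint the web
# page itself uses. Undocumented, so it may change or break anytime.
import requests

resp = requests.get(
    "https://books.google.com/ngrams/json",
    params={
        "content": "Germany,France,Britain,United States",
        "year_start": 1800,
        "year_end": 2000,
        "corpus": "en-2019",  # corpus identifier as seen in viewer URLs
        "smoothing": 3,
    },
    timeout=30,
)
resp.raise_for_status()

# Each entry pairs a phrase with its relative frequency per year.
for series in resp.json():
    print(series["ngram"], series["timeseries"][:3])
```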

Now while this does give some interesting data, it is limited in what it can provide. The graph shows the rising influence of the United States over time and the falling influence of France and Britain. Both the US and Germany peak twice, during the two World Wars, which makes sense. However, the problem with this data is the problem with nearly all textual analysis tools: they don’t give any context. The data provided by the Ngram Viewer says nothing about how, for example, “United States” or “Germany” are used in the books. Are they pro-American, anti-American, pro-German? We have no context, and without it, our research is missing a great deal.

The third thing I explored was the Science magazine article “Quantitative Analysis of Culture Using Millions of Digitized Books”. The article brought up some great points on how textual analysis can be used to examine the evolution of language and grammar, which would indeed be a great use for the technology. But I did find a fair bit of their research to be either long-winded or simply difficult to follow.

Textual analysis can be a useful tool for historians and will probably become more so as the amount of digitized material grows. However, since it doesn’t give much context to work with, it is limited and would need to be supplemented by other forms of research.
