For most of human history, the study of the past was restricted to what a single person or group could both gather and read. Despite new technologies that made collecting information easier, history remained limited by its largely narrative nature. With the advent of computer programs with powerful searching capabilities, this is no longer the case. As with all quantitative studies, however, the new tools come at a cost, namely that of context. The real question, therefore, is how much raw words can truly tell a historian who lacks personal knowledge of their environment.

In order to evaluate the advantages and disadvantages of a new tool, one must look to those who have already begun to use it, in this case three websites that explore the usefulness of quantitative analysis in the study of literature. The first of these, Google Ngram Viewer, searches for specific words and phrases within the Google Books corpus and maps the frequency of their use over time. Mining the Dispatch, a website chronicling the studies of Robert Nelson, scrutinizes word-cluster trends within the editions of the Richmond Daily Dispatch issued during the Civil War in order to analyze changes in social and political life. Finally, Quantitative Analysis, an article published in Science, discusses the value of quantitative analysis of literature and the new methodology of culturomics.
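The kind of frequency mapping Ngram Viewer performs can be sketched in a few lines of Python. The miniature corpus below is invented purely for illustration; the real tool, of course, operates over millions of scanned books.

```python
# Toy sketch of Ngram Viewer-style frequency mapping.
# The corpus here is invented for illustration only; the real tool
# queries the full Google Books corpus.
corpus = {
    1895: ["the great war of the roses", "holmes and watson"],
    1915: ["the great war rages in europe", "news of the great war"],
    1920: ["memories of the world war", "the great war is over"],
}

def ngram_frequency(corpus, phrase):
    """Relative frequency of `phrase` among all n-grams of its length, per year."""
    target = tuple(phrase.split())
    n = len(target)
    freq = {}
    for year, texts in corpus.items():
        matches = total = 0
        for text in texts:
            tokens = text.split()
            grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
            total += len(grams)
            matches += grams.count(target)
        freq[year] = matches / total if total else 0.0
    return freq

# Note that 'great war' already appears in 1895: the raw counts alone
# cannot tell us which war, if any, the phrase refers to.
print(ngram_frequency(corpus, "great war"))
```

Even this toy version exhibits the limitation discussed below: the numbers chart a phrase's rise and fall but say nothing about what the phrase meant to the people who wrote it.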

Through the examination of these three websites, several advantages become quite clear. First and perhaps foremost, tools of quantitative analysis allow historians to capitalize on the growing number of available sources. When making judgments based on qualitative study, there is always the risk of missing a crucial contradictory source, but with quantitative analysis this risk is greatly reduced. In addition, quantitative studies, unlike their narrative-based counterparts, are able to map large-scale changes, such as shifts in language and the public consciousness. Finally, the massive scale of the new tools enables them to overcome one of the classic weaknesses of smaller quantitative studies: population bias, a constant threat when using information gathered by even moderately large surveys, becomes vanishingly unlikely when the corpus being examined is as large as Google Books.

However, this is not to say that quantitative analysis is without flaws. The largest problem with raw data is that it lacks context. Without knowing what sentences, nuances, and ideas surround a particular word in a particular place, it is impossible to make complete judgments about what that word means. Useful conclusions almost always require other sources of knowledge to explain why a particular trend occurred, even if quantitative data makes finding the trend easier. It is likewise impossible to control for exceptional uses of a given word or phrase. Tools like Google Ngram Viewer also tell us nothing about what is being said or thought, only what finds its way into the written record. Finally, quantitative information is at least as dependent on the researcher's interpretation as any other source, if not more so, since the words have been extracted from the writer's original intentions and thoughts.

In order to illustrate the advantages and disadvantages of quantitative analysis, I would like to examine two examples. First, let us look at a graph produced by Google Ngram Viewer of references to ‘Sherlock Holmes’.
[Figure: Google Ngram Viewer graph of references to 'Sherlock Holmes' over time]
Combined with additional research, this graph illustrates the influence of other media on literature: the spike in the late 1920s was preceded by several films about Sherlock Holmes. Without that or similar outside knowledge, however, the graph is unintelligible. Moreover, while it appears significant, the graph cannot prove that the films caused the spike in writing about Sherlock Holmes, since the two may share a common cause. Another useful example is the following Ngram, which maps the usage of various terms commonly used to refer to the war fought between 1914 and 1918.
[Figure: Ngram comparing terms commonly used to refer to the war of 1914–1918]
As Quantitative Analysis discusses, n-grams can be useful for mapping changes in language. But, as Mining the Dispatch shows in its section on soldiers, they cannot tell us what the words actually refer to: in this case, the term 'Great War' came into use well before 1914, so its early appearances cannot denote the war that began in that year.

The frequency with which words occur can tell us a great deal about what is being discussed, shedding light on long-term developments and taking advantage of the wealth of information now available to historians. However, words out of context are, in the end, just words. Especially in the study of the past, quantitative data in isolation tells only a partial story, since it cannot properly evaluate how a word is being used or the context in which it appears. Quantitative analysis is a powerful new tool in the historian's arsenal, but its capacity is limited unless it is supported by more traditional methods of research.