Modern technology is changing the way we do research and look at data. A vast amount of information is available through a simple search. The real problem today is not a lack of research material but making effective use of all the information we can get.

To help with this problem, new methods of looking at data are being developed. Programs such as N-Gram Viewer and Mining the Dispatch are examples of newer tools for looking at information. Both try to sort and present information through techniques called textual analysis and topic modeling.

N-Gram Viewer

N-Gram Viewer is a textual analysis tool that provides a visual representation of how often a certain word is used in a given time period. The search works by finding out how frequently the chosen word appears in the collection of books provided by Google Books. N-Gram Viewer has a simple, easy-to-use design. To graph the data, you just type in a word and decide on the time period. Using N-Gram Viewer, a user can see the “history” of the word.
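To make the idea of "how frequently the chosen word appears" a little more concrete, here is a minimal Python sketch. It is not how N-Gram Viewer itself is built; it only assumes that each text comes with a publication year and that frequency means the word's share of all words from that year. The function name `yearly_frequency` and the toy corpus are made up for illustration.

```python
from collections import Counter

def yearly_frequency(corpus, word):
    """Return {year: share of all word tokens from that year that equal `word`}."""
    word = word.lower()
    hits = Counter()    # occurrences of `word` per year
    totals = Counter()  # all word tokens per year
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenizer (an assumption)
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}

# Hypothetical toy corpus of (publication year, text) pairs, invented for this example.
toy_corpus = [
    (1900, "fresh bread and a donut"),
    (1900, "bread and butter"),
    (1950, "a donut and a donut shop"),
]

print(yearly_frequency(toy_corpus, "donut"))
```

Dividing by the yearly total matters because far more text may exist for some years than for others, so raw counts alone would mostly show how much was published.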

[Figure: N-Gram Viewer graph for “donut”]

The example of “donut” shows a general timeline of the use of the word. According to N-Gram Viewer, the word donut was not in use until the 1860s, fell out of use until the 1930s, and then became increasingly popular.

In addition to the general search, N-Gram Viewer offers more interesting ways to use the graph through its built-in functions. It can be used to find how often sentences start or end with certain words, using the _START_ or _END_ tags. It can also be used to find how often two words appear together, with one depending on the other, using the => operator. Aside from these, N-Gram Viewer accepts query operators (“+”, “-”, “*”, “/”, and “:”) and can show how many times a word was used as either a noun or a verb. This isn’t the full list, just some of the more interesting ones offered.

Example of => used to find cases where the word cheap modifies food. Phrases that are counted include cheap food, cheap delicious food, and cheap Chinese food.

While N-Gram Viewer provides an interesting way to look at information, I feel that the lack of any way to get more specific information makes the tool feel limited. It does not tell you anything about the sources used to make the graph. You cannot find the name of the book or the context of the word. Furthermore, we have no real information about the database, Google Books, that is used to generate the graph. Google Books does not hold all the books in the world. Since the database is not complete, the graph might present misleading information. In the first example, the graph suggests that the word donut was not in use until 1861. However, the word donut has been in use as early as 1803 in English cookbooks. Also, if we were to use the graph to compare two different words, we would have no way of knowing whether the information is accurately represented; Google Books might hold more books on certain topics, which would skew the graph.

Mining the Dispatch

Mining the Dispatch serves a similar function to N-Gram Viewer, except that its information is limited to the American Civil War period, 1860 to 1865. It provides information on Civil War subjects such as soldiers, slavery, and the economy. One feature that Mining the Dispatch offers and N-Gram Viewer does not, and which I really liked, is Exemplary Articles. This section shows Daily Dispatch articles that were published during the month selected on the graph and ranks them according to their relevance to the chosen topic.

Another textual analysis tool I really liked, with a similar function, is Voyant Tools. Voyant Tools is a combination of N-Gram Viewer and Wordle, and it analyzes a single document. It has a neat feature that provides the context of each word used in a document. I can understand how this might be a difficult task in N-Gram Viewer, since there could be millions of sentences from all the books. But a similar feature could show what the books are about instead of showing individual sentences.

Example of Voyant Tools applied to a review of The Donut: A Canadian History. The analysis displays the Cirrus, Reader, Word Trend, Keyword in Context, and Word in Document panels.
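The keyword-in-context feature is simple enough to sketch in a few lines of Python. This is only my rough approximation of the idea, not Voyant's actual implementation; the function name `keyword_in_context`, the `window` parameter, and the sample sentence are all invented for illustration.

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return (left, hit, right) for each occurrence of `keyword`,
    with up to `window` words of context on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)  # crude tokenizer (an assumption)
    keyword = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, token, right))
    return rows

# Made-up sample text, not taken from the review shown in the screenshot.
sample = "The donut became a symbol of Canada. Every town had a donut shop."

for left, hit, right in keyword_in_context(sample, "donut", window=2):
    print(f"{left:>20} [{hit}] {right}")
```

For a single document this is cheap to compute, which is probably why it works well in a single-document tool and would be much harder across millions of books.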

Conclusion

Although N-Gram Viewer is an intuitive way to look at information, there is room for improvement. Even just by incorporating features from similar tools, N-Gram Viewer could improve what it offers its users. Other tools provide more comprehensive data, but their databases are not as big as N-Gram Viewer’s. While these research methods provide an interesting new way to look at information, I believe improvements are needed. They could be used to get general information or perspective on a topic but would be inadequate for serious research. However, I still see potential in textual analysis tools such as N-Gram Viewer, and refining them might change the way we do research in the future.
