Dipping a toe into Digital Humanities: word clouds

The term “digital humanities” has always confused me. When I first heard it, I assumed it was what I was already doing – applying digital approaches to humanities research and teaching.

But no. Digital Humanities seems to be about applying certain elements of computer science to the humanities, with emphasis on quantification. At least, that’s how I’d put it. Wikipedia says, “the systematic use of digital resources in the humanities, as well as the reflection on their application”. Stanford University says, “Digital humanities foster collaboration and traverse disciplines and methodological orientations, with projects to digitize archival materials for posterity, to map the exchange and transmission of ideas in history, and to study the evolution of common words over the centuries.”

[I am treading carefully here, since the term is now used by people who have professionalized the subject. Like most new disciplines, it’s already questioning itself.]

When I come across the term, it usually involves word counts, tallying the number of times a word or words is used in a text. I think that makes Wordle one of the first digital humanities tools. Wordle was an applet created by Jonathan Feinberg ten years ago. It counted the number of times a word appeared in a text, and created a tag cloud, with more frequent terms in larger text.
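The counting-and-sizing idea behind a tag cloud can be sketched in a few lines of Python. This is only an illustration, not Wordle's actual code: the function names `word_frequencies` and `font_sizes`, the point sizes of 10 and 72, and the simple linear scaling are all assumptions made for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case."""
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


def font_sizes(counts, min_size=10, max_size=72):
    """Map each word's count to a font size between min_size and max_size."""
    if not counts:
        return {}
    lowest = min(counts.values())
    highest = max(counts.values())
    spread = highest - lowest
    sizes = {}
    for word, count in counts.items():
        # If every word has the same count, give them all the smallest size.
        scale = (count - lowest) / spread if spread else 0.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


sample = "We the people, the people of the world"
counts = word_frequencies(sample)
sizes = font_sizes(counts)
print(counts.most_common(3))
print(sizes)
```

A real generator also has to lay the words out so they don't overlap, which is the harder part and is not attempted here.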

So, using another program, Jason Davies’ Word Cloud Generator, let’s see what happens.

For example, here’s the Declaration of Independence using the 400 most-used words:

 

There are many uses for such an approach. I can compare it, for example, to Magna Carta, where there is far less about the people.

Even without a word cloud, one can use a basic word search if one can get a whole document in a browser window. So if I have the Declaration here, and I do a “find” for the word “people”, it tells me it’s there 10 times.
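The same kind of "find" can be done in a short script. One thing worth knowing is that a simple find usually matches fragments inside longer words too, so "people" would also be counted inside "peoples". The sketch below shows both ways of counting; the function names `count_word` and `count_substring` and the sample sentence are made up for illustration.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def count_substring(text, fragment):
    """Count every match, including ones inside longer words."""
    return text.lower().count(fragment.lower())


sample = "The People and the peoples of the world"
whole = count_word(sample, "people")
anywhere = count_substring(sample, "people")
print(whole, anywhere)
```

Which count is the "right" one depends on the question being asked, so it is worth saying which was used when reporting a number.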

So today (stand back!) I’m going to apply this method to HG Wells’ autobiography.

The 19,332 words that result after removing the table of contents and the index took 7 minutes to process (with all words counted):

Hmmm. “Peace” is big, and “Nazi” is small. “Work”, “world”, “now”, “man”, “life” are all big. “New” and “still” are the same size. There is no representation of the personality of the piece, which is part of the purpose, except in the words themselves. But really, not very helpful. What if I limit results to the top 25 words?

A little better, but hardly revealing.
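Limiting a count to the top few words is easy to try by hand. In this sketch, the name `top_words` and the tiny `STOP_WORDS` list are assumptions for illustration. Whether and how a particular generator drops common words like "the" is not something I have checked.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical list of words to ignore.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "it", "that", "was"}


def top_words(text, limit=25, stop_words=STOP_WORDS):
    """Return the `limit` most frequent words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(limit)


sample = "the tentacles and the creatures and the tentacles of the sea"
result = top_words(sample, limit=2)
print(result)
```

Without the stop-word filter, the top of the list for almost any English text would be words like "the" and "and", which says little about the piece.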

Fiction, however, often fares better. That’s why it’s digital humanities, not digital biography. Taking The Sea Raiders by HG Wells at 25 words, we get:

Tentacles! Creatures! Well, that’s more fun, anyway.

Given the current environment in social discourse, digital humanities techniques are being used to ferret out trends in speeches, maps, and censuses, to demonstrate sexism or racism. So the use goes far beyond word clouds.

But I’m still sad. No digital humanities grants for me.

1 comment to Dipping a toe into Digital Humanities: word clouds

  • Hi Lisa,

    In the past I have used Wordle (http://www.wordle.net/). I have used it more for visualisation/illustration than for analysis. My experience is that it is so easy to edit, to make it include the words you want and delete the words you don’t want, that it would be difficult to trust it for analysis purposes, i.e. when looking at other people’s Wordles.

    But I haven’t really explored the variety of ways in which it could be used for research purposes.
