
When computers read the canon

The National Library of the Czech Republic in Prague. Pavel Horejsi/The New York Times

Good stories are so absorbing that you don’t care how they pull you in. Computers are not so easily distracted, though, and a growing number of researchers are using programs to search for hidden patterns that underlie narratives from the Odyssey to modern sitcoms. This includes new work released earlier in June from the University of Vermont that concludes fictional stories follow one of six basic emotional trajectories. It’s a finding so reductive that it has reignited a debate about just how perceptive computers can be when it comes to analyzing plot.

The idea that all stories follow one of a few basic forms has a long history. It was formalized in the 20th century by writer Joseph Campbell, who identified the hero’s journey as an archetypal plot, and it also preoccupied novelist Kurt Vonnegut. In his rejected master’s thesis at the University of Chicago, and in a subsequent video, now popular on YouTube, Vonnegut claimed that stories are defined by the swings of their main characters between good and ill fortune.


He argued that there are three main trajectories most stories follow: “man in a hole” (character starts out doing fine, gets in trouble, gets out of trouble), “boy meets girl” (starts out normal, gets better, gets worse, gets better again), and the “Cinderella” story (starts low, peaks, drops, peaks again). Vonnegut also suggested that technology might provide a way to test his theory.

“We were inspired to look into story arcs from Vonnegut’s original master’s thesis,” says Andrew Reagan, an applied math doctoral student at UVM and lead author of the paper, which was posted online on June 24. “He says there’s no reason why the simple shapes of stories can’t be fed into computers.”

Following Vonnegut’s cue, the Vermont team used a technique called sentiment analysis, which measures the emotional content of text using a collection of 10,000 English-language words, each of which has been tagged with a happiness score. On this scale, words like “laughter” and “excellent” score high, while words like “terrorist,” “suicide,” and “arrested” score low.
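For readers curious how word-level scores can turn into a story arc, here is a minimal sketch in Python. It is not the Vermont team's code. The five-word lexicon, its numeric scores, the 1-to-9 scale, and the window and step sizes are all illustrative assumptions. The sketch averages the happiness scores of the tagged words inside a window that slides through the text, producing one number per window.

```python
import re

# Hypothetical happiness scores on an assumed 1 (sad) to 9 (happy) scale.
# These values are made up for illustration; they are not the scores
# from the study's 10,000-word list.
HAPPINESS = {
    "laughter": 8.5,
    "excellent": 8.0,
    "terrorist": 1.5,
    "suicide": 1.5,
    "arrested": 2.0,
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def window_score(words, lexicon):
    """Average the scores of the words that appear in the lexicon.

    Returns None when no word in the window has a score.
    """
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else None


def emotional_arc(text, lexicon, window=4, step=3):
    """Slide a fixed-size window over the text and score each position."""
    words = tokenize(text)
    last_start = max(len(words) - window + 1, 1)
    return [
        window_score(words[start:start + window], lexicon)
        for start in range(0, last_start, step)
    ]


print(emotional_arc("Laughter was excellent. Then he was arrested.", HAPPINESS))
```

A real analysis would use windows of thousands of words rather than four, so that each point on the arc rests on many scored words instead of one or two.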


Sentiment analysis has been used effectively on shorter texts, for example to automatically tag Yelp reviews as positive or negative. This new study extends the technique to 1,700 stories downloaded from the online library Project Gutenberg. After running different statistical processes, the researchers found that the stories sorted into six distinct story arcs – the three Vonnegut had identified, plus three others: tragedies (a straight progression from happy to sad); “rags to riches” stories (a straight progression from sad to happy); and what they term “Oedipus” stories (which feature a fall, then a rise, then a final fall).

The Vermont results are consistent with work last year by Matthew Jockers, a professor of English at the University of Nebraska.

Both Reagan’s and Jockers’s work have also generated criticism from within their field for using statistical methods that make plot patterns seem more pronounced than they might actually be. On July 18, Benjamin Schmidt, a history professor at Northeastern and a prominent figure in the digital humanities, published a blog post in which he characterized the UVM results as “extremely weak,” and questioned whether the technique of sentiment analysis – even when done well – is able to say much about plot structure at all.


“Sentiment analysis is widely used because it’s one of the things we have at hand that does its job reasonably well, but there are all sorts of plots that aren’t organized around happy or sad or even fortune or misfortune,” Schmidt says.

The debate within the digital humanities community centers on two main questions. First: Is sentiment analysis even effective at tracking the arc of positive and negative emotion in a story? Last year Ted Underwood, a professor of English at the University of Illinois, tried to use sentiment analysis to distinguish Shakespeare’s comedies from his tragedies. Underwood found that he couldn’t. “If we can’t get some evidence like that, I can’t feel confident sentiment is necessarily telling us something meaningful about plot,” says Underwood.

The second question is whether the rise and fall of a character’s fortune is really the best way to try to understand plot. In his recent blog post, Schmidt speculated that if master plots do in fact exist, they might be better understood in terms of how tension is built and released than in terms of emotional rises and falls.

“There are crime procedurals, hospital stories, that aren’t about individual characters going through redemption or encountering obstacles,” Schmidt says. “There are also a lot of narrative devices that aren’t about positive or negative words. Some of them are, but others are about who is the mole inside the British Secret Service and how are they going to catch them.”


Underwood thinks it’s unlikely that plots can be described by any single variable. And in his blog post, Schmidt questioned whether such a thing as master plots really exists, especially when compared with the fairly rigid structures found more commonly in music.

“I don’t think people are ever going to discover any organizing principles in novels as important as sonata form is in music,” he says.

Yet that won’t stop people from trying. Human beings have a deep, instinctive attraction to stories, and we clearly respond to certain kinds of plots – which means the more we can unlock about how plots work, the more we might be able to understand ourselves.

Kevin Hartnett is a writer in South Carolina. He can be reached at kshartnett18@gmail.com.