Cambridge’s Bluefin Labs decodes social media chatter
Facebook users “like” things 2.7 billion times a day. People share their opinions more than 500 million times daily on Twitter. Now, this start-up is betting it can change everything from product placement to how we elect our president.
IN THE 1930S AND ’40S, Hollywood had a way of tracking the popularity of its movie stars. Studios would sift through the quarter of a million or so fan letters that arrived each month and sort them into separate bags by actor name. Then studio employees would heave these bulging bags onto a scale, according to industry researcher Leo Handel. A big spike in weight meant the star was trending up. A sharp decline suggested the star was on the way to becoming yesterday’s news.
As measurements go, this was pretty crude. Even back then, the people who would take the time to write a letter represented a tiny subset of the population, usually teenagers motivated by an excess of adoration (or antipathy). So, in time, movie executives would follow the lead of their counterparts in radio, television, and advertising and adopt the techniques of opinion research to understand what their audiences wanted.
The push toward data collection in television brought us Nielsen families, those chosen few whose living room diaries and, eventually, People Meters were powerful enough to keep their favorite shows on the air. And it brought us the ubiquitous focus group, where a dozen unhurried souls would be steered into a conference room and, in exchange for 50 bucks and all the M&M’s they could eat, be asked to render a verdict on a new program.
Yet it hasn’t always been clear how much we’ve gained from this relentless pursuit of audience preferences. High Nielsen ratings — scores that were extrapolated from just several thousand households — kept shows like Three’s Company and The Love Boat on the air long past their sell-by dates. And I’ve been suspicious of the focus group ever since the seventh grade, when on a trip to New York I somehow got shanghaied into testing a sitcom starring Harold Gould as a skirt-chasing widower. Against the heated objection of this 13-year-old out-of-towner, CBS went ahead and aired Foot in the Door in 1983, though the network thankfully mercy-killed it after just six episodes. What’s the point of market research if it regularly leads to doozies like that? We might as well bring back the fan-mail scales.
In a way, that’s what’s happening.
Thanks to social media, thousands of fan letters and complaint missives, huzzahs and boos, are now being written every single minute. Twitter alone processes half a billion tweets each day. But there are problems. As with those letter-writing fans of the past, today’s social media commenters skew young. And right now, the most common methods for tracking their views resemble the “trending by weight” measures the old studios favored: Multiple firms tally all mentions of a TV show or movie made on social media, then report grand totals across general categories. This is useful only to a point.
Twitter, Facebook, and other services have already transformed media from a one-way conversation into a democratized, constantly churning feedback loop. In time, social media hold the promise of exercising enormous influence over everything from the shows we watch on TV to the toothpaste we buy in the supermarket to the politicians we send to Washington. “Social TV” and the “second screen” experience — watching the TV set while cradling a smartphone or tablet — may even rescue live television viewing from the dustbin into which the DVR has swept it.
Yet the only way any of this is going to happen is if somebody can reliably convert all that online chatter into meaningful information. After all, if someone tweets “the office is making me cry,” is that person referring to a particularly poignant episode of the NBC comedy or a hostile workplace? Even more difficult is discerning sentiment. Are most of those millions of mentions about your show praising it or panning it? And what if the name of the show isn’t even mentioned? Making those kinds of interpretations is easy for humans yet exceedingly difficult for computers.
But they’re learning. A Cambridge start-up called Bluefin Labs is marrying the computational power of machines with the interpretive guidance of humans to make sense of — and profit from — the fire hose of nonstop social media. The company’s work builds on the research of its two cofounders, MIT guys who have dedicated their professional lives to teaching machines to understand human language. Now they are using that knowledge to teach machines to understand what we really mean when we tweet or post about everyone from President Obama to Honey Boo Boo. The outcome just may be as important to the president as it is to that cringe-worthy pint-size product of reality TV.
IT’S THE THIRD TUESDAY IN OCTOBER, and the whole world is watching. President Obama and Mitt Romney are about to square off for their second debate. Will Barack Obama redeem himself after his listless first outing? Will Romney continue his extreme-to-moderate makeover? To chew over these questions endlessly, CNN has mobilized its vast team of talking heads, arraying them in clusters throughout its hangar-like studio. Periodically, the network cuts to a separate studio, seeking insight from 35 undecided voters who appear as trapped and beleaguered as the Donner Party in the snows of the Sierra Nevada.
That night Deb Roy is sitting at home with his wife, watching the debate on television like the rest of us. Or at least the “second screen” viewers among us who no longer restrict our conversations about what we’re watching to the people sitting next to us on the couch. More than 40 percent of smartphone and tablet owners use their devices while watching television at least once a day, according to a Nielsen survey published in April.
As the debate unfolds, the 43-year-old Roy keeps his iPad on his lap, pausing often to read his wife the insights from the 100 people he follows. “Being able to have this social soundtrack come into our living room,” he says, “has completely changed the experience of television.”
Roy has always been fascinated by technology. When he was in grade school in Winnipeg, Manitoba, in the late 1970s, he walked into a RadioShack and was dazzled by a new personal computer on display. The boy strode up to the keyboard and typed “HELLO.” On the screen popped the message “SYNTAX ERROR.” He tried “HI,” but got the same response. In a way, Roy hasn’t stopped trying to get computers to understand him since.
The son of a town planner who had emigrated from India to Canada, Roy spoke Bengali before English. While his older sister was an accelerated student who kept her head buried in the books, he was content to get B’s. Skipping studying left him plenty of time to conduct an endless supply of experiments at home. By his early teens, he had built robots, fireworks, a reading device for the blind, and a rudimentary speech-recognition tool that could understand words in English, Bengali, and German.
In 2004, when Roy’s wife, Rupal Patel, learned she was pregnant, he approached her about turning their split-level Arlington home into a round-the-clock research lab, with their newborn serving as the sole subject. Since they’d broadly discussed this kind of project years earlier, Patel was an easier sell than most spouses. While Roy focused on teaching robots human language as a professor at the MIT Media Lab, his wife taught speech pathology and computer science at Northeastern University. She had contributed to his PhD research, and they shared an interest in observing a child in a natural setting, all in the hopes of unlocking the mysteries of how kids learn to talk.
First, they agreed to lots of privacy controls. Even though every room in the house — including the master bedroom and the bathroom — had a color camera with a fisheye lens planted in the stucco ceiling, individual room cameras could be turned off easily. What’s more, each was equipped with what Roy calls an “oops” button, which would erase the previous stretch of video if he and his wife felt it had caught something embarrassing.
The multiple cameras of this Human Speechome Project rolled for three years, amassing about 250,000 hours of footage, an archive Roy calls “the world’s largest home-video collection.” And Roy and his research team soon found some fascinating results in their mountains of data. They learned it didn’t make sense to focus solely on the child, but rather on the interplay between him and his parents (and nanny). Tellingly, the caregivers’ language became less complicated the closer the child got to speaking each new word, then gradually grew more complex afterward. Without realizing it, the caregivers were essentially dumbing down their language to meet the boy halfway.
The researchers also learned that a word’s association with a pattern of activity or certain place in the house was a far more robust predictor of how quickly Roy’s son would learn that word than was the frequency with which he heard it. The boy was much quicker to learn words with a very specific meaning — “mango,” for instance, which he usually ate in the kitchen — than those like “water,” which could mean a drink in the kitchen, bath water in the tub, or the rain outside. Context, it turned out, was hugely important.
As the MIT team prepared to turn off the cameras, Michael Fleischman, then one of Roy’s PhD students, met with his adviser to discuss using Speechome data in his dissertation. Roy broke the news to Fleischman that it would take many years to devise the algorithms needed to identify the most relevant video clips and then develop efficient ways to transcribe the footage, using a combination of computers and humans. If Fleischman wanted to graduate any time soon, Roy said, he would need to find another data set.
That led Fleischman to start a project teaching computers to understand baseball. He set up his computers to watch two seasons of Red Sox games, training them to tell the difference between a ball and a strike, between a foul and a home run. And because he used the games’ closed-captioning, he didn’t even have to worry about the computers getting thrown a curve by Jerry Remy’s accent.
A small write-up about the project in MIT Technology Review in 2007 led to a National Science Foundation grant, which in turn led Roy and Fleischman in 2008 to found a company leveraging the lessons from both their projects. Forced to act quickly, they named the company Bluefin Labs, after the Porter Square sushi restaurant where they frequently ate.
The original plan was to find the most talked-about plays from televised football games and analyze the commentary about them from sports bloggers. No matter that neither Bluefin partner was much of a sports fan nor that Roy hadn’t owned a TV in about 20 years. (“I overdosed on TV as a kid,” he says.) Over time, they raised more than $20 million in venture capital, including money from the Patriots’ Jonathan Kraft and Celtics co-owner Jim Pallotta.
Still, there was a major constraint on their new business’s path to success. The professional sports leagues own the rights to the games, so they would always call the shots. The partners pushed forward, shifting from bloggers to social media when Twitter began to take hold around 2009. Their eureka moment came when they noticed that tons of people were commenting not just on the games but also on the commercials. (Turns out fans weren’t just getting up to use the bathroom during those.) They decided in 2010 to flip the entire focus of the company. By analyzing which TV commercials and shows were gaining traction, they suddenly had a business with a huge upside. After all, TV is a $70 billion industry, and they could dangle in front of all those big-spending advertisers the promise of helping them use their dollars a lot more wisely.
WHILE ROY AND HIS WIFE watch the second presidential debate at home, nine people sit in a Kendall Square office in the shadow of MIT. With laptops on their knees, they keep their eyes glued to the big flat-screen TV mounted on an exposed-brick wall. Although these Bluefin staffers will be watching CNN’s coverage of the debate, they’ll also be competing with the network — and all other news organizations — in the race to make sense of the event.
Bluefin, which employs only about 50 people, has several crucial advantages. It gets the unfiltered feed of just about every tweet everywhere, as it is being posted — it’s possible to get such a thing from a Twitter reseller — and scrapes Facebook for all public comments. Bluefin also captures and catalogs everything broadcast on every channel in the country — programs and commercials alike — to create what it has dubbed the TV Genome.
Fleischman, Roy’s fellow cofounder, sits in the middle of the office, alternating his gaze from the flat screen on the wall to the one on his lap. As absorbed as he is, the 35-year-old Southern California native maintains his aura of calm. Seconds after Romney’s attempt to buff up his bona fides with women leads to a memorably awkward phrase, Fleischman stands up and calls, “Is ‘binders full of women’ trending?”
A couple of the staffers are tasked with “live tuning,” making note of unexpected word combinations that arise, then plugging them into the algorithms that are trying to determine context in all the chatter. The goal is to sharpen, in real time, the system’s ability to grasp nuance. In the first debate, the unexpected phrase was “Big Bird.” In the third, it would be “bayonets.” Tonight’s B surprise, of course, is “binders.” For all they have to recommend them, the algorithms still need a lot of help when it comes to detecting sarcasm.
Hunched over, sitting closest to the TV, is Bill Powers, a veteran Washington journalist who joined Bluefin in January specifically to work on election coverage through the company’s analytics initiative called The Crowdwire. When the cofounders first asked him to come aboard, Powers needed convincing. “Why would I need this stuff when there are lots of political polls with decades of science behind them?” the 51-year-old asked. “This feels like eavesdropping in a bar.”
“Stop right there,” Fleischman replied. “This is eavesdropping in a bar. But which would you prefer: to listen in on a conversation between friends in a bar or get the conversation between a pollster and the person whose dinner he just interrupted?”
Powers now has little doubt that social media analytics will loom large in the politics of tomorrow. Working against traditional pollsters is the mounting struggle to get a representative sample when caller ID leads fewer people to pick up the phone, land lines are disappearing, and robo-calling cellphone numbers is against the law.
Bluefin pairs up people’s political posts with other social comments they’ve made about TV. The resulting affinities can be entertaining. Bluefin determined, for example, that Romney fans tend to like Arby’s, while Obama fans prefer Red Lobster. As the analysis gets more sophisticated, it’s easy to see how political parties would covet this tool in their effort to paint detailed portraits of their supporters.
On the commercial side of Bluefin, executives expect affinities to help advertisers find the right homes for their commercials and their product placements. The company recently identified a handful of unlikely TV shows, including reruns of That ’70s Show, whose fans happened to comment often about makeup. Ad time during those programs turned out to be surprisingly wise buys for cosmetics companies.
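Bluefin hasn’t said exactly how it computes these affinities. One common way to put a number on a finding like “this show’s fans talk about makeup a lot” is lift: the share of a show’s commenters who mention a topic, divided by the share of all commenters who do. The Python sketch below is a hypothetical illustration under that assumption, not Bluefin’s method; the function name and the tiny user data set are invented for the example.

```python
from typing import Optional


def affinity_lift(users: dict[str, set[str]], show: str, topic: str) -> Optional[float]:
    """Return the lift of `topic` among people who commented on `show`.

    `users` maps a (hypothetical) user id to the set of shows and topics that
    user has posted about. Lift = P(topic | show) / P(topic): a value above 1
    means the show's commenters mention the topic more often than everyone
    else does; below 1 means less often.
    """
    # Everyone who posted about the topic, across all shows.
    topic_users = sum(1 for tags in users.values() if topic in tags)
    # Only the people who posted about this particular show.
    show_fans = [tags for tags in users.values() if show in tags]
    if not show_fans or topic_users == 0:
        return None  # not enough data to say anything

    p_topic = topic_users / len(users)
    p_topic_given_show = sum(1 for tags in show_fans if topic in tags) / len(show_fans)
    return p_topic_given_show / p_topic


# Invented toy data: three commenters per show, some of whom also mention makeup.
users = {
    "u1": {"show_a", "makeup"},
    "u2": {"show_a", "makeup"},
    "u3": {"show_a"},
    "u4": {"show_b", "makeup"},
    "u5": {"show_b"},
    "u6": {"show_b"},
    "u7": {"show_c"},
    "u8": {"show_c"},
    "u9": {"show_c"},
}

for show in ("show_a", "show_b", "show_c"):
    print(show, affinity_lift(users, show, "makeup"))
```

In this toy data, show_a’s commenters mention makeup twice as often as the audience at large (a lift of 2.0), the kind of signal that might flag a program as an unexpectedly good buy for a cosmetics brand. A real system would also need enough volume to rule out flukes, which this sketch ignores.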
Despite interest from campaign operatives in Bluefin’s analysis, Fleischman says, the company decided to keep its politics operation both noncommercial and nonpartisan for at least this year, while maintaining the company’s focus squarely on analyzing TV shows and commercials. But there’s a good chance that this stance will have changed by the time the next big election rolls around.
In the back of the office, a 30-year-old guy with shoulder-length hair and a plaid short-sleeved shirt sits behind three large monitors. Matt Miller is a machine learning engineer, and his job for the night essentially involves doing R & D on the fly. He is monitoring an algorithm that hunts through all the raw data in search of words or phrases that are appearing together more often than you’d expect. The algorithm then clumps these words into “topics.” Miller samples actual tweets that are flooding in, so he can test whether the algorithm’s hunches are on target. “Binders full of women” produces one such huge spike under the topic of “women’s rights.” But so does another topic, grouping phrases like “middle class” and “capital gains.”
This one doesn’t make as much sense to Miller, so he plunges into the ocean of commentary, nervously tucking his hair behind his ear. Then it dawns on him. Lots of people are sarcastically going after Romney for talking about capital gains cuts as if those would really help people in the middle class.
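Bluefin hasn’t published the details of this algorithm. The first step Miller describes, spotting words that show up together more often than you’d expect, is commonly measured with a statistic called pointwise mutual information (PMI), and the Python sketch below illustrates it under that assumption. It is not Bluefin’s code; the function name, the `min_count` cutoff, and the sample posts are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurring_pairs(posts: list[str], min_count: int = 2) -> list[tuple[tuple[str, str], float]]:
    """Rank word pairs by pointwise mutual information (PMI), highest first.

    Each post is treated as a set of lowercase words. For a pair (a, b):
        PMI = log2( P(a and b in same post) / (P(a) * P(b)) )
    A positive score means the two words share posts more often than they
    would if they were independent of each other.
    """
    n = len(posts)
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for post in posts:
        words = sorted(set(post.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))  # each pair counted once per post

    scores = {}
    for (a, b), together in pair_counts.items():
        if together < min_count:
            continue  # too rare to trust
        p_ab = together / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy posts; real input would be a live stream of millions of tweets.
posts = [
    "binders full of women",
    "binders full of women really",
    "capital gains for the middle class",
    "middle class capital gains cuts",
    "watching the debate",
    "the debate tonight",
]

for pair, score in cooccurring_pairs(posts):
    print(pair, round(score, 2))
```

On these invented posts, pairs such as ('binders', 'women') and ('capital', 'gains') get the top score, while ('debate', 'the') ranks last because “the” turns up in half the posts regardless. Clumping high-scoring pairs into topics, as Miller’s algorithm does, would take a further clustering step that this sketch leaves out; PMI is also known to overrate very rare words, which is why the `min_count` cutoff is there.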
When it’s over, this second debate generates 12.24 million social media comments, making it what Bluefin calls television’s “third most social event of all time,” a category that admittedly is barely older than Beyonce’s baby. Still, those numbers place it ahead of the 2012 Super Bowl and just behind this year’s Grammys and MTV Video Music Awards. (The 28.5 million public tweets and Facebook posts on election night subsequently bumped the second debate down to fourth place.)
Miller leaves his outpost to get closer to the TV, so he can take in the post-debate analysis by veteran talking head David Gergen on CNN. With little to go on besides his own instincts and the fever lines of reaction that had run along the bottom of the screen thanks to the levers being operated by that Donner Party of undecideds, Gergen gets wishy-washy. “Everybody will have different views of this,” he says.
Miller shakes his head. “They’ve got no data,” he snorts. “So they have to equivocate until they get some.”
While Bluefin will keep trying to tease out meaning from its data, Miller feels the algorithms have already shown him plenty. “I saw almost no positive spikes for Romney,” he says. For sure, Twitter’s young-skewing audience helps explain some of that imbalance. Yet the triumph of New York Times poll wizard Nate Silver in the 2012 election suggests the future may well belong to the stat geeks.
The instant, ardent reaction that Miller was able to see on his screens, coming from millions of people in every corner of the country, made CNN’s reliance on the gut reactions of old hands and conscripted proxies seem as dated as a dial-up modem.
TWO YEARS AFTER BLUEFIN’S SWITCH from sports analysis to television shows and ads, its clients include more than 40 American TV networks, among them CBS, NBC, and Fox. The networks take into account the firm’s data when deciding which shows to keep on the air, and even where to place them in their lineups. Similarly, giant consumer brands such as PepsiCo and Mars use Bluefin data to direct their advertising efforts more effectively.
David Wertheimer, president of digital for Fox Broadcasting Co., says the engagement levels on social media help identify shows that have growing bases of passionate fans, even if their ratings are still fairly low. Take Fringe, which tends to be one of the lesser-watched shows on Friday night television. “The rise of social media helped us understand the core fans of the show better,” Wertheimer says, “and that was one reason we kept it on the air.”
There’s very little evidence yet to prove that high social-engagement levels automatically lead to higher ratings, which remain the primary measure for pricing ad time. But Jesse Redniss, who heads up digital operations for USA Network, says an aggressive social media campaign for the show Psych helped lift its overall ratings by 10 percent last season. Likewise, the big Twitter and Facebook buzz about Covert Affairs actor Christopher Gorham informed the decision to expand his character’s role considerably.
Bluefin appears to be having an immediate impact on the decisions of advertisers and media buyers. Advertising veteran JP Maheu joined Bluefin as CEO this past summer, ahead of Roy’s return to MIT in January. (Roy will remain company chairman.) Maheu says that despite all the billions at stake in TV, “feedback is not in real time at all. It takes four to six weeks to get a sense of how your campaigns are working.”
But Bluefin’s data, he says, can collapse that delay into instant feedback. In preparation for a trade conference, Bluefin picked up on a Verizon ad that ran during the NBA and NHL playoffs. Featuring a mother and her nest-departing daughter who are weeping so much their words need to be subtitled, the ad generated overwhelming hate from the audience. “Shouldn’t it be on Lifetime between a Tampax ad and a diaper ad?” someone wrote on CommercialsIHate.com. Verizon followed up the ad with a father and son who were hilariously emotionless, and the social world loved it.
As media columnist for Advertising Age, Simon Dumenco follows this new social TV phenomenon as closely as anyone and has formed nonfinancial data-sharing relationships with Bluefin and several of its competitors. “I think with Bluefin, we’re going to get more efficient commercial decisions,” he says. During the Summer Olympics, Bluefin was able to show one big advertiser — another cosmetics brand, though the firm won’t name names — that one of its ads was tanking. After the spot had been airing less than a day, 65 percent of social media chatter about it was negative — people thought it was trying too hard to celebrate multiculturalism. With $2 million worth of airtime still to run, the cosmetics company could have simply been buying itself a lot more negativity. Instead, it swapped in a different ad that proved more popular.
As for the content that airs between the commercials, the actual TV shows, Dumenco is tracking how social media affect live viewing. With the rise of DVRs and mobile TV consumption, the number of people watching shows when the networks actually air them has gone down steadily for three years. (Tent-pole events, such as the Super Bowl, are the exceptions.) But social media users know that if they get anywhere near Facebook or Twitter, it will be impossible to avoid spoilers that ruin the suspense of their favorite shows. That creates a bigger incentive for viewers to watch shows live and feel part of the conversation as it’s happening.
Social media can also serve as a canary in the coal mine for the networks, Dumenco says. He cites the example of Fox’s megahit Glee, whose social-engagement numbers began to soften even before its ratings declined. Used properly, that kind of early signal could help showrunners change course before a program has lost too much ground.
Yet for all of Bluefin’s promise and impressive techie pedigree, the company is not the only player in this new space. And it remains an open question whether it or one of its competitors — Trendrr, General Sentiment, SocialGuide — will dominate. The tiny SocialGuide made a big leap in mid-November when Nielsen acquired the company.
An analysis co-written earlier this year by Fordham business school professor Philip Napoli found remarkable disparities in these competitors’ answers to the seemingly straightforward question of which shows garnered the most mentions on social media. The rankings differed by at least 60 percent, he says, and the richer analysis of positive-versus-negative sentiment within those comments differed by an even bigger margin.
While Bluefin goes deeper, some of its competitors cast a wider net, scraping not just social media but many websites and keeping their claws out there for longer stretches before and after a show airs. “When it comes to people negotiating the buying and selling of audiences,” says Napoli, author of the book Audience Evolution, “it becomes more difficult and uncertain if there are multiple currencies circulating.” Nielsen’s longtime leadership in TV audience measurement, he says, was due less to its accuracy than to its leverage as the dominant currency.
All this measurement will surely affect the kind of TV we consume. It might allow more quality shows to remain on the air by creating what Napoli calls “multiple definitions of success.” In years past, shows such as Family Guy and Arrested Development were canceled because of low ratings, only to be resurrected when the full extent of their cult followings became clear.
Just as easily, though, too much attention to audience sentiment could produce a race to the bottom, triggering cynical programming decisions on the part of the networks for the express purpose of padding social media chatter. We know we’re going to be getting a lot more instant reaction from a lot more people. Whether that gives us more Arrested Development or more Honey Boo Boo is still anybody’s guess.