Everyone who drives a car is a kind of mind reader, constantly guessing what nearby pedestrians are going to do next. Is that woman at the curb about to cross, or is she waiting for a ride? Even Boston’s notorious drivers almost always get it right, because humans have a knack for predicting what others are about to do.
But computers don’t have a clue about humans, and that’s a massive hitch in the effort to put self-driving cars on the road.
Now a startup in Somerville called Perceptive Automata is trying to tackle one of the hardest problems in artificial intelligence: teaching self-driving cars to predict human behavior by reading body language and facial expressions.
“The reason it’s such a hard problem is that humans are so unbelievably good at doing this,” said Sam Anthony, cofounder of Perceptive Automata. “We look at a pedestrian, and within 200 milliseconds we can say, ‘Oh, that person’s waiting for a bus, or that person is totally going to jaywalk.’ ”
Autonomous cars must master the same skill, or they could pose a deadly threat to pedestrians.
Conversely, autonomous cars could become ultra-tentative to avoid harming people, if they’re not sure what humans will do. When such a car detects a woman at the curb, it might slam on the brakes, just to play it safe. Multiply this by thousands of automated autos, and you’ve got massive traffic jams.
In the technology world, the challenge is sometimes referred to as “scene understanding,” and it’s getting a lot of attention from the biggest names in automobiles. Honda Motor Co., for example, has teamed up with a Chinese company that specializes in image recognition to develop technologies that “will enable complex automated driving in urban areas.” And Daimler AG, the parent company of Mercedes-Benz, is working with a British company, Humanizing Autonomy, that is developing a “pedestrian intent prediction platform.”
Beyond the automotive world, companies ranging from Microsoft Corp. to Boston-based Affectiva already have products that can analyze facial movements to assess what people are thinking or feeling. China, in particular, is big on facial recognition technology for multiple uses, many linked to surveillance; some schools there are testing a system that monitors students’ faces to see if they’re paying attention in class.
Perceptive Automata, meanwhile, focuses more on body language, as the position of someone’s head, hands, or feet may be more revealing than facial expression. Anthony holds a doctorate in psychology from Harvard University, and his specialty is vision science — the study of how humans process visual information. Another cofounder, Walter Scheirer, is an assistant professor of engineering at the University of Notre Dame and previously was a postdoctoral researcher at Harvard, specializing in teaching machines to recognize visual data.
The pair start with thousands of images of humans — walking along a sidewalk, crossing the street, waiting at an intersection — that they show to human testers, who judge whether the person in the picture is preparing to step into the roadway.
Often the images are partially obscured. For instance, one might show only a person’s head, while the next shows only the feet. In other cases, images are blurred. The testers also assess the likelihood that the person in the image knows there is a car coming.
The result is a large library of visual cues, each rated on the probability of the person crossing the path of a car. This data is used to train Perceptive’s software to make the same kinds of judgments about pedestrians detected by the cameras on a self-driving car.
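The labeling step described above can be pictured with a small sketch. This is purely illustrative and not Perceptive Automata's actual code; the function and variable names are hypothetical. The core idea is that each image is scored by several human testers, and their averaged judgments become "soft labels" — probabilities between 0 and 1 — that a model can later be trained against.

```python
# Illustrative sketch of crowd-sourced "soft labeling" for pedestrian
# images. Each rater scores an image from 0.0 ("staying put") to 1.0
# ("about to step into the road"); the average becomes the training label.

from statistics import mean

def soft_label(ratings: list[float]) -> float:
    """Average per-image rater scores (each 0.0-1.0) into one probability."""
    if not ratings:
        raise ValueError("need at least one rating")
    return mean(ratings)

# Example: five raters judge one partially obscured image of a pedestrian.
ratings = [0.9, 0.8, 1.0, 0.7, 0.9]
label = soft_label(ratings)  # 0.86 -- "very likely to cross"
```

Training against these graded labels, rather than a hard yes/no, is what lets the software express the same shades of confidence a human driver has about whether someone will cross.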
“We’ve figured out how to take this general artificial intelligence problem and slice it finely enough so that we can solve a very fine-grained piece of it,” Anthony said.
Perceptive is conducting limited real-world testing. Anthony said his software “has run in production vehicles on both coasts of the US, and in Northern Europe,” but declined to name the companies using it. The software has been tested in self-driving cars, but only with a human “safety driver” along for the ride.
He said cars testing Perceptive’s software can drive more smoothly, with fewer unnecessary stops to make way for pedestrians who aren’t really about to step into traffic. That could reduce the risk of traffic jams or even rear-end collisions caused by over-cautious self-driving cars.
But Christoph Adami, a computational biologist at Michigan State University who works on artificial intelligence programs, said Perceptive Automata’s approach will fail. “It’s a good idea,” Adami said, “but it’s still a dead end, and at some point they’ll hit the wall and see they can’t go further.”
Adami said Perceptive’s software isn’t sophisticated enough to consistently make the right decisions because it will not learn over time as the information it processes changes.
“They can be easily fooled by changing something on your shirt or your pants,” Adami said. “That means you can make tiny changes and suddenly the computer will completely misread the intent.”
For example, imagine a pedestrian wearing unusual clothing: If the Perceptive program hasn’t been trained to recognize that specific outfit, it will probably just freeze up, slam on the brakes, and wait for human aid.
Adami predicted that once this weakness becomes widely known, some mischievous types would exploit it to deliberately cause traffic jams or auto accidents. “They’re going to try to make the car crash. People are like that,” he said.
That’s not so far-fetched; Waymo, the self-driving car unit of Google parent Alphabet, reportedly has had its cars targeted by pedestrians, who abruptly jump in front of them to force them to stop.
Anthony insisted that with its fine-grained database of pedestrian imagery, Perceptive’s system will be much harder to fool.
And he’s not just interested in pedestrians. Because self-driving cars must share the roads with human-piloted vehicles, Anthony and his team want to teach their software to read the intentions of nearby human drivers.
There are opportunities beyond the auto industry. As companies develop robots to deliver packages, assist emergency workers, or provide home health care services, those machines will come into frequent contact with people and will need to know how to read their minds.

Hiawatha Bray can be reached at email@example.com. Follow him on Twitter @GlobeTechLab.