MIT researchers have developed a computer system that independently adds realistic sounds to silent videos. Although the technology is nascent, it’s a step toward automating sound effects for movies.
In a series of videos of drumsticks striking things (including sidewalks, grass, and metal surfaces), the computer learned to pair each clip with a fitting sound effect, such as the sound of a drumstick hitting a piece of wood or rustling leaves.
The findings are an example of the power of deep learning, a type of artificial intelligence whose application is trendy in tech circles. With deep learning, a computer system learns to recognize patterns in huge piles of data and applies what it learns in useful ways.
In this case, the researchers at MIT’s Computer Science and Artificial Intelligence Lab recorded about 1,000 videos of a drumstick scraping and hitting real-world objects. These videos were fed to the computer system, which learned what sounds are associated with various actions and surfaces. The sound of the drumstick hitting a piece of wood is different from the sound it makes when it disturbs a pile of leaves.
Once the computer system had all these examples, the researchers gave it silent videos of the same drumstick hitting other surfaces, and they instructed the computer system to pair an appropriate sound with the video.
To do this, the computer selects a pitch and loudness that fit what it sees in the video, and it finds an appropriate sound clip in its database to play with the video.
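For readers curious about what such a lookup might involve, here is a minimal sketch in Python of matching a predicted pitch and loudness to the closest clip in a small database. It is an illustration only, not the researchers' code: the `SoundClip` type, the `nearest_clip` function, the clip names, the numbers, and the distance measure are all assumptions made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundClip:
    """A stored sound effect, described by two summary features (hypothetical)."""
    name: str
    pitch_hz: float      # dominant pitch, in hertz
    loudness_db: float   # peak loudness, in decibels


def nearest_clip(database, pitch_hz, loudness_db):
    """Return the clip whose pitch and loudness are closest to the target values."""
    def distance(clip):
        # Compare pitch in octaves (log scale), since pitch is heard as a ratio.
        pitch_gap = math.log2(clip.pitch_hz / pitch_hz)
        # Scale loudness so that 10 dB counts about as much as one octave.
        # This weighting is an arbitrary choice for the sketch.
        loudness_gap = (clip.loudness_db - loudness_db) / 10.0
        return math.hypot(pitch_gap, loudness_gap)

    return min(database, key=distance)


# A toy database with invented values.
database = [
    SoundClip("wood_tap", 440.0, -12.0),
    SoundClip("leaf_rustle", 2000.0, -30.0),
    SoundClip("metal_clang", 1200.0, -6.0),
]

# Suppose the system predicted these values from a silent video frame.
match = nearest_clip(database, pitch_hz=500.0, loudness_db=-10.0)
print(match.name)  # prints "wood_tap"
```

In this toy version the hard part, predicting pitch and loudness from the video itself, is assumed to have already happened. That prediction is where the deep learning described above would come in.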
To demonstrate their accomplishment, the researchers then played short video clips for test subjects, who struggled to tell whether the clips included an authentic sound or one that a computer system had added artificially.
But the technology is not perfect, as MIT PhD candidate Andrew Owens, the lead author on the research, acknowledged. When the team tried longer video clips, the computer system would sometimes misfire and play a sound when the drumstick was not striking anything. Test subjects immediately knew the audio was not real.
And the researchers were able to get the computer to produce fitting sounds only when they used videos with a drumstick. Creating a computer that automatically provides the best sound effect for any video — the kind of development that could disrupt the sound-effects industry — remains out of reach for now.
Although the technology world has seen significant strides of late in artificial intelligence, there are still big differences in how humans and machines learn. Owens wants to push computer systems to learn more like an infant learns about the world: by physically poking and prodding its environment. He sees potential for other researchers to use sound recordings and interactions with materials such as sidewalk cement as a step toward machines that better understand our physical world.