An app for the blind that describes the world

A screenshot of the app.

Around the world, a number of efforts are underway to cure blindness or soften its impact. These range from regenerating photoreceptors in the eye and implanting electrodes in the retina, to translating the visual world into a stream of sound. Last month a computer engineer at the University of Massachusetts Boston came up with a far more straightforward approach: a phone app that simply speaks the name of the object in front of it.

Joseph Cohen calls his invention BlindTool. It makes the phone’s camera serve as the user’s eyes. As the camera scans the surrounding environment, the phone vibrates to convey how certain the app is about what it sees: the more insistently it vibrates, the more certain the app is that it knows what it’s looking at. Once the app identifies an object with greater than 30 percent certainty, it speaks the name of that object out loud.
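The feedback rule described above can be sketched in a few lines of Python. This is only an illustration, not BlindTool's actual code: the function name `feedback` and the linear link between certainty and vibration strength are assumptions, and only the 30 percent speaking threshold comes from the description above.

```python
from typing import Optional, Tuple

# From the article: the app speaks once certainty exceeds 30 percent.
SPEAK_THRESHOLD = 0.30


def feedback(label: str, certainty: float) -> Tuple[float, Optional[str]]:
    """Return (vibration_strength, word_to_speak_or_None) for one camera frame.

    Hypothetical helper: `certainty` is a value between 0 and 1.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    # Assumption: vibration strength rises linearly with certainty.
    vibration = certainty
    # Speak the label only when certainty is strictly above the threshold.
    spoken = label if certainty > SPEAK_THRESHOLD else None
    return vibration, spoken
```

In this sketch a guess of "cup" at 50 percent certainty would both vibrate and be spoken, while the same guess at 20 percent would produce only a weak vibration.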

“The concept in my mind is kind of old,” Cohen says. “Years ago I worked with a blind [software] developer and it struck me so strongly that we need some better infrastructure for the blind.”


Cohen developed his app during a coding binge at the end of December. It’s based on a dataset of images called ImageNet, which includes 1,000 different types of objects. The app uses a data processing technique called a neural network to compare an object in front of it to the objects in the dataset. It’s an approach that’s only recently become viable, thanks to advances in the field of computer vision.
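To give a rough sense of where a "certainty" figure can come from: image classifiers of this kind commonly produce one raw score per object type and convert the scores into percentages with a function called softmax. Whether BlindTool works exactly this way is an assumption, and the labels and scores in this sketch are made up for illustration.

```python
import math
from typing import Dict, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw per-label scores into certainties that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def best_guess(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the most likely label and its certainty."""
    probs = softmax(scores)
    label = max(probs, key=probs.get)
    return label, probs[label]
```

A real network would score all 1,000 ImageNet object types at once; the idea is the same with three.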

“Just a couple years ago it was way beyond computer vision to analyze images of a real life environment and get meaningful information about objects,” says Peter Meijer, a research scientist with the Dutch company Hemics BV who works on software products for the blind. “There has been tremendous progress in the last couple years.”

BlindTool is one of a number of apps developed recently that use mobile technology to aid the blind. One, Be My Eyes, connects blind users with sighted users, who narrate the surrounding environment for their partner. Another, TapTapSee, works much like BlindTool.

These computer vision apps are only as good as the dataset of objects they have to draw on, which presents limitations for now. The ImageNet dataset that Cohen used was originally designed for use in computer vision competitions, which explains some of its idiosyncratic content.

“It has many different dog breeds, but only one type of shoe,” says Cohen, who recently launched a Kickstarter campaign to aid development of the app.


In online forums, the initial reception of BlindTool has been largely positive, despite some hiccups in the way the app performs. One user posted on Reddit, “I played around with it a bit, still needs some work. It thought my plate was a toilet seat, and hand was a gas mask.” Meijer also got some unusual results. “I found that any elongated bright blob tends to be identified as a nematode,” he says.

As computer vision improves and image datasets expand, these kinds of misidentifications will likely be resolved. The question then is how a product like BlindTool would fit with the other services being developed for blind people.

Meijer works on software called The vOICe that translates the visual world into soundscapes. In this translation, elevation is conveyed through frequency (pitch) and brightness is conveyed through loudness. A bright diagonal line that runs from the lower left to the upper right, for example, would be represented sonically by a sound that starts at a low pitch and gradually increases in pitch over the course of one second.
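The mapping described above can be sketched as a short Python function. This is not The vOICe's actual algorithm: the left-to-right scan, the 500 to 5,000 Hz range, the linear pitch scale, and the name `soundscape` are all assumptions chosen to match the diagonal-line example.

```python
from typing import List, Tuple


def soundscape(
    image: List[List[float]],
    low_hz: float = 500.0,    # assumed pitch for the bottom row
    high_hz: float = 5000.0,  # assumed pitch for the top row
    duration_s: float = 1.0,  # one scan takes one second
) -> List[Tuple[float, float, float]]:
    """Return (start_time_s, frequency_hz, loudness) for each lit pixel.

    `image` is a grid of brightness values from 0 to 1, top row first.
    Columns are played left to right across `duration_s`.
    """
    rows, cols = len(image), len(image[0])
    events = []
    for col in range(cols):
        start = duration_s * col / cols
        for row in range(rows):
            brightness = image[row][col]
            if brightness <= 0:
                continue  # dark pixels are silent
            # Elevation: 0.0 at the bottom row, 1.0 at the top row.
            elevation = (rows - 1 - row) / (rows - 1) if rows > 1 else 0.0
            frequency = low_hz + elevation * (high_hz - low_hz)
            events.append((start, frequency, brightness))
    return events
```

Fed a tiny three-by-three image of a bright diagonal running from lower left to upper right, the sketch yields three tones whose pitch climbs as the scan moves across the image.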

The vOICe is an example of what’s called “sensory substitution” (in this case using an audio representation of visual input) and Meijer says that learning to visually interpret the sounds is similar in difficulty to learning a foreign language. Once a user becomes fluent, the program provides something equivalent to 20/200 vision, which in principle suffices for walking around while recognizing objects. The soundscapes provide a broad but uninterpreted or “raw” description of the visual environment, while apps like BlindTool stand to provide a more limited but meaningful description of the most salient objects. Meijer imagines that as the technology improves, the two approaches might work well together.

“[Soundscapes] are the equivalent of peripheral vision. [They] can recognize the bright blob as being there and [the app] can tell you what this blob represents,” he says.


Kevin Hartnett is a writer in South Carolina. He can be reached at