It’s the next big thing in artificial intelligence — a new system called GPT-4, which responds to images as well as words. And if this new software from the creators of popular AI app ChatGPT works as advertised, it could be a godsend for millions of people with vision disabilities.
With GPT-4, a smartphone could provide travel directions simply by using its camera to read a map. It could read the labels on packages at the supermarket and confirm that a product hasn’t passed its expiration date. Or it could confirm that a user has chosen the red dress from her closet, say, and not the green one.
Surging interest in AI systems is also raising fears about their capacity to do harm. Cyber criminals could use AI to build malicious software or commit fraud by creating realistic bots masquerading as real people. But, for people with little or no eyesight, GPT-4 might enable greater safety, independence, and confidence. That is why makers of technologies for the vision-impaired are eager to give it a try.
“This has the chance to make the world more accessible to 250 million people,” said Mike Buckley, chief executive of Be My Eyes, a Danish company that is now testing a GPT-4-enabled version of its assistance app for blind people.
San Francisco-based OpenAI, the company that created ChatGPT and Microsoft’s Bing Chat service, presented an online demonstration of GPT-4 on March 14. It’s a more sophisticated version of the software that’s become famous for creating pictures, essays, and poems based on a user’s written commands. But perhaps GPT-4’s most remarkable upgrade is an ability to make sense of images and the information embedded in them.
OpenAI still hasn’t made GPT-4’s image recognition features available to the public. But about 100 users of Be My Eyes are getting a preview, because the new system promises to make the company’s software vastly more useful.
Be My Eyes was launched in 2015. It’s a smartphone app that lets a blind user transmit images to a sighted volunteer who verbally describes the image. Over 300,000 people with limited vision use the app to help them shop for groceries or to make sure they’re wearing socks that match, and 6.3 million people have volunteered to help them.
But GPT-4 could replace many of these volunteers, because it can recognize and describe everyday objects without human assistance. A blind person could point the camera at something and get a spoken readout describing what it sees.
Buckley said OpenAI reached out to his company six weeks ago to set up a beta test of GPT-4’s image processing features. One of the testers is British journalist Lucy Edwards, who lost her vision about a decade ago. In an interview, Edwards said she has been stunned and delighted by the capabilities of the new software.
After landing at Heathrow Airport recently, Edwards needed to catch a subway train to central London. She used her smartphone to photograph a map of the famously complex London Underground. Then she asked Be My Eyes for the best route to her destination. In seconds, the app analyzed the map, decided on the correct route, and read out the answer.
“It actually made me cry,” she said.
Edwards was also delighted during a trip to her fitness center. She fed photographs of the exercise machines to the app, which told her which treadmills were unoccupied and how to walk to one of them without tripping over obstacles.
Edwards has also used the app to check the contents of her refrigerator. Be My Eyes serves up a list of the fridge’s contents, and she can also ask it what kinds of meals she can prepare with the food that’s there. GPT-4 responds with a list of suitable recipes.
OpenAI has teamed up with another maker of technology for blind people. Netherlands-based Envision is testing a head-mounted device powered by GPT-4. The device includes a camera, microphone, and earbuds. Users can take pictures of their surroundings and ask the software for more information — for instance, whether there are any empty seats in a conference room.
“It can do face recognition, it can do text recognition, it can do object recognition,” said Envision chief executive Karthik Mahadevan.
Other companies are using less advanced AI tools to assist blind people. For instance, a product called ARx Vision uses a head-mounted camera to identify nearby objects and transmit the information to a person’s smartphone.
But GPT-4’s greater ability to interpret what it sees “does kind of kick it up to a new level,” said Sandy Lacey, executive director of the Howe Innovation Center at the Perkins School for the Blind in Watertown.
Lacey also thinks there’s a decent chance that many more people could benefit from GPT-4’s visual abilities. She noted that closed captioning of TV shows and software that converts text to speech were “developed for people with disabilities, and then they permeate through to the general population.”
So, if GPT-4 lives up to its promises, even people with perfect vision may someday use it to help them understand what they’re seeing.
Hiawatha Bray can be reached at firstname.lastname@example.org. Follow him on Twitter @GlobeTechLab.