This Saturday, as about 700 of the nation’s top crossword solvers gather in the Grand Ballroom of the Brooklyn Marriott for the American Crossword Puzzle Tournament, there will be an interloper lurking in the back of the room.
The interloper is known as Dr. Fill. Unlike the other assembled crossword experts, Dr. Fill is not human. The Doctor is a crossword-solving program, and will be running on the notebook computer of Matt Ginsberg, a software engineer from Eugene, Ore. When the bell rings and humans start solving the first of seven championship puzzles, Ginsberg will hit “enter” and Dr. Fill will get to work, in an attempt to achieve the highest score in the tournament. (Dr. Fill isn’t officially entered, but anyone whose final score is better will get an “I Beat Dr. Fill” button from tournament organizer Will Shortz.)
Our brainy pastimes are falling, one by one, to silicon-based competitors. First there was chess, with Deep Blue beating world champion Garry Kasparov in 1997. Since then, programs have bested humans at poker, and Ginsberg himself has designed software that can beat the world’s experts in bridge. Last year, in a highly publicized match, the IBM supercomputer Watson emerged victorious in “Jeopardy!” against all-time champs Ken Jennings and Brad Rutter. But crossword puzzles have always seemed like an impossible hurdle for artificial intelligence, or A.I.; their emphasis on tricky wordplay would seem to put them beyond the reach of any solver lacking human powers of wit and association.
Still, Ginsberg, who runs a software company called On Time Systems that figures out optimal aircraft routes, was inspired by Watson’s success to try to improve automated crossword solving. Ginsberg is a longtime crossword aficionado: As an undergraduate at Wesleyan in 1976, he created an early program to fill grids with words. Over the past few years, he has made more than two dozen puzzles for The New York Times, including one last year co-constructed with the actress Dana Delany.
Ginsberg’s not the first computer scientist to tackle the A.I. crossword challenge. In 1999, Michael Littman of Duke University worked with grad students to create Proverb, a program that would have finished 147th out of 255 contestants had it been entered in that year’s tournament. Dr. Fill takes advantage of advances in computing power and data-mining to do better. Ginsberg conservatively guesses that Dr. Fill can place in the top 30 or so this year, but when it’s good, it’s very, very good: In simulations of 15 past tournaments, it came out on top three times.
Even when Dr. Fill beats the best humans, it’s not error-free. “It doesn’t really know what it’s doing,” Ginsberg admits. Though it can come up with answers based on looking at databases of past crosswords, dictionaries, and Wikipedia, there are inevitably devilish clues that it can’t solve. Take this clue from a 2010 Times puzzle: “Apollo 11 and 12 [180 degrees].” The answer is the incomprehensible series of letters SNOISSIWNOOW. A very clever human would eventually realize that when the answer is rotated 180 degrees, the upside-down letters spell out MOON MISSIONS.
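For readers curious how a program might check for that particular trick, here is a minimal sketch in Python. It is an illustration only, not Dr. Fill's code: the letter table `ROTATE_180` and the function `rotate_180` are hypothetical names, and the table assumes a plain block-capital typeface in which only a handful of letters survive being turned upside down.

```python
# Hypothetical table: capital letters that still read as letters after a
# 180-degree turn, assuming simple block capitals.
ROTATE_180 = {
    "H": "H", "I": "I", "N": "N", "O": "O", "S": "S",
    "X": "X", "Z": "Z", "M": "W", "W": "M",
}


def rotate_180(entry):
    """Return the entry as it reads upside down, or None if any letter
    has no upside-down counterpart in the table."""
    rotated = []
    # Turning the page reverses the letter order and flips each letter.
    for ch in reversed(entry.upper()):
        if ch not in ROTATE_180:
            return None
        rotated.append(ROTATE_180[ch])
    return "".join(rotated)


print(rotate_180("SNOISSIWNOOW"))  # MOONMISSIONS
```

A solver could run a check like this on any entry that fails ordinary dictionary lookup, though spotting that the bracketed “180 degrees” calls for it is the part that still takes human wit.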
Though Dr. Fill might not easily decipher such nondictionary fare, it gets a boost (like human crossword-solvers) from crossing words, looking ahead to see which “down” entries fit best with the “across” ones. Ginsberg has also built in routines to check for the kind of puzzle themes favored by constructors, such as adding or subtracting certain letters from longer answers. For instance, the puzzle that Ginsberg created with Delany was called “Pretty Disgusting,” which involved adding -IC (as in “ick!”) to common phrases. Thus, “fancy garb for Caesar” is a clue for FINE TUNIC, or fine-tune with an “icky” ending. Though Dr. Fill is getting better at handling knotty themes, Shortz told me that he thinks one or possibly two of this year’s tournament puzzles will give the program trouble.
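The crossing-word boost can be pictured with a short Python sketch. This is an assumption-laden toy, not Ginsberg's algorithm: the helpers `fits` and `rank_across`, the tiny word list, and the simple count of supported crossings are all invented here for illustration.

```python
def fits(pattern, word):
    """True if word matches pattern, where '.' stands for an empty square."""
    return len(pattern) == len(word) and all(
        p in (".", w) for p, w in zip(pattern, word)
    )


def rank_across(candidates, crossings, down_words):
    """Order across candidates by how many crossing down slots they leave fillable.

    crossings holds tuples (across_index, down_pattern, down_index): the
    letter at across_index of the across word lands at down_index of a
    down slot whose current state is down_pattern.
    """
    scored = []
    for word in candidates:
        support = 0
        for across_idx, down_pattern, down_idx in crossings:
            # Write the across letter into the down slot, then see whether
            # any known word still fits there.
            filled = (
                down_pattern[:down_idx]
                + word[across_idx]
                + down_pattern[down_idx + 1:]
            )
            if any(fits(filled, d) for d in down_words):
                support += 1
        scored.append((support, word))
    # sorted() is stable, so ties keep their original order.
    return [w for _, w in sorted(scored, key=lambda t: -t[0])]


down_words = ["URN", "CAT"]            # toy word list
crossings = [(1, ".RN", 0), (4, ".AT", 0)]
print(rank_across(["TONIC", "TUNIC"], crossings, down_words))
```

In this toy grid TUNIC outranks TONIC because its second letter lets URN fit in the crossing slot, which is the same kind of evidence a human solver leans on when two answers both suit a clue.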
If Dr. Fill gets stumped, it can still overcome the scoring penalty for wrong words through sheer speed. Ginsberg says it can complete most puzzles in approximately 30 seconds, but he then gives it another minute and a half to see if the filled-in grid can be improved upon. Not even Dan Feyer, the lightning-fast solver who has won the last two Brooklyn tournaments, can polish off tournament-caliber puzzles in two minutes.
Feyer told me he isn’t too concerned about competition from Dr. Fill, at least not yet. But he expects that as Ginsberg tinkers with his algorithms, the program will eventually solve crosswords better than any human. In the meantime, he said, “the project should give us some great insights into crosswords, as well as the important mathematical problems that Matt solves in his day job.”
Ginsberg, for his part, has mixed feelings about his software’s success. He recalls the moment when he made a major improvement to Dr. Fill’s dictionary database (which currently contains about 10 million items), rendering most Times puzzles child’s play and putting even the hardest tournament puzzles in its grasp. “That was sobering,” Ginsberg said. “Being in the top 50 is fine, but being in the top one is different. It was a little bit bittersweet. I was very surprised to feel that way.”
Still, he stresses that the way that Dr. Fill goes about solving is “really very inhuman.” Machine intelligence, even when it makes impressive showings against carbon-based counterparts, is still fundamentally different from human intelligence. Cold comfort, perhaps, for Team Human, as we prepare to lose another battle to computers on our home turf.
Ben Zimmer is the executive producer of VisualThesaurus.com and Vocabulary.com. He can be reached at benzimmer.com/contact.