Several years ago, self-described art nerd Jason Bailey learned what, for him, was a crushing, if hard to verify, statistic: As much as 20 percent of all collected artworks may be either forged or misattributed, their false identities often propped up by shady ownership records.
“It kind of blew my mind,” said Bailey, a marketing and business consultant from Ashland and founder of the art and technology blog Artnome. “We agree, culturally, that we care so much about art, but we have so little information about the works that people can just forge them.”
As Bailey saw it, the art market had a data problem. Unlike, say, baseball, where a player’s statistics are just a few keystrokes away, the art market, whose global sales reached an estimated $64.1 billion in 2019, often struggles to provide basic statistical information about artists and their work. How many artworks, for instance, did the early abstract Dutch painter Piet Mondrian make? How many display his signature abstract grid with bold colors? Where are they all? All of which leads to the deeper question: How can you guard against fraud without a full inventory of an artist’s works?
Bailey, a lapsed artist who’s worked at several Boston-area start-ups, comes from a family of engineers. He’s a born problem solver, so after being “kind of bummed out about it for a while,” he came up with a solution: a moonshot project to create a digital database of known artworks.
Sites like Artnet and MutualArt already provide auction data, gallery listings, and market analysis. But when it comes to a full accounting of an artist’s oeuvre, the art world has traditionally relied on a decidedly analog form of technology: the catalogue raisonné, a limited-run scholarly work by art historians that comprises all known works by an artist. These encyclopedic tomes, usually reserved for blue chip artists, can span multiple volumes. Some take decades to assemble and can carry price tags in the tens of thousands of dollars.
“What would you do if you wanted to make it as hard as possible to get and analyze information about our best known artists?” asked Bailey. “First, you would put it in print so you can’t run an analysis. Then you would make it so expensive that only super wealthy institutions and collectors can afford them.”
Over the past five years, Bailey has meticulously scanned some 45 catalogues raisonnés, photographing compendiums devoted to Jackson Pollock, Georgia O’Keeffe, and Mark Rothko. Once he’s digitized the files, Bailey ships them to a small cadre of data processors who extract the salient data points — the works’ dimensions, media, and year of creation — and return them as clean data.
All told, Bailey says he’s sunk more than $15,000 into the project. He’s compiled a personal library of catalogues raisonnés, but he also relies on interested collectors and friendly libraries to access many others. Eventually, he hopes the database will include as many as 200 catalogues, including what he calls his “white whale”: the 33-volume tome devoted to Picasso. (Although original versions have reportedly fetched some $200,000 at auction, the work was reissued in 2014 with a price tag of $20,000.)
But painstakingly digitizing hundreds of pages is only so interesting. There’s also the question of copyright, which Bailey says he tries to avoid by keeping the database private and confining it to factual information: no essays, no photographs, just the facts.
And that’s to say nothing of his end game. Why was he compiling this vast trove of information if no one could access it, let alone know about it?
“It became a question of how to leverage this database, almost ‘Moneyball’-style, to provide collectors with unique insights that they otherwise couldn’t get,” he said. “Where it becomes interesting is if you combine the catalogue raisonné data with auction data. Then you can start to make some estimates about when something’s selling for too much or too little.”
In other words, he would turn his database into a Zillow for the art world — an inventory of known works, yes, but also one that harnesses the power of technology to predict auction values.
Bailey’s solution came in the form of the Artnome blog, where he and a handful of collaborators have been using statistical analysis and machine learning to create insightful analyses with the database.
In one early post, Bailey ran a statistical analysis comparing the lifetime output of Pollock against that of Van Gogh. Among his findings: Although the Dutch post-impressionist created more than twice as many paintings as the abstract expressionist, Pollock painted significantly more surface area (506 square yards vs. Van Gogh’s 322 square yards).
Using available auction data, Bailey went a step further. He estimated that although a Van Gogh is on average roughly twice as expensive as a Pollock, one square centimeter of a Van Gogh is nearly five times more expensive than one square centimeter of a Pollock ($2,500 vs. $500).
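The arithmetic behind a price-per-area comparison is simple enough to sketch. The Python below is an illustration only, not Bailey's actual method: the function names and the sample sale (a $1 million work measuring 50 by 40 centimeters) are hypothetical, and only the 2-to-1 price ratio and the $2,500 and $500 per-square-centimeter figures come from his estimates above.

```python
def price_per_cm2(price_usd: float, height_cm: float, width_cm: float) -> float:
    """Sale price divided by the area of a rectangular canvas."""
    area_cm2 = height_cm * width_cm
    if area_cm2 <= 0:
        raise ValueError("dimensions must be positive")
    return price_usd / area_cm2


def implied_area_ratio(price_ratio: float, per_area_ratio: float) -> float:
    """If price = (price per area) x area, then the ratio of typical areas
    is the price ratio divided by the per-area price ratio."""
    return price_ratio / per_area_ratio


# Hypothetical sale, chosen only to make the division easy to follow.
example = price_per_cm2(1_000_000, 50, 40)  # 1,000,000 / 2,000 = 500.0

# Figures reported above: Van Gogh costs about 2x a Pollock per work,
# and about $2,500 vs. $500 per square centimeter.
van_gogh_vs_pollock_area = implied_area_ratio(2.0, 2500 / 500)  # 0.4

if __name__ == "__main__":
    print(example)
    print(van_gogh_vs_pollock_area)
```

Read loosely, the second figure suggests a typical Van Gogh covers about 40 percent of the area of a typical Pollock. That is a rough sanity check rather than a result, since a ratio of averages need not equal an average of ratios.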
“I don’t consider myself, like, the world’s most brilliant data scientist,” said Bailey. “But you don’t have to be when you’re skiing in virgin snow. I’m the only one with this data, so everything I come up with is pretty unique and exciting.”
In subsequent analyses, Bailey has sought to predict how many paintings Pollock, who died at 44, might have created if he’d lived longer. (Answer: 740.) He and his Artnome colleagues have used data visualization techniques and weather data to explain why Van Gogh’s palette became more saturated in yellow through the years. (He was painting in southern France.) They’ve sought to quantify the works of Rothko. (“Untitled (Rust, Blacks on Plum),” from 1962, which measures 60-by-57 inches, is statistically his most average-sized work.) And they’ve tried to quantify Mondrian’s march toward abstraction. (“We were/are not sure it is really possible.”)
But it’s the introduction of auction data that Bailey believes may ultimately transform Artnome, whose title is a play on genome, into a business. Working with Kyle Waters, an economic consultant with Charles River Associates, Bailey has sought to harness the power of machine learning to predict the hammer price for works at auction.
In their first effort, they looked at four works of art that were being auctioned at Christie’s in 2018 — an O’Keeffe, an Edward Hopper, an Arthur Dove, and a John Marin — comparing Artnome’s estimates against the auction house’s expert human appraisers.
Waters, who developed the machine learning model, was quick to point out that this is a vanishingly small sample of works. Nevertheless, Bailey said Artnome’s mean error rate outperformed Christie’s appraisers, in part because the Marin canvas sold well above the auction house’s appraisal.
“Can we make this into a startup, and bring technology into the art market to legitimately change the art market?” asked Waters. “That was a first application of my model on a very small training set, but it offered some really surprising and encouraging results.”
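Waters's caution about sample size is easy to see in a small sketch. The Python below assumes the "mean error rate" is a mean absolute percentage error, which is not specified here, and every number in it is a made-up placeholder rather than a Christie's or Artnome figure. It shows how a single sale far above its estimate can decide a four-work comparison.

```python
def mean_abs_pct_error(estimates: list[float], actuals: list[float]) -> float:
    """Average of |estimate - actual| / actual across all works."""
    if len(estimates) != len(actuals) or not actuals:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(e - a) / a for e, a in zip(estimates, actuals))
    return total / len(actuals)


# Hypothetical hammer prices and estimates for four works (arbitrary units).
hammer_prices = [100.0, 200.0, 50.0, 400.0]
house_estimates = [110.0, 190.0, 55.0, 100.0]   # closer on three, far off on the last
model_estimates = [120.0, 170.0, 60.0, 300.0]   # worse on three, closer on the last

house_error = mean_abs_pct_error(house_estimates, hammer_prices)  # about 0.25
model_error = mean_abs_pct_error(model_estimates, hammer_prices)  # about 0.20

if __name__ == "__main__":
    print(f"house: {house_error:.2f}, model: {model_error:.2f}")
```

In this invented example the model posts the lower average error even though the house is closer on three of the four works, which is why a result from so few sales is better read as encouraging than as conclusive.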
In the years since, Bailey has presented his work at Christie’s. He’s traveled as far afield as Bahrain and China to present on Artnome, where lately he’s been writing on issues such as artificial intelligence and the role of blockchain in the art market. He said that although he’s been approached by a number of angel investors, so far nothing’s panned out.
Perhaps that’s one of the reasons Bailey remains modest about the effort, saying the predictive database will be more of a tool for human appraisers than a replacement, enabling them to work at greater scale.
“I don’t see a clear path to getting more accurate than human appraisers in the near future,” he said, adding that auction house appraisers have access to client lists — a critical data point in any sale. “An analysis of the buyers is arguably more important than any data you could derive about the artists.”
Malcolm Gay can be reached at firstname.lastname@example.org. Follow him on Twitter at @malcolmgay.