

    Watson is smart, but cancer is still smarter

    “Jeopardy!” champions Ken Jennings (left) and Brad Rutter look on as Watson beats them to the buzzer in 2011.
    Eros Dervishi for STAT

    It was an audacious undertaking, even for one of the most storied American companies: With a single machine, IBM would tackle humanity’s most vexing diseases and revolutionize medicine.

    Breathlessly promoting its signature brand — Watson — IBM sought to capture the world’s imagination, and it quickly zeroed in on a high-profile target: cancer.

    But three years after IBM began selling Watson to recommend the best cancer treatments to doctors around the world, a STAT investigation has found that the supercomputer isn’t living up to the lofty expectations IBM created for it. It is still struggling with the basic step of learning about different forms of cancer. Only a few dozen hospitals have adopted the system, which is a long way from IBM’s goal of establishing dominance in a multibillion-dollar market. And at foreign hospitals, physicians complained its advice is biased toward American patients and methods of care.


    STAT examined Watson for Oncology’s use, marketing, and performance in hospitals across the world, from South Korea to Slovakia to South Florida. Reporters interviewed dozens of doctors, IBM executives, artificial intelligence experts, and others familiar with the system’s underlying technology and rollout.


    The interviews suggest that IBM, in its rush to bolster flagging revenue, unleashed a product without fully assessing the challenges of deploying it in hospitals globally. While it has emphatically marketed Watson for cancer care, IBM hasn’t published any scientific papers demonstrating how the technology affects physicians and patients. As a result, its flaws are getting exposed on the front lines of care by doctors and researchers who say that the system, while promising in some respects, remains undeveloped.

    “Watson for Oncology is in their toddler stage, and we have to wait and actively engage, hopefully to help them grow healthy,” said Dr. Taewoo Kang, a South Korean cancer specialist who has used the product.

    At its heart, Watson for Oncology uses the cloud-based supercomputer to digest massive amounts of data — from doctor’s notes to medical studies to clinical guidelines. But its treatment recommendations are not based on its own insights from these data. Instead, they are based exclusively on training by human overseers, who laboriously feed Watson information about how patients with specific characteristics should be treated.

    IBM executives acknowledged that Watson for Oncology, which has been in development for nearly six years, is in its infancy. But they said it is improving rapidly, noting that by year’s end, the system will offer guidance about treatment for 12 cancers that account for 80 percent of the world’s cases. They said it’s saving doctors time and ensuring that patients get top-quality care.


    “We’re seeing stories come in where patients are saying, ‘It gave me peace of mind,’ ” said Deborah DiSanzo, general manager of Cambridge-based Watson Health. “That makes us feel extraordinarily good that what we’re doing is going to make a difference for patients and their physicians.”

    But contrary to IBM’s depiction of Watson as a digital prodigy, the supercomputer’s abilities are limited.

    Perhaps the most stunning overreach is in the company’s claim that Watson for Oncology, through artificial intelligence, can sift through reams of data to generate new insights and identify, as an IBM sales rep put it, “even new approaches” to cancer care. STAT found that the system doesn’t create new knowledge and is artificially intelligent only in the most rudimentary sense of the term.

    While Watson became a household name by winning the TV game show “Jeopardy!”, its programming is akin to a different game-playing machine: the Mechanical Turk, a chess-playing robot of the 1700s, which dazzled audiences but hid a secret: a human operator shielded inside.

    In the case of Watson for Oncology, those human operators are a couple dozen physicians at a single, though highly respected, US hospital: Memorial Sloan Kettering Cancer Center in New York. Doctors there are empowered to input their own recommendations into Watson, even when the evidence supporting those recommendations is thin.


    The actual capabilities of Watson for Oncology are not well understood by the public, and even by some of the hospitals that use it. It’s taken nearly six years of painstaking work by data engineers and doctors to train Watson in just seven types of cancer, and keep the system updated with the latest knowledge.

    “It’s been a struggle to update, I’ll be honest,” said Dr. Mark Kris, Memorial Sloan Kettering’s lead Watson trainer. He noted that treatment guidelines for every metastatic lung cancer patient worldwide recently changed in the course of one week after a research presentation at a cancer conference. “Changing the system of cognitive computing doesn’t turn around on a dime like that,” he said. “You have to put in the literature, you have to put in cases.”

    Watson grew out of an effort to transform IBM from an old-guard hardware company to one that operates in the cloud and along the cutting edge of artificial intelligence. Despite its use in an array of industries — from banking to manufacturing — it has failed to end a streak of 21 consecutive quarters of declining revenue at IBM. In the most recent quarter, revenue even slid from the same period last year in IBM’s cognitive solutions division — which is built around Watson and is supposed to be the future of its business.

    In response to STAT’s questions, IBM said Watson, in health care and otherwise, remains on an upward trajectory and “is already an important part” of its $20 billion analytics business. Health care is a crucial part of the Watson enterprise. IBM employs 7,000 people in its Watson health division and sees the industry as a $200 billion market over the next several years. Only financial services, at $300 billion, is considered a bigger opportunity by the company.

    At stake in the supercomputer’s performance is not just the fortunes of a famed global company. In the world of medicine, Watson is also something of a digital canary — the most visible attempt to use artificial intelligence to identify the best ways to prevent and treat disease. The system’s larger goal, IBM executives say, is to democratize medical knowledge so that every patient, no matter the person’s geography or income level, will be able to access the best care.

    But in cancer treatment, the pursuit of that utopian ideal has faltered.

    STAT’s investigation focused on Watson for Oncology because that product is the furthest along in clinical care, though Watson sells separate packages to analyze genomic information and match patients to clinical trials. It’s also applying Watson to other tasks, including honing preventive medicine practices and reading medical images.

    Doctors’ reliance on Watson for Oncology varies among hospitals. While institutions with fewer specialists lean more heavily on its recommendations, others relegate the system to a background role, like a paralegal whose main skill is researching existing knowledge.

    Hospitals pay a per-patient fee for Watson for Oncology and other products enabled by the supercomputer. The amount depends on the number of products a hospital buys, and ranges between $200 and $1,000 per patient, according to DiSanzo. The system sometimes comes with consulting costs and is expensive to link with electronic medical records. At hospitals that don’t link it with their medical records, more time must be spent typing in patient information.

    At Jupiter Medical Center in Florida, that task falls to nurse Jean Thompson, who spends about 90 minutes a week feeding data into the machine. Once she has completed that work, she clicks the “Ask Watson” button to get the supercomputer’s advice for treating patients.

    On a recent morning, the results for a 73-year-old lung cancer patient were underwhelming: Watson recommended a chemotherapy regimen the oncologists had already flagged.

    “It’s fine,” Dr. Sujal Shah, a medical oncologist, said of Watson’s treatment suggestion while discussing the case with colleagues.

    He said later that the background information Watson provided, including medical journal articles, was helpful, giving him more confidence that using a specific chemotherapy was a sound idea. But the system did not directly help him make that decision, nor did it tell him anything he didn’t already know.

    Jupiter is one of two US hospitals that have adopted Watson for Oncology. The system has generated more business in India and Southeast Asia. Many doctors in those countries said Watson is saving time and helping more patients get quality care. But they also said its accuracy and overall value are limited by differing medical practices and economic circumstances.

    Despite IBM’s marketing blitz, with years of high-profile Watson commercials featuring celebrities from Serena Williams to Bob Dylan to Jon Hamm, the company’s executives are not always gushing. In interviews with STAT, they acknowledged the system faces challenges and needs better integration with electronic medical records and more data on real patients to find patterns and suggest cutting-edge treatments.

    “The goal as Watson gets smarter is for it to make some of those recommendations in a more automated way, to sort of suggest now may be the time and let us flip the switch” when a promising treatment option emerges, said Dr. Andrew Norden, a former IBM deputy health chief who left the company in early August. “As I describe it, you’re probably getting a sense it’s really hard and nuanced.”

    Such nuance is absent from the careful narrative IBM has constructed to sell Watson.

    .   .   .

    It is by design that there is not one independent, third-party study that examines whether Watson for Oncology can deliver. IBM has not exposed the product to critical review by outside scientists or conducted clinical trials to assess its effectiveness.

    While it’s not unheard of for companies to avoid external vetting early on, IBM’s circumstances are unusual because Watson for Oncology is not in development — it has already been deployed around the world.

    Yoon Sup Choi, a South Korean venture capitalist and researcher who wrote a book about artificial intelligence in health care, said IBM isn’t required by regulatory agencies to do a clinical trial in South Korea or America before selling the system to hospitals. And given that hospitals are already using the system, a clinical trial would be unlikely to improve business prospects.

    “It’s too risky, right?” Choi said. “If the result of the clinical trial is not very good — [if] there’s a marginal clinical benefit from Watson — it’s really bad news to the whole IBM.”

    Pilar Ossorio, a professor of law and bioethics at University of Wisconsin Law School, said Watson should be subject to tighter regulation because of its role in treating patients. “As an ethical matter, and as a scientific matter, you should have to prove that there’s safety and efficacy before you can just go do this,” she said.

    Norden dismissed the suggestion IBM should have been required to conduct a clinical trial before commercializing Watson, noting that many practices in medicine are widely accepted even though they aren’t supported by a randomized controlled trial.

    “Has there ever been a randomized trial of parachutes for paratroopers?” Norden asked. “And the answer is, of course not, because there is a very strong intuitive value proposition. . . . So I believe that bringing the best information to bear on medical decision making is a no-brainer.”

    So far, the only studies about Watson for Oncology are conference abstracts. The full results haven’t been published in peer-reviewed journals — and every study, save one, was either conducted by a paying customer or included IBM staff on the author list, or both. Most trumpet positive results, showing that Watson saves doctors time and has a high concordance rate with their treatment recommendations.

    The “concordance” studies comprise the vast majority of the public research on Watson for Oncology. Doctors will ask Watson for its advice for treating a slew of patients, and then compare its recommendations to those of oncologists. In an unpublished study from Denmark, the rate of agreement was about 33 percent — so the hospital decided not to buy the system. In other countries, the rate can be as high as 96 percent for some cancers. But showing that Watson agrees with the doctors proves only that it is competent in applying existing methods of care, not that it can improve them.

    IBM executives said they are pursuing studies to examine the impact on doctors and patients, although none has been completed to date.

    To date, more than 50 hospitals on five continents have agreements with IBM, or intermediary technology companies, to use Watson for Oncology to treat patients.

    But the partnership with Memorial Sloan Kettering, and the product that grew out of it, resulted in complications that IBM has papered over with carefully parsed statements and misleading marketing.

    .   .   .

    In its press releases, IBM celebrates Memorial Sloan Kettering’s role as the only trainer of Watson. After all, who better to educate the system than doctors at one of the world’s most renowned cancer hospitals?

    But several doctors said Memorial Sloan Kettering’s training injects bias into the system, because the treatment recommendations it puts into Watson don’t always comport with the practices of doctors elsewhere in the world.

    Given the same clinical scenario, doctors can — and often do — disagree about the best course of action, whether to recommend surgery or chemotherapy, or another treatment. Those discrepancies are especially wide for second- and third-line treatments given after an initial therapy fails, where evidence of benefits is slimmer and consensus more elusive.

    Rather than acknowledge this dilemma, IBM executives, in marketing materials and interviews, have sought to downplay it. In an interview with STAT, DiSanzo, the head of Watson Health, rejected the idea that Memorial Sloan Kettering’s involvement creates any bias at all.

    “The bias is taken out by the sheer amount of data we have,” she said, referring to patient cases and millions of articles and studies fed into Watson.

    But that mischaracterizes how Watson for Oncology works. (IBM later asserted that DiSanzo was referring to Watson in general.)

    The system is essentially Memorial Sloan Kettering in a portable box. Its treatment recommendations are based entirely on the training provided by doctors, who determine what information Watson needs to devise its guidance as well as what those recommendations should be.

    When users ask Watson for advice, the system also searches published literature — some of which is curated by Memorial Sloan Kettering — to provide relevant studies and background information to support its recommendation. But the recommendation itself is derived from the training provided by the hospital’s doctors, not the outside literature.

    Doctors at Memorial Sloan Kettering acknowledged their influence on Watson. “We are not at all hesitant about inserting our bias, because I think our bias is based on the next best thing to prospective randomized trials, which is having a vast amount of experience,” said Dr. Andrew Seidman, one of the hospital’s lead trainers of Watson. “So it’s a very unapologetic bias.”

    Seidman said the hospital is careful to keep its training grounded in clinical evidence when the evidence exists, but it is not shy about giving its recommendations when it doesn’t. “We want cancer care to be democratized,” he said. “We don’t want doctors who don’t have the thousands and thousands of patients’ experience on a more rare cancer to be handicapped. We want to share that knowledge base.”

    At a recent training session of Watson on Manhattan’s Upper East Side, the tensions involved in programming the system were on full display. STAT sat in as Memorial Sloan Kettering doctors, led by Seidman, gathered with IBM engineers to train Watson to treat bladder cancer. Five IBM engineers sat on one side of the table. Across from them were three oncologists — one specializing in surgery, another in radiation, and a third in chemotherapy and targeted medicines.

    Several minutes into the discussion, the question arose of which treatment to recommend for patients whose cancers persisted through six rounds of chemotherapy. The options in such cases tend to be as slim as the evidence supporting them. Should Watson recommend a radical surgery to remove the bladder? Dr. Tim Donahue, the surgical oncologist, noted that such surgery seldom cures patients and is not associated with improved survival in his experience.

    Then what about another course of chemotherapy combined with radiation?

    When Watson gives its recommendations, it puts the top recommendation in green, alternative options in orange, and not recommended options in red.

    But in some clinical scenarios, it’s difficult to tell the colors apart.

    “This is the hard part of this whole game,” Dr. Marisa Kollmeier, the radiation oncologist, said during the training. “There’s a lack of evidence. And you don’t know if something should be in green without evidence. We don’t have a randomized trial to support every decision.”

    But the task in front of them required the doctors to press ahead. And they did, rifling through an array of clinical scenarios. In some cases, a large body of evidence backed up their answers. But many others fell into a gray area or were clouded by the inevitable uncertainty of patient preferences.

    The meeting was one of many in a months-long process to bring Watson up to speed in bladder cancer. Subsequent sessions would involve feeding it data on real patient cases at Memorial Sloan Kettering, so doctors could reinforce Watson’s training with repetition.

    That training does not teach Watson to base its recommendations on the outcomes of these patients — whether they lived or died, or survived longer than similar patients. Rather, Watson makes its recommendations based on the treatment preferences of Memorial Sloan Kettering physicians.

    Kris, the lead trainer at Memorial Sloan Kettering, said Watson for Oncology has the potential to improve care and ensure more patients get expert treatment. But like a medical student, Watson is just learning to perform in the real world.

    “Nobody wants to hear this,” Kris said. “All they want to hear is that Watson is the answer. And it always has the right answer, and you get it right away, and it will be cheaper. But like anything else, it’s kind of human.”

    Casey Ross can be reached at
    Ike Swetlitz can be reached at