Today on THCB Spotlights, Matthew speaks with Jeremy Orr, CEO of Medial EarlySign. Medial EarlySign uses complex algorithms to detect elevated-risk trajectories for high-burden serious diseases and progression towards chronic diseases such as diabetes. Tune in to hear more about this AI/ML company, which has been working on its algorithms since before many people had even heard of machine learning, what it has been doing with Kaiser Permanente and Geisinger, and where it is going next.
Filmed at the HLTH Conference in Las Vegas, October 2019.
Super-resolution* promises to be one of the most impactful medical imaging AI technologies, but only if it is safe.
Last week the FDA approved the first MRI super-resolution product, from the same company that received approval for a similar PET product last year. This news seems as good a reason as any to talk about the safety concerns that I, and many other people, have with these systems.
Disclaimer: the majority of this piece is about medical super-resolution in general, and not about the SubtleMR system itself. That specific system is addressed directly near the end.
Zoom, enhance
Super-resolution is, quite literally, the “zoom and enhance” CSI meme in the gif at the top of this piece. You give the computer a low quality image and it turns it into a high resolution one. Pretty cool stuff, especially because it actually kind of works.
In medical imaging, though, it’s better than cool. Ever wonder why an MRI costs so much and can involve long wait times? It is because a scanner can only do one scan every 20-30 minutes (with some scans taking an hour or more), so the capital and running costs are spread across only one to two dozen patients per day.
So what if you could get an MRI of the same quality in 5 minutes? You might fit in two to five times as many scans (at that point, the time spent getting the patient ready for the scan becomes the bottleneck), meaning lower costs and more throughput.
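The throughput argument is simple arithmetic. Here is a minimal sketch of it. It assumes a hypothetical 10-hour scanning day and a fixed 5 minutes of patient preparation per slot, and the function name `scans_per_day` is made up for illustration. None of these numbers come from the source.

```python
def scans_per_day(scan_min: float, prep_min: float, day_min: float = 600) -> int:
    """Number of complete patient slots that fit in one scanner day.

    Hypothetical model: each slot is prep time plus scan time, and slots
    run back to back with no gaps (day_min=600 is an assumed 10-hour day).
    """
    return int(day_min // (scan_min + prep_min))


# Illustrative numbers only: a 25-minute conventional scan vs a 5-minute
# accelerated scan, each with an assumed 5 minutes of patient preparation.
conventional = scans_per_day(scan_min=25, prep_min=5)
accelerated = scans_per_day(scan_min=5, prep_min=5)
print(conventional, accelerated, accelerated / conventional)
```

Cutting scan time fivefold only triples throughput in this sketch. Preparation time comes to dominate each slot, and that is why the gain tops out well below the raw reduction in scan time.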
Everyone seems to be amazed by artificial intelligence (AI) and machine learning in healthcare, but Enrico Coiera, Professor of Medical Informatics at Macquarie University, is not impressed (yet). Instead of designing algorithms, he advocates for designing “human-machine systems” that work with the best part of the health system: the people. An interesting anecdote about how AI can go wrong? Diagnoses of thyroid cancer in South Korea have increased fifteenfold, but not because of a higher prevalence of the disease. It’s because more sensitive AI diagnostics are over-diagnosing people and subjecting many to chemo and other treatments they don’t need. So, what should technologists do to ensure that tech doesn’t fail patient outcomes? Enrico gives his best advice for a healthcare industry that’s “in love with technology and can’t often see the simple solution for the sexy tech one.”
Filmed in the HISA Studio at HIC 2019 in Melbourne, Australia, August 2019.
Jessica DaMassa is the host of the WTF Health show & stars in Health in 2 Point 00 with Matthew Holt. Get a glimpse of the future of healthcare by meeting the people who are going to change it. Find more WTF Health interviews here or check out www.wtf.health.
Medical AI testing is unsafe, and that isn’t likely to change anytime soon.
No regulator is seriously considering implementing “pharmaceutical style” clinical trials for AI prior to marketing approval, and evidence strongly suggests that pre-clinical testing of medical AI systems is not enough to ensure that they are safe to use. As discussed in a previous post, factors ranging from the laboratory effect to automation bias can contribute to substantial disconnects between pre-clinical performance of AI systems and downstream medical outcomes. As a result, we urgently need mechanisms to detect and mitigate the dangers that under-tested medical AI systems may pose in the clinic.
In a recent preprint co-authored with Jared Dunnmon from Chris Ré’s group at Stanford, we offer a new explanation for the discrepancy between pre-clinical testing and downstream outcomes: hidden stratification. Before explaining what this means, we want to set the scene by saying that this effect appears to be pervasive, underappreciated, and could lead to serious patient harm even in AI systems that have been approved by regulators.
But there is an upside here as well. Looking at the failures of pre-clinical testing through the lens of hidden stratification may offer us a way to make regulation more effective, without overturning the entire system and without dramatically increasing the compliance burden on developers.
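This excerpt does not define hidden stratification. As the term is usually described, it means a model can score well on a test set as a whole while failing on an unlabeled, clinically important subgroup within it. The toy sketch below makes that concrete. The subgroup labels and all of the counts are invented, and it illustrates the general idea rather than the preprint's analysis.

```python
from dataclasses import dataclass


@dataclass
class Subgroup:
    name: str
    positives: int  # patients in this subgroup who truly have the condition
    detected: int   # of those, how many the model flagged


def sensitivity(detected: int, positives: int) -> float:
    """Fraction of true positives the model catches."""
    return detected / positives


# Invented counts: most positive cases are "easy" and the model does well on them,
# but a smaller, clinically dangerous subgroup is mostly missed.
groups = [
    Subgroup("easy positives (hypothetical)", positives=900, detected=873),
    Subgroup("dangerous subtype (hypothetical)", positives=100, detected=55),
]

total_pos = sum(g.positives for g in groups)
total_det = sum(g.detected for g in groups)
print(f"overall sensitivity: {sensitivity(total_det, total_pos):.2f}")
for g in groups:
    print(f"{g.name}: {sensitivity(g.detected, g.positives):.2f}")
```

The dangerous subtype is only a small fraction of the positives, so its failures barely move the aggregate number. Only stratifying by the subgroup, which is often unlabeled in practice, exposes them.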
Despite an area under the ROC curve of 1, Cassandra’s prophecies were never believed. She neither hedged nor relied on retrospective data; her predictions, such as the Trojan War, were prospectively validated. In medicine, a new type of Cassandra has emerged, one who speaks in a probabilistic tongue, forked unevenly between the probability of being right and the possibility of being wrong. One who, by conceding that she may be categorically wrong, is technically never wrong. We call these new Minervas “predictions.” The Owl of Minerva flies above its denominator.
Deep learning (DL) promises to transform the prediction industry from a stepping stone for academic promotion and tenure to something vaguely useful for clinicians at the patient’s bedside. Economists studying AI believe that AI is revolutionary, revolutionary like the steam engine and the internet, because it better predicts.
In a study recently published in Nature, a sophisticated DL algorithm predicted acute kidney injury (AKI) continuously in hospitalized patients by extracting data from their electronic health records (EHRs). The algorithm interrogated nearly a million EHRs of patients in Veterans Affairs hospitals. As intriguing as the methodology is, it’s less interesting than the results. For every correct prediction of AKI, there were two false positives. The false alarms would have made Cassandra blush, but they’re not bad for prognostic medicine. The DL-generated ROC curve stands head and shoulders above the diagonal representing randomness.
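The ratio of one correct prediction to two false alarms translates directly into a positive predictive value (PPV) of one third. The sketch below works through that arithmetic. It then adds a hypothetical illustration of why a curve well above the diagonal can still produce many false alarms when the predicted event is uncommon. The 90% sensitivity and specificity and the prevalence values are assumptions, not figures from the paper.

```python
def ppv_from_counts(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: the share of alerts that are real."""
    return true_pos / (true_pos + false_pos)


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule for a given sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# One correct AKI prediction for every two false alarms, as described above.
print(round(ppv_from_counts(1, 2), 3))  # 0.333

# Hypothetical operating point: the same test yields a lower share of true alerts as the event becomes rarer.
for prev in (0.20, 0.05, 0.01):
    print(prev, round(ppv_from_rates(0.90, 0.90, prev), 3))
```

Sensitivity and specificity describe the ROC curve. PPV, which is what the clinician at the bedside actually experiences, also depends on how common the event is.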
The researchers used a technique called “ablation analysis.” I have no idea how that works, but it sounds clever. Let me make a humble prophecy of my own: if unleashed at the bedside, the AKI-specific, DL-augmented Cassandra could wreak havoc on a scale one struggles to comprehend.
Leaving aside that the accuracy of algorithms trained retrospectively falls in the real world (as doctors know, there’s a difference between book knowledge and practical knowledge), the major problem is the effect that the availability of information has on decision making. Prediction is fundamentally information. Information changes us.
A huge new CT brain dataset was released the other day, with the goal of training models to detect intracranial haemorrhage. So far, it looks pretty good, although I haven’t dug into it in detail yet (and the devil is often in the detail).
Of course, this led to cynicism from the usual suspects as well.
And the conversation continued from there, with thoughts ranging from “but since there is a hold out test set, how can you overfit?” to “the proposed solutions are never intended to be applied directly” (the latter from a previous competition winner).
As the discussion progressed, I realised that while we “all know” that competition results are more than a bit dubious in a clinical sense, I’ve never really seen a compelling explanation for why this is so.
Hopefully this post provides that: an explanation of why competitions are not really about building useful AI systems.
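One commonly cited reason a hold-out test set does not fully protect against overfitting is multiple comparisons. When many teams are ranked on the same finite test set, the top score is partly luck. The minimal simulation below shows the effect. It assumes, unrealistically, that every team's model has exactly the same true accuracy and that test cases are independent. The team count, test-set size and accuracy are all hypothetical. This sketches one mechanism only and is not a summary of the argument in this post.

```python
import random


def best_leaderboard_score(n_teams: int, n_test: int, true_acc: float, seed: int = 0) -> float:
    """Highest observed accuracy when every team's model has the same true accuracy.

    Each team's score is simulated as the fraction of n_test held-out cases
    answered correctly, with each case correct independently with
    probability true_acc (a simplifying assumption).
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    scores = []
    for _ in range(n_teams):
        correct = sum(rng.random() < true_acc for _ in range(n_test))
        scores.append(correct / n_test)
    return max(scores)


# Hypothetical setup: 1,000 teams, a 500-case test set, and identical models
# with a true accuracy of 80%.
top = best_leaderboard_score(n_teams=1000, n_test=500, true_acc=0.80)
print(f"true accuracy 0.800, winning leaderboard score {top:.3f}")
```

No team is genuinely better than any other, yet the winner's score lands several points above the true 80%. A larger test set shrinks that gap. Repeatedly tuning against a public leaderboard would be expected to widen it.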
By ROBERT C. MILLER, JR. and MARIELLE S. GROSS, MD, MBE
This piece is part of the series “The Health Data Goldilocks Dilemma: Sharing? Privacy? Both?” which explores whether it’s possible to advance interoperability while maintaining privacy. Check out other pieces in the series here.
The problem with porridge
Today, we regularly hear stories of research teams using artificial intelligence to detect and diagnose diseases earlier, with more accuracy and speed than a human would ever have dreamed of. Increasingly, we are called on to contribute to these efforts by sharing our data with the teams crafting these algorithms, sometimes by healthcare organizations appealing to our altruism. A crop of startups has even appeared to let you monetize your data to that end. But given the sensitivity of your health data, you might be skeptical of this, doubly so when you take into account tech’s privacy track record. We have begun to recognize the flaws in our current privacy-protecting paradigm, which relies on thin notions of “notice and consent” that inappropriately place the responsibility for data stewardship on individuals who remain extremely limited in their ability to exercise meaningful control over their own data.
Emblematic of a broader trend, the “Health Data Goldilocks Dilemma” series calls attention to the tension and necessary tradeoffs between privacy and the goals of our modern healthcare technology systems. Not sharing our data at all would be “too cold,” but sharing freely would be “too hot.” We have been looking for policies that are “just right,” striking the balance between protecting individuals’ rights and interests and making it easier to learn from data to advance the rights and interests of society at large.
What if there were a way for you to allow others to learn from your data without compromising your privacy?
To date, a major strategy for striking this balance has involved sharing and learning from deidentified data, based on the belief that individuals’ only risks from sharing their data are a direct consequence of that data’s ability to identify them. However, artificial intelligence is rendering genuine deidentification obsolete, and we are increasingly recognizing a problematic lack of accountability to individuals whose deidentified data is being used for learning across various academic and commercial settings. In its present form, deidentification is little more than a sleight of hand to make us feel more comfortable about the unrestricted use of our data without truly protecting our interests. More of a wolf in sheep’s clothing, deidentification is not solving the Goldilocks dilemma.
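Part of the problem is that removing names does not remove identifying information. The sketch below runs the standard k-anonymity check on invented records with hypothetical field names. Any record whose combination of quasi-identifiers (here ZIP code, birth year and sex) is unique could, in principle, be linked back to a person by anyone holding another dataset with the same fields. This illustrates linkage risk in general. It is not a model of the AI-driven reidentification the authors describe.

```python
from collections import Counter

# Invented "deidentified" records: names removed, but quasi-identifiers kept.
records = [
    {"zip": "12345", "birth_year": 1961, "sex": "F", "dx": "diabetes"},
    {"zip": "12345", "birth_year": 1961, "sex": "F", "dx": "asthma"},
    {"zip": "12345", "birth_year": 1984, "sex": "M", "dx": "depression"},
    {"zip": "12346", "birth_year": 1975, "sex": "F", "dx": "HIV"},
    {"zip": "12346", "birth_year": 1975, "sex": "M", "dx": "hypertension"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def group_sizes(rows, keys):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


def k_anonymity(rows, keys) -> int:
    """Smallest group size; k == 1 means at least one record is unique."""
    return min(group_sizes(rows, keys).values())


sizes = group_sizes(records, QUASI_IDENTIFIERS)
unique = [r for r in records if sizes[tuple(r[k] for k in QUASI_IDENTIFIERS)] == 1]
print("k =", k_anonymity(records, QUASI_IDENTIFIERS))
print("records unique on quasi-identifiers:", len(unique))
```

Three of the five records are unique on those three fields alone. Each unique record's diagnosis is exposed to anyone who can match the fields against, for example, a voter roll.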
Tech to the rescue!
Fortunately, there are a handful of exciting new technologies that may let us escape the Goldilocks Dilemma entirely by enabling us to gain the benefits of our collective data without giving up our privacy. This sounds too good to be true, so let me explain the three most revolutionary ones: zero-knowledge proofs, federated learning, and blockchain technology.
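Of the three, federated learning is the simplest to sketch. The illustration below assumes a one-parameter linear model and invented data from three hypothetical hospitals. Each site runs a few steps of gradient descent on its own records, and a central server only ever sees and averages the resulting model weights. This follows the general "federated averaging" pattern, not any specific product. The function names are made up for illustration.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one site


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 20) -> float:
    """Gradient descent on the squared error of y ~ w * x, using only this site's data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(w: float, sites: List[Dataset], rounds: int = 10) -> float:
    """Each round: sites train locally, then the server averages their weights,
    weighted by how many examples each site holds. Raw data never leaves a site."""
    total = sum(len(d) for d in sites)
    for _ in range(rounds):
        local_ws = [local_update(w, d) for d in sites]
        w = sum(lw * len(d) for lw, d in zip(local_ws, sites)) / total
    return w


# Invented data from three hypothetical hospitals, all roughly following y = 3x.
hospital_a = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2)]
hospital_b = [(1.5, 4.4), (2.5, 7.6)]
hospital_c = [(0.5, 1.4), (4.0, 12.1), (3.5, 10.4)]

w = federated_average(0.0, [hospital_a, hospital_b, hospital_c])
print(f"learned slope: {w:.2f}")
```

Only the single weight `w` travels between the sites and the server. Real deployments involve far larger models and extra safeguards, and sharing model updates is not by itself a privacy guarantee.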
Leave your bias aside and take a look into the healthcare future with me. No, artificial intelligence, augmented intelligence and machine learning will not replace the radiologist. They will allow clinicians to.
The year is 2035 (plus or minus 5 years), and the world is waking up after a few years of economic hardship and maybe even some dreaded stagflation. That economic hardship is an important accelerant to where we are going, because it will destroy most of the radiology AI startups that have thrived on the quantitative easing policies and excessive liquidity of the last decade, which created a bubble in this space. When the bubble pops, few small to midsize AI companies will survive, but the ones that remain will consolidate and reap the rewards. The consolidators will almost certainly be big tech companies, which can purchase assets and algorithms across a wide breadth of radiology and integrate and standardize them better than anyone. When the bubble bursts, some of the best algorithms for pulmonary embolism, stroke, knee MRI, intracranial hemorrhage and so on will become available to consolidate, on the “cheap”.
Hospitals can now purchase AI equipment that is highly effective in both cost and function, and it’s only getting better for them. It doesn’t make sense to do so now, but soon it will. Consolidation in healthcare has given groups and hospitals greater purchasing power. The “roads and bridges” needed to connect such systems are being built, and deals will soon be struck with hundred-billion-dollar powerhouses such as GE, Google and IBM that will provide cloud-based AI services. RadPartners is already starting to provide natural language processing and imaging data to partners; that’s right, you speak into the Dictaphone and it is recorded, synced with the image you dictated, and processed along with everyone else’s dictations to find the commonalities in descriptors, eventually to replace you. It is as if the ghost of transcriptionists past has come back to haunt us, and no one cried for them. Prices will be competitive, and adoption will be fast, much faster than most believe.
Now we have some patients who arrive for imaging as outpatients, ER visits or inpatients; it does not matter, the premise is the same. Ms. Jones has chest pain, an elevated D-dimer, and a history of lupus anticoagulant and left femoral DVT. Her chart has likely already been analyzed by a cloud-based AI (merlonintelligence.com/intelligent-screening/), which finds a high probability that she has a PE; this is relayed to the clinician (PA, NP, MD, DO), and the study is ordered. She’s sent for a CT angiogram using the PE protocol. This is important to understand, because there will be no role for the radiologist at this level. The imaging recommendation will come from a machine learning algorithm based on more data and papers than any one radiologist could ever read, and it will be instantaneous and fluid. Correct studies will be recommended, and “incorrectly” ordered studies will need justification without radiologist validation.
The year is 2019 and Imaging By Machines have fulfilled their prophecy and control all Radiology Departments, making their organic predecessors obsolete.
One such lost soul tries to decide how he might reprovision the diagnostic equipment he has set up on his narrow boat on the Manchester Ship Canal, musing on the extent of the digital takeover during his supper (cod, of course).
What I seek to do in this short paper is not to revisit the well-trodden road of what Artificial Intelligence, deep learning, machine learning or natural language processing might be, or the data science that underpins them, nor to limit myself to the specific products or algorithms that are currently available or pending. Instead, I share my views on where in the patient journey I perceive there may be uses for “AI”.
I’ve been talking in recent posts about how our typical methods of testing AI systems are inadequate and potentially unsafe. In particular, I’ve complained that all of the headline-grabbing papers so far only do controlled experiments, so we don’t know how the AI systems will perform on real patients.
Today I am going to highlight a piece of work that has not received much attention, but actually went “all the way” and tested an AI system in clinical practice, assessing clinical outcomes. They did an actual clinical trial!
Big news … so why haven’t you heard about it?
The Great Wall of the West
Tragically, this paper has been mostly ignored. It has had 89 tweets*, which is pretty sad compared with the hundreds or thousands of tweets and news articles that many other papers receive. There is an obvious reason why, though: the article I will be talking about today comes from China (there are a few US co-authors too; I’m not sure what the relative contributions were, but the study was performed in China).
China is interesting. It appears to be rapidly becoming the world leader in applied AI, including in medicine, but we rarely hear anything in the media about what is happening there. When I go to conferences and talk to people working in China, they always tell me about numerous companies applying mature AI products to patients, but in the media we mostly see headline-grabbing stories about Western research projects that are still years away from clinical practice.
This shouldn’t be unexpected. Western journalists have very little access to China**, and Chinese medical AI companies have no need to solicit Western media coverage. They already have access to a large market, expertise, data, funding, and strong support both from medical governance and from the government more broadly. They don’t need us. But for us in the West, this means that our view of medical AI is narrow, like a frog looking at the sky from the bottom of a well^.