The Ongoing Data Crisis and Why We Mostly Have Ourselves to Blame

Sep 26, 2021

Data is the backbone of the practice of medicine.

Without high-quality data from rigorous clinical trials, the practice of medicine can start to resemble throwing darts (or another 4-letter word) at the wall. We rely on ‘evidence-based medicine’ to guide treatment decisions for patients.

COVID-19 is no different. We look to the ‘evidence’ and the ‘data’ to drive decisions. But what happens when no one can agree on what ‘good’ data looks like, or worse, when good data just doesn’t exist?

This is the situation we find ourselves in with COVID-19.

In the U.S., we have no centralized, coordinated, national healthcare database with seamless interoperability. This has long been the case, but until now it was mostly something talked about (more like complained about) at healthcare conferences by those of us who have felt this pain point for years.

But now, it’s affecting all of us.

It turns out that during a pandemic in which everyone is at risk, people want to know what’s safe and what’s unsafe. They want to be reassured that they’re making the ‘right’ decision for themselves and their families. They want data-driven answers to questions ranging from “Should I get a booster, and when?” to “What’s the efficacy of this vaccine versus another?” or “How long do I wait to be around my kids if I had a direct exposure?”

The only way to answer these questions is with data. The larger the dataset and the more rigorous the study design, the more confidence we can place in the conclusions. When there’s no high-quality data to answer these questions, we’re left to take an educated guess based on intuition and hope we’re giving good advice.

How did we get here?

Up until ~25 years ago, almost all healthcare data was collected and stored with paper and pen. As the limitations of this modality became clear, the era of Electronic Health Records (EHRs) was born. This was a seminal moment in medicine.

In theory, this should have enabled data collection at scale to answer important questions about COVID-19 and a host of other conditions. Unfortunately, we missed the moment and are paying for that original sin over and over today.

What do I mean?

The best analogy is the switch from paper mail to email. Let’s compare and contrast how we set up the infrastructure for email with how we set up the infrastructure for electronic health records during these two seismic transformations.

Email and EHRs are more similar than you think. The common goals are to:

- Record information in a digital format

- Store it for future retrieval

- Share information with recipients as needed

This last point is critically important. The whole point of email is to get a message to one or more recipients.

EHRs have the same goal. We need to record information, store it so it’s retrievable in the future, and share it with others on the care team, with patients themselves, or with a host of other stakeholders, including researchers tracking outcomes.

Developers understood this back in the early days when creating data standards for email and the internet. They created a common data language, so that anyone who types in www.espn.com is taken to the same landing page regardless of whether they’re using Safari, Chrome, or Internet Explorer and whether they access the internet through Charter, Comcast, or AT&T.

Similarly, an email reaches the intended recipient based on a unique email address, even when it originates from @yahoo.com and is sent to @gmail.com. Common data standards enable seamless sharing of information no matter which provider you choose for internet access or email.

Somehow, we missed the memo on the importance of these key principles when implementing EHRs.

Instead of creating common data standards, we let every EHR create its own data standards that did not have to ‘talk’ to other EHRs. As a result, a potassium value in one EHR does not easily convert or transfer to another EHR, even though both store potassium values. It would be as if the only way to send an email to a Gmail account were to send it from another Gmail account. Want to send an email to Yahoo or another service? Sorry, we don’t allow that. Instead, create a new account with our email service and start over.

This is the world we created with EHRs. Siloed, fractured, and missing any foundational standards to share data easily.
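To make the potassium example concrete, here is a minimal Python sketch of what a shared standard buys you. The two record layouts and every field name in them are hypothetical, invented for illustration; the shared identifier is the LOINC code for potassium in serum or plasma, used here simply as an example of a common data language for lab results. Once each system translates its own format into the same code and unit, a value from one system can be read by the other.

```python
# A minimal sketch: two hypothetical EHRs store the same potassium result
# in different, vendor-specific shapes (all field names are invented).
ehr_a_record = {"K_SERUM": 4.1, "K_SERUM_UNITS": "mmol/L"}
ehr_b_record = {
    "lab_results": [
        {"test": "Potassium, Plasma", "value": 4.1, "unit": "mEq/L"},
    ]
}

# Shared identifier: the LOINC code for potassium in serum or plasma.
POTASSIUM_LOINC = "2823-3"


def normalize_a(record):
    """Translate vendor A's proprietary fields into a shared (code, value, unit) shape."""
    return {
        "code": POTASSIUM_LOINC,
        "value": record["K_SERUM"],
        "unit": record["K_SERUM_UNITS"],
    }


def normalize_b(record):
    """Translate vendor B's lab list into the same shared shape, or None if absent."""
    for result in record["lab_results"]:
        if result["test"] == "Potassium, Plasma":
            # Potassium is a monovalent ion, so 1 mEq/L equals 1 mmol/L.
            unit = "mmol/L" if result["unit"] == "mEq/L" else result["unit"]
            return {"code": POTASSIUM_LOINC, "value": result["value"], "unit": unit}
    return None


# Once both systems speak the same language, their results compare directly.
print(normalize_a(ehr_a_record) == normalize_b(ehr_b_record))  # True
```

The code is the easy part. The hard part is agreeing on the shared vocabulary up front and getting every system to use it, which is exactly what the email world did and the EHR world did not.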

Critics will tell you I’m oversimplifying this, that ‘healthcare is different’ and ‘healthcare is harder.’ But if we have the technology to control a house’s thermostat from 3,000 miles away using an app, surely it’s technologically feasible to share a glucose value (essentially a number) across two platforms.

We have been paying for these original sins over and over. It has always been an expensive, inconvenient nuisance. We repeat tests because we can’t get past results in a timely fashion. We spend hours requesting records, faxing records, uploading records, matching records, etc. In fact, I doubt many of you could produce comprehensive health records for yourself and your family if you needed to right this moment. I’m sure you couldn’t.

And the ultimate cost to all of us is our inability to rapidly analyze accurate, nationwide data.

Just think about the way we are delivering vaccines across this country. They are administered at local clinics, CVS, Walgreens, independent pharmacies, hospitals, drive-through public health stations, and Walmart. This is a great thing in the sense that vaccines are widely available and meant to be as convenient to access as possible.

But from a data perspective, we are unable to track when folks get their first shot, second shot, and booster shot, and whether they subsequently do or don’t have a breakthrough infection or hospitalization. Yes, we ask folks to self-report this information when getting tested or showing up to the hospital, but this is not rock-solid data. Consider that our ‘official vaccine record’ is a CDC-labeled notecard filled out with pen and paper. Very 21st century!

When someone shows up to urgent care for a rapid test that turns positive, are we linking that result to the COVID shots (and the brand of vaccine) they got at Walmart 5 months ago, so we can learn from that data point and pool it with others? Does Walmart’s system link to the urgent care EHR, and subsequently to the hospital EHR if that individual has to be hospitalized? Unfortunately, the answer is a resounding no.

This is a national tragedy. There’s a reason almost all the COVID data we’re using to make booster and other recommendations comes from the U.K., Israel, or some other country that isn’t the U.S. It’s because we have no mechanism to collect and analyze reliable, nationwide data in real time.

When the pandemic first started, I wrote about how we were flying blind in the hospital taking care of patients because we didn’t know much of anything about how to treat COVID-19. In month 2 of the pandemic, this was excusable.

But now, going into month 20 of the pandemic, I’d argue that, from a data perspective, we’re flying just as blind in this country today as we were at the beginning. We’re seeing this play out in recommendations and guidelines that change on what feels like a daily basis. Look no further than a recent tweet from Eric Topol outlining the moving goalposts around booster recommendations over a span of weeks.

When you have so little reliable data to work from, it’s no wonder few can agree on what the ‘right decision’ is.

The underpinnings of this data debacle were set in motion decades ago, and untangling the complex web of patchwork data systems will take time and be VERY difficult. In the meantime, the failure to create a functional healthcare data infrastructure will go down as one of those mistakes we never stop paying for.

Stay safe,

Harry

Harry Saag's COVID-19 updates