Curating Care: Machine Learning's Deep Dive into Big Data

By Scott Behm, Duke Surgery

June 28, 2019

Consider the sheer volume of data your brain processes in making one decision. Something as simple as your choice of breakfast is the culmination of data analysis: decades of zeroing in on personal taste, allergic reactions, nutrition, health considerations, and other factors. Using data to make decisions, then, is not remarkable in itself; it is an innate aspect of the human condition. In surgery, this data processing has historically taken place through the lens of personal experience. Consider why a senior surgeon is typically more trusted than an intern: more time in practice builds a robust dataset from which to make sound surgical decisions.

For today’s surgeons, however, the dataset has expanded exponentially. Electronic health records (EHRs) alone create massive datasets, a resource often left untapped because turning terabytes of data into something the human brain can comprehend and put to practical use requires processing power that has been lacking. As the capabilities of machine learning expand, big data from EHRs and other sources has the potential to inform decisions in monumental ways, creating a detailed roadmap for improving the quality and efficiency of care and minimizing potential complications.

Curating Existing Data

For a system as large as Duke Health, curating EHR datasets to create meaningful, useful data can take years. Kristin Corey, a medical student at Duke and scholar at the Duke Institute for Health Innovation (DIHI), has spent 2 years working as part of the DIHI team to build PYTHIA, a data pipeline with potential for wide application across Duke Health and beyond. As part of the Perioperative Risk Optimization with Machine Learning for an Improved Surgical Experience (PROMISE) program, Ms. Corey’s original focus was predicting outcomes for geriatric patients.

“Because DIHI is an innovations group, it is really creative,” Ms. Corey says. “They gave us the free range and flexibility to think big. When we got into the data, we started asking questions. What if we did this for all surgical patients, and not just geriatric patients?”

Ms. Corey and her fellow students Sehj Kashyap and Elizabeth Lorenzi spent their research year with DIHI mining and curating EHR data to create an initial repository of 99,755 procedures from 66,370 patients. When the project received attention at the Machine Learning in Healthcare Conference, the group decided to commit to another year of research with Allan D. Kirk, MD, PhD, David C. Sabiston Jr. Professor of Surgery and Chair of Duke Surgery, as the clinical principal investigator.

“The DIHI team has built out our EHR data pipeline, now covering 550,000 procedures and spanning 4 years,” Ms. Corey says. “Patient features include all inpatient and outpatient encounters, med administrations, vitals, diagnosis codes, demographics, orders, labs, and more.”

As the repository expands, so does its range of applications, reaching far beyond the initial models used to predict postoperative complications. Currently, the team is building a model to predict 30-day readmission rates. Over time, training models on more data makes them more useful for making decisions and predicting outcomes. The repository’s broadest impact could come from generalizability: whether models trained at one hospital perform well at others. The team is currently testing how well models trained on Duke University Hospital data perform at other centers, such as Duke Regional Hospital and Duke Raleigh Hospital, and, in the future, at hospitals outside the Duke University Health System.

Creating New Data

EHR data capture several important snapshots of the patient experience, but they are far from complete and mostly administrative. Intentional and systematic data collection is needed to create a fuller picture. Collaborations between the Department of Surgery and experts in data science at Duke are foundational to this approach.

A valuable partner in this initiative is Erich Huang, MD, PhD, Assistant Professor of Surgery and in Biostatistics and Bioinformatics, as well as Co-Director of Forge, Duke’s Center for Actionable Health Data Science. Dr. Huang is the founding advisor of KelaHealth, a company that grew out of Duke Surgery and is built around a machine-learning platform that applies algorithms to patient data to reduce surgical complications and objectively inform decision-making.

“When a human makes a judgment, do they have formal confidence intervals, or is it subjective?” Dr. Huang asks. “The main reason we want to use machine learning is that it will help us measure our performance. If we can objectively measure, then we can also improve.”

Dr. Huang is a collaborator on the Department of Surgery's 1000 Patient Project, which collects biosamples from consenting patients before, during, and after surgery. Because this kind of data collection is unique, it requires a data infrastructure the EHR cannot provide.

“We are using surgery as a perturbation event to collect information from the patients before and after surgery,” Dr. Huang says. “We need a lot of structured information about the patients, and it doesn’t live natively in the EHR. We are collecting microbiome data and next-generation sequencing data. We need to have a separate data system to store that information.”

The genomic information collected in the 1000 Patient Project can be used to help answer several questions, including who should have surgery and how they will respond to it. Lawrence Carin, PhD, Professor of Electrical and Computer Engineering at Duke, has worked with Duke Surgery in its data collection and analysis.

“Artificial intelligence (AI), machine learning, and data science affect the entire process, from beginning to end of the surgical process,” Dr. Carin says. “All sources of data that we have—from radiology, pathology, genomics, the clinical record itself—how can we pull in all of this information to better understand who should go into surgery and then understand how the body will go through the healing process after?”

AI built through machine learning and training on data has the potential to work alongside "natural intelligence": that of the clinicians and surgeons making decisions. This collaboration of intelligences combines the clinician's subjective, well-rounded experience with the objective, well-defined trends found in datasets.

Collaborating on Real-Time Data

Real-time use of machine learning is a powerful tool in the Division of Emergency Medicine, where data are used to analyze trends in the often unpredictable ebb and flow of the ER. Assistant Professors of Surgery Neel Kapadia, MD, and Brent Jason Theiling, MD, work with a team to analyze incoming data, streamline processes, and improve care.

“Our goal is that we want to see and take care of every patient that presents to the hospital for care, and we want to do so efficiently and to make sure that the care is high-quality,” Dr. Theiling says. “Once we started with that core, it is just a matter of breaking it down. What are the patient demographics and other data that can inform operational decisions and impact for the better the quality and efficiency of the care that is provided?”

“When we originally looked at the data, it was more an executive summary,” Dr. Kapadia says. “Performance services is continually updating it. It has since evolved to our utilization per hour of the day, and now subsequently we can get into the data patient by patient, how long they stayed, why did they stay that long. We can break it down by bed utilization, and all these individual pieces can be put together.”

Drs. Theiling and Kapadia add that because the emergency division collaborates with many outside entities, including labs, radiology, consultants, and transport, real-time data allows the division to objectively pinpoint weaknesses and direct improvement strategies accordingly. Machine learning can be used to expedite care and stratify a variety of risks. The Sepsis Watch tool uses historical and real-time data, including medical history, medications, vitals, and lab and radiology results, to predict sepsis probability.

“What is interesting is not only does the model generate a single probability, but it generates multiple iterations, adjusting as time and new data goes along,” Dr. Theiling says. “In fact, if you look at the time when the probability jumps, say from 15% to 90%, in many patients it is not tied to any discrete data point. This just shows the power of a machine learning model and how it differs from a purely algorithmic program.”

The tool is being piloted in the Division of Emergency Medicine and, if successful, will be rolled out across the hospital.

Data as a Tool, Not a Takeover

The possibilities of machine learning are vast, though not limitless. Deeper data analysis, better algorithms, and well-trained predictive models will offer insight, but they are no substitute for the knowledge and intuition of a master clinician. Pairing the two intelligences, natural and artificial, offers both subjective and objective means of improving care for all patients.
