ICM & CS Seminars on Computational Health: Christopher Chute, “Big Data meets Healthcare”

02/27/2014 @ 12:00 PM – 1:00 PM

Meet The Speaker

“Big Data meets Healthcare: The case for comparability and consistency”

Dr. Chute received his undergraduate and medical training at Brown University, internal medicine residency at Dartmouth, and doctoral training in Epidemiology at Harvard. He is Board Certified in Internal Medicine and Clinical Informatics, and is a Fellow of the American College of Physicians, the American College of Epidemiology, and the American College of Medical Informatics. He became founding Chair of Biomedical Informatics at Mayo in 1988, stepping down after 20 years in that role. He is now Professor of Medical Informatics and Section Head. He is PI on a large portfolio of research including the HHS/Office of the National Coordinator (ONC) SHARP (Strategic Health IT Advanced Research Projects) on Secondary EHR Data Use, the ONC Beacon Community (Co-PI), the LexGrid projects, Mayo’s CTSA Informatics, and several NIH grants including one of the eMERGE centers from NHGRI, which focus on genome-wide association studies against shared phenotypes derived from electronic medical records. Dr. Chute serves as Vice Chair of the Mayo Clinic Data Governance for Health Information Technology Standards, and on Mayo’s enterprise IT Oversight Committee. He is presently Chair of the ISO Health Informatics Technical Committee (ISO TC215) and chairs the World Health Organization (WHO) ICD-11 Revision. He also serves on the HL7 Advisory Board. Recently held positions include service as an index member on the Health Information Technology Standards Committee for the Office of the National Coordinator in the US DHHS and as the founding Chair of the Biomedical Computing and Health Informatics study section at NIH.

Seminar Abstract

The well-known phenomenon of “information explosion” has affected virtually all areas of human enterprise, and healthcare is no exception. While one might quibble whether more information is actually being created, there is no disagreement that vastly more information is being electronically captured and stored. Latent within the proliferation of such machine-readable archives of information lie previously impractical metrics, capabilities for linkage and association, and ultimately new knowledge. The over-used moniker of “big data” is applied to the rise of vast, potentially federated data sources, analytic methods for their interpretation, and emergent findings. Despite this imprecision, most observers agree that there is something new and different emerging in the opportunistic mining of disparate data on an unprecedented scale.

Examples of impressive inferences from big data abound in finance, marketing, education, social sciences, and economics. More focused, “big science” opportunities are self-evident in astronomy, physics, and arguably the discovery of the Higgs boson (which really was inferred from perturbations observed across exabytes of experimental particle-accelerator data). In biology and medicine the sweet spot has historically been in the human genome, where genotype-phenotype associations emerge from “genome-wide association studies” done at massive scale – more so in the present era of whole-genome sequencing.

The promise of best-evidence discovery, comparative effectiveness research, new outcomes analyses, adverse event discovery, and improved clinical care in general that might emerge from big-data methods applied to large, federated, clinical data repositories is intriguing. There is “gold in them hills,” and the potential benefits of well-conducted data mining must not be lightly dismissed.

However, caution must dominate otherwise unfettered analyses of clinical information, as skewed, biased, spurious, or otherwise “wrong” answers can have serious adverse consequences. While most of us are quite content to have a target answer appear “on the page” of a Google search result, having the right answer “on the list” but not chosen for a healthcare intervention may be interpreted as malpractice in some litigious countries – not to mention the likely sub-optimal outcome for the patient. Clinical decision support resources may recommend a spectrum of options to a clinician, who presumably has the responsibility of synthesizing such advice and selecting the optimal path, though few would dispute that the amount of information and the complexity of its interactions have long since exceeded the unaided human capacity for cognition, reliable processing, or well-balanced interpretation.

The more insidious risk of blindly applying big-data methods to large clinical repositories is the underlying heterogeneity of clinical data representation, in both syntax and semantics. Syntactically, recognizing for example “heart disease” in a patient’s record may classify that patient into algorithmic risk groups, though if that rubric is nested under an information structure containing family history information, that risk assignment may well be inaccurate. Similarly, if one group of patients is found with “renal cancer” and a separate group is found with “kidney cancer,” no amount of big-data inferencing will reconcile their similarity absent an ontological assertion that these categories are synonymous. The risk of misclassifying clinical data is vast, more so in vast databases managed through conventional big-data methods.
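
The two pitfalls above can be illustrated with a minimal Python sketch. It is not drawn from the seminar: the synonym table, the section names ("problem_list", "family_history"), and the helper functions are all hypothetical, and a real system would rely on a curated terminology rather than a hand-built dictionary.

```python
from collections import Counter

# Hypothetical synonym table mapping local terms to one canonical label.
# This stands in for the "ontological assertion" that two terms are synonymous.
CANONICAL = {
    "renal cancer": "kidney cancer",
    "kidney cancer": "kidney cancer",
    "heart disease": "heart disease",
}

# Hypothetical record sections whose findings describe relatives, not the patient.
NON_PATIENT_SECTIONS = {"family_history"}


def normalize(term):
    """Return the canonical label for a term, or None if it is unmapped."""
    return CANONICAL.get(term.strip().lower())


def patient_conditions(record):
    """Collect canonical conditions asserted about the patient only."""
    found = set()
    for section, terms in record.items():
        if section in NON_PATIENT_SECTIONS:
            continue  # a relative's diagnosis must not classify the patient
        for term in terms:
            concept = normalize(term)
            if concept is not None:
                found.add(concept)
    return found


# Three illustrative (made-up) records, one per pitfall variant.
records = [
    {"problem_list": ["Renal cancer"]},
    {"problem_list": ["kidney cancer"]},
    {"family_history": ["heart disease"]},
]

# Count patients per canonical condition after normalization.
counts = Counter(c for r in records for c in patient_conditions(r))
print(counts)
```

In this toy data the two cancer records fall into a single group only because the table asserts that the terms are synonymous, and the family-history mention of heart disease is excluded from the patient's own conditions.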

Comparable and consistently represented clinical information, achieved either at entry or through normalization to a canonical form, remains a necessary step before big-data methods can be meaningfully or safely applied to clinical data repositories.



JHU - Institute for Computational Medicine