Using "big data" to dissect clinical heterogeneity.

Article see p 269

Experienced practitioners recognize that patients with a shared diagnosis often fall into subsets whose members “look” alike and tend to respond to similar treatment strategies. Indeed, much of the clinical wisdom within many specialties is a nuanced knowledge of these subsets; they typically are not described in standard texts and may be recognized only after years of delivering care. The best clinical teachers convey these distinctions to their trainees, giving them knowledge that might otherwise be acquired only after observing hundreds or thousands of clinical cases. These disease subsets often turn out to be physiologically distinct processes that simply share a common clinical end point. Failure to recognize them can confound both treatment decisions and clinical research studies, in which treatments applied to heterogeneous cohorts yield signals of efficacy that are reduced or lost. Indeed, awareness of the heterogeneity of many common and complex diseases led the Institute of Medicine to call for a re-examination of the way we define diseases, taking advantage of our ability to measure disease at the molecular, cellular, tissue, and whole-organism levels.1

One of the promises of “big data in medicine” is to accelerate our ability to recognize disease heterogeneity and to create new distinctions using large numbers of measurements on large populations of patients. The opportunities for big data have exploded with improvements in our ability to measure biomarkers (clinical laboratory, genetic, proteomic, and metabolomic measurements), to collect images, to instrument patients with personal devices, and to store all these data in electronic formats. The availability of these data suggests that we may be able to dissect clinical heterogeneity on the basis of not only clinical observations of our own patients but also aggregated measurements and observations from large populations of patients. It is in this context that …