THE AMOUNT OF DATA BEING DIGITALLY COLLECTED AND stored is vast and expanding rapidly. As a result, the science of data management and analysis is also advancing to enable organizations to convert this vast resource into information and knowledge that helps them achieve their objectives. Computer scientists have invented the term big data to describe this evolving technology. Big data has been successfully used in astronomy (eg, the Sloan Digital Sky Survey of telescopic information), retail sales (eg, Walmart’s expansive number of transactions), search engines (eg, Google’s customization of individual searches based on previous web data), and politics (eg, a campaign’s focus of political advertisements on people most likely to support their candidate based on web searches). In this Viewpoint, we discuss the application of big data to health care, using an economic framework to highlight the opportunities it will offer and the roadblocks to implementation. We suggest that leveraging the collection of patient and practitioner data could be an important way to improve quality and efficiency of health care delivery. Widespread uptake of electronic health records (EHRs) has generated massive data sets. A survey by the American Hospital Association showed that adoption of EHRs has doubled from 2009 to 2011, partly a result of funding provided by the Health Information Technology for Economic and Clinical Health Act of 2009. Most EHRs now contain quantitative data (eg, laboratory values), qualitative data (eg, text-based documents and demographics), and transactional data (eg, a record of medication delivery). However, much of this rich data set is currently perceived as a byproduct of health care delivery, rather than a central asset to improve its efficiency. The transition of data from refuse to riches has been key in the big data revolution of other industries. Advances in analytic techniques in the computer sciences, especially in machine learning, have been a major catalyst for dealing with these large information sets. These analytic techniques are in contrast to traditional statistical methods (derived from the social and physical sciences), which are largely not useful for analysis of unstructured data such as text-based documents that do not fit into relational tables. One estimate suggests that 80% of business-related data exist in an unstructured format. The same could probably be said for health care data, a large proportion of which is text-based. In contrast to most consumer service industries, medicine adopted a practice of generating evidence from experimental (randomized trials) and quasi-experimental studies to inform patients and clinicians. The evidence-based movement is founded on the belief that scientific inquiry is superior to expert opinion and testimonials. In this way, medicine was ahead of many other industries in terms of recognizing the value of data and information guiding rational decision making. However, health care has lagged in uptake of newer techniques to leverage the rich information contained in EHRs. There are 4 ways big data may advance the economic mission of health care delivery by improving quality and efficiency. First, big data may greatly expand the capacity to generate new knowledge. The cost of answering many clinical questions prospectively, and even retrospectively, by collecting structured data is prohibitive. Analyzing the unstructured data contained within EHRs using computational techniques (eg, natural language processing to extract medical concepts from free-text documents) permits finer data acquisition in an automated fashion. For instance, automated identification within EHRs using natural language processing was superior in detecting postoperative complications compared with patient safety indicators based on discharge coding. Big data offers the potential to create an observational evidence base for clinical questions that would otherwise not be possible and may be especially helpful with issues of generalizability. The latter issue limits the application of conclusions derived from randomized trials performed on a narrow spectrum of participants to patients who exhibit very different characteristics. Second, big data may help with knowledge dissemination. Most physicians struggle to stay current with the latest evidence guiding clinical practice. The digitization of medical literature has greatly improved access; however, the sheer
[1]
Charles Safran,et al.
Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper.
,
2007,
Journal of the American Medical Informatics Association : JAMIA.
[2]
Isaac S Kohane,et al.
Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study
,
2009,
BMJ : British Medical Journal.
[3]
Steven H. Brown,et al.
Automated identification of postoperative complications within an electronic medical record using natural language processing.
,
2011,
JAMA.
[4]
Jeffrey L. Brown.
A piece of my mind: The unasked question.
,
2012,
JAMA.
[5]
S. Brunak,et al.
Mining electronic health records: towards better research applications and clinical care
,
2012,
Nature Reviews Genetics.