Role of Big Data in Cardiovascular Research

Big Data resemble people: interrogate them just so, and they will tell you whatever you want to hear. Perhaps you, gentle reader, have noticed that there seems to be an awful lot more data in recent years, but perhaps not a lot more knowledge. Welcome to the world of Big Data. Just what is Big Data, and how is it changing the world of cardiovascular medicine? Big Data may be defined as large data sets amenable to analytic approaches that can reveal underlying patterns, associations, or trends. Big Data has also been characterized by the 4 Vs: volume (a lot of data), variety (data from different sources and in different forms), velocity (data accumulate rapidly), and veracity (uncertainty as to whether the data are correct). However, these characteristics do not adequately define Big Data; one might say that if you have seen one set of Big Data, you have seen one set of Big Data. It is perhaps more useful to think about Big Data in terms of its sources, repositories, and uses (Figure 1, Table). Where do such data come from? How are they stored? How can they be analyzed and visualized? What can we learn from Big Data?
