Making sense of big data in health research: Towards an EU action plan

Medicine and healthcare are undergoing profound changes. Whole-genome sequencing and high-resolution imaging technologies are key drivers of this rapid and crucial transformation. Technological innovation combined with automation and miniaturization has triggered an explosion in data production that will soon reach exabyte proportions. How are we going to deal with this exponential increase in data production? The potential of “big data” for improving health is enormous but, at the same time, we face a wide range of challenges that must be overcome urgently. Europe is very proud of its cultural diversity; however, exploitation of the data made available through advances in genomic medicine, imaging, and a wide range of mobile health applications and connected devices is hampered by numerous historical, technical, legal, and political barriers. European health systems and databases are diverse and fragmented. Data formats, processing, analysis, and data transfer are not harmonized, which leads to incompatibilities and lost opportunities. Legal frameworks for data sharing are still evolving. Clinicians, researchers, and citizens need improved methods, tools, and training to generate, analyze, and query data effectively. Addressing these barriers will contribute to creating the European Single Market for health, which will improve health and healthcare for all Europeans.

Magnus Rattray | Thomas Lengauer | Haralampos Karanikas | Pablo Villoslada | Allan Hanbury | Konstantinos Pliakos | Juan Antonio Vizcaíno | Charles Auffray | Reinhard Schneider | Inês Barroso | Gianluigi Zanetti | Paul Flicek | Norbert Graf | Ana Conesa | Milan Petkovic | Christoph Bock | Niklas Blomberg | Rudi Balling | Peter Devilee | Mikael Benson | Thierry Sengstag | Enrique Bernal-Delgado | Henk-Jan Guchelaar | Jay Bergeron | Peter Varnai | Yi-Ke Guo | Jeanine Houwing-Duistermaat | Jerry Lanfear | Sophie H. Janacek | Vera Grimm | László Bencze | Susanna Del Signore | Christophe Delogne | Alberto Di Meglio | Marinus Eijkemans | Ivo Glynne Gut | Shahid Hanif | Ralf-Dieter Hilgers | Ángel Honrado | D. Rod Hose | Tim Hubbard | Tim Kievits | Manfred Kohler | Andreas Kremer | Edith Maes | Theo Meert | Werner Müller | Dörthe Nickel | Peter Oledzki | Bertrand Pedersen | Josep Redón i Màs | Xavier Serra-Picamal | Wouter Spek | Lea A. I. Vaas | Okker van Batenburg | Marc Vandelaer | John Peter Mary Wubbe
