The Era of Big Data: From Data-Driven Research to Data-Driven Clinical Care

When the era of big data arrived in the early 1990s, biomedical research gave rise to new innovations, procedures, and methods that aid clinical care and patient management. This chapter introduces the basic concepts and strategies of data-driven biomedical research and its applications, an area described by terms such as computational biomedicine or clinical/medical bioinformatics. After a brief motivation, the chapter surveys data sources and bioanalytic technologies for high-throughput data generation. It then presents a selection of experimental study designs and their applications, and outlines procedures and recommendations for handling data quality and privacy. Next, it discusses basic data warehouse concepts used for life science data integration, data mining, and knowledge discovery. Finally, five application examples are briefly described, emphasizing the benefits and power of computational methods and tools in this field. The author hopes that this chapter will encourage readers to handle and interpret the large amounts of data typically generated in research projects and clinical routine, and to exploit the mined bioinformation and medical knowledge for individualized health care.
