Intelligent Integrative Knowledge Bases: Bridging Genomics, Integrative Biology and Translational Medicine

Successful application of translational medicine will require understanding the complex nature of disease, fueled by effective analysis of multidimensional 'omics' measurements and systems-level studies. In this paper, we present a perspective, the intelligent integrative knowledge base (I2KB), for data management, statistical analysis and knowledge discovery related to human disease. By building a bridge between patient associations, clinicians, experimentalists and modelers, I2KB will facilitate the emergence and propagation of systems medicine studies, which are a prerequisite for large-scale clinical trial studies, efficient diagnosis, disease screening, drug target evaluation and development of new therapeutic strategies.
