BioBroker: Knowledge Discovery Framework for Heterogeneous Biomedical Ontologies and Data

A large number of ontologies have been introduced by the biomedical community in recent years. Knowledge discovery for entity identification from ontologies has become an important research area, and it is of particular interest to discover how associations are established to connect concepts within a single ontology or across multiple ontologies. However, due to the exponential growth of biomedical big data and their complicated associations, it has become very challenging to detect key associations among entities in an efficient and dynamic manner. Therefore, there exists a gap between the increasing need for association detection and the large volume of biomedical ontologies. In this paper, to bridge this gap, we present a knowledge discovery framework, BioBroker, that groups entities to facilitate the process of biomedical knowledge discovery. Specifically, we developed an innovative knowledge discovery algorithm that combines a graph clustering method and an indexing technique to discover knowledge patterns efficiently over a set of interlinked data sources. We demonstrate the capabilities of BioBroker for query execution with a use case study on a subset of the Bio2RDF life science linked data.
