Creating NoSQL Biological Databases with Ontologies for Query Relaxation

Abstract The complexity of building biological databases is well known, and ontologies play an important role in them. However, most attention to that role has focused on database construction. In this paper, we explore a comparatively overlooked aspect of ontologies in biological databases: how they can support better database retrieval. In particular, we show how ontologies can be used to revise user-submitted queries for query relaxation. Because this research is conducted in today's "big data" era, our investigation centers on NoSQL databases, which are representative of big data systems. The paper has two major parts. First, we describe our methodology for building two NoSQL application databases (MongoDB and AllegroGraph) using the Gene Ontology (GO). Second, we discuss how to achieve query relaxation through GO. We report our experiments and show sample queries and results. Our research on query relaxation in NoSQL databases complements existing work on big data and biological databases and deserves further exploration.
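As a rough illustration of what ontology-based query relaxation can look like, the sketch below broadens a query term to its ancestors in an is_a hierarchy when the original term returns nothing. It is not the method reported in the paper: the hierarchy, the annotation store, the gene names, and the function `relaxed_lookup` are all hypothetical, and plain in-memory dictionaries stand in for the MongoDB and AllegroGraph databases.

```python
from collections import deque

# Toy is_a hierarchy (child term -> parent terms).
# Hypothetical illustration only; not loaded from the actual Gene Ontology.
PARENTS = {
    "apoptotic process": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": [],
}

# Toy annotation store (term -> gene products). Hypothetical data.
ANNOTATIONS = {
    "programmed cell death": ["geneA", "geneB"],
    "cell death": ["geneC"],
}


def relaxed_lookup(term, max_steps=2):
    """Return (matched_term, results, steps_relaxed).

    Tries the term as given; if it has no annotations, walks up the
    hierarchy breadth-first, at most max_steps levels, and returns the
    first ancestor that has annotations.
    """
    queue = deque([(term, 0)])
    seen = {term}
    while queue:
        current, steps = queue.popleft()
        results = ANNOTATIONS.get(current, [])
        if results:
            return current, results, steps
        if steps < max_steps:
            for parent in PARENTS.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, steps + 1))
    # Nothing found within the allowed number of relaxation steps.
    return None, [], None
```

Calling `relaxed_lookup("apoptotic process")` finds no direct annotations in this toy store and falls back to the parent term one level up. Capping the walk with `max_steps` is one simple way to keep relaxed answers close to what the user originally asked for.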
