Improving Data Discovery for Metadata Repositories through Semantic Search

The amount of ecological data available electronically is increasing rapidly; for example, over 15,000 data sets are available today in the Knowledge Network for Biocomplexity (KNB) alone. Using the existing search capabilities of these online data repositories, however, scientists struggle to quickly locate data that are relevant to their needs or that will integrate with their current data sets. Semantic technologies aim to address many of these problems and hold the promise of enabling more powerful "smart" searches of online data archives. We describe new semantic search features within the Metacat metadata system, which is used by many ecological research sites around the world to archive their data in a standardized metadata format. Our semantic search system adds to Metacat the ability to store OWL-DL ontologies, together with semantic annotations that link data set attributes to ontology terms. Our approach also extends Metacat to improve metadata search in three ways: (i) by expanding standard keyword searches with ontology term hierarchies; (ii) by allowing keyword searches to be applied to annotations in addition to traditional metadata; and (iii) by allowing more structured searches over annotations via ontology terms. We describe our implementation of these extensions, and compare and contrast these different types of search on a corpus of annotated documents. As data repositories continue to grow, these tools will be instrumental in helping scientists precisely locate and then interpret data for their research needs.
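To make the three search modes concrete, here is a minimal Python sketch. It assumes a toy single-inheritance term hierarchy stored as a child-to-parent dictionary, standing in for OWL-DL ontologies and a reasoner. The ontology terms, data set IDs, attribute names, and the functions `subterms`, `keyword_search`, `annotation_keyword_search`, and `structured_search` are all hypothetical illustrations; they do not reflect Metacat's actual API or storage model.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its single parent term.
# This stands in for an OWL-DL class hierarchy; a real system would
# compute subsumption with a reasoner rather than a hand-written dict.
PARENT = {
    "Plant": "Organism",
    "Animal": "Organism",
    "Tree": "Plant",
    "Grass": "Plant",
    "Fish": "Animal",
    "Biomass": "Measurement",
    "Abundance": "Measurement",
}

# Invert the parent map so the hierarchy can be walked downward.
CHILDREN = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)

# Hypothetical corpus: free-text metadata plus semantic annotations that
# link each data set attribute (column) to an ontology term.
DATASETS = {
    "ds1": {"text": "Tree diameter and biomass at site A",
            "annotations": {"dbh_cm": "Tree", "mass_kg": "Biomass"}},
    "ds2": {"text": "Grass cover in prairie plots",
            "annotations": {"species": "Grass", "count": "Abundance"}},
    "ds3": {"text": "Reef fish counts",
            "annotations": {"taxon": "Fish", "n": "Abundance"}},
}


def subterms(term):
    """Return the term itself plus all of its descendants in the hierarchy."""
    found, stack = {term}, [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def keyword_search(keyword, expand=False):
    """(i) Match metadata text, optionally expanding the keyword with subterms."""
    words = {keyword.lower()}
    if expand:
        # Map the keyword to an ontology term by case-insensitive name match.
        for term in set(PARENT) | set(CHILDREN):
            if term.lower() == keyword.lower():
                words |= {t.lower() for t in subterms(term)}
    return sorted(ds for ds, rec in DATASETS.items()
                  if any(w in rec["text"].lower() for w in words))


def annotation_keyword_search(keyword):
    """(ii) Match the keyword against metadata text and annotation term names."""
    kw = keyword.lower()
    return sorted(ds for ds, rec in DATASETS.items()
                  if kw in rec["text"].lower()
                  or any(kw == t.lower() for t in rec["annotations"].values()))


def structured_search(*terms):
    """(iii) Data sets that, for every query term, have some attribute
    annotated with that term or one of its subterms."""
    return sorted(ds for ds, rec in DATASETS.items()
                  if all(set(rec["annotations"].values()) & subterms(t)
                         for t in terms))


print(keyword_search("plant"))                   # []
print(keyword_search("plant", expand=True))      # ['ds1', 'ds2']
print(annotation_keyword_search("abundance"))    # ['ds2', 'ds3']
print(structured_search("Plant", "Abundance"))   # ['ds2']
```

In this sketch, expansion (i) rewrites the query before plain-text matching, so it can help even for documents without annotations, whereas (iii) relies entirely on annotations but can require several attributes with different meanings to co-occur in the same data set.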
