Linking and querying genomic datasets using natural language

The association of experimental data with domain knowledge expressed in ontologies facilitates information aggregation, meaningful querying, and knowledge discovery, which aids analysis of the extensive interconnected data available for genome projects. TcruziKB is an ontology-driven problem-solving system that describes and provides access to the data in a traditional genome database for the parasite Trypanosoma cruzi. The problem-solving environment offers advanced search and information-presentation features that support complex queries that would be difficult, if not impossible, to execute without semantic enhancements. These features not only improve the quality of the information retrieved but also reduce the strain on the user by improving usability over the standard system.
