Supporting knowledge discovery for biodiversity

We propose text mining as a support for knowledge discovery from biological descriptions. Our aim is both to support the curation of databases and to offer an alternative representation framework for accessing information in the biodiversity domain. We work on raw text with minimal human intervention, applying natural language processing to integrate linguistic and domain knowledge into a mathematical model that captures concepts and the relationships between them in a computable form, using conceptual graphs. This provides a basis for reasoning about semantic disjointness and subsumption, as well as sub- and super-concept relationships.
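To make the reasoning step concrete, the following is a minimal sketch of subsumption and disjointness checks over a concept type hierarchy, the kind of structure that underlies conceptual graphs. It is an illustration only, not the authors' implementation: the hierarchy `PARENTS`, the concept names, and the helper functions `ancestors`, `subsumes`, `all_types`, and `disjoint` are all hypothetical, and disjointness is approximated here as "no concept in the hierarchy is a sub-concept of both".

```python
from typing import Dict, Set

# Hypothetical toy hierarchy (assumption): concept -> its direct super-concepts.
PARENTS: Dict[str, Set[str]] = {
    "Organ": {"Entity"},
    "Leaf": {"Organ"},
    "Flower": {"Organ"},
    "Colour": {"Entity"},
    "Green": {"Colour"},
}


def ancestors(concept: str) -> Set[str]:
    """Return the concept together with all of its super-concepts."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in PARENTS.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumes(general: str, specific: str) -> bool:
    """True when `general` is `specific` itself or one of its super-concepts."""
    return general in ancestors(specific)


def all_types() -> Set[str]:
    """Collect every concept mentioned in the hierarchy."""
    types = set(PARENTS)
    for parents in PARENTS.values():
        types |= parents
    return types


def disjoint(a: str, b: str) -> bool:
    """True when no concept in the hierarchy is a sub-concept of both a and b."""
    return not any(subsumes(a, t) and subsumes(b, t) for t in all_types())


if __name__ == "__main__":
    print(subsumes("Organ", "Leaf"))    # Organ is a super-concept of Leaf
    print(disjoint("Leaf", "Flower"))   # no shared sub-concept in this toy data
```

In a fuller conceptual-graph setting, the same ordering on concept types would also drive comparisons between whole graphs (concepts plus relations), which this sketch does not attempt.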
