Unsupervised Learning of Semantic Relations between Concepts of a Molecular Biology Ontology

In this paper we present an unsupervised model for learning arbitrary relations between concepts of a molecular biology ontology, with the aim of supporting text mining and manual ontology building. Relations between named entities are learned from the GENIA corpus using several standard natural language processing techniques. An in-depth analysis of the system's output shows that the model is accurate and has good potential for text mining and ontology building applications.
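The abstract does not say how the model decides that a candidate relation between two concepts is significant. The sketch below shows one plausible way to do it, and it rests on several assumptions. Each parsed sentence is assumed to have been reduced to a (subject class, verb, object class) triple, with the named entities already mapped to ontology classes. Each triple is then scored with Dunning's log-likelihood ratio (G²), a standard association statistic for corpus counts. The triple format, the function names `g2` and `score_relations`, and the `min_count` threshold are all hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Iterable

# Hypothetical triple format: (subject_class, verb, object_class)
Triple = tuple[str, str, str]


def g2(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # the term 0 * log(0) is taken to be 0
            expected = rows[i] * cols[j] / n
            total += o * math.log(o / expected)
    return 2.0 * total


def score_relations(
    triples: Iterable[Triple], min_count: int = 2
) -> list[tuple[Triple, float]]:
    """Rank (subject_class, verb, object_class) triples by how strongly the
    verb is associated with the class pair, keeping positive associations only."""
    triples = list(triples)
    n = len(triples)
    triple_counts = Counter(triples)
    pair_counts = Counter((s, o) for s, _, o in triples)
    verb_counts = Counter(v for _, v, _ in triples)

    scored = []
    for (s, v, o), k11 in triple_counts.items():
        if k11 < min_count:
            continue  # too rare to test reliably
        pair_total = pair_counts[(s, o)]
        verb_total = verb_counts[v]
        # G^2 is two-sided, so skip triples seen no more often than chance predicts
        if k11 * n <= pair_total * verb_total:
            continue
        k12 = pair_total - k11           # same class pair, other verbs
        k21 = verb_total - k11           # same verb, other class pairs
        k22 = n - k11 - k12 - k21        # everything else
        scored.append(((s, v, o), g2(k11, k12, k21, k22)))

    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

The filter on positive association matters because G² measures deviation from independence in either direction. Without it, a verb that is conspicuously rare between two classes would rank as highly as one that is characteristic of them.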
