A Standard Lexical-Terminological Resource for the Bio Domain

This paper describes a large-scale lexical resource for the biology domain, designed for both human and machine use. The lexicon aims at semantic interoperability and extensibility through the adoption of the ISO-LMF standard for lexical representation and through a granular, distributed encoding of the relevant information. The first part of the paper focuses on three aspects of the model that are of particular interest to the biology community: the treatment of term variants, the representation of bio-events, and the alignment with a domain ontology. The second part describes the physical implementation of the model: a relational database equipped with a set of automatic uploading procedures. A peculiarity of the BioLexicon is that it combines features of both terminologies and lexicons. A set of verbs relevant to the domain is also represented, with full details of their syntactic and semantic argument structure.
