The Choice of Features for Classification of Verbs in Biomedical Texts

We conduct large-scale experiments to investigate optimal features for classification of verbs in biomedical texts. We introduce a range of feature sets and associated extraction techniques, and evaluate them thoroughly using a robust method new to the task: a cost-based framework for pairwise clustering. Our best results compare favourably with earlier ones. Interestingly, they are obtained with sophisticated feature sets which include lexical and semantic information about selectional preferences of verbs. The latter are acquired automatically from corpus data using a fully unsupervised method.
