Distributional Thesaurus Versus WordNet: A Comparison of Backoff Techniques for Unsupervised PP Attachment

Prepositional phrase (PP) attachment can be addressed by counting the dependency triples observed in an unannotated corpus. However, many triples do not appear even in very large corpora, so some form of backoff is needed for the unseen cases. We evaluate two backoff methods for this problem, one based on WordNet and the other on a distributional (automatically created) thesaurus. Our experiments are on Spanish. The thesaurus is built from the dependency triples found in the same corpus that is used to count the frequency of unambiguous triples. The training corpus for both methods is an encyclopaedia. The method based on the distributional thesaurus has higher coverage but lower precision than the WordNet method.
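The general idea of count-based attachment with backoff can be illustrated with a minimal sketch. It is not the authors' implementation: the function `attach`, the toy counts, the similarity weights, and the tie-handling rule are all assumptions made for illustration. The `similar` mapping stands in for either resource, whether neighbours from a distributional thesaurus or related words drawn from WordNet.

```python
from collections import Counter


def attach(verb, noun1, prep, noun2, counts, similar):
    """Decide where the PP (prep, noun2) attaches.

    Returns "verb", "noun", or None when the counts give no preference.
    counts:  Counter over (head, prep, dependent) triples (hypothetical data).
    similar: dict mapping a word to a list of (related_word, weight) pairs.
    """
    def score(head, dep):
        # A Counter returns 0 for triples that were never observed.
        return counts[(head, prep, dep)]

    v = score(verb, noun2)
    n = score(noun1, noun2)

    if v == 0 and n == 0:
        # Backoff: neither triple was seen, so substitute words related
        # to noun2 and accumulate their weighted counts instead.
        for word, weight in similar.get(noun2, []):
            v += weight * score(verb, word)
            n += weight * score(noun1, word)

    if v == n:
        # No evidence, or a tie: leave the case undecided (assumed rule).
        return None
    return "verb" if v > n else "noun"


# Toy counts of unambiguous triples (invented for illustration).
counts = Counter({
    ("comer", "con", "tenedor"): 5,
    ("pasta", "con", "queso"): 4,
})

# Toy related-word lists with invented weights.
similar = {
    "cuchara": [("tenedor", 0.8)],
    "salsa": [("queso", 0.6)],
}

print(attach("comer", "pasta", "con", "tenedor", counts, similar))
print(attach("comer", "pasta", "con", "cuchara", counts, similar))
print(attach("comer", "pasta", "con", "salsa", counts, similar))
print(attach("comer", "pasta", "con", "mesa", counts, similar))
```

In this sketch, cases that remain undecided after backoff lower coverage, while wrong decisions made through loosely related words lower precision, which is the trade-off the comparison above is concerned with.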
