A Comparison between Supervised Learning Algorithms for Word Sense Disambiguation

This paper describes a set of comparative experiments, including cross-corpus evaluation, between five alternative algorithms for supervised Word Sense Disambiguation (WSD), namely Naive Bayes, Exemplar-based learning, SNoW, Decision Lists, and Boosting. Two main conclusions can be drawn: 1) The LazyBoosting algorithm outperforms the other four state-of-the-art algorithms in terms of accuracy and ability to tune to new domains; 2) The domain dependence of WSD systems seems very strong and suggests that some kind of adaptation or tuning is required for cross-corpus application.
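The cross-corpus evaluation mentioned above can be illustrated with a minimal sketch: train a classifier on one corpus, then compare its accuracy on held-out data from the same corpus with its accuracy on a different corpus. This is not the paper's experimental setup. It assumes a simple bag-of-words Naive Bayes classifier with add-one smoothing, and the function names (`train_naive_bayes`, `predict`, `accuracy`) and the toy corpora for the word "bank" are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Count senses and context words. examples: list of (context_words, sense)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def predict(model, words):
    """Return the sense with the highest smoothed log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def accuracy(model, examples):
    """Fraction of examples whose predicted sense matches the gold sense."""
    correct = sum(predict(model, words) == sense for words, sense in examples)
    return correct / len(examples)


# Hypothetical toy data for the ambiguous word "bank" (not from the paper).
corpus_a_train = [
    (["money", "deposit"], "finance"),
    (["loan", "money"], "finance"),
    (["account", "deposit"], "finance"),
    (["river", "water"], "shore"),
]
corpus_a_test = [
    (["deposit", "loan"], "finance"),
    (["water", "river"], "shore"),
]
# A second corpus with a different sense distribution and vocabulary.
corpus_b_test = [
    (["river", "fishing"], "shore"),
    (["grass", "water"], "shore"),
    (["money", "loan"], "finance"),
]

model = train_naive_bayes(corpus_a_train)
in_domain = accuracy(model, corpus_a_test)     # same corpus as training
cross_corpus = accuracy(model, corpus_b_test)  # different corpus
print(f"in-domain accuracy:    {in_domain:.2f}")
print(f"cross-corpus accuracy: {cross_corpus:.2f}")
```

Comparing the two printed figures is the essence of the protocol: a gap between them indicates domain dependence, which tuning on a small amount of target-corpus data would aim to reduce. With corpora this small the numbers themselves carry no meaning.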
