All-word Prediction as the Ultimate Confusible Disambiguation

We present a classification-based word prediction model based on IGTree, a decision-tree induction algorithm that scales favorably and is functionally equivalent to n-gram models with back-off smoothing. In a first series of experiments, in which we train on Reuters newswire text and test either on the same type of data or on general or fictional text, we demonstrate that the system exhibits log-linear increases in prediction accuracy with increasing numbers of training examples. When the system is trained on 30 million words of newswire text, prediction accuracies range from 12.6% on fictional text to 42.2% on newswire text. In a second series of experiments we compare all-words prediction with confusable prediction, i.e., the same task specialized to predicting among limited sets of words. Confusable prediction yields high accuracies on nine example confusable sets in all genres of text. The confusable approach outperforms the all-words prediction approach, but the difference decreases as more training data become available.
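The relation between the two tasks can be illustrated with a minimal sketch. This is not IGTree itself: it is a plain count-based predictor with back-off to shorter left contexts, which is the kind of n-gram behavior the abstract says IGTree is functionally equivalent to. The function names `train` and `predict`, the `max_context` parameter, and the `candidates` argument are assumptions made for this illustration; passing `candidates` turns all-words prediction into confusable prediction by restricting the choice to a limited word set.

```python
from collections import Counter, defaultdict


def train(tokens, max_context=2):
    """Count which word follows each left context of length 0..max_context."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for n in range(max_context + 1):
            if i - n < 0:
                break
            # n == 0 yields the empty context, i.e. plain word frequency.
            context = tuple(tokens[i - n:i])
            counts[context][word] += 1
    return counts


def predict(counts, left_context, max_context=2, candidates=None):
    """Return the most frequent word after the longest matching context.

    Backs off to shorter contexts when the longer one was never seen.
    If `candidates` is given, only words in that set may be predicted
    (confusable prediction); otherwise any word may be (all-words prediction).
    """
    left_context = list(left_context)
    for n in range(min(max_context, len(left_context)), -1, -1):
        context = tuple(left_context[len(left_context) - n:])
        seen = counts.get(context)
        if not seen:
            continue
        if candidates is not None:
            seen = Counter({w: c for w, c in seen.items() if w in candidates})
            if not seen:
                continue
        return seen.most_common(1)[0][0]
    return None


# Toy usage on a ten-word "corpus" (illustrative data, not from the paper).
tokens = "the cat sat on the mat then the cat ran".split()
counts = train(tokens)
all_words_guess = predict(counts, ["saw", "the"])                 # backs off to ("the",)
confusable_guess = predict(counts, ["the", "mat"], candidates={"then", "than"})
```

In this sketch the confusable variant is an easier task by construction, since the predictor only has to choose among the words in `candidates` rather than among the full vocabulary.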
