Memory-Based Learning: Using Similarity for Smoothing

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can automatically specify a suitable domain-specific hierarchy between the most specific and the most general conditioning information, without the need for a large number of parameters. We report two applications of this approach: prepositional phrase (PP) attachment and part-of-speech (POS) tagging. Our method achieves state-of-the-art performance in both domains and allows easy integration of diverse information sources, such as rich lexical representations.
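The back-off hierarchy that feature weighting induces can be made concrete with a small sketch. It assumes information gain as the weighting method and a weighted overlap distance, in which the distance between two instances is the sum of the weights of the features on which they differ. These choices, the function names (`information_gain`, `weighted_overlap`, `classify`, `backoff_order`) and the toy PP-attachment data are illustrative assumptions, not the paper's implementation. Ordering every subset of matching features by the distance it implies gives a sequence from the most specific context (all features match) to the most general one (no feature matches), which plays the role of the back-off sequence in a backed-off estimator.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(instances, labels, i):
    """H(C) minus the expected entropy of C after splitting on feature i."""
    by_value = defaultdict(list)
    for x, y in zip(instances, labels):
        by_value[x[i]].append(y)
    n = len(labels)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder


def weighted_overlap(a, b, weights):
    """Distance = sum of the weights of the features on which a and b differ."""
    return sum(w for w, u, v in zip(weights, a, b) if u != v)


def classify(query, instances, labels, weights):
    """Majority vote over all stored instances at the minimal distance.

    Ties in the vote go to the label encountered first (Counter.most_common order).
    """
    dists = [weighted_overlap(query, x, weights) for x in instances]
    best = min(dists)
    votes = Counter(y for d, y in zip(dists, labels) if d == best)
    return votes.most_common(1)[0][0], best


def backoff_order(weights):
    """Every subset of matching features, sorted by the distance it implies.

    Distance 0 (all features match) is the most specific context; the empty
    subset (nothing matches) is the most general one.
    """
    n = len(weights)
    order = []
    for k in range(n, -1, -1):
        for subset in combinations(range(n), k):
            d = sum(weights[i] for i in range(n) if i not in subset)
            order.append((d, subset))
    order.sort(key=lambda t: t[0])
    return order


if __name__ == "__main__":
    # Invented toy data (hypothetical, not from the paper):
    # (verb, noun1, prep, noun2) -> V (verb) or N (noun) attachment.
    X = [("ate", "pizza", "with", "fork"), ("ate", "pizza", "with", "cheese"),
         ("ate", "salad", "with", "fork"), ("bought", "book", "of", "poems"),
         ("read", "book", "of", "poems"), ("saw", "man", "with", "telescope")]
    y = ["V", "N", "V", "N", "N", "V"]
    w = [information_gain(X, y, i) for i in range(4)]
    print("IG weights:", [round(v, 3) for v in w])
    for d, subset in backoff_order(w):
        print(f"{d:.3f}", subset)
    print(classify(("ate", "pasta", "with", "fork"), X, y, w))
```

With equal weights, the ordering depends only on how many features match; with information gain weights, the more informative features are given up later, so the hierarchy is derived from the data rather than fixed by hand. The vote at the smallest occupied distance then corresponds to consulting the most specific level of this hierarchy for which stored instances exist.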
