Paper information - Memory-Based Learning: Using Similarity for Smoothing

Memory-Based Learning: Using Similarity for Smoothing

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.
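The link between similarity and back-off described in the abstract can be illustrated with a minimal sketch. It assumes information gain as the feature-weighting method and a weighted-overlap distance; the abstract names neither, so both are assumptions. The toy PP-attachment instances and all function names are invented for illustration and are not the authors' data or implementation. When a query has no exact match in memory, the nearest neighbours are the stored instances that agree with it on the highest-weighted features, which behaves much like backing off to a less specific conditioning context.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain_weights(instances, labels):
    """One weight per feature: the drop in class entropy from knowing its value."""
    base = entropy(labels)
    weights = []
    for i in range(len(instances[0])):
        by_value = {}
        for inst, lab in zip(instances, labels):
            by_value.setdefault(inst[i], []).append(lab)
        remainder = sum(len(subset) / len(labels) * entropy(subset)
                        for subset in by_value.values())
        weights.append(base - remainder)
    return weights


def weighted_overlap(a, b, weights):
    """Distance: sum of the weights of the features on which a and b differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def classify(query, instances, labels, weights):
    """Majority class among the stored instances at the smallest distance."""
    distances = [weighted_overlap(query, inst, weights) for inst in instances]
    best = min(distances)
    votes = Counter(lab for d, lab in zip(distances, labels)
                    if math.isclose(d, best, rel_tol=1e-9, abs_tol=1e-12))
    return votes.most_common(1)[0][0]


# Invented toy memory: (verb, noun1, preposition, noun2) -> attachment site,
# where "V" means the PP attaches to the verb and "N" to the noun.
memory = [
    ("eat", "pizza", "with", "fork"),
    ("eat", "pizza", "with", "anchovies"),
    ("eat", "soup", "with", "spoon"),
    ("read", "book", "of", "poems"),
    ("read", "book", "with", "glasses"),
    ("read", "review", "of", "film"),
    ("eat", "piece", "of", "cake"),
    ("read", "book", "with", "pictures"),
]
classes = ["V", "N", "V", "N", "V", "N", "N", "N"]

weights = information_gain_weights(memory, classes)

# Neither query occurs in memory, so classification falls back on partial matches.
print(classify(("eat", "pasta", "with", "fork"), memory, classes, weights))
print(classify(("buy", "book", "of", "stories"), memory, classes, weights))
```

In this sketch the weights are computed once from the stored instances, so the ordering from most specific to most general context comes from the data rather than from a hand-written back-off sequence. With so few instances, information gain favours features with many distinct values, so the weights here should not be read as linguistically meaningful.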

Jakub Zavrel | Walter Daelemans

[1] Christer Samuelsson,et al. Handling Sparse Data by Successive Abstraction , 1996, COLING.

[2] J. Ross Quinlan,et al. C4.5: Programs for Machine Learning , 1992 .

[3] Michael Collins,et al. Prepositional Phrase Attachment through a Backed-off Model , 1995, VLC@ACL.

[4] B. A. Engel,et al. Integrating Multiple Knowledge Sources , 1990 .

[5] David M. Magerman. Natural Language Parsing as Statistical Pattern Recognition , 1994, ArXiv.

[6] Adwait Ratnaparkhi,et al. A Maximum Entropy Model for Prepositional Phrase Attachment , 1994, HLT.

[7] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[8] Walter Daelemans,et al. Generalization performance of backpropagation learning on a syllabification task , 1992 .

[9] Claire Cardie,et al. Automating Feature Set Selection for Case-Based Learning of Linguistic Knowledge , 1996, EMNLP.

[10] Alberto Maria Segre,et al. Programs for Machine Learning , 1994 .

[11] Claire Cardie,et al. Domain-specific knowledge acquisition for conceptual sentence analysis , 1995 .

[12] Robert L. Mercer,et al. Class-Based n-gram Models of Natural Language , 1992, CL.

[13] Steven Gillis. Abstraction Considered Harmful: Lazy Learning of Language Processing , 1996 .

[14] Kenneth Ward Church,et al. A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams , 1991 .

[15] Michael Collins,et al. A New Statistical Parser Based on Bigram Lexical Dependencies , 1996, ACL.

[16] Sahibsingh A. Dudani. The Distance-Weighted k-Nearest-Neighbor Rule , 1976, IEEE Transactions on Systems, Man, and Cybernetics.

[17] Ido Dagan,et al. Similarity-Based Estimation of Word Cooccurrence Probabilities , 1994, ACL.

[18] Lalit R. Bahl,et al. A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19] Adwait Ratnaparkhi,et al. A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[20] 金田重郎,et al. C4.5: Programs for Machine Learning (book review) , 1995 .

[21] Stanley F. Chen,et al. An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.

[22] Jakub Zavrel,et al. The Language Environment and Syntactic Word-Class Acquisition. , 1996 .

[23] David L. Waltz,et al. Toward memory-based reasoning , 1986, CACM.

[24] Volker Steinbiss,et al. Cooccurrence smoothing for stochastic language modeling , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[25] Walter Daelemans,et al. Memory-based lexical acquisition and processing , 1993, EAMT.

[26] Walter Daelemans,et al. MBT: A Memory-Based Part of Speech Tagger-Generator , 1996, VLC@COLING.

[27] Slava M. Katz,et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[28] Richard M. Schwartz,et al. Coping with Ambiguity and Unknown Words through Probabilistic Models , 1993, CL.

[29] Hwee Tou Ng,et al. Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach , 1996, ACL.

[30] Walter Daelemans,et al. Abstraction Considered Harmful : Lazy Learning of Language Processing , 1996 .