A discriminative language model with pseudo-negative samples

In this paper, we propose a novel discriminative language model that can be applied quite generally. Compared to well-known N-gram language models, discriminative language models can achieve more accurate discrimination because they can employ overlapping features and non-local information. However, discriminative language models have so far been used only for re-ranking in specific applications, because negative examples are not available. We propose sampling pseudo-negative examples from probabilistic language models. This approach, however, incurs prohibitive computational cost when the numbers of features and training samples are large. We tackle this problem by estimating latent information in sentences with a semi-Markov class model and then extracting features from that latent representation. We also use an online margin-based algorithm with efficient kernel computation. Experimental results show that pseudo-negative examples can be treated as real negative examples and that our model can classify such sentences correctly.
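To make the pseudo-negative idea concrete, the following is a minimal sketch of drawing negative training sentences from a probabilistic language model. The bigram model, toy corpus, and length cap are illustrative assumptions for this sketch, not the paper's exact setup; the abstract says only that samples are taken from probabilistic language models.

```python
# Sketch: sample "pseudo-negative" sentences from a probabilistic
# (here, bigram) language model; the sampled sentences serve as
# negative training data for a discriminative classifier.
# The corpus and bigram order are illustrative assumptions.
import random
from collections import defaultdict

BOS, EOS = "<s>", "</s>"

def train_bigram(corpus):
    """Collect, for each history word, the list of observed continuations."""
    counts = defaultdict(list)
    for sentence in corpus:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev].append(word)
    return counts

def sample_pseudo_negative(counts, max_len=20):
    """Generate one sentence by ancestral sampling from the bigram model."""
    word, sentence = BOS, []
    while len(sentence) < max_len:
        # random.choice over the raw continuation list is proportional
        # to the bigram counts (an unsmoothed maximum-likelihood model).
        word = random.choice(counts[word])
        if word == EOS:
            break
        sentence.append(word)
    return " ".join(sentence)

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts = train_bigram(corpus)
print(sample_pseudo_negative(counts))  # e.g. "the dog sat on the mat"
```

Sentences sampled this way are locally fluent but often globally ill-formed, which is what makes them useful as negatives for a discriminative model.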

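The online margin-based learner with kernels can be sketched in the same spirit. The sketch below uses a Passive-Aggressive-style (PA-I) update with a second-order polynomial kernel over word-count features; the specific algorithm variant, kernel, and feature representation here are assumptions for illustration, not the paper's definitive implementation.

```python
# Sketch: kernelized online margin-based learning. Real sentences are
# labeled +1, sampled pseudo-negatives -1. PA-I update rule and the
# polynomial kernel are illustrative choices for this sketch.

class KernelPA:
    def __init__(self, C=1.0):
        self.C = C         # aggressiveness parameter (caps each update)
        self.support = []  # list of (alpha, feature-dict) support vectors

    def kernel(self, a, b):
        """Second-order polynomial kernel over sparse count features."""
        dot = sum(v * b.get(k, 0) for k, v in a.items())
        return (dot + 1) ** 2

    def score(self, x):
        return sum(alpha * self.kernel(sv, x) for alpha, sv in self.support)

    def update(self, x, y):
        """PA-I: update only when the margin requirement is violated."""
        loss = max(0.0, 1.0 - y * self.score(x))
        if loss > 0:
            tau = min(self.C, loss / self.kernel(x, x))
            self.support.append((tau * y, x))

def features(sentence):
    """Bag-of-words counts; a stand-in for the paper's richer features."""
    feats = {}
    for w in sentence.split():
        feats[w] = feats.get(w, 0) + 1
    return feats

model = KernelPA()
model.update(features("the cat sat on the mat"), +1)  # real sentence
model.update(features("mat the on dog sat"), -1)      # pseudo-negative
```

Because every margin violation adds a support vector, a naive kernel learner slows down as training proceeds; this is why the paper emphasizes efficient kernel computation for training at scale.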