A discriminative language model with pseudo-negative samples

In this paper, we propose a novel discriminative language model that can be applied quite generally. Compared to well-known N-gram language models, discriminative language models can achieve more accurate discrimination because they can employ overlapping features and non-local information. However, discriminative language models have so far been used only for re-ranking in specific applications, because negative examples are not available. We propose sampling pseudo-negative examples from probabilistic language models. This approach, however, incurs prohibitive computational cost when the numbers of features and training samples are large. We tackle this problem by estimating latent information in sentences with a semi-Markov class model and then extracting features from that latent representation. We also use an online margin-based algorithm with efficient kernel computation. Experimental results show that pseudo-negative examples can be treated as real negative examples and that our model can classify such sentences correctly.
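To make the pseudo-negative idea concrete, the following is a minimal sketch of drawing negative training sentences from a probabilistic language model. The bigram model, toy corpus, and length cap are illustrative assumptions for this sketch, not the paper's exact setup; the abstract says only that samples are taken from probabilistic language models.

```python
# Sketch: sample "pseudo-negative" sentences from a probabilistic
# (here, bigram) language model; the sampled sentences serve as
# negative training data for a discriminative classifier.
# The corpus and bigram order are illustrative assumptions.
import random
from collections import defaultdict

BOS, EOS = "<s>", "</s>"

def train_bigram(corpus):
    """Collect, for each history word, the list of observed continuations."""
    counts = defaultdict(list)
    for sentence in corpus:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev].append(word)
    return counts

def sample_pseudo_negative(counts, max_len=20):
    """Generate one sentence by ancestral sampling from the bigram model."""
    word, sentence = BOS, []
    while len(sentence) < max_len:
        # random.choice over the raw continuation list is proportional
        # to the bigram counts (an unsmoothed maximum-likelihood model).
        word = random.choice(counts[word])
        if word == EOS:
            break
        sentence.append(word)
    return " ".join(sentence)

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts = train_bigram(corpus)
print(sample_pseudo_negative(counts))  # e.g. "the dog sat on the mat"
```

Sentences sampled this way are locally fluent but often globally ill-formed, which is what makes them useful as negatives for a discriminative model.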

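The online margin-based learner with kernels can be sketched in the same spirit. The sketch below uses a Passive-Aggressive-style (PA-I) update with a second-order polynomial kernel over word-count features; the specific algorithm variant, kernel, and feature representation here are assumptions for illustration, not the paper's definitive implementation.

```python
# Sketch: kernelized online margin-based learning. Real sentences are
# labeled +1, sampled pseudo-negatives -1. PA-I update rule and the
# polynomial kernel are illustrative choices for this sketch.

class KernelPA:
    def __init__(self, C=1.0):
        self.C = C         # aggressiveness parameter (caps each update)
        self.support = []  # list of (alpha, feature-dict) support vectors

    def kernel(self, a, b):
        """Second-order polynomial kernel over sparse count features."""
        dot = sum(v * b.get(k, 0) for k, v in a.items())
        return (dot + 1) ** 2

    def score(self, x):
        return sum(alpha * self.kernel(sv, x) for alpha, sv in self.support)

    def update(self, x, y):
        """PA-I: update only when the margin requirement is violated."""
        loss = max(0.0, 1.0 - y * self.score(x))
        if loss > 0:
            tau = min(self.C, loss / self.kernel(x, x))
            self.support.append((tau * y, x))

def features(sentence):
    """Bag-of-words counts; a stand-in for the paper's richer features."""
    feats = {}
    for w in sentence.split():
        feats[w] = feats.get(w, 0) + 1
    return feats

model = KernelPA()
model.update(features("the cat sat on the mat"), +1)  # real sentence
model.update(features("mat the on dog sat"), -1)      # pseudo-negative
```

Because every margin violation adds a support vector, a naive kernel learner slows down as training proceeds; this is why the paper emphasizes efficient kernel computation for training at scale.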