Approximation Lasso Methods for Language Modeling

Lasso is a regularization method for parameter estimation in linear models. It optimizes the model parameters with respect to a loss function subject to a constraint on model complexity. This paper explores the use of lasso for statistical language modeling for text input. Owing to the very large number of parameters, directly optimizing the penalized lasso loss function is computationally infeasible. We therefore investigate two approximation methods: the boosted lasso (BLasso) and forward stagewise linear regression (FSLR). Both methods, when used with the exponential loss function, bear a strong resemblance to the boosting algorithm, which has been used as a discriminative training method for language modeling. Evaluations on the task of Japanese text input show that BLasso produces the best approximation to the lasso solution and leads to a significant improvement in character error rate over both boosting and traditional maximum likelihood estimation.
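
For concreteness, the penalized objective being approximated can be sketched as follows; the notation (weights \lambda_d, margin M, regularization parameter \alpha) is illustrative and not necessarily the paper's own:

\[
\mathrm{LassoLoss}(\lambda, \alpha) \;=\; \mathrm{ExpLoss}(\lambda) \;+\; \alpha \sum_{d=1}^{D} |\lambda_d|,
\qquad
\mathrm{ExpLoss}(\lambda) \;=\; \sum_{i=1}^{N} \exp\bigl(-M(x_i, y_i; \lambda)\bigr),
\]

where M(x_i, y_i; \lambda) is the margin of the correct candidate y_i for input x_i under a linear model with feature weights \lambda, and \alpha \ge 0 trades off the loss against the L1 penalty on model complexity. Rather than minimizing this objective directly over all D parameters, BLasso and FSLR update one weight at a time by a small fixed step, which is why, under the exponential loss, they resemble boosting.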
