Learning Trans-Dimensional Random Fields with Applications to Language Modeling
[1] Brendan J. Frey et al. A comparison of algorithms for inference and learning in probabilistic graphical models, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Hermann Ney et al. A Convergence Analysis of Log-Linear Training, 2011, NIPS.
[3] Andrew McCallum et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.
[4] Jun Wu et al. Maximum entropy techniques for exploiting syntactic, semantic and collocational dependencies in language modeling, 2000, Comput. Speech Lang.
[5] Moustapha Cissé et al. Efficient softmax approximation for GPUs, 2016, ICML.
[6] José-Miguel Benedí et al. Improvement of a Whole Sentence Maximum Entropy Language Model Using Grammatical Features, 2001, ACL.
[7] T. Minka. A comparison of numerical optimizers for logistic regression, 2004.
[8] Hermann Ney et al. LSTM Neural Networks for Language Modeling, 2012, INTERSPEECH.
[9] M. Gu et al. Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation, 2001.
[10] Ronald Rosenfeld et al. Whole-sentence exponential language models: a vehicle for linguistic-statistical integration, 2001, Comput. Speech Lang.
[11] Lukáš Burget et al. Extensions of recurrent neural network language model, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Geoffrey E. Hinton. A Practical Guide to Training Restricted Boltzmann Machines, 2012, Neural Networks: Tricks of the Trade.
[13] H. Robbins. A Stochastic Approximation Method, 1951.
[14] Joshua Goodman et al. A bit of progress in language modeling, 2001, Comput. Speech Lang.
[15] L. Younes. Parametric Inference for imperfectly observed Gibbsian fields, 1989.
[16] R. Carroll et al. Stochastic Approximation in Monte Carlo Computation, 2007.
[17] Pierre Priouret et al. Adaptive Algorithms and Stochastic Approximations, 1990, Applications of Mathematics.
[18] John D. Lafferty et al. Inducing Features of Random Fields, 1995, IEEE Trans. Pattern Anal. Mach. Intell.
[19] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[20] Stanley F. Chen et al. Shrinking Exponential Language Models, 2009, NAACL.
[21] Brian Roark et al. Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm, 2004, ACL.
[22] Koray Kavukcuoglu et al. Learning word embeddings efficiently with noise-contrastive estimation, 2013, NIPS.
[23] Joshua Goodman et al. Classes for fast maximum entropy training, 2001, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
[24] Tomáš Mikolov. Statistical Language Models Based on Neural Networks, 2012, PhD thesis, Brno University of Technology.
[25] Tijmen Tieleman et al. Training restricted Boltzmann machines using approximations to the likelihood gradient, 2008, ICML.
[26] Holger Schwenk et al. Continuous space language models, 2007, Comput. Speech Lang.
[27] Bin Wang et al. Trans-dimensional Random Fields for Language Modeling, 2015, ACL.
[28] Nir Friedman et al. Probabilistic Graphical Models: Principles and Techniques, 2009.
[29] Hermann Ney et al. Algorithms for bigram and trigram word clustering, 1995, Speech Commun.
[30] P. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, 1995.
[31] Rob Malouf et al. A Comparison of Algorithms for Maximum Entropy Parameter Estimation, 2002, CoNLL.
[32] J. Darroch et al. Generalized Iterative Scaling for Log-Linear Models, 1972.
[33] Wojciech Zaremba et al. Recurrent Neural Network Regularization, 2014, ArXiv.
[34] Joris Pelemans et al. Sparse non-negative matrix language modeling for skip-grams, 2015, INTERSPEECH.
[35] Radford M. Neal. Annealed importance sampling, 1998, Stat. Comput.
[36] Han-Fu Chen. Stochastic approximation and its applications, 2002.
[37] Z. Tan. Optimally Adjusted Mixture Sampling and Locally Weighted Histogram Analysis, 2017.
[38] Thorsten Brants et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[39] Boris Polyak et al. Acceleration of stochastic approximation by averaging, 1992.
[40] Tanel Alumäe et al. Using Dependency Grammar Features in Whole Sentence Maximum Entropy Language Model for Speech Recognition, 2010, Baltic HLT.
[41] Stanley F. Chen et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.