Kernel Approximation Methods for Speech Recognition
Brian Kingsbury | Michael Picheny | Michael Collins | Avner May | Fei Sha | Dong Guo | Daniel J. Hsu | Linxi Fan | Aurélien Bellet | Zhiyun Lu | Kuan Liu | Alireza Bagheri Garakani