Paper Information - Broadcast News Phoneme Recognition by Sparse Coding

Broadcast News Phoneme Recognition by Sparse Coding

This paper presents a novel approach to phoneme recognition, which we intend to extend to a full automatic speech recognition (ASR) system. Conventional ASR systems rely on a GMM-HMM combination, which is a fully generative approach. Current discriminative methods are not tractable on large-scale data sets, especially with non-linear kernels. Our system introduces a new scheme that combines sparse coding with an additive kernel approximation to allow fast SVM training for phoneme recognition. On a broadcast news corpus, our system outperforms GMMs by around 2.5%, and its computational cost is linear in the number of samples.
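The abstract does not name the specific additive kernel, dictionary, or SVM solver, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes non-negative sparse codes, uses the Hellinger kernel (an additive kernel whose explicit feature map is exactly the element-wise square root), and swaps in a Pegasos-style stochastic sub-gradient solver for the linear SVM. The toy vectors and all function names are hypothetical and are not taken from the paper.

```python
import math
import random


def hellinger_map(x):
    """Explicit feature map of the Hellinger kernel; inputs must be non-negative."""
    return [math.sqrt(v) for v in x]


def hellinger_kernel(x, y):
    """Additive kernel k(x, y) = sum_i sqrt(x_i * y_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(x, y))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(features, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style sub-gradient descent on the regularized hinge loss (no bias).

    Every epoch visits each sample once, so the cost is O(epochs * n * d):
    linear in the number of samples n.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    order = list(range(len(features)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)  # decreasing learning rate
            margin = labels[i] * dot(w, features[i])
            # Shrink the weights (gradient of the L2 regularizer).
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Add the hinge-loss sub-gradient only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, features[i])]
    return w


def predict(w, x):
    """Classify a raw (unmapped) code vector with the learned linear model."""
    return 1 if dot(w, hellinger_map(x)) >= 0.0 else -1


# Hypothetical toy "sparse codes": each class activates different dictionary atoms.
codes = [
    [0.7, 0.3, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.2, 0.8],
]
labels = [1, 1, -1, -1]

mapped = [hellinger_map(c) for c in codes]
weights = train_linear_svm(mapped, labels)
predictions = [predict(weights, c) for c in codes]
```

Because the dot product of two mapped vectors equals the kernel value on the original vectors, a linear model trained on the mapped features behaves like a kernel SVM without ever forming the n-by-n kernel matrix. Other additive kernels, such as the intersection or chi-squared kernel, only admit approximate finite-dimensional maps, which is where an approximation scheme would replace the square root used here.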
