Kernel-based feature extraction with a speech technology application

Kernel-based nonlinear feature extraction and classification algorithms are a popular new research direction in machine learning. This paper examines their applicability to the classification of phonemes in a phonological awareness drilling software package. We first give a concise overview of nonlinear feature extraction methods such as kernel principal component analysis (KPCA), kernel independent component analysis (KICA), kernel linear discriminant analysis (KLDA), and kernel springy discriminant analysis (KSDA). The overview treats all of these methods in a unified framework, regardless of whether they are unsupervised or supervised. The effect of the transformations on subsequent classification is then tested in combination with learning algorithms such as Gaussian mixture modeling (GMM), artificial neural nets (ANN), projection pursuit learning (PPL), decision tree-based classification (C4.5), and support vector machines (SVMs). We found that, in most cases, the transformations have a beneficial effect on classification performance. Furthermore, the nonlinear supervised algorithms yielded the best results.
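
To make the idea of a kernel-based feature transformation concrete, the following is a minimal sketch of textbook kernel PCA in Python with NumPy. It is not the formulation or implementation used in the paper: the Gaussian (RBF) kernel, the `gamma` parameter, and the function names `rbf_kernel` and `kernel_pca` are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gaussian kernel matrix between the rows of X and Y (assumed kernel choice)."""
    # Squared Euclidean distances between every row of X and every row of Y.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, n_components, gamma):
    """Project the training points onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the data implicitly in the kernel feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order, so pick the largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Rescale the coefficients so each feature-space direction has unit norm.
    # This assumes the selected eigenvalues are strictly positive.
    alphas = vecs / np.sqrt(vals)
    # Rows are training points, columns are the extracted features.
    return Kc @ alphas
```

In a pipeline like the one the abstract describes, the returned feature matrix would be passed to a separate classifier. Supervised variants such as KLDA would additionally use the class labels when choosing the projection directions.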
