Linear Feature Transformations in Slovak Phoneme-Based Continuous Speech Recognition

The most common acoustic front-ends in automatic speech recognition (ASR) systems are based on state-of-the-art Mel-Frequency Cepstral Coefficients (MFCCs). Practice shows that this general technique is a good choice for obtaining a satisfactory speech representation. Over the past few decades, researchers have made great efforts to develop and apply techniques that may improve on the recognition performance of conventional MFCCs. In general, these methods were drawn from mathematics and have been applied in many research areas, such as face and speech recognition, high-dimensional data and signal processing, and video and image coding, among others. One such group of methods is linear transformations.
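To illustrate how such a linear transformation is typically applied on top of an MFCC front-end, the following is a minimal sketch (not taken from the paper) of an LDA-based feature projection over stacked MFCC frames. It assumes scikit-learn is available and uses synthetic feature vectors and phoneme labels as placeholders; a real system would use MFCCs extracted from speech and class labels obtained from forced alignment.

```python
# Minimal sketch of an LDA feature transformation for an ASR front-end.
# All data below is synthetic; dimensions are only illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

n_frames, n_mfcc, context = 5000, 13, 9   # 13 MFCCs, a 9-frame context window
n_classes = 40                            # e.g. phoneme (or sub-phone) classes

# Stacked super-vectors: each row is a 9-frame window of 13 MFCCs (117 dims).
features = rng.normal(size=(n_frames, n_mfcc * context))
labels = rng.integers(0, n_classes, size=n_frames)  # placeholder class labels

# Estimate the LDA projection and keep a lower-dimensional,
# class-discriminative subspace of the stacked features.
lda = LinearDiscriminantAnalysis(n_components=39)
transformed = lda.fit_transform(features, labels)

print(transformed.shape)  # (5000, 39): features passed on to the recognizer
```

The same pipeline structure applies to other linear transformations (e.g. PCA), with the projection matrix estimated without class labels in that case.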
