Exploiting local and global structures for TIMIT phone classification

Using contextual information about phones is an effective way to improve phone classification performance, but it requires dimensionality reduction. One disadvantage of Linear Discriminant Analysis (LDA), a popular dimensionality reduction method, is that it cannot account for local differences between the class distributions in the feature space. Newer methods, such as Local Fisher Discriminant Analysis (LFDA), may on the other hand overestimate the contribution of local distributions. In this paper, we propose a dimensionality reduction algorithm whose affinity matrix allows finding the optimal trade-off between local and global information. Experiments on TIMIT show that both local and global information in the MFCC feature space are important for phone classification and that a substantial improvement can be achieved over both LDA and LFDA.
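The abstract does not spell out the form of the affinity matrix, so the following is only a minimal sketch of one way such a trade-off could look, not the authors' method. It assumes the pairwise (LFDA-style) formulation of the within-class and between-class scatter matrices, and blends a heat-kernel affinity (local) with a constant affinity of 1 (global, which recovers LDA). The names `blended_affinity`, `scatter_matrices`, `fit_projection`, and the parameters `alpha`, `sigma`, and `reg` are hypothetical and introduced purely for illustration.

```python
import numpy as np

def blended_affinity(X, alpha, sigma=1.0):
    """Assumed affinity: alpha * heat kernel (local) + (1 - alpha) * 1 (global)."""
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows of X.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    local = np.exp(-d2 / (2.0 * sigma ** 2))
    return alpha * local + (1.0 - alpha)

def scatter_matrices(X, y, alpha, sigma=1.0):
    """Pairwise within-class (Sw) and between-class (Sb) scatter matrices."""
    n = X.shape[0]
    A = blended_affinity(X, alpha, sigma)
    same = y[:, None] == y[None, :]
    _, inv, cnt = np.unique(y, return_inverse=True, return_counts=True)
    n_c = cnt[inv].astype(float)  # size of the class each sample belongs to
    # LFDA-style pair weights; with A == 1 everywhere they reduce to LDA.
    Ww = np.where(same, A / n_c[:, None], 0.0)
    Wb = np.where(same, A * (1.0 / n - 1.0 / n_c[:, None]), 1.0 / n)

    def pairwise_scatter(W):
        # 0.5 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T = X^T (D - W) X for symmetric W
        return X.T @ (np.diag(W.sum(axis=1)) - W) @ X

    return pairwise_scatter(Ww), pairwise_scatter(Wb)

def fit_projection(X, y, dim, alpha, sigma=1.0, reg=1e-6):
    """Solve Sb v = lambda Sw v and return the top `dim` directions as columns."""
    Sw, Sb = scatter_matrices(X, y, alpha, sigma)
    # Whiten with the Cholesky factor of the regularised Sw, then use eigh.
    L = np.linalg.cholesky(Sw + reg * np.eye(X.shape[1]))
    L_inv = np.linalg.inv(L)
    M = L_inv @ Sb @ L_inv.T
    vals, vecs = np.linalg.eigh((M + M.T) / 2.0)
    order = np.argsort(vals)[::-1][:dim]
    return L_inv.T @ vecs[:, order]
```

In this sketch, `alpha = 0` uses only global information and reproduces the LDA scatter matrices, while `alpha = 1` uses only the local heat-kernel affinity. Intermediate values could be selected on held-out data, for example by comparing classification accuracy of the projected features `X @ fit_projection(X, y, dim, alpha)`.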
