Efficient manifold learning for speech recognition using locality sensitive hashing

This paper considers the application of locality sensitive hashing (LSH), a hashing scheme based on random projections, to the fast computation of neighborhood graphs in manifold-learning-based feature space transformations for automatic speech recognition (ASR). Discriminative manifold-learning-based feature transformations have already been found to provide significant improvements in ASR performance, but their high computational complexity has prevented their application to very large speech corpora, which motivates this work. The performance of the system that integrates LSH with manifold learning is evaluated in terms of both computational complexity and ASR word recognition accuracy. ASR performance is also compared with that of the well-known linear discriminant analysis. The results demonstrate that LSH provides the much-needed speed-up for manifold learning techniques with minimal impact on their ASR performance, thus enabling the application of these algorithms to large speech databases.
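
To make the idea of LSH-accelerated neighborhood graphs concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Gaussian random projection form of LSH, where each hash is h(x) = floor((a·x + b) / w), and it restricts the exact distance computation for each point to the points that share a bucket with it in at least one table. The function names (`build_lsh_tables`, `approximate_knn_graph`) and the parameters (`n_tables`, `n_hashes`, `width`) are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def build_lsh_tables(points, n_tables=8, n_hashes=4, width=4.0, seed=0):
    """Hash every point into n_tables tables using Gaussian random projections."""
    rng = random.Random(seed)
    dim = len(points[0])
    tables = []
    for _ in range(n_tables):
        # Each table key concatenates n_hashes values h(x) = floor((a.x + b) / width).
        projections = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
            for _ in range(n_hashes)
        ]
        buckets = defaultdict(list)
        for idx, x in enumerate(points):
            key = tuple(
                math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / width)
                for a, b in projections
            )
            buckets[key].append(idx)
        tables.append(buckets)
    return tables


def approximate_knn_graph(points, k=3, **lsh_kwargs):
    """Build an approximate k-nearest-neighbor graph from LSH candidates."""
    tables = build_lsh_tables(points, **lsh_kwargs)
    # Candidate neighbors of a point: everything it collides with in any table.
    candidates = [set() for _ in points]
    for buckets in tables:
        for members in buckets.values():
            for i in members:
                candidates[i].update(members)
    graph = {}
    for i, cand in enumerate(candidates):
        cand.discard(i)  # no self-loops
        # Exact Euclidean distances are computed only for the candidates.
        ranked = sorted(cand, key=lambda j: math.dist(points[i], points[j]))
        graph[i] = ranked[:k]
    return graph
```

A point may end up with fewer than `k` neighbors if it collides with few other points. In this sketch, a larger `width` or more tables increases the number of candidates at the cost of more distance computations, which is the speed versus accuracy trade-off the abstract refers to.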
