Paper information - An inner-product lower-bound estimate for dynamic time warping

In this paper, we present a lower-bound estimate for dynamic time warping (DTW) on time series consisting of multi-dimensional posterior probability vectors known as posteriorgrams. The estimate is based on the inner-product distance, which has been found to be an effective metric for computing similarities between posteriorgrams. In addition to deriving the lower-bound estimate, we show how it can be used efficiently in an admissible K-nearest-neighbor (KNN) search for spotting matching sequences. We quantify the computational savings with a set of unsupervised spoken keyword spotting experiments using Gaussian mixture model posteriorgrams. In these experiments, the proposed lower-bound estimate eliminates 89% of the previously required DTW calculations without affecting overall keyword detection performance.
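The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes the inner-product distance is the negative log of the dot product of two posterior vectors, equal-length sequences, a band constraint of width `r`, and a per-dimension upper envelope of the query. All function names (`neg_log_dot`, `dtw_band`, `upper_envelope`, `lower_bound`, `knn_search`) are hypothetical and chosen for illustration only.

```python
import heapq
import math


def neg_log_dot(a, b):
    """Assumed inner-product distance: -log(a . b), with the dot product clipped to [1e-12, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    return -math.log(min(1.0, max(dot, 1e-12)))


def dtw_band(query, test, r):
    """Banded DTW cost between two equal-length sequences (|i - j| <= r)."""
    n = len(query)
    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            d = neg_log_dot(query[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][n]


def upper_envelope(query, r):
    """Per-dimension maximum of the query frames within +/- r of each position."""
    n, dims = len(query), len(query[0])
    env = []
    for i in range(n):
        window = query[max(0, i - r): min(n, i + r + 1)]
        env.append([max(frame[d] for frame in window) for d in range(dims)])
    return env


def lower_bound(envelope, test):
    """Sum of envelope-to-frame distances; never exceeds the banded DTW cost.

    The envelope dominates every query frame in the band, so its dot product
    with a test frame is at least as large, and -log of it is at most as large.
    """
    return sum(neg_log_dot(u, p) for u, p in zip(envelope, test))


def knn_search(query, candidates, k, r):
    """Admissible KNN: visit candidates by ascending lower bound, stop early."""
    env = upper_envelope(query, r)
    bounds = sorted((lower_bound(env, c), idx) for idx, c in enumerate(candidates))
    best = []  # max-heap of the k smallest DTW costs, stored as (-cost, index)
    n_dtw = 0
    for lb, idx in bounds:
        # No remaining candidate can beat the current k-th best match.
        if len(best) == k and lb >= -best[0][0]:
            break
        d = dtw_band(query, candidates[idx], r)
        n_dtw += 1
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-nd, idx) for nd, idx in best), n_dtw
```

`knn_search` returns the K best `(cost, index)` pairs together with the number of full DTW computations actually performed, which is the quantity the reported savings refer to. The saving obtained on any particular data set depends on how tight the bound is there, so this sketch makes no claim about reproducing the paper's figure.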

James R. Glass | Yaodong Zhang
