Subspace Regularized Dynamic Time Warping for Spoken Query Detection

Deep neural network posterior probabilities are the best features for query detection in speech archives, and dynamic time warping (DTW) is the state-of-the-art solution for this task. Posterior features live in low-dimensional subspaces, whereas current DTW methods rely only on local feature distances and do not incorporate this global structure of the data. We exploit the query example as the dictionary for sparse recovery. Local DTW scores are integrated with the sparse reconstruction scores to obtain a subspace-regularized distance matrix for DTW. The proposed method yields a substantial performance gain over the baseline system.
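
As a rough illustration of how a local distance matrix might be combined with sparse reconstruction scores, the following Python sketch may help. It is not the paper's implementation, and every concrete choice in it is an assumption made for illustration: the local distance -log(q . t), the non-negative ISTA solver, the additive combination with a hypothetical weight alpha, the sparsity weight lam, the subsequence DTW recursion, and the crude normalization by query length. The function names are hypothetical as well.

```python
import numpy as np


def local_distance(Q, T, eps=1e-10):
    """Frame-level distance -log(q . t) between posterior vectors (assumed choice)."""
    return -np.log(np.clip(Q @ T.T, eps, None))


def sparse_reconstruction_error(Q, T, lam=0.01, n_iter=200):
    """Residual norm of each test frame after sparse coding over the query frames.

    Solves min_a 0.5 * ||t - D a||^2 + lam * ||a||_1 with a >= 0 by ISTA,
    where the columns of the dictionary D are the query frames (assumed setup).
    """
    D = Q.T                                             # (dim, m)
    X = T.T                                             # (dim, n)
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)    # 1 / Lipschitz constant
    A = np.zeros((D.shape[1], X.shape[1]))              # one code per test frame
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        # Gradient step followed by the non-negative soft-threshold.
        A = np.maximum(A - step * (grad + lam), 0.0)
    return np.linalg.norm(X - D @ A, axis=0)            # (n,)


def subspace_regularized_dtw(Q, T, alpha=1.0):
    """Subsequence DTW over local distance + alpha * reconstruction error (assumed)."""
    dist = local_distance(Q, T) + alpha * sparse_reconstruction_error(Q, T)[None, :]
    m, n = dist.shape
    acc = np.full((m + 1, n + 1), np.inf)
    acc[0, :] = 0.0                                     # query may start at any test frame
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Query may end at any test frame; dividing by m is a crude normalization.
    return acc[m, 1:].min() / m, dist


def peaked(k, dim=4, p=0.85):
    """Toy posterior vector with most of its mass on class k."""
    v = np.full(dim, (1.0 - p) / (dim - 1))
    v[k] = p
    return v


# Toy data: the query is the class sequence 0, 0, 1, 2.
Q = np.array([peaked(0), peaked(0), peaked(1), peaked(2)])
T_hit = np.array([peaked(k) for k in (3, 3, 0, 0, 1, 2, 3)])   # contains the query
T_miss = np.array([peaked(3)] * 7)                             # does not contain it

score_hit, dist_hit = subspace_regularized_dtw(Q, T_hit)
score_miss, _ = subspace_regularized_dtw(Q, T_miss)
print(score_hit, score_miss)
```

In this toy setup, test frames that fall outside the span of the query frames receive a large reconstruction error, which raises an entire column of the distance matrix. This is one way to inject global subspace structure into a matching procedure that otherwise uses only local distances, so the utterance containing the query should receive the lower detection score.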