Exemplar-based Sparse Representation phone identification features

Exemplar-based techniques, such as k-nearest neighbors (kNNs) and Sparse Representations (SRs), can be used to model a test sample from a few training points in a dictionary set. In past work, we have shown that using a SR approach for phonetic classification allows for a higher accuracy than other classification techniques. These phones are the basic units of speech to be recognized. Motivated by this result, we create a new dictionary which is a function of the phonetic labels of the original dictionary. The SR method now selects relevant samples from this new dictionary to create a new feature representation of the test sample, where the new feature is better linked to the actual units to be recognized. We will refer to these new features as Spif. We present results using these new Spif features in a Hidden Markov Model (HMM) framework for speech recognition. We find that the Spif features allow for a 2.9% relative reduction in Phonetic Error Rate (PER) on the TIMIT phonetic recognition task. Furthermore, we find that the Spif features allow for a 4.8% relative improvement in Word Error Rate (WER) on a large vocabulary 50 hour Broadcast News task.

[1]  James R. Glass,et al.  Heterogeneous measurements and multiple classifiers for speech recognition , 1998, ICSLP.

[2]  Geoffrey Zweig,et al.  An architecture for rapid decoding of large vocabulary conversational speech , 2003, INTERSPEECH.

[3]  Dong Yu,et al.  Use of Differential Cepstra as Acoustic Features in Hidden Trajectory Modeling for Phonetic Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[4]  Brian Kingsbury,et al.  Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Tara N. Sainath,et al.  Sparse representation features for speech recognition , 2010, INTERSPEECH.

[6]  Tara N. Sainath,et al.  Bayesian compressive sensing for phonetic classification , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.