Discriminant spectrotemporal features for phoneme recognition

We propose discriminant methods for deriving two-dimensional spectrotemporal features for phoneme recognition. The features are produced by linear filters that are estimated to maximize the separation between the representations of phoneme classes. Because the filters are linear, they are easy to interpret, which lets us investigate the working principles of the system and improve its performance by locating the sources of error. Two methods for estimating the filters are proposed: Regularized Least Squares (RLS) and Modified Linear Discriminant Analysis (MLDA). Both methods yield comparable improvements over the baseline condition, demonstrating the advantage of discriminant spectrotemporal filters.
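As an illustration of the RLS idea only, the sketch below estimates one linear filter per class by ridge-regularized least squares on flattened spectrotemporal patches. It is not the paper's implementation: the patch size (8 frequency channels by 5 frames), the +1/-1 target coding, the regularization weight `lam`, the synthetic data, and the helper names `rls_filters` and `apply_filters` are all assumptions made for this example.

```python
import numpy as np

def rls_filters(X, labels, n_classes, lam=1.0):
    """Estimate one linear filter per class by regularized least squares.

    X      : (n_samples, n_freq * n_frames) flattened spectrotemporal patches
    labels : (n_samples,) integer class indices
    lam    : ridge penalty (assumed value, not taken from the paper)
    """
    n, d = X.shape
    # One-vs-rest targets: +1 for the true class, -1 otherwise (an assumed coding).
    Y = -np.ones((n, n_classes))
    Y[np.arange(n), labels] = 1.0
    mu = X.mean(axis=0)
    Xc = X - mu
    # Closed-form ridge solution: (Xc^T Xc + lam I) W = Xc^T Y
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(d), Xc.T @ Y)
    b = Y.mean(axis=0)  # unpenalized intercept for centered inputs
    return W, mu, b

def apply_filters(X, W, mu, b):
    """Project patches onto the filters; one output score per class."""
    return (X - mu) @ W + b

# Synthetic stand-in for phoneme patches: 3 classes, 8 x 5 patches.
rng = np.random.default_rng(0)
n_freq, n_frames, n_classes = 8, 5, 3
class_means = 2.0 * rng.normal(size=(n_classes, n_freq * n_frames))
labels = rng.integers(0, n_classes, size=300)
X = class_means[labels] + 0.5 * rng.normal(size=(300, n_freq * n_frames))

W, mu, b = rls_filters(X, labels, n_classes, lam=1.0)
scores = apply_filters(X, W, mu, b)
accuracy = float((scores.argmax(axis=1) == labels).mean())

# Each column of W reshapes into a 2-D filter that can be inspected directly.
filter_0 = W[:, 0].reshape(n_freq, n_frames)
```

Because each filter is a plain weight matrix over frequency and time, it can be plotted like a spectrogram patch, which is the kind of interpretability the abstract attributes to linear filters.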
