Discriminating speech and non-speech with regularized least squares

We consider the task of discriminating speech and non-speech in noisy environments. Previously, Mesgarani et al. [1] achieved state-of-the-art performance using a cortical representation of sound in conjunction with a feature reduction algorithm and a nonlinear support vector machine classifier. In the present work, we show that we can achieve the same or better accuracy by using a linear regularized least squares classifier directly on the high-dimensional cortical representation; the new system is substantially simpler conceptually and computationally. We select the regularization constant automatically, yielding a parameter-free learning system. Intriguingly, we find that optimal classifiers for noisy data can be trained on clean data using heavy regularization.
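As a rough illustration of the kind of classifier described above, the following minimal sketch fits a linear regularized least squares model and picks the regularization constant from a grid. Several things here are assumptions rather than details taken from the abstract: the synthetic Gaussian features stand in for the cortical representation, the helper names `fit_rls` and `make_data` are hypothetical, and the selection criterion (exact leave-one-out squared error computed from one SVD) is one common way to choose the constant automatically, not necessarily the authors' procedure.

```python
import numpy as np


def fit_rls(X, y, lambdas):
    """Linear regularized least squares with lambda chosen by leave-one-out error.

    Assumption: no intercept term, labels in {-1, +1}, all lambdas > 0.
    """
    # One thin SVD of the data matrix is reused for every candidate lambda.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y
    best = None
    for lam in lambdas:
        shrink = s**2 / (s**2 + lam)            # eigenvalues of the hat matrix
        fitted = U @ (shrink * Uty)             # in-sample predictions H y
        leverage = np.einsum("ij,j,ij->i", U, shrink, U)  # diag(H)
        loo_resid = (y - fitted) / (1.0 - leverage)       # exact LOO residuals
        loo_err = float(np.mean(loo_resid**2))
        if best is None or loo_err < best[0]:
            # w = V diag(s / (s^2 + lam)) U^T y
            w = Vt.T @ ((s / (s**2 + lam)) * Uty)
            best = (loo_err, lam, w)
    return best[2], best[1]


rng = np.random.default_rng(0)


def make_data(n, d=50, shift=0.5):
    """Hypothetical stand-in data: two Gaussian classes with means +/- shift."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = rng.normal(size=(n, d)) + shift * y[:, None]
    return X, y


X_train, y_train = make_data(200)
X_test, y_test = make_data(500)

lambdas = np.logspace(-3, 3, 13)
w, lam = fit_rls(X_train, y_train, lambdas)

# Classify by the sign of the linear score.
acc = float(np.mean(np.sign(X_test @ w) == y_test))
print(f"selected lambda = {lam:g}, test accuracy = {acc:.3f}")
```

Because the SVD is computed once, scanning the grid of regularization constants adds little cost beyond a single fit, which is part of what makes this family of classifiers computationally simple.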

[1] Tomaso Poggio, et al. Everything old is new again: a fresh look at historical approaches in machine learning, 2002.

[2] Powen Ru, et al. Multiresolution spectrotemporal analysis of complex sounds, 2005, The Journal of the Acoustical Society of America.

[3] B. Silverman, et al. Nonparametric regression and generalized linear models, 1994.

[4] Joos Vandewalle, et al. A Multilinear Singular Value Decomposition, 2000, SIAM J. Matrix Anal. Appl.

[5] Malcolm Slaney, et al. Construction and evaluation of a robust multifeature speech/music discriminator, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] Alvin F. Martin, et al. The DET curve in assessment of detection task performance, 1997, EUROSPEECH.

[7] G. Wahba. Spline models for observational data, 1990.

[8] Brian Kingsbury, et al. Robust speech recognition in noisy environments: the 2001 IBM SPINE evaluation system, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Nima Mesgarani, et al. Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations, 2006, IEEE Transactions on Audio, Speech, and Language Processing.