Modelling the recognition of spectrally reduced speech

Jon Barker and Martin Cooke
{j.barker, m.cooke}@dcs.shef.ac.uk
Department of Computer Science, University of Sheffield, Sheffield, UK

ABSTRACT

Progress in robust automatic speech recognition may benefit from a fuller account of the mechanisms and representations used by listeners in processing distorted speech. This paper reports on a number of studies which consider how recognisers trained on clean speech can be adapted to cope with a particular form of spectral distortion, namely reduction of clean speech to sine-wave replicas. Using the Resource Management corpus, the first set of recognition experiments confirm the high information content of sine-wave replicas by demonstrating that such tokens can be recognised at levels approaching those for natural speech if matched conditions apply during training. Further recognition tests show that sine-wave speech can be recognised using natural speech models if a spectral peak representation is employed in concert with occluded speech recognition techniques.
