Analysis and compensation of stressed and noisy speech with application to robust automatic recognition

This thesis addresses the problem of automatic speech recognition in noisy, stressful environments. The main contributions are a comprehensive and unified investigation that revealed new and statistically reliable acoustic correlates of speech under stress, the formulation of a new class of constrained iterative speech enhancement algorithms, and the achievement of robust automatic speech recognition through the development of speech enhancement and stress compensation preprocessors. The first goal, improving recognition of speech produced under stressful conditions, was addressed through this investigation of acoustic correlates. Analysis was performed on (i) speech with simulated stress, (ii) speech from stress-inducing workload tasks or speech in noise, and (iii) speech produced under actual stress or emotional conditions. Characteristics from five speech production domains were examined: pitch, glottal source, duration, intensity, and vocal-tract shaping. Statistical evaluation ascertained how reliably variation in the average, variability, and distribution of each speech parameter serves as a stress relayer. A new class of constrained iterative speech enhancement algorithms was formulated to improve recognition performance in noisy environments. The new approaches apply inter- and intra-frame spectral constraints in the estimation procedure to ensure optimum speech quality across all speech classes. Constraints are applied based on the presence of perceptually important speech characteristics obtained during the enhancement procedure.
The algorithms are preferable to existing techniques in several respects: (i) they yield substantially improved speech quality and parameter estimation for additive white noise distortion, (ii) they have been extended to, and shown to perform well on, non-stationary colored noise, and (iii) they possess a more consistent terminating criterion, which was previously unavailable. The final goal, robust recognition in noisy, stressful environments, was addressed by formulating enhancement and stress compensation preprocessors. Enhancement preprocessors were shown to improve recognition performance for neutral speech over past enhancement techniques at all signal-to-noise ratios considered. Stress compensation algorithms were shown to reduce stress effects prior to recognition. Finally, combined speech enhancement and stress compensation preprocessing was shown to be extremely effective in reducing and even eliminating the effects of stress and noise for robust automatic recognition.