Analysis, synthesis, and recognition of stressed speech

This thesis has addressed the combined problems of glottal modelling and analysis of emotionally stressed speech. Specifically, the objective of this research was to analyze the properties of the glottal excitation of eleven styles of speech, to identify and model the significant differences in the glottal waveforms, and to develop applications based on the knowledge gained. This research demonstrated that the glottal waveforms of all of the styles of speech were significantly and identifiably different. A parametric function that accurately models all of the salient differences in the styles of glottal excitation was determined. Using this glottal model, various applications were developed that could be used to improve speech synthesis and both the human perception and the automatic recognition of stressed speech. Several speech style modification algorithms were implemented. Subjective listening tests showed that modifying the style of stressed speech to normal significantly improved its perceived neutrality. An algorithm that automatically identifies those deviant speech styles that severely degrade automatic recognition accuracy was developed and shown to be highly accurate. The contributions of this thesis include a consistent method for extracting glottal waveforms from stressed speech, an extensive statistical analysis of the glottal excitation of stressed speech, a simple glottal model that is able to accurately model all eleven styles of glottal excitation, an algorithm that decouples the vocal tract and model glottal excitation signals, improved automatic formant tracking of stressed speech, algorithms that synthesize stressed speech from normal tokens and normal speech from stressed tokens, and an algorithm that automatically identifies the style of unknown speech.