Glottal Models for Digital Speech Processing: A Historical Survey and New Results

Abstract

Glottal modeling has long been an important research topic in digital speech processing. Accurate modeling of the glottal excitation matters for applications as varied as acoustic and articulatory speech synthesis, speech coding, and speech analysis. Many glottal models, differing in form and complexity, have been proposed over the years. They fall into three broad groups: simple parametric models of the glottal volume velocity or the glottal flow derivative, which assume linear separability of the glottal source and the vocal tract; more complex parametric function and mechanical models, which allow for interaction between the glottal source and the vocal tract; and very complex models based directly on the physiological properties of the glottis. This paper provides a historical survey of glottal models, discussing their form and complexity along with the applications for which each is appropriate. It also discusses the problem of modeling the glottal excitation of different styles of speech, a topic that is important for applications such as natural, high-quality speech synthesis. Finally, a glottal model capable of modeling eleven commonly encountered styles of speech is presented.