Analysis of breathy, modal and pressed phonation based on low frequency spectral density

Breathy phonation has a higher open quotient than modal phonation, which increases the influence of the subglottal cavity on the estimated short-time spectrum. This is reflected as an increase in spectral density at frequencies below the first resonance of the vocal tract, around the glottal formant. In contrast, pressed phonation is less influenced by the subglottal cavity and hence has a relatively lower spectral density at low frequencies. In this paper, the use of low frequency spectral density (LFSD) as a feature for the analysis and classification of breathy, modal and pressed phonation types is investigated.
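As a rough illustration of the idea, the sketch below computes one possible low-frequency measure for a single speech frame: the fraction of short-time spectral energy at or below a cutoff frequency. The abstract does not give the exact definition of LFSD, so the function name `low_frequency_spectral_density`, the 400 Hz cutoff, the Hann window, and the energy-ratio normalization are all assumptions made for illustration. The two test signals are synthetic sums of sinusoids, not real phonation data.

```python
import numpy as np


def low_frequency_spectral_density(frame, sample_rate, cutoff_hz=400.0):
    """Fraction of a frame's spectral energy at or below cutoff_hz.

    Hypothetical stand-in for an LFSD-style feature; the cutoff and the
    normalization by total energy are assumptions, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    # Taper the frame to reduce spectral leakage before the FFT.
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent frame: no energy to distribute
    return float(power[freqs <= cutoff_hz].sum() / total)


# Synthetic 30 ms frames (assumed values, for illustration only).
sample_rate = 16000
t = np.arange(int(0.03 * sample_rate)) / sample_rate
low_component = np.sin(2 * np.pi * 150 * t)    # energy below the cutoff
high_component = np.sin(2 * np.pi * 1500 * t)  # energy above the cutoff

# More low-frequency energy, loosely mimicking the breathy case.
breathy_like = 1.0 * low_component + 0.3 * high_component
# Less low-frequency energy, loosely mimicking the pressed case.
pressed_like = 0.3 * low_component + 1.0 * high_component

lfsd_breathy = low_frequency_spectral_density(breathy_like, sample_rate)
lfsd_pressed = low_frequency_spectral_density(pressed_like, sample_rate)
print(lfsd_breathy > lfsd_pressed)
```

In practice such a measure would be computed frame by frame over voiced speech, with the cutoff chosen relative to the first vocal tract resonance; a fixed cutoff is used here only to keep the sketch short.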
