Identifying Regions of Non-Modal Phonation Using Features of the Wavelet Transform

The present study proposes a new parameter for identifying breathy to tense voice qualities in a given speech segment using measurements from the wavelet transform. Techniques that deliver robust information on the voice quality of a speech segment are desirable because they can help tune analysis strategies and provide automatic voice quality annotation in large corpora. The method described here decomposes the speech signal into octave bands using the wavelet transform and then fits a regression line to the maximum amplitudes at the different scales. The slope coefficient is then evaluated in terms of its ability to differentiate voice qualities, compared with other parameters in the literature. The new parameter (named here Peak Slope) was shown to be robust to babble noise added at signal-to-noise ratios as low as 10 dB. Furthermore, the proposed parameter was shown to differentiate breathy to tense voice qualities better than the other parameters, in both vowels and running speech.
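
The procedure described in the abstract (octave-band wavelet decomposition, then a regression line through the per-scale maximum amplitudes) can be illustrated with a minimal sketch. Several details are assumptions made for illustration and are not stated in the abstract: the cosine-modulated Gaussian mother wavelet, the choice of six scales, the use of log10 amplitudes in the regression, and the function names `octave_band_peaks` and `peak_slope`. It should not be read as the authors' implementation.

```python
import numpy as np


def octave_band_peaks(x, fs, n_scales=6):
    """Peak absolute amplitude of x after filtering in n_scales octave bands.

    Assumption: a cosine-modulated Gaussian wavelet whose centre frequency
    halves at each scale, starting from fs / 2.
    """
    peaks = []
    for i in range(n_scales):
        fc = fs / 2 ** (i + 1)              # centre frequency of scale i
        tau = 1.0 / (2.0 * fc)              # Gaussian width tied to fc
        half = int(np.ceil(4 * tau * fs))   # truncate the wavelet at 4 * tau
        t = np.arange(-half, half + 1) / fs
        wavelet = -np.cos(2 * np.pi * fc * t) * np.exp(-t ** 2 / (2 * tau ** 2))
        band = np.convolve(x, wavelet, mode="same")
        peaks.append(np.max(np.abs(band)))
    return np.array(peaks)


def peak_slope(x, fs, n_scales=6):
    """Slope of a least-squares line through log10 peak amplitude vs. scale index."""
    peaks = octave_band_peaks(x, fs, n_scales)
    scales = np.arange(n_scales)
    # Small offset guards against log10(0) for silent input (an assumption).
    slope, _intercept = np.polyfit(scales, np.log10(peaks + 1e-12), 1)
    return slope


if __name__ == "__main__":
    fs = 16000
    n = np.arange(fs)                        # one second of samples
    impulses = np.zeros(fs)
    impulses[800::160] = 1.0                 # flat-spectrum impulse train
    sine = np.sin(2 * np.pi * 200 * n / fs)  # energy concentrated at 200 Hz
    print("impulse train slope:", peak_slope(impulses, fs))
    print("200 Hz sine slope:  ", peak_slope(sine, fs))
```

In this sketch, a signal whose energy sits at low frequencies yields a larger slope than a flat-spectrum impulse train, because its peak amplitudes grow towards the lower-frequency scales. How the slope maps onto specific voice qualities depends on the details reported in the full paper, which the abstract does not give.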
