Wavelet Maxima Dispersion for Breathy to Tense Voice Discrimination

This paper proposes a new parameter, the Maxima Dispersion Quotient (MDQ), for differentiating voice qualities ranging from breathy to tense. Maxima derived following wavelet decomposition are often used for detecting edges in image processing, where the locations of these maxima cluster in the vicinity of the edge location. Similarly, for tense voice, which typically displays sharp glottal closing characteristics, maxima following wavelet analysis cluster in the vicinity of the glottal closure instant (GCI). In contrast, as the phonation type moves away from tense voice towards breathier phonation, the maxima become increasingly dispersed. The MDQ parameter is designed to quantify the extent of this dispersion and is shown to compare favorably to existing voice quality parameters, particularly for the analysis of continuous speech. In addition, classification experiments reveal a significant improvement in the detection of the voice qualities when MDQ is included as an input to the classifier. Finally, MDQ is shown to be robust to additive noise down to a signal-to-noise ratio of 10 dB.
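The idea of measuring how far wavelet maxima fall from the GCI can be illustrated with a minimal sketch. This is not the paper's implementation: the Ricker (Mexican hat) wavelet, the choice of scales, the half-period search window, and the names `ricker` and `dispersion_quotient` are all assumptions made for illustration, and the GCI location and local glottal period are assumed to be known in advance.

```python
import numpy as np

def ricker(scale_s, fs):
    """Ricker (Mexican hat) wavelet sampled at fs; an assumed stand-in wavelet."""
    half = int(np.ceil(4 * scale_s * fs))
    x = (np.arange(-half, half + 1) / fs) / scale_s
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

def dispersion_quotient(signal, gci, period, fs, scales_s):
    """Mean distance (in samples) between the GCI and the largest-magnitude
    wavelet response at each scale, divided by the glottal period in samples.
    The signal is assumed to be longer than the widest wavelet."""
    half_win = period // 2
    lo, hi = gci - half_win, gci + half_win + 1  # search window around the GCI
    distances = []
    for s in scales_s:
        band = np.convolve(signal, ricker(s, fs), mode="same")
        peak = lo + int(np.argmax(np.abs(band[lo:hi])))
        distances.append(abs(peak - gci))
    return float(np.mean(distances)) / period

# Usage: an idealized sharp closure (a single impulse exactly at the GCI).
fs = 16000
gci, period = 800, 160                 # hypothetical values, in samples
scales = [0.00025, 0.0005, 0.001]      # hypothetical wavelet scales, in seconds
tense_like = np.zeros(1600)
tense_like[gci] = 1.0
print(dispersion_quotient(tense_like, gci, period, fs, scales))  # 0.0
```

For the idealized impulse, the maxima at every scale coincide with the GCI, so the quotient is zero. A signal whose excitation energy is spread over the glottal cycle would place the per-scale maxima at varying offsets from the GCI and yield a larger value, which is the behavior the abstract attributes to breathier phonation.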

[21]  Bin Yang,et al.  Robust Estimation of Voice Quality Parameters Under Realworld Disturbances , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[22]  Christopher M. Bishop,et al.  Pattern Recognition and Machine Learning (Information Science and Statistics) , 2006 .

[23]  Paavo Alku,et al.  Comparison of multiple voice source parameters in different phonation types , 2007, INTERSPEECH.

[24]  Mike Brookes,et al.  Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[25]  Bin Yang,et al.  The Relevance of Voice Quality Features in Speaker Independent Emotion Recognition , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[26]  J. Liljencrants,et al.  Dept. for Speech, Music and Hearing Quarterly Progress and Status Report a Four-parameter Model of Glottal Flow , 2022 .

[27]  Colleen Richey,et al.  Effects of vocal effort and speaking style on text-independent speaker verification , 2008, INTERSPEECH.

[28]  Thierry Dutoit,et al.  Glottal closure and opening instant detection from speech signals , 2019, INTERSPEECH.

[29]  Ailbhe Ní Chasaide,et al.  An exploration of voice source correlates of focus , 2010, INTERSPEECH.

[30]  Hiroshi Ishiguro,et al.  Analysis of the Roles and the Dynamics of Breathy and Whispery Voice Qualities in Dialogue Speech , 2010, EURASIP J. Audio Speech Music. Process..

[31]  Nicolas Sturmel,et al.  Glottal closure instant and voice source analysis using time-scale lines of maximum amplitude , 2011 .

[32]  Axel Röbel,et al.  Pitch transposition and breathiness modification using a glottal source model and its adapted vocal-tract filter , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[33]  Junichi Yamagishi,et al.  HMM-based speech synthesiser using the LF-model of the glottal source , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[34]  Paavo Alku,et al.  HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[35]  Eugene Coyle,et al.  A Frequency Domain Approach to ARX-LF Voiced Speech Parameterization and Synthesis , 2011, INTERSPEECH.

[36]  Hiroshi Ishiguro,et al.  Improved Acoustic Characterization of Breathy and Whispery Voices , 2011, INTERSPEECH.

[37]  Abeer Alwan,et al.  Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics , 2019, INTERSPEECH.

[38]  John Kane,et al.  Detecting a targeted voice style in an audiobook using voice quality features , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[39]  Milan Sigmund,et al.  Impact of vocal effort variability on automatic speech recognition , 2012, Speech Commun..

[40]  John Kane,et al.  An audiovisual political speech analysis incorporating eye-tracking and perception data , 2012, LREC.

[41]  Patrick A. Naylor,et al.  Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[42]  Gilles Degottex,et al.  Usual voice quality features and glottal features for emotional valence detection , 2012 .

[43]  John Kane,et al.  Evaluation of glottal closure instant detection in a range of voice qualities , 2013, Speech Commun..

[44]  Friedhelm Schwenker,et al.  Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification , 2013, Comput. Speech Lang..