The Effect of Fuzzy Training Targets on Voice Quality Classification

The dynamic use of voice qualities in spoken language can reveal useful information about a speaker's attitude, mood and affective state. This information may be desirable for a range of speech technology applications. However, annotation of voice quality is frequently inconsistent across raters. Which rater should one trust, or does the truth lie somewhere in between? The current study first describes a voice quality feature set suitable for differentiating voice qualities along a tense-to-breathy dimension. These features are then used as inputs to a fuzzy-input fuzzy-output support vector machine (F2SVM) algorithm to classify the voice qualities automatically. The F2SVM is compared to standard approaches and shows promising results: performance of around 90% is achieved in cross-validation, leave-one-speaker-out, and cross-corpus experiments.
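The idea of a fuzzy training target can be illustrated with a minimal sketch. The abstract does not say how the fuzzy targets were derived, so the rule below (class membership equals the fraction of raters who chose that class) is an assumption made for illustration only, as are the three class names and the helper names `fuzzy_target` and `crisp_target`.

```python
from collections import Counter

# Assumed label set along the tense-to-breathy dimension; "modal" as the
# middle category is an illustrative assumption, not taken from the abstract.
CLASSES = ("breathy", "modal", "tense")


def fuzzy_target(rater_labels, classes=CLASSES):
    """Return one membership value per class: the fraction of raters
    who assigned that class (hypothetical rule, for illustration)."""
    if not rater_labels:
        raise ValueError("at least one rater label is required")
    counts = Counter(rater_labels)
    unknown = set(counts) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    n = len(rater_labels)
    return [counts[c] / n for c in classes]


def crisp_target(rater_labels, classes=CLASSES):
    """Collapse the same ratings to a single hard label by majority vote
    (ties resolve to the first class in `classes`)."""
    memberships = fuzzy_target(rater_labels, classes)
    return classes[memberships.index(max(memberships))]


if __name__ == "__main__":
    ratings = ["tense", "tense", "modal", "tense"]
    print(fuzzy_target(ratings))  # memberships for breathy, modal, tense
    print(crisp_target(ratings))  # majority label
```

Under this assumed rule, a crisp target discards the one dissenting rating, whereas the fuzzy target keeps it as a partial membership that a fuzzy-output classifier could be trained to reproduce.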
