The importance of phase on voice quality assessment

State-of-the-art objective measures of voice quality mostly rely on features extracted from the magnitude spectrum. If speech is assumed to be produced by a minimum-phase component (the vocal tract filter) and a maximum-phase component (the glottal source), the magnitude spectrum alone cannot capture the maximum-phase characteristics. Since voice quality is connected to the glottal source, the extracted features should be linked to the maximum-phase component of speech. This work proposes a new metric, based on the phase spectrum, for characterizing the maximum-phase component of the glottal source. The proposed feature, the Phase Distortion Deviation, reveals irregularities of the glottal pulses and can therefore be used for detecting voice disorders. It is evaluated on the task of ranking speakers with spasmodic dysphonia. Results show that the obtained ranking is highly correlated with the subjective rankings provided by doctors in terms of overall severity, tremor and jitter. The high correlation of the proposed feature with these different criteria indicates its ability to capture voice irregularities and highlights the importance of the phase spectrum in voice quality assessment.
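To make the notion of a phase "deviation" concrete, the sketch below computes the circular standard deviation of a set of phase values, a standard dispersion measure for angular data. It is only an illustration under stated assumptions: the abstract does not give the definition of the Phase Distortion Deviation, so the function `circular_std` and the synthetic `regular` and `irregular` phase sequences are hypothetical stand-ins, not the authors' feature or data.

```python
import cmath
import math
import random


def circular_std(phases):
    """Circular standard deviation (radians) of a sequence of angles.

    Uses the textbook formula sqrt(-2 * ln(R)), where R is the mean
    resultant length of the unit vectors exp(j * phase).
    """
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    # R lies in [0, 1]; clamp to avoid log(0) and rounding slightly above 1.
    r = min(max(abs(resultant), 1e-12), 1.0)
    return math.sqrt(-2.0 * math.log(r))


# Hypothetical phase measurements (radians) across successive frames:
# a steady voice with small fluctuations and an irregular one with large ones.
random.seed(0)
regular = [0.5 + random.gauss(0.0, 0.05) for _ in range(200)]
irregular = [0.5 + random.gauss(0.0, 1.0) for _ in range(200)]

print("regular:  ", circular_std(regular))
print("irregular:", circular_std(irregular))
```

Because the measure works on unit vectors rather than raw angles, it is unaffected by 2π wrapping, which is why a circular statistic is a natural choice for phase data. In this toy setting the irregular sequence yields a clearly larger deviation than the regular one.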
