A methodology for voice classification based on the personalized fundamental frequency estimation

Abstract: The incidence of voice disorders is increasing rapidly, with about a third of the population suffering from dysphonia at some point in their lives. Dysphonia is a disorder that alters vocal quality and can reduce quality of life. Voice disorders can be caused by a structural or functional alteration of the phonatory apparatus, an unhealthy lifestyle, or excessive use of the vocal cords at work (e.g. teaching). Unfortunately, people who suffer from dysphonia often underestimate its symptoms and therefore delay consulting a speech therapist for accurate voice assessment and treatment. Voice disorder evaluation involves a series of tests, including an acoustic analysis, which quantifies voice quality through characteristic parameters such as the fundamental frequency (F0). In this paper, a personalized methodology for the estimation of the F0 is presented. The personalization takes into account two of the main factors that influence the F0: the gender and age of the subject. The estimation of the F0 is crucial for the classification of the voice signal, because a healthy voice is discriminated from a pathological one by checking whether the F0 value falls within the healthy range. To evaluate the methodology, we carried out a set of tests on voice signals selected from an available database, comparing its classification ability with that of other algorithms from the literature. The numerical results show that the proposed methodology achieves accuracy, sensitivity, and specificity of over 77%, 72% and 81%, respectively, better than the values achieved by the other most frequently used and cited fundamental frequency estimation algorithms.
Additionally, a statistical analysis was carried out to evaluate whether the accuracy, sensitivity and specificity of the compared algorithms differ significantly. The outcomes of the ANOVA tests and the t-tests confirm that there is a significant difference between the proposed methodology and the other algorithms. Finally, the presented methodology could be embedded in a simple, portable m-health application for monitoring vocal health and preventing voice disorders.
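
The classification rule described in the abstract (a voice is considered healthy when its F0 falls within a healthy range chosen according to gender and age) and the three reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation: the range values, the age threshold, and the names `HEALTHY_F0_RANGES`, `age_band`, `is_pathological` and `evaluate` are hypothetical placeholders introduced only for illustration, and the F0 value is assumed to have been estimated beforehand.

```python
# Hypothetical healthy F0 ranges in Hz, keyed by (gender, age band).
# Placeholder values for illustration only; they are NOT taken from the paper.
HEALTHY_F0_RANGES = {
    ("female", "adult"): (165.0, 255.0),
    ("female", "senior"): (150.0, 240.0),
    ("male", "adult"): (85.0, 155.0),
    ("male", "senior"): (95.0, 165.0),
}


def age_band(age):
    """Map an age in years to a coarse band (the threshold is an assumption)."""
    return "senior" if age >= 65 else "adult"


def is_pathological(f0_hz, gender, age):
    """Flag a voice as pathological when F0 lies outside the healthy range."""
    low, high = HEALTHY_F0_RANGES[(gender, age_band(age))]
    return not (low <= f0_hz <= high)


def evaluate(predictions, labels):
    """Return (accuracy, sensitivity, specificity); True means pathological."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, y in pairs if p and y)
    tn = sum(1 for p, y in pairs if not p and not y)
    fp = sum(1 for p, y in pairs if p and not y)
    fn = sum(1 for p, y in pairs if not p and y)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy subjects: (estimated F0 in Hz, gender, age, true label).
    subjects = [
        (120.0, "male", 40, False),
        (220.0, "male", 40, True),
        (200.0, "female", 30, False),
        (130.0, "female", 70, True),
    ]
    preds = [is_pathological(f0, g, a) for f0, g, a, _ in subjects]
    truth = [label for *_, label in subjects]
    print(evaluate(preds, truth))
```

Here sensitivity is the fraction of pathological voices correctly flagged and specificity is the fraction of healthy voices correctly accepted, with "pathological" treated as the positive class.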
