Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control Speakers

There is growing interest in cepstral and entropy analyses of voice samples for defining a vocal health indicator, owing to their reliability in investigating both regular and irregular voice signals. The purpose of this study is to determine whether the Cepstral Peak Prominence Smoothed (CPPS) and Sample Entropy (SampEn) can differentiate dysphonic speakers from normal speakers in vowels excerpted from read speech, and to compare their discrimination power. Results are reported for 33 patients and 31 controls, who read a standardized, phonetically balanced passage while wearing a head-mounted microphone. Vowels were excerpted from the recordings using Automatic Speech Recognition and, after a measure was obtained for each vowel, the individual distributions and their descriptive statistics were considered for CPPS and SampEn. Receiver Operating Characteristic (ROC) curve analysis revealed that the mean of the distributions was the parameter with the highest discrimination power for both CPPS and SampEn. CPPS showed higher diagnostic precision than SampEn, with an Area Under the Curve (AUC) of 0.85 compared to 0.72. A negative correlation between the parameters was found (Spearman's ρ = −0.61), with higher SampEn corresponding to lower CPPS. The automatic method used in this study could support voice monitoring in the clinic and during individuals' daily activities.
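
To make the two quantities in the abstract concrete, the following is a minimal sketch of Sample Entropy in its standard textbook definition, together with a rank-based AUC. It is not the authors' implementation: the embedding dimension `m`, the tolerance factor `r_factor`, the function names, and the synthetic signals are all assumptions chosen for illustration, since the abstract does not state the parameters used.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample Entropy, -ln(A / B), of a 1-D signal.

    m and r_factor are illustrative defaults (assumptions), not values
    taken from the study. The tolerance is r = r_factor * std(x).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * x.std()

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance to every later template (no self-matches).
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return float("inf")   # undefined when no matches are found
    return float(-np.log(a / b))


def auc(pos_scores, neg_scores):
    """Rank-based AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Synthetic signals only: a regular sine wave versus white noise.
rng = np.random.default_rng(0)
regular = np.sin(2 * np.pi * np.arange(500) / 50)
irregular = rng.standard_normal(500)
se_regular = sample_entropy(regular)
se_irregular = sample_entropy(irregular)
```

On these synthetic signals the regular sine wave yields a lower SampEn than the noise, which mirrors the direction of the abstract's finding that higher SampEn accompanies lower CPPS. When a lower value indicates pathology, as lower CPPS does here, the scores can be negated before being passed to `auc` so that an AUC above 0.5 still means better-than-chance discrimination.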
