Phonetic-class based correlation analysis for severity of dysphonia

The main purpose of the research is to model the cognitive processes that occur when a physician determines the severity of dysphonia, and to build an IT system that can substitute for the subjective severity assessment made by a clinician. In this preliminary study, the relationship between acoustic parameters and the speech-defect severity determined by a clinician is investigated. Because the number of available pathological speech samples is limited, it is very important to choose effective parameters. After phoneme-level segmentation, acoustic parameters were measured at predetermined fixed points in continuous speech. The parameters were grouped according to phonetic classes (classes defined by the manner of articulation), and the correlation of the grouped parameters with the severity of dysphonia given on the RBH scale was examined, where R stands for roughness, B for breathiness, and H for overall hoarseness. The analysis was carried out on a database containing several pathological disease types, the most frequent being recurrent paresis and functional dysphonia. It was found that, beyond the initial acoustic parameters such as jitter (ddp), shimmer (dda), Harmonics-to-Noise Ratio (HNR) and mel-frequency cepstral coefficients (MFCC) measured on vowels, it is worth measuring the Soft Phonation Index (SPI) and Empirical Mode Decomposition (EMD)-based frequency band ratios on different phonetic classes. These measures were found to correlate with the severity of dysphonia determined by the clinician (RBH). They provide useful information and may help to differentiate types of dysphonia such as functional dysphonia and recurrent paresis.
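
As an illustration of the described analysis, the sketch below shows how class-grouped acoustic parameters could be correlated with the RBH ratings: each parameter is averaged over the measurement points of one phonetic class per speaker, and a correlation coefficient against each RBH dimension is computed. The toy data, the field names and the choice of Pearson correlation are assumptions made for this example, not the exact procedure or data of the study.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-speaker table: acoustic parameters averaged over the
# measurement points of one phonetic class (e.g. vowels), plus the
# clinician's RBH ratings (0-3). Values are illustrative only.
# Each row: (speaker_id, spi_mean, hnr_mean, jitter_ddp_mean, R, B, H)
records = [
    ("spk01", 12.4, 18.2, 0.41, 1, 1, 1),
    ("spk02", 22.7, 11.5, 0.88, 2, 3, 3),
    ("spk03",  9.1, 21.0, 0.25, 0, 0, 0),
    ("spk04", 18.3, 14.7, 0.63, 2, 2, 2),
]

# Column indices of the grouped parameters and of the RBH dimensions.
features = {"SPI": 1, "HNR": 2, "jitter(ddp)": 3}
rbh = {"R": 4, "B": 5, "H": 6}

# Correlate each class-averaged parameter with each RBH dimension.
for f_name, f_idx in features.items():
    x = np.array([row[f_idx] for row in records], dtype=float)
    for r_name, r_idx in rbh.items():
        y = np.array([row[r_idx] for row in records], dtype=float)
        r, p = pearsonr(x, y)
        print(f"{f_name} vs {r_name}: r = {r:+.2f} (p = {p:.3f})")
```

In the study this kind of per-class correlation table is what motivates adding SPI and EMD-based band ratios to the more conventional vowel-based measures; which correlation measure was used is not stated in the abstract, so Pearson here is only a placeholder.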
