Data dependent random forest applied to screening for laryngeal disorders through analysis of sustained phonation: acoustic versus contact microphone.

The main objective of this study is a comprehensive evaluation of results obtained using acoustic and contact microphones in screening for laryngeal disorders through analysis of sustained phonation. To obtain a versatile characterization of voice samples recorded with both types of microphone, 14 different feature sets are extracted and used to build an accurate classifier that distinguishes between normal and pathological cases. We propose a new data-dependent, random forest-based way to combine the information available from the different feature sets. An approach to exploring the data and the decisions made by a random forest is also presented. Experimental investigations using a mixed-gender database of 273 subjects have shown that the perceptual linear predictive cepstral coefficients (PLPCC) were the best feature set for both microphones. However, the linear predictive coefficients (LPC) and linear predictive cosine transform coefficients (LPCTC) exhibited good performance in the acoustic microphone case only. Models designed using the acoustic microphone data significantly outperformed those built using data recorded by the contact microphone. The contact microphone did not contribute any additional information useful for the classification. The proposed data-dependent random forest significantly outperformed the traditional random forest.
