Pathological voice detection and binary classification using MPEG-7 audio features

Abstract

Objectives: This paper proposes a pathological voice detection and classification method based on MPEG-7 audio low-level features. MPEG-7 features were originally developed for multimedia indexing, which covers both video and audio. Indexing is related to event detection, and because a pathological voice is a distinct event from a normal voice, we show that MPEG-7 Part 4 audio low-level features perform very well both in detecting pathological voices and in binary classification of the pathologies.

Patients and methods: The experiments use a subset of sustained vowel (“AH”) recordings from healthy subjects and subjects with voice pathologies, taken from the Massachusetts Eye and Ear Infirmary (MEEI) database. A support vector machine (SVM) is used for classification. Feature selection based on the Fisher discrimination ratio is applied as an optional step.

Results: The proposed method, combining MPEG-7 audio features with SVM classification, is evaluated on voice pathology detection and on binary classification of pathologies. It achieves an accuracy of 99.994% with a standard deviation of 0.0105% for detecting pathological voices, and an accuracy of up to 100% for binary classification of pathologies.

Conclusion: MPEG-7 descriptors can be used reliably for automatic voice pathology detection and classification.
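To illustrate the optional feature selection step, the following is a minimal sketch of ranking features by a two-class Fisher discrimination ratio. It assumes the common definition, the squared difference of class means divided by the sum of class variances, which the abstract does not spell out. The function names, the synthetic data, and the choice of `k` are hypothetical and are not taken from the paper; the feature matrix stands in for MPEG-7 descriptors that would be extracted separately.

```python
import numpy as np


def fisher_discrimination_ratio(X, y):
    """Per-feature score for two classes: (mu_a - mu_b)^2 / (var_a + var_b).

    Assumed definition; the paper's exact formula is not given in the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch expects exactly two classes")
    a = X[y == classes[0]]
    b = X[y == classes[1]]
    numerator = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    denominator = a.var(axis=0) + b.var(axis=0)
    # Guard against division by zero for constant features.
    return numerator / np.maximum(denominator, 1e-12)


def select_top_features(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_discrimination_ratio(X, y)
    return np.argsort(scores)[::-1][:k]


# Synthetic stand-in for a feature matrix (rows: recordings, columns: features).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)          # 0 = normal, 1 = pathological (hypothetical labels)
X = rng.normal(size=(200, 5))
X[y == 1, 2] += 3.0                 # make feature 2 strongly discriminative

top = select_top_features(X, y, k=2)
print("selected feature indices:", top)
```

The selected columns, `X[:, top]`, would then be passed to the classifier. Ranking each feature independently keeps the step cheap, but it ignores redundancy between correlated features.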
