Voice pathology detection using auto-correlation of different filters bank

This paper investigates the contribution of individual frequency bands to automatic voice pathology detection. First, the input voice signal is passed through a bank of time-domain band-pass filters whose center frequencies are spaced on an octave scale. Each filter output is then divided into overlapping frames. The autocorrelation function is applied to each frame to find the first largest peak, excluding the region near the DC value, and its corresponding lag. Each frame is therefore represented by only two features: the peak value and its lag. As classifiers, we use Gaussian mixture models (GMM) and a support vector machine (SVM), separately. Two well-known, publicly available databases are used in the investigation: one in English (MEEI) and one in German (SVD). The results demonstrate that the most significant frequency range for detecting voice pathology lies between 1500 Hz and 3500 Hz. Using this filter band and only two features, the accuracy exceeds 97% on the MEEI database.
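
The framing and autocorrelation steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `frame_signal` and `autocorr_peak`, the frame length, the hop size, and the `min_lag` cutoff used to skip the region near the DC value are all assumptions, and the band-pass filtering stage is omitted. The sketch also takes the largest autocorrelation value beyond `min_lag`, which is only one possible reading of "first largest peak".

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames; trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def autocorr_peak(frame, min_lag):
    """Return (peak value, lag) of the normalized autocorrelation for lags >= min_lag.

    min_lag is a hypothetical parameter that excludes the region near zero lag.
    """
    frame = frame - frame.mean()
    # Keep only non-negative lags of the full autocorrelation.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if r[0] <= 0:  # silent frame: no meaningful peak
        return 0.0, 0
    r = r / r[0]  # normalize so that the zero-lag value is 1
    lag = min_lag + int(np.argmax(r[min_lag:]))
    return float(r[lag]), lag

# Toy input (assumed): a 200 Hz sine sampled at 8 kHz stands in for one filter output.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)

frames = frame_signal(x, frame_len=400, hop=200)
# One row per frame: [peak value, lag in samples].
features = np.array([autocorr_peak(f, min_lag=20) for f in frames])
```

For this toy signal the detected lag is 40 samples in every frame, which matches the 200 Hz period at an 8 kHz sampling rate. In the pipeline the abstract describes, a two-column feature matrix of this kind would be computed per filter band and passed to the GMM or SVM classifier.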
