Automatic detection of hypernasal speech of children with cleft lip and palate from Spanish vowels and words using classical measures and nonlinear analysis

This paper presents a system for the automatic detection of hypernasal speech signals. The system combines two different characterization approaches, applied to the five Spanish vowels and two selected words. The first approach is based on classical features such as pitch period perturbations, noise measures, and Mel-Frequency Cepstral Coefficients (MFCC). The second is based on nonlinear dynamics (NLD) analysis. The most relevant features are selected and ranked using two techniques: principal component analysis (PCA) and sequential floating forward selection (SFFS). A soft-margin support vector machine (SM-SVM) decides whether a voice recording is hypernasal or healthy. The experiments use recordings of the five Spanish vowels and the words /coco/ and /gato/, and consider three different feature sets: (1) the classical approach, (2) the NLD analysis, and (3) the combination of the classical and NLD measures. In general, the accuracy rates are higher and more stable when the classical and NLD features are combined into the same feature space, indicating that the NLD analysis complements the classical approach.
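The feature-selection step can be illustrated with a minimal Python sketch of generic sequential floating forward selection. It assumes a hypothetical scoring function `criterion` that maps a subset of feature indices to a score to maximize. In this setting, that score could plausibly be the cross-validated accuracy of the SM-SVM on that subset. The function name `sffs`, the `criterion` interface, and the "beat the best subset of that size" acceptance rule are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical criterion type: scores a subset of feature indices (higher is better),
# e.g. a cross-validated classifier accuracy. This interface is an assumption.
Criterion = Callable[[FrozenSet[int]], float]


def sffs(n_features: int, criterion: Criterion, target_size: int) -> Dict[int, Tuple[List[int], float]]:
    """Sequential floating forward selection (minimal sketch).

    Returns, for every subset size reached, the best subset found and its score.
    """
    if not 1 <= target_size <= n_features:
        raise ValueError("target_size must be between 1 and n_features")
    selected: List[int] = []
    best: Dict[int, Tuple[List[int], float]] = {}

    def record(subset: List[int], score: float) -> bool:
        # Keep the subset only if it strictly beats the best one seen for its size.
        k = len(subset)
        if k not in best or score > best[k][1]:
            best[k] = (list(subset), score)
            return True
        return False

    while len(selected) < target_size:
        # Forward (inclusion) step: add the feature that maximizes the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        add = max(remaining, key=lambda f: criterion(frozenset(selected + [f])))
        selected.append(add)
        record(selected, criterion(frozenset(selected)))

        # Floating (conditional exclusion) steps: drop the least useful feature
        # while the smaller subset beats the best subset of that size found so far.
        while len(selected) > 2:
            drop = max(selected, key=lambda f: criterion(frozenset(selected) - {f}))
            reduced = [f for f in selected if f != drop]
            if record(reduced, criterion(frozenset(reduced))):
                selected = reduced
            else:
                break
    return best
```

With a purely additive criterion, greedy inclusion is already optimal, so the floating steps never fire. They matter when features interact: a pair that scores well jointly can replace an earlier greedy choice. Backward steps are accepted only on a strict improvement over the best recorded subset of the same size. That rule keeps the search from cycling.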
