Classification of Voice Aging Using Parameters Extracted from the Glottal Signal

Classifying voice aging has many applications in health care and geriatrics. This work identifies the most significant parameters, extracted from the glottal signal, for recognizing the voice aging process in men and women. The parameters are selected with a wrapper approach that combines a genetic algorithm (as the search algorithm) with a neural network (as the induction algorithm). The selected parameters are then used as inputs to a neural network that classifies male and female Brazilian speakers into three age groups: young (15 to 30 years old), adult (31 to 60 years old), and senior (61 to 90 years old). The voice database used in this work comprises 120 Brazilian speakers (male and female) of different ages. We use the largest database for age classification among similar works, and the classification rate is higher than in other studies, reaching 91.6% for male speakers and 83.33% for female speakers.
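The wrapper idea can be illustrated with a minimal sketch, which is not the authors' implementation. A genetic algorithm searches over binary feature masks, and each mask is scored by the validation accuracy of a classifier trained on the selected features only. To stay short and dependency-free, the sketch swaps the neural network for a nearest-centroid classifier as the induction algorithm. The synthetic data, the per-feature penalty, and all function names and hyperparameters are assumptions made for illustration; only the three age ranges come from the text above.

```python
import random

# Age groups as defined in the abstract (bounds in years, inclusive).
AGE_GROUPS = (("young", 15, 30), ("adult", 31, 60), ("senior", 61, 90))

def age_group(age):
    """Map an age in years to one of the three class labels."""
    for label, low, high in AGE_GROUPS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the 15-90 range")

def centroid_accuracy(train, test, mask):
    """Stand-in induction algorithm (assumption: the paper uses a neural
    network). Returns nearest-centroid accuracy on `test`, using only the
    features where mask[i] == 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(idx))
        for k, i in enumerate(idx):
            acc[k] += x[i]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda y: sum(
            (x[i] - c) ** 2 for i, c in zip(idx, centroids[y])))

    return sum(predict(x) == y for x, y in test) / len(test)

def select_features(train, valid, n_features, pop_size=20, generations=15,
                    mutation_rate=0.1, seed=0):
    """Genetic search over binary feature masks (the wrapper's search step)."""
    rng = random.Random(seed)

    def fitness(mask):
        # Assumed fitness: validation accuracy minus a small cost per feature.
        return centroid_accuracy(train, valid, mask) - 0.01 * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]           # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]            # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

def make_synthetic(n_per_class, n_features, rng):
    """Hypothetical data: only features 0 and 1 depend on the age group."""
    data = []
    for c, age in enumerate((20, 45, 70)):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            x[0] += 6.0 * c
            x[1] += 6.0 * c
            data.append((x, age_group(age)))
    return data

rng = random.Random(1)
train = make_synthetic(20, 6, rng)
valid = make_synthetic(10, 6, rng)
best = select_features(train, valid, n_features=6)
print("selected mask:", best)
print("validation accuracy:", centroid_accuracy(train, valid, best))
```

In a real setting, the feature vectors would hold glottal-signal parameters, the fitness function would train and validate a neural network for each mask, and separate models could be fitted for male and female speakers.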
