On using voice source measures in automatic gender classification of children's speech

Acoustic characteristics of speech signals differ with gender because of physiological differences in the glottis and the vocal tract. Previous research [1] showed that adding the voice-source-related measures H1* − H2* and H1* − A* improved gender classification accuracy compared with using only the fundamental frequency (F0) and formant frequencies. Hi* refers to the i-th source spectral harmonic magnitude, and Ai* refers to the magnitude of the source spectrum at the i-th formant. In this paper, three other voice-source-related measures, CPP, HNR, and H2* − H4*, are used in gender classification of children's voices. CPP refers to the cepstral peak prominence [2], HNR refers to the harmonics-to-noise ratio [3], and H2* − H4* refers to the difference between the 2nd and 4th source spectral harmonic magnitudes. Results show that using these three features improves gender classification accuracy compared with [1].

Index Terms: gender classification, gender identification, voice source
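As a rough illustration of what a harmonic magnitude difference measures, the sketch below estimates the levels of the 2nd and 4th harmonics of a synthetic periodic signal and takes their difference in dB. It is a minimal sketch under several assumptions: F0 is taken as known, the function names `harmonic_level_db` and `harmonic_difference_db` are hypothetical, and the values are plain (uncorrected) harmonic levels. The starred measures in the abstract are source spectral magnitudes, which this example does not attempt to reproduce.

```python
import math


def harmonic_level_db(signal, sample_rate, freq_hz):
    """Level in dB of the sinusoidal component at freq_hz (single-frequency DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, x in enumerate(signal))
    # Scale so that a pure sinusoid of amplitude a yields a.
    amplitude = 2.0 * math.hypot(re, im) / n
    return 20.0 * math.log10(amplitude)


def harmonic_difference_db(signal, sample_rate, f0_hz, i, j):
    """Difference in dB between the i-th and j-th harmonic levels (uncorrected)."""
    return (harmonic_level_db(signal, sample_rate, i * f0_hz)
            - harmonic_level_db(signal, sample_rate, j * f0_hz))


# Synthetic test signal: four harmonics of an assumed F0 with known amplitudes.
sample_rate = 8000
f0_hz = 250.0
amplitudes = {1: 1.0, 2: 0.5, 3: 0.3, 4: 0.25}
num_samples = 800  # 0.1 s, an integer number of F0 periods

signal = [
    sum(a * math.sin(2 * math.pi * k * f0_hz * t / sample_rate)
        for k, a in amplitudes.items())
    for t in range(num_samples)
]

h2_minus_h4 = harmonic_difference_db(signal, sample_rate, f0_hz, 2, 4)
print(f"H2 - H4 = {h2_minus_h4:.2f} dB")
```

Because the 2nd harmonic has twice the amplitude of the 4th in this synthetic signal, the difference comes out at about 6 dB. On real speech, the harmonic frequencies would have to come from an F0 tracker, and the analysis window would rarely span a whole number of periods.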

[1] D. Ashmead, et al. The acoustic bases for gender identification from children's voices. The Journal of the Acoustical Society of America, 2001.

[2] J. Perkell, et al. Comparisons among aerodynamic, electroglottographic, and acoustic spectral measures of female voice. Journal of Speech and Hearing Research, 1995.

[3] D. Childers, et al. Gender recognition from speech. Part I: Coarse analysis. The Journal of the Acoustical Society of America, 1991.

[4] Sungbok Lee, et al. Creation of two children's speech databases. IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, 1996.

[5] P. A. Busby, et al. Formant frequency values of vowels produced by preadolescent boys and girls. The Journal of the Acoustical Society of America, 1995.

[6] J. Hillenbrand, et al. Acoustic correlates of breathy vocal quality. Journal of Speech and Hearing Research, 1994.

[7] Jody Kreiman, et al. Measures of the glottal source spectrum. Journal of Speech, Language, and Hearing Research, 2007.

[8] Patricia A. Keating, et al. VoiceSauce: A Program for Voice Analysis. ICPhS, 2009.

[9] Markus Iseli, et al. The role of voice source measures on automatic gender classification. IEEE International Conference on Acoustics, Speech and Signal Processing, 2008.

[10] Abeer Alwan, et al. Age, sex, and vowel dependencies of acoustic measures related to the voice source. The Journal of the Acoustical Society of America, 2007.

[11] Shrikanth S. Narayanan, et al. Acoustics of children's speech: developmental changes of temporal and spectral parameters. The Journal of the Acoustical Society of America, 1999.

[12] Guus de Krom, et al. A Cepstrum-Based Technique for Determining a Harmonics-to-Noise Ratio in Speech Signals. 1993.

[13] D. Klatt, et al. Analysis, synthesis, and perception of voice quality variations among female and male talkers. The Journal of the Acoustical Society of America, 1990.

[14] G. de Krom. A cepstrum-based technique for determining a harmonics-to-noise ratio in speech signals. Journal of Speech and Hearing Research, 1993.

[15] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines. TIST, 2011.

[16] Roy D. Patterson, et al. An instantaneous-frequency-based pitch extraction method for high-quality speech transformation: revised TEMPO in the STRAIGHT-suite. ICSLP, 1998.

[17] Abeer Alwan, et al. An improved correction formula for the estimation of harmonic magnitudes and its application to open quotient estimation. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2004.

[18] Jody Kreiman, et al. A spectral-slope compensated scale for measuring perception of vocal aperiodicity. 2010.