Pitch-based gender identification with two-stage classification

In this paper, we address the problem of speech-based gender identification. Mel-Frequency Cepstral Coefficients (MFCC) of voice samples are typically used as the features for gender identification, but MFCC-based classification incurs high computational complexity. We propose a novel pitch-based gender identification system with a two-stage classifier that achieves both accurate identification and low complexity. The first stage identifies and labels all speakers whose pitch clearly indicates their gender; its complexity is very low, since it applies only a threshold-based decision rule to a scalar (the pitch). The ambiguous voice samples from the remaining speakers, which the first stage cannot classify with high accuracy and which can be regarded as difficult cases, are forwarded to the second stage for finer examination. The second stage uses a Gaussian Mixture Model to separate these voice samples by gender. Experimental results show that our system is independent of the language and content of the speech, independent of the microphone, and robust against noisy recording conditions. The system achieves a probability of correct classification of 98.65% and requires about 5 s for feature extraction and classification. Copyright © 2011 John Wiley & Sons, Ltd.
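The two-stage decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pitch thresholds (`MALE_MAX_HZ`, `FEMALE_MIN_HZ`), the function names, the choice of second-stage feature vector, and the toy mixture parameters are all assumptions made for illustration, since the abstract does not specify them. Stage 1 labels a sample from its pitch alone when the pitch falls outside an ambiguous band; otherwise stage 2 compares the log-likelihoods of the sample under a male and a female diagonal-covariance Gaussian mixture.

```python
import math

# Hypothetical thresholds in Hz; the abstract does not state the actual values.
MALE_MAX_HZ = 140.0    # at or below this, stage 1 labels the sample "male"
FEMALE_MIN_HZ = 190.0  # at or above this, stage 1 labels the sample "female"


def gmm_log_likelihood(x, components):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    components is a list of (weight, means, variances) tuples.
    """
    log_terms = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p -= 0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify(pitch_hz, features, male_gmm, female_gmm):
    """Return (label, stage), where stage is 1 or 2."""
    # Stage 1: cheap threshold rule on the scalar pitch.
    if pitch_hz <= MALE_MAX_HZ:
        return "male", 1
    if pitch_hz >= FEMALE_MIN_HZ:
        return "female", 1
    # Stage 2: ambiguous pitch, so compare the two GMM log-likelihoods.
    male_ll = gmm_log_likelihood(features, male_gmm)
    female_ll = gmm_log_likelihood(features, female_gmm)
    return ("male" if male_ll > female_ll else "female"), 2


# Toy one-dimensional mixtures (assumed values, not trained on any data).
MALE_GMM = [(0.6, [120.0], [400.0]), (0.4, [150.0], [400.0])]
FEMALE_GMM = [(0.5, [200.0], [400.0]), (0.5, [230.0], [400.0])]

if __name__ == "__main__":
    for pitch in (110.0, 160.0, 180.0, 220.0):
        print(pitch, classify(pitch, [pitch], MALE_GMM, FEMALE_GMM))
```

In this sketch the second stage reuses the pitch as its only feature to keep the example short; a real system would pass whatever feature vector its mixtures were trained on. The design point it illustrates is that only samples in the ambiguous band pay the cost of the mixture evaluation.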
