Automatic male-female voice discrimination

In this work, we present a simple novel scheme for classifying audio speech signals into male and female speech. In the context of content-based multimedia indexing, gender identification from the speech signal is an important task. Popular salient low-level time-domain acoustic features that are closely related to the physical properties of the source audio signal, namely the zero-crossing rate (ZCR) and short-time energy (STE), are used for this discrimination together with spectral flux, a low-level frequency-domain feature. RANSAC and a neural network are used as classifiers. The experimental results demonstrate the efficiency of the proposed scheme.
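The three features named above have standard frame-based definitions, sketched below in NumPy. This is a minimal illustration only: the frame length, hop size, and normalization are assumptions, and the paper's actual feature extraction and classifier stages (RANSAC and the neural network) are not reproduced here.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[:-1] != signs[1:])

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return np.mean(frame ** 2)

def spectral_flux(prev_frame, frame):
    """L2 distance between normalized magnitude spectra of consecutive frames."""
    prev_mag = np.abs(np.fft.rfft(prev_frame))
    cur_mag = np.abs(np.fft.rfft(frame))
    # Normalize so flux reflects change in spectral shape, not in loudness
    prev_mag /= (np.sum(prev_mag) + 1e-12)
    cur_mag /= (np.sum(cur_mag) + 1e-12)
    return np.sqrt(np.sum((cur_mag - prev_mag) ** 2))

def frame_features(signal, frame_len=400, hop=160):
    """Compute (ZCR, STE, spectral flux) per frame for a mono signal
    (frame_len/hop correspond to 25 ms / 10 ms at 16 kHz, an assumed setup)."""
    feats, prev = [], None
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        flux = spectral_flux(prev, frame) if prev is not None else 0.0
        feats.append((zero_crossing_rate(frame),
                      short_time_energy(frame),
                      flux))
        prev = frame
    return np.array(feats)
```

Per-frame feature vectors of this kind would then be fed to the classifiers; lower ZCR and different energy/flux statistics are typically expected for male speech owing to its lower fundamental frequency.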
