Histogram transform model using MFCC features for text-independent speaker identification

A novel text-independent speaker identification (SI) method is proposed in this paper. The method uses mel-frequency cepstral coefficients (MFCCs) together with the dynamic information among adjacent frames as the feature set that captures a speaker's characteristics. To exploit this dynamic information, we design a super MFCC feature by cascading three neighboring MFCC frames. The probability density function (PDF) of the super MFCC features is estimated with the recently proposed histogram transform (HT) method, which generates additional training data through random transforms in order to carry out histogram-based PDF estimation and to alleviate the discontinuity problem of conventional multivariate histograms. Compared with conventional PDF estimation methods such as the Gaussian mixture model, the HT model shows a promising improvement in an SI task.
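The two steps described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the super frames are built with a sliding window of stride one, the random transforms are a random rotation plus a random shift of the bin grid, and all bins share one width. The names `stack_super_frames`, `HistogramTransformSketch`, `n_transforms`, and `bin_width` are hypothetical and chosen for illustration only; the HT method as published may differ in its details.

```python
import numpy as np
from collections import Counter


def stack_super_frames(mfcc, context=1):
    """Concatenate each frame with its neighbours (assumed stride of one).

    mfcc: array of shape (n_frames, n_coef). With context=1, three
    consecutive frames are cascaded into one vector of length 3 * n_coef.
    """
    mfcc = np.asarray(mfcc, dtype=float)
    n_frames, n_coef = mfcc.shape
    width = 2 * context + 1
    if n_frames < width:
        return np.empty((0, width * n_coef))
    # Row t of the result is [frame t, frame t+1, ..., frame t+width-1].
    return np.hstack([mfcc[i:n_frames - width + 1 + i] for i in range(width)])


class HistogramTransformSketch:
    """Average of histograms built in randomly rotated and shifted spaces.

    Illustrative simplification only, not the published HT algorithm.
    """

    def __init__(self, n_transforms=10, bin_width=1.0, seed=0):
        self.n_transforms = n_transforms
        self.bin_width = bin_width
        self.rng = np.random.default_rng(seed)

    def _bins(self, x, rotation, offset):
        # Integer bin index of every sample in the transformed space.
        return np.floor((x @ rotation.T + offset) / self.bin_width).astype(int)

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.n_samples_, self.dim_ = x.shape
        self.transforms_ = []
        for _ in range(self.n_transforms):
            # Random orthogonal matrix from the QR factorization of Gaussian noise.
            rotation, _r = np.linalg.qr(
                self.rng.standard_normal((self.dim_, self.dim_)))
            offset = self.rng.uniform(0.0, self.bin_width, size=self.dim_)
            counts = Counter(map(tuple, self._bins(x, rotation, offset)))
            self.transforms_.append((rotation, offset, counts))
        return self

    def pdf(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        volume = self.bin_width ** self.dim_
        total = np.zeros(len(x))
        for rotation, offset, counts in self.transforms_:
            keys = self._bins(x, rotation, offset)
            total += np.array([counts.get(tuple(k), 0) for k in keys])
        # Average the per-transform histogram densities.
        return total / (self.n_transforms * self.n_samples_ * volume)
```

In an identification setting, one such model would be fitted per enrolled speaker, and a test utterance would be assigned to the speaker whose model gives the highest summed log-density over its super frames. Averaging over several shifted and rotated grids is what smooths the bin-edge discontinuities of a single histogram in this sketch.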
