Text-Independent Speaker Identification Using the Histogram Transform Model

In this paper, we propose a novel probabilistic method for text-independent speaker identification (SI). To capture dynamic information in the speech signal, we design super mel-frequency cepstral coefficient (super-MFCC) features by concatenating three neighboring MFCC frames into a single vector. These super-MFCC vectors are used to train a probabilistic model so that the speaker's characteristics are sufficiently captured. The probability density function (PDF) of the super-MFCC features is estimated with the recently proposed histogram transform (HT) method. To alleviate the discontinuity problem that commonly occurs when computing multivariate histograms, the HT method generates additional training data. Using these generated data, a smooth PDF of the super-MFCC vectors is obtained. Compared with typical PDF estimation methods, such as the Gaussian mixture model, the HT-based model yields promising improvements in SI.
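The construction of super-MFCC vectors from three neighboring frames can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `stack_super_frames` is hypothetical, and the sketch assumes a centered window (previous, current, next frame) with the first and last frames dropped, since the abstract specifies only that three neighboring frames are combined.

```python
from typing import List, Sequence


def stack_super_frames(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each frame with its previous and next neighbor.

    Assumption: a centered three-frame window; the first and last
    frames have no full neighborhood and are skipped.
    """
    supers = []
    for t in range(1, len(frames) - 1):
        supers.append(list(frames[t - 1]) + list(frames[t]) + list(frames[t + 1]))
    return supers


# Toy input: 5 frames with 2 coefficients each (real MFCC frames are longer).
mfcc = [[float(10 * t + d) for d in range(2)] for t in range(5)]
super_mfcc = stack_super_frames(mfcc)

# Each super-frame has 3 * 2 = 6 dimensions, and 5 - 2 = 3 super-frames remain.
print(len(super_mfcc), len(super_mfcc[0]))
```

With D coefficients per frame, each super-MFCC vector has 3D dimensions, so the density estimator operates in a higher-dimensional space than it would with single frames.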
