Speaker Verification via Modeling Kurtosis Using Sparse Coding

This paper proposes a new model for speaker verification that applies the kurtosis statistic to a sparse coding model of the human auditory system. Only a small number of neurons in the primary auditory cortex are activated when encoding acoustic stimuli, so sparse, independent events are used to represent the characteristics of these neurons. A dictionary is learned for each speaker from that speaker’s samples, with the dictionary atoms corresponding to cortical neurons. The neuron responses carry the statistical properties of the acoustic signals in the auditory cortex, so the activation distribution of a speaker’s neurons is taken as an approximation of that speaker’s characteristics. Kurtosis is an efficient measure of the sparsity of a neuron’s activation distribution, and the vector formed from the kurtosis of every neuron serves as the model that characterizes the speaker’s voice. Experimental results show that the kurtosis model outperforms the baseline systems and achieves effective identity verification.
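The pipeline described above (sparse-code each frame over a speaker dictionary, then take the kurtosis of every atom's activations) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names `omp`, `kurtosis_vector`, and `speaker_kurtosis_model` are hypothetical, a random unit-norm dictionary stands in for a dictionary learned from speaker data, a plain greedy orthogonal matching pursuit stands in for whatever sparse solver the paper uses, and kurtosis is taken as the non-excess fourth standardized moment. It is not the authors' implementation.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed solver, n_nonzero >= 1).
    D: (dim, n_atoms) with unit-norm columns; x: (dim,) feature frame."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Re-fit all selected atoms by least squares, then update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

def kurtosis_vector(A, eps=1e-12):
    """Per-row kurtosis m4 / var^2 of activations A: (n_atoms, n_frames).
    Rows with (near) zero variance, e.g. atoms never activated, get 0."""
    centered = A - A.mean(axis=1, keepdims=True)
    var = (centered ** 2).mean(axis=1)
    m4 = (centered ** 4).mean(axis=1)
    safe_var = np.where(var > eps, var, 1.0)
    return np.where(var > eps, m4 / safe_var ** 2, 0.0)

def speaker_kurtosis_model(D, X, n_nonzero=3):
    """Sparse-code every frame (column of X) over D and return the
    vector of per-atom kurtosis values as the speaker model."""
    codes = np.column_stack(
        [omp(D, X[:, t], n_nonzero) for t in range(X.shape[1])]
    )
    return kurtosis_vector(codes)

# Toy usage with synthetic data (placeholder for real speech features).
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 40))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
X = rng.standard_normal((20, 200))      # 200 frames of 20-dim features
model = speaker_kurtosis_model(D, X)    # one kurtosis value per atom
```

In this sketch an atom that fires rarely has a heavy-tailed activation distribution and therefore a larger kurtosis, which is the sense in which kurtosis measures sparsity. How two such vectors are compared to accept or reject a claimed identity is not stated in the abstract, so no scoring step is shown.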
