Fuzzy Restricted Boltzmann Machine based Probabilistic Linear Discriminant Analysis for Noise-Robust Text-Dependent Speaker Verification on Short Utterances

In i-vector-based speaker verification systems, it is important to compensate for session variability in the i-vector to improve verification performance. Linear discriminant analysis (LDA) is widely used for this purpose: it compensates for session variability while reducing the dimensionality of the i-vector. Restricted Boltzmann machine (RBM)-based probabilistic linear discriminant analysis (PLDA) has been proposed to improve the session variability compensation ability of LDA; it can be viewed as a probabilistic formulation of LDA built on an RBM. However, because the RBM does not account for uncertainty in its parameters, the representation capability of RBM-based PLDA is limited. For instance, many real-world speaker verification systems must operate in noisy environments, which make session variability compensation uncertain. The fuzzy restricted Boltzmann machine (FRBM) was proposed to improve the capability of the RBM and has shown more robust performance than the RBM. Hence, in this paper, we propose FRBM-based PLDA, which improves the representation capability of RBM-based PLDA by replacing all of its parameters with fuzzy numbers. The proposed method was evaluated on Part 1 of the Robust Speaker Recognition (RSR) 2015 database. Experimental results show that the proposed algorithm compensates better for the phonetic variability present in short utterances and delivers robust speaker verification performance in diverse noisy environments, where phonetic and noise variability are challenging issues in real-world applications.
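The idea of replacing crisp RBM parameters with fuzzy numbers can be illustrated with a minimal Python sketch. It is not the FRBM-PLDA training procedure from this paper. It assumes that every weight and hidden bias is a symmetric triangular fuzzy number sharing one spread, that hidden units are binary (sigmoid), which may differ from the unit types used in RBM-PLDA, and that defuzzification is done by averaging interval midpoints over a few α-cuts. All function names and parameters are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def alpha_cut(center: float, spread: float, alpha: float) -> tuple[float, float]:
    """Interval of a symmetric triangular fuzzy number at membership level alpha."""
    half_width = (1.0 - alpha) * spread
    return center - half_width, center + half_width


def interval_mul(x: float, lo: float, hi: float) -> tuple[float, float]:
    """Crisp scalar times the interval [lo, hi]; the bounds swap when x < 0."""
    a, b = x * lo, x * hi
    return min(a, b), max(a, b)


def fuzzy_hidden_prob(v, W, b, spread, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Defuzzified p(h_j = 1 | v) for each hidden unit j (hypothetical sketch).

    v: real-valued visible vector (e.g. an i-vector)
    W: W[i][j] are the centers of the fuzzy weights
    b: b[j] are the centers of the fuzzy hidden biases
    spread: shared spread of every fuzzy parameter (simplifying assumption)
    """
    probs = []
    for j in range(len(b)):
        total = 0.0
        for alpha in alphas:
            # Interval arithmetic: start from the bias interval, add each v_i * W_ij interval.
            lo, hi = alpha_cut(b[j], spread, alpha)
            for i, vi in enumerate(v):
                w_lo, w_hi = alpha_cut(W[i][j], spread, alpha)
                p_lo, p_hi = interval_mul(vi, w_lo, w_hi)
                lo += p_lo
                hi += p_hi
            # The sigmoid is monotone, so the activation interval is [sigmoid(lo), sigmoid(hi)].
            total += 0.5 * (sigmoid(lo) + sigmoid(hi))
        # Average the interval midpoints over all alpha levels to get a crisp value.
        probs.append(total / len(alphas))
    return probs


v = [0.5, -1.0]
W = [[0.2, -0.3], [0.4, 0.1]]
b = [0.0, 0.1]
print(fuzzy_hidden_prob(v, W, b, spread=0.0))  # crisp RBM activations
print(fuzzy_hidden_prob(v, W, b, spread=0.2))  # fuzzy parameters
```

Setting `spread=0.0` recovers the crisp RBM activation, which is a convenient sanity check. A larger spread widens each activation interval and so expresses uncertainty in the parameters, which is the property the FRBM is meant to add over the RBM.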
