Speaker Recognition using Deep Belief Networks

Short-time spectral features such as mel-frequency cepstral coefficients (MFCCs) have previously been deployed in state-of-the-art speaker recognition systems; however, less attention has been paid to short-term spectral features that generative models can learn directly from speech signals. Higher-dimensional encoders such as deep belief networks (DBNs) could improve performance in speaker recognition tasks by better modelling the statistical structure of sound waves. In this paper, we use short-term spectral features learnt by a DBN, augmented with MFCC features, to perform speaker recognition. Using these features, we achieved a recognition accuracy of 0.95 on the ELSDSR dataset, compared with 0.90 when using standalone MFCC features.
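
As an illustration of what augmenting MFCC features with DBN features might look like, the following minimal sketch propagates one spectral frame through a stack of RBM layers and appends the top-layer activations to an MFCC vector. It rests on assumptions that the abstract does not state: per-frame concatenation as the augmentation scheme, the layer sizes (8, 6, 4), the 13-dimensional MFCC vector, and all function names are hypothetical, and the weights are untrained random placeholders rather than weights learnt by contrastive divergence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rbm_hidden_probs(visible, weights, hidden_bias):
    """P(h_j = 1 | v) for one RBM layer; weights[j] is the row for hidden unit j."""
    return [
        sigmoid(sum(w * v for w, v in zip(row, visible)) + b)
        for row, b in zip(weights, hidden_bias)
    ]


def dbn_features(frame, layers):
    """Propagate one spectral frame upward through the stacked RBM layers."""
    activation = frame
    for weights, hidden_bias in layers:
        activation = rbm_hidden_probs(activation, weights, hidden_bias)
    return activation


def augment(mfcc, frame, layers):
    """Assumed augmentation scheme: concatenate MFCCs with top-layer DBN activations."""
    return list(mfcc) + dbn_features(frame, layers)


def random_layer(n_visible, n_hidden, rng):
    """Untrained placeholder layer (hypothetical sizes); real weights would be learnt."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_visible)] for _ in range(n_hidden)]
    return weights, [0.0] * n_hidden


rng = random.Random(0)
layers = [random_layer(8, 6, rng), random_layer(6, 4, rng)]
frame = [rng.random() for _ in range(8)]          # stand-in for one spectral frame
mfcc = [rng.gauss(0.0, 1.0) for _ in range(13)]   # stand-in for one MFCC vector
features = augment(mfcc, frame, layers)
print(len(features))  # 13 MFCC values + 4 DBN activations = 17
```

The combined vector could then be passed to any speaker classifier; which classifier the paper uses is not stated in the abstract.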
