Intra-cluster training strategy for deep learning with applications to language identification

In this study, we address the problem of training a neural network for language identification from speech samples represented as i-vectors. Our approach first trains a baseline classifier and then analyzes the resulting confusion matrix. We cluster the languages by simultaneously clustering the rows and columns of this confusion matrix. The language clusters are then used to define a modified cost function for training a neural network that focuses on distinguishing the true language from the other languages in its cluster. The results show improved language identification on the NIST 2015 language identification dataset.
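To make the pipeline concrete, the sketch below illustrates the two key steps in Python (numpy/scikit-learn): grouping languages by co-clustering the rows and columns of a confusion matrix, and a cross-entropy loss restricted to the true language's cluster. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the choice of scikit-learn's SpectralCoclustering and the precise form of the masked loss are placeholders for whichever co-clustering method and modified cost the authors actually use.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering


def cluster_languages(confusion, n_clusters):
    """Group languages by co-clustering the rows and columns of a
    confusion matrix, so mutually confused languages land together.
    Illustrative choice: the paper's clustering method may differ."""
    # Symmetrize so confusions count regardless of direction, and add a
    # small constant since spectral co-clustering expects non-negative
    # entries with no all-zero rows or columns.
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=0)
    model.fit(confusion + confusion.T + 1e-6)
    return model.row_labels_  # one cluster id per language


def intra_cluster_cross_entropy(logits, labels, clusters):
    """Assumed form of the modified cost: cross-entropy where the
    softmax competes only against languages in the true label's
    cluster, emphasizing within-cluster discrimination."""
    losses = []
    for z, y in zip(logits, labels):
        mask = clusters == clusters[y]           # same-cluster languages
        z_c = z[mask] - z[mask].max()            # numerically stable softmax
        log_probs = z_c - np.log(np.exp(z_c).sum())
        y_c = np.flatnonzero(mask).tolist().index(y)  # position of y in cluster
        losses.append(-log_probs[y_c])
    return float(np.mean(losses))


# Toy example: 6 languages with two obvious confusion blocks.
C = np.array([[50, 8, 7, 0, 1, 0],
              [9, 48, 6, 1, 0, 1],
              [7, 9, 49, 0, 1, 0],
              [0, 1, 0, 52, 9, 8],
              [1, 0, 1, 7, 50, 9],
              [0, 1, 0, 8, 7, 51]], dtype=float)
clusters = cluster_languages(C, n_clusters=2)

logits = np.random.randn(4, 6)        # hypothetical network outputs
labels = np.array([0, 2, 3, 5])       # true language indices
loss = intra_cluster_cross_entropy(logits, labels, clusters)
```

In this formulation, logits for languages outside the true label's cluster receive no gradient, so training capacity is spent on the hard within-cluster confusions the baseline classifier revealed.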
