Submodular Rank Aggregation on Score-Based Permutations for Distributed Automatic Speech Recognition

Distributed automatic speech recognition (ASR) requires aggregating the outputs of distributed deep neural network (DNN)-based models. This work studies the use of submodular functions to design rank aggregation on score-based permutations for distributed ASR systems in both supervised and unsupervised modes. Specifically, we compose an aggregation rank function based on the Lovász-Bregman divergence, built from linear structured convex and nested structured concave functions. The aggregation models are trained with stochastic gradient descent (SGD). Experiments on a distributed ASR system show that submodular rank aggregation achieves higher speech recognition accuracy than traditional aggregation methods such as AdaBoost. Code is available online.
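The Lovász-Bregman divergence underlying the aggregation measures how well a score vector agrees with a given permutation: it is the gap between the Lovász extension of a submodular function at the scores and the linear lower bound induced by the permutation's chain of marginal gains. The sketch below is a minimal illustration, not the paper's implementation; the submodular function `f` (a nested concave-of-modular form, square root of summed element weights) and all variable names are illustrative assumptions.

```python
import numpy as np

def f(S, w):
    # Illustrative nested concave-of-modular submodular function:
    # sqrt of the summed weights of the elements in S.
    return float(np.sqrt(w[list(S)].sum())) if S else 0.0

def chain_gains(sigma, w):
    # Marginal gains of f along the chain of prefixes of permutation sigma;
    # this is a subgradient h_sigma of the Lovász extension.
    gains = np.empty(len(sigma))
    prev, S = 0.0, []
    for i in sigma:
        S.append(i)
        cur = f(S, w)
        gains[i] = cur - prev
        prev = cur
    return gains

def lovasz_extension(x, w):
    # f_hat(x) = <x, h_sigma_x>, where sigma_x sorts x in descending order.
    sigma_x = list(np.argsort(-x))
    return float(np.dot(x, chain_gains(sigma_x, w)))

def lb_divergence(x, sigma, w):
    # Lovász-Bregman divergence d_f(x || sigma) = f_hat(x) - <x, h_sigma>.
    # Nonnegative, and zero when sigma is the permutation induced by x.
    return lovasz_extension(x, w) - float(np.dot(x, chain_gains(sigma, w)))

# Scores from one (hypothetical) distributed ASR subsystem over 4 hypotheses.
w = np.ones(4)
x = np.array([0.9, 0.1, 0.5, 0.3])
sigma = list(np.argsort(-x))        # permutation induced by x itself
print(lb_divergence(x, sigma, w))   # ~0: x perfectly agrees with sigma
```

In the aggregation setting, scores from several distributed models would be combined with learnable weights, and those weights tuned by SGD to reduce the divergence between the aggregated score vector and the target (or consensus) permutation.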
