Federated Acoustic Modeling for Automatic Speech Recognition

Data privacy and protection are crucial issues for any automatic speech recognition (ASR) service provider dealing with clients. In this paper, we investigate federated acoustic modeling using data from multiple clients. Each client's data is stored on a local server, and clients communicate only model parameters with a central server, never their raw data. Communication happens infrequently to reduce its cost. To mitigate the non-IID issue, we propose client adaptive federated training (CAFT) to canonicalize data across clients. Experiments are carried out on 1,150 hours of speech data from multiple domains. Hybrid LSTM acoustic models are trained via federated learning, and their performance is compared to that of conventional centralized acoustic model training. The results demonstrate the effectiveness of the proposed federated acoustic modeling strategy. We also show that CAFT further improves the performance of the federated acoustic model.
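For concreteness, below is a minimal sketch, in Python/PyTorch, of the federated averaging loop the abstract describes: each client runs local SGD on its private data, and only model parameters are exchanged with the central server once per communication round. The toy model architecture, the local step count, and the uniform (unweighted) parameter average are illustrative assumptions, not details taken from the paper.

    # Minimal federated-averaging sketch (illustrative; not the paper's exact recipe).
    import copy
    import torch
    import torch.nn as nn

    class TinyAcousticModel(nn.Module):
        """Stand-in for a hybrid LSTM acoustic model (assumed architecture):
        per-frame acoustic features -> logits over HMM states."""
        def __init__(self, feat_dim=40, hidden=256, n_states=500):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_states)

        def forward(self, x):                # x: (batch, frames, feat_dim)
            h, _ = self.lstm(x)
            return self.out(h)               # (batch, frames, n_states)

    def local_update(model, loader, steps, lr=0.01):
        """A few local SGD steps on one client's private data; only the
        resulting parameters leave the client."""
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        it = iter(loader)
        for _ in range(steps):
            try:
                feats, labels = next(it)
            except StopIteration:            # restart the epoch if exhausted
                it = iter(loader)
                feats, labels = next(it)
            opt.zero_grad()
            logits = model(feats)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1))
            loss.backward()
            opt.step()
        return model.state_dict()

    def fedavg_round(global_model, client_loaders, local_steps=50):
        """One communication round: every client trains locally from the
        current global parameters; the server averages the results."""
        states = []
        for loader in client_loaders:
            local = copy.deepcopy(global_model)      # ship parameters, not data
            states.append(local_update(local, loader, local_steps))
        avg = copy.deepcopy(states[0])
        for key in avg:                              # uniform average (assumption;
            avg[key] = torch.stack(                  # FedAvg weights by data size)
                [s[key].float() for s in states]).mean(0)
        global_model.load_state_dict(avg)
        return global_model

Because clients communicate only once per round after many local steps, the communication cost scales with the number of rounds rather than the number of gradient updates, which is the infrequent-communication property the abstract refers to.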
