A Novel Distributed Duration-Aware LSTM for Large Scale Sequential Data Analysis

Long short-term memory (LSTM) is an important model for sequential data processing. However, the large amount of matrix computation in the LSTM unit severely slows training as the model grows larger and deeper and as more data become available. In this work, we propose an efficient distributed duration-aware LSTM (D-LSTM) for large-scale sequential data analysis. We improve LSTM's training performance from two aspects. First, the duration of each sequence item is exploited to design a computationally efficient cell, the duration-aware LSTM (D-LSTM) unit. With an additional mask gate, the D-LSTM cell perceives the duration of a sequence item and adapts its memory update accordingly. Second, building on the D-LSTM unit, a novel distributed training algorithm is proposed in which the D-LSTM network is divided logically and multiple distributed neurons perform the simpler, concurrent linear calculations in parallel. Unlike the physical division used in model parallelism, this logical split based on hidden neurons greatly reduces the communication overhead, which is a major bottleneck in distributed training. We evaluate the effectiveness of the proposed method on two video datasets. The experimental results show that our distributed D-LSTM greatly reduces training time and improves training efficiency for large-scale sequence analysis.
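The abstract does not give the exact gating equations, so the following is only a minimal NumPy sketch of what a duration-aware cell could look like: a standard LSTM step extended with an assumed mask gate m_t that reads the item duration d_t and blends the refreshed cell state with the previous one. All names (DurationAwareLSTMCell, v_m, the form of m_t) are illustrative assumptions, not the authors' definition.

```python
# Illustrative sketch of a duration-aware LSTM cell (assumed equations).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DurationAwareLSTMCell:
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda r, c: rng.standard_normal((r, c)) * 0.1
        h, x = hidden_size, input_size
        # Standard LSTM parameters (i, f, o, g) plus a hypothetical mask gate (m).
        self.W = {g: init(h, x) for g in ("i", "f", "o", "g", "m")}
        self.U = {g: init(h, h) for g in ("i", "f", "o", "g", "m")}
        self.b = {g: np.zeros(h) for g in ("i", "f", "o", "g", "m")}
        self.v_m = init(h, 1)  # assumed weight mapping the duration into the mask gate

    def step(self, x_t, d_t, h_prev, c_prev):
        """One time step; d_t is the duration of the current sequence item."""
        i = sigmoid(self.W["i"] @ x_t + self.U["i"] @ h_prev + self.b["i"])
        f = sigmoid(self.W["f"] @ x_t + self.U["f"] @ h_prev + self.b["f"])
        o = sigmoid(self.W["o"] @ x_t + self.U["o"] @ h_prev + self.b["o"])
        g = np.tanh(self.W["g"] @ x_t + self.U["g"] @ h_prev + self.b["g"])
        # Assumed mask gate: perceives the item duration and controls how strongly
        # the memory is refreshed at this step.
        m = sigmoid(self.W["m"] @ x_t + self.U["m"] @ h_prev
                    + (self.v_m * d_t).ravel() + self.b["m"])
        c_candidate = f * c_prev + i * g           # ordinary LSTM memory update
        c = m * c_candidate + (1.0 - m) * c_prev   # duration-adaptive blend (assumption)
        h = o * np.tanh(c)
        return h, c

# Toy usage: a sequence of 4 items with per-item durations.
cell = DurationAwareLSTMCell(input_size=8, hidden_size=16)
h, c = np.zeros(16), np.zeros(16)
for x_t, d_t in zip(np.random.rand(4, 8), [0.5, 2.0, 1.0, 3.0]):
    h, c = cell.step(x_t, d_t, h, c)
print(h.shape)  # (16,)
```

The per-gate matrix-vector products above are the "easier" linear calculations the abstract refers to; in the distributed algorithm they would be partitioned by blocks of hidden neurons across workers, with only the small per-step activations exchanged rather than full model shards.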
