RETURNN: The RWTH extensible training framework for universal recurrent neural networks

In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers, with a particular focus on the efficient training of recurrent neural network topologies on multiple GPUs. The source code is publicly and freely available for academic research purposes, and the software can be used either as a framework or as a standalone tool with flexible configuration. It allows training state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one-dimensional data, such as speech, and two-dimensional data, such as handwritten text, and has been used to develop successful submission systems in several evaluation campaigns.
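
To give a flavor of the configuration interface, the sketch below shows what a RETURNN-style Python config for a one-layer bidirectional LSTM classifier might look like. The specific option and layer names ("rec", "lstm", "direction", n_out, and so on) are illustrative assumptions rather than a verbatim excerpt; the released source and its documentation are authoritative.

    # Hypothetical RETURNN-style configuration file (plain Python, read by the tool).
    # All option names below are assumptions for illustration only.
    task = "train"
    num_inputs = 40      # e.g. 40-dimensional acoustic feature vectors per frame
    num_outputs = 4501   # e.g. number of target labels (tied HMM states)

    # Network definition: a bidirectional LSTM layer feeding a softmax output.
    network = {
        "lstm_fw": {"class": "rec", "unit": "lstm", "n_out": 500, "direction": 1},
        "lstm_bw": {"class": "rec", "unit": "lstm", "n_out": 500, "direction": -1},
        "output": {"class": "softmax", "loss": "ce", "from": ["lstm_fw", "lstm_bw"]},
    }

    # Training hyperparameters.
    batch_size = 5000
    learning_rate = 0.0005
    adam = True

Under these assumptions, training would be launched by pointing the standalone entry point at the config file, for example "python rnn.py my.config"; the entry-point name is likewise an assumption here.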
