A context-sensitive-chunk BPTT approach to training deep LSTM/BLSTM recurrent neural networks for offline handwriting recognition

We propose a context-sensitive-chunk based back-propagation through time (BPTT) approach to training deep (bidirectional) long short-term memory ((B)LSTM) recurrent neural networks (RNN) that splits each training sequence into chunks with appended contextual observations for character modeling of offline handwriting recognition. Using short context-sensitive chunks in both training and recognition brings following benefits: (1) the learned (B)LSTM will model mainly local character image dependency and the effect of long-range language model information reflected in training data is reduced; (2) mini-batch based training on GPU can be made more efficient; (3) low-latency BLSTM-based handwriting recognition is made possible by incurring only a delay of a short chunk rather than a whole sentence. Our approach is evaluated on IAM offline handwriting recognition benchmark task and performs better than the previous state-of-the-art BPTT-based approaches.

[1]  Jing Peng,et al.  An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.

[2]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[3]  Ronald J. Williams,et al.  Gradient-based learning algorithms for recurrent networks and their computational complexity , 1995 .

[4]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[5]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[6]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[7]  Jürgen Schmidhuber,et al.  Learning Precise Timing with LSTM Recurrent Networks , 2003, J. Mach. Learn. Res..

[8]  Horst Bunke,et al.  The IAM-database: an English sentence database for offline handwriting recognition , 2002, International Journal on Document Analysis and Recognition.

[9]  Jürgen Schmidhuber,et al.  Framewise phoneme classification with bidirectional LSTM and other neural network architectures , 2005, Neural Networks.

[10]  Jürgen Schmidhuber,et al.  Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition , 2005, ICANN.

[11]  J. Schmidhuber,et al.  A Novel Connectionist System for Unconstrained Handwriting Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[13]  Tara N. Sainath,et al.  FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .

[14]  Navdeep Jaitly,et al.  Hybrid speech recognition with Deep Bidirectional LSTM , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[15]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[16]  Thomas M. Breuel,et al.  High-Performance OCR for Printed English and Fraktur Using LSTM Networks , 2013, 2013 12th International Conference on Document Analysis and Recognition.

[17]  Andrew W. Senior,et al.  Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.

[18]  Navdeep Jaitly,et al.  Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.

[19]  Hermann Ney,et al.  Fast and Robust Training of Recurrent Neural Networks for Offline Handwriting Recognition , 2014, 2014 14th International Conference on Frontiers in Handwriting Recognition.

[20]  Christopher Kermorvant,et al.  The A2iA Multi-lingual Text Recognition System at the Second Maurdor Evaluation , 2014, 2014 14th International Conference on Frontiers in Handwriting Recognition.

[21]  Georg Heigold,et al.  Sequence discriminative distributed training of long short-term memory recurrent neural networks , 2014, INTERSPEECH.

[22]  Hermann Ney,et al.  A Comparison of Sequence-Trained Deep Neural Networks and Recurrent Neural Networks Optical Modeling for Handwriting Recognition , 2014, SLSP.

[23]  Dong Yu,et al.  Deep Learning: Methods and Applications , 2014, Found. Trends Signal Process..

[24]  Hermann Ney,et al.  Sequence-discriminative training of recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[25]  Min Wang,et al.  Building Handwriting Recognizers by Leveraging Skeletons of Both Offline and Online Samples , 2015, 2015 13th International Conference on Document Analysis and Recognition (ICDAR).