Supervised Sequence Labelling with Recurrent Neural Networks

Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state of the art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
