Deep Sequential Neural Networks

Neural networks sequentially build high-level features through their successive layers. We propose a new neural network model in which each layer is associated with a set of candidate mappings. When an input is processed, one mapping among these candidates is selected at each layer according to a sequential decision process. The resulting model is structured as a DAG-like architecture, so that a path from the root to a leaf node defines a sequence of transformations. Instead of applying global transformations, as in classical multilayer networks, this model learns a set of local transformations. It can therefore process data with different characteristics through specific sequences of such local transformations, increasing its expressive power with respect to a classical multilayer network. The learning algorithm is inspired by policy gradient techniques from reinforcement learning and is used here in place of classical back-propagation-based gradient descent. Experiments on different datasets show the relevance of this approach.

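The two mechanisms above (per-layer selection among candidate mappings, and policy-gradient training of the selection process) can be made concrete with a small sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the names (SequentialSelectionLayer, DSNNSketch, n_candidates) are our own, the candidate mappings are plain affine maps, and the learning signal is a basic REINFORCE estimator using the negative per-example loss as reward.

```python
# Minimal sketch (assumed names, not the paper's code): each layer holds
# n_candidates local mappings; a selector samples one per input, and the
# log-probability of the chosen path is kept for the policy gradient.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class SequentialSelectionLayer(nn.Module):
    def __init__(self, dim_in, dim_out, n_candidates):
        super().__init__()
        # Candidate local transformations (here: simple affine maps).
        self.candidates = nn.ModuleList(
            [nn.Linear(dim_in, dim_out) for _ in range(n_candidates)]
        )
        # Selector scores the candidates from the current representation.
        self.selector = nn.Linear(dim_in, n_candidates)

    def forward(self, x):
        dist = Categorical(logits=self.selector(x))
        choice = dist.sample()                     # one candidate per input
        out = torch.stack(
            [self.candidates[int(choice[i])](x[i]) for i in range(x.size(0))]
        )
        return out, dist.log_prob(choice)

class DSNNSketch(nn.Module):
    """Two selection layers: a root-to-leaf path of local transformations."""
    def __init__(self, dim_in, dim_hidden, n_classes, n_candidates=3):
        super().__init__()
        self.layer1 = SequentialSelectionLayer(dim_in, dim_hidden, n_candidates)
        self.layer2 = SequentialSelectionLayer(dim_hidden, n_classes, n_candidates)

    def forward(self, x):
        h, lp1 = self.layer1(x)
        logits, lp2 = self.layer2(torch.tanh(h))
        return logits, lp1 + lp2                   # log-prob of the whole path

# One REINFORCE-style training step: the mappings receive ordinary gradients
# through the chosen path, while the selectors are pushed toward paths whose
# reward (negative per-example loss) is high.
model = DSNNSketch(dim_in=20, dim_hidden=32, n_classes=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 20), torch.randint(0, 4, (16,))

opt.zero_grad()
logits, path_log_prob = model(x)
per_example = nn.functional.cross_entropy(logits, y, reduction="none")
reward = -per_example.detach()
loss = per_example.mean() - (reward * path_log_prob).mean()
loss.backward()
opt.step()
```

In practice one would group inputs by selected candidate rather than loop per example, and the paper's actual gradient estimator and reward definition may differ; this sketch only illustrates the selection-plus-policy-gradient pattern.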