Learning Dynamic Programming with Split-Merge Networks

We consider the problem of learning algorithmic tasks purely from observation of input-output pairs. Rather than treating this as a black-box discrete regression problem with no assumptions whatsoever on the input-output mapping, we concentrate on tasks that are amenable to the principle of divide and conquer, and study its implications for learning. This principle creates a powerful inductive bias that we exploit with neural architectures defined recursively, by learning two scale-invariant atomic operators: how to split a given input into two disjoint sets, and how to merge two partially solved tasks into a larger partial solution. The scale invariance creates parameter sharing across all stages of the architecture, and the dynamic design creates architectures whose complexity can be tuned in a differentiable manner. As a result, our model is trained by backpropagation not only to minimize the errors at the output, but also to do so as efficiently as possible, by enforcing shallower computation graphs. Moreover, thanks to the scale invariance, the model can be trained with only input/output pairs, removing the need to know oracle intermediate split and merge decisions. As it turns out, accuracy and complexity are not independent qualities, and we verify empirically that when the learnt complexity matches the underlying complexity of the task, this results in higher accuracy and better generalization on two paradigmatic problems: sorting and finding planar convex hulls.
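To make the recursion concrete, here is a minimal sketch of a split/merge scheme of the kind the abstract describes, assuming a PyTorch setting. The `SplitMergeNet` module, its hard median partition, and the residual MLP merge are illustrative stand-ins, not the paper's exact operators or training procedure (which learns split and merge decisions differentiably).

```python
# Hypothetical sketch: recursive split/merge over a set of element embeddings,
# with the SAME split and merge parameters reused at every scale.
import torch
import torch.nn as nn


class SplitMergeNet(nn.Module):
    def __init__(self, dim, base_size=2):
        super().__init__()
        self.base_size = base_size
        # Shared split scorer: one scalar score per element.
        self.split = nn.Linear(dim, 1)
        # Shared merge operator: refines the concatenated partial solutions.
        self.merge = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                   nn.Linear(dim, dim))

    def forward(self, x):
        # x: (n, dim) set of element embeddings.
        n = x.size(0)
        if n <= self.base_size:
            return x  # base case: sub-problem small enough to return directly
        # Split: route each element to one of two disjoint subsets.
        # (Hard partition here for simplicity; the paper keeps this differentiable.)
        order = self.split(x).squeeze(-1).argsort()
        left = self.forward(x[order[: n // 2]])
        right = self.forward(x[order[n // 2:]])
        # Merge: combine the two partial solutions into a larger one.
        both = torch.cat([left, right], dim=0)                      # (n, dim)
        context = both.mean(dim=0, keepdim=True).expand(n, -1)      # pooled summary
        return both + self.merge(torch.cat([both, context], dim=-1))


if __name__ == "__main__":
    net = SplitMergeNet(dim=8)
    out = net(torch.randn(16, 8))
    print(out.shape)  # torch.Size([16, 8])
```

Because the two operators are shared across all recursion levels, the number of parameters is independent of the input size, and the depth of the induced computation graph is what the training objective can additionally penalize.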
