Divide and Conquer Networks

We consider learning algorithmic tasks purely from observation of input-output pairs. Rather than treating this as a black-box discrete regression problem with no assumptions about the input-output mapping, we concentrate on tasks that are amenable to the principle of divide and conquer, and study its implications for learning. This principle creates a powerful inductive bias that we exploit with neural architectures defined recursively and dynamically, by learning two scale-invariant atomic operations: how to split a given input into smaller sets, and how to merge two partially solved tasks into a larger partial solution. Our model can be trained in weakly supervised settings, namely by observing input-output pairs alone, and in even weaker settings, using only a non-differentiable reward signal. Moreover, thanks to the dynamic aspect of our architecture, we can incorporate computational complexity as a regularization term that is optimized by backpropagation. We demonstrate the flexibility and efficiency of the Divide-and-Conquer Network on several combinatorial and geometric tasks: convex hull, clustering, knapsack, and Euclidean TSP. Thanks to the dynamic programming nature of our model, we show significant improvements in generalization error and computational complexity.
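To make the recursive structure concrete, below is a minimal sketch of the divide-and-conquer control flow the abstract describes: a split operation that partitions the input and a merge operation that combines partial solutions, applied recursively down to a base case. The function and argument names (`dcn_solve`, `split`, `merge`, `base_case`) are hypothetical placeholders for illustration only; in the actual model, split and merge are learned, scale-invariant neural modules rather than the hand-written callables used here.

```python
def dcn_solve(inputs, split, merge, base_case):
    """Recursively solve a task on `inputs` (a list of elements).

    split(inputs)       -> (subset_a, subset_b): partitions the input set.
    merge(sol_a, sol_b) -> solution: combines two partial solutions.
    base_case(inputs)   -> solution for trivially small inputs, else None.
    """
    trivial = base_case(inputs)
    if trivial is not None:
        return trivial
    subset_a, subset_b = split(inputs)
    sol_a = dcn_solve(subset_a, split, merge, base_case)
    sol_b = dcn_solve(subset_b, split, merge, base_case)
    return merge(sol_a, sol_b)


if __name__ == "__main__":
    # Toy usage: sorting as a stand-in task (merge sort is a classic
    # divide-and-conquer instance). A learned model would replace these
    # hand-coded split/merge callables with trainable modules.
    import heapq

    example = [5, 2, 9, 1, 7, 3]
    result = dcn_solve(
        example,
        split=lambda xs: (xs[: len(xs) // 2], xs[len(xs) // 2:]),
        merge=lambda a, b: list(heapq.merge(a, b)),
        base_case=lambda xs: xs if len(xs) <= 1 else None,
    )
    print(result)  # [1, 2, 3, 5, 7, 9]
```

The depth of this recursion, and hence the computational cost, depends on how the input is split at each step, which is what lets the model treat complexity as a quantity to regularize.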
