Spectral Inference Networks: Unifying Spectral Methods With Deep Learning

We present Spectral Inference Networks, a framework for learning eigenfunctions of linear operators by stochastic optimization. Spectral Inference Networks generalize Slow Feature Analysis to generic symmetric operators and are closely related to Variational Monte Carlo methods from computational physics. As such, they can be a powerful tool for unsupervised representation learning from video or pairs of data. We derive a training algorithm for Spectral Inference Networks that addresses the bias in the gradients due to finite batch size and allows for online learning of multiple eigenfunctions. We show results of training Spectral Inference Networks on problems in quantum mechanics and on feature learning from video, using both synthetic datasets and the Arcade Learning Environment. Our results demonstrate that Spectral Inference Networks accurately recover eigenfunctions of linear operators, discover interpretable representations from video, and find meaningful subgoals in reinforcement learning environments.
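To make the objective concrete, below is a minimal sketch (not the paper's full algorithm) of the kind of spectral objective such a network can optimize, specialized to the slow-feature case on pairs of consecutive video frames. The function name `spectral_loss`, the use of PyTorch, and the per-batch covariance estimates are illustrative assumptions; the full method additionally uses moving averages of the covariances and a masked Cholesky-based gradient to correct finite-batch bias and to order the learned eigenfunctions.

```python
import torch

def spectral_loss(feats_t, feats_tp1, eps=1e-6):
    """Illustrative batched estimate of the generalized Rayleigh quotient
    Tr(Sigma^{-1} Pi) for the slow-feature (graph Laplacian) operator.

    feats_t, feats_tp1: [batch, K] network outputs at consecutive frames.
    Minimizing this trace pushes the K outputs toward the eigenfunctions
    with the smallest eigenvalues (the "slowest" features).
    """
    n = feats_t.shape[0]
    diffs = feats_tp1 - feats_t
    sigma = feats_t.T @ feats_t / n                  # feature covariance Sigma
    pi = diffs.T @ diffs / n                         # temporal-difference covariance Pi
    sigma = sigma + eps * torch.eye(sigma.shape[0])  # jitter for numerical stability
    chol = torch.linalg.cholesky(sigma)              # Sigma = L L^T
    # Tr(Sigma^{-1} Pi) via a Cholesky solve rather than an explicit inverse.
    return torch.trace(torch.cholesky_solve(pi, chol))
```

Note that treating Sigma and Pi as fixed per-batch estimates, as in this sketch, is exactly what introduces the finite-batch gradient bias that the training algorithm described in the abstract is designed to correct.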
