On Contrastive Representations of Stochastic Processes

Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learning contrastive representations of stochastic processes (CRESP) that does away with exact reconstruction. We dissect potential use cases for stochastic process representations, and propose methods that accommodate each. Empirically, we show that our methods are effective for learning representations of periodic functions, 3D objects and dynamical processes. Our methods tolerate noisy high-dimensional observations better than traditional approaches, and the learned representations transfer to a range of downstream tasks.
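As a rough illustration of what a contrastive objective without reconstruction can look like, the following minimal sketch computes an InfoNCE-style loss over embeddings of two views per process realization. It is not taken from the paper: the function name `info_nce`, the `temperature` parameter, the cosine-similarity critic and the toy embeddings are all assumptions made for illustration, and the abstract does not specify the exact objective CRESP uses.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a non-zero vector to unit length (assumes v is not all zeros)."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE-style loss (hypothetical helper, not the paper's code).

    anchors[i] and positives[i] are assumed to be embeddings of two views
    of the same process realization; positives[j] for j != i act as
    negatives for anchors[i].
    """
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [dot(a, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching view as the correct class.
        total += log_norm - logits[i]
    return total / len(anchors)

# Toy embeddings: in the first call matching views coincide, in the
# second call each anchor is paired with the wrong process.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss depends only on similarities between embeddings, so no observation is ever reconstructed; under these assumptions, correctly matched views give a lower loss than mismatched ones.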
