Two Distributed-State Models For Generating High-Dimensional Time Series

In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM), an undirected model with binary latent variables and real-valued "visible" variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time steps. This "conditional" RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We then extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We also introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/~gwtaylor/publications/jmlr2011.
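To make the conditional mechanism concrete, here is a minimal NumPy sketch of a CRBM with Gaussian visible units (unit variance) and binary hidden units, trained with one step of contrastive divergence. It is not the authors' released code: the class, names (`W`, `A`, `B`, `order`), and hyperparameters are illustrative. The key point it shows is that the directed connections from the last `order` frames only shift the biases, so inference over the hidden units stays exact and as cheap as in a static RBM.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CRBM:
    def __init__(self, n_vis, n_hid, order):
        self.W = 0.01 * rng.standard_normal((n_hid, n_vis))          # visible-hidden weights
        self.A = 0.01 * rng.standard_normal((n_vis, n_vis * order))  # past frames -> visibles (autoregressive)
        self.B = 0.01 * rng.standard_normal((n_hid, n_vis * order))  # past frames -> hiddens
        self.bv = np.zeros(n_vis)
        self.bh = np.zeros(n_hid)

    def dynamic_biases(self, history):
        # `history` is the concatenation of the previous `order` visible frames.
        # The directed connections simply add history-dependent offsets to the
        # static biases, which is why on-line inference remains efficient.
        return self.bv + self.A @ history, self.bh + self.B @ history

    def cd1_step(self, v0, history, lr=1e-3):
        bv_dyn, bh_dyn = self.dynamic_biases(history)
        # Up pass: exact posterior over the binary hiddens given data and history.
        h0 = sigmoid(self.W @ v0 + bh_dyn)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        # Down-up pass: mean of the Gaussian visibles, then hiddens again (CD-1).
        v1 = self.W.T @ h_sample + bv_dyn
        h1 = sigmoid(self.W @ v1 + bh_dyn)
        # Approximate likelihood gradient: data statistics minus reconstruction statistics.
        self.W += lr * (np.outer(h0, v0) - np.outer(h1, v1))
        self.A += lr * np.outer(v0 - v1, history)
        self.B += lr * np.outer(h0 - h1, history)
        self.bv += lr * (v0 - v1)
        self.bh += lr * (h0 - h1)
```

Generation proceeds by clamping the history, alternating Gibbs sampling between hiddens and current visibles, and sliding the window forward one frame at a time.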

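The factored three-way extension can be sketched in the same spirit. The sketch below is an assumption-labeled illustration of the general idea: the full three-way tensor `W[i, j, l]` coupling a visible unit, a hidden unit, and a context unit (e.g., style labels or past frames) is replaced by three factor matrices, so a context vector `z` multiplicatively rescales every effective visible-hidden weight. All sizes and names here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_vis, n_hid, n_ctx, n_fac = 60, 200, 10, 100
Wv = 0.01 * rng.standard_normal((n_fac, n_vis))  # visible -> factors
Wh = 0.01 * rng.standard_normal((n_fac, n_hid))  # hidden  -> factors
Wz = 0.01 * rng.standard_normal((n_fac, n_ctx))  # context -> factors

def hidden_input(v, z):
    # Each factor multiplies its visible projection by its context gain, so z
    # modulates the effective pairwise weights without ever materializing an
    # n_vis x n_hid x n_ctx tensor.
    return Wh.T @ ((Wv @ v) * (Wz @ z))
```

The compactness claim is just a parameter count: the unfactored tensor has n_vis * n_hid * n_ctx entries, while the factored form needs only n_fac * (n_vis + n_hid + n_ctx).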