Unified losses for multi-modal pose coding and regression

Sparsity and redundancy reduction have been shown to be useful in machine learning, but empirical evaluation has been carried out primarily on classification tasks over datasets of natural images and sounds. Similarly, evaluations of unsupervised feature learning followed by supervised fine-tuning have focused largely on classification. In comparison, relatively little work has investigated sparse codes for representing human movements and poses, or their use in regression tasks on movement data. This paper defines a basic coding and regression architecture for evaluating the impact of sparsity when coding human pose information, and tests several coding methods within this framework on the task of mapping from a kinematic (joint angle) modality to a dynamic (joint torque) one. In addition, we evaluate the performance of unified loss functions defined on the same class of models. We show that, while sparse codes are useful for effective mappings between modalities, their primary benefit for this task appears to be in admitting overcomplete codebooks. We use the proposed architecture to examine in detail the sources of error at each stage of the model under various coding strategies. Furthermore, we show that a unified loss function that passes gradient information between the coding and regression stages yields substantial reductions in overall error.
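To make the staged pipeline concrete, the sketch below first learns an overcomplete dictionary over joint-angle frames and then, separately, fits a ridge regressor from the resulting sparse codes to joint torques. It is a minimal illustration, not the paper's implementation: the dimensions, penalty weights, and random stand-in data are all assumptions, and scikit-learn's DictionaryLearning and Ridge stand in for whichever coder and regressor are actually evaluated.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import Ridge

# Illustrative stand-ins (assumed, not the paper's data): D joint-angle dims,
# K dictionary atoms with K > D (overcomplete), T joint-torque dims.
rng = np.random.default_rng(0)
D, K, T, N = 30, 120, 30, 500
angles = rng.standard_normal((N, D))   # random stand-in for joint-angle frames
torques = rng.standard_normal((N, T))  # random stand-in for joint-torque targets

# Stage 1: learn an overcomplete dictionary and sparse-code the angles.
coder = DictionaryLearning(n_components=K, alpha=1.0, max_iter=20, random_state=0)
codes = coder.fit_transform(angles)

# Stage 2: regress torques from the codes. The regressor is trained after the
# fact, so no gradient information ever reaches the coding stage.
reg = Ridge(alpha=1e-2).fit(codes, torques)
pred = reg.predict(coder.transform(angles))
print("mean squared error:", np.mean((pred - torques) ** 2))
```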

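For contrast, the unified-loss idea can be sketched in a few lines of autograd: the regression error backpropagates through the encoder, so the coding and regression stages are optimized jointly rather than in sequence. This PyTorch sketch is again hypothetical; a ReLU encoder with an L1-style penalty stands in for the paper's coding stage, and every name and hyperparameter here is illustrative.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumed): D joint-angle dims, K code atoms
# (overcomplete, K > D), T joint-torque dims.
D, K, T = 30, 120, 30

class CodingRegressionModel(nn.Module):
    """Two-stage model: encode joint angles into a nonnegative code,
    then linearly regress joint torques from that code."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(D, K)    # coding stage
        self.regressor = nn.Linear(K, T)  # regression stage

    def forward(self, angles):
        code = torch.relu(self.encoder(angles))  # ReLU as a cheap sparsifier
        return code, self.regressor(code)

model = CodingRegressionModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
angles = torch.randn(64, D)   # stand-in batch of joint-angle frames
torques = torch.randn(64, T)  # corresponding joint-torque targets
lam = 1e-3                    # sparsity weight (assumed)

for _ in range(100):
    code, pred = model(angles)
    # Unified loss: regression error flows back through the encoder, so both
    # stages are trained jointly instead of coding first, regressing second.
    loss = nn.functional.mse_loss(pred, torques) + lam * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The only structural difference from the staged pipeline is that a single backward pass updates both stages at once, which is the mechanism the abstract credits for the reduction in overall error.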