Classifying movements using efficient kinematic codes

Leif Johnson (leif@cs.utexas.edu) and Dana Ballard (dana@cs.utexas.edu)
Department of Computer Science, The University of Texas at Austin

Abstract

Efficient codes have been shown to perform well in image and audio classification tasks, but the impact of sparsity (and indeed the entire notion of efficient coding) has not yet been well explored in the context of human movements. This paper tests several coding approaches on a movement classification task and finds that efficient codes for kinematic (joint angle) data perform well for classifying many different types of movements. In particular, the best classification method relied on a sparse coding algorithm combined with a codebook that was tuned to kinematic movement data. The other approaches tested here, sparse coding with a random codebook and dense coding using PCA, provide interesting baseline results and allow us to investigate why sparse codes appear to work well.

Introduction

When modeling sensory data like images and sound, efficient codes were proposed (Barlow, 1961) as a mechanism for reducing statistical redundancy in natural inputs, thus providing a neural substrate with an effective use of limited metabolic resources. Indeed, in the past decades, sparse codes have been shown to yield representations of natural sensory data that are similar to receptive fields in living animals (Olshausen & Field, 1996; Smith & Lewicki, 2006), interpretable by humans (Tibshirani, 1996), and effective for computational classification tasks (Lee, Battle, Raina, & Ng, 2007; Glorot, Bordes, & Bengio, 2011; Le, Karpenko, Ngiam, & Ng, 2011; Coates & Ng, 2011). However, in computer science and machine learning, sparsity has not yet been applied widely outside the visual and auditory domains; partly this seems to be due to the ease with which photos and sounds can be interpreted by human researchers, and partly it might be due to the large amount of such data available online.

At the same time, sparsity seems ideal for coding movement information because, like sensory data sampled from the natural world, human movements appear to lie along a low-dimensional manifold embedded within the space of all possible movements (Scholz & Schöner, 1999; Latash, Scholz, & Schöner, 2002). Recent ideas in coding (Olshausen & Field, 2004) and feature learning (Bengio, 2013) suggest that sparse codes are effective for representing data along low-dimensional manifolds because the basis vectors used to represent a particular data element can be spread out along the manifold, with only a few basis elements representing any particular location in space.

This paper explores the use of efficient codes for classifying kinematic data derived from human movements. We first describe the data source and our computational model for movement classification, and then briefly present the coding approaches that we evaluated for the classification task. The paper concludes by discussing the results of our experiments and comparing them with similar, existing research.

Figure 1: The articulated skeleton in the CMU mocap database consists of 30 rigid bone segments joined together with a total of 59 angular degrees of freedom. The joint angles in each frame are computed by the motion capture system, which combines the observed marker positions with a fitted skeleton to compute an angular kinematic representation of the pose.
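To make the three coding schemes concrete, the sketch below shows one way they might be set up with scikit-learn: sparse coding against a codebook learned from the data, sparse coding against a random codebook, and dense coding with PCA. The random joint-angle data, codebook size, and sparsity penalty are placeholder assumptions for illustration only, not the configuration used in the experiments reported here.

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning, PCA, sparse_encode

    # Stand-in for kinematic data: one row per mocap frame of 59 joint angles.
    rng = np.random.RandomState(0)
    frames = rng.randn(5000, 59)

    # (1) Sparse coding with a codebook learned from the movement data itself:
    # each frame is reconstructed from a handful of dictionary atoms.
    learned = MiniBatchDictionaryLearning(n_components=128, alpha=1.0,
                                          transform_algorithm="lasso_lars",
                                          random_state=0)
    learned_codes = learned.fit(frames).transform(frames)  # mostly zeros

    # (2) Sparse coding against a random (untrained) codebook of the same size.
    random_book = rng.randn(128, 59)
    random_book /= np.linalg.norm(random_book, axis=1, keepdims=True)
    random_codes = sparse_encode(frames, random_book,
                                 algorithm="lasso_lars", alpha=1.0)

    # (3) Dense coding with PCA: every coefficient is generally nonzero.
    dense_codes = PCA(n_components=30).fit_transform(frames)

Any of the resulting code matrices can then be handed to a standard classifier to predict the movement label; the contrast of interest is how the choice of code affects that downstream classification.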
Data Processing

We used motion-capture data available online through the CMU Mocap Database (http://mocap.cs.cmu.edu); the database contains motion-capture recordings from more than 100 subjects performing a variety of actions, ranging from simple walking to complex acrobatic stunts and even common household activities like washing up. The database is not uniformly covered, however: some subjects only performed one type of action, while others performed several; likewise, some actions were only performed once, while others were repeated multiple times. In addition, some motion-capture recordings are quite long (tens of seconds), while many are very brief (just two or three seconds).
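Because the recordings vary so much in length, they must be reduced to fixed-size inputs before classification. The function below is a generic illustration of one way to do this by cutting each recording into overlapping windows of joint-angle frames; the window length, stride, and flattening are arbitrary assumptions, not the preprocessing actually used in this paper.

    import numpy as np

    def sliding_windows(joint_angles, window=30, stride=10):
        """Split one recording (n_frames x n_joint_angles) into overlapping,
        fixed-length windows, flattening each window into a feature vector."""
        n_frames, n_dims = joint_angles.shape
        if n_frames < window:
            return np.empty((0, window * n_dims))
        starts = range(0, n_frames - window + 1, stride)
        return np.stack([joint_angles[s:s + window].ravel() for s in starts])

    # Example: a four-second clip captured at 120 Hz with 59 joint angles.
    clip = np.random.randn(480, 59)
    windows = sliding_windows(clip)  # one row per 30-frame window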

References

Barlow, H. B. (1961). Possible principles underlying the transformations of sensory messages. In Sensory Communication. MIT Press.

Bengio, Y. (2013). Deep learning of representations: Looking forward. In Statistical Language and Speech Processing (SLSP).

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of COLT.

Breiman, L. (2001). Random forests. Machine Learning.

Coates, A., & Ng, A. Y. (2011). The importance of encoding versus training with sparse coding and vector quantization. In Proceedings of ICML.

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of AISTATS.

Junejo, I. N., Dexter, E., Laptev, I., & Pérez, P. (2011). View-independent action recognition from temporal self-similarities. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Latash, M. L., Scholz, J. P., & Schöner, G. (2002). Motor control strategies revealed in the structure of motor variability. Exercise and Sport Sciences Reviews.

Le, Q. V., Karpenko, A., Ngiam, J., & Ng, A. Y. (2011). ICA with reconstruction cost for efficient overcomplete feature learning. In Advances in Neural Information Processing Systems (NIPS).

Lee, H., Battle, A., Raina, R., & Ng, A. Y. (2007). Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems (NIPS).

Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of ICML.

Liang, W., et al. (2010). Discriminative human action recognition in the learned hierarchical manifold space. Image and Vision Computing.

Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of ICML.

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature.

Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Current Opinion in Neurobiology.

Parameswaran, V., & Chellappa, R. (2005). View invariance for human action recognition. International Journal of Computer Vision.

Pedregosa, F., Varoquaux, G., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research.

Scholz, J. P., & Schöner, G. (1999). The uncontrolled manifold concept: Identifying control variables for a functional task. Experimental Brain Research.

Shotton, J., Fitzgibbon, A., et al. (2011). Real-time human pose recognition in parts from single depth images. In Proceedings of CVPR.

Smith, E. C., & Lewicki, M. S. (2006). Efficient auditory coding. Nature.

Taylor, G. W., Hinton, G. E., & Roweis, S. T. (2007). Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems (NIPS).

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B.