Computer Vision – ECCV 2012

Three-dimensional object shape is commonly represented in terms of deformations of a triangular mesh from an exemplar shape. Existing models, however, are based on a Euclidean representation of shape deformations. In contrast, we argue that shape has a manifold structure: for example, summing the shape deformations for two people does not necessarily yield a deformation corresponding to a valid human shape, nor does the Euclidean difference of these two deformations provide a meaningful measure of shape dissimilarity. Consequently, we define a novel manifold for shape representation, with emphasis on body shapes, using a new Lie group of deformations. This has several advantages. First, we define triangle deformations exactly, removing non-physical deformations and redundant degrees of freedom common to previous methods. Second, the Riemannian structure of Lie Bodies enables a more meaningful definition of body shape similarity by measuring distance between bodies on the manifold of body shape deformations. Third, the group structure allows the valid composition of deformations. This is important for models that factor body shape deformations into multiple causes or represent shape as a linear combination of basis shapes. Finally, body shape variation is modeled using statistics on manifolds. Instead of modeling Euclidean shape variation with Principal Component Analysis, we capture shape variation on the manifold using Principal Geodesic Analysis. Our experiments show consistent visual and quantitative advantages of Lie Bodies over traditional Euclidean models of shape deformation, and our representation can be easily incorporated into existing methods.
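The contrast between Euclidean and manifold operations can be illustrated on a much simpler Lie group than the one the abstract refers to. The sketch below uses the rotation group SO(3) purely as a stand-in; it is not the deformation group defined in the paper, and all function names (`exp_so3`, `log_so3`, `geodesic_distance`, `geodesic_midpoint`, `is_rotation`) are hypothetical helpers introduced for illustration. It shows that a Euclidean average of two group elements can leave the group, while composition and geodesic interpolation stay on it, and that the geodesic distance is computed through the logarithm map.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix (an element of so(3))."""
    x, y, z = w
    return np.array([[0.0, -z, y],
                     [z, 0.0, -x],
                     [-y, x, 0.0]])


def exp_so3(w):
    """Exponential map so(3) -> SO(3), using Rodrigues' formula."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def log_so3(R):
    """Logarithm map SO(3) -> so(3) as a 3-vector (assumes angle < pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros(3)
    W = (R - R.T) * (theta / (2.0 * np.sin(theta)))
    return np.array([W[2, 1], W[0, 2], W[1, 0]])


def geodesic_distance(R1, R2):
    """Length of the shortest path on SO(3) between R1 and R2 (radians)."""
    return float(np.linalg.norm(log_so3(R1.T @ R2)))


def geodesic_midpoint(R1, R2):
    """Point halfway along the geodesic from R1 to R2."""
    return R1 @ exp_so3(0.5 * log_so3(R1.T @ R2))


def is_rotation(M, tol=1e-9):
    """Check membership in SO(3): orthogonal with determinant +1."""
    orthogonal = np.allclose(M.T @ M, np.eye(3), atol=tol)
    return bool(orthogonal and abs(np.linalg.det(M) - 1.0) < tol)


if __name__ == "__main__":
    # Two rotations about the z-axis (illustrative values only).
    R1 = exp_so3(np.array([0.0, 0.0, 0.2]))
    R2 = exp_so3(np.array([0.0, 0.0, 1.8]))

    euclidean_mean = 0.5 * (R1 + R2)        # linear average of matrices
    manifold_mean = geodesic_midpoint(R1, R2)
    composed = R1 @ R2                      # group composition

    print("Euclidean mean is a rotation:", is_rotation(euclidean_mean))
    print("Geodesic midpoint is a rotation:", is_rotation(manifold_mean))
    print("Composition is a rotation:", is_rotation(composed))
    print("Geodesic distance (rad):", geodesic_distance(R1, R2))
```

The same pattern (map to the tangent space with a logarithm, do linear operations there, map back with the exponential) is what statistics on manifolds such as Principal Geodesic Analysis build on; the specific group and its exponential and logarithm maps would differ for triangle deformations.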
