3D Human Pose Recovery from Monocular Images via Efficient Visual Feature Selection

In this paper, we propose a new exemplar-based approach to recovering 3D human poses from monocular images. Given the visual feature of each frame, pose retrieval is first conducted in the exemplar database to find relevant pose candidates. Then, dynamic programming is applied to the weighted set of candidates to impose temporal coherence and recover a continuous pose sequence. We make two contributions within this framework. First, we propose an efficient feature selection algorithm to select the optimal visual feature components. The feature selection task is formulated as a trace-ratio objective that scores the selected subset of feature components, and the objective is efficiently optimized to obtain the global optimum. The selected components are used in place of the original visual feature to improve the accuracy and efficiency of pose recovery. Second, we propose to use sparse representation to retrieve the pose candidates, where the measured visual feature is expressed as a sparse linear combination of the labeled samples in the database. Sparse representation ensures that semantically similar poses have a higher probability of being retrieved. The effectiveness of our approach has been validated quantitatively through extensive evaluations on both synthetic and real data, and qualitatively by inspecting the results of the real-time system we have implemented.
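The temporal-coherence step can be illustrated with a minimal sketch of dynamic programming over weighted per-frame candidates. This is not the paper's implementation: the function name `select_pose_sequence`, the cost (negative log of the retrieval weight plus a squared-distance penalty between consecutive poses), and the `smooth` trade-off parameter are all assumptions made for illustration, since the abstract does not specify the actual cost terms.

```python
import numpy as np

def select_pose_sequence(candidates, weights, smooth=1.0):
    """Pick one candidate per frame with a Viterbi-style dynamic program.

    candidates: list of T arrays, each of shape (K_t, D), candidate poses per frame.
    weights:    list of T arrays, each of shape (K_t,), positive retrieval weights.
    smooth:     hypothetical trade-off between retrieval weight and smoothness.

    Assumed cost (illustrative only): sum over frames of -log(weight)
    plus smooth * squared distance between consecutive chosen poses.
    """
    # Accumulated cost of the best path ending at each candidate of frame 0.
    cost = -np.log(np.asarray(weights[0], dtype=float))
    backpointers = []
    for t in range(1, len(candidates)):
        prev = np.asarray(candidates[t - 1], dtype=float)
        cur = np.asarray(candidates[t], dtype=float)
        # Pairwise squared distances, shape (K_prev, K_cur).
        trans = ((prev[:, None, :] - cur[None, :, :]) ** 2).sum(axis=2)
        total = cost[:, None] + smooth * trans
        # Best predecessor for every candidate of the current frame.
        backpointers.append(total.argmin(axis=0))
        cost = total.min(axis=0) - np.log(np.asarray(weights[t], dtype=float))
    # Trace the cheapest path back from the last frame.
    path = [int(cost.argmin())]
    for best_prev in reversed(backpointers):
        path.append(int(best_prev[path[-1]]))
    path.reverse()
    return path
```

With `smooth=0` the sketch reduces to picking the highest-weighted candidate in each frame independently; increasing `smooth` favors sequences whose consecutive poses are close, which is the sense in which the dynamic program imposes temporal coherence.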
