Human pose recovery by supervised spectral embedding

In this paper we propose a subspace learning algorithm, based on supervised manifold learning, for inferring 3D human poses from monocular video frames. Low-dimensional representations of visual features are computed via spectral embedding regularized by the pairwise relationships between poses, so that the embedding simultaneously preserves locality in the feature space and accounts for similarities in the pose space. To deal with the "out-of-sample" problem, we obtain a global linear projection from the embedding, under which the Euclidean distances between transformed feature vectors faithfully reflect the corresponding pose distances. To retrieve the most similar candidate from the exemplar database, we use a weighted sum of the Euclidean distances of the individual feature types, rather than simply summing the squared distances over all feature types, to achieve better accuracy. Experimental results on the HumanEva dataset validate the efficacy of the proposed method.
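
As a rough illustration of the pipeline outlined in the abstract, the following minimal sketch builds a pose-regularized neighbourhood graph, derives a global linear projection from it in the style of Locality Preserving Projections, and retrieves an exemplar by a weighted sum of per-feature Euclidean distances. It is not the paper's exact formulation: the function names, the blending parameter `alpha`, the heat-kernel bandwidths, the ridge term `reg`, and the per-feature `weights` are all assumptions made for illustration.

```python
import numpy as np


def pose_regularized_affinity(X, Y, k=5, alpha=0.5):
    """Affinity over feature-space k-NN edges, blended with pose similarity.

    X: (n, d) visual features; Y: (n, p) pose vectors.
    alpha (assumed) trades off feature locality against pose similarity.
    """
    n = X.shape[0]
    dx = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    dy = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    # Heat kernels with a mean-distance bandwidth (an assumed choice).
    wx = np.exp(-dx / (dx.mean() + 1e-12))
    wy = np.exp(-dy / (dy.mean() + 1e-12))
    # Keep only edges between feature-space nearest neighbours.
    idx = np.argsort(dx, axis=1)[:, 1:k + 1]
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask |= mask.T
    W = np.where(mask, (1.0 - alpha) * wx + alpha * wy, 0.0)
    np.fill_diagonal(W, 0.0)
    return W


def linear_projection(X, W, dim=2, reg=1e-6):
    """LPP-style projection: solve X^T L X a = lambda X^T D X a."""
    D = np.diag(W.sum(axis=1))
    L = D - W  # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])  # ridge keeps B positive definite
    # Reduce the generalized problem to a standard symmetric one.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)  # eigenvalues in ascending order
    return C_inv.T @ vecs[:, :dim]


def retrieve(query_feats, db_feats, projections, weights):
    """Index of the exemplar minimizing a weighted sum of Euclidean distances.

    Each argument is a list with one entry per feature type.
    """
    total = np.zeros(db_feats[0].shape[0])
    for q, F, P, w in zip(query_feats, db_feats, projections, weights):
        total += w * np.linalg.norm(F @ P - q @ P, axis=1)
    return int(np.argmin(total))
```

Because the projection is a single matrix, an unseen frame is embedded by one matrix product, which is how a linear map sidesteps the out-of-sample problem. In practice one projection would be learned per feature type and the weights tuned on held-out data.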
