Paper Information - Pose Embeddings: A Deep Architecture for Learning to Match Human Poses

We present a method for learning an embedding that maps images of humans in similar poses to nearby points. This embedding can be used to compare images directly by human pose, avoiding the potential difficulties of estimating body joint positions. Pose embedding learning is formulated under a triplet-based distance criterion. A deep architecture is used so that the learned representation can distinguish between different poses. Experiments on human pose matching and retrieval from video data demonstrate the potential of the method.
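The abstract does not spell out the triplet-based distance criterion, so the following is only a minimal sketch of one common form, a hinge loss over squared Euclidean distances. The function names, the `margin` value, and the choice of distance are illustrative assumptions, not the paper's formulation; the embedding vectors stand in for the output of a deep network.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (a common choice, assumed here).

    anchor and positive embed images with similar poses; negative embeds an
    image with a different pose. The loss is zero once the negative is at
    least `margin` farther from the anchor than the positive is.
    """
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


def rank_by_pose(query, gallery):
    """Return gallery indices sorted from nearest to farthest from the query."""
    return sorted(range(len(gallery)),
                  key=lambda i: squared_distance(query, gallery[i]))


if __name__ == "__main__":
    # Violated triplet: the negative is closer to the anchor than the positive.
    print(triplet_hinge_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))
    # Retrieval: order hypothetical gallery embeddings by distance to the query.
    print(rank_by_pose([0.0, 0.0], [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0]]))
```

Once such an embedding is trained, pose matching and retrieval reduce to nearest-neighbor search in the embedding space, which is what `rank_by_pose` illustrates.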
