Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?

Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing-as well as the levels of accuracy-involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose, (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their 're-enacted' 3D poses, (3) quantitative analysis revealing the human performance in 3D pose re-enactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.

[1]  Hsi-Jian Lee,et al.  Determination of 3D human body postures from a single view , 1985, Comput. Vis. Graph. Image Process..

[2]  David J. Fleet,et al.  Priors for people tracking from small training sets , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[3]  D. Wolpert,et al.  Principles of sensorimotor learning , 2011, Nature Reviews Neuroscience.

[4]  Cristian Sminchisescu,et al.  Variational mixture smoothing for non-linear dynamical systems , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  Cristian Sminchisescu,et al.  Twin Gaussian Processes for Structured Prediction , 2010, International Journal of Computer Vision.

[6]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[7]  Jan J. Koenderink,et al.  Pictorial relief , 2019, Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences.

[8]  Ankur Agarwal,et al.  Recovering 3D human pose from monocular images , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[10]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[11]  Cristian Sminchisescu,et al.  Learning Joint Top-Down and Bottom-up Processes for 3D Visual Inference , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  G. Johansson Visual perception of biological motion and a model for its analysis , 1973 .

[13]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[14]  David J. Fleet,et al.  Human attributes from 3D pose tracking , 2010, Comput. Vis. Image Underst..

[15]  Hans-Peter Seidel,et al.  Optimization and Filtering for Human Motion Capture , 2010, International Journal of Computer Vision.

[16]  Andrew Zisserman,et al.  Pose search: Retrieving people using their pose , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Cristian Sminchisescu,et al.  Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths , 2013, NIPS.

[18]  Michael J. Black,et al.  Viewpoint and Pose in Body-Form Adaptation , 2013, Perception.

[19]  Min Sun,et al.  Conditional regression forests for human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Cristian Sminchisescu,et al.  Building Roadmaps of Minima and Transitions in Visual Models , 2004, International Journal of Computer Vision.

[21]  Juergen Gall,et al.  International Journal of Computer Vision manuscript No. (will be inserted by the editor) Optimization and Filtering for Human Motion Capture A Multi-layer Framework , 2022 .

[22]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Michael J. Black,et al.  Combined discriminative and generative articulated pose and non-rigid shape estimation , 2007, NIPS.

[24]  Krista A. Ehinger,et al.  Modelling search for people in 900 scenes: A combined source model of eye guidance , 2009 .

[25]  Cristian Sminchisescu,et al.  Kinematic jump processes for monocular 3D human tracking , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[26]  Bernt Schiele,et al.  Monocular 3D pose estimation and tracking by detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.