A Sampling Approach to Generating Closely Interacting 3D Pose-Pairs from 2D Annotations

We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs, for a given motion category, e.g., wrestling or salsa dance. With much difficulty in acquiring close interactions using 3D sensors, our approach utilizes abundant existing video data which cover many human activities. Instead of treating the data generation problem as one of reconstruction, either through 3D acquisition or direct 2D-to-3D data lifting from video annotations, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Given a motion category and a set of video frames depicting the motion with the 2D pose-pair in each frame annotated, we start the sampling with one or few seed 3D pose-pairs which are manually created based on the target motion category. The initial set is then augmented by MCMC sampling around the seeds, via the Metropolis-Hastings algorithm and guided by a probability density function (PDF) that is defined by two terms to bias the sampling towards 3D pose-pairs that are physically valid and plausible for the motion category. With a focus on efficient sampling over the space of close interactions, rather than pose spaces, we develop a novel representation called interaction coordinates (IC) to encode both poses and their interactions in an integrated manner. Plausibility of a 3D pose-pair is then defined based on the IC and with respect to the annotated 2D pose-pairs from video. We show that our sampling-based approach is able to efficiently synthesize a large volume of plausible, closely interacting 3D pose-pairs which provide a good coverage of the input 2D pose-pairs.

[1]  Camillo J. Taylor,et al.  Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image , 2000, Comput. Vis. Image Underst..

[2]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Xiaolin K. Wei,et al.  VideoMocap: modeling physically realistic human motion from monocular video sequences , 2010, ACM Trans. Graph..

[4]  Taku Komura,et al.  Interaction patches for multi-character animation , 2008, ACM Trans. Graph..

[5]  Ligang Liu,et al.  Interaction context (ICON) , 2015, ACM Trans. Graph..

[6]  Pat Hanrahan,et al.  SceneGrok: inferring action maps in 3D environments , 2014, ACM Trans. Graph..

[7]  Jessica K. Hodgins,et al.  Video-based 3D motion capture through biped control , 2012, ACM Trans. Graph..

[8]  Xiaowei Zhou,et al.  3D Shape Reconstruction from 2D Landmarks: A Convex Formulation , 2014, ArXiv.

[9]  Michiel van de Panne,et al.  Motion synthesis by example , 1996 .

[10]  D. Roetenberg,et al.  Xsens MVN: Full 6DOF Human Motion Tracking Using Miniature Inertial Sensors , 2009 .

[11]  Qionghai Dai,et al.  Video-based hand manipulation capture through composite motion control , 2013, ACM Trans. Graph..

[12]  Hans-Peter Seidel,et al.  Markerless motion capture of interacting characters using multi-view image segmentation , 2011, CVPR 2011.

[13]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[14]  Qionghai Dai,et al.  Performance Capture of Interacting Characters with Handheld Kinects , 2012, ECCV.

[15]  Peter V. Gehler,et al.  DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  T. Kanade,et al.  Reconstructing 3D Human Pose from 2D Image Landmarks , 2012, ECCV.

[17]  C. Karen Liu,et al.  Performance capture with physical interaction , 2010, SCA '10.

[18]  C. Karen Liu,et al.  Leveraging depth cameras and wearable pressure sensors for full-body kinematics and dynamics capture , 2014, ACM Trans. Graph..

[19]  Jehee Lee,et al.  Synchronized multi-character motion editing , 2009, ACM Trans. Graph..

[20]  Taesoo Kwon,et al.  Two-Character Motion Analysis and Synthesis , 2008, IEEE Transactions on Visualization and Computer Graphics.

[21]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[22]  Hans-Peter Seidel,et al.  A data-driven approach for real-time full body pose reconstruction from a depth camera , 2011, 2011 International Conference on Computer Vision.

[23]  Cristian Sminchisescu,et al.  Discriminative density propagation for 3D human motion estimation , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[24]  C. Karen Liu,et al.  Composition of complex optimal multi-character motions , 2006, SCA '06.

[25]  Edmond S. L. Ho,et al.  Spatial relationship preserving character motion adaptation , 2010, ACM Trans. Graph..

[26]  Jessica K. Hodgins,et al.  Generating and ranking diverse multi-character interactions , 2014, ACM Trans. Graph..

[27]  Taku Komura,et al.  Character Motion Synthesis by Topology Coordinates , 2009, Comput. Graph. Forum.

[28]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[29]  Andrew Blake,et al.  Articulated body motion capture by annealed particle filtering , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[30]  Jinxiang Chai,et al.  Accurate realtime full-body motion capture using a single depth camera , 2012, ACM Trans. Graph..

[31]  山田 祐,et al.  Open Dynamics Engine を用いたスノーボードロボットシミュレータの開発 , 2007 .

[32]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[33]  K. S. Arun,et al.  Least-Squares Fitting of Two 3-D Point Sets , 1987, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34]  Taku Komura,et al.  Indexing and Retrieving Motions of Characters in Close Contact , 2009, IEEE Transactions on Visualization and Computer Graphics.

[35]  Youjie Zhou,et al.  Pose Locality Constrained Representation for 3D Human Pose Reconstruction , 2014, ECCV.

[36]  J R Morris,et al.  Accelerometry--a technique for the measurement of human body movements. , 1973, Journal of biomechanics.

[37]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2023 .

[38]  Jehee Lee,et al.  Motion Grammars for Character Animation , 2016, Comput. Graph. Forum.

[39]  F. Raab,et al.  Magnetic Position and Orientation Tracking System , 1979, IEEE Transactions on Aerospace and Electronic Systems.

[40]  Michael J. Black,et al.  Pose-conditioned joint angle limits for 3D human pose reconstruction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yoonsang Lee,et al.  Data-driven biped control , 2010, ACM Trans. Graph..

[42]  Leonidas J. Guibas,et al.  Shape2Pose , 2014, ACM Trans. Graph..

[43]  Jehee Lee,et al.  Tiling Motion Patches , 2013, IEEE Transactions on Visualization and Computer Graphics.

[44]  Cristian Sminchisescu,et al.  Covariance scaled sampling for monocular 3D body tracking , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.