Panoptic Studio: A Massively Multiview System for Social Motion Capture

We present an approach to capture the 3D structure and motion of a group of people engaged in a social interaction. The core challenges in capturing social interactions are: (1) occlusion is functional and frequent, (2) subtle motion must be measured over a space large enough to host a social group, and (3) the variation in human appearance and configuration is immense. The Panoptic Studio is a system organized around the thesis that social interactions should be measured through the perceptual integration of a large variety of viewpoints. We present a modularized system designed around this principle, consisting of integrated structural, hardware, and software innovations. The system takes as input 480 synchronized video streams of multiple people engaged in social activities and produces as output the labeled, time-varying 3D structure of anatomical landmarks on individuals in the space. The algorithmic contributions include a hierarchical approach for generating skeletal trajectory proposals and an optimization framework for skeletal reconstruction with trajectory re-association.
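The abstract does not say how a 3D landmark position is recovered from the many synchronized views, so the following is only a generic illustration of the underlying idea, not the system's actual pipeline. It is a minimal sketch that assumes each camera contributes a known center and a viewing ray toward the same landmark, and it returns the point closest to all rays in the least-squares sense. The function names (`triangulate_rays`, `solve3`, `det3`, `normalize`) and the example camera positions are hypothetical.

```python
from math import sqrt


def normalize(v):
    """Return v scaled to unit length."""
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point is unconstrained")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x


def triangulate_rays(centers, directions):
    """Least-squares 3D point closest to a set of camera rays.

    Each ray starts at a camera center c and points along d. With
    P = I - d d^T (projection onto the plane orthogonal to d), the
    squared distance from X to the ray is |P (X - c)|^2, so the
    minimizer satisfies (sum P) X = sum (P c).
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(centers, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                a[i][j] += p
                b[i] += p * c[j]
    return solve3(a, b)


# Hypothetical usage: three cameras observing one landmark without noise.
centers = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 1.0]]
landmark = [0.5, 1.0, 2.0]
rays = [[t - c for t, c in zip(landmark, ctr)] for ctr in centers]
estimate = triangulate_rays(centers, rays)
```

Adding more views adds more terms to the same 3x3 system, which is one simple way to see why a large number of viewpoints helps when some rays are missing because of occlusion.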
