Gamesourcing to acquire labeled human pose estimation data

In this paper, we present a gamesourcing method for automatically and rapidly acquiring labeled images of human poses, which serve as ground-truth data for human pose estimation from 2D images. Typically, such datasets are constructed manually through a tedious process of clicking on joint locations in images. Using a low-cost RGBD sensor, we capture synchronized, registered images, depth maps, and skeletons of users playing a movement-based game, and automatically filter the data to keep a subset of unique poses. Using a recently developed, learning-based human pose estimation method, we demonstrate that data collected in this manner is as suitable for training as existing, manually constructed datasets.
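
As an illustration of the filtering step, the following is a minimal sketch of one way to keep a subset of unique poses from a stream of captured skeletons. It is not taken from the method itself: the greedy strategy, the mean per-joint distance metric, and the names `pose_distance`, `filter_unique_poses`, and `min_distance` are all assumptions made for this example. Each skeleton is assumed to be a list of 3D joint positions in a common coordinate frame.

```python
import math


def pose_distance(a, b):
    """Mean Euclidean distance between corresponding joints of two skeletons.

    Assumption: both skeletons list the same joints in the same order.
    """
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def filter_unique_poses(skeletons, min_distance):
    """Return indices of frames to keep (hypothetical greedy filter).

    A frame is kept only if its skeleton is at least `min_distance` away
    from every skeleton that has already been kept.
    """
    kept = []
    for index, skeleton in enumerate(skeletons):
        if all(pose_distance(skeleton, skeletons[k]) >= min_distance for k in kept):
            kept.append(index)
    return kept
```

The returned indices can be used to select the matching images and depth maps, since the three streams are synchronized. This greedy pass compares each frame with every kept frame, so a large capture session would likely need an approximate nearest-neighbor index instead.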
