The TUM Kitchen Data Set of everyday manipulation activities for motion tracking and action recognition

We introduce the publicly available TUM Kitchen Data Set as a comprehensive collection of activity sequences recorded in a kitchen environment equipped with multiple complementary sensors. The recorded data consists of observations of naturally performed manipulation tasks as encountered in everyday activities of human life. Several instances of a table-setting task were performed by different subjects, involving the manipulation of objects and the environment. We provide the original video sequences, full-body motion capture data recorded by a markerless motion tracker, RFID tag readings and magnetic sensor readings from objects and the environment, as well as corresponding action labels. In this paper, we both describe how the data was computed, in particular the motion tracker and the labeling, and give examples what it can be used for. We present first results of an automatic method for segmenting the observed motions into semantic classes, and describe how the data can be integrated in a knowledge-based framework for reasoning about the observations.

[1]  Michael Isard,et al.  Partitioned Sampling, Articulated Objects, and Interface-Quality Hand Tracking , 2000, ECCV.

[2]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Ian D. Reid,et al.  Articulated Body Motion Capture by Stochastic Search , 2005, International Journal of Computer Vision.

[5]  B. Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[6]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7]  Yaser Sheikh,et al.  Exploring the space of a human action , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[8]  Luc Van Gool,et al.  Markerless tracking of complex human motions from multiple views , 2006, Comput. Vis. Image Underst..

[9]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[10]  Michael J. Witbrock,et al.  An Introduction to the Syntax and Content of Cyc , 2006, AAAI Spring Symposium: Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering.

[11]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12]  Christopher W. Geib,et al.  The meaning of action: a review on action recognition and mapping , 2007, Adv. Robotics.

[13]  Tamim Asfour,et al.  Toward an Unified Representation for Imitation of Human Motion on Humanoids , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[14]  Nico Blodow,et al.  The Assistive Kitchen — A demonstration scenario for cognitive technical systems , 2007, RO-MAN 2008 - The 17th IEEE International Symposium on Robot and Human Interactive Communication.

[15]  Dana Kulic,et al.  Incremental Learning, Clustering and Hierarchy Formation of Whole Body Motion Patterns using Adaptive Hidden Markov Chains , 2008, Int. J. Robotics Res..

[16]  Cristian Sminchisescu,et al.  Fast algorithms for large scale conditional 3D prediction , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Michael Beetz,et al.  Evaluation of Hierarchical Sampling Strategies in 3D Human Pose Estimation , 2008, BMVC.

[18]  Moritz Tenorth,et al.  KNOWROB — knowledge processing for autonomous personal robots , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[19]  Michael Beetz,et al.  Tracking humans interacting with the environment using efficient hierarchical sampling and layered observation models , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.