JAR-Aibo: A Multi-view Dataset for Evaluation of Model-Free Action Recognition Systems

We present a novel multi-view dataset for evaluating model-free action recognition systems. Going beyond existing datasets, it covers 56 distinct action classes, each performed ten times by remotely controlled Sony ERS-7 AIBO robot dogs and observed by six distributed, synchronized cameras at 17 fps and VGA resolution. In total, our dataset contains 576 sequences. Baseline results demonstrate its suitability for benchmarking model-free action recognition methods.
