Efficient video-based retrieval of human motion with flexible alignment

We present a novel and scalable approach for the retrieval and flexible alignment of 3D human motion examples given a video query. Our method efficiently searches a large set of motion capture (mocap) files while accounting for speed variations in the motion. To align a short video clip with part of a longer mocap sequence, we experiment with different feature representations that are comparable across the two modalities. We also evaluate two different Dynamic Time Warping (DTW) approaches that allow sub-sequence matching, and we suggest additional local constraints for a smooth alignment. Finally, to quantify video-based mocap retrieval, we introduce a benchmark providing a novel set of per-frame action labels for 2,000 files of the CMU-mocap dataset, as well as a collection of realistic video queries taken from YouTube. Our experiments show that temporal flexibility is not only required for the correct alignment of pose and motion, but also improves the retrieval accuracy.
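To make the idea of sub-sequence matching concrete, the following is a minimal sketch of textbook subsequence DTW, not the authors' implementation. It assumes that query and target frames are already expressed as feature vectors in a common space, uses a Euclidean frame distance, and uses the standard three-step pattern without the additional local constraints mentioned above. The function name `subsequence_dtw` and its return values are hypothetical.

```python
import numpy as np


def subsequence_dtw(query, target):
    """Align the whole query to the best-matching contiguous part of target.

    query:  array of shape (n, d), e.g. per-frame features of a short video clip
    target: array of shape (m, d), e.g. per-frame features of a longer mocap file
    Returns (cost, start, end, path); start and end are target frame indices.
    """
    query = np.asarray(query, dtype=float)
    target = np.asarray(target, dtype=float)
    n, m = len(query), len(target)

    # Pairwise Euclidean distances between query frames and target frames
    # (the distance measure is an assumption made for this sketch).
    cost = np.linalg.norm(query[:, None, :] - target[None, :, :], axis=2)

    acc = np.full((n, m), np.inf)
    # Free start: the first query frame may match any target frame at no
    # extra cost, which is what distinguishes subsequence DTW from global DTW.
    acc[0, :] = cost[0, :]
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, m):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j - 1],  # both sequences advance
                acc[i - 1, j],      # query advances, target frame repeats
                acc[i, j - 1],      # target advances, query frame repeats
            )

    # Free end: the alignment may stop at any target frame.
    end = int(np.argmin(acc[-1]))

    # Backtrack from the end point until the first query frame is reached.
    i, j = n - 1, end
    path = [(i, j)]
    while i > 0:
        if j == 0:
            i -= 1
        else:
            step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        path.append((i, j))
    path.reverse()

    start = path[0][1]
    return float(acc[-1, end]), start, end, path
```

In a retrieval setting, one plausible use is to compute this cost for every mocap file and rank the files by it; the returned path then gives a frame-level alignment between the clip and the matched segment. The quadratic cost of the full table is the reason efficient variants are of interest for large collections.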
