Making a Case for Learning Motion Representations with Phase

This work advocates Eulerian motion representation learning over the current standard Lagrangian optical flow model. Eulerian motion is well captured by using phase, as obtained by decomposing the image through a complex-steerable pyramid. We discuss the gain of Eulerian motion in a set of practical use cases: (i) action recognition, (ii) motion prediction in static images, (iii) motion transfer in static images and, (iv) motion transfer in video. For each task we motivate the phase-based direction and provide a possible approach.

[1]  Frédo Durand,et al.  Image-space modal bases for plausible manipulation of objects in video , 2015, ACM Trans. Graph..

[2]  Nitish Srivastava,et al.  Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.

[3]  Luc Van Gool,et al.  Efficient Two-Stream Motion and Appearance 3D CNNs for Video Classification , 2016, ArXiv.

[4]  Tinne Tuytelaars,et al.  Modeling video evolution for action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Limin Wang,et al.  Action recognition with trajectory-pooled deep-convolutional descriptors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Cordelia Schmid,et al.  Action and Event Recognition with Fisher Vectors on a Compact Feature Set , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Edward H. Adelson,et al.  Shiftable multiscale transforms , 1992, IEEE Trans. Inf. Theory.

[8]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Lorenzo Torresani,et al.  C3D: Generic Features for Video Analysis , 2014, ArXiv.

[11]  Max Grosse,et al.  Phase-based frame interpolation for video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Frédo Durand,et al.  Eulerian video magnification for revealing subtle changes in the world , 2012, ACM Trans. Graph..

[13]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[14]  David J. Fleet,et al.  Computation of component image velocity from local phase information , 1990, International Journal of Computer Vision.

[15]  Martial Hebert,et al.  Dense Optical Flow Prediction from a Static Image , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Arnold W. M. Smeulders,et al.  Déjà Vu: - Motion Prediction in Static Images , 2018, ECCV.

[17]  Antonio Torralba,et al.  Anticipating the future by watching unlabeled video , 2015, ArXiv.

[18]  Julian F. P. Kooij,et al.  Depth-Aware Motion Magnification , 2016, ECCV.

[19]  Patrick Bouthemy,et al.  Better Exploiting Motion for Better Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  William T. Freeman William T. Freeman , 1998 .

[21]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[22]  Cees Snoek,et al.  APT: Action localization proposals from dense trajectories , 2015, BMVC.

[23]  Edward H. Adelson,et al.  The Design and Use of Steerable Filters , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[24]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[25]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Cees Snoek,et al.  VideoLSTM convolves, attends and flows for action recognition , 2016, Comput. Vis. Image Underst..

[27]  Marc M. Van Hulle,et al.  A phase-based approach to the estimation of the optical flow field using spatial filtering , 2002, IEEE Trans. Neural Networks.

[28]  Ivan Laptev,et al.  Recognizing human actions in still images: a study of bag-of-features and part-based representations , 2010, BMVC.

[29]  Frédo Durand,et al.  Modal identification of simple structures with high-speed video using motion magnification , 2015 .

[30]  Leon A. Gatys,et al.  Preserving Color in Neural Artistic Style Transfer , 2016, ArXiv.

[31]  Thomas Brox,et al.  Artistic Style Transfer for Videos , 2016, GCPR.

[32]  Frédo Durand,et al.  Phase-based video motion processing , 2013, ACM Trans. Graph..

[33]  Leon A. Gatys,et al.  A Neural Algorithm of Artistic Style , 2015, ArXiv.