Computer Vision – ECCV 2016 Workshops

Recently, convolutional networks (convnets) have proven useful for predicting optical flow. Much of this success is predicated on the availability of large datasets that require expensive and involved data acquisition and laborious labeling. To bypass these challenges, we propose an unsupervised approach (i.e., without leveraging groundtruth flow) to train a convnet end-to-end for predicting optical flow between two images. We use a loss function that combines a data term that measures photometric constancy over time with a spatial term that models the expected variation of flow across the image. Together these losses form a proxy measure for losses based on the groundtruth flow. Empirically, we show that a strong convnet baseline trained with the proposed unsupervised approach outperforms the same network trained with supervision on the KITTI dataset.
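The loss described above can be illustrated with a minimal sketch. The abstract only states that a photometric data term is combined with a spatial term on the flow, so the specifics here are assumptions for illustration: the Charbonnier penalty, bilinear warping with border clamping, forward differences for the spatial term, and the weight `lam`. The function names are hypothetical and do not come from the paper.

```python
import math

def charbonnier(x, eps=1e-3):
    # Robust penalty sqrt(x^2 + eps^2); an assumed choice, not stated in the abstract.
    return math.sqrt(x * x + eps * eps)

def bilinear(img, x, y):
    # Sample a 2-D list `img` at real-valued (x, y), clamping to the image border.
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bot = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bot

def unsupervised_flow_loss(img1, img2, flow_u, flow_v, lam=1.0):
    # Data term: img1 should match img2 sampled at the flow-displaced position.
    # Spatial term: neighbouring flow vectors should vary slowly.
    h, w = len(img1), len(img1[0])
    data, smooth = 0.0, 0.0
    for y in range(h):
        for x in range(w):
            warped = bilinear(img2, x + flow_u[y][x], y + flow_v[y][x])
            data += charbonnier(img1[y][x] - warped)
            if x + 1 < w:  # horizontal forward difference of both flow components
                smooth += charbonnier(flow_u[y][x + 1] - flow_u[y][x])
                smooth += charbonnier(flow_v[y][x + 1] - flow_v[y][x])
            if y + 1 < h:  # vertical forward difference of both flow components
                smooth += charbonnier(flow_u[y + 1][x] - flow_u[y][x])
                smooth += charbonnier(flow_v[y + 1][x] - flow_v[y][x])
    return data + lam * smooth
```

No groundtruth flow appears anywhere in this computation: the loss depends only on the two images and the predicted flow, which is what allows it to act as a proxy for a supervised loss. In a real training setup the same quantities would be computed with differentiable tensor operations so that gradients reach the network weights.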
