Representations, Analysis and Recognition of Shape and Motion from Imaging Data

This paper compares two core paradigms for computing scene flow from multi-view videos of dynamic scenes. In both approaches, shape and motion estimation are decoupled, in accordance with a large segment of the relevant literature. The first approach is faster: it uses only one optical flow field and the depth difference between pixels in consecutive frames to generate a dense scene flow estimate. The second approach is more robust to outliers because it combines multiple optical flow fields to generate scene flow. Our goal is to compare the two fundamental scene flow estimation methods in isolation, without any post-processing or optimization. We assess the accuracy of the two methods with two tests, optical flow prediction and future image prediction, both on a novel view. This is the first quantitative evaluation of scene flow estimation on real imagery of dynamic scenes in the absence of ground-truth data.
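
A minimal per-pixel sketch of the first (single-flow) approach follows. It assumes a single reference camera with known pinhole intrinsics, depth maps for frames t and t+1 already produced by the decoupled shape estimation, and a nearest-neighbour depth lookup at the flow target; none of these details are specified in the abstract. Each pixel is lifted to 3D with its depth at t, moved along its optical flow, lifted again with the depth at t+1, and the difference of the two 3D points is taken as its scene flow. `Intrinsics`, `backproject`, and `dense_scene_flow` are hypothetical names for illustration, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Intrinsics:
    """Pinhole intrinsics of the reference camera (assumed known from calibration)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(K: Intrinsics, u: float, v: float, z: float) -> Vec3:
    """Lift pixel (u, v) with depth z to a 3D point in the camera frame."""
    return ((u - K.cx) * z / K.fx, (v - K.cy) * z / K.fy, z)


def dense_scene_flow(
    K: Intrinsics,
    flow: List[List[Tuple[float, float]]],  # optical flow t -> t+1, flow[v][u] = (du, dv)
    depth_t: List[List[float]],             # depth map at frame t
    depth_t1: List[List[float]],            # depth map at frame t+1
) -> List[List[Optional[Vec3]]]:
    """Single-flow scene flow: follow the 2D flow, then difference the lifted 3D points."""
    h, w = len(depth_t), len(depth_t[0])
    out: List[List[Optional[Vec3]]] = [[None] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            du, dv = flow[v][u]
            # Nearest-neighbour depth lookup at the flow target (an assumption;
            # bilinear sampling would be a natural refinement).
            u1, v1 = round(u + du), round(v + dv)
            if not (0 <= u1 < w and 0 <= v1 < h):
                continue  # flow leaves the image: no estimate for this pixel
            p0 = backproject(K, u, v, depth_t[v][u])
            p1 = backproject(K, u + du, v + dv, depth_t1[v1][u1])
            out[v][u] = (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
    return out
```

With zero flow and depth changing from 2.0 to 2.5 at the principal point, the sketch returns (0.0, 0.0, 0.5) for that pixel, i.e. pure motion along the optical axis. Pixels whose flow points outside the image are left as `None` rather than guessed.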

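The abstract does not say how the second approach combines its optical flow fields. One classic option, assumed here, uses the fact that each calibrated camera's 2D flow is, to first order, the projection of the same 3D motion through that camera's projection Jacobian, and solves for the 3D motion by least squares. Redundant views over-determine the three unknowns, which dampens the effect of a bad flow vector in any single view; a fully robust variant would also down-weight large residuals, which this sketch does not do. The same projection machinery supports the optical-flow-prediction test: project the estimated motion into a held-out (novel) camera and measure the endpoint error against the flow computed in that camera. Cameras are assumed to be 3x4 projection matrices, `X` a 3D surface point from the separately estimated shape, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]  # 3x4 camera projection matrix P = K [R | t]


def _rows(P: Matrix34, X: Sequence[float]) -> Tuple[float, float, float]:
    """Homogeneous projection P @ [X, 1] -> (a, b, w)."""
    Xh = (X[0], X[1], X[2], 1.0)
    a, b, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return a, b, w


def project(P: Matrix34, X: Sequence[float]) -> Tuple[float, float]:
    """Pixel coordinates (u, v) = (a / w, b / w) of the 3D point X."""
    a, b, w = _rows(P, X)
    return a / w, b / w


def projection_jacobian(P: Matrix34, X: Sequence[float]) -> Tuple[List[float], List[float]]:
    """2x3 Jacobian d(u, v)/dX of the perspective projection at X."""
    a, b, w = _rows(P, X)
    ju = [(P[0][c] * w - P[2][c] * a) / (w * w) for c in range(3)]
    jv = [(P[1][c] * w - P[2][c] * b) / (w * w) for c in range(3)]
    return ju, jv


def _solve3(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(A[i]) + [y[i]] for i in range(3)]
    for col in range(3):
        p = max(range(col, 3), key=lambda r: abs(M[r][col]))
        # Absolute pivot tolerance: a simplification that ignores problem scale.
        if abs(M[p][col]) < 1e-12:
            raise ValueError("degenerate: need >= 2 views that constrain all 3 axes")
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def scene_flow_multiview(cameras, X, flows) -> List[float]:
    """Least-squares 3D motion dX of point X such that J_i dX ~= flow_i in every view i."""
    JtJ = [[0.0] * 3 for _ in range(3)]
    Jtf = [0.0] * 3
    for P, flow in zip(cameras, flows):
        for row, f in zip(projection_jacobian(P, X), flow):  # u-row, then v-row
            for i in range(3):
                Jtf[i] += row[i] * f
                for j in range(3):
                    JtJ[i][j] += row[i] * row[j]
    return _solve3(JtJ, Jtf)  # normal equations (J^T J) dX = J^T f


def predicted_flow(P_novel: Matrix34, X, dX) -> Tuple[float, float]:
    """Optical flow that the 3D motion dX induces in a held-out (novel) camera."""
    u0, v0 = project(P_novel, X)
    u1, v1 = project(P_novel, [X[k] + dX[k] for k in range(3)])
    return u1 - u0, v1 - v0


def endpoint_error(pred, obs) -> float:
    """Euclidean distance between predicted and observed 2D flow vectors."""
    return ((pred[0] - obs[0]) ** 2 + (pred[1] - obs[1]) ** 2) ** 0.5
```

A single camera constrains only two of the three motion components, so `scene_flow_multiview` raises on one view; two non-degenerate views already make the system solvable, and additional views add redundancy.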