A Spatiotemporal Oriented Energy Network for Dynamic Texture Recognition

This paper presents a novel hierarchical spatiotemporal orientation representation for spacetime image analysis. It is designed to combine the benefits of the multilayer architecture of ConvNets with a more controlled approach to spacetime analysis. A distinguishing aspect of the approach is that, unlike most contemporary convolutional networks, no learning is involved; rather, all design decisions are specified analytically with theoretical motivations. This makes it possible to understand what information is extracted at each stage and layer of processing, and to minimize heuristic choices in design. Another key aspect of the network is its recurrent nature, whereby the output of each layer of processing feeds back to the input. To keep the network size manageable across layers, a novel cross-channel feature pooling is proposed. The resulting multilayer architecture systematically reveals hierarchical image structure in terms of multiscale, multiorientation properties of visual spacetime. To illustrate its utility, the network has been applied to the task of dynamic texture recognition. Empirical evaluation on multiple standard datasets shows that it sets a new state of the art.
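The recurrent structure and the role of cross-channel pooling can be illustrated with a minimal sketch. It is not the network described in this paper: the first-derivative filters, the divisive normalization across orientations, the averaging used for cross-channel pooling, and all function names below are assumptions chosen only to keep the example short. What it does show is how feeding each layer's output back through the same fixed (unlearned) filters would multiply the channel count by the number of orientations per layer, and how pooling across channels holds that count constant.

```python
import numpy as np

def oriented_energy(volume, direction):
    """Squared derivative response of a (T, H, W) volume along a (t, y, x) direction.

    Assumption: simple finite-difference derivatives stand in for the
    analytically specified oriented filters of the actual network.
    """
    gt, gy, gx = np.gradient(volume.astype(float))
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)
    response = d[0] * gt + d[1] * gy + d[2] * gx
    return response ** 2  # pointwise rectification by squaring

def filter_layer(channels, directions, eps=1e-6):
    """Apply every orientation to every input channel, then normalize.

    Returns one array of shape (n_directions, T, H, W) per input channel.
    """
    outputs = []
    for volume in channels:
        energies = np.stack([oriented_energy(volume, d) for d in directions])
        # Hypothetical divisive normalization across orientations.
        outputs.append(energies / (energies.sum(axis=0) + eps))
    return outputs

def cross_channel_pool(layer_outputs):
    """Average same-orientation responses over all input channels.

    Hypothetical stand-in for the paper's cross-channel pooling: without it,
    the channel count would grow as n_directions ** layer.
    """
    pooled = np.mean(np.stack(layer_outputs), axis=0)
    return list(pooled)  # n_directions channels, each (T, H, W)

def recurrent_features(volume, directions, n_layers=2):
    """Feed each layer's pooled output back through the same fixed filters."""
    channels = [volume]
    features = []
    for _ in range(n_layers):
        channels = cross_channel_pool(filter_layer(channels, directions))
        # Global average of each channel gives one descriptor entry.
        features.append(np.array([c.mean() for c in channels]))
    return np.concatenate(features)
```

With four directions and three layers, this sketch yields a 12-dimensional descriptor, whereas omitting the pooling step would leave 4 + 16 + 64 channels to summarize.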
