A New Large Scale Dynamic Texture Dataset with Application to ConvNet Understanding

We introduce a new large-scale dynamic texture dataset. With over 10,000 videos, our Dynamic Texture DataBase (DTDB) is two orders of magnitude larger than any previously available dynamic texture dataset. DTDB comes with two complementary organizations: one based on dynamics independent of spatial appearance, and one based on spatial appearance independent of dynamics. These complementary organizations allow for uniquely insightful experiments on how well the major classes of spatiotemporal ConvNet architectures exploit appearance vs. dynamic information. We also present a new two-stream ConvNet that replaces the standard optical-flow-based motion stream with an alternative, broadening the range of dynamic patterns that can be encompassed. The resulting motion stream is shown to outperform the traditional optical flow stream by considerable margins. Finally, the utility of DTDB as a pretraining substrate is demonstrated via transfer learning on a different dynamic texture dataset as well as on the companion task of dynamic scene recognition, yielding a new state of the art.
