Pre-Training Without Natural Images

Is it possible to use convolutional neural networks pre-trained without any natural images to assist natural image understanding? This paper proposes a novel concept, Formula-driven Supervised Learning: image patterns and their category labels are generated automatically from fractals, which follow a natural law present in the background knowledge of the real world. In principle, using automatically generated images instead of natural images in the pre-training phase allows an infinitely large dataset of labeled images to be constructed. Although models pre-trained on the proposed Fractal DataBase (FractalDB), a database containing no natural images, do not outperform models pre-trained on human-annotated datasets in every setting, they partially surpass the accuracy of ImageNet/Places pre-trained models. Moreover, the image representation learned from FractalDB exhibits distinctive characteristics in visualizations of convolutional layers and attention maps.
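The abstract describes formula-driven image and label generation only at a high level. As a concrete illustration, the sketch below assumes that each category is defined by the parameters of a randomly sampled iterated function system (IFS) of contractive affine maps and that images are rendered with the chaos game; the function names, the jitter used for intra-category variation, and all parameter choices are hypothetical and not the authors' implementation.

import numpy as np

def sample_ifs(num_transforms, rng):
    """Sample a set of contractive affine maps (A, b); one set defines one fractal category."""
    transforms = []
    for _ in range(num_transforms):
        A = rng.uniform(-1.0, 1.0, size=(2, 2))
        A *= 0.9 / max(1.0, np.linalg.norm(A, 2))   # keep each map contractive so the attractor exists
        b = rng.uniform(-1.0, 1.0, size=2)
        transforms.append((A, b))
    # Pick each map with probability proportional to |det A|, a common IFS heuristic.
    weights = np.array([abs(np.linalg.det(A)) for A, _ in transforms]) + 1e-6
    return transforms, weights / weights.sum()

def render_fractal(transforms, probs, rng, num_points=50_000, size=256):
    """Run the chaos game and rasterize the visited points into a binary image."""
    point = np.zeros(2)
    pts = np.empty((num_points, 2))
    for i in range(num_points):
        A, b = transforms[rng.choice(len(transforms), p=probs)]
        point = A @ point + b
        pts[i] = point
    pts -= pts.min(axis=0)                           # normalize the point cloud to the image grid
    span = pts.max(axis=0)
    span[span == 0] = 1.0
    pix = (pts / span * (size - 1)).astype(int)
    img = np.zeros((size, size), dtype=np.uint8)
    img[pix[:, 1], pix[:, 0]] = 255
    return img

# Build a small labeled dataset: the sampled IFS parameters define the category,
# and small perturbations of the offsets give intra-category variation.
rng = np.random.default_rng(0)
dataset = []
for label in range(10):                              # 10 illustrative fractal categories
    transforms, probs = sample_ifs(num_transforms=4, rng=rng)
    for _ in range(5):                               # 5 instances per category
        jittered = [(A, b + rng.normal(scale=0.05, size=2)) for A, b in transforms]
        dataset.append((render_fractal(jittered, probs, rng), label))

Because the label is determined entirely by the sampled parameters, such a generator can in principle emit arbitrarily many categories and instances without any human annotation, which is the property the abstract refers to as an "infinitely large dataset of labeled images."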
