A Possible Reason for why Data-Driven Beats Theory-Driven Computer Vision

Why do some continue to wonder about the success and dominance of deep learning methods in computer vision and AI? Is it not enough that these methods provide practical solutions to many problems? Well no, it is not enough, at least for those who feel there should be a science that underpins all of this and that we should have a clear understanding of how this success was achieved. Here, this paper proposes that the dominance we are witnessing would not have been possible by the methods of deep learning alone: the tacit change has been the evolution of empirical practice in computer vision and AI over the past decades. We demonstrate this by examining the distribution of sensor settings in vision datasets and performance of both classic and deep learning algorithms under various camera settings. This reveals a strong mismatch between optimal performance ranges of classical theory-driven algorithms and sensor setting distributions in the common vision datasets, while data-driven models were trained for those datasets. The head-to-head comparisons between data-driven and theory-driven models were therefore unknowingly biased against the theory-driven models.

[1]  Robert B. Fisher,et al.  Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches , 2016, BMVC.

[2]  Matthias Bethge,et al.  Generalisation in humans and deep neural networks , 2018, NeurIPS.

[3]  Lina J. Karam,et al.  A Study and Comparison of Human and Deep Learning Recognition Performance under Visual Distortions , 2017, 2017 26th International Conference on Computer Communication and Networks (ICCCN).

[4]  O. Faugeras Three-dimensional computer vision: a geometric viewpoint , 1993 .

[5]  Michael Milford,et al.  A robustness analysis of Deep Q Networks , 2016, ICRA 2016.

[6]  Xiongkuo Min,et al.  GazeGAN: A Generative Adversarial Saliency Model based on Invariance Analysis of Human Gaze During Scene Free Viewing , 2019, ArXiv.

[7]  Refractor Vision , 2000, The Lancet.

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  John K. Tsotsos,et al.  Active Control of Camera Parameters for Object Detection Algorithms , 2017, ArXiv.

[10]  Michael Brady,et al.  Saliency, Scale and Image Description , 2001, International Journal of Computer Vision.

[11]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[12]  Linda G. Shapiro,et al.  Computer and Robot Vision , 1991 .

[13]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Christof Koch,et al.  A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[15]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[16]  John K. Tsotsos,et al.  50 Years of object recognition: Directions forward , 2013, Comput. Vis. Image Underst..

[17]  Barbara Caputo,et al.  A Deeper Look at Dataset Bias , 2015, Domain Adaptation in Computer Vision Applications.

[18]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[19]  Reiner Fageth,et al.  Statistic analysis of millions of digital photos , 2008, Electronic Imaging.

[20]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[21]  R. Weale Vision. A Computational Investigation Into the Human Representation and Processing of Visual Information. David Marr , 1983 .

[22]  L Sirovich,et al.  Low-dimensional procedure for the characterization of human faces. , 1987, Journal of the Optical Society of America. A, Optics and image science.

[23]  Dana H. Ballard,et al.  Computer Vision , 1982 .

[24]  Y. J. Tejwani,et al.  Robot vision , 1989, IEEE International Symposium on Circuits and Systems,.

[25]  ZissermanAndrew,et al.  The Pascal Visual Object Classes Challenge , 2015 .

[26]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[27]  Alex Pentland,et al.  Face recognition using eigenfaces , 1991, Proceedings. 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[28]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[29]  Jan J. Koenderink,et al.  Solid shape , 1990 .

[30]  Saumik Bhattacharya,et al.  Effects of Degradations on Deep Neural Network Architectures , 2018, ArXiv.

[31]  John K. Tsotsos,et al.  On Sensor Bias in Experimental Methods for Comparing Interest-Point, Saliency, and Recognition Algorithms , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Cordelia Schmid,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[33]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Steven W. Zucker,et al.  On the computational neurobiology of curve detection , 1990, BMVC.

[35]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[36]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[37]  Azriel Rosenfeld,et al.  Digital Picture Processing , 1976 .

[38]  Geoffrey E. Hinton,et al.  Adaptive Elastic Models for Hand-Printed Character Recognition , 1991, NIPS.

[39]  John K. Tsotsos The Complexity of Perceptual Search Tasks , 1989, IJCAI.

[40]  John K. Tsotsos,et al.  Saliency, attention, and visual search: an information theoretic approach. , 2009, Journal of vision.

[41]  William J. Dally,et al.  The GPU Computing Era , 2010, IEEE Micro.

[42]  Luc Van Gool,et al.  SURF: Speeded Up Robust Features , 2006, ECCV.

[43]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Richard Szeliski,et al.  Computer Vision - Algorithms and Applications , 2011, Texts in Computer Science.

[45]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[46]  Yann LeCun,et al.  The mnist database of handwritten digits , 2005 .

[47]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Marvin A. Carlson Editor , 2015 .

[49]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[50]  Yair Weiss,et al.  Why do deep convolutional networks generalize so poorly to small image transformations? , 2018, J. Mach. Learn. Res..

[51]  Xiongkuo Min,et al.  Leverage eye-movement data for saliency modeling: Invariance Analysis and a Robust New Model , 2019 .

[52]  Kouichi Sakurai,et al.  One Pixel Attack for Fooling Deep Neural Networks , 2017, IEEE Transactions on Evolutionary Computation.

[53]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[54]  John K. Tsotsos,et al.  Elephant in the room , 2018 .

[55]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.