Five points to check when comparing visual perception in humans and machines

With machines reaching human-level performance on complex recognition tasks, a growing body of work compares information processing in humans and machines. Such studies offer an exciting opportunity to learn about one system by studying the other. Here, we propose ideas for designing, conducting, and interpreting experiments so that they adequately support the investigation of mechanisms when comparing human and machine perception. We demonstrate and apply these ideas through three case studies. The first case study shows how human bias can affect the interpretation of results and how several analytic tools can help to overcome this human reference point. In the second case study, we highlight the difference between necessary and sufficient mechanisms in visual reasoning tasks, showing that, contrary to previous suggestions, feedback mechanisms might not be necessary for the tasks in question. The third case study highlights the importance of aligning experimental conditions: a previously reported difference in object recognition does not hold once the experiment is adapted to make conditions more equitable between humans and machines. By presenting a checklist for comparative studies of visual perception in humans and machines, we hope to highlight how potential pitfalls in design and inference can be overcome.
