Assessment of Faster R-CNN in Man-Machine Collaborative Search

With the advent of modern expert systems driven by deep learning that supplement human experts (e.g., radiologists, dermatologists, surveillance scanners), we analyze how and when such expert systems enhance human performance in a fine-grained, small-target visual search task. We set up a two-session factorial experimental design in which humans visually search for a target with and without a Deep Learning (DL) expert system. We evaluate changes in human target detection performance and eye movements in the presence of the DL system. We find that performance improvements with the DL system (a Faster R-CNN with VGG16) interact with observers' perceptual abilities (e.g., sensitivity). The main results include: 1) The DL system reduces the false alarm rate per image on average across observer groups of both high and low sensitivity; 2) Only human observers with high sensitivity perform better than the DL system, while the low-sensitivity group does not surpass the performance of the DL system alone, even when aided by the DL system itself; 3) Increases in the number of trials and decreases in viewing time were mainly driven by the DL system only for the low-sensitivity group; 4) The DL system helps the human observer fixate on a target by the 3rd fixation. These results provide insights into the benefits and limitations of deep learning systems that are collaborative or competitive with humans.
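As a rough illustration of the quantities named in the abstract, the sketch below computes a false alarm rate per image, a signal-detection sensitivity index, and a high/low split of observers. The abstract does not state how sensitivity was measured or how the groups were formed, so the use of d' and of a median split are assumptions, and all function names are hypothetical.

```python
from statistics import NormalDist, median


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Assumption: d' stands in for "sensitivity"; the abstract does not
    define the measure. Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


def false_alarms_per_image(false_alarm_counts):
    """Mean number of false alarms per image, given one count per image."""
    return sum(false_alarm_counts) / len(false_alarm_counts)


def median_split(sensitivities):
    """Split observers into (high, low) groups around the median.

    Assumption: a median split; observers at or below the median are
    placed in the low group.
    """
    cut = median(sensitivities.values())
    high = sorted(name for name, s in sensitivities.items() if s > cut)
    low = sorted(name for name, s in sensitivities.items() if s <= cut)
    return high, low
```

With per-observer values in hand, the false alarm rate per image can be compared between sessions with and without the DL system separately for each group returned by `median_split`.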
