Perfect Accuracy with Human-in-the-Loop Object Detection

Modern state-of-the-art computer vision systems still perform imperfectly on many benchmark object recognition tasks. This hinders their application to real-time tasks, where even a low but non-zero probability of error on every camera frame quickly accumulates into unacceptable performance for end users. Here we consider a visual aid that guides blind or visually impaired persons in finding items in grocery stores using a head-mounted camera. The system takes a human-in-the-decision-loop approach: when an object is detected with low confidence, it instructs the user how to turn or move so as to improve the object's view as captured by the camera, until the computer vision confidence exceeds the highest mistaken confidence observed during algorithm training. In experiments with 42 blindfolded participants reaching 15 times for 25 different objects randomly arranged on shelves, our system achieved 100% accuracy, with all participants selecting the goal object in all trials.
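The decision rule described above can be sketched in a few lines: calibrate a threshold as the highest confidence the detector ever assigned to a wrong prediction during training, then at run time accept a detection only when its confidence clears that threshold, otherwise issue a movement instruction to the user. This is a minimal illustrative sketch, not the authors' implementation; the function names, the `(label, confidence, hint)` detector output, and the step limit are all assumptions.

```python
def mistake_threshold(train_results):
    """Highest confidence assigned to any *wrong* prediction during training.

    train_results: iterable of (predicted_label, true_label, confidence).
    Returns 0.0 if the detector made no mistakes on the training data.
    """
    wrong = [conf for pred, true, conf in train_results if pred != true]
    return max(wrong) if wrong else 0.0


def guide_until_confident(detect, issue_instruction, threshold, max_steps=50):
    """Human-in-the-loop acceptance loop (illustrative).

    detect() is assumed to return (label, confidence, direction_hint);
    issue_instruction() relays a turn/move cue (e.g. "left") to the user.
    """
    for _ in range(max_steps):
        label, confidence, hint = detect()
        if confidence > threshold:
            # Confidence now exceeds every mistaken confidence seen in
            # training, so the detection is accepted.
            return label
        issue_instruction(hint)  # ask the user to improve the camera view
    return None  # give up after too many guidance steps
```

Because the threshold is set above the worst training-time mistake rather than at a fixed value like 0.5, the loop trades latency (more guidance steps) for the near-elimination of false acceptances reported in the experiments.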
