VizWiz-FewShot: Locating Objects in Images Taken by People With Visual Impairments

We introduce a few-shot localization dataset originating from photographers who were authentically trying to learn about the visual content in the images they took. It includes nearly 10,000 segmentations of 100 categories in over 4,500 images taken by people with visual impairments. Compared to existing few-shot object detection and instance segmentation datasets, our dataset is the first to locate holes in objects (found in 12.3% of our segmentations), its objects span a much larger range of sizes relative to the images, and text is over five times more common in its objects (found in 22.4% of our segmentations). Analysis of three modern few-shot localization algorithms demonstrates that they generalize poorly to our new dataset. The algorithms commonly struggle to locate objects with holes, very small and very large objects, and objects lacking text. To encourage a larger community to work on these unsolved challenges, we publicly share our annotated few-shot dataset at https://vizwiz.org.
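The hole and relative-size statistics above are properties of individual segmentations. As an illustration only, the sketch below shows one way such statistics could be computed. It assumes each segmentation is available as a binary mask stored as a list of lists (1 = object pixel, 0 = background); this representation and the helper names `relative_area`, `has_hole`, and `hole_rate` are hypothetical, and the released annotations and the authors' own analysis may use polygons and a different procedure.

```python
from collections import deque
from typing import List

# Hypothetical mask representation (an assumption, not the dataset's format):
# 1 = object pixel, 0 = background pixel.
Mask = List[List[int]]


def relative_area(mask: Mask) -> float:
    """Fraction of the image covered by the object (0.0 to 1.0)."""
    h, w = len(mask), len(mask[0])
    return sum(sum(row) for row in mask) / (h * w)


def has_hole(mask: Mask) -> bool:
    """True if some background pixel is fully enclosed by the object.

    Background is flood-filled from the image border with 4-connectivity;
    any background pixel the fill cannot reach lies inside a hole.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the fill with every background pixel on the image border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and mask[y][x] == 0:
                seen[y][x] = True
                queue.append((y, x))
    # Breadth-first flood fill through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not seen[ny][nx] and mask[ny][nx] == 0:
                seen[ny][nx] = True
                queue.append((ny, nx))
    # Any background pixel left unvisited is enclosed by the object.
    return any(mask[y][x] == 0 and not seen[y][x] for y in range(h) for x in range(w))


def hole_rate(masks: List[Mask]) -> float:
    """Fraction of segmentations that contain at least one hole."""
    return sum(has_hole(m) for m in masks) / len(masks)
```

Filling the background with 4-connectivity pairs with treating the object as 8-connected, so a ring of object pixels that touch only diagonally still counts as enclosing a hole. For a 5×5 image containing a one-pixel-thick square ring, `relative_area` returns 8/25 = 0.32 and `has_hole` returns `True`.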
