Understanding Blind People's Experiences with Computer-Generated Captions of Social Media Images

Research advances allow computational systems to automatically caption social media images. These captions are often evaluated by sighted humans who use the image as a reference. Here, we explore how blind and visually impaired people experience these captions in two studies of social media images. Using a contextual inquiry approach (n=6 blind/visually impaired), we found that blind people place considerable trust in automatically generated captions, filling in details to resolve differences between an image's context and an incongruent caption. We built on this in-person study with a second, larger online experiment (n=100 blind/visually impaired) to investigate the role of phrasing in encouraging trust or skepticism in captions. We found that captions emphasizing the probability of error, rather than correctness, encouraged people to attribute incongruence to an incorrect caption rather than to missing details. Whereas existing research has focused on encouraging trust in intelligent systems, we conclude by challenging this assumption and considering the benefits of encouraging appropriate skepticism.
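To make the framing manipulation concrete, here is a minimal Python sketch of how a system might phrase a machine-generated caption to emphasize either correctness or the probability of error. The `frame_caption` function, its confidence threshold, and the exact wordings are illustrative assumptions for this sketch, not the paper's actual stimuli or system.

```python
# A minimal sketch (not the authors' system) of uncertainty-aware caption framing.
# The 0.9 threshold and the phrasings below are illustrative assumptions.

def frame_caption(caption: str, confidence: float) -> str:
    """Wrap a computer-generated caption in phrasing matched to its confidence.

    Error-emphasizing phrasing is the kind of framing the paper's second
    study found encouraged appropriate skepticism in caption readers.
    """
    if confidence >= 0.9:
        # High-confidence framing emphasizes correctness.
        return f"Image may contain: {caption}."
    # Low-confidence framing emphasizes the probability of error.
    return (
        "The computer is not sure, and may be wrong, "
        f"but thinks this image shows: {caption}."
    )


if __name__ == "__main__":
    # Example: a low-confidence caption gets the error-emphasizing framing.
    print(frame_caption("a group of people standing on a beach", confidence=0.55))
```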
