Firefox Voice: An Open and Extensible Voice Assistant Built Upon the Web

Voice assistants are fundamentally changing the way we access information. However, they still make little use of the web beyond simple search results. We introduce Firefox Voice, a novel voice assistant built on the open web ecosystem with the aim of expanding access to information available via voice. Firefox Voice is a browser extension that lets users perform actions by voice, such as setting timers, navigating the web, and reading a webpage’s content aloud. Through an iterative development process and use by over 12,000 active users, we find that users see voice as a way to accomplish certain browsing tasks efficiently, but struggle to discover functionality and frequently discontinue use. We conclude by describing how Firefox Voice enables the development of novel, open web-powered voice-driven experiences.
