Voicesetting: Voice Authoring UIs for Improved Expressivity in Augmentative Communication

Alternative and augmentative communication (AAC) systems used by people with speech disabilities rely on text-to-speech (TTS) engines for synthesizing speech. Advances in TTS systems allowing for the rendering of speech with a range of emotions have yet to be incorporated into AAC systems, leaving AAC users with speech that is mostly devoid of emotion and expressivity. In this work, we describe voicesetting as the process of authoring the speech properties of text. We present the design and evaluation of two voicesetting user interfaces: the Expressive Keyboard, designed for rapid addition of expressivity to speech, and the Voicesetting Editor, designed for more careful crafting of the way text should be spoken. We evaluated the perceived output quality, requisite effort, and usability of both interfaces; the concept of voicesetting and our interfaces were highly valued by end-users as an enhancement to communication quality. We close by discussing design insights from our evaluations.
