Social Media through Voice

With advances in expressive speech synthesis and conversational understanding, an ever-increasing amount of digital content, including social and personal content, can be consumed through voice. Voice has long been known to convey personal characteristics and emotional states, both of which are prominent aspects of social media. Yet no study has investigated voice design requirements for social media platforms. We interviewed 15 active social media users about their preferences for using synthesized voices to represent their profiles. Our findings show that participants want control over how a voice delivers their content, such as the personality and emotion with which the voice speaks, because these prosodic variations can affect users' online personas and interfere with impression management. We report the motivations for customizing or not customizing voice characteristics in different scenarios, and uncover key challenges around usability and the potential for stereotyping. We argue that synthesized speech for social media should be evaluated not only on listening experience and voice quality but also on its expressivity, degree of customizability, and ability to adapt to context (e.g., social media platforms, groups, individual posts). We discuss how our contribution confirms and extends knowledge of voice technology design and online self-presentation, and offer design considerations for voice personalization related to social interactions.
