Spot the Pleasant People! Navigating the Cocktail Party Buzz

We present an experimental platform for assessing voice likability in a way that is decoupled from individual voices and instead captures voice characteristics over groups of speakers. We build on methods that we have previously used for other purposes to create the Cocktail platform, in which respondents use a touch screen to navigate a voice buzz made up of about 400 voices and then choose the location where they find the buzz most pleasant. Since there is no image or message on the screen, the platform can be used by visually impaired people, who often need to rely on spoken text, on the same terms as sighted people. In this paper, we describe the platform and its motivation along with our analysis method. We conclude by presenting two experiments that verify that the platform behaves as expected: a simple sanity test, and an experiment with voices grouped according to their mean pitch variance.
