Online Evaluation of Text to Speech Systems for Three Social Robots

The success of social robots depends largely on their capacity to interact with people. Verbal and non-verbal communication skills are therefore essential for achieving natural human-robot interaction. This paper focuses on verbal communication, since the majority of social robots implement a Text-to-Speech system. We present a comparative study of 8 off-the-shelf Text-to-Speech systems used in social robots, in which 125 participants evaluated the systems' performance. The results show that, in general, participants perceive differences between the systems and can determine which are more intelligible, expressive, and artificial. Participants also conclude that some systems are more suitable than others depending on the physical appearance of the robot.
