Speech Synthesis and Uncanny Valley

The paper discusses a hypothesis linking high-quality text-to-speech (TTS) synthesis in spoken dialogue systems to the concept of the “uncanny valley”. It describes a “Wizard-of-Oz” experiment in which 30 volunteers held conversations with two synthetic voices that differed in naturalness. The results are summarized and interpreted, leading to the conclusion that the TTS uncanny valley effect in dialogue systems can probably be overridden, and even reversed, by users’ positive attitude toward new technologies.
