An open source speech synthesis module for a visual-speech recognition system

A Silent Speech Interface (SSI) is a voice replacement technology that permits speech communication without vocalisation. The visual-speech recognition engine of the proposed SSI is based on vocal tract imaging. The system aims to give laryngectomised speakers the opportunity to speak with their original voice. This paper presents the speech synthesis module of an SSI, which uses the open-source MaryTTS (Text-To-Speech) system. The visual-speech recognition engine of the SSI outputs a text sentence, which is passed to the speech synthesis module to synthesise speech in French or English. A new phonetic transcription module has been developed and integrated into MaryTTS. In addition, English and French voices based on semi-HMMs (hidden semi-Markov models) have been built. The SSI can be remotely controlled from a mobile device, and the new voices are installed on a web server.
