TELMA: Telephony for Hearing-Impaired People. From Models to User Tests

Opening new information and communication technologies to disabled people is a question of increasing interest. The TELMA project aims to develop software and hardware building blocks for a telecommunication terminal (cellular phone) for hearing-impaired users. This terminal will be augmented with original audiovisual functionalities. More specifically, the TELMA terminal will exploit the visual modality of speech for two main tasks. On the one hand, visual speech information is used to improve speech enhancement techniques in adverse environments (environmental noise reduction enables hearing-impaired users to better exploit their residual hearing). On the other hand, the terminal will provide analysis/synthesis of lip movements and Cued Speech gestures. Cued Speech is a face-to-face communication method used by part of the oralist hearing-impaired community; it associates lip shapes with cues formed by the hand at specific locations. The TELMA terminal will translate lipreading + Cued Speech into acoustic speech, and vice versa, so that hearing-impaired people can communicate with each other and with normal-hearing people over telephone networks. To combine scientific developments, economic perspectives and the effective integration of disabled people's concerns, the project is built on a partnership between universities (INPG and ENST), an industrial/service company (France Telecom, R&D division) and potential users from the hearing-impaired community, under the supervision of health professionals (Grenoble Hospital Center / ORL).

Categories and subject descriptors: Telecommunications, Cued Speech, speech enhancement.
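As an illustration of the first function (visual information aiding noise reduction), the following minimal Python sketch, which is not taken from TELMA, shows how a visual voice activity detector driven by lip-opening measurements could gate the noise estimate of a simple spectral-subtraction enhancer: frames judged visually silent update the noise spectrum, which is then subtracted from the noisy signal. The function names, the fixed threshold, and the smoothing and over-subtraction factors are all hypothetical.

```python
import numpy as np

def visual_vad(lip_opening, threshold=0.15):
    """Crude visual voice activity detector: a frame is labeled 'speech'
    when the normalized lip-opening measurement exceeds a fixed threshold.
    (Hypothetical threshold; a real system would learn it from data.)"""
    return lip_opening > threshold

def vad_guided_spectral_subtraction(noisy_frames, lip_opening, alpha=2.0, floor=0.01):
    """Spectral subtraction in which the noise spectrum is re-estimated only
    on frames that the visual VAD labels as non-speech.

    noisy_frames : (n_frames, frame_len) array of time-domain frames
    lip_opening  : (n_frames,) normalized lip-opening values in [0, 1]
    """
    speech_active = visual_vad(lip_opening)
    noise_psd = None
    enhanced = np.empty_like(noisy_frames)

    for i, frame in enumerate(noisy_frames):
        spectrum = np.fft.rfft(frame)
        psd = np.abs(spectrum) ** 2

        # Update the noise estimate on visually silent frames only.
        if not speech_active[i]:
            noise_psd = psd if noise_psd is None else 0.9 * noise_psd + 0.1 * psd

        if noise_psd is None:
            enhanced[i] = frame  # no noise estimate yet: pass the frame through
            continue

        # Subtract the (over-estimated) noise power, keeping a spectral floor.
        clean_psd = np.maximum(psd - alpha * noise_psd, floor * psd)
        gain = np.sqrt(clean_psd / np.maximum(psd, 1e-12))
        enhanced[i] = np.fft.irfft(gain * spectrum, n=frame.shape[-1])

    return enhanced
```

This only caricatures the principle that the visual stream tells the enhancer when the target speaker is silent; the audiovisual enhancement techniques actually developed in the project are more elaborate.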
