Acoustic-labial Speaker Verification

This paper describes a multimodal approach to speaker verification. The system combines two classifiers, one based on visual features and the other on acoustic features. A lip tracker extracts visual information from the speaking face, providing both shape and intensity features. We describe an approach for normalizing the scores of the different modalities and mapping them onto a common confidence interval, and a novel method for integrating the scores of multiple classifiers. Verification experiments are reported for the individual modalities and for the combined classifier. The integrated system outperformed each sub-system, reducing the false acceptance rate of the acoustic sub-system from 2.3% to 0.5%.
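The idea of mapping heterogeneous classifier scores onto a common confidence interval before fusion can be sketched as follows. This is a minimal illustration, not the paper's actual method: min-max normalization, the weighted-sum fusion rule, the weight value, and the decision threshold are all assumptions introduced here.

```python
# Hypothetical sketch of score normalization and bimodal fusion for
# speaker verification. The normalization bounds, fusion weight, and
# threshold are illustrative assumptions, not values from the paper.

def minmax_normalize(score, lo, hi):
    """Map a raw classifier score onto the common interval [0, 1],
    using score bounds estimated on development data (assumed)."""
    if hi == lo:
        return 0.0
    clipped = min(max(score, lo), hi)  # clamp outliers into the range
    return (clipped - lo) / (hi - lo)

def fuse(acoustic, visual, w_acoustic=0.7):
    """Weighted-sum fusion of two normalized scores; the weight 0.7
    is an arbitrary illustrative choice."""
    return w_acoustic * acoustic + (1.0 - w_acoustic) * visual

def verify(acoustic_raw, visual_raw, bounds_a, bounds_v, threshold=0.5):
    """Accept the identity claim if the fused score clears a threshold."""
    a = minmax_normalize(acoustic_raw, *bounds_a)
    v = minmax_normalize(visual_raw, *bounds_v)
    return fuse(a, v) >= threshold
```

Once both modalities live on the same interval, a single operating threshold can trade off false acceptances against false rejections for the combined system.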
