Identity verification using speech and face information

This article first provides an overview of important concepts in the field of information fusion, followed by a review of milestones in audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., whether to accept or reject the claimant) based on speech and face information are then evaluated in clean and noisy audio conditions on a common database. The evaluation shows that in clean conditions most of the non-adaptive approaches provide similar performance, while in noisy conditions most exhibit a severe deterioration in performance. It also shows that current adaptive approaches are either inadequate or rely on restrictive assumptions. A new category of classifiers is then introduced, in which the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change under noisy conditions. Unlike a previously proposed adaptive approach, the proposed classifiers make no direct assumption about the type of noise that causes the mismatch between training and testing conditions.
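To make the verification decision concrete, here is a minimal Python sketch of one simple non-adaptive technique: a fixed weighted summation of the speech and face opinions, followed by a threshold that decides whether to accept or reject the claimant. It is an illustration under stated assumptions, not the article's implementation: the names `Claim`, `weighted_sum_fusion` and `verify`, the opinion values, the equal default weights and the zero threshold are all hypothetical. It also shows in miniature why noisy audio hurts a fixed combination, since a corrupted speech opinion can pull the fused score below the threshold for a genuine claimant.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """Opinions from two hypothetical modality experts for a single identity claim."""
    speech_opinion: float  # hypothetical score from a speech expert (higher = more likely genuine)
    face_opinion: float    # hypothetical score from a face expert (higher = more likely genuine)


def weighted_sum_fusion(claim: Claim, w_speech: float = 0.5) -> float:
    """Non-adaptive fusion: a fixed convex combination of the two opinions."""
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    w_face = 1.0 - w_speech
    return w_speech * claim.speech_opinion + w_face * claim.face_opinion


def verify(claim: Claim, threshold: float = 0.0, w_speech: float = 0.5) -> bool:
    """Accept the claimant (True) if the fused opinion reaches the threshold, else reject (False)."""
    return weighted_sum_fusion(claim, w_speech) >= threshold


if __name__ == "__main__":
    clean = Claim(speech_opinion=1.2, face_opinion=0.8)   # genuine claimant, clean audio
    noisy = Claim(speech_opinion=-2.0, face_opinion=0.8)  # same claimant, speech opinion degraded by noise

    print(verify(clean))                # accepted
    print(verify(noisy))                # falsely rejected
    print(verify(noisy, w_speech=0.2))  # accepted again once speech is down-weighted
```

Manually lowering `w_speech` for noisy audio, as in the last call, mimics what an adaptive approach would try to do automatically; choosing that weight reliably without knowing the noise conditions is the hard part.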
