Noise compensation in a person verification system using face and multiple speech features

In this paper, we demonstrate that the use of a recently proposed feature set, termed Maximum Auto-Correlation Values, which utilizes information from the source part of the speech signal, significantly improves the robustness of a text-independent identity verification system. We also propose an adaptive fusion technique for integrating audio and visual information in a multi-modal verification system. The proposed technique explicitly measures the quality of the speech signal and adjusts the contribution of the speech modality to the final verification decision accordingly. Results on the VidTIMIT database indicate that the proposed approach outperforms existing adaptive and non-adaptive fusion techniques. Across a wide range of audio SNRs, the multi-modal system using the proposed technique consistently outperforms the face modality alone.

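To make the adaptive fusion idea concrete, below is a minimal sketch of quality-weighted score fusion: an SNR estimate of the incoming speech is mapped to a weight, and that weight scales the speech modality's contribution relative to the face modality. The linear SNR-to-weight ramp, the 0–30 dB break-points, the 0.5 ceiling on the speech weight, and the helper names (`estimate_snr_db`, `speech_weight`, `fuse_scores`, `noise_floor`) are illustrative assumptions, not the exact quality measure or fusion rule used in the paper.

```python
import numpy as np

def estimate_snr_db(speech, noise_floor):
    """Rough SNR estimate in dB. 'noise_floor' is a hypothetical noise-energy
    estimate obtained elsewhere (e.g. from non-speech frames)."""
    signal_power = np.mean(speech ** 2)
    return 10.0 * np.log10(signal_power / max(noise_floor, 1e-12))

def speech_weight(snr_db, snr_lo=0.0, snr_hi=30.0):
    """Map SNR to [0, 1]: clean speech gets full weight, heavily degraded
    speech contributes little. The linear ramp and break-points are
    illustrative choices, not the paper's."""
    return float(np.clip((snr_db - snr_lo) / (snr_hi - snr_lo), 0.0, 1.0))

def fuse_scores(face_score, speech_score, snr_db):
    """Quality-weighted sum of per-modality verification scores
    (higher score = stronger support for the claimed identity)."""
    w_speech = 0.5 * speech_weight(snr_db)   # speech weight shrinks with noise
    w_face = 1.0 - w_speech                  # weights kept summing to one
    return w_face * face_score + w_speech * speech_score

# Example: in clean conditions both modalities contribute equally;
# at very low SNR the decision falls back to the face score alone.
print(fuse_scores(face_score=0.8, speech_score=0.6, snr_db=25.0))
print(fuse_scores(face_score=0.8, speech_score=0.1, snr_db=-5.0))
```

Tying the weights to a measured speech quality in this way is what lets the fused system degrade gracefully: in heavy noise it can do no worse than the face modality, which is consistent with the behaviour reported in the abstract.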