Multisensor user authentication

User recognition is examined using neural and conventional techniques for processing speech and face images. This article is a first attempt to overcome the significant problem of distortions inherent in data captured over multiple sessions (days). Speaker recognition uses both Linear Predictive Coding (LPC) cepstral and auditory neural model representations with speaker-dependent codebook designs. Face recognition is based on a multilayer perceptron with a single hidden layer, trained with backpropagation, whose inputs are either the raw image data or principal components of that data computed with the Karhunen-Loeve transform. The data set comprises 10 subjects, each of whom recorded utterances and had images collected over 10 days. The speech data consists of 400 phonetically rich sentences (4 sec each), 200 subject-name recordings (3 sec each), and 100 impostor-name recordings (3 sec each). The face data consists of over 2000 8-bit gray-scale images, 32 x 32 pixels each, of the 10 subjects. Each subsystem individually attains over 90% verification accuracy on test data gathered the day after the training data.
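
As an illustration of the speaker-verification stage, the following is a minimal Python sketch of speaker-dependent codebook design and scoring by vector quantization. It assumes each utterance has already been reduced to a sequence of LPC cepstral feature vectors; the feature extraction itself (pre-emphasis, framing, LPC analysis, cepstral recursion), the codebook size, and the decision threshold are illustrative assumptions, not the paper's settings.

# Minimal VQ-based speaker-verification sketch. Enrollment designs a
# speaker-dependent codebook over LPC cepstral vectors; verification
# accepts an identity claim when the test utterance quantizes well
# against the claimed speaker's codebook.
import numpy as np
from scipy.cluster.vq import kmeans, vq

def train_codebook(features, codebook_size=64):
    # features: (n_frames, n_coeffs) LPC cepstral vectors from enrollment.
    codebook, _ = kmeans(features.astype(float), codebook_size)
    return codebook

def mean_distortion(features, codebook):
    # vq() maps each frame to its nearest codeword and returns distances.
    _, dists = vq(features.astype(float), codebook)
    return dists.mean()

def verify(features, claimed_codebook, threshold):
    # Accept if the average quantization distortion is below a threshold
    # set on held-out data (the value is application-dependent).
    return mean_distortion(features, claimed_codebook) < threshold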

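The face-recognition path can be sketched similarly: project the 32 x 32 images onto their leading principal components (the Karhunen-Loeve transform, as in the eigenface approach) and classify the coefficients with a single-hidden-layer backpropagation network. The component count, hidden-layer size, and the use of scikit-learn's MLPClassifier are illustrative assumptions, not the paper's configuration.

# KLT (PCA) feature extraction plus a single-hidden-layer MLP classifier.
import numpy as np
from sklearn.neural_network import MLPClassifier

def klt_basis(images, n_components=40):
    # images: (n_samples, 1024) flattened 32x32 gray-scale images in [0, 1].
    mean_face = images.mean(axis=0)
    # Rows of vt are the principal directions ("eigenfaces").
    _, _, vt = np.linalg.svd(images - mean_face, full_matrices=False)
    return mean_face, vt[:n_components]

def project(images, mean_face, basis):
    return (images - mean_face) @ basis.T

def train_face_classifier(X_train, y_train, n_components=40):
    # Fit the MLP on KLT coefficients of the training-day images;
    # y_train holds the subject identities.
    mean_face, basis = klt_basis(X_train, n_components)
    mlp = MLPClassifier(hidden_layer_sizes=(40,), max_iter=2000)
    mlp.fit(project(X_train, mean_face, basis), y_train)
    return mean_face, basis, mlp

An image from a later session is then classified with mlp.predict(project(x_test, mean_face, basis)), where x_test has shape (1, 1024); feeding the raw pixels directly to the network corresponds to skipping the projection step.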