Paper information - Error weighted classifier combination for multi-modal human identification

Abstract

In this paper we describe a technique of classifier combination used in a human identification system. The system integrates all available features from multi-modal sources within a Bayesian framework. The framework allows representing a class of popular classifier combination rules and methods within a single formalism. It relies on a “per-class” measure of confidence derived from the performance of each classifier on training data that is shown to improve performance on a synthetic data set. The method is especially relevant in an autonomous surveillance setting, where varying time scales and missing features are a common occurrence. We show an application of this technique to a real-world surveillance database of video and audio recordings of people collected over several weeks in an office setting.

1 Introduction and Motivation

In problems of biometric verification and identification, a large role is played by the multi-modal aspect of the observation. A person can be identified by a number of features, including face, height, body shape, gait, voice, etc. However, the features are not equal in their overall contribution to identifying a person. For instance, modern algorithms for face classification (e.g. [11]) and speaker identification (e.g. [6]) can attain high recognition rates, provided that the data is well formed and relatively free of variations and noise, while other features, such as gait (e.g. [1]) or body shape, are only mildly discriminative.

Even though one can achieve high recognition rates when classifying some of these features, in reality they are observed only relatively rarely: in a surveillance video sequence the face image can be used only if the person is close enough and is facing the camera, and the person’s voice only when the person is speaking. In contrast, there is a plentiful supply of the less discriminative features. This situation is illustrated by an example from one of our video sequences in Figure 1.

Thomas Serre | Yuri Ivanov | Jacob Bouvrie

[1] Larry S. Davis, et al. Stride and cadence as a biometric in automatic person identification and verification, 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[2] Jeho Nam, et al. Speaker identification and video analysis for hierarchical video shot classification, 1997, Proceedings of International Conference on Image Processing.

[3] Jiri Matas, et al. Combining evidence in personal identity verification systems, 1997, Pattern Recognit. Lett.

[4] Arun Ross, et al. Information fusion in biometrics, 2003, Pattern Recognit. Lett.

[5] Azriel Rosenfeld, et al. Face recognition: A literature survey, 2003, CSUR.

[6] Josef Kittler, et al. Combining multiple classifiers by averaging or by multiplying?, 2000, Pattern Recognit.

[7] Jiri Matas, et al. Combining Evidence in Multimodal Personal Identity Recognition Systems, 1997, AVBPA.

[8] Thomas Serre, et al. Categorization by Learning and Combining Object Parts, 2001, NIPS.

[9] Robert P. W. Duin, et al. A Discussion on the Classifier Projection Space for Classifier Combining, 2002, Multiple Classifier Systems.

[10] Jeff A. Bilmes, et al. Directed graphical models of classifier combination: application to phone recognition, 2000, INTERSPEECH.