User recognition is one of the most fundamental capabilities of intelligent service robots. In robot applications, however, the operating conditions are far more severe than in traditional biometric security systems. A robot must recognize users non-intrusively, which restricts the available biometric features to face and voice. It is also expected to recognize users from a distance, which inevitably degrades the accuracy of each recognition module. In this paper, we seek to improve the overall accuracy by integrating the evidence produced by independently developed face and speaker recognition modules. Each module exhibits different statistical characteristics in how it expresses confidence in its result, so the evidence must be transformed into a normalized form before the results can be integrated. We introduce a novel approach for integrating multiple, mutually independent pieces of evidence to achieve improved performance. The typical approach to this problem models the statistical characteristics of the evidence with a well-known parametric form such as a Gaussian; the Mahalanobis distance is a good example. However, the evidence often does not fit a parametric model, which degrades performance. To overcome this problem, we adopt a discrete PDF that models the statistical characteristics as they are. To validate the proposed method, we used a multi-modal database consisting of 10 registered users and 550 probe samples, each containing a face image and a voice signal. The face and speaker recognition modules were applied to each probe to generate the respective evidence. The experiment showed an 11.27% improvement in accuracy over the individual recognizers, which is 2.72% better than the traditional Mahalanobis distance approach.
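The idea of normalizing heterogeneous scores through a discrete PDF can be illustrated with a minimal sketch. The abstract does not give the exact estimation or fusion rule, so everything below is an assumption made for illustration: each module's genuine and impostor scores are modeled by smoothed histograms, a raw score is mapped to a log-likelihood ratio, and the two modules are combined by summing those ratios under the independence assumption. The function names (`fit_discrete_pdf`, `log_likelihood_ratio`, `fuse`), the bin layout, and the smoothing constant are all hypothetical.

```python
import numpy as np


def fit_discrete_pdf(scores, bin_edges, smoothing=1.0):
    """Estimate a discrete PDF as a normalized histogram.

    Additive smoothing (an assumption of this sketch) keeps every bin
    non-zero so that the log-ratio below is always defined.
    """
    counts, _ = np.histogram(scores, bins=bin_edges)
    probs = counts + smoothing
    return probs / probs.sum()


def log_likelihood_ratio(score, bin_edges, genuine_pdf, impostor_pdf):
    """Map raw score(s) to log P(score | genuine) - log P(score | impostor)."""
    # Find the bin of each score; out-of-range scores fall into the edge bins.
    idx = np.searchsorted(bin_edges, score, side="right") - 1
    idx = np.clip(idx, 0, len(genuine_pdf) - 1)
    return np.log(genuine_pdf[idx]) - np.log(impostor_pdf[idx])


def fuse(face_scores, voice_scores, face_model, voice_model):
    """Pick the registered user with the highest summed log-likelihood ratio.

    face_scores / voice_scores: one raw score per registered user.
    face_model / voice_model: (bin_edges, genuine_pdf, impostor_pdf) tuples.
    Summing the ratios treats the two modules as independent.
    """
    total = (log_likelihood_ratio(np.asarray(face_scores), *face_model)
             + log_likelihood_ratio(np.asarray(voice_scores), *voice_model))
    return int(np.argmax(total)), total
```

In use, one model per module would be fitted from labeled training scores, for example `face_model = (edges, fit_discrete_pdf(genuine, edges), fit_discrete_pdf(impostor, edges))`, and `fuse` would then be called with the per-user scores of a probe. Because the histograms make no assumption about the shape of the score distribution, skewed or multi-modal scores are represented directly, which is the property the abstract contrasts with Gaussian-based normalization.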