Multimodal Non-Cooperative User Identification Technique in Network-based Robot Environments

This paper presents a multimodal non-cooperative user identification technique that combines face recognition with speaker identification, two modalities frequently used in Human-Robot Interaction (HRI), for network-based intelligent service robot environments. For face recognition, we use Tensor Subspace Analysis (TSA) to recognize the user's face through the robot's camera while the robot performs various services in a home environment. TSA naturally characterizes the spatial correlation between the pixels in an image. Speaker identification is performed by the conventional Mel-Frequency Cepstral Coefficients and Gaussian Mixture Model (MFCC-GMM) approach in multichannel environments. Finally, the outputs of the two recognition systems are combined by a linearly weighted sum for multimodal user identification. Owing to its fast processing capability, the technique can serve as a core component of network-based home robot application services. Experimental results on a database recorded at varying distances show that the presented method performs well in comparison with each individual recognition system and with the conventional method.
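
The linearly weighted sum fusion mentioned above can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the configuration used in this paper: the function names, the min-max normalization of each modality's scores, the convention that a higher score means a better match (a distance-based face score would first have to be converted to a similarity), and the example weight.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; normalization scheme is an assumption."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates score equally, so this modality carries no information.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(face_scores, speaker_scores, face_weight=0.5):
    """Linearly weighted sum of per-user face and speaker scores.

    Both lists hold one score per enrolled user, in the same user order,
    with higher meaning a better match (assumed convention).
    """
    if len(face_scores) != len(speaker_scores):
        raise ValueError("both modalities must score the same set of users")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    face = min_max_normalize(face_scores)
    speaker = min_max_normalize(speaker_scores)
    return [face_weight * f + (1.0 - face_weight) * s
            for f, s in zip(face, speaker)]


def identify(face_scores, speaker_scores, face_weight=0.5):
    """Return the index of the enrolled user with the highest fused score."""
    fused = fuse_scores(face_scores, speaker_scores, face_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three enrolled users.
face_similarity = [0.2, 0.9, 0.4]               # e.g. similarity in the TSA subspace
speaker_loglik = [-120.0, -95.0, -80.0]         # e.g. GMM log-likelihoods
best_user = identify(face_similarity, speaker_loglik, face_weight=0.5)
```

In this sketch the weight controls how much each modality is trusted: lowering `face_weight` shifts the decision toward the speaker scores, which may be preferable when the user is far from the camera.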