On-Line Speaker Enrollment using Rhythmical Voices for Human Robot Interaction

In this study, we present a simple on-line speaker enrollment and identification method for human-robot interaction (HRI) with intelligent service robots. Speaker enrollment is performed through rhythmical singing voices or a simple game such as paper-scissors-rock. While conventional enrollment methods, frequently used in the security area, require the user's deliberate cooperation, the proposed approach enrolls speakers in a natural way. After enrollment, text-independent speaker recognition is performed using the well-known mel-frequency cepstral coefficients (MFCC) and Gaussian mixture models (GMM). The experimental results show that the proposed approach yields better recognition performance than the conventional enrollment method.
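The identification step named above (scoring feature frames against one GMM per enrolled speaker and choosing the best match) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two hypothetical speakers, the 2-D feature vectors standing in for MFCCs, and the hand-set model parameters are all assumptions made for illustration, and a real system would fit the mixtures from enrollment audio with the EM algorithm.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one feature frame."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of the frames under a GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)

def identify(frames, speaker_models):
    """Return the enrolled speaker whose GMM best explains the frames."""
    return max(
        speaker_models,
        key=lambda name: gmm_avg_loglik(frames, speaker_models[name]),
    )

# Hypothetical enrolled models: two speakers, two components each.
# The 2-D vectors stand in for MFCC frames, which are higher-dimensional.
speaker_models = {
    "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "bob": [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}

# Frames from an unknown utterance; no transcript is needed, which is
# what makes the decision text-independent.
test_frames = [[0.1, -0.2], [1.9, 2.2], [0.3, 0.4]]
best = identify(test_frames, speaker_models)
```

Averaging the log-likelihood over frames keeps the score comparable across utterances of different lengths, which matters when enrollment and test speech differ in duration.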
