Paper information - Evaluating target utterance identification method using practical free conversation

We are developing a conversation support system for public community spaces. Our concept is that supporting elderly people's active lives by assisting human-to-human conversation is more effective than providing a speech dialogue system. To use a conversation support system in an actual restaurant or lounge environment, the conversation of the target speakers near the microphone must be separated from the ambient noise. We previously proposed a method for distinguishing utterances spoken near a microphone from those spoken far from it, using the standard deviation of the fundamental frequency (SD-F0) and the standard deviation of the speech power level (SD-SP) of each utterance. In this paper, we evaluate the effectiveness of this identification method on actual free conversation using a Support Vector Machine (SVM). The resulting precision rate for utterances near the microphone is 87.8%. This suggests that the identification method based on the standard deviations of the fundamental frequency and speech power would be effective even in real environments. However, the performance depends on the utterance length, the stability of the F0 values in the part of the utterance that exceeds the threshold, and the position of the microphones. In the future, the method should be evaluated with more speakers and a wider variety of situations to define a suitable system specification.
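The two per-utterance features and the reported metric can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the convention that unvoiced frames are marked as 0 or None, the use of the population standard deviation, and the "near"/"far" labels are all assumptions made for illustration. The SVM step itself is omitted; the resulting (SD-F0, SD-SP) pairs would be passed to an SVM library of choice.

```python
from statistics import pstdev


def utterance_features(f0_hz, power_db):
    """Return (SD-F0, SD-SP) for one utterance.

    f0_hz    -- per-frame fundamental frequency in Hz; unvoiced frames
                are assumed to be marked as 0 or None and are skipped
    power_db -- per-frame speech power level (unit is an assumption)
    """
    voiced = [f for f in f0_hz if f]  # drop 0 and None entries
    if len(voiced) < 2 or len(power_db) < 2:
        raise ValueError("utterance too short for a standard deviation")
    # Population standard deviation is an assumption; the abstract does
    # not say whether the population or sample form was used.
    return pstdev(voiced), pstdev(power_db)


def precision(predicted, actual, positive="near"):
    """Fraction of utterances predicted as `positive` that truly are."""
    true_pos = sum(
        1 for p, a in zip(predicted, actual) if p == positive and a == positive
    )
    pred_pos = sum(1 for p in predicted if p == positive)
    return true_pos / pred_pos if pred_pos else float("nan")


# Hypothetical frame values, for illustration only
sd_f0, sd_sp = utterance_features([100, 0, 120, None, 110], [60, 62, 64])
rate = precision(["near", "near", "far", "near"], ["near", "far", "far", "near"])
```

Here `precision` corresponds to the precision rate quoted for utterances near the microphone: among utterances the classifier labels as near, the share that were actually spoken near it.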

Yumi Wakita | Naoto Kosaka
