Evaluating a target utterance identification method using practical free conversation

We are developing a conversation support system for use in public community settings. Our premise is that assisting human-to-human conversation supports elderly people's active lives more effectively than providing a speech dialogue system. To use a conversation support system in an actual restaurant or lounge environment, the conversation of the target speakers near the microphone must be separated from the ambient noise. We previously proposed a method that distinguishes utterances spoken near a microphone from those spoken far from it, using the standard deviation of the fundamental frequency (SD-F0) and the standard deviation of the speech power level (SD-SP) of each utterance. In this paper, we evaluate the effectiveness of this identification method on actual free conversation using a Support Vector Machine (SVM). The precision for utterances near the microphone was 87.8%. This suggests that identification based on the standard deviations of the fundamental frequency and the speech power is effective even in real environments. However, the performance depends on the utterance length, the stability of the F0 values in the part of the utterance above the threshold, and the positions of the microphones. In future work, the method should be evaluated with more speakers and in a wider variety of situations to define a suitable system specification.
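As an illustration of the two per-utterance features named above, the following minimal sketch computes SD-F0 and SD-SP from frame-level values. It is not the authors' implementation. The function name `utterance_features`, the convention that an unvoiced frame is marked by 0 or None, the use of the population standard deviation, and the dB unit for power are all assumptions made for this example.

```python
from statistics import pstdev


def utterance_features(f0_hz, power_db):
    """Return (SD-F0, SD-SP) for one utterance.

    f0_hz:    per-frame F0 estimates in Hz; 0 or None marks an
              unvoiced frame (assumed convention, not from the paper).
    power_db: per-frame speech power level in dB (assumed unit).
    """
    # Keep only voiced frames so unvoiced gaps do not inflate SD-F0.
    voiced = [f for f in f0_hz if f]
    # Population standard deviation (an assumption); fall back to 0.0
    # when fewer than two values are available.
    sd_f0 = pstdev(voiced) if len(voiced) >= 2 else 0.0
    sd_sp = pstdev(power_db) if len(power_db) >= 2 else 0.0
    return sd_f0, sd_sp


if __name__ == "__main__":
    # Toy frame values, invented for illustration only.
    f0 = [100, 0, 120, None, 110]
    power = [60, 62, 64]
    print(utterance_features(f0, power))
```

The resulting two-dimensional feature vector for each utterance could then be passed to any SVM implementation for near/far classification. The kernel and training parameters are not stated in this abstract, so they are left open here.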