Identification of spoken questions using similarity-based TF⋅AoI

Similarity is used in information retrieval and extraction, but it can also be applied to dialog processing. Spoken dialog processing must cope with speech recognition errors, interjections, and noise, and users rarely phrase the same question in exactly the same way. A dialog system therefore has to find a stored sentence that is similar to the input sentence while accounting for these phenomena. This paper proposes a method for identifying question sentences based on TF⋅AoI (term frequency × amount of information) weighting. In this method, each word contained in the input sentence is weighted by (word similarity) × (amount of information). Then, based on the calculated Euclidean distance, the response corresponding to the most similar question is output. In comparison experiments, the proposed method improved on the method that compares sentences by their matching ratio with the input sentence by 13 points, and on the method of “similarity by TF⋅AoI weighting” by 6.5 points.

© 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(10): 81–94, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/scj.20363
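The weighting and distance steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions chosen only for illustration: the question/response database and the word-similarity table are hypothetical, the amount of information is taken as -log2 P(w) with add-one smoothing over the stored questions, sentences are tokenized by whitespace, and each sentence vector spans the union of the input and question words, with each component set to (best word similarity within that sentence) × (amount of information). The question at the smallest Euclidean distance is treated as the most similar one.

```python
import math
from collections import Counter

# Hypothetical question/response database (illustrative only).
QA_PAIRS = [
    ("where is the station", "The station is two blocks north."),
    ("what time does the store open", "The store opens at 9 a.m."),
    ("how much is a ticket", "A ticket costs 300 yen."),
]

# Hypothetical word-similarity table; a real system might derive this from a thesaurus.
SYNONYMS = {
    frozenset({"shop", "store"}): 0.8,
    frozenset({"fare", "ticket"}): 0.6,
}


def word_similarity(a: str, b: str) -> float:
    """Assumed similarity: 1.0 for identical words, the table value for listed pairs, else 0."""
    if a == b:
        return 1.0
    return SYNONYMS.get(frozenset({a, b}), 0.0)


def amount_of_information(corpus_sentences):
    """Return a function giving -log2 P(word), estimated from the stored questions."""
    counts = Counter(w for s in corpus_sentences for w in s.split())
    total = sum(counts.values())

    def info(word: str) -> float:
        # Add-one smoothing (an assumption) keeps unseen words such as interjections finite.
        return -math.log2((counts[word] + 1) / (total + 1))

    return info


def weight(word, sentence_words, info) -> float:
    # (word similarity) x (amount of information); similarity is the best match in the sentence.
    best = max((word_similarity(word, v) for v in sentence_words), default=0.0)
    return best * info(word)


def distance(input_words, question_words, info) -> float:
    # Euclidean distance between the two weighted vectors over the union of their words.
    dims = set(input_words) | set(question_words)
    return math.sqrt(sum(
        (weight(w, input_words, info) - weight(w, question_words, info)) ** 2
        for w in dims
    ))


def respond(utterance, qa_pairs, info) -> str:
    # Output the response of the question closest to (most similar to) the input.
    words = utterance.lower().split()
    _, response = min(qa_pairs, key=lambda qa: distance(words, qa[0].split(), info))
    return response


if __name__ == "__main__":
    info = amount_of_information([q for q, _ in QA_PAIRS])
    print(respond("uh what time does the shop open", QA_PAIRS, info))
```

With this toy data, the input "uh what time does the shop open" is matched to "what time does the store open" even though it contains an interjection and uses "shop" instead of "store": the interjection matches no stored question, so it adds the same amount to every distance and does not change the ranking, while the partial similarity between "shop" and "store" keeps that question closest.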