Evaluation of re-ranking by prioritizing highly ranked documents in spoken term detection

In spoken term detection, the detection of out-of-vocabulary (OOV) query terms is very important because of the high probability of OOV query terms occurring. This paper proposes a re-ranking method for improving the detection accuracy for OOV query terms after extracting candidate sections by conventional method. The candidate sections are ranked by using dynamic time warping to match the query terms to all available spoken documents. Because highly ranked candidate sections are usually reliable and users are assumed to input query terms that are specific to and appear frequently in the target documents, we prioritize candidate sections contained in highly ranked documents by adjusting the matching score. Experiments were conducted to evaluate the performance of the proposed method, using open test collections for the SpokenDoc-2 task in the NTCIR-10 workshop. Results showed that the mean average precision (MAP) was improved more than 7.0 points by the proposed method for the two test sets. Also, the proposed method was applied to the results obtained by other participants in the workshop, in which the MAP was improved by more than 6 points in all cases. This demonstrated the effectiveness of the proposed method.