High priority in highly ranked documents in spoken term detection

In spoken term detection, retrieving OOV (out-of-vocabulary) query terms is very important because query terms are likely to be OOV terms. To improve retrieval performance for OOV query terms, this paper proposes a re-scoring method that is applied after the candidate segments have been determined. Each candidate segment has a matching score and a segment number. Highly ranked candidates are usually reliable, and a user is assumed to select query terms that are specific to the target documents and appear frequently in them. We therefore give high priority to candidate segments contained in highly ranked documents by adjusting their matching scores. We evaluated the proposed method on the open test collections for SpokenDoc-2 in NTCIR-10. The results show that the proposed method improved retrieval performance by more than 7.0 points on two test sets in the test collections, demonstrating its effectiveness.
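
The re-scoring idea can be illustrated with a minimal sketch. The abstract does not give the actual adjustment formula, so everything here is an assumption made for illustration: the `Candidate` fields, the rule that a document is ranked by its best candidate score, the convention that a higher matching score is better, and the hypothetical `top_n` and `bonus` parameters.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str    # document containing the segment (hypothetical field)
    segment: int   # segment number
    score: float   # matching score; higher is assumed to be better


def rescore(candidates, top_n=1, bonus=0.2):
    """Boost candidates that belong to highly ranked documents.

    Assumption: a document's rank is determined by its best candidate
    score, and the boost is a fixed additive bonus. The method in the
    paper may define both differently.
    """
    # Best matching score observed for each document.
    best = {}
    for c in candidates:
        best[c.doc_id] = max(best.get(c.doc_id, float("-inf")), c.score)

    # Documents ranked by their best score; keep the top_n of them.
    top_docs = set(sorted(best, key=best.get, reverse=True)[:top_n])

    # Add the bonus to every candidate inside a top-ranked document.
    rescored = [
        Candidate(c.doc_id, c.segment,
                  c.score + (bonus if c.doc_id in top_docs else 0.0))
        for c in candidates
    ]
    # Return candidates in descending order of adjusted score.
    return sorted(rescored, key=lambda c: c.score, reverse=True)


candidates = [
    Candidate("A", 1, 0.90),
    Candidate("B", 4, 0.60),
    Candidate("A", 7, 0.50),
    Candidate("C", 2, 0.55),
]
ranked = rescore(candidates, top_n=1, bonus=0.2)
```

In this toy input, the weaker candidate in document A (segment 7) moves ahead of the candidates in documents B and C, because document A also contains the top-ranked candidate. This is the intended effect of prioritizing segments in highly ranked documents.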