Spoken document retrieval method combining query expansion with continuous syllable recognition for NTCIR-SpokenDoc

In this paper, we propose a spoken document retrieval method that combines query expansion with continuous syllable recognition. The proposed method expands a query using words from web pages collected by a search engine. Relevant document vectors are assumed to lie on the plane spanned by the query vector and the extended vector, and the weight parameter used for query expansion is calculated between a target document vector and the query vector. In addition, target documents are mapped not only to a space constructed from continuous word speech recognition results but also to a space constructed from syllable speech recognition results. The proposed method then calculates the distance between the query vector and the document vector in each space and combines these distances. To evaluate the proposed method, we conducted spoken document retrieval experiments on the SpokenDoc task of the NTCIR-9 meeting. In the formal run of the SpokenDoc task, the proposed method improved the mean average precision from 0.392784, the baseline provided by the meeting organizers, to 0.406085.
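
The two ideas summarized above, expanding the query within the plane spanned by the query vector and the extended vector, and combining distances from the word space and the syllable space, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine distance, the linear interpolation weight `lam`, and the fixed expansion weight `alpha` (the abstract states that the weight is calculated per target document, which this sketch does not attempt to reproduce).

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def expand_query(query, extension, alpha):
    """Return q + alpha * e, a point on the plane spanned by q and e.

    alpha is a hypothetical fixed weight used only for illustration.
    """
    return [q + alpha * e for q, e in zip(query, extension)]


def combined_distance(word_dist, syllable_dist, lam):
    """Linearly interpolate the two distances (assumed combination rule)."""
    return lam * word_dist + (1.0 - lam) * syllable_dist


def rank_documents(word_query, syll_query, word_docs, syll_docs, lam=0.5):
    """Rank document IDs by combined distance, closest first.

    word_docs and syll_docs map each document ID to its vector in the
    word-based space and the syllable-based space, respectively.
    """
    scores = []
    for doc_id in word_docs:
        d_word = cosine_distance(word_query, word_docs[doc_id])
        d_syll = cosine_distance(syll_query, syll_docs[doc_id])
        scores.append((combined_distance(d_word, d_syll, lam), doc_id))
    return [doc_id for _, doc_id in sorted(scores)]


# Toy usage with made-up three-term and two-syllable vocabularies.
expanded = expand_query([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], alpha=0.5)
word_docs = {"a": [1.0, 0.5, 0.0], "b": [0.0, 0.0, 1.0]}
syll_docs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
ranking = rank_documents(expanded, [1.0, 0.0], word_docs, syll_docs)
```

In this toy setup, document `a` matches the expanded query in both spaces and is ranked ahead of document `b`. Keeping the two spaces separate until the final combination step means a recognition error in the word transcript can still be offset by a close match in the syllable transcript.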