An STD System Using Multiple STD Results and Multiple Rescoring Method for NTCIR-12 SpokenQuery&Doc Task

Research on Spoken Term Detection (STD) has been actively conducted in recent years. The task of STD is to search for a particular speech segment in a large amount of multimedia data that includes audio or speech. In NTCIR-12, a task with multiple spoken queries was newly added to the STD task. In this paper, we describe the STD system that our team developed for the NTCIR-12 SpokenQuery&Doc task. We have previously proposed various methods to improve STD accuracy for out-of-vocabulary (OOV) query terms. Our method consists of four steps. First, multiple automatic speech recognizers (ASRs) are applied to the spoken documents using triphone, syllable, demiphone, and SPS subword units, producing multiple speech recognition results; a retrieval result is obtained for each subword unit. Second, these retrieval results are integrated [1][2]. Third, we apply a rescoring method to the highly ranked candidates to improve STD accuracy [3]. Lastly, another rescoring method compares the query with the spoken documents in more detail using posterior probabilities obtained from a Deep Neural Network (DNN) [4]; we apply this method only to the top candidates to reduce retrieval time [5]. For spoken queries, we use two rescoring methods: the first compares the posterior probability vectors of the spoken query and the spoken documents, and the second utilizes the papers in the proceedings. We apply these methods to the NTCIR-12 test collection and report experimental results for these methods.
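To make the pipeline concrete, the sketch below illustrates two of the steps described above under simplifying assumptions: a weighted-sum integration of retrieval scores from the subword-unit recognizers, and a DTW-based comparison of DNN posterior probability vectors between a spoken query and a document segment. The function names, the equal default weights, and the cosine local distance are illustrative assumptions, not the exact formulation used in the paper.

```python
import numpy as np

def integrate_scores(results_per_unit, weights=None):
    """Combine retrieval scores from multiple subword units
    (e.g. triphone, syllable, demiphone, SPS) by a weighted sum.
    `results_per_unit` maps unit name -> {candidate_id: score}.
    Equal weights are a placeholder assumption."""
    weights = weights or {unit: 1.0 for unit in results_per_unit}
    combined = {}
    for unit, scores in results_per_unit.items():
        for cand, s in scores.items():
            combined[cand] = combined.get(cand, 0.0) + weights[unit] * s
    # Higher combined score = better candidate
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)

def dtw_posterior_distance(query_post, doc_post):
    """Length-normalized DTW distance between two sequences of
    DNN posterior vectors (frames x classes), using cosine distance
    as the local frame-level cost (an assumed choice)."""
    Q, D = len(query_post), len(doc_post)
    cost = np.full((Q + 1, D + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, Q + 1):
        for j in range(1, D + 1):
            q, d = query_post[i - 1], doc_post[j - 1]
            local = 1.0 - np.dot(q, d) / (
                np.linalg.norm(q) * np.linalg.norm(d) + 1e-9)
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[Q, D] / (Q + D)
```

In practice, the second function would be applied only to the top-ranked candidates from the integrated list, mirroring the restriction to top candidates described above to keep retrieval time manageable.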