Document Expansion using a Side Collection for Monolingual and Cross-language Spoken Document Retrieval

This paper presents a method of document expansion that uses a side collection to improve the performance of retrieving spoken documents with text queries. The method is applied to Chinese spoken document retrieval (SDR) in a series of experiments on both monolingual and cross-language SDR systems. In the monolingual experiments, Cantonese broadcast news documents are retrieved using a multi-scale syllable-based approach; applying document expansion improves average inverse rank (AIR) by 56%. In the cross-language spoken document retrieval (CL-SDR) task, where Mandarin broadcast news is retrieved using English text queries, document expansion yields a 14% relative improvement in retrieval performance.
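Average inverse rank is typically used in known-item retrieval, where each query has a single relevant document. The sketch below assumes that setting: AIR is the mean, over all queries, of 1/rank of the relevant document, with a contribution of 0 when it is not retrieved. It also shows how a relative improvement figure is derived from two scores. The function names and the example ranks are hypothetical illustrations, not data or code from this work.

```python
from typing import Optional, Sequence


def average_inverse_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over queries (hypothetical helper).

    Each entry is the 1-based rank at which the single relevant document
    was retrieved for one query, or None if it was not retrieved
    (which contributes 0 to the average).
    """
    if not ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for r in ranks:
        if r is None:
            continue  # relevant document not retrieved: contributes 0
        if r < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / r
    return total / len(ranks)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, e.g. 0.14 for 14%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Hypothetical ranks for four queries, before and after document expansion.
before = average_inverse_rank([4, 2, None, 8])
after = average_inverse_rank([1, 2, 4, None])
print(f"AIR before={before:.4f} after={after:.4f} "
      f"gain={relative_improvement(before, after):.1%}")
```

Because each query contributes at most 1 (relevant document ranked first), AIR lies in [0, 1] and rewards moving the relevant document toward the top of the list far more than small changes further down.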