DCU at NTCIR-10 SpokenDoc2 Passage Retrieval Task

We describe our runs and the results obtained for the "2nd round of IR for Spoken Documents (SpokenDoc2)" task at NTCIR-10. We participated in the task of passage retrieval from the Corpus of Spoken Document Processing Workshop (SDPWS). For our participation in the NTCIR-9 SpokenDoc task, we investigated different content-based segmentation methods that attempt to identify topically coherent units for retrieval. For NTCIR-10 we compare content-based segmentation (the TextTiling algorithm) with division of the content into segments of a fixed number of Inter-Pausal Units (IPUs) using a sliding window, followed by the combination of overlapping segments into single units in the ranked list of results. A second focus of our NTCIR-10 submissions is the potential of external data for document expansion: we used a DBpedia collection to expand the IPUs for all segmentation methods.
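The fixed-length segmentation and the merging of overlapping results can be illustrated with a minimal sketch. It is not the configuration used in our runs: the window size, the step, the function names, and the rule that a merged unit keeps the score of its highest-ranked member are all assumptions made for illustration, and the segments are assumed to come from a single lecture.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (first IPU index, last IPU index), inclusive


def sliding_window_segments(num_ipus: int, size: int, step: int) -> List[Segment]:
    """Split a sequence of IPUs into fixed-length, possibly overlapping segments."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if num_ipus <= 0:
        return []
    segments: List[Segment] = []
    start = 0
    while True:
        end = min(start + size, num_ipus) - 1
        segments.append((start, end))
        if end == num_ipus - 1:  # the last IPU is covered, stop
            break
        start += step
    return segments


def merge_overlapping(
    ranked: List[Tuple[Segment, float]]
) -> List[Tuple[Segment, float]]:
    """Walk a ranked list (best first) and fold each segment into the first
    higher-ranked unit it overlaps. The merged unit keeps the higher-ranked
    score (an assumption of this sketch)."""
    merged: List[Tuple[Segment, float]] = []
    for (start, end), score in ranked:
        for i, ((m_start, m_end), m_score) in enumerate(merged):
            if start <= m_end and m_start <= end:  # IPU ranges overlap
                merged[i] = ((min(start, m_start), max(end, m_end)), m_score)
                break
        else:
            merged.append(((start, end), score))
    return merged
```

For example, `sliding_window_segments(10, 4, 2)` produces four segments of four IPUs, each sharing two IPUs with its neighbour. The merge step is deliberately simple: a unit that grows may come to overlap another unit already in the list, and this sketch does not merge such units a second time.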