Combining multiple subword representations for open-vocabulary spoken document retrieval

This paper describes subword-based approaches to open-vocabulary spoken document retrieval. First, we investigate the feasibility of subword units for spoken document retrieval and compare our previously proposed sub-phonetic segment units with typical subword units such as syllables, phonemes, and triphones. Next, we explore linearly combining the retrieval scores obtained from multiple subword representations to improve retrieval performance. Experimental evaluation on open-vocabulary spoken document retrieval tasks demonstrates that the proposed sub-phonetic segment units are more effective than typical subword units, and that the linear combination of multiple subword representations yields a consistent improvement in F-measure.
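
As a rough illustration of the score combination summarized above, the following minimal sketch computes a weighted sum of per-document retrieval scores across subword representations and evaluates a retrieved set with the F-measure. The function names, the example weights, and the choice to treat a document missing from one representation as scoring zero are all assumptions made for illustration; the abstract does not specify the weighting scheme or any score normalization.

```python
from typing import Dict, Iterable


def combine_scores(
    scores_by_unit: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Linearly combine per-document scores from several subword units.

    scores_by_unit maps a unit name (e.g. "phoneme") to {doc_id: score}.
    Assumption: a document absent from a unit's results contributes 0.
    """
    combined: Dict[str, float] = {}
    for unit, doc_scores in scores_by_unit.items():
        weight = weights[unit]
        for doc_id, score in doc_scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    return combined


def f_measure(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved_set)
    recall = hits / len(relevant_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores and weights, for illustration only.
example_scores = {
    "phoneme": {"d1": 0.5, "d2": 0.25},
    "syllable": {"d1": 0.25, "d3": 1.0},
}
example_weights = {"phoneme": 0.5, "syllable": 0.5}
example_combined = combine_scores(example_scores, example_weights)
```

In practice the weights would presumably be tuned on held-out data, and scores from different representations may need to be normalized to a common range before they are summed.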