Multilayer subword units for open-vocabulary spoken document retrieval

This paper describes the application of subword units to improve open-vocabulary spoken document retrieval when the recognition output is highly corrupted. We present an open-vocabulary spoken document retrieval system that includes a newly proposed subphonetic segment unit and combines multilayer subword units. Experiments on Japanese spoken documents show that the proposed subphonetic segment unit improves retrieval performance, achieving high precision and recall, and that combining multilayer subword units is also effective.
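
The idea of combining several subword layers can be illustrated with a minimal sketch. It is not the system described in this paper: the abstract does not define the subphonetic segment unit or the combination rule, so the sketch assumes two stand-in layers (phonemes and syllable-like units), a clipped n-gram overlap score per layer, and a weighted linear sum across layers. All function names, layer names, weights, and the toy romanized strings are hypothetical.

```python
from collections import Counter


def ngrams(units, n):
    """Return the list of n-grams (as tuples) over a sequence of subword units."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


def layer_score(query_units, doc_units, n=2):
    """Fraction of query n-grams that also occur in the document.

    Counts are clipped, so a query n-gram is matched at most as many
    times as it appears in the document.
    """
    q = Counter(ngrams(query_units, n))
    d = Counter(ngrams(doc_units, n))
    total = sum(q.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, d[gram]) for gram, count in q.items())
    return matched / total


def combined_score(query_layers, doc_layers, weights):
    """Weighted linear sum of per-layer scores (an assumed combination rule)."""
    return sum(
        w * layer_score(query_layers[name], doc_layers[name])
        for name, w in weights.items()
    )


# Toy data: each item is represented once per layer (hypothetical layers).
query = {"phoneme": ["k", "a", "w", "a"], "syllable": ["ka", "wa"]}
doc_clean = {
    "phoneme": ["n", "a", "k", "a", "w", "a"],
    "syllable": ["na", "ka", "wa"],
}
# Simulated recognition error: "wa" misrecognized as "ba".
doc_noisy = {
    "phoneme": ["n", "a", "k", "a", "b", "a"],
    "syllable": ["na", "ka", "ba"],
}
weights = {"phoneme": 0.5, "syllable": 0.5}

score_clean = combined_score(query, doc_clean, weights)
score_noisy = combined_score(query, doc_noisy, weights)
```

In this toy case the syllable layer gives the noisy document no credit, while the finer phoneme layer still matches one of the three query bigrams. This shows in miniature why a finer-grained unit can tolerate recognition errors better and why combining layers may help.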