Spoken Term Detection Results Using Plural Subword Models by Estimating Detection Performance for Each Query

The present paper proposes a new method for integrating multiple spoken term detection (STD) results obtained from the multiple subword models that we proposed previously. We have confirmed that these subword models, namely the 1/2 phone model, the 1/3 phone model, and the sub-phonetic segment (SPS) model, are effective for STD systems, which must be vocabulary-free in order to handle arbitrary query words. These models also represent the time axis in finer detail than conventional phone models such as the triphone model. In the present study, we explicitly use the results of each subword model when integrating the multiple results. For this purpose, we introduce an STD performance index that expresses how difficult each query word is to detect. The index is approximated by the recognition accuracy of the query's subword sequence. Experiments on a corpus of actual presentation speech demonstrate improved performance.
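As a rough illustration of the idea of a per-query performance index, the following minimal sketch computes the recognition accuracy of a query's subword sequence from an edit distance and uses it to weight the detection scores of several subword models. The abstract does not specify the integration rule, so the accuracy-weighted average, the function names, and the model labels used here are all assumptions made for illustration, not the method of the paper.

```python
from typing import Dict, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def subword_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Accuracy (N - S - D - I) / N of a recognized subword sequence,
    clipped at 0. Used here as a stand-in for the per-query index."""
    if not ref:
        return 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


def integrate_scores(scores: Dict[str, float],
                     accuracies: Dict[str, float]) -> float:
    """Hypothetical integration rule: accuracy-weighted average of the
    detection scores that each subword model gives to one candidate."""
    total = sum(accuracies[m] for m in scores)
    if total == 0.0:
        # No model is trusted more than another: plain average.
        return sum(scores.values()) / len(scores)
    return sum(accuracies[m] * scores[m] for m in scores) / total


# Illustrative values only; the subword labels are made up.
reference = ["k", "a", "i", "g", "i"]
index_half = subword_accuracy(reference, ["k", "a", "i", "g", "i"])
index_sps = subword_accuracy(reference, ["k", "a", "e", "g", "i"])
combined = integrate_scores(
    {"half_phone": 0.9, "sps": 0.4},
    {"half_phone": index_half, "sps": index_sps},
)
```

In this sketch, a model whose recognizer reproduces the query's subword sequence more accurately contributes more to the combined score, which is one simple way to use the results of the subword models explicitly during integration.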