Acceleration of query-by-example using posteriorgrams of a deep neural network

Much research has been conducted on spoken term detection. Query-by-example, in which the query itself is spoken rather than typed, is an important variant of spoken term detection. A previous study examined posteriorgrams, which are sequences of output probabilities generated by a deep neural network from spoken queries and speech data. Although posteriorgram matching between a spoken query and speech data improves retrieval accuracy, searching with a spoken query is slow, even for a relatively small quantity of speech data, so reducing retrieval time is a crucial problem. In this paper, we propose two methods for reducing retrieval time in posteriorgram matching: one accelerates matching by transforming each posteriorgram into a bit matrix, and the other uses a sparse vector representation. The first method, "posteriorgram bit operation," transforms the posteriorgrams of both spoken queries and speech data into bit matrices. The second method retains only a small number of high-probability elements in each posteriorgram frame. Because most elements of a sparse vector are 0, the thousands of output probabilities in a posteriorgram are reduced to only a few. Evaluation experiments were carried out on open test collections (the SpokenDoc tasks of the NTCIR-10 workshop) [1,2], and the results demonstrate the effectiveness of the proposed methods.
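To make the two reductions concrete, the sketch below shows one plausible NumPy realization of both ideas: binarizing a T x D posteriorgram by thresholding (the bit-operation idea, with frame similarity computed by bitwise AND and popcount) and keeping only the top-k probabilities per frame (the sparse vector idea). This is a minimal sketch, not the paper's exact procedure; the threshold, k, and all function names are illustrative assumptions.

```python
import numpy as np

def to_bit_matrix(posteriorgram, threshold=0.01):
    """Binarize a T x D posteriorgram: 1 where the output probability
    exceeds the threshold, 0 elsewhere. Packing the bits into bytes lets
    frame comparison run as cheap bitwise operations.
    (threshold=0.01 is an illustrative assumption.)"""
    bits = posteriorgram >= threshold                  # T x D boolean
    return np.packbits(bits, axis=1)                   # T x ceil(D/8) uint8

# Precomputed popcount for every possible byte value (0..255).
POPCOUNT = np.unpackbits(
    np.arange(256, dtype=np.uint8)[:, None], axis=1
).sum(axis=1)

def bit_frame_similarity(q_row, d_row):
    """Count dimensions active in both packed frames via AND + popcount,
    a cheap stand-in for the frames' inner product."""
    return int(POPCOUNT[np.bitwise_and(q_row, d_row)].sum())

def sparsify(posteriorgram, k=5):
    """Keep only the k largest probabilities in each frame and zero the
    rest, so frame-to-frame products touch only a few dimensions.
    (k=5 is an illustrative assumption.)"""
    sparse = np.zeros_like(posteriorgram)
    idx = np.argsort(posteriorgram, axis=1)[:, -k:]    # top-k per frame
    rows = np.arange(posteriorgram.shape[0])[:, None]
    sparse[rows, idx] = posteriorgram[rows, idx]
    return sparse

# Toy usage: random posteriorgram frames compared in the bit domain.
T, D = 4, 16
pg = np.random.dirichlet(np.ones(D), size=T)
q_bits, d_bits = to_bit_matrix(pg), to_bit_matrix(pg)
print(bit_frame_similarity(q_bits[0], d_bits[0]))
print(sparsify(pg))
```

Either representation could then replace the dense inner products inside a matching procedure such as DTW, which is where the retrieval-time savings would come from.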