Acceleration of query-by-example using posteriorgrams of a deep neural network

Much research has been conducted on spoken term detection. Query-by-example, in which the query itself is spoken rather than typed, is an important variant of spoken term detection. A previous study examined posteriorgrams, which are sequences of output probabilities generated by a deep neural network from spoken queries and speech data. Although posteriorgram matching between a spoken query and speech data improves retrieval accuracy, searching with a spoken query is slow, even for a relatively small quantity of speech data, so reducing retrieval time is a crucial problem. In this paper, we propose two methods for reducing retrieval time in posteriorgram matching: one accelerates matching by transforming each posteriorgram into a bit matrix, and the other uses a sparse vector representation. The first method, "posteriorgram bit operation," transforms the posteriorgrams of both spoken queries and speech data into bit matrices. The second method retains only a small number of high-probability elements in each posteriorgram frame. Because most elements of a sparse vector are 0, the thousands of output probabilities in a posteriorgram are reduced to only a few. Evaluation experiments were carried out on open test collections (the SpokenDoc tasks of the NTCIR-10 workshop) [1,2], and the results demonstrate the effectiveness of the proposed methods.
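To make the two reductions concrete, the sketch below shows one plausible NumPy realization of both ideas: binarizing a T x D posteriorgram by thresholding (the bit-operation idea, with frame similarity computed by bitwise AND and popcount) and keeping only the top-k probabilities per frame (the sparse vector idea). This is a minimal sketch, not the paper's exact procedure; the threshold, k, and all function names are illustrative assumptions.

```python
import numpy as np

def to_bit_matrix(posteriorgram, threshold=0.01):
    """Binarize a T x D posteriorgram: 1 where the output probability
    exceeds the threshold, 0 elsewhere. Packing the bits into bytes lets
    frame comparison run as cheap bitwise operations.
    (threshold=0.01 is an illustrative assumption.)"""
    bits = posteriorgram >= threshold                  # T x D boolean
    return np.packbits(bits, axis=1)                   # T x ceil(D/8) uint8

# Precomputed popcount for every possible byte value (0..255).
POPCOUNT = np.unpackbits(
    np.arange(256, dtype=np.uint8)[:, None], axis=1
).sum(axis=1)

def bit_frame_similarity(q_row, d_row):
    """Count dimensions active in both packed frames via AND + popcount,
    a cheap stand-in for the frames' inner product."""
    return int(POPCOUNT[np.bitwise_and(q_row, d_row)].sum())

def sparsify(posteriorgram, k=5):
    """Keep only the k largest probabilities in each frame and zero the
    rest, so frame-to-frame products touch only a few dimensions.
    (k=5 is an illustrative assumption.)"""
    sparse = np.zeros_like(posteriorgram)
    idx = np.argsort(posteriorgram, axis=1)[:, -k:]    # top-k per frame
    rows = np.arange(posteriorgram.shape[0])[:, None]
    sparse[rows, idx] = posteriorgram[rows, idx]
    return sparse

# Toy usage: random posteriorgram frames compared in the bit domain.
T, D = 4, 16
pg = np.random.dirichlet(np.ones(D), size=T)
q_bits, d_bits = to_bit_matrix(pg), to_bit_matrix(pg)
print(bit_frame_similarity(q_bits[0], d_bits[0]))
print(sparsify(pg))
```

Either representation could then replace the dense inner products inside a matching procedure such as DTW, which is where the retrieval-time savings would come from.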