Paper information - Constructing Japanese test collections for spoken term detection


Spoken Document Retrieval (SDR) and Spoken Term Detection (STD) have been two of the most intensively investigated topics in spoken document processing research since the establishment of the SDR and STD test collections by the Text REtrieval Conference (TREC) and NIST. Because Japanese spoken document processing researchers also require such test collections for SDR and STD, we have established a working group to develop these collections in the Special Interest Group - Spoken Language Processing (SIG-SLP) of the Information Processing Society of Japan. The working group has constructed and made available a test collection for SDR, and is now constructing new test collections for STD that will be open to researchers. The present paper introduces the policies, outline, and schedule of the new test collections, and then compares them with the NIST STD test collections.

Index Terms: spoken term detection, test collection
