Introduction to the Special Issue on Recent Advances in Asian Language Spoken Document Retrieval
This special issue of the ACM Transactions on Asian Language Information Processing, on Recent Advances in Asian Language Spoken Document Retrieval, is devoted to recent advances in the burgeoning field of spoken document processing for Asian languages. Rapidly growing spoken language content from audio media sources such as radio, television, lectures, and telephony recordings has led to an increasing demand for more effective automatic indexing and retrieval of these relatively unstructured spoken documents. Yet, just like text documents, spoken documents can be described by attributes such as subjects, topics, and semantic concepts. As such, the vast amount of spoken documents available should be as accessible to us as text documents are. However, unlike text documents, which are better structured (for example, with titles, headings, and paragraphs), spoken documents can be retrieved and browsed only as well as the underlying speech recognition engines perform, and these are still far from perfect.
For this special issue, we encouraged submissions that report on novel techniques for tasks such as spoken document retrieval, spoken document summarization, spoken document translation, and other related areas of research. Of particular interest are studies that directly address problems involving spoken Asian languages.
Spoken Document Retrieval (SDR) is essentially the task of retrieving excerpts from a large collection of spoken documents based on a user's request. SDR is the key element in many applications such as voice search, voice surveillance, voice data mining, and call center automation. As an interdisciplinary research area involving automatic speech recognition, natural language processing, and information retrieval, SDR has benefited much from advances in speech and language processing as well as from the availability of large spoken databases.
The valuable content in these large audio archives needs to be accessed in an effective yet practical way. This need in turn gives rise to the area of spoken document summarization, which seeks to distill salient information from spoken documents while removing redundant or incorrect information, producing user-friendly, skimmable summaries. Research problems such as information extraction and spoken summary generation are our main focus in the area of spoken document summarization.