Paper Information - Comparing Approaches for Automatic Question Identification

Comparing Approaches for Automatic Question Identification

Collecting spontaneous speech corpora that are open-ended, yet topically constrained, is increasingly popular for research in spoken dialogue systems and speaker state, inter alia. Typically, these corpora are labeled by human annotators, either in the lab or through crowd-sourcing; however, this is cumbersome and time-consuming for large corpora. We present four different approaches to automatically tagging a corpus when general topics of the conversations are known. We develop these approaches on the Columbia X-Cultural Deception corpus and find accuracy that significantly exceeds the baseline. Finally, we conduct a cross-corpus evaluation by testing the best performing approach on the Columbia/SRI/Colorado corpus.

Julia Hirschberg | Sarah Ita Levitan | Angel Maredia | Kara Schechtman

[1] M. de Rijke, et al. Short Text Similarity with Word Embeddings, 2015, CIKM.

[2] Maite Taboada, et al. Subtopic Annotation in a Corpus of News Texts: Steps Towards Automatic Subtopic Segmentation, 2013, STIL.

[3] Carlo Strapparava, et al. Corpus-based and Knowledge-based Measures of Text Semantic Similarity, 2006, AAAI.

[4] Julia Hirschberg, et al. Cross-Cultural Production and Detection of Deception from Speech, 2015, WMDD@ICMI.

[5] M. Dolores del Castillo, et al. SyMSS: A syntax-based measure for short-text semantic similarity, 2011, Data Knowl. Eng.

[6] José Gabriel Pereira Lopes, et al. Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation, 2007, AAAI.

[7] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.

[8] Andreas Stolcke, et al. Distinguishing deceptive from non-deceptive speech, 2005, INTERSPEECH.