Applying Neural Network Techniques for Topic Change Detection in the HuComTech Corpus

In the age of the Internet, we are generating documents (both written and spoken) at an unprecedented rate. This rate of document creation, together with the number of already existing documents, makes manual processing time-consuming and costly to the point of infeasibility. We therefore need automatic methods suitable for processing written as well as spoken documents. One crucial part of processing documents is partitioning them into segments based on the topic being discussed. A self-evident application is partitioning a news broadcast into separate news stories. One of the first steps in doing so is identifying the shifts in the topic framework, or in other words, finding the time interval where the announcer changes from one news story to the next. Naturally, as the transitions between news stories are often accompanied by easily identifiable audio (e.g. signal) and visual (e.g. change in graphics) cues, this would not be a particularly difficult task. In other cases, however, the solution to this problem is far less obvious. Here, we approach this task for the case of spoken dialogues (interviews). One particular difficulty of these dialogues is that the interlocutors often switch between languages. Because of this (and in the hope of contributing to the generality of our method), we carry out topic change detection in a content-free manner, focusing on speaker roles and prosodic features. To process these features we employ neural networks, and we demonstrate that, with proper classifier combination methods, this can lead to a detection performance that is competitive with the state of the art.
