Exploring Methods for Predicting Important Utterances Contributing to Meeting Summarization

Meeting minutes are useful, but creating meeting summaries is a time-consuming task. To support this task, this paper proposes models that use multimodal and multiparty features to predict the important utterances that should be included in a meeting summary. We tackle the problem with two approaches: handcrafted feature models and deep neural network models. The best handcrafted feature model achieved an F-measure of 0.707, and the best deep-learning-based verbal and nonverbal model (V-NV model) achieved an F-measure of 0.827. Based on the V-NV model, we implemented a meeting browser and conducted a user study. The results showed that the proposed meeting browser helps users understand the content of the discussion and the participants' roles in it better than a conventional text-based browser does.
