ProMETheus: An Intelligent Mobile Voice Meeting Minutes System

In this paper, we design and develop ProMETheus, an intelligent system that generates meeting minutes from audio data. The first task in ProMETheus is to recognize the speakers in noisy audio. A speaker recognition algorithm automatically identifies who is speaking from the speech in an audio recording. Speech recognition then transcribes each speaker's audio to text, so that ProMETheus can generate the complete meeting text in chronological order, with each utterance labeled by the speaker's name. To show the subject of the meeting and the agreed actions, we use a text summarization algorithm that extracts meaningful key phrases and summary sentences from the complete meeting text. In addition, sentiment analysis of each speaker's text makes the agreed actions more humane: a relevance score is calculated for each course of action from the sentiment and attitude expressed in the tone of the text. ProMETheus is capable of accurately summarizing the meeting and analyzing the agreed actions. We evaluate the system on a real-world audio meeting dataset in which every meeting session involves multiple speakers.
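As a rough illustration of the last two stages described above, the following minimal sketch assembles a speaker-labeled transcript in chronological order and then selects summary utterances. It is not the system's implementation: the `Segment` structure, the function names, and the stopword list are hypothetical, and a simple word-frequency scorer stands in for the summarization algorithm, which the abstract does not specify.

```python
from collections import Counter
from dataclasses import dataclass
import re


@dataclass
class Segment:
    """One recognized utterance (hypothetical structure, not from the paper)."""
    start: float   # start time in seconds
    speaker: str   # label assumed to come from the speaker recognition step
    text: str      # text assumed to come from the speech recognition step


# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "to", "of", "and", "we", "is",
             "in", "on", "for", "i", "it", "by"}


def build_transcript(segments):
    """Order utterances by start time and prefix each with its speaker."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [f"{s.speaker}: {s.text}" for s in ordered]


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize(segments, k=1):
    """Return the k highest-scoring utterances, in chronological order.

    An utterance's score is the mean meeting-wide frequency of its
    content words. This is a stand-in scorer, assumed for illustration.
    """
    freq = Counter(w for s in segments for w in content_words(s.text))

    def score(segment):
        ws = content_words(segment.text)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    top = sorted(segments, key=score, reverse=True)[:k]
    return build_transcript(top)
```

Calling `build_transcript` on segments that arrive out of order returns them sorted by start time, and `summarize(segments, k=2)` keeps the two utterances whose vocabulary is most representative of the meeting as a whole. A graph-based ranker or a sentiment-weighted score could replace `score` without changing the surrounding structure.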
