Smart meeting systems: A survey of state-of-the-art and open issues

Smart meeting systems, which record meetings and analyze the generated audio-visual content for future viewing, have been a topic of great interest in recent years. A successful smart meeting system relies on various technologies, ranging from devices and algorithms to architecture. This article presents a condensed survey of existing research and technologies, including smart meeting system architecture, meeting capture, meeting recognition, semantic processing, and evaluation methods. It aims to provide an overview of the underlying technologies to help readers understand the key design issues of such systems. The article also describes various open issues as possible ways to extend the capabilities of current smart meeting systems.
