Modeling individual and group actions in meetings with layered HMMs
Samy Bengio | Daniel Gatica-Perez | Iain McCowan | Dong Zhang
[1] Elizabeth Shriberg,et al. Relationship between dialogue acts and hot spots in meetings , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[2] Dariu Gavrila,et al. The Visual Analysis of Human Movement: A Survey , 1999, Comput. Vis. Image Underst..
[3] S. Garrod,et al. Group Discussion as Interactive Dialogue or as Serial Monologue: The Influence of Group Size , 2000, Psychological science.
[4] Jean Carletta,et al. Nonverbal behaviours improving a simulation of small group discussion , 2003 .
[5] Alex Pentland,et al. Towards Measuring Human Interactions in Conversational Settings , 2001 .
[6] Nikki Mirghafori,et al. Transmissions and transitions: a study of two common assumptions in multi-band ASR , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[7] Stuart J. Russell,et al. Dynamic bayesian networks: representation, inference and learning , 2002 .
[8] John Makhoul,et al. Rough'n'Ready: a meeting recorder and browser , 1999, CSUR.
[9] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[10] Alexander H. Waibel,et al. Skin-Color Modeling and Adaptation , 1998, ACCV.
[11] S. Duncan,et al. Some Signals and Rules for Taking Speaking Turns in Conversations , 1972 .
[12] Mari Ostendorf,et al. Detection Of Agreement vs. Disagreement In Meetings: Training With Unlabeled Data , 2003, NAACL.
[13] David C. Hogg,et al. Learning Behaviour Models of Human Activities , 1999, BMVC.
[14] Jan P. H. van Santen,et al. Review of Handbook of standards and resources for spoken language systems by Dafydd Gibbon, Roger Moore, and Richard Winski. Mouton de Gruyter 1997. , 1998 .
[15] Darren Moore,et al. The IDIAP Smart Meeting Room , 2002 .
[16] Alex Pentland,et al. A Bayesian Computer Vision System for Modeling Human Interactions , 1999, IEEE Trans. Pattern Anal. Mach. Intell..
[17] Alex Pentland,et al. Pfinder: Real-Time Tracking of the Human Body , 1997, IEEE Trans. Pattern Anal. Mach. Intell..
[18] Iain McCowan,et al. Location based speaker segmentation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[19] Ramakant Nevatia,et al. Multi-agent event recognition , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[20] Andreas Stolcke,et al. The Meeting Project at ICSI , 2001, HLT.
[21] Michael S. Brandstein,et al. Robust Localization in Reverberant Rooms , 2001, Microphone Arrays.
[22] Samy Bengio,et al. Modeling human interaction in meetings , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[23] Juergen Luettin,et al. Audio-Visual Speech Modeling for Continuous Speech Recognition , 2000, IEEE Trans. Multim..
[24] David G. Novick,et al. Coordinating turn-taking with gaze , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[25] Peter D. Bricker,et al. The role of audible and visible back-channel responses in interpersonal communication. , 1977 .
[26] E.,et al. Groups: Interaction and Performance , 2001 .
[27] Samy Bengio,et al. Modeling Individual and Group Actions in Meetings: A Two-Layer HMM Framework , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.
[28] J. Markel,et al. The SIFT algorithm for fundamental frequency estimation , 1972 .
[29] Hagen Soltau,et al. Advances in automatic meeting record creation and access , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[30] Samy Bengio,et al. Automatic analysis of multimodal group actions in meetings , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[32] Steve Renals,et al. Dynamic Bayesian networks for meeting structuring , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[33] Shih-Fu Chang,et al. Unsupervised discovery of multilevel statistical video structures using hierarchical hidden Markov models , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).
[34] A. Nakamura,et al. Nature (London , 1975 .
[35] Andreas Girgensohn,et al. LiteMinutes: an Internet-based system for multimedia meeting minutes , 2001, WWW '01.
[36] K. Parker,et al. Speaking turns in small group interaction: A context-sensitive event sequence model. , 1988 .
[37] Bernie Mulgrew,et al. IEEE Workshop on Neural Networks for Signal Processing , 1995 .
[38] Rainer Stiefelhagen,et al. Tracking focus of attention in meetings , 2002, Proceedings. Fourth IEEE International Conference on Multimodal Interfaces.
[39] Eric Fosler-Lussier,et al. Combining multiple estimators of speaking rate , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[40] Andrew J. Viterbi. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm , 1967, IEEE Trans. Inf. Theory.
[41] Eric Horvitz,et al. Layered representations for learning and inferring office activity from multiple sensory channels , 2004, Comput. Vis. Image Underst..
[42] Anoop Gupta,et al. Distributed meetings: a meeting capture and broadcasting system , 2002, MULTIMEDIA '02.
[43] David G. Stork,et al. Speech recognition and sensory integration , 1998 .