Meeting decision detection: multimodal information fusion for multi-party dialogue understanding

Modern advances in multimedia and storage technologies have led to huge archives of human conversations in widely ranging areas. These archives offer a wealth of information in organizational contexts. However, retrieving and managing information in these archives is a time-consuming and labor-intensive task. Previous research has applied keyword-based and computer vision-based methods to this task. Spontaneous conversations, however, are complex in their use of multimodal cues and intricate in the interactions between multiple speakers, and they pose new challenges to these methods. We need new techniques that can leverage the information hidden in multiple communication modalities, including not just “what” the speakers say but also “how” they express themselves and interact with others. In response to this need, this thesis inquires into the multimodal nature of meeting dialogues and into computational means of retrieving and managing recorded meeting information. In particular, this thesis develops the Meeting Decision Detector (MDD) to detect and track decisions, one of the most important outcomes of meetings. The MDD involves not only the generation of extractive summaries pertaining to the decisions (“decision detection”), but also the organization of a continuous stream of meeting speech into locally coherent segments (“discourse segmentation”). The inquiry starts with a corpus analysis, which constitutes a comprehensive empirical study of the decision-indicative and segment-signalling cues in the meeting corpora. These cues are uncovered from a variety of communication modalities, including the words spoken, gesture and head movements, pitch and energy level, rate of speech, pauses, and use of subjective terms. While some of these cues match previous findings on speech segmentation, others have not been studied before. The analysis also provides empirical grounding for computing features and integrating them into a computational model.
To handle the high-dimensional multimodal feature space in the meeting domain, this thesis empirically compares feature discriminability and feature pattern-finding criteria. As the different knowledge sources are expected to capture different types of features, the thesis also experiments with methods that can harness the synergy between multiple knowledge sources. The problem formalization and the modeling algorithm so far correspond to an optimal setting: an off-line, post-meeting analysis scenario. Ultimately, however, the MDD is expected to operate online, either right after a meeting or while a meeting is still in progress. Thus this thesis also explores techniques that help relax the optimal setting, especially those using only features that can be generated with a higher
