Research in multimedia and multimodal parsing and generation

This overview introduces the emerging set of techniques for parsing and generating multiple media (e.g., text, graphics, maps, gestures) using multiple sensory modalities (e.g., auditory, visual, tactile). We first briefly introduce these techniques and explain their value. Next, we describe computational methods for parsing input from heterogeneous media and modalities (e.g., natural language, gesture, gaze). We then survey complementary techniques for generating coordinated multimedia and multimodal output. Finally, we discuss systems that integrate both parsing and generation to enable multimedia dialogue in intelligent interfaces. The article concludes by outlining fundamental problems that require further research.
