Automatic cinematography and multilingual NLG for generating video documentaries

Automatically constructing a complete documentary or educational film from scattered images and pieces of knowledge is a significant challenge. Even when this information is provided in an annotated format, ordering, structuring and animating sequences of images, and producing natural language descriptions that correspond to those images under multiple constraints, are each difficult tasks in their own right. This paper describes an approach that tackles these problems by combining rhetorical structures with narrative and film theory to produce movie-like visual animations from still images, together with the natural language generation (NLG) techniques needed to describe what is being seen in the animations. Rhetorical structures from NLG are used to integrate the separate components for video creation and script generation. We further describe an implementation, named Glamour, that produces actual short video documentaries in a cultural heritage domain; these documentaries have been evaluated by professional filmmakers.
