Scene Boundary Detection from Movie Dialogue: A Genetic Algorithm Approach

Movie scripts are a rich textual resource for movie content analysis. This article describes a mechanism for fragmenting a sequence of movie script dialogue into scene-wise groups; that is, it attempts to locate scene transitions using only information acquired from the sequence of dialogue units. We collect movie scripts from a web archive and preprocess them to build a resource of dialogues. The dialogue sequence from each script is then fed to a Genetic Algorithm (GA) framework, which fragments the sequence into adjacent groups of dialogue units, the output 'scenes'. The GA optimizes this grouping using SentiWordNet scores and WordNet distances computed for the dialogue units, so that adjacent scenes are as semantically dissimilar as possible. Finally, we compare the resulting fragmentation with the original scene-wise alignment of dialogue in the script.
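
The grouping step can be illustrated with a minimal sketch. It is not the authors' implementation: the chromosome encoding (one bit per gap between consecutive dialogue units), the fitness function (mean absolute difference between the average scores of adjacent segments), the selection scheme, and all parameter values are assumptions made for illustration. A single hypothetical score per dialogue unit stands in for the SentiWordNet and WordNet features.

```python
import random


def segments(bits):
    """Turn a boundary bit-vector into (start, end) index pairs.

    bits[i] == 1 means a scene boundary falls between unit i and unit i + 1.
    """
    cuts = [i + 1 for i, b in enumerate(bits) if b]
    edges = [0] + cuts + [len(bits) + 1]
    return list(zip(edges[:-1], edges[1:]))


def fitness(bits, scores):
    """Assumed objective: mean absolute gap between adjacent segment means."""
    means = [sum(scores[a:b]) / (b - a) for a, b in segments(bits)]
    if len(means) < 2:
        return 0.0  # a single segment has no adjacent scene to differ from
    gaps = [abs(x - y) for x, y in zip(means, means[1:])]
    return sum(gaps) / len(gaps)


def evolve(scores, pop_size=40, generations=60, p_mut=0.05, seed=0):
    """Toy GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n = len(scores) - 1  # number of candidate boundary positions (needs n >= 2)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, scores), reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability p_mut
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, scores))


if __name__ == "__main__":
    # Hypothetical per-dialogue scores, not data from the paper.
    toy_scores = [0.9, 0.8, 0.7, -0.6, -0.7, -0.8]
    best = evolve(toy_scores)
    print(best, segments(best), fitness(best, toy_scores))
```

Encoding boundaries as bits keeps every candidate a valid partition into contiguous groups, so crossover and mutation never need a repair step. A real system would replace the scalar scores with the sentiment and lexical-distance measures described above.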
