Narrative theme navigation for sitcoms supported by fan-generated scripts

This article provides the definitive description of the complete Joke-O-Mat system for navigating sitcoms, which was presented briefly in Friedland et al. (2009), extended in Janin et al. (2010), and augmented with fan-generated scripts in Friedland et al. (2010). With this extension, the system allows a user to browse a sitcom by scene, punchline, and dialog segment, and to filter these narrative themes by actor and by keyword. For example, the user can choose to watch only punchlines by the character “Kramer” that contain the word “armoire”. The system infers the narrative themes and provides word-level search by automatically aligning the output of a speaker identification system and a speech recognizer to both closed captions and scripts generated by fans on the Internet. The segmentations produced by this system have proven to be indistinguishable from expert-generated segmentations and require significantly less time to produce. The article describes the original and the extended Joke-O-Mat system (http://www.icsi.berkeley.edu/jokeomat/), discusses problems with the use of fan-generated content, and presents results on episodes from the sitcom Seinfeld with regard to segmentation accuracy and overall user satisfaction, as determined by a human-subject study.
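The browsing behavior described above (choosing a narrative theme, then filtering by actor and keyword) can be illustrated with a minimal sketch. This is not the Joke-O-Mat implementation: the `Segment` record, the `filter_segments` function, and the sample lines are all hypothetical, and the sketch assumes that segmentation, speaker labels, and word-level transcripts have already been produced by the alignment step.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical record for one navigable unit of an episode."""
    kind: str      # assumed theme labels: "scene", "punchline", or "dialog"
    speaker: str   # character name attached by speaker identification
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # aligned transcript text for the segment


def filter_segments(segments, kind=None, speaker=None, keyword=None):
    """Return the segments matching every criterion that is not None."""
    result = []
    for seg in segments:
        if kind is not None and seg.kind != kind:
            continue
        if speaker is not None and seg.speaker.lower() != speaker.lower():
            continue
        if keyword is not None:
            # Word-level match: strip surrounding punctuation, ignore case.
            words = {w.strip(".,!?;:\"'").lower() for w in seg.text.split()}
            if keyword.lower() not in words:
                continue
        result.append(seg)
    return result


# Made-up placeholder lines, not actual episode dialog.
segments = [
    Segment("dialog", "Jerry", 12.0, 15.5, "What happened to the armoire?"),
    Segment("punchline", "Kramer", 15.5, 18.2, "I traded the armoire for a statue."),
    Segment("punchline", "Kramer", 40.0, 42.0, "Giddy up!"),
    Segment("punchline", "George", 50.0, 53.0, "The armoire was mine!"),
]

hits = filter_segments(segments, kind="punchline", speaker="Kramer", keyword="armoire")
for seg in hits:
    print(f"{seg.start:.1f}-{seg.end:.1f} {seg.speaker}: {seg.text}")
```

Treating each criterion as optional mirrors the abstract's description: the same function covers browsing by theme alone, by actor alone, or by any combination with a keyword.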

[1]  Gerald Friedland,et al.  Live speaker identification in conversations , 2008, ACM Multimedia.

[2]  Tobun Dorbin Ng,et al.  Collages as dynamic summaries for news video , 2002, MULTIMEDIA '02.

[3]  Edward Y. Chang,et al.  Multimodal concept-dependent active learning for image retrieval , 2004, MULTIMEDIA '04.

[4]  Amarnath Gupta,et al.  Visual information retrieval , 1997, CACM.

[5]  Changsheng Xu,et al.  Live sports event detection based on broadcast video and web-casting text , 2006, MM '06.

[6]  Anthony Hoogs,et al.  Video content annotation using visual analysis and a large semantic knowledgebase , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol.

[8]  Gerald Friedland,et al.  Joke-o-Mat HD: browsing sitcoms with human derived transcripts , 2010, ACM Multimedia.

[9]  Marcel Worring,et al.  Query on demand video browsing , 2007, ACM Multimedia.

[10]  Gerald Friedland,et al.  Narrative theme navigation for sitcoms supported by fan-generated scripts , 2010, AIEMPro 2010.

[11]  John R. Kender,et al.  VAST MM: multimedia browser for presentation video , 2007, CIVR '07.

[12]  Mohan S. Kankanhalli,et al.  Proceedings of the 2008 international conference on Content-based image and video retrieval , 2008.

[13]  R. Brunelli,et al.  A Survey on the Automatic Indexing of Video Data , 1999, J. Vis. Commun. Image Represent.

[14]  Martha Larson,et al.  Overview of VideoCLEF 2008: Automatic Generation of Topic-based Feeds for Dual Language Audio-Visual Content , 2008, CLEF.

[15]  Eric Bruno,et al.  Design of Multimodal Dissimilarity Spaces for Retrieval of Video Documents , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Mohan S. Kankanhalli,et al.  Proceedings of the 13th annual ACM international conference on Multimedia , 2005, MM 2005.

[17]  Marcel Worring,et al.  Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr..

[18]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[19]  Mohamed Abdel-Mottaleb,et al.  Audio scene segmentation for video with generic content , 2008, Electronic Imaging.

[20]  Tat-Seng Chua.  Towards the next plateau: innovative multimedia research beyond TRECVID , 2007, ACM Multimedia.

[21]  Franciska de Jong,et al.  Annotation of Heterogeneous Multimedia Content Using Automatic Speech Recognition , 2007, SAMT.

[22]  Douglas A. Reynolds,et al.  Approaches and applications of audio diarization , 2005, Proceedings of ICASSP '05, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Marijn Huijbregts,et al.  The ICSI RT07s Speaker Diarization System , 2007, CLEAR.

[24]  Marcel Worring,et al.  Building a visual ontology for video retrieval , 2005, MULTIMEDIA '05.

[25]  Gerald Friedland,et al.  Using Artistic Markers and Speaker Identification for Narrative-Theme Navigation of Seinfeld Episodes , 2009, 2009 11th IEEE International Symposium on Multimedia.

[26]  Takeo Kanade,et al.  Intelligent Access to Digital Video: Informedia Project , 1996, Computer.

[27]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[28]  John Adcock,et al.  Experiments in interactive video search by addition and subtraction , 2008, CIVR '08.

[29]  Qibin Sun,et al.  Video Browsing on Handheld Devices—Interface Designs for the Next Generation of Mobile Video Players , 2008, IEEE MultiMedia.

[30]  Shih-Fu Chang,et al.  MediaNet: a multimedia information network for knowledge representation , 2000, SPIE Optics East.

[31]  Rong Yan,et al.  IBM multimedia search and retrieval system , 2007, CIVR '07.

[32]  Gerald Friedland,et al.  3rd international workshop on automated information extraction in media production , 2010, ACM Multimedia.

[33]  Franciska de Jong,et al.  OLIVE: Speech-Based Video Retrieval , 1998.

[34]  Takeo Kanade,et al.  Name-It: Naming and Detecting Faces in News Videos , 1999, IEEE Multim..

[35]  Karen Spärck Jones,et al.  Automatic content-based retrieval of broadcast news , 1995, MULTIMEDIA '95.

[36]  Sid-Ahmed Berrani,et al.  A non-supervised approach for repeated sequence detection in TV broadcast streams , 2008, Signal Process. Image Commun..

[37]  Chuohao Yeo,et al.  Visual speaker localization aided by acoustic models , 2009, MM '09.

[38]  Gerald Friedland,et al.  Joke-o-mat: browsing sitcoms punchline by punchline , 2009, ACM Multimedia.

[39]  Henning Schulzrinne,et al.  Proceedings of the 12th annual ACM international conference on Multimedia , 2004, MM 2004.

[40]  Alberto Del Bimbo,et al.  Automatic video annotation using ontologies extended with visual information , 2005, MULTIMEDIA '05.

[41]  Gerald Friedland,et al.  Towards Semantic Analysis of Conversations: A System for the Live Identification of Speakers in Meetings , 2008, 2008 IEEE International Conference on Semantic Computing.

[42]  Stéphane Ayache,et al.  Evaluation of Active Learning Strategies for Video Indexing , 2007, 2007 International Workshop on Content-Based Multimedia Indexing.