Narrative theme navigation for sitcoms supported by fan-generated scripts

This article presents a novel method for generating indexing information for the navigation of TV content, together with an implementation that extends the Joke-o-mat sitcom navigation system presented in [1]. The extended system adds word-level keyword search to Joke-o-mat's existing ability to browse a sitcom by scene, punchline, dialog segment, and actor. Indexing is based on aligning the multimedia content with closed captions and "found" fan-generated scripts, both processed with speech and speaker recognition systems. This significantly reduces the manual intervention required to train the system on new episodes, and the final narrative-theme segmentation has proven indistinguishable from expert annotation. The article describes the new Joke-o-mat system, discusses problems with using fan-generated data, and presents results on episodes of the sitcom Seinfeld, reporting segmentation accuracy and user satisfaction as determined by a human-subject study.
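To make the alignment idea concrete, the following is a minimal sketch and not the system's actual pipeline. It assumes closed captions carry timestamps but no speaker names, and fan scripts carry speaker names but no timestamps, and it transfers speaker labels by matching the two word sequences with Python's standard `difflib.SequenceMatcher`. The function names `tokenize` and `align_speakers`, the data layout, and the sample lines are hypothetical; the article itself relies on speech and speaker recognition rather than text matching alone.

```python
from difflib import SequenceMatcher

def tokenize(text):
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words if w]

def align_speakers(captions, script):
    """Attach a speaker to each timed caption (hypothetical helper).

    captions: list of (start_sec, end_sec, text), timed but unlabeled.
    script:   list of (speaker, text), labeled but untimed.
    Returns a list of (start_sec, end_sec, speaker_or_None, text).
    """
    # Flatten both sources into word sequences, remembering where each word came from.
    cap_words, cap_index = [], []
    for i, (_, _, text) in enumerate(captions):
        for w in tokenize(text):
            cap_words.append(w)
            cap_index.append(i)
    scr_words, scr_speaker = [], []
    for speaker, text in script:
        for w in tokenize(text):
            scr_words.append(w)
            scr_speaker.append(speaker)

    # Find matching word runs; autojunk is disabled so frequent words are not discarded.
    matcher = SequenceMatcher(None, cap_words, scr_words, autojunk=False)
    votes = [dict() for _ in captions]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            caption_id = cap_index[block.a + k]
            speaker = scr_speaker[block.b + k]
            votes[caption_id][speaker] = votes[caption_id].get(speaker, 0) + 1

    # Each caption takes the speaker with the most matched words, or None if unmatched.
    labeled = []
    for (start, end, text), v in zip(captions, votes):
        labeled.append((start, end, max(v, key=v.get) if v else None, text))
    return labeled

# Made-up sample data, for illustration only.
captions = [(1.0, 2.5, "Hello, is anyone home?"),
            (2.6, 4.0, "I am in the kitchen."),
            (4.1, 5.0, "Ha ha.")]
script = [("ALICE", "Hello is anyone home"),
          ("BOB", "I'm in the kitchen")]

for row in align_speakers(captions, script):
    print(row)
```

Captions with no counterpart in the script (here, the laughter line) stay unlabeled, which hints at why noisy fan-generated text needs support from audio-based recognition.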

[1]  Martha Larson,et al.  Overview of VideoCLEF 2008: Automatic Generation of Topic-based Feeds for Dual Language Audio-Visual Content , 2008, CLEF.

[2]  Takeo Kanade,et al.  Name-It: Naming and Detecting Faces in News Videos , 1999, IEEE Multim..

[3]  John Adcock,et al.  Experiments in interactive video search by addition and subtraction , 2008, CIVR '08.

[4]  Karen Spärck Jones,et al.  Automatic content-based retrieval of broadcast news , 1995, MULTIMEDIA '95.

[5]  Takeo Kanade,et al.  Intelligent Access to Digital Video , 1996.

[6]  Stéphane Ayache,et al.  Evaluation of active learning strategies for video indexing , 2007, Signal Process. Image Commun..

[7]  R. Brunelli,et al.  A Survey on the Automatic Indexing of Video Data , 1999, J. Vis. Commun. Image Represent..

[8]  Chuohao Yeo,et al.  Visual speaker localization aided by acoustic models , 2009, MM '09.

[9]  Marcel Worring,et al.  Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr..

[10]  Gerald Friedland,et al.  Joke-o-mat: browsing sitcoms punchline by punchline , 2009, ACM Multimedia.

[11]  Tat-Seng Chua.  Towards the next plateau: innovative multimedia research beyond trecvid , 2007, ACM Multimedia.

[12]  Marijn Huijbregts,et al.  The ICSI RT07s Speaker Diarization System , 2007, CLEAR.

[13]  Marcel Worring,et al.  Building a visual ontology for video retrieval , 2005, MULTIMEDIA '05.

[14]  Gerald Friedland,et al.  Using Artistic Markers and Speaker Identification for Narrative-Theme Navigation of Seinfeld Episodes , 2009, 2009 11th IEEE International Symposium on Multimedia.

[15]  Franciska de Jong,et al.  OLIVE: Speech-Based Video Retrieval , 1998.

[16]  Alberto Del Bimbo,et al.  Automatic video annotation using ontologies extended with visual information , 2005, MULTIMEDIA '05.

[17]  Gerald Friedland,et al.  Live speaker identification in conversations , 2008, ACM Multimedia.

[18]  Gerald Friedland,et al.  Towards Semantic Analysis of Conversations: A System for the Live Identification of Speakers in Meetings , 2008, 2008 IEEE International Conference on Semantic Computing.

[19]  Tobun Dorbin Ng,et al.  Collages as dynamic summaries for news video , 2002, MULTIMEDIA '02.

[20]  Eric Bruno,et al.  Design of Multimodal Dissimilarity Spaces for Retrieval of Video Documents , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Anthony Hoogs,et al.  Video content annotation using visual analysis and a large semantic knowledgebase , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Douglas A. Reynolds,et al.  Approaches and applications of audio diarization , 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[23]  Qibin Sun,et al.  Video Browsing on Handheld Devices—Interface Designs for the Next Generation of Mobile Video Players , 2008, IEEE MultiMedia.

[24]  Amarnath Gupta,et al.  Visual information retrieval , 1997, CACM.

[25]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[26]  Edward Y. Chang,et al.  Multimodal concept-dependent active learning for image retrieval , 2004, MULTIMEDIA '04.

[27]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol..

[28]  Gerald Friedland,et al.  Joke-o-Mat HD: browsing sitcoms with human derived transcripts , 2010, ACM Multimedia.

[29]  Marcel Worring,et al.  Query on demand video browsing , 2007, ACM Multimedia.

[30]  John R. Kender,et al.  VAST MM: multimedia browser for presentation video , 2007, CIVR '07.

[31]  Sid-Ahmed Berrani,et al.  A non-supervised approach for repeated sequence detection in TV broadcast streams , 2008, Signal Process. Image Commun..

[32]  Shih-Fu Chang,et al.  MediaNet: a multimedia information network for knowledge representation , 2000, SPIE Optics East.

[33]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[34]  Mohamed Abdel-Mottaleb,et al.  Audio scene segmentation for video with generic content , 2008, Electronic Imaging.

[35]  Rong Yan,et al.  IBM multimedia search and retrieval system , 2007, CIVR '07.

[36]  Takeo Kanade,et al.  Intelligent Access to Digital Video: Informedia Project , 1996, Computer.

[37]  Changsheng Xu,et al.  Live sports event detection based on broadcast video and web-casting text , 2006, MM '06.