Main character detection in news and movie content

Advances in multimedia compression standards, data storage, digital hardware technology and network performance have led to a considerable increase in the amount of digital content being archived and made available online. As a result, the organization and representation of digital video repositories, and efficient search and retrieval from them, have attracted increasing interest from the research community in recent years. Many indexing techniques have been employed to facilitate access to desired media segments. Automatic content structuring is one enabling technology used to aid browsing and retrieval; scene-level analysis and sports summarization are two examples of active research in this area. Content structuring can be considered as the task of building an "index" and/or "table of contents" for events or objects that occur throughout a programme.

Our approach to content structuring is to build an index based on the reappearance of the main characters within the content. For news programmes, this can be used for temporal segmentation into individual news stories, because the anchorperson, the main "character" in this scenario, signals the beginning of a news item. For movie content, it could give the end user enhanced random-access browsing.

In this thesis we propose an approach to news story segmentation that uses low-level features and three different algorithms for temporal segmentation. We then extend this system to perform anchorperson detection using automatic face detection and clustering algorithms. An extensive, manually marked-up test set has been used to validate each component of our overall approach. Finally, we discuss how our approach could be extended to identify the main characters in movie content using similar classification techniques and directorial conventions.
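The reappearance idea can be illustrated with a minimal sketch. It assumes each shot has already been reduced to a single face descriptor (a fixed-length feature vector, or None when no face was detected). The leader-follower clustering, the Euclidean distance threshold and all function names below are hypothetical stand-ins, not the face detection and clustering algorithms used in this thesis. The anchorperson is simply taken to be the largest face cluster, and each of its shots is treated as the start of a new story.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(faces: List[Optional[Vector]], threshold: float) -> List[int]:
    """Hypothetical leader-follower clustering of per-shot face descriptors.

    Each face joins the nearest existing cluster if its centroid is within
    `threshold`, otherwise it starts a new cluster. Shots without a face get -1.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for face in faces:
        if face is None:
            labels.append(-1)
            continue
        best, best_d = -1, float("inf")
        for k, centroid in enumerate(centroids):
            d = euclidean(face, centroid)
            if d < best_d:
                best, best_d = k, d
        if best >= 0 and best_d <= threshold:
            # Update the running mean of the matched cluster.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], face)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(face))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


def main_character_shots(faces: List[Optional[Vector]], threshold: float) -> Tuple[int, List[int]]:
    """Return (cluster id, shot indices) of the most frequently reappearing face."""
    labels = leader_cluster(faces, threshold)
    sizes: Dict[int, int] = {}
    for label in labels:
        if label >= 0:
            sizes[label] = sizes.get(label, 0) + 1
    if not sizes:
        return -1, []
    # Largest cluster wins; ties go to the cluster that appeared first.
    main = max(sizes, key=lambda k: (sizes[k], -k))
    return main, [i for i, label in enumerate(labels) if label == main]


def story_segments(n_shots: int, anchor_shots: List[int]) -> List[Tuple[int, int]]:
    """Split shots 0..n_shots-1 into (start, end) stories, one per anchor shot.

    Any shots before the first anchor shot form a leading segment of their own.
    """
    starts = sorted(anchor_shots)
    if not starts or starts[0] != 0:
        starts = [0] + starts
    return [
        (s, starts[i + 1] - 1 if i + 1 < len(starts) else n_shots - 1)
        for i, s in enumerate(starts)
    ]
```

For example, with shot descriptors `[[0.9, 0.1], [0.2, 0.8], None, [0.88, 0.12], [0.5, 0.5], [0.91, 0.09]]` and a threshold of 0.1, shots 0, 3 and 5 fall into the largest cluster, so `story_segments(6, [0, 3, 5])` yields stories spanning shots 0-2, 3-4 and 5. A single-pass clustering like this is order-dependent and sensitive to the threshold; it is used here only because it keeps the sketch short.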
