Video content summarization and augmentation based on structural semantic processing and social network analysis

Abstract

Video summarization techniques have been proposed for years to give viewers a comprehensive understanding of the whole story in a video. However, although these traditional methods provide users with brief summaries, they do not offer concept-organized or structural views. Moreover, the knowledge they provide is usually limited to the existing videos. In this study, we present a structural video content summarization method that uses four kinds of entities, “who,” “what,” “where,” and “when,” to establish the framework of the video content. Relevant online media associated with each entity are also analyzed to enrich the existing content. With this information, the structure of the story and its complementary knowledge can be built up around the entities, so that users can not only browse the video efficiently but also focus on the parts they are interested in. To construct the underlying system, we employ the maximum entropy criterion to integrate visual and text features extracted from video frames and speech transcripts, generating high-level concept entities. Shots are then linked to one another according to their content. After the relational graph is constructed, we apply a graph entropy model to detect meaningful shots and relations. In addition, social network analysis based on the Markov clustering algorithm is performed to explore relevant information online. The results demonstrate that our system achieves excellent performance and information coverage.
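The abstract names the Markov clustering algorithm as the basis of its social network analysis but gives no details. As a rough illustration only, the sketch below implements the generic Markov Cluster (MCL) procedure of van Dongen (add self-loops, column-normalize, then alternate expansion and inflation until the matrix stops changing) on a plain adjacency matrix. It is not the authors' implementation: the function name `markov_cluster`, the parameter values (expansion 2, inflation 2.0, pruning threshold), and the idea that nodes stand for entities or online media items are all assumptions made for this example.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expand(m: Matrix, power: int) -> Matrix:
    """Expansion step: raise the matrix to an integer power (random-walk flow)."""
    result = m
    for _ in range(power - 1):
        result = matmul(result, m)
    return result


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation step: elementwise power r, then re-normalize columns."""
    return normalize_columns([[v ** r for v in row] for row in m])


def markov_cluster(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-6,
                   prune: float = 1e-5) -> List[Set[int]]:
    """Generic MCL sketch (hypothetical parameters, not the paper's settings)."""
    n = len(adj)
    # Add self-loops, then turn the graph into a column-stochastic matrix.
    m = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        prev = m
        m = inflate(expand(m, expansion), inflation)
        # Prune tiny entries to keep the matrix sparse, then re-normalize.
        m = normalize_columns([[v if v > prune else 0.0 for v in row]
                               for row in m])
        if max(abs(m[i][j] - prev[i][j])
               for i in range(n) for j in range(n)) < tol:
            break
    # Each non-empty row (an "attractor") defines a cluster: the columns
    # that flow into it. Duplicate rows are merged; rare overlaps are kept.
    clusters: List[Set[int]] = []
    seen = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > prune)
        if members and members not in seen:
            seen.add(members)
            clusters.append(set(members))
    return clusters
```

For example, on two disconnected triangles (nodes 0 to 2 and 3 to 5), `markov_cluster` returns the two triangles as separate clusters. The inflation parameter is the main knob: larger values push toward more, smaller clusters, which is why it is left configurable here rather than fixed.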
