Semantic Analysis for Automatic Event Recognition and Segmentation of Wedding Ceremony Videos

A wedding is one of the most important ceremonies in a person's life; it symbolizes the founding of a new family. In this paper, we present a system that automatically segments a wedding ceremony video into a sequence of recognizable wedding events, such as the couple's wedding kiss. Our goal is to develop an automatic tool that helps users efficiently organize, search, and retrieve their treasured wedding memories. Furthermore, the resulting event descriptions could complement current research in semantic video understanding. Drawing on knowledge of wedding customs, we exploit a set of audiovisual features related to four wedding contexts (speech/music type, applause activity, picture-taking activity, and leading roles) to build a statistical model for each wedding event. Thirteen wedding events are then recognized by a hidden Markov model (HMM). The HMM considers both how well the observed features fit each event and how plausible the temporal ordering of events is, which improves segmentation accuracy. Experiments on a collection of wedding videos yield promising results that demonstrate the effectiveness of our approach. Comparisons with conditional random fields (CRFs) show that the proposed approach is more effective in this application domain.
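The HMM step combines two scores: how well each segment's features fit an event model, and how plausible the resulting event order is. A common way to realize this is Viterbi decoding, sketched below in Python. This is not the paper's implementation. It assumes each video segment has already been scored by per-event feature models. The three event names, the left-to-right transition matrix, and the per-segment likelihoods are hypothetical toy values; the paper itself uses thirteen events.

```python
import math


def safe_log(p):
    """Log-probability with log(0) mapped to -inf (an impossible event)."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(log_emit, log_trans, log_init):
    """Most likely hidden-state sequence for an HMM, computed in log space.

    log_emit[t][s]:  log-likelihood of segment t's features under event model s
    log_trans[i][j]: log-probability that event j follows event i
    log_init[s]:     log-probability that the video starts with event s
    """
    states = range(len(log_init))
    # delta[s]: score of the best path ending in state s at the current segment
    delta = [log_init[s] + log_emit[0][s] for s in states]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_delta, ptrs = [], []
        for j in states:
            # Best predecessor balances its path score against the transition into j
            best_i = max(states, key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptrs.append(best_i)
        delta = new_delta
        backpointers.append(ptrs)
    # Backtrack from the highest-scoring final state
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backpointers):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


# Hypothetical toy setup: three events in a strict left-to-right order
EVENTS = ["processional", "vows", "kiss"]
TRANS = [
    [0.7, 0.3, 0.0],  # processional: stay, or move on to vows
    [0.0, 0.7, 0.3],  # vows: stay, or move on to kiss
    [0.0, 0.0, 1.0],  # kiss is the final event in this toy example
]
INIT = [1.0, 0.0, 0.0]
# Hypothetical per-segment likelihoods produced by the per-event feature models
EMIT = [
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],  # features alone favour "kiss" for this segment
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]

log_trans = [[safe_log(p) for p in row] for row in TRANS]
log_init = [safe_log(p) for p in INIT]
log_emit = [[safe_log(p) for p in row] for row in EMIT]

# HMM decoding uses both feature fit and event-order plausibility
decoded = [EVENTS[s] for s in viterbi(log_emit, log_trans, log_init)]
# Baseline: pick the best-fitting event per segment, ignoring ordering
greedy = [EVENTS[max(range(len(EVENTS)), key=lambda s: row[s])] for row in EMIT]
print("HMM:", decoded)
print("Per-segment:", greedy)
```

On these toy values, Viterbi decoding returns processional, vows, vows, kiss. Choosing the best-fitting event per segment instead returns processional, kiss, vows, kiss. The transition matrix rules out a kiss directly after the processional, so the ordering term overrides the second segment's misleading feature fit.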
