Semantics reinforcement and fusion learning for multimedia streams

Fusion of multimedia streams for enhanced performance is a critical problem for retrieval. However, fusion rules learned on a hillclimb set tend to overfit that set. In this paper, we perform fusion learning for multimedia streams using a greedy, performance-driven algorithm. In our fusion learning paradigm, the fused output is a linear combination of multiple classifiers or ranked streams. The algorithm is inspired by Ensemble Learning [2] but extends that idea to improve generalization. A key application of our fusion learning algorithm, described in this work, is semantics reinforcement. Here an ensemble of classifiers is built from the same training dataset, but each classifier uses the ground truth of a different concept. We expect classifiers built for semantically close concepts to reinforce each other's performance. This makes fusion learning an excellent post-classification means of reinforcing both semantics and performance. Fusion learning experiments were performed on the TRECVID 2005 test set. Experiments using the well-established retrieval effectiveness measure of mean average precision (MAP) show that our proposed algorithm improves over the best single classifier (oracle) by 3.8%. We also present and discuss some interesting and intuitive semantic reinforcement trends observed during fusion learning.
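The greedy, performance-driven linear fusion described above can be illustrated with a minimal sketch in the spirit of ensemble selection (Caruana et al. [14]). The ensemble starts empty. Each round adds, with replacement, whichever stream most increases average precision (AP) on the hillclimb set. The linear fusion weights are the normalized selection counts. This is not the authors' exact algorithm; in particular, it omits whatever modifications they use to improve generalization. The function names (`average_precision`, `mean_average_precision`, `fuse`, `greedy_fusion`) and the `rounds` parameter are hypothetical. The sketch also assumes that all stream scores lie on comparable scales (e.g. normalized to [0, 1]).

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision of the ranking induced by `scores`."""
    # Rank items by descending score; ties keep their original order (stable sort).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant item
    return precision_sum / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """MAP: the mean of per-concept AP values over (scores, labels) pairs."""
    return sum(average_precision(s, y) for s, y in runs) / len(runs)


def fuse(streams: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Linear fusion: per-item weighted sum of the stream scores."""
    return [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(len(streams[0]))]


def greedy_fusion(streams: Sequence[Sequence[float]], labels: Sequence[int],
                  rounds: int = 10) -> List[float]:
    """Greedy forward selection with replacement, driven by hillclimb-set AP.

    Hypothetical illustration; assumes stream scores share a comparable scale.
    Returns normalized weights (selection counts / total selections).
    """
    counts = [0] * len(streams)
    # Running unnormalized sum of selected streams. AP depends only on the
    # ranking, so this sum ranks items exactly like the normalized combination.
    fused = [0.0] * len(labels)
    best_ap = -1.0
    for _ in range(rounds):
        best_j, best_candidate_ap = None, best_ap
        for j, stream in enumerate(streams):
            candidate = [f + s for f, s in zip(fused, stream)]
            ap = average_precision(candidate, labels)
            if ap > best_candidate_ap:  # require a strict improvement
                best_j, best_candidate_ap = j, ap
        if best_j is None:
            break  # no stream improves hillclimb AP: stop early
        counts[best_j] += 1
        fused = [f + s for f, s in zip(fused, streams[best_j])]
        best_ap = best_candidate_ap
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(streams)
```

For semantics reinforcement, each entry of `streams` could hold the hillclimb-set scores of a detector trained for a different concept, with `labels` giving the ground truth of the target concept. The first round always picks the single best stream, and later rounds add a stream only if AP strictly improves. By construction, then, the fused hillclimb AP is at least that of the best single stream. Because the hillclimb set is exactly where overfitting occurs, the learned weights should still be evaluated on held-out data.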

[1] Guodong Guo, et al. Boosting for fast face recognition, 2001, Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems.

[2] John R. Smith, et al. On the detection of semantic concepts at TRECVID, 2004, MULTIMEDIA '04.

[3] Stefan Fischer, et al. Fusion of audio and video information for multi modal person authentication, 1997, Pattern Recognit. Lett.

[4] Thomas S. Huang, et al. Factor graph framework for semantic video indexing, 2002, IEEE Trans. Circuits Syst. Video Technol.

[5] Marcel Worring, et al. Content-Based Image Retrieval at the End of the Early Years, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[6] Nicu Sebe, et al. Boosting contextual information in content-based image retrieval, 2004, MIR '04.

[7] Kalyanmoy Deb, et al. A Computationally Efficient Evolutionary Algorithm for Real-Parameter Optimization, 2002, Evolutionary Computation.

[8] Robert E. Schapire, et al. The Boosting Approach to Machine Learning: An Overview, 2003.

[9] Alexander G. Hauptmann, et al. Successful approaches in the TREC video retrieval evaluations, 2004, MULTIMEDIA '04.

[10] Azriel Rosenfeld, et al. Detection and location of people in video images using adaptive fusion of color and edge information, 2000, Proceedings 15th International Conference on Pattern Recognition (ICPR-2000).

[11] Carl Eklund, et al. National Institute for Standards and Technology, 2009, Encyclopedia of Biometrics.

[12] Paul A. Viola, et al. Boosting Image Retrieval, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000).

[13] Chong-Wah Ngo, et al. Detection of Documentary Scene Changes by Audio-Visual Fusion, 2003, CIVR.

[14] Rich Caruana, et al. Ensemble selection from libraries of models, 2004, ICML.

[15] Nicholas R. Howe, et al. A Closer Look at Boosted Image Retrieval, 2003, CIVR.

[16] Kevin W. Bowyer, et al. Combination of multiple classifiers using local accuracy estimates, 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17] Mubarak Shah, et al. Story Segmentation in News Videos Using Visual and Text Cues, 2005, CIVR.

[18] Ludmila I. Kuncheva, et al. A Theoretical Study on Six Classifier Fusion Strategies, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[19] Marcus Jerome Pickering, et al. Video Retrieval by Feature Learning in Key Frames, 2002, CIVR.

[20] Fabio Roli, et al. Dynamic classifier selection based on multiple classifier behaviour, 2001, Pattern Recognit.

[21] Milind R. Naphade, et al. A Greedy Performance Driven Algorithm for Decision Fusion Learning, 2007, 2007 IEEE International Conference on Image Processing.

[22] Mingjing Li, et al. Boosting image orientation detection with indoor vs. outdoor classification, 2002, Proceedings Sixth IEEE Workshop on Applications of Computer Vision (WACV 2002).

[23] Rong Yan, et al. Mining Relationship Between Video Concepts using Probabilistic Graphical Models, 2006, 2006 IEEE International Conference on Multimedia and Expo.

[24] Edward Y. Chang, et al. Optimal multimodal fusion for multimedia data analysis, 2004, MULTIMEDIA '04.

[25] Jiri Matas, et al. On Combining Classifiers, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[26] Shih-Fu Chang, et al. Image Retrieval: Current Techniques, Promising Directions, and Open Issues, 1999, J. Vis. Commun. Image Represent.

[27] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[28] Nicu Sebe, et al. The State of the Art in Image and Video Retrieval, 2003, CIVR.

[29] John R. Smith, et al. IBM Research TRECVID-2009 Video Retrieval System, 2009, TRECVID.