Multimedia Evidence Fusion for Video Concept Detection via OWA Operator

We present a novel multi-modal evidence fusion method for high-level feature (HLF) detection in videos. Uni-modal features, such as color histograms and transcript texts, tend to capture different aspects of HLFs, and hence exhibit both complementariness and redundancy in modeling the contents of such HLFs. We argue that these inter-relations are key to effective multi-modal fusion. Here, we formulate fusion as a multi-criteria group decision making task, in which the uni-modal detectors are coordinated to reach a consensus final detection decision based on their inter-relations. Specifically, we mine the complementariness and redundancy inter-relations of the uni-modal detectors using the Ordered Weighted Average (OWA) operator. The "or-ness" measure in OWA models the inter-relation of uni-modal detectors as a combination of pure complementariness and pure redundancy. The resulting OWA weights then yield a consensus fusion by optimally leveraging the decisions of the uni-modal detectors. Experiments on the TRECVID 07 dataset show that the proposed OWA aggregation operator significantly outperforms other fusion methods, achieving a state-of-the-art mean average precision (MAP) of 0.132.
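As a rough illustration of the aggregation step, the following minimal sketch implements the standard OWA operator and its or-ness measure as defined by Yager (1988). It is not the authors' implementation: the function names `owa` and `orness`, the example scores, and the weight vectors are all assumptions chosen for illustration, and the abstract does not say how the weights are learned or which or-ness extreme corresponds to complementariness or redundancy.

```python
from typing import Sequence


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Average: weights apply to rank positions, not sources."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort detector scores in descending order, then take the weighted sum.
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def orness(weights: Sequence[float]) -> float:
    """Or-ness of an OWA weight vector: 1 for pure max, 0 for pure min."""
    n = len(weights)
    if n < 2:
        raise ValueError("or-ness needs at least two weights")
    return sum((n - i) * w for i, w in enumerate(weights, start=1)) / (n - 1)


# Hypothetical scores from three uni-modal detectors for one video shot.
scores = [0.2, 0.9, 0.5]

max_like = [1.0, 0.0, 0.0]      # behaves like max (pure "or")
min_like = [0.0, 0.0, 1.0]      # behaves like min (pure "and")
uniform = [1 / 3, 1 / 3, 1 / 3]  # behaves like the arithmetic mean

for name, w in [("max", max_like), ("min", min_like), ("mean", uniform)]:
    print(name, round(owa(scores, w), 4), round(orness(w), 4))
```

Weight vectors between these extremes interpolate between max-like and min-like behavior, which is the degree of freedom the or-ness measure summarizes in a single number.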
