Building Detectors to Support Searches on Combined Semantic Concepts

Bridging the semantic gap is one of the major challenges in multimedia information retrieval. The gap lies between the low-level features extracted from a video and its conceptual content. A common approach to understanding the conceptual content of a video is to build concept detectors. A problem with this approach is that the number of detectors needed is impossible to determine. This paper presents a set of 8 methods for combining two existing concepts into a new one, which occurs when both concepts appear at the same time. The scores of the combined concept for each shot of a video are computed from the output of the underlying detectors. The findings are evaluated on the basis of the output of the 101 detectors, including a comparison to the theoretical possibility of training a classifier on each combined concept. The precision gains are significant, and methods that also consider the chronological surroundings of a shot are especially promising.
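As a rough illustration of the idea, the sketch below derives a combined-concept score for each shot from the scores of two underlying detectors, and optionally averages each shot with its chronological neighbours. The abstract does not list the 8 methods, so the combination rules shown here (product, minimum, average), the function names, and the window radius are all assumptions chosen for illustration, not the paper's actual methods.

```python
from typing import Callable, Dict, List

# Hypothetical combination rules. These are common score-fusion choices and
# are NOT claimed to be among the eight methods evaluated in the paper.
COMBINERS: Dict[str, Callable[[float, float], float]] = {
    "product": lambda a, b: a * b,
    "minimum": min,
    "average": lambda a, b: (a + b) / 2.0,
}


def combine_scores(scores_a: List[float], scores_b: List[float],
                   rule: str = "product") -> List[float]:
    """Combine per-shot scores of two detectors into one score per shot."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both detectors must score the same shots")
    combine = COMBINERS[rule]
    return [combine(a, b) for a, b in zip(scores_a, scores_b)]


def smooth_over_neighbours(scores: List[float], radius: int = 1) -> List[float]:
    """Average each shot's score with its chronological neighbours.

    The window is clipped at the start and end of the video. The radius is
    an assumed parameter, not a value taken from the paper.
    """
    n = len(scores)
    smoothed = []
    for i in range(n):
        window = scores[max(0, i - radius):min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


if __name__ == "__main__":
    # Made-up detector outputs for four consecutive shots.
    concept_a = [0.9, 0.8, 0.1, 0.7]
    concept_b = [0.2, 0.9, 0.8, 0.6]
    combined = combine_scores(concept_a, concept_b, rule="minimum")
    print(combined)
    print(smooth_over_neighbours(combined, radius=1))
```

Ranking the shots by the resulting scores would then give a result list for the combined concept, which could be compared against a classifier trained directly on that concept.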
