Knowledge-based query expansion in complex multimedia event detection

A common approach in content-based video information retrieval is to annotate shots automatically with semantic labels using pre-trained classifiers. The visual vocabulary of state-of-the-art automatic annotation systems is limited to a few thousand concepts, which creates a semantic gap between the semantic labels and the natural language query. One method to bridge this semantic gap is to expand the original user query using knowledge bases. Both common knowledge bases, such as Wikipedia, and expert knowledge bases, such as a manually created ontology, can be used for this purpose. Expert knowledge bases yield the highest performance, but they are only available in closed domains, because only in a closed domain can all necessary information, including structure and disambiguation, be made available in a knowledge base. Common knowledge bases are often used in open domains because they cover a large amount of general information. In this research, query expansion using the common knowledge bases ConceptNet and Wikipedia is compared to query expansion using an expert description of the topic, applied to content-based information retrieval of complex events. We run experiments on the Test Set of TRECVID MED 2014. Results show that 1) query expansion can improve performance compared to using no query expansion when the main noun of the query cannot be matched to a concept detector; 2) query expansion using expert knowledge is not necessarily better than query expansion using common knowledge; 3) ConceptNet performs slightly better than Wikipedia; and 4) late fusion can slightly improve performance. In conclusion, query expansion has potential in complex event detection.
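The pipeline described above can be illustrated with a minimal Python sketch of knowledge-based query expansion followed by late fusion. It is not the authors' implementation: the two knowledge bases are hypothetical toy dictionaries standing in for ConceptNet and Wikipedia (no real API of either is used), and the function names `expand_query`, `score_videos`, and `late_fusion`, the relatedness weights, the detector vocabulary, and the detector scores are all illustrative assumptions. The abstract does not specify a weighting or fusion scheme, so this sketch assumes that only query terms without a matching detector are expanded, that videos are ranked by a weighted average of detector scores, and that late fusion is a weighted average of the per-run scores.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge base: term -> {related term: relatedness weight}.
# Stands in for ConceptNet or Wikipedia; NOT their real APIs or data.
KnowledgeBase = Dict[str, Dict[str, float]]


def expand_query(query_terms: List[str], vocabulary: Set[str],
                 kb: KnowledgeBase) -> Dict[str, float]:
    """Map query terms to detector concepts, expanding via the KB only when
    a term has no detector of its own (an assumption of this sketch)."""
    concepts: Dict[str, float] = {}
    for term in query_terms:
        if term in vocabulary:
            concepts[term] = 1.0  # direct match: full weight
            continue
        # No detector for this term: keep related terms that do have detectors.
        for related, weight in kb.get(term, {}).items():
            if related in vocabulary:
                concepts[related] = max(concepts.get(related, 0.0), weight)
    return concepts


def score_videos(concepts: Dict[str, float],
                 detector_scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Score each video as the weighted average of its concept detector outputs."""
    total = sum(concepts.values())
    ranking: Dict[str, float] = {}
    for video, scores in detector_scores.items():
        weighted = sum(w * scores.get(c, 0.0) for c, w in concepts.items())
        ranking[video] = weighted / total if total > 0 else 0.0
    return ranking


def late_fusion(runs: List[Dict[str, float]], weights: List[float]) -> Dict[str, float]:
    """Combine per-video scores from several expansion runs by weighted averaging."""
    norm = sum(weights)
    fused: Dict[str, float] = {}
    for run, w in zip(runs, weights):
        for video, s in run.items():
            fused[video] = fused.get(video, 0.0) + w * s / norm
    return fused


# Illustrative data only (hypothetical vocabulary, knowledge bases, and scores).
vocabulary = {"dog", "cat", "brush", "scissors"}
kb_conceptnet = {"animal": {"dog": 0.8, "cat": 0.7, "pet": 0.9},
                 "grooming": {"brush": 0.6, "comb": 0.5}}
kb_wikipedia = {"animal": {"dog": 0.5, "horse": 0.6},
                "grooming": {"scissors": 0.4, "brush": 0.3}}
detector_scores = {
    "vid1": {"dog": 0.9, "brush": 0.8, "cat": 0.1, "scissors": 0.2},
    "vid2": {"dog": 0.2, "brush": 0.1, "cat": 0.3, "scissors": 0.1},
}

query = ["grooming", "animal"]  # neither term has its own detector
expanded_cn = expand_query(query, vocabulary, kb_conceptnet)
expanded_wiki = expand_query(query, vocabulary, kb_wikipedia)
run_cn = score_videos(expanded_cn, detector_scores)
run_wiki = score_videos(expanded_wiki, detector_scores)
fused = late_fusion([run_cn, run_wiki], weights=[0.5, 0.5])

for video, score in sorted(fused.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{video}: {score:.3f}")
```

Restricting expansion to terms without a detector is one simple way to reflect the observation that expansion helps mainly when the main noun of the query cannot be matched; equal-weight averaging of the two runs is the most basic form of late fusion, and a real system would likely tune the weights on training events.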
