In search of video event semantics

In this thesis we aim to represent an event in a video using semantic features. We start from a bank of concept detectors for representing events in video. At first we considered the relevance of concepts to the event inside the video representation. We address the problem of video event classification using a bank of concept detectors. Different from existing work, which simply relies on a bank containing all available detectors, we propose an algorithm that learns from examples what concepts in bank are most informative per event. Secondly , we concentrated on the accuracy of concept detectors. Different from existing works, which obtain a semantic representation by training concepts over entire video clips, we propose an algorithm that learns a set of relevant frames as the concept prototypes from web video examples, without the need for frame-level annotations, and use them for representing an event video. Thirdly, we consider the problem of searching video events with concepts. We aim at querying web videos for events using only a handful of video query examples, where the standard approach learns a ranker from hundreds of examples. We consider a semantic representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appearance of events. Finally, we consider the problem of video event search without semantic concepts. The prevailing solutions in literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. Different from them, we propose a new semantic video representation that is based on freely available social tagged videos only, without the need for training any intermediate concept detectors.

[1]  Cordelia Schmid,et al.  TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[2]  David W. Aha,et al.  A Comparative Evaluation of Sequential Feature Selection Algorithms , 1995, AISTATS.

[3]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[4]  Alberto Del Bimbo,et al.  Event detection and recognition for semantic annotation of video , 2010, Multimedia Tools and Applications.

[5]  Alberto Del Bimbo,et al.  Enriching and localizing semantic tags in internet videos , 2011, ACM Multimedia.

[6]  Colas Schretter,et al.  Information-Theoretic Feature Selection in Microarray Data Using Variable Complementarity , 2008, IEEE Journal of Selected Topics in Signal Processing.

[7]  Alberto Del Bimbo,et al.  Socializing the Semantic Gap , 2015, ACM Comput. Surv..

[8]  Dong Wang,et al.  Video search in concept subspace: a text-like paradigm , 2007, CIVR '07.

[9]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[10]  Dong Liu,et al.  EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video , 2015, ACM Multimedia.

[11]  Stevan Rudinac,et al.  Leveraging visual concepts and query performance prediction for semantic-theme-based video retrieval , 2012, International Journal of Multimedia Information Retrieval.

[12]  M. Ibrahim Sezan,et al.  A semantic event-detection approach and its application to detecting hunts in wildlife vide , 2000, IEEE Trans. Circuits Syst. Video Technol..

[13]  Cordelia Schmid,et al.  Action and Event Recognition with Fisher Vectors on a Compact Feature Set , 2013, 2013 IEEE International Conference on Computer Vision.

[14]  Marcel Worring,et al.  Concept-Based Video Retrieval , 2009, Found. Trends Inf. Retr..

[15]  Dong Xu,et al.  Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[17]  Gang Hua,et al.  Semantic Model Vectors for Complex Video Event Recognition , 2012, IEEE Transactions on Multimedia.

[18]  Dong Liu,et al.  Recognizing Complex Events in Videos by Learning Key Static-Dynamic Evidences , 2014, ECCV.

[19]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Rong Yan,et al.  Semantic concept-based query expansion and re-ranking for multimedia retrieval , 2007, ACM Multimedia.

[21]  Dong Liu,et al.  Event-Driven Semantic Concept Discovery by Exploiting Weakly Tagged Internet Images , 2014, ICMR.

[22]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[23]  Teruko Mitamura,et al.  Zero-Example Event Search using MultiModal Pseudo Relevance Feedback , 2014, ICMR.

[24]  Nuno Vasconcelos,et al.  Bridging the Gap: Query by Semantic Example , 2007, IEEE Transactions on Multimedia.

[25]  A. Smeaton,et al.  TRECVID 2013 -- An Overview of the Goals, Tasks, Data, Evaluation Mechanisms, and Metrics | NIST , 2011 .

[26]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Zhigang Ma,et al.  From Concepts to Events: a Progressive Process for Multimedia content Analysis , 2013 .

[28]  Wei Liu,et al.  Double Fusion for Multimedia Event Detection , 2012, MMM.

[29]  Riccardo Leonardi,et al.  Event recognition in sport programs using low-level motion indices , 2001, IEEE International Conference on Multimedia and Expo, 2001. ICME 2001..

[30]  Shahram Ebadollahi,et al.  Visual Event Detection using Multi-Dimensional Concept Dynamics , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[31]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[32]  Jack Sklansky,et al.  On Automatic Feature Selection , 1988, Int. J. Pattern Recognit. Artif. Intell..

[33]  Ehud Rivlin,et al.  Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video , 2009, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[34]  Chong-Wah Ngo,et al.  Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces , 2008, IEEE Transactions on Multimedia.

[35]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[36]  Jin Zhao,et al.  Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting , 2006, CIVR.

[37]  Shiguang Shan,et al.  Informedia@TrecVID 2014: MED and MER , 2014 .

[38]  Yi Yang,et al.  Searching Persuasively: Joint Event Detection and Evidence Recounting with Limited Supervision , 2015, ACM Multimedia.

[39]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[40]  James A. Bucklew,et al.  Introduction to Rare Event Simulation , 2010 .

[41]  Xirong Li,et al.  TagBook: A Semantic Video Representation Without Supervision for Event Detection , 2015, IEEE Transactions on Multimedia.

[42]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[43]  B. Li,et al.  Event detection and summarization in sports video , 2001, Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001).

[44]  Cees Snoek,et al.  VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events , 2014, ACM Multimedia.

[45]  Nicu Sebe,et al.  We are not equally negative: fine-grained labeling for multimedia event detection , 2013, ACM Multimedia.

[46]  Dong Liu,et al.  Building A Large Concept Bank for Representing Events in Video , 2014, ArXiv.

[47]  Masoud Mazloom,et al.  Conceptlets: Selective Semantics for Classifying Video Events , 2014, IEEE Transactions on Multimedia.

[48]  Mubarak Shah,et al.  Columbia-UCF TRECVID2010 Multimedia Event Detection: Combining Multiple Modalities, Contextual Concepts, and Temporal Matching , 2010, TRECVID.

[49]  Ming-Syan Chen,et al.  Video Event Detection by Inferring Temporal Instance Labels , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[51]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[52]  Dennis Koelma,et al.  Qualcomm Research and University of Amsterdam at TRECVID 2015: Recognizing Concepts, Objects, and Events in Video , 2015, TRECVID.

[53]  Dahua Lin,et al.  Conditional Infomax Learning: An Integrated Framework for Feature Extraction and Fusion , 2006, ECCV.

[54]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[55]  A. G. Amitha Perera,et al.  Multimedia event detection with multimodal feature fusion and temporal concept localization , 2013, Machine Vision and Applications.

[56]  Masoud Mazloom,et al.  On-the-Fly Video Event Search by Semantic Signatures , 2014, ICMR.

[57]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.

[58]  Hiroshi Motoda,et al.  Computational Methods of Feature Selection , 2022 .

[59]  Florian Metze,et al.  Beyond audio and video retrieval: towards multimedia summarization , 2012, ICMR.

[60]  Yi Yang,et al.  Fast and Accurate Content-based Semantic Search in 100M Internet Videos , 2015, ACM Multimedia.

[61]  Lexing Xie,et al.  Event Mining in Multimedia Streams , 2008, Proceedings of the IEEE.

[62]  Anil K. Jain,et al.  Image retrieval using color and shape , 1996, Pattern Recognit..

[63]  Shuang Wu,et al.  Zero-Shot Event Detection Using Multi-modal Fusion of Weakly Supervised Concepts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[64]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[65]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[66]  C. Schmid,et al.  Category-Specific Video Summarization , 2014, ECCV.

[67]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[68]  Nuno Vasconcelos,et al.  Dynamic Pooling for Complex Event Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[69]  Marcel Worring,et al.  Personalizing automated image annotation using cross-entropy , 2011, ACM Multimedia.

[70]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[71]  Shih-Fu Chang,et al.  Consumer video understanding: a benchmark database and an evaluation of human and machine performance , 2011, ICMR.

[72]  Aaron F. Bobick,et al.  Recognition of Visual Activities and Interactions by Stochastic Parsing , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[73]  Yi Yang,et al.  How Related Exemplars Help Complex Event Detection in Web Videos? , 2013, 2013 IEEE International Conference on Computer Vision.

[74]  Koen E. A. van de Sande,et al.  Recommendations for video event recognition using concept vocabularies , 2013, ICMR.

[75]  Shuang Wu,et al.  Multimodal feature fusion for robust event detection in web videos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[76]  Pedro Larrañaga,et al.  Feature Subset Selection by Bayesian network-based optimization , 2000, Artif. Intell..

[77]  Lloyd A. Smith,et al.  Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper , 1999, FLAIRS.

[78]  Yu-Gang Jiang,et al.  SUPER: towards real-time event recognition in internet videos , 2012, ICMR.

[79]  Tomás Lozano-Pérez,et al.  A Framework for Multiple-Instance Learning , 1997, NIPS.

[80]  Mubarak Shah,et al.  Recognition of Complex Events: Exploiting Temporal Dynamics between Underlying Concepts , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[81]  W. B. Cameron The Mind of a Mnemonist: A Little Book about a Vast Memory , 1970 .

[82]  Katja Hofmann,et al.  Assessing concept selection for video retrieval , 2008, MIR '08.

[83]  Marcel Worring,et al.  Learning Social Tag Relevance by Neighbor Voting , 2009, IEEE Transactions on Multimedia.

[84]  Cees Snoek,et al.  Composite Concept Discovery for Zero-Shot Video Event Detection , 2014, ICMR.

[85]  Koichi Shinoda,et al.  TokyoTech+Canon at TRECVID 2011 , 2011, TRECVID.

[86]  Dirk P. Kroese,et al.  The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning , 2004 .

[87]  Andrew Zisserman,et al.  Multiple queries for large scale specific object retrieval , 2012, BMVC.

[88]  Dennis Koelma,et al.  The MediaMill TRECVID 2008 Semantic Video Search Engine , 2008, TRECVID.

[89]  Xirong Li,et al.  Few-Example Video Event Retrieval using Tag Propagation , 2014, ICMR.

[90]  Shie Mannor,et al.  The cross entropy method for classification , 2005, ICML.

[91]  Cees Snoek,et al.  Recommendations for recognizing video events by concept vocabularies , 2014, Comput. Vis. Image Underst..

[92]  Chong-Wah Ngo,et al.  Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study , 2010, IEEE Transactions on Multimedia.

[93]  Masoud Mazloom,et al.  Querying for video events by semantic signatures from few examples , 2013, MM '13.

[94]  Ramakant Nevatia,et al.  Evaluating multimedia features and fusion for example-based event detection , 2013, Machine Vision and Applications.

[95]  Hui Cheng,et al.  Video event recognition using concept attributes , 2013, 2013 IEEE Workshop on Applications of Computer Vision (WACV).

[96]  Yi Yang,et al.  Semantic Concept Discovery for Large-Scale Zero-Shot Event Detection , 2015, IJCAI.

[97]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[98]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[99]  Alex Pentland,et al.  Photobook: Content-based manipulation of image databases , 1996, International Journal of Computer Vision.

[100]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[101]  Alberto Del Bimbo,et al.  Video event classification using string kernels , 2010, Multimedia Tools and Applications.

[102]  R. Manmatha,et al.  Syntactic characterization of appearance and its application to image retrieval , 1997, Electronic Imaging.

[103]  Christos Faloutsos,et al.  QBIC project: querying images by content, using color, texture, and shape , 1993, Electronic Imaging.

[104]  Nicu Sebe,et al.  Complex Event Detection via Multi-source Video Attributes , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[105]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[106]  Dong Liu,et al.  Encoding Concept Prototypes for Video Event Detection and Summarization , 2015, ICMR.

[107]  Oded Maron,et al.  Learning from Ambiguity , 1998 .

[108]  Mark Sanderson,et al.  Automatic video tagging using content redundancy , 2009, SIGIR.

[109]  Yiannis Kompatsiaris,et al.  High-level event detection in video exploiting discriminant concepts , 2011, 2011 9th International Workshop on Content-Based Multimedia Indexing (CBMI).

[110]  Mubarak Shah,et al.  Recognizing Complex Events Using Large Margin Joint Low-Level Event Model , 2012, ECCV.

[111]  Masoud Mazloom,et al.  Searching informative concept banks for video event detection , 2013, ICMR.

[112]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[113]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[114]  R. Manmatha,et al.  Modeling Concept Dependencies for Event Detection , 2014, ICMR.

[115]  Marcel Worring,et al.  Adding Semantics to Detectors for Video Retrieval , 2007, IEEE Transactions on Multimedia.

[116]  David C. Rubin,et al.  Autobiographical Memory , 2019, Encyclopedia of Autism Spectrum Disorders.

[117]  Hui Cheng,et al.  Evaluation of low-level features and their combinations for complex event detection in open source videos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[118]  Yi Yang,et al.  A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[119]  Ramakant Nevatia,et al.  DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[120]  Cees G. M. Snoek,et al.  Best practices for learning video concept detectors from social media examples , 2014, Multimedia Tools and Applications.

[121]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[122]  Teruko Mitamura,et al.  Multimodal knowledge-based analysis in multimedia event detection , 2012, ICMR '12.

[123]  Noboru Babaguchi,et al.  Event based indexing of broadcasted sports video by intermodal collaboration , 2002, IEEE Trans. Multim..

[124]  Mubarak Shah,et al.  High-level event recognition in unconstrained videos , 2013, International Journal of Multimedia Information Retrieval.

[125]  A. Ng Feature selection, L1 vs. L2 regularization, and rotational invariance , 2004, Twenty-first international conference on Machine learning - ICML '04.

[126]  Nicu Sebe,et al.  Knowledge adaptation for ad hoc multimedia event detection with few exemplars , 2012, ACM Multimedia.

[127]  Jason J. Corso,et al.  Action bank: A high-level representation of activity in video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.