Paper information - ITI-CERTH participation to TRECVID 2015

ITI-CERTH participation to TRECVID 2015

Abstract: This paper provides an overview of the tasks submitted to TRECVID 2011 by ITI-CERTH. ITI-CERTH participated in the Known-item search (KIS) task as well as in the Semantic Indexing (SIN) and the Event Detection in Internet Multimedia (MED) tasks. In the SIN task, techniques are developed that combine motion information with existing well-performing techniques such as SURF, Random Forests and Bag-of-Words for shot representation. In the MED task, the trained concept detectors of the SIN task are used to represent video sources with model vector sequences, then a dimensionality reduction method is used to derive a discriminant subspace for recognizing events, and, finally, SVM-based event classifiers are used to detect the underlying video events. The KIS task is performed by employing VERGE, an interactive retrieval application that combines retrieval functionalities in various modalities and exploits implicit user feedback.
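The MED pipeline described in the abstract (concept-detector scores as model vectors, a discriminant projection, then an SVM-based classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data is synthetic, the model vector sequence of each video is pooled by simple averaging, a diagonal two-class Fisher discriminant stands in for the dimensionality reduction method (which the abstract does not specify), and a toy subgradient-descent trainer stands in for the SVM implementation. All function names are hypothetical.

```python
import random


def pool(model_vectors):
    """Average a video's sequence of model vectors into one vector."""
    n = len(model_vectors)
    return [sum(column) / n for column in zip(*model_vectors)]


def fisher_direction(pos, neg, ridge=1e-3):
    """Two-class Fisher direction with a diagonal within-class scatter.

    Assumption: a simplified stand-in for the discriminant subspace step.
    """
    dims = len(pos[0])
    mean_p = [sum(v[d] for v in pos) / len(pos) for d in range(dims)]
    mean_n = [sum(v[d] for v in neg) / len(neg) for d in range(dims)]
    direction = []
    for d in range(dims):
        scatter = sum((v[d] - mean_p[d]) ** 2 for v in pos)
        scatter += sum((v[d] - mean_n[d]) ** 2 for v in neg)
        variance = scatter / (len(pos) + len(neg)) + ridge
        direction.append((mean_p[d] - mean_n[d]) / variance)
    return direction


def project(direction, vector):
    """Project a pooled model vector onto the discriminant direction."""
    return sum(w * x for w, x in zip(direction, vector))


def train_linear_svm(xs, ys, epochs=50, lam=0.01, lr=0.1):
    """Toy 1-D linear SVM: subgradient descent on L2-regularized hinge loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * (w * x + b) < 1:      # margin violated: hinge term is active
                w += lr * (y * x - lam * w)
                b += lr * y
            else:                        # only the regularizer contributes
                w -= lr * lam * w
    return w, b


def synthetic_video(rng, is_event, n_shots=4, n_concepts=5):
    """Hypothetical data: one model vector (concept scores) per shot."""
    base = [0.5] * n_concepts
    base[0], base[1] = (0.8, 0.2) if is_event else (0.2, 0.8)
    return [[s + rng.uniform(-0.1, 0.1) for s in base] for _ in range(n_shots)]


rng = random.Random(0)
pos = [pool(synthetic_video(rng, True)) for _ in range(10)]
neg = [pool(synthetic_video(rng, False)) for _ in range(10)]

direction = fisher_direction(pos, neg)
xs = [project(direction, v) for v in pos + neg]
ys = [1] * len(pos) + [-1] * len(neg)

w, b = train_linear_svm(xs, ys)
predictions = [1 if w * x + b >= 0 else -1 for x in xs]
accuracy = sum(p == y for p, y in zip(predictions, ys)) / len(ys)
print(f"training accuracy: {accuracy:.2f}")
```

Averaging discards the temporal order of the model vector sequence; it is used here only to keep the sketch short, and the original system may treat the sequence differently.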

Citations

Concise Preservation by combining Managed Forgetting and Contextualized Remembering Fact Sheet Project information Project description Digital

2013

Multimodal Fusion: Combining Visual and Textual Cues for Concept Detection in Video

Multimedia Data Mining and Analytics

2015

Large-scale video event classification using dynamic temporal pyramid matching of visual semantics

2013 IEEE International Conference on Image Processing

2013

ITI-CERTH participation in TRECVID 2018

TRECVID

2017

MULTISENSOR: Development of multimedia content integration technologies for journalism, media monitoring and international exporting decision support

2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

2015

A multimedia interactive search engine based on graph-based and non-linear multimodal fusion

2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)

2016

MULTISENSOR: Mining and Understanding of multilinguaL contenT for Intelligent Sentiment Enriched coNtext and Social Oriented inteRpretation, FP7-610411, D2.2: Basic techniques for speech recognition, text analysis and concept detection

2014

Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval

SIGIR

2020

The MULTISENSOR Project - Development of Multimedia Content Integration Technologies for Journalism, Media Monitoring and International Exporting Decision Support

MMDA@ECAI

2016

Event modelling and recognition in video

2013

SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries

IEEE Transactions on Multimedia

2020

Event-based media processing and analysis: A survey of the literature

Image Vis. Comput.

2016

Migration-Related Semantic Concepts for the Retrieval of Relevant Video Content

2020

Query and Keyframe Representations for Ad-hoc Video Search

VERGE: A Multimodal Interactive Video Search Engine

VERGE: An Interactive Search Engine for Browsing Video Collections

VERGE in VBS 2017

Local Invariant Feature Tracks for high-level video feature extraction

Dual Encoding for Zero-Example Video Retrieval

Hybrid Space Learning for Language-based Video Retrieval
