Utilisation du contexte pour l'indexation sémantique des images et vidéos. (Using context for semantic indexing of images and videos)

Automatic indexing of still images and videos is a difficult problem because of the "distance" between the arrays of numbers that encode these documents and the concepts with which one wishes to annotate them (for example, people, places, events or objects). Methods exist for this task, but their results are far from satisfactory in terms of generality and precision. They generally rely on a single set of annotated training examples and treat it uniformly. This is not optimal, because the same concept can appear in very different contexts, and its appearance can vary greatly from one context to another. In this thesis, we consider the use of context for indexing multimedia documents. Context has been widely used in the state of the art to address a variety of problems. In our work, we take the relations between concepts as the source of semantic context. For videos, we also exploit temporal context, which models the relations between the shots of the same video. We propose several approaches that use these two types of context, as well as their combination, at different levels of an indexing system. We also address the problem of detecting groups of concepts simultaneously, which we consider closely related to the use of context: detecting a group of concepts amounts to detecting one or more of the concepts that form the group in a context where the others are present. To this end, we studied and compared two categories of approaches. All our proposals are generic and can be applied to any system for the detection of any concept. We evaluated our contributions on the TRECVid and VOC datasets, which are international standards recognised by the community.
We obtained good results, comparable to those of the best indexing systems evaluated in recent years in the evaluation campaigns mentioned above.
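The abstract does not say how semantic and temporal context are combined. Purely as an illustration, the sketch below shows one simple way to do it under assumptions of its own: a linear re-scoring in which a shot's detector score for a concept is mixed with (a) the scores of related concepts on the same shot (semantic context) and (b) the scores of the same concept on neighbouring shots of the same video (temporal context). The relation matrix, the weights `alpha` and `beta`, the `window` size and both function names are hypothetical. `pair_score_product` is a common independence-based baseline for scoring a concept pair. It is not presented as either of the two categories of approaches compared in the thesis.

```python
import numpy as np


def rescore_with_context(scores, concept_relations, alpha=0.6, beta=0.2, window=1):
    """Hypothetical linear context re-scoring (illustrative sketch only).

    scores: (n_shots, n_concepts) detector scores in [0, 1] for the shots
        of ONE video, in temporal order.
    concept_relations: (n_concepts, n_concepts) non-negative weights,
        e.g. derived from concept co-occurrence (an assumption here).
    alpha, beta: weights of the original score and of the semantic context;
        the temporal context receives the remaining 1 - alpha - beta.
    window: number of shots on each side used as temporal context.
    """
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("need alpha >= 0, beta >= 0 and alpha + beta <= 1")
    scores = np.asarray(scores, dtype=float)
    rel = np.array(concept_relations, dtype=float)  # copy, safe to modify
    np.fill_diagonal(rel, 0.0)  # a concept is not its own context

    # Semantic context: weighted average of the OTHER concepts' scores on
    # the same shot; rows with no related concept contribute 0.
    row_sums = rel.sum(axis=1, keepdims=True)
    rel = np.divide(rel, row_sums, out=np.zeros_like(rel), where=row_sums > 0)
    semantic = scores @ rel.T  # semantic[s, c] = sum_k rel[c, k] * scores[s, k]

    # Temporal context: mean score of the SAME concept on neighbouring shots.
    n_shots = scores.shape[0]
    temporal = np.empty_like(scores)
    for s in range(n_shots):
        lo, hi = max(0, s - window), min(n_shots, s + window + 1)
        neighbours = [t for t in range(lo, hi) if t != s]
        # A single-shot video has no neighbours: fall back to its own score.
        temporal[s] = scores[neighbours].mean(axis=0) if neighbours else scores[s]

    return alpha * scores + beta * semantic + (1.0 - alpha - beta) * temporal


def pair_score_product(scores, i, j):
    """Baseline score for the concept pair (i, j) on each shot, assuming the
    two detectors' outputs are independent probabilities."""
    scores = np.asarray(scores, dtype=float)
    return scores[:, i] * scores[:, j]


scores = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]])
relations = np.array([[0, 1], [1, 0]])
print(rescore_with_context(scores, relations))
print(pair_score_product(scores, 0, 1))
```

In this toy example, a shot with a low score for one concept is pulled upward when the related concept, or the same concept on neighbouring shots, scores high. Because the relation matrix is row-normalised and the three weights are non-negative and sum to 1, the re-scored values remain in [0, 1] whenever the input scores do.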