Automatic tag correction in videos: an approach based on frequent pattern mining. (Correction automatique d'annotations de vidéos : une approche à base de fouille de motifs fréquents)

This thesis presents a new video auto-tagging system that aims to correct the tags users provide for videos uploaded to the Internet. Most existing auto-tagging systems rely mainly on textual information and learn a large number of classifiers (one per possible tag) to tag new videos. However, existing user-provided video annotations are often incorrect and incomplete. Users uploading videos may want to rapidly increase their videos' view counts by adding popular tags that are irrelevant to the content. They may also forget an obvious tag that would greatly help an indexing process. In this thesis, we limit the use of this questionable textual information and do not build a supervised model to perform tag propagation. Instead, we propose to compare the visual content of the videos directly, describing it with different sets of features such as SIFT-based Bag-Of-visual-Words or frequent patterns built from them. We then propose an original tag correction strategy based on the frequency of the tags in the visual neighborhood of each video. We have also introduced a number of strategies and datasets to evaluate our system. The experiments show that our method can effectively improve the existing tags and that frequent patterns built from Bag-Of-visual-Words are useful for constructing accurate visual features.
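
The neighborhood-based correction idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the cosine distance, the neighborhood size `k`, the thresholds `add_ratio` and `keep_ratio`, and the function names are all assumptions chosen for illustration, and the feature vectors are toy stand-ins for Bag-Of-visual-Words histograms.

```python
from collections import Counter
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty histogram as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def correct_tags(features, tags, k=2, add_ratio=0.6, keep_ratio=0.2):
    """Correct each video's tags from tag frequencies among its k visual neighbours.

    All parameters are hypothetical: a tag is added when at least `add_ratio`
    of the neighbours carry it, and an existing tag is kept only when at least
    `keep_ratio` of the neighbours carry it.
    """
    corrected = []
    for i, query in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        neighbours = sorted(
            others, key=lambda j: cosine_distance(query, features[j])
        )[:k]
        n = len(neighbours)
        # Count each tag at most once per neighbouring video.
        counts = Counter(t for j in neighbours for t in set(tags[j]))
        new_tags = set()
        for t in tags[i]:
            if n == 0 or counts[t] / n >= keep_ratio:
                new_tags.add(t)  # tag is supported by the neighbourhood
        for t, c in counts.items():
            if c / n >= add_ratio:
                new_tags.add(t)  # frequent neighbourhood tag is propagated
        corrected.append(new_tags)
    return corrected


# Toy data: three "cat-like" and three "dog-like" histograms.
features = [[5, 0, 1], [4, 0, 1], [5, 1, 0], [0, 5, 1], [0, 4, 1], [1, 5, 0]]
tags = [{"cat", "funny"}, {"cat"}, {"cat"}, {"dog"}, {"dog"}, {"dog", "cat"}]
corrected = correct_tags(features, tags, k=2)
```

In this toy run, the unsupported tag "funny" on the first video and the stray "cat" tag on the last video are dropped because none of their two nearest visual neighbours carry them. In practice the thresholds would need tuning on held-out data.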
