Finding meaning on YouTube: Tag recommendation and category discovery

We present a system that automatically recommends tags for YouTube videos solely on the basis of their audiovisual content. We also propose a novel framework for the unsupervised discovery of video categories that exploits knowledge mined from World Wide Web text documents and search queries. First, the association between video content and tags is learned by training classifiers that map audiovisual content-based features from millions of videos on YouTube.com to the existing uploader-supplied tags for those videos. When a new video is uploaded, the labels produced by these classifiers are used to automatically suggest tags deemed relevant to the video. Our system has learned a vocabulary of over 20,000 tags. Second, we mined large volumes of Web pages and search queries to discover a set of possible text entity categories and a set of associated is-A relationships that map individual text entities to categories. Finally, we apply these is-A relationships mined from Web text to the tags learned from the audiovisual content of videos, automatically synthesizing a reliable set of categories most relevant to videos, along with a mechanism to predict these categories for new uploads. We then present rigorous rating studies establishing that (a) the average relevance of tags automatically recommended by our system matches the average relevance of the uploader-supplied tags at the same or better coverage, and (b) the average precision@K of the video categories discovered by our system is 70% at K = 5.
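The last two steps above (suggesting tags from classifier outputs, then lifting those tags to categories through mined is-A relationships) can be illustrated with a minimal sketch. The abstract does not specify the classifier type or how tag evidence is combined into category scores, so everything here is an assumption: each tag classifier is taken to emit a confidence in [0, 1], tags at or above a threshold are suggested, the is-A relationships are a hypothetical mapping from tag to (category, weight) pairs, and a category's score is the sum of tag score times edge weight. The names `suggest_tags` and `rank_categories` are illustrative only, not the authors' method.

```python
from collections import defaultdict


def suggest_tags(tag_scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return tags whose (assumed) classifier confidence meets the threshold,
    most confident first; ties are broken alphabetically."""
    kept = [tag for tag, score in tag_scores.items() if score >= threshold]
    return sorted(kept, key=lambda tag: (-tag_scores[tag], tag))


def rank_categories(
    tag_scores: dict[str, float],
    is_a: dict[str, list[tuple[str, float]]],
    threshold: float = 0.5,
) -> list[tuple[str, float]]:
    """Propagate suggested-tag scores to categories through is-A edges.

    Assumed aggregation rule (hypothetical): category score is the sum,
    over suggested tags, of (tag confidence * is-A edge weight).
    """
    category_scores: defaultdict[str, float] = defaultdict(float)
    for tag in suggest_tags(tag_scores, threshold):
        for category, weight in is_a.get(tag, []):
            category_scores[category] += tag_scores[tag] * weight
    # Highest-scoring categories first; ties broken alphabetically.
    return sorted(category_scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    scores = {"guitar": 0.75, "drums": 0.5, "cat": 0.25}
    is_a = {
        "guitar": [("musical instrument", 1.0), ("music", 0.5)],
        "drums": [("musical instrument", 1.0), ("music", 1.0)],
        "cat": [("animal", 1.0)],
    }
    print(suggest_tags(scores))  # ['guitar', 'drums']
    print(rank_categories(scores, is_a))  # [('musical instrument', 1.25), ('music', 0.875)]
```

Summing lets several suggested tags reinforce a shared category, which is why "musical instrument" outranks "music" here; taking the maximum or normalizing by the number of contributing tags would be equally plausible alternatives.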

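The category result above is reported as average precision@K. Read here (our interpretation) as precision@K averaged over rated videos, it is the fraction of each video's top K predicted categories that raters judged relevant, averaged across videos. A minimal sketch, assuming the judgments are available as a set of relevant category names per video (a hypothetical data layout; the rating protocol itself is not described in this section):

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k ranked categories judged relevant.

    Divides by k even when fewer than k predictions exist (one common
    convention, assumed here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(runs: list[tuple[list[str], set[str]]], k: int = 5) -> float:
    """Average precision@k over videos; each run is (ranked predictions, relevant set)."""
    if not runs:
        raise ValueError("need at least one video")
    return sum(precision_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    # Toy judgments, invented purely for illustration.
    runs = [
        (["music", "musical instrument", "animal", "sport", "comedy"],
         {"music", "musical instrument", "comedy"}),  # 3 of top 5 relevant
        (["sport", "soccer", "music", "news", "game"],
         {"sport", "soccer", "football"}),  # 2 of top 5 relevant
    ]
    print(mean_precision_at_k(runs))  # 0.5
```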