Paper information - Finding meaning on YouTube: Tag recommendation and category discovery

Finding meaning on YouTube: Tag recommendation and category discovery

We present a system that automatically recommends tags for YouTube videos based solely on their audiovisual content. We also propose a novel framework for unsupervised discovery of video categories that exploits knowledge mined from Web text documents and search queries. First, the association between video content and tags is learned by training classifiers that map audiovisual content-based features from millions of videos on YouTube.com to the existing uploader-supplied tags for these videos. When a new video is uploaded, the labels provided by these classifiers are used to automatically suggest tags deemed relevant to the video. Our system has learned a vocabulary of over 20,000 tags. Second, we mine large volumes of Web pages and search queries to discover a set of possible text entity categories and a set of associated is-A relationships that map individual text entities to categories. Finally, we apply these is-A relationships, mined from Web text, to the tags learned from the audiovisual content of videos to automatically synthesize a reliable set of categories most relevant to videos, along with a mechanism to predict these categories for new uploads. We then present rigorous rating studies that establish that: (a) the average relevance of tags automatically recommended by our system matches the average relevance of the uploader-supplied tags at the same or better coverage, and (b) the average precision@K of video categories discovered by our system is 70% with K=5.
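The last two steps of the abstract (applying is-A relationships to predicted tags, then scoring the resulting categories with precision@K) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the `IS_A` table, the tag scores, and the choice to rank categories by summing the scores of their member tags are hypothetical and are not taken from the paper, which does not describe its aggregation rule in the abstract.

```python
from collections import defaultdict

# Hypothetical is-A pairs (tag -> categories). The paper mines such
# relationships from Web text; these entries are made up for illustration.
IS_A = {
    "guitar": ["musical instrument"],
    "piano": ["musical instrument"],
    "soccer": ["sport"],
    "tennis": ["sport"],
}


def categories_from_tags(tag_scores, is_a):
    """Rank categories by the summed scores of the tags that map to them.

    Summation is an assumed aggregation rule, not the paper's method.
    Ties are broken alphabetically so the output is deterministic.
    """
    totals = defaultdict(float)
    for tag, score in tag_scores.items():
        for category in is_a.get(tag, []):
            totals[category] += score
    return sorted(totals, key=lambda c: (-totals[c], c))


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k
```

For example, calling `categories_from_tags({"guitar": 0.9, "piano": 0.4, "soccer": 0.2}, IS_A)` ranks "musical instrument" ahead of "sport", and `precision_at_k` can then be averaged over many videos, with rater judgments supplying the relevant set, to obtain a figure comparable to the precision@5 reported above.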
