Exploring social tagging graph for web object classification

This paper studies the web object classification problem through a novel exploration of social tags. Automatically classifying web objects into manageable semantic categories has long been a fundamental preprocessing step for indexing, browsing, searching, and mining these objects. The explosive growth of heterogeneous web objects, especially non-textual objects such as products, pictures, and videos, has made web object classification increasingly challenging. Such objects often lack three things: easily extractable features that carry semantic information, interconnections with one another, and training examples with category labels. In this paper, we explore social tagging data to bridge this gap. We cast the web object classification problem as an optimization problem on a graph of objects and tags. We then propose an efficient algorithm that not only uses social tags as enriched semantic features for the objects, but also infers the categories of unlabeled objects from both homogeneous and heterogeneous labeled objects through the implicit connections that social tags provide. Experimental results show that exploiting social tags effectively boosts web object classification. Our algorithm significantly outperforms state-of-the-art general classification methods.
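To make the idea of classifying over a graph of objects and tags concrete, the following is a minimal sketch of clamped label propagation on a bipartite object-tag graph. It is not the algorithm proposed in this paper, whose objective function is not given in the abstract; it is a generic semi-supervised technique swapped in for illustration. The function name `propagate_labels`, its parameters, and the toy objects and tags are all assumptions made for this example.

```python
from collections import defaultdict


def propagate_labels(object_tags, seed_labels, categories, iterations=20):
    """Illustrative clamped label propagation (NOT the paper's algorithm).

    object_tags: dict mapping object id -> list of tags (hypothetical input format)
    seed_labels: dict mapping labeled object id -> category
    categories:  list of all category names
    """
    # Invert the object -> tags map to obtain the tag -> objects edges.
    tag_objects = defaultdict(list)
    for obj, tags in object_tags.items():
        for tag in tags:
            tag_objects[tag].append(obj)

    def one_hot(label):
        return {c: 1.0 if c == label else 0.0 for c in categories}

    def mean(vectors):
        vectors = list(vectors)
        if not vectors:
            return {c: 0.0 for c in categories}
        return {c: sum(v[c] for v in vectors) / len(vectors) for c in categories}

    # Labeled objects start as one-hot vectors; unlabeled objects start at zero.
    obj_scores = {
        obj: one_hot(seed_labels[obj]) if obj in seed_labels else mean([])
        for obj in object_tags
    }

    for _ in range(iterations):
        # Each tag takes the average score of the objects it annotates.
        tag_scores = {
            tag: mean(obj_scores[o] for o in objs)
            for tag, objs in tag_objects.items()
        }
        # Each unlabeled object takes the average score of its tags;
        # labeled objects stay clamped to their seed label.
        for obj, tags in object_tags.items():
            if obj not in seed_labels:
                obj_scores[obj] = mean(tag_scores[t] for t in tags)

    # Ties (including objects with no tags) fall back to the first category.
    return {
        obj: max(categories, key=lambda c: scores[c])
        for obj, scores in obj_scores.items()
    }


# Toy data: object types are mixed, and only two objects carry labels.
object_tags = {
    "photo_1": ["beach", "sunset"],
    "photo_2": ["beach", "holiday"],
    "video_1": ["guitar", "live"],
    "product_1": ["guitar", "amp"],
    "page_1": ["holiday", "sunset"],
}
seed_labels = {"photo_1": "travel", "video_1": "music"}
predicted = propagate_labels(object_tags, seed_labels, ["travel", "music"])
```

In this toy setting the unlabeled product shares the tag "guitar" with a labeled video, so it inherits that video's category even though the two objects are of different types. This mirrors, in a much simpler form, the inference from heterogeneous labeled objects described above.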
