Exploiting the Social Capital of Folksonomies for Web Page Classification

Collaborative tagging systems (CTSs), also known as folksonomies, have grown in popularity on the Web, and social tagging has become an important feature of many Web 2.0 services. It has been argued that the power of tagging lies in people's ability to freely choose appropriate tags for resources without relying on a predefined lexicon or hierarchy. This free-form nature causes a number of problems for the social classification scheme, such as synonymy and morphological variety among tags. Nevertheless, social tagging can be a valuable source of information for organizing Web resources. In this paper we present an empirical analysis carried out to determine the importance of social tagging in Web page classification. In our experiments, tag-based classifiers outperformed classifiers based on the full text of documents.
