Automatic Document Tagging in Social Semantic Digital Library

The emergence of Web 2.0 has generated a large amount of annotation and personalization information about web resources. Extracting and using this information to improve the quality of services is a key goal of modern digital libraries. In this paper, we present a novel Automatic Document Tagging (ADT) approach for digital libraries. In our approach, the ADT problem is formulated as a variant of the multi-class classification problem. Unlike standard multi-class classification, however, the training data for ADT is collected from the user's historical tags and is only partially labeled. This incompleteness makes training more challenging. To overcome this problem, we propose an efficient randomized online training algorithm (RPL). The RPL algorithm has two phases: (i) random exploitation and (ii) classifier update. Experimental results on both synthetic and real-world data demonstrate the effectiveness of the approach.
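The abstract does not give the details of RPL, so the following is only a minimal illustrative sketch of what a two-phase randomized online learner over partially labeled data might look like. It is a generic multi-class perceptron with a randomized tag choice, not the authors' algorithm. The function names, the `explore` parameter, the additive update rule, and the assumption that every training document has at least one known tag are all hypothetical choices made for illustration.

```python
import random


def train_partial_label_perceptron(examples, n_features, n_tags,
                                   epochs=20, explore=0.1, seed=0):
    """Illustrative two-phase online learner (hypothetical, not RPL itself).

    examples: list of (feature_vector, known_tags) pairs, where known_tags
    is a non-empty set of tag indices the user has applied. The set may be
    incomplete: a tag missing from it is not necessarily wrong.
    """
    rng = random.Random(seed)
    weights = [[0.0] * n_features for _ in range(n_tags)]

    def scores(x):
        return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]

    for _ in range(epochs):
        for x, known_tags in examples:
            s = scores(x)
            # Phase (i): randomized choice of the tag to test.
            # With probability `explore` pick a tag uniformly at random,
            # otherwise pick the currently highest-scoring tag.
            if rng.random() < explore:
                guess = rng.randrange(n_tags)
            else:
                guess = max(range(n_tags), key=lambda t: s[t])
            # Phase (ii): classifier update.
            # If the tested tag is not among the known tags, promote the
            # best-scoring known tag and demote the tested one.
            if guess not in known_tags:
                target = max(known_tags, key=lambda t: s[t])
                for i, x_i in enumerate(x):
                    weights[target][i] += x_i
                    weights[guess][i] -= x_i
    return weights


def predict(weights, x):
    """Return the index of the highest-scoring tag for feature vector x."""
    s = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda t: s[t])
```

Note that this sketch treats a tag outside the known set as a mistake, which is exactly the simplification that partially labeled data makes unsafe; handling that incompleteness is the problem the paper's algorithm is designed to address.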
