Exploiting Social Annotation for Automatic Resource Discovery

Information integration applications, such as mediators or mashups, need access to information resources, and they currently rely on users to discover those resources manually and integrate them into the application. Manual resource discovery is a slow process, requiring the user to sift through results obtained via keyword-based search. Although search methods have advanced to include evidence from a document's contents, its metadata, and the contents and link structure of the referring pages, they still do not adequately cover information sources, often called "the hidden Web", that dynamically generate documents in response to a query. The recently popular social bookmarking sites, which allow users to annotate and share metadata about various information sources, provide rich evidence for resource discovery. In this paper, we describe a probabilistic model of the user annotation process in the social bookmarking system del.icio.us. We then use the model to automatically find resources relevant to a particular information domain. Our experimental results on data obtained from del.icio.us show that this approach is a promising method for helping automate the resource discovery task.
