Exploring social annotations for information retrieval

Social annotation has gained increasing popularity in many Web-based applications, leading to an emerging research area in text analysis and information retrieval. This paper is concerned with developing probabilistic models and computational algorithms for social annotations. We propose a unified framework to combine the modeling of social annotations with the language modeling-based methods for information retrieval. The proposed approach consists of two steps: (1) discovering topics in the contents and annotations of documents while categorizing the users by domains; and (2) enhancing document and query language models by incorporating user domain interests as well as topical background models. In particular, we propose a new general generative model for social annotations, which is then simplified to a computationally tractable hierarchical Bayesian network. Then we apply smoothing techniques in a risk minimization framework to incorporate the topical information into the language models. Experiments are carried out on a real-world annotation data set sampled from del.icio.us. Our results demonstrate significant improvements over the traditional approaches.
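
To make step (2) of the abstract more concrete, the following is a minimal sketch of smoothing a document language model with a topical background model and ranking by query likelihood. It is not the paper's model: the linear interpolation, the mixing weights `lam` and `mu`, the function names, and the toy documents and topic distributions are all assumptions made for illustration, and the topic distributions are simply given here rather than learned from annotations.

```python
import math
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def smoothed_prob(word, doc_lm, topic_lm, collection_lm, lam=0.6, mu=0.3):
    """Linearly interpolate document, topic, and collection models.

    lam and mu are hypothetical weights; the remaining mass (1 - lam - mu)
    is assigned to the collection model.
    """
    return (lam * doc_lm.get(word, 0.0)
            + mu * topic_lm.get(word, 0.0)
            + (1.0 - lam - mu) * collection_lm.get(word, 0.0))


def query_log_likelihood(query, doc_lm, topic_lm, collection_lm):
    """Score a document by log P(query | smoothed document model)."""
    score = 0.0
    for w in query:
        p = smoothed_prob(w, doc_lm, topic_lm, collection_lm)
        if p == 0.0:
            return float("-inf")  # word unseen in every component model
        score += math.log(p)
    return score


# Toy documents (invented for illustration).
docs = {
    "d1": "python tutorial code examples".split(),
    "d2": "travel photos blog".split(),
}
# Assumed output of a topic-discovery step: one topical word distribution
# per document. Here it is hard-coded, not inferred.
topic_lms = {
    "d1": unigram_lm("programming python code software".split()),
    "d2": unigram_lm("travel photography blog".split()),
}
# Collection model over all document and topic vocabulary.
collection_lm = unigram_lm(
    [w for toks in docs.values() for w in toks]
    + "programming software photography".split()
)

query = "programming code".split()
scores = {
    name: query_log_likelihood(query, unigram_lm(toks),
                               topic_lms[name], collection_lm)
    for name, toks in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup the query term "programming" does not occur in the text of `d1`, but the topical component still gives it probability mass, which is the kind of effect the abstract attributes to incorporating topical background models. A fuller treatment in a risk minimization framework would also estimate a query language model and compare it with the document model, which this sketch omits.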

Hongyuan Zha | Jiang Bian | C. Lee Giles | Ding Zhou | Shuyi Zheng
