Credibility Improves Topical Blog Post Retrieval

Topical blog post retrieval is the task of ranking blog posts by their relevance to a given topic. To improve topical blog post retrieval, we incorporate textual credibility indicators into the retrieval process. We consider two groups of indicators: post-level indicators (determined using information about individual blog posts only) and blog-level indicators (determined using information from the underlying blogs). We describe how to estimate these indicators and how to integrate them into a retrieval approach based on language models. Experiments on the TREC Blog track test set show that both groups of credibility indicators significantly improve retrieval effectiveness; the best performance is achieved when combining them.
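
One common way to integrate document-level evidence into language-model retrieval is to treat it as a document prior in a query-likelihood model, ranking by log P(d) + sum over query terms of log P(t|d). The sketch below illustrates that general idea only. It is not taken from the abstract: the Dirichlet smoothing, the value of `mu`, the single `credibility` score per post, the function name `score_post`, and the toy posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def score_post(query_terms, post_terms, coll_tf, coll_len, credibility, mu=10.0):
    """Return log P(d) + sum_t log P(t | d) for one post.

    P(t | d) is a Dirichlet-smoothed unigram model (an assumption here);
    `credibility` is a hypothetical prior in (0, 1] standing in for a
    combination of post-level and blog-level indicators.
    """
    tf = Counter(post_terms)
    post_len = len(post_terms)
    log_likelihood = 0.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len
        p_term = (tf[term] + mu * p_coll) / (post_len + mu)
        if p_term == 0.0:
            # Term unseen in the whole collection: the post cannot match.
            return float("-inf")
        log_likelihood += math.log(p_term)
    return math.log(credibility) + log_likelihood


# Toy collection: (tokens, hypothetical credibility score) per post.
posts = {
    "a": ("climate policy debate climate".split(), 0.9),
    "b": ("climate policy debate climate".split(), 0.3),
    "c": ("holiday photos from the beach".split(), 0.9),
}

# Collection statistics used for smoothing.
coll_tf = Counter(t for tokens, _ in posts.values() for t in tokens)
coll_len = sum(coll_tf.values())

query = ["climate", "policy"]
scores = {
    post_id: score_post(query, tokens, coll_tf, coll_len, cred)
    for post_id, (tokens, cred) in posts.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setup, posts "a" and "b" have identical text, so only the assumed credibility prior separates them; the prior acts as an additive term in log space and leaves the topical score untouched.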
