University of Glasgow at TREC 2008: Experiments in Blog, Enterprise, and Relevance Feedback Tracks with Terrier

Abstract: In TREC 2008, we participate in the Blog, Enterprise, and Relevance Feedback tracks. In all tracks, we continue the research and development of the Terrier platform, centred on extending state-of-the-art weighting models based on the Divergence From Randomness (DFR) framework. In particular, we investigate two main themes: proximity-based models, and collection and profile enrichment techniques based on several resources. In the Blog track, we aim to improve our opinion detection techniques and to integrate various new blog-specific features into our Voting Model. For the baseline ad-hoc task, we aim to build strong baselines by applying two different techniques. The first boosts documents in which query terms co-occur within a window of a given size, and the second applies query expansion using collection enrichment. Non-English documents are also removed from the retrieved results. In the opinion-finding task, we experiment with two main opinion detection approaches. The first improves our TREC 2007 dictionary-based approach by automatically building an internal opinion dictionary from the collection itself. We measure the opinionated discriminability of each term using an information-theoretic divergence measure based on the relevance assessments of previous years. The second approach is based on the OpinionFinder tool, which identifies subjective sentences in text. In particular, we introduce a novel method to measure the informativeness of query terms occurring in close proximity to subjective sentences. In the blog distillation task, we have two research themes.
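
The first baseline technique, boosting documents in which query terms co-occur within a window, can be illustrated with a minimal sketch. This is not the DFR proximity model used in the runs: the function names, the sliding-window counting scheme, and the additive `weight` parameter are all assumptions chosen only to show the general idea.

```python
from itertools import combinations


def count_cooccurrences(tokens, query_terms, window=5):
    """Count, for each pair of query terms, the sliding windows of
    `window` consecutive tokens that contain both terms.

    Hypothetical helper: the abstract does not specify how windows
    are counted, so overlapping windows are assumed here.
    """
    terms = set(query_terms)
    counts = {pair: 0 for pair in combinations(sorted(terms), 2)}
    # Slide the window one token at a time; a short document yields one window.
    for start in range(max(len(tokens) - window + 1, 1)):
        seen = terms.intersection(tokens[start:start + window])
        for pair in combinations(sorted(seen), 2):
            counts[pair] += 1
    return counts


def boosted_score(base_score, tokens, query_terms, window=5, weight=0.1):
    """Add a proximity boost to a base retrieval score.

    Assumption: a simple linear boost; the actual system scores
    co-occurrences with a DFR-based weighting model instead.
    """
    pair_counts = count_cooccurrences(tokens, query_terms, window)
    return base_score + weight * sum(pair_counts.values())
```

For example, a document whose tokens include "blog" and "retrieval" close together would receive a higher `boosted_score` than one in which the two terms are far apart, given the same base score.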
