Identifying Facets in Query-Biased Sets of Blog Posts

We investigate the identification of facets in query-biased sets of blog posts. Given a set of blog posts relevant to a topic, we compare several methods for identifying facets of the topic in this set. Building on a clustering of the posts, we compare several cluster labeling methods and find that a method that uses features specific to blogs and blog search outperforms the others. We also present efficiency-improving feature sets for clustering; our proposed method is fast enough to be deployed online.
