A search result clustering method using informatively named entities

Clustering search results helps the user get an overview of the returned information. In this paper, we treat the clustering task as indexing the search results. Here, an index means a structured list of labels that makes it easier for the user to comprehend both the labels and the search results. To realize this goal, we make three proposals. The first is to use named entity (NE) extraction for term extraction. The second is a new label selection criterion based on a term's importance in the search results and on the relation between the term and the search query. The third is label categorization using the category information of labels, which is produced by NE extraction. We implement a prototype system based on these proposals, focusing on news articles, and find that it offers much higher performance than existing methods.
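To make the three proposals concrete, the following is a minimal sketch of how such an index might be assembled. It is not the paper's method: the input format (named entities already extracted as `(entity, category)` pairs per search result), the function name `build_index`, the scoring by document frequency, and the rule that drops entities repeating the query are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(results, query, top_k=3):
    """Build a category -> labels index from pre-extracted named entities.

    results: one list of (entity, category) pairs per search result
             (assumed output of some NE extractor; hypothetical format).
    query:   the search query string.
    top_k:   maximum number of labels kept per category (assumed parameter).
    """
    query_terms = set(query.lower().split())

    # Map each (entity, category) label to the set of results mentioning it.
    doc_ids = defaultdict(set)
    for doc_id, entities in enumerate(results):
        for entity, category in entities:
            # Assumed relation-to-query rule: an entity made up only of
            # query words tells the user nothing new, so it is skipped.
            if set(entity.lower().split()) <= query_terms:
                continue
            doc_ids[(entity, category)].add(doc_id)

    # Group labels by their NE category.
    by_category = defaultdict(list)
    for (entity, category), docs in doc_ids.items():
        by_category[category].append((entity, sorted(docs)))

    # Assumed importance score: number of results containing the label.
    # Ties are broken alphabetically so the output is deterministic.
    return {
        category: sorted(labels, key=lambda item: (-len(item[1]), item[0]))[:top_k]
        for category, labels in sorted(by_category.items())
    }


if __name__ == "__main__":
    sample_results = [
        [("Tokyo", "LOCATION"), ("Toyota", "ORGANIZATION")],
        [("Tokyo", "LOCATION"), ("Honda", "ORGANIZATION")],
        [("Osaka", "LOCATION"), ("Toyota", "ORGANIZATION")],
    ]
    for category, labels in build_index(sample_results, "toyota").items():
        print(category, labels)
```

Each category maps to its highest-ranked labels together with the results they cover, which gives the structured label list the abstract calls an index. A real system would replace the document-frequency score with the selection criterion defined in the paper.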
