Discovering and Comparing Topic Hierarchies

Hierarchies have long been used to organize, summarize, and provide access to information, yet how best to construct them remains an open question. In this paper, our goal is to automatically create domain-specific hierarchies that can be used for browsing a document set and locating relevant documents. We examine methods for automatically generating such hierarchies and for evaluating them. To this end, we compare two methods of generating topic hierarchies from the text of documents: subsumption hierarchies, which use subsumption relations found within document sets, and lexical hierarchies, which use words that occur frequently within phrases. Our evaluation shows that subsumption hierarchies divide documents into smaller groups, so that one can find all relevant documents while looking at fewer non-relevant documents. However, such hierarchies are more likely to contain no path to a relevant document.
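
To make the idea of a subsumption relation concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It assumes that a term x subsumes a term y when most documents containing y also contain x, but not the reverse. The function name `subsumption_pairs`, the whitespace tokenization, and the 0.8 threshold are illustrative assumptions that do not come from the abstract.

```python
from itertools import permutations


def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (parent, child) term pairs where parent subsumes child.

    Assumed rule (illustrative): x subsumes y if P(x | y) >= threshold
    and P(y | x) < 1, with probabilities estimated from document counts.
    """
    # Map each term to the set of ids of the documents that contain it.
    postings = {}
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    for x, y in permutations(postings, 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])
        p_y_given_x = both / len(postings[x])
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return sorted(pairs)


# Toy document set (hypothetical data, for illustration only).
docs = [
    "retrieval clustering",
    "retrieval indexing",
    "retrieval clustering",
    "retrieval",
]
pairs = subsumption_pairs(docs)
print(pairs)
```

On this toy set, "retrieval" appears in every document that contains "clustering" or "indexing", so it becomes the parent of both. A real system would select candidate terms and phrases far more carefully than splitting on whitespace.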
