Building, Testing, and Applying Concept Hierarchies

A means of automatically deriving a hierarchical organization of concepts from a set of documents, without the use of training data or standard clustering techniques, is presented. Salient words and phrases are first extracted from the documents; these terms are then organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When the hierarchy is generated from a set of retrieved documents, a user browsing the menus gains an overview of the documents' content in a manner distinct from existing techniques. The methods used to build the structure are simple and appear to be effective. The formation and presentation of the hierarchy are described, along with a study of some of its properties. This includes a preliminary experiment, which indicates that users may find the hierarchy a more efficient means of locating relevant documents than the classic method of scanning a ranked document list.
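To make the idea of subsumption concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that a term x subsumes a term y when the documents containing y are (almost) a subset of those containing x, that is, when P(x|y) meets a threshold and P(y|x) < 1. The function name `subsumption_pairs`, the input layout, and the default threshold of 0.8 are illustrative assumptions that the abstract does not specify.

```python
from collections import defaultdict


def subsumption_pairs(docs, threshold=0.8):
    """Return sorted (parent, child) pairs where parent subsumes child.

    docs: hypothetical input, a mapping of document id -> set of terms.
    threshold: assumed cut-off for P(parent | child); not from the abstract.
    """
    # Build an inverted index: term -> set of ids of documents containing it.
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)

    pairs = []
    for x, docs_x in postings.items():
        for y, docs_y in postings.items():
            if x == y:
                continue
            both = len(docs_x & docs_y)
            p_x_given_y = both / len(docs_y)  # share of y's documents that contain x
            p_y_given_x = both / len(docs_x)  # share of x's documents that contain y
            # x is the more general term: it covers nearly all of y's
            # documents, but y does not cover all of x's.
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                pairs.append((x, y))
    return sorted(pairs)


example_docs = {
    1: {"animal", "dog"},
    2: {"animal", "cat"},
    3: {"animal", "dog", "puppy"},
    4: {"animal"},
}
for parent, child in subsumption_pairs(example_docs):
    print(f"{parent} -> {child}")
```

Each (parent, child) pair could then become a menu entry, with the child nested under its parent. This pairwise comparison is quadratic in the number of terms, which is acceptable for a sketch over the salient terms of a retrieved set but would need pruning for a large vocabulary.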
