Automatic identification and organization of index terms for interactive browsing

The potential of automatically generated indexes for information access has been recognized for several decades (e.g., Bush 1945 [2], Edmundson and Wyllys 1961 [4]), but the quantity of text and the ambiguity of natural language processing have made progress at this task more difficult than was originally foreseen. Recently, a body of work on development of interactive systems to support phrase browsing has begun to emerge (e.g., Anick and Vaithyanathan 1997 [1], Gutwin et al. [10], Nevill-Manning et al. 1997 [17], Godby and Reighart 1998 [9]). In this paper, we consider two issues related to the use of automatically identified phrases as index terms in a dynamic text browser (DTB), a user-centered system for navigating and browsing index terms: 1) What criteria are useful for assessing the usefulness of automatically identified index terms? and 2) Is the quality of the terms identified by automatic indexing such that they provide useful access to document content? The terms that we focus on have been identified by LinkIT, a software tool for identifying significant topics in text [7]. Over 90% of the terms identified by LinkIT are coherent and therefore merit inclusion in the dynamic text browser. Terms identified by LinkIT are input to Intell-Index, a prototype DTB that supports interactive navigation of index terms. The distinction between phrasal heads (the most important words in a coherent term) and modifiers serves as the basis for a hierarchical organization of terms. This linguistically motivated structure helps users browse and disambiguate terms efficiently. We conclude that the approach to information access discussed in this paper is very promising, and also that there is much room for further research. In the meantime, this research contributes to establishing a solid foundation for assessing the usability of terms in phrase browsing applications.
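The head-based organization described above can be illustrated with a minimal sketch. It is not LinkIT's or Intell-Index's actual method: it assumes, purely for illustration, that the head of a simple English noun phrase is its final word, and the names head_of and group_by_head and the sample terms are hypothetical.

```python
from collections import defaultdict


def head_of(term: str) -> str:
    """Return the assumed head of a term.

    Assumption (not from the paper): the head of a simple English noun
    phrase is its last word. Phrases with postmodifiers, such as
    "filter for coffee", would be handled incorrectly by this heuristic.
    """
    return term.split()[-1].lower()


def group_by_head(terms):
    """Build a two-level index: head -> alphabetically sorted terms."""
    index = defaultdict(list)
    for term in terms:
        if not term.split():
            continue  # skip empty or whitespace-only entries
        index[head_of(term)].append(term)
    # Sort heads and the terms under each head for stable browsing order.
    return {head: sorted(group) for head, group in sorted(index.items())}


# Hypothetical sample terms, standing in for automatically identified phrases.
terms = [
    "filter",
    "coffee filter",
    "oil filter",
    "smut filter",
    "digital library",
    "public library",
]

index = group_by_head(terms)
for head, group in index.items():
    print(head)
    for term in group:
        print("  " + term)
```

Grouping under a shared head places terms such as "coffee filter" and "oil filter" side by side, so that the modifiers distinguish the senses of an otherwise ambiguous head.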

[1]  Ian H. Witten,et al.  Browsing in digital libraries: a phrase-based approach , 1997, DL '97.

[2]  Julia E. Hodges,et al.  An automated system that assists in the generation of document indexes , 1996, Nat. Lang. Eng..

[3]  Susan T. Dumais,et al.  The vocabulary problem in human-system communication , 1987, CACM.

[4]  Nina Wacholder,et al.  Document Processing with LinkIT , 2000, RIAO.

[5]  Ray R. Reighart,et al.  Using machine-readable text as a source of novel vocabulary to update the Dewey Decimal classification , 1998 .

[6]  H. P. Edmundson,et al.  Automatic abstracting and indexing—survey and recommendations , 1961, CACM.

[7]  Slava M. Katz,et al.  Technical terminology: some linguistic properties and an algorithm for identification in text , 1995, Natural Language Engineering.

[8]  Nancy C. Mulvany,et al.  Indexing Books , 1994 .

[9]  Shivakumar Vaithyanathan,et al.  Exploiting clustering and phrases for context-based information retrieval , 1997, SIGIR '97.

[10]  Hsinchun Chen,et al.  Comparing noun phrasing techniques for use with medical digital library tools , 2000 .

[11]  Vannevar Bush,et al.  As we may think , 1945, INTR.

[12]  Hsinchun Chen,et al.  Comparing noun phrasing techniques for use with medical digital library tools , 2000, J. Am. Soc. Inf. Sci..

[13]  Wendy G. Lehnert,et al.  Information extraction , 1996, CACM.

[14]  Carol A. Hert,et al.  A usability assessment of online indexing structures in the networked environment , 2000, J. Am. Soc. Inf. Sci..

[15]  Jessica L. Milstead.  Needs for research in indexing , 1994 .

[16]  David Yarowsky,et al.  One Sense per Collocation , 1993, HLT.

[17]  Joe Zhou,et al.  Phrasal Terms in Real-World IR Applications , 1999 .

[18]  Nina Wacholder,et al.  Evaluation of Automatically Identified Index Terms for Browsing Electronic Documents , 2000, ANLP.

[19]  Vasileios Hatzivassiloglou,et al.  Translating Collocations for Bilingual Lexicons: A Statistical Approach , 1996, CL.

[20]  ChengXiang Zhai,et al.  Noun-Phrase Analysis in Unrestricted Text for Information Retrieval , 1996, ACL.

[21]  Carl Gutwin,et al.  Improving browsing in digital libraries with keyphrase indexes , 1999, Decis. Support Syst..

[22]  Jin Wang,et al.  Building Effective Queries In Natural Language Information Retrieval , 1997, ANLP.