Document analysis for visualization

An experimental term selection strategy for document visualization is described. Strong discriminators with few co-occurrences increase the clustering tendency of low-dimensional document browsing spaces. Clustering tendency is tested with diagnostic measures adapted from the field of cluster analysis, and confirmed using the VIBE visualization tool. This method supports browsing in high-recall, low-precision document retrieval and classification tasks.
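The selection idea can be illustrated with a minimal sketch. The abstract does not give the procedure, so everything below is an assumption: discrimination values are computed with the standard centroid-based definition (collection density with the term removed minus density with all terms), co-occurrence is counted as the number of distinct other terms that share at least one document with a term, and the names `density`, `discrimination_values`, `cooccurrence_counts`, `select_terms`, and the threshold `max_cooccurrence` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density(docs):
    """Average similarity between each document and the collection centroid."""
    n_terms = len(docs[0])
    centroid = [sum(d[k] for d in docs) / len(docs) for k in range(n_terms)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_values(docs):
    """DV_k = density without term k minus density with all terms.

    A positive value means the term spreads documents apart
    (a good discriminator under this assumed definition).
    """
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]
        values.append(density(reduced) - base)
    return values


def cooccurrence_counts(docs):
    """For each term, count the distinct other terms it shares a document with."""
    n_terms = len(docs[0])
    counts = []
    for k in range(n_terms):
        partners = set()
        for d in docs:
            if d[k]:
                partners.update(j for j in range(n_terms) if j != k and d[j])
        counts.append(len(partners))
    return counts


def select_terms(docs, max_cooccurrence):
    """Keep positive discriminators with at most max_cooccurrence partners."""
    dv = discrimination_values(docs)
    co = cooccurrence_counts(docs)
    return [k for k in range(len(dv)) if dv[k] > 0 and co[k] <= max_cooccurrence]


# Toy collection: rows are documents, columns are terms.
# Term 0 occurs everywhere; terms 1 and 2 each mark one group of documents.
docs = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
print(select_terms(docs, max_cooccurrence=1))  # [1, 2]
```

In this toy collection the ubiquitous term 0 receives a negative discrimination value and co-occurs with both other terms, so it is dropped, while terms 1 and 2 separate the two document groups and are kept. Whether the described strategy uses exact or approximate discrimination values, or a different co-occurrence measure, is not stated in the abstract.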
