TOPIC ISLANDS™: a wavelet-based text visualization system

We present a novel approach to visualizing and exploring unstructured text. The underlying technology, called TOPIC-O-GRAPHY™, applies wavelet transforms to a custom digital signal constructed from the words within a document. The resulting multiresolution wavelet energy is used to analyze characteristics of the narrative flow, such as theme changes, in the frequency domain. These characteristics are then related to the overall thematic content of the document using statistical methods. The thematic characteristics of a document can be analyzed at varying degrees of detail, from section-sized text partitions down to partitions of only a few words. Using this technology, we are developing a visualization system prototype, TOPIC ISLANDS, that lets users browse a document, generate fuzzy document outlines, summarize text by level of detail and according to user interests, define meaningful subdocuments, query text content, and obtain summaries of topic evolution.
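The general idea of measuring wavelet energy over a word-derived signal can be illustrated with a minimal sketch. It does not reproduce TOPIC-O-GRAPHY itself: the abstract does not say how the signal is constructed or which wavelet is used, so the binary topic-term signal (`word_signal`), the Haar wavelet, and all function names below are assumptions chosen purely for illustration.

```python
import math


def word_signal(words, topic_terms):
    """Hypothetical signal: 1.0 where a word is a topic term, else 0.0.

    The abstract only says the signal is "constructed from words";
    this binary encoding is an assumption made for this sketch.
    """
    return [1.0 if w.lower() in topic_terms else 0.0 for w in words]


def haar_step(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_energy_by_level(signal):
    """Return (detail energy per level, finest first; final approximation).

    The signal length must be a power of two.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    energies = []
    approx = list(signal)
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies, approx[0]


# Toy "document": the first half is about wavelets, the second about islands.
words = "wavelet wavelet signal wavelet island island map island".split()
signal = word_signal(words, {"wavelet", "signal"})
energies, coarse = haar_energy_by_level(signal)
print(energies)
```

In this toy input the two finest levels carry essentially no detail energy and the coarsest level carries all of it, which matches a single theme change at the midpoint of the text. Finer levels would pick up energy if topic terms alternated over spans of only a few words.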
