TOPIC ISLANDS™ - a wavelet-based text visualization system

We present a novel approach to visualizing and exploring unstructured text. The underlying technology, called TOPIC-O-GRAPHY™, applies wavelet transforms to a custom digital signal constructed from the words within a document. The resulting multiresolution wavelet energy is used to analyze characteristics of the narrative flow in the frequency domain, such as theme changes, and these characteristics are then related to the overall thematic content of the document using statistical methods. The thematic characteristics of a document can be analyzed at varying degrees of detail, ranging from section-sized text partitions to partitions consisting of a few words. Using this technology, we are developing a visualization system prototype known as TOPIC ISLANDS to browse a document, generate fuzzy document outlines, summarize text by level of detail and according to user interests, define meaningful subdocuments, query text content, and provide summaries of topic evolution.
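The general idea of turning words into a signal and reading off wavelet energy at several resolutions can be illustrated with a minimal sketch. It is not the TOPIC-O-GRAPHY method: the abstract does not specify how the custom signal is built or which wavelet is used, so the indicator signal (1.0 where a word belongs to a hypothetical set `topic_terms`), the choice of the orthonormal Haar wavelet, and all function names below are assumptions made for illustration only.

```python
import math


def word_signal(words, topic_terms):
    """Assumed signal: 1.0 where a word is in topic_terms, else 0.0."""
    return [1.0 if w.lower() in topic_terms else 0.0 for w in words]


def pad_to_power_of_two(signal):
    """Zero-pad so the signal can be halved repeatedly."""
    n = 1
    while n < len(signal):
        n *= 2
    return signal + [0.0] * (n - len(signal))


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length signal."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal):
    """Return (details, approx): detail coefficients per level, finest first,
    plus the final single-element approximation."""
    approx = pad_to_power_of_two(list(signal))
    details = []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return details, approx


def energy_by_level(details):
    """Sum of squared detail coefficients at each resolution level."""
    return [sum(c * c for c in level) for level in details]


if __name__ == "__main__":
    words = "the wavelet transform of the signal and the plot of the story".split()
    signal = word_signal(words, {"wavelet", "transform", "signal"})
    details, approx = haar_decompose(signal)
    for level, energy in enumerate(energy_by_level(details), start=1):
        print(f"level {level}: energy = {energy:.3f}")
```

In this sketch, level 1 compares adjacent words and each later level compares blocks twice as long, which loosely mirrors the range from few-word partitions to section-sized partitions described above. Because the Haar transform used here is orthonormal, the per-level energies plus the squared final approximation sum to the energy of the padded input signal.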
