Dynamic Topography Information Landscapes - An Incremental Approach to Visual Knowledge Discovery

Incrementally computed information landscapes are an effective means of visualizing longitudinal changes in large document repositories. Resembling tectonic processes in the natural world, dynamic rendering reflects both long-term trends and short-term fluctuations in such repositories. To visualize the rise and decay of topics, the mapping algorithm elevates and lowers related sets of concentric contour lines. To address the growing number of documents that state-of-the-art knowledge discovery applications must process, we introduce an incremental, scalable approach for generating such landscapes. The processing pipeline comprises a number of sequential tasks, from crawling, filtering and pre-processing Web content to projecting, labeling and rendering the aggregated information. The incremental processing steps are confined to the projection stage, which consists of document clustering, force-directed cluster placement and fast document positioning. We evaluate the proposed framework by comparing the layout quality of the incremental and non-incremental versions. The documents for the experiments stem from the blog sample of the Media Watch on Climate Change (www.ecoresearch.net/climate). Experimental results indicate that our incremental computation approach is capable of accurately generating dynamic information landscapes.
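
As a rough illustration of what the incremental part of the projection stage might involve, the following minimal Python sketch folds a new document into an existing cluster and then positions it on the map. It is not the method evaluated here: the function names (cosine, update_centroid, position_document), the running-mean centroid update, the similarity-weighted placement and the parameter k are all assumptions made for illustration, and the force-directed cluster placement is taken as already computed.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length term vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_centroid(centroid, size, vec):
    """Fold one new document vector into the mean of a cluster with `size` members."""
    return [(c * size + v) / (size + 1) for c, v in zip(centroid, vec)]


def position_document(vec, centroids, positions, k=2):
    """Place a document at the similarity-weighted mean of the 2D positions
    of its k most similar cluster centroids (hypothetical placement rule)."""
    ranked = sorted(
        ((cosine(vec, c), i) for i, c in enumerate(centroids)), reverse=True
    )[:k]
    weights = [(max(sim, 0.0), i) for sim, i in ranked]
    total = sum(w for w, _ in weights)
    if total == 0.0:
        # No similar cluster: fall back to the unweighted mean of the k positions.
        weights = [(1.0, i) for _, i in ranked]
        total = float(len(weights))
    x = sum(w * positions[i][0] for w, i in weights) / total
    y = sum(w * positions[i][1] for w, i in weights) / total
    return (x, y)


# Two clusters whose map positions are assumed to come from an earlier
# force-directed placement step.
centroids = [[1.0, 0.0], [0.0, 1.0]]
positions = [(0.0, 0.0), (10.0, 0.0)]

new_doc = [1.0, 1.0]
doc_xy = position_document(new_doc, centroids, positions)
centroids[0] = update_centroid(centroids[0], 1, new_doc)
```

Because only the affected centroid and the new document's coordinates change, existing documents keep their positions. That stability between updates is the property an incremental layout is after.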
