Summarization techniques for visualization of large, multidimensional datasets

One of the main issues confronting visualization is how to display large, high-dimensional datasets effectively within a limited display area without overwhelming the user. In this report, we discuss a data summarization approach to this problem. Summarization is the process of reducing data, in a meaningful and intelligent fashion, to its important and relevant features. We survey several techniques from computer science that can be used to extract various characteristics from raw data. Used intelligently within visualization systems, summarization techniques could reduce the size and dimensionality of large, high-dimensional data, highlight relevant and important features, and enhance comprehension.
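As one illustration of reducing data "to its important and relevant features", the sketch below uses clustering, one common family of summarization techniques, to replace N records of d attributes with k representative centroids and the number of records each one stands for. A display could then draw k glyphs instead of N points. This is a minimal sketch under stated assumptions, not the report's method: the helper name `summarize_kmeans`, the use of plain Lloyd's k-means, and the farthest-first seeding are all illustrative choices.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def _assign(points: List[Point], centroids: List[Point]) -> List[List[Point]]:
    """Group each point with its nearest centroid (ties go to the lower index)."""
    clusters: List[List[Point]] = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda j: _sq_dist(p, centroids[j]))
        clusters[nearest].append(p)
    return clusters


def summarize_kmeans(
    points: List[Point], k: int, max_iters: int = 100
) -> Tuple[List[Point], List[int]]:
    """Summarize `points` as k centroids plus how many raw points each represents.

    Hypothetical helper: Lloyd's k-means seeded by farthest-first traversal,
    so the result is deterministic for a given input order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Farthest-first seeding: start at the first point, then repeatedly add
    # the point whose distance to its nearest chosen centroid is largest.
    centroids: List[Point] = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        )
    for _ in range(max_iters):
        clusters = _assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [_mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: the assignment no longer changes
        centroids = updated
    # Recount against the final centroids so counts always match them.
    counts = [len(c) for c in _assign(points, centroids)]
    return centroids, counts


if __name__ == "__main__":
    import random

    rng = random.Random(1)
    # Synthetic 5-dimensional data: 300 points near the origin, 200 near (20, ..., 20).
    data = [tuple(rng.gauss(0.0, 1.0) for _ in range(5)) for _ in range(300)]
    data += [tuple(rng.gauss(20.0, 1.0) for _ in range(5)) for _ in range(200)]
    centroids, counts = summarize_kmeans(data, k=2)
    for centroid, count in zip(centroids, counts):
        print(count, [round(x, 2) for x in centroid])
```

Here 500 five-dimensional records collapse to two (centroid, count) pairs, which a visualization might render as two glyphs sized by count. Farthest-first seeding was chosen only to keep the example deterministic; randomized seeding such as k-means++ is more common in practice, and other surveyed techniques (for example, association rules or outlier detection) would summarize different characteristics of the data.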
