A Visual and Interactive Data Exploration Method for Large Data Sets and Clustering

We present in this paper a new method for the visual exploration of large data sets with up to one million objects. We highlight some limitations of existing visual methods in this context. Our approach builds on previous systems such as Vibe, Sqwid and Radviz, which have been used in information retrieval. Several data items, called points of interest (POIs), are placed on a circle. The remaining data, which make up the bulk of the set, are displayed inside the circle at locations that depend on their similarity to the POIs. Several user interactions are supported to ease the exploration of the data. We highlight the visual and computational properties of this representation: it displays the similarities between data in linear time, and it allows the user to explore the data set and obtain useful information. We show how it can be applied to standard 'small' databases, whether benchmarks or real-world data. We then provide results on several large data sets, real or artificial, with up to one million items. Finally, we describe both the successes and the limits of our method.
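The placement step can be illustrated with a short sketch in the spirit of Vibe and Radviz. It is a minimal sketch, not the paper's own procedure. It makes four assumptions. First, it uses a hypothetical similarity of 1/(1 + d), where d is the Euclidean distance between an item and a POI. Second, the POIs are spaced evenly on a unit circle. Third, each item is placed at the similarity-weighted mean of the POI positions. Fourth, the function names `poi_positions`, `similarity` and `place_items` are illustrative only. Each item is compared once with each of the k POIs, so the cost is O(n k). That cost is linear in the number n of items when k is fixed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def poi_positions(k: int, radius: float = 1.0) -> List[Point]:
    """Place k points of interest evenly on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * j / k),
         radius * math.sin(2 * math.pi * j / k))
        for j in range(k)
    ]


def similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical similarity measure: 1 / (1 + Euclidean distance), in (0, 1]."""
    return 1.0 / (1.0 + math.dist(x, y))


def place_items(items: Sequence[Sequence[float]],
                pois: Sequence[Sequence[float]]) -> List[Point]:
    """Map each item to the similarity-weighted mean of the POI anchor positions.

    Runs in O(n * k) for n items and k POIs, i.e. linear in n for fixed k.
    """
    anchors = poi_positions(len(pois))
    placed = []
    for x in items:
        weights = [similarity(x, p) for p in pois]
        total = sum(weights)  # strictly positive because every weight is > 0
        px = sum(w * ax for w, (ax, _) in zip(weights, anchors)) / total
        py = sum(w * ay for w, (_, ay) in zip(weights, anchors)) / total
        placed.append((px, py))
    return placed


if __name__ == "__main__":
    # Three POIs taken from the data, and two items to place inside the circle.
    pois = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    items = [(0.0, 0.0), (5.0, 5.0)]
    for item, pos in zip(items, place_items(items, pois)):
        print(item, "->", pos)
```

Each position is a convex combination of points on the circle, so items always land inside it. An item equidistant from all POIs lands at the center. An item identical to one POI is pulled toward that POI's anchor.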

[1] Alfred Inselberg et al. The plane with parallel coordinates. The Visual Computer, 1985.

[2] Sudipto Guha et al. CURE: an efficient clustering algorithm for large databases. SIGMOD '98, 1998.

[3] Catherine Blake et al. UCI Repository of machine learning databases. 1998.

[4] Jennifer Widom et al. Proceedings of the 1996 ACM SIGMOD international conference on Management of data. PODS 1996, 1996.

[5] Robert R. Korfhage et al. To see, or not to see— is that the query? SIGIR '91, 1991.

[6] Tian Zhang et al. BIRCH: an efficient data clustering method for very large databases. SIGMOD '96, 1996.

[7] Hans-Peter Kriegel et al. VisDB: database exploration using multidimensional visualization. IEEE Computer Graphics and Applications, 1994.

[8] Hans Hagen et al. Scientific Visualization: Overviews, Methodologies, and Techniques. 1997.

[9] Matthew O. Ward et al. Hierarchical parallel coordinates for exploration of large datasets. Proceedings Visualization '99, 1999.

[10] Yike Guo et al. New paradigms in information visualization. SIGIR 2000, 2000.

[11] Allan R. Wilks et al. Dynamic Graphics for Data Analysis. 1987.

[12] Gilles Venturini et al. An Interactive Visualization Environment for Data Exploration Using Points of Interest. ADMA, 2006.

[13] Herman Chernoff et al. The Use of Faces to Represent Points in k-Dimensional Space Graphically. 1973.

[14] Yike Guo et al. New paradigms in information visualization (poster session). SIGIR '00, 2000.

[15] Jean-Daniel Fekete et al. Interactive information visualization of a million items. IEEE Symposium on Information Visualization (INFOVIS 2002), 2002.

[16] Matthias Hemmje et al. LyberWorld—a visualization user interface supporting fulltext retrieval. SIGIR '94, 1994.

[17] Richard A. Becker et al. Brushing scatterplots. 1987.

[18] Chun Zhang et al. Storing and querying ordered XML using a relational database system. SIGMOD '02, 2002.

[19] Georges G. Grinstein et al. Dimensional anchors: a graphic primitive for multidimensional multivariate information visualizations. NPIVM '99, 1999.

[20] Pak Chung Wong et al. 30 Years of Multidimensional Multivariate Visualization. Scientific Visualization, 1994.