DataJewel: Integrating Visualization with Temporal Data Mining

In this chapter we describe DataJewel, a new temporal data mining architecture. DataJewel tightly integrates a visualization component, an algorithmic component, and a database component. We introduce a new visualization technique, CalendarView, as an implementation of the visualization component, along with a data structure that supports temporal mining of large databases. In our architecture, algorithms are tightly integrated with the visualization component, and most existing temporal data mining algorithms can be leveraged by embedding them into DataJewel. This integration is achieved through a single interface that both the user and the algorithms use to assign colors to events. The user assigns colors interactively to incorporate domain knowledge or to formulate hypotheses; the algorithm assigns colors based on the patterns it discovers. Because the same visualization technique displays both data and patterns, it is more intuitive for the user to identify useful patterns, whether exploring the data interactively or using algorithms to search for patterns. Our experiments in analyzing several large datasets from the airplane maintenance domain demonstrate the usefulness of our approach, and we discuss its applicability to domains such as homeland security, market basket analysis, and web mining.
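The shared color-assignment interface can be illustrated with a minimal sketch. This is not DataJewel's actual API: the names `ColorMap`, `color_frequent_events`, and `calendar_cells`, the frequency threshold used as a stand-in for a mining algorithm, and the sample maintenance events are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


class ColorMap:
    """Hypothetical shared interface: maps event types to display colors."""

    def __init__(self, default="gray"):
        self.default = default
        self._colors = {}

    def assign(self, event_type, color):
        # Called by the user (domain knowledge) or by an algorithm (patterns).
        self._colors[event_type] = color

    def color_of(self, event_type):
        return self._colors.get(event_type, self.default)


def color_frequent_events(events, color_map, min_count, color="red"):
    """Stand-in 'algorithm': color every event type seen at least min_count times."""
    counts = Counter(event_type for _, event_type in events)
    for event_type, n in counts.items():
        if n >= min_count:
            color_map.assign(event_type, color)


def calendar_cells(events, color_map):
    """Per-day color histogram that a calendar-style view could render."""
    cells = defaultdict(Counter)
    for day, event_type in events:
        cells[day][color_map.color_of(event_type)] += 1
    return cells


# Invented sample data: (date, event type) pairs.
events = [
    (date(2003, 1, 6), "hydraulic_leak"),
    (date(2003, 1, 6), "tire_change"),
    (date(2003, 1, 7), "hydraulic_leak"),
    (date(2003, 1, 7), "hydraulic_leak"),
    (date(2003, 1, 8), "oil_check"),
]

color_map = ColorMap()
color_map.assign("tire_change", "blue")                # user-assigned color
color_frequent_events(events, color_map, min_count=3)  # algorithm-assigned color
cells = calendar_cells(events, color_map)
```

Since the user and the algorithm both write to the same `ColorMap`, the rendering step does not need to know where a color came from, which is the property the architecture relies on to show data and patterns in one view.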
