Visualization Support for User-Centered Model Selection in Knowledge Discovery and Data Mining

Model selection in knowledge discovery and data mining (choosing appropriate discovered patterns or models, or the algorithms that produce them) is generally difficult for the user, because it requires meta-knowledge about algorithms, models, and model performance metrics. Viewing knowledge discovery as a human-centered process that requires effective collaboration between the user and the discovery system, our work aims to make model selection easier and more effective. To support this collaboration, our solution gives the user the ability to easily try various alternatives and to compare competing models both quantitatively and qualitatively. The basic idea is to integrate data and knowledge visualization with the knowledge discovery process in order to support the participation of the user. We introduce the knowledge discovery system D2MS, in which several data and knowledge visualization techniques are developed and integrated into the steps of the knowledge discovery process. The visualizers in D2MS help the user gain better insight into each step of the knowledge discovery process, as well as into the relationship between data and discovered knowledge across the whole process.
