Visualization support for user-centered model selection in knowledge discovery in databases

The process of knowledge discovery in databases consists of several steps that are necessarily iterative and interactive. In each application, the user going through this process has to try different algorithms and settings, which usually yield different discovered models. Selecting appropriate models, or the algorithms that produce them, is referred to as model selection. It requires meta-knowledge about algorithms, models, and model performance metrics, and is generally a difficult task for the user. Given this difficulty, we consider ease of model selection crucial to the success of real-life knowledge discovery activities. Unlike most related work, which aims at automatic model selection, we view model selection as a semiautomatic task requiring effective collaboration between the user and the discovery system. To support such collaboration, our solution gives the user the ability to try various alternatives easily and to compare competing models quantitatively by performance metrics and qualitatively by effective visualization. This paper presents our research on such model selection and visualization in the development of a knowledge discovery system called D2MS.
