of the KDD Process with Emphasis on Exploratory Data Analysis

Abstract: Knowledge Discovery in Databases (KDD) is a process that consists of several steps, beginning with the collection of data for the problem under analysis and ending with the interpretation and evaluation of the final results. This paper discusses how exploratory data analysis influences the performance of Data Mining techniques in classifying new patterns, based on its application to a medical problem, and compares these techniques in order to identify the one with the highest percentage of correct classifications. The results lead to the conclusion that, provided this analysis is done properly, it can significantly improve the performance of these techniques and serve as an important tool for optimizing the end results. For the problem under study, the techniques based on a Linear Programming model and on Neural Networks showed the lowest error percentages on the test sets, indicating good generalization capacity.
