Towards a Methodology for Data Mining Project Development: The Importance of Abstraction

Standards such as CRISP-DM, SEMMA, and PMML are making data mining processes easier. Nevertheless, to date, projects are still developed more as an art than as a science. Because there is no standard methodology, results are difficult to understand, evaluate, and compare. In this chapter, we propose such a methodology, based on RUP and CRISP-DM, and concentrate on the project conception phase, in which a feasible project plan is determined.
