Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)

Data Mining is a research line that began in 1980 in order to find the knowledge hidden in the data that organizations store on a daily basis. This knowledge supports decision-making processes in organizations. As a consequence, companies of every kind have been developing Data Mining projects since the term appeared. However, there is no established way to estimate the cost of this kind of project. Although the literature contains many references to Data Mining algorithms, few authors have dealt with the problem from a Software Engineering point of view. CRISP-DM, which appeared in 2000, is a process model defined from a Software Engineering point of view and is the first standard for developing Data Mining projects. Software development process standards, e.g. ISO 12207 and IEEE 1074, propose processes and tasks similar to those in the CRISP-DM model. Unlike Data Mining, software development has many described methods for estimating project development costs (SLIM, SEER-SEM, PRICE-S and COCOMO). These methods are not appropriate for Data Mining projects, because software development is not the primary goal of a Data Mining project. Some methods have been proposed to estimate individual phases of a Data Mining project, but there is no method to estimate the global cost of a generic Data Mining project. As a consequence, in this paper we propose the cost drivers of a parametric estimation method for Data Mining projects.
