Toward an integrated knowledge discovery and data mining process model

Abstract Knowledge discovery and data mining (KDDM) process models describe the various phases (e.g. business understanding, data understanding, data preparation, modeling, evaluation and deployment) of the KDDM process. They act as a roadmap for implementing the KDDM process by presenting a list of tasks for executing each phase. This checklist approach to describing tasks is not adequately supported by tools that specify ‘how’ a particular task can be implemented, which may result in tasks not being implemented. Another disadvantage is that the long checklist neither captures nor leverages the dependencies that exist among tasks within the same phase and across different phases. This not only makes the process cumbersome to implement but also hinders the semi-automation of certain tasks. Given that each task in the process model serves an important goal and, through these dependencies, affects the execution of related tasks, these limitations are likely to reduce the efficiency and effectiveness of KDDM projects. This paper proposes an improved KDDM process model that overcomes these shortcomings by prescribing tools to support each task and by identifying and leveraging dependencies among tasks to semi-automate them wherever possible.
