Model-driven data mining engineering: from solution-driven implementations to 'composable' conceptual data mining models

Data mining lacks a general modelling architecture that would allow analysts to treat it as a true software-engineering process, which would benefit a wide spectrum of modern application scenarios. With this in mind, we propose in this paper an innovative model-driven engineering approach to data mining whose main goal is to overcome well-recognised limitations of current approaches. The cornerstone of our proposal is the definition of a set of suitable model transformations that automatically generate both the data under analysis, which are deployed via well-consolidated data warehousing technology, and the analysis models for the target data mining tasks, which are tailored to a specific data-mining/analysis platform. These modelling tasks are thus entrusted to the model-transformation scaffolds and rest on a well-defined reference architecture. Finally, the feasibility of our approach is demonstrated and validated by means of a comprehensive set of case studies.
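To make the idea of paired model transformations more concrete, the following is a minimal sketch rather than the transformations defined in the paper. It assumes a toy platform-independent mining model and derives from it (a) a warehouse view holding the data under analysis and (b) a platform-specific analysis specification. The class `ConceptualMiningModel`, the functions `to_warehouse_view` and `to_platform_spec`, and the output formats are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ConceptualMiningModel:
    """Platform-independent description of a mining task (hypothetical)."""
    task: str                      # e.g. "classification"
    fact: str                      # warehouse fact table holding the cases
    inputs: List[str]              # attributes used as inputs
    target: Optional[str] = None   # attribute to predict, if the task has one


def to_warehouse_view(model: ConceptualMiningModel) -> str:
    """Transformation 1 (assumed): emit the data under analysis as a SQL view."""
    columns = list(model.inputs)
    if model.target is not None:
        columns.append(model.target)
    return (
        f"CREATE VIEW {model.fact}_cases AS "
        f"SELECT {', '.join(columns)} FROM {model.fact};"
    )


def to_platform_spec(model: ConceptualMiningModel, platform: str) -> dict:
    """Transformation 2 (assumed): emit an analysis model for one platform."""
    return {
        "platform": platform,
        "task": model.task,
        "source": f"{model.fact}_cases",  # reads from the generated view
        "inputs": list(model.inputs),
        "target": model.target,
    }


if __name__ == "__main__":
    model = ConceptualMiningModel(
        task="classification",
        fact="sales",
        inputs=["region", "amount"],
        target="churned",
    )
    print(to_warehouse_view(model))
    print(to_platform_spec(model, platform="demo-platform"))
```

Both outputs are derived from the same conceptual model, so the analyst edits one description and regenerates the deployment artefacts instead of maintaining them by hand.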
