A UML profile for the conceptual modelling of data-mining with time-series in data warehouses

Time-series analysis is a powerful technique for discovering patterns and trends in temporal data. However, the lack of a conceptual model for this data-mining technique forces analysts to deal with unstructured data. These data are represented at a low level of abstraction, and their management is expensive. Most analysts face two main problems: (i) the cleansing of the huge amount of potentially analysable data and (ii) the correct definition of the data-mining algorithms to be employed. Because analysts' interests are also hidden in this scenario, it is difficult not only to prepare the data but also to discover which data are the most promising. Since their appearance, data warehouses have proved to be a powerful repository of historical data for data-mining purposes. Moreover, their foundational modelling paradigm, multidimensional modelling, is very similar to the problem domain. In this article, we propose a unified modelling language (UML) extension through UML profiles for data-mining. Specifically, the UML profile presented allows us to specify time-series analysis on top of the multidimensional models of data warehouses. Our extension provides analysts with an intuitive notation for time-series analysis that is independent of any specific data-mining tool or algorithm. To show its feasibility and ease of use, we apply it to the analysis of fish captures in Alicante. We believe that a coherent conceptual modelling framework for data-mining ensures a better and easier knowledge-discovery process on top of data warehouses.
