ETL Process Modeling Conceptual for Data Warehouses: A Systematic Mapping Study

BACKGROUND: A data warehouse (DW) is an integrated collection of subject-oriented data in support of decision making. The integration of data sources is achieved through ETL (Extract, Transform, and Load) processes, so it is widely recognized that the appropriate design of ETL processes is a key factor in the success of DW projects.

OBJECTIVE: We assess existing research proposals on ETL process modeling for data warehouses in order to identify their main characteristics, notations, and activities. We also study whether these modeling approaches are supported by a prototype or tool.

METHOD: We have undertaken a systematic mapping study of the research literature on modeling ETL processes. A mapping study provides a systematic and objective procedure for identifying the nature and extent of the available research by means of research questions.

RESULTS: The study is based on a comprehensive set of papers, obtained by applying multi-stage selection criteria, that were published in international workshops, conferences, and journals between 2000 and 2009.

CONCLUSIONS: This systematic mapping study finds that there is a clear classification of ETL process modeling approaches, but that the topic has not been sufficiently covered by researchers. Therefore, more effort is required to bridge the research gap in modeling ETL processes.
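
The background above treats ETL as the mechanism that integrates source data into the warehouse. As a rough illustration of the three phases only, and not of any of the modeling approaches surveyed, the following is a minimal Python sketch. It assumes a hypothetical CSV source with `order_id`, `store`, `amount`, and `date` columns and a hypothetical `sales_fact` table in an in-memory SQLite database; the cleaning rules (dropping rows without an amount, normalizing store names, deriving a month attribute) are illustrative assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical operational source: a CSV export of sales records.
SOURCE_CSV = """order_id,store,amount,date
1, Madrid ,10.50,2009-01-15
2,Alicante,,2009-01-16
3,madrid,4.25,2009-02-01
"""

def extract(text):
    """Extract: read raw records from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean values, convert types, and derive warehouse attributes."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: drop records with a missing measure
        cleaned.append({
            "order_id": int(row["order_id"]),
            "store": row["store"].strip().title(),  # normalize store names
            "amount": float(row["amount"]),
            "month": row["date"][:7],  # derive a month-level time attribute (YYYY-MM)
        })
    return cleaned

def load(conn, rows):
    """Load: insert the transformed rows into a (hypothetical) fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, store TEXT, amount REAL, month TEXT)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (:order_id, :store, :amount, :month)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# Query the loaded table the way a decision-support report might.
for store, total in conn.execute(
    "SELECT store, SUM(amount) FROM sales_fact GROUP BY store ORDER BY store"
):
    print(store, total)
```

Keeping each phase in its own function makes the Extract, Transform, and Load steps explicit; a production workflow would also need error handling, incremental loading, and surrogate key management, which this sketch omits.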