Efficiency evaluation of open source ETL tools

Business intelligence (BI) is considered to have a high impact on businesses, and research activity in this area has increased in recent years. An important part of any BI system is a well-performing implementation of the Extract, Transform, and Load (ETL) process; in typical BI projects, implementing the ETL process can be the task requiring the greatest effort. However, little work has been published on ETL applications, and in particular on open source ETL tools. We have analyzed open source ETL tools, especially with regard to their performance. In this paper we present the background of the analysis and highlight related work. We then sketch the test setup, show the detailed results for Talend Open Studio and Pentaho Data Integration, and discuss our observations. Finally, we draw conclusions and point out future work.
