The challenges of Extract, Transform and Loading (ETL) system implementation for near real-time environment

For organizations with considerable investment in data warehousing, the influx of varied data types and forms requires data preparation methods and a staging platform that can deliver fast, efficient and volatile data to target audiences or users with different business needs. Extract, Transform and Load (ETL) systems have proved to be a standard choice for managing and sustaining the movement and transactional processing of valued big data assets. However, traditional ETL systems can no longer effectively handle streaming or near real-time data, nor a demanding environment that requires high availability, low latency and horizontal scalability. This paper identifies the challenges of implementing ETL systems for streaming or near real-time data, which must evolve and be streamlined to meet these different requirements. Current efforts and solution approaches that address the challenges are presented. The ETL system challenges are classified by near real-time environment features and by ETL stage to encourage different perspectives for future research.
