Data Lake Architecture: A New Repository for Data Engineer

After people, data is the biggest asset a business has, and it has become a new driver of the world economy. The volume of data that enterprises gather every day is growing rapidly. This rapid growth of data in volume, variety, and velocity is known as Big Data. Big Data is a challenge for enterprises, and the biggest part of that challenge is how to store it. In the past, and in some organizations still today, data warehouses have been used to store Big Data. Enterprise data warehouses work on the schema-on-write concept, in which data must conform to a predefined schema before it is loaded. Big Data analytics, however, needs storage that works on the schema-on-read concept, in which raw data is stored as-is and a schema is applied only when the data is read. To meet this demand, researchers are working on a new data repository system for Big Data storage known as a data lake. A data lake is defined as a landing area for raw data from many sources. There is still some confusion about data lakes, and several questions about them remain to be answered. The objective of this article is to reduce that confusion and address some of these questions with the help of a data lake architecture.
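The following minimal in-memory sketch in Python illustrates the difference between schema-on-write and schema-on-read. The class names, the `SalesRow` schema, and the sample records are hypothetical illustrations, not part of the architecture described in this article. A real data lake would land raw files in distributed or object storage rather than in a Python list.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class SalesRow:
    """Hypothetical fixed schema used by the warehouse-style store."""
    order_id: int
    amount: float


class SchemaOnWriteStore:
    """Warehouse-style store: every record is validated and typed before it is stored."""

    def __init__(self) -> None:
        self.rows: List[SalesRow] = []

    def write(self, record: dict) -> None:
        # Raises KeyError/ValueError at load time if the record does not fit the schema.
        self.rows.append(SalesRow(order_id=int(record["order_id"]),
                                  amount=float(record["amount"])))


class SchemaOnReadLake:
    """Lake-style landing area: raw payloads from any source are kept unchanged."""

    def __init__(self) -> None:
        self.raw: List[Tuple[str, str]] = []  # (source name, raw payload) pairs

    def write(self, source: str, payload: str) -> None:
        # No parsing or validation here: the payload is stored exactly as received.
        self.raw.append((source, payload))

    def read(self, parse: Callable[[str], Any]) -> List[Any]:
        # A schema (the `parse` function) is applied only when the data is queried.
        results = []
        for _source, payload in self.raw:
            try:
                results.append(parse(payload))
            except (ValueError, KeyError):
                # Records that do not fit this view are skipped, but they stay in the lake.
                continue
        return results


def as_sale(payload: str) -> SalesRow:
    """Hypothetical read-time schema: interpret a payload as a sales record."""
    rec = json.loads(payload)  # json.JSONDecodeError is a subclass of ValueError
    return SalesRow(order_id=int(rec["order_id"]), amount=float(rec["amount"]))


def as_reading(payload: str) -> Tuple[str, float]:
    """Hypothetical read-time schema: interpret a payload as a sensor reading."""
    rec = json.loads(payload)
    return (rec["device"], float(rec["temp_c"]))


# Raw data from three different sources lands in the lake without any checks.
lake = SchemaOnReadLake()
lake.write("web", '{"order_id": "1", "amount": "9.99"}')
lake.write("iot", '{"device": "t-7", "temp_c": 21.5}')
lake.write("crm", "not json at all")

sales = lake.read(as_sale)        # sales view of the same raw data
readings = lake.read(as_reading)  # sensor view of the same raw data

# The warehouse-style store refuses the sensor record at load time.
warehouse = SchemaOnWriteStore()
warehouse.write({"order_id": "2", "amount": "5"})
rejected = False
try:
    warehouse.write({"device": "t-7", "temp_c": 21.5})
except KeyError:
    rejected = True
```

The point of the sketch is the trade-off: the lake never refuses data, so the same raw records can later be read through different schemas (here a sales view and a sensor view), while the warehouse-style store discards anything that does not match its single schema when the data is written.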
