Implementation of a Big Data Architecture For The Realization of Predictive Models With Great Volumes of Data

The Research Directorate of the University of Sciences and Humanities has integrated a Big Data architecture for building predictive models on large volumes of data. It was implemented so that future research projects can use this architecture efficiently. This study discusses the theoretical concepts of Hadoop version 2.0, the subsequent scaling of Hadoop onto a Beowulf cluster deployed in one of the University's laboratories, and the configuration of Hadoop and Spark so that the two work together. Finally, the results section presents tests that validate that the architecture works as intended.
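Hadoop 2.0 separates distributed storage (HDFS) from cluster resource management (YARN), and batch jobs on such a cluster commonly follow the MapReduce model: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step aggregates each group. The sketch below simulates those three phases in a single Python process with a word count. It is an illustration of the programming model only, not the authors' code and not the Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every lower-cased, whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every emitted value under its key, as the framework
    # does between the map and reduce steps.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine the partial counts collected for one key.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Run map over every input record, then shuffle, then reduce per key.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on hadoop", "spark on hadoop"]
    print(word_count(sample))
```

On a real cluster each input split would be mapped on a different node and the shuffle would move data across the network; keeping everything in one process here only makes the data flow between the phases easy to follow.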
