A framework for scheduling and managing big data applications in a distributed infrastructure

Big data has drawn attention from researchers, industry, education, and the scientific community. Big data analytics must deal with large-scale data that is both structured and unstructured. Such data must be extracted, processed, and analyzed to obtain meaningful information within a limited time. Producing insightful information therefore requires high-performance computing, storage, and network resources. Hence, it is essential to design a high-performance computing infrastructure with sufficient bandwidth that can handle big data processing efficiently. However, the current network architectures in these infrastructures rely on predefined network policies and do not allow the just-in-time reconfiguration of the network that big data analytics demands. To address these limitations, Software-Defined Networking (SDN) offers the means to configure network parameters dynamically, provision networks dynamically, and slice the network on demand. This research aims to characterize SDN with respect to the demands of big data analytics on Cluster, Grid, and Cloud Computing resources. Its main motivation is to design and develop an intelligent framework, named the Big Data Analytics Management System (BDAMS), for collectively managing the compute, storage, and network resources of Cluster, Grid, and Cloud infrastructures for big data analytics.
