New scheduling approach using reinforcement learning for heterogeneous distributed systems

Abstract

Computer clusters, cloud computing and the exploitation of parallel architectures and algorithms have become the norm for scientific applications that process large quantities of data and perform complex, time-consuming calculations. With the rise of social media applications and smart devices, the amount of digital data and the velocity at which it is produced have increased exponentially, driving the development of distributed system frameworks and platforms that improve the productivity, consistency, fault tolerance and security of parallel applications. The performance of such systems depends mainly on the architectural layout and composition of the physical machines, on resource allocation, and on the scheduling of jobs and tasks. This paper proposes a reinforcement learning algorithm to solve the scheduling problem in distributed systems. The technique takes into account the heterogeneity of the nodes, their placement within the grid, and the arrangement of tasks in a directed acyclic graph (DAG) of dependencies, and from these derives a scheduling policy that reduces execution time. The paper also proposes a platform that implements the algorithm and offers scheduling as a service to distributed systems.
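As an illustration only, and not the authors' algorithm (the abstract does not specify their state, action or reward design), the sketch below shows one way a reinforcement learning agent can learn to map the tasks of a dependency DAG onto heterogeneous nodes. It uses tabular Q-learning. The DAG `PREDS`, the cost matrix `COST`, the transfer delay `COMM` and the functions `simulate`, `train` and `greedy_policy` are all hypothetical. A state is the tuple of nodes already chosen for the tasks taken in topological order, an action picks the node for the next task, and the reward is the negative increase in makespan, so the return of an episode equals minus the final makespan.

```python
import functools
import itertools
import random

# Hypothetical DAG: PREDS[t] lists the predecessors of task t.
# Tasks are numbered in a valid topological order.
PREDS = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: [3]}

# Hypothetical heterogeneous cost matrix: COST[t][n] is the (integer)
# execution time of task t on node n.
COST = [
    [3, 5, 8],
    [4, 2, 6],
    [6, 7, 3],
    [5, 4, 4],
    [2, 3, 1],
]
COMM = 2  # hypothetical transfer delay when a predecessor ran on another node
N_TASKS, N_NODES = len(COST), len(COST[0])


@functools.lru_cache(maxsize=None)
def simulate(assignment):
    """Makespan of a (possibly partial) tuple assigning tasks 0..k-1 to nodes."""
    node_free = [0] * N_NODES  # time at which each node becomes idle
    finish = {}
    for task, node in enumerate(assignment):
        # A task is ready once all predecessors finished (plus transfer if remote).
        ready = max((finish[p] + (COMM if assignment[p] != node else 0)
                     for p in PREDS[task]), default=0)
        start = max(ready, node_free[node])
        finish[task] = start + COST[task][node]
        node_free[node] = finish[task]
    return max(finish.values(), default=0)


def train(episodes=20000, seed=0):
    """Tabular Q-learning; state = tuple of nodes chosen so far, action = next node."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> estimated return; unseen pairs default to 0 (optimistic)
    for _ in range(episodes):
        state = ()
        while len(state) < N_TASKS:
            # Uniform exploration is acceptable because Q-learning is off-policy.
            action = rng.randrange(N_NODES)
            nxt = state + (action,)
            reward = simulate(state) - simulate(nxt)  # minus the makespan increase
            if len(nxt) == N_TASKS:
                future = 0
            else:
                future = max(q.get((nxt, a), 0) for a in range(N_NODES))
            # Learning rate 1 and discount 1 suit this deterministic, episodic task.
            q[(state, action)] = reward + future
            state = nxt
    return q


def greedy_policy(q):
    """Follow the learned Q-values from the empty assignment to a full schedule."""
    state = ()
    while len(state) < N_TASKS:
        best = max(range(N_NODES), key=lambda a: q.get((state, a), 0))
        state = state + (best,)
    return state


if __name__ == "__main__":
    plan = greedy_policy(train())
    optimum = min(simulate(p) for p in
                  itertools.product(range(N_NODES), repeat=N_TASKS))
    print("learned assignment:", plan, "makespan:", simulate(plan))
    print("brute-force optimum:", optimum)
```

Because this toy problem has only 3^5 complete assignments, the learned plan can be checked against brute force. On realistic workloads the state space grows combinatorially, so a practical scheduler would typically replace the lookup table with compact state features or function approximation.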
