Deploying a Task-based Runtime System on Raspberry Pi Clusters

Arm® technology is becoming increasingly important in HPC. Recently, Fugaku, an Arm®-based system, took first place on the Top500 list. Raspberry Pis provide an inexpensive platform for becoming familiar with this architecture, but they can also be useful in their own right. Here we describe our efforts to configure and benchmark a Raspberry Pi cluster running the HPX/Phylanx platform (normally intended for HPC applications) and document the lessons we learned. First, we highlight the configuration changes the Pi requires to gain performance. Second, we explore how limited memory bandwidth restricts the effective use of all cores in our shared-memory benchmarks. Third, we evaluate whether low network bandwidth affects distributed performance. Fourth, we discuss power consumption and the resulting trade-off between cost of operation and performance.
