In this report, I describe the design and implementation of an inexpensive eight-node, 32-core cluster of Raspberry Pi single-board computers. I also evaluate its performance on two computational tasks: one that requires significant data transfer relative to its computation time, and one that does not. I have two use cases for the cluster: (a) as an educational tool in the classroom, for example when covering parallel algorithms in an algorithms course; and (b) as a test system during the development of parallel metaheuristics, essentially serving as a personal desktop parallel computing cluster. Preliminary results show that the Raspberry Pi's slow 100 Mbps networking effectively restricts such clusters to parallel tasks that either run long relative to their data communication requirements or need very little inter-node communication. Additionally, although the Raspberry Pi 3 has a quad-core processor, parallel speedup degrades when a computation tries to use all four cores on every node, likely due to resource contention with operating system processes. However, distributing a task across three cores of each node does yield linear (or near-linear) speedup.
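The central observation above is that a 100 Mbps interconnect limits which workloads scale. It can be illustrated with the standard definitions of speedup, S(p) = T(1)/T(p), and efficiency, E(p) = S(p)/p. The Python sketch below pairs those definitions with a deliberately simple, hypothetical cost model. In that model, parallel time equals the serial compute time divided by p, plus the time needed to push all required data serially through one 100 Mbps link. The function `modeled_time`, its parameters, and the two example workloads are illustrative assumptions. They are not the report's benchmarks or measurements.

```python
"""Toy speedup model for a cluster with a slow interconnect.

HYPOTHETICAL: the cost model and example workloads below are illustrative
assumptions, not measurements from the Raspberry Pi cluster in this report.
"""

LINK_BPS = 100e6  # assumed link bandwidth: 100 Mbps


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p; values near 1.0 indicate near-linear speedup."""
    return speedup(t_serial, t_parallel) / p


def modeled_time(compute_s: float, p: int, total_bytes: float,
                 link_bps: float = LINK_BPS) -> float:
    """Hypothetical parallel run time on p processes.

    Assumes the computation divides perfectly into p equal parts, and that
    all total_bytes must cross one shared link serially (e.g., scattered
    from and gathered to a master node). A single process sends nothing.
    """
    comm_s = 0.0 if p == 1 else total_bytes * 8 / link_bps
    return compute_s / p + comm_s


if __name__ == "__main__":
    p = 24  # e.g., 8 nodes x 3 cores each
    # (name, serial compute seconds, bytes sent over the network): assumed workloads
    workloads = [
        ("compute-heavy", 600.0, 24e6),
        ("communication-heavy", 10.0, 1.2e9),
    ]
    for name, compute_s, total_bytes in workloads:
        t1 = modeled_time(compute_s, 1, total_bytes)
        tp = modeled_time(compute_s, p, total_bytes)
        print(f"{name}: S={speedup(t1, tp):.2f}, E={efficiency(t1, tp, p):.2f}")
```

With these assumed numbers, the compute-heavy workload reaches an efficiency of about 0.93 on 24 processes. The communication-heavy workload runs nearly ten times slower than a single process (S of about 0.10). This matches the qualitative pattern described above. The model does not capture the operating system contention observed when all four cores per node are used.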