Comparing parallel performance of Go and C++ TBB on a directed acyclic task graph using a dynamic programming problem

The concurrent programming language Go and the C++ library Threading Building Blocks (TBB) both offer high-level parallel programming mechanisms built on top of threads. Go goroutines and TBB tasks are the units of computation that are mapped to physical threads on multi-core processors. Synchronization is provided by channels in Go and by the task scheduler in TBB. We used these mechanisms to implement a parallel version of the optimal binary search tree dynamic programming algorithm in both Go and TBB. Both implementations tile the iteration space, then construct and evaluate a directed acyclic task graph that exposes the available parallelism without over-constraining the execution order. We compared the speedup and absolute performance of Go and TBB to establish a benchmark of how efficiently each evaluates a directed acyclic task graph. Our experimental results show that the overhead of task scheduling and synchronization is much smaller in TBB than in Go, and that TBB is 1.6 to 3.6 times faster than Go overall. TBB achieved superlinear speedup under certain conditions, which we attribute to the majority of the test data being cached and to the negative cost of task scheduling and synchronization. We conclude that task scheduling and synchronization are faster in TBB than in Go, and that the peak speedup of TBB is greater than that of Go.
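
To make the tiling and task-graph idea concrete, here is a minimal Go sketch. It is not the implementation measured in this work. It assumes the standard optimal binary search tree recurrence over successful-search frequencies only, and it assumes that each tile waits on just two predecessors, its left and lower neighbors, with the remaining dependencies following transitively. The names optimalBSTSeq, optimalBSTTiled, newTables, and fillCell are hypothetical and chosen for illustration. Each tile runs as one goroutine and signals completion by closing a channel.

```go
package main

import (
	"fmt"
	"sync"
)

// newTables allocates the cost table and the prefix sums of the frequencies.
// c[i][j] is the minimum weighted search cost for keys i..j-1, so c[i][i] = 0.
func newTables(freq []int) ([][]int, []int) {
	n := len(freq)
	c := make([][]int, n+1)
	for i := range c {
		c[i] = make([]int, n+1)
	}
	prefix := make([]int, n+1)
	for i, f := range freq {
		prefix[i+1] = prefix[i] + f
	}
	return c, prefix
}

// fillCell evaluates c[i][j] = w(i,j) + min over roots r in [i,j) of
// c[i][r] + c[r+1][j]. It reads row i to the left and column j below.
func fillCell(c [][]int, prefix []int, i, j int) {
	best := c[i][i] + c[i+1][j] // root r = i
	for r := i + 1; r < j; r++ {
		if v := c[i][r] + c[r+1][j]; v < best {
			best = v
		}
	}
	c[i][j] = best + prefix[j] - prefix[i]
}

// optimalBSTSeq is a sequential reference used to check the tiled version.
func optimalBSTSeq(freq []int) int {
	n := len(freq)
	c, prefix := newTables(freq)
	for i := n - 1; i >= 0; i-- {
		for j := i + 1; j <= n; j++ {
			fillCell(c, prefix, i, j)
		}
	}
	return c[0][n]
}

// optimalBSTTiled runs one goroutine per tile of the upper-triangular table.
// Tile (ti,tj) waits for its left neighbor (ti,tj-1) and its lower neighbor
// (ti+1,tj); diagonal tiles have no predecessors and start immediately.
func optimalBSTTiled(freq []int, tile int) int {
	if tile < 1 {
		tile = 1
	}
	n := len(freq)
	c, prefix := newTables(freq)
	nt := n/tile + 1 // number of tiles per dimension for indices 0..n
	done := make([][]chan struct{}, nt)
	for ti := range done {
		done[ti] = make([]chan struct{}, nt)
		for tj := ti; tj < nt; tj++ {
			done[ti][tj] = make(chan struct{})
		}
	}
	var wg sync.WaitGroup
	for ti := 0; ti < nt; ti++ {
		for tj := ti; tj < nt; tj++ {
			wg.Add(1)
			go func(ti, tj int) {
				defer wg.Done()
				if ti < tj {
					<-done[ti][tj-1] // left neighbor finished
					<-done[ti+1][tj] // lower neighbor finished
				}
				iLo, iHi := ti*tile, ti*tile+tile-1
				jLo, jHi := tj*tile, tj*tile+tile-1
				if iHi > n {
					iHi = n
				}
				if jHi > n {
					jHi = n
				}
				// Inside a tile: rows bottom-up, columns left to right.
				for i := iHi; i >= iLo; i-- {
					jStart := jLo
					if jStart < i+1 {
						jStart = i + 1
					}
					for j := jStart; j <= jHi; j++ {
						fillCell(c, prefix, i, j)
					}
				}
				close(done[ti][tj]) // release every tile waiting on this one
			}(ti, tj)
		}
	}
	wg.Wait()
	return c[0][n]
}

func main() {
	fmt.Println(optimalBSTTiled([]int{34, 8, 50}, 2)) // prints 142
}
```

Closing a channel, rather than sending on it, lets any number of successor tiles observe completion without counting receivers. A TBB version would express the same two-predecessor dependency through the task scheduler instead of channels.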