Enabling the high level synthesis of data analytics accelerators

Conventional High Level Synthesis (HLS) tools mainly target compute intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workload. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well know SPARQL benchmark.