Towards Fast Reverse Time Migration Kernels using Multi-threaded Wavefront Diamond Tiling

Today’s high-end multicore systems are characterized by a deep memory hierarchy, i.e., several levels of local and shared caches, with limited size and bandwidth per core. The ever-increasing gap between processor and memory speeds will further exacerbate this problem and has led the scientific community to revisit numerical software implementations to better suit the underlying memory subsystem, for both performance (data reuse) and energy efficiency (data locality). The authors propose a novel multi-threaded wavefront diamond blocking (MWD) implementation in the context of stencil computations, which represent the core operation for seismic imaging in the oil industry. The diamond stencil formulation introduces temporal blocking for high data reuse in the upper cache levels. The wavefront optimization technique ensures data locality by allowing multiple threads to share common adjacent stencil points. MWD thereby addresses the aforementioned challenges by alleviating the cache size limitation and relieving pressure on memory bandwidth. Performance comparisons against the optimized standard 25-point stencil seismic imaging scheme using spatial and temporal blocking demonstrate the effectiveness of MWD.
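
For orientation, the 25-point stencil named in the abstract can be sketched as a naive, unblocked sweep. This is an illustrative baseline only, not the MWD scheme: it assumes a constant-velocity acoustic wave equation, a leapfrog (second-order) time update, and an 8th-order star stencil reaching 4 neighbours each way along every axis. The names `laplacian_25pt`, `wave_step`, `make_grid`, and `c2dt2` are hypothetical, and the untouched halo stands in for whatever boundary treatment a real migration code would use.

```python
# Central-difference weights for the second derivative, 8th-order accurate.
# COEFFS[r] multiplies the two neighbours at distance r along one axis.
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]
RADIUS = 4  # 1 centre point + 3 axes * 2 directions * 4 neighbours = 25 points


def make_grid(n, fn):
    """Build an n*n*n nested list with grid[i][j][k] = fn(i, j, k)."""
    return [[[float(fn(i, j, k)) for k in range(n)]
             for j in range(n)]
            for i in range(n)]


def laplacian_25pt(u, i, j, k, h=1.0):
    """Discrete Laplacian at (i, j, k) using the 25-point star stencil."""
    acc = 3.0 * COEFFS[0] * u[i][j][k]  # centre weight, once per axis
    for r in range(1, RADIUS + 1):
        acc += COEFFS[r] * (
            u[i - r][j][k] + u[i + r][j][k]
            + u[i][j - r][k] + u[i][j + r][k]
            + u[i][j][k - r] + u[i][j][k + r]
        )
    return acc / (h * h)


def wave_step(u_prev, u_curr, c2dt2, h=1.0):
    """One leapfrog time step over interior points.

    c2dt2 is (velocity * dt)**2, assumed constant here.
    Halo points (within RADIUS of a face) simply keep u_curr's values.
    """
    n = len(u_curr)
    u_next = [[row[:] for row in plane] for plane in u_curr]
    for i in range(RADIUS, n - RADIUS):
        for j in range(RADIUS, n - RADIUS):
            for k in range(RADIUS, n - RADIUS):
                u_next[i][j][k] = (
                    2.0 * u_curr[i][j][k]
                    - u_prev[i][j][k]
                    + c2dt2 * laplacian_25pt(u_curr, i, j, k, h)
                )
    return u_next
```

Each `wave_step` call streams the whole grid once and reads 25 values per updated point, so on grids larger than cache every time step pays the full memory traffic. Temporal blocking schemes such as the diamond tiling described above reorder these updates so that several time steps are applied to a tile while it is still cache-resident.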