MAPPING on UPMEM

This paper presents the implementation of a mapping algorithm on the UPMEM architecture. Mapping is a basic bioinformatics application that consists of finding the best location of millions of short DNA sequences on a full genome. The mapping can be constrained by a maximum number of differences between the DNA sequence and the region of the genome where a high similarity has been found. UPMEM’s Processing-In-Memory (PIM) solution consists of adding processing units to the DRAM, to minimize data access time and maximize bandwidth, in order to drastically accelerate data-intensive algorithms. A 16 GB UPMEM DIMM module thus comes with 256 UPMEM DRAM Processing Units (DPUs). The mapping algorithm implemented on the UPMEM architecture distributes a huge index across the DPU memories. DNA sequences are assigned to a specific DPU according to their k-mer features, which allows millions of them to be mapped in parallel. Experiments on a human genome dataset show that a speed-up of 25 can be obtained with PIM, compared to fast mapping software such as BWA, Bowtie2 or NextGenMap running on 16 Intel threads. The experiments also highlight that data transfer from the storage device limits the performance of the implementation. The use of SSD drives can boost the speed-up to 80.
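To make the dispatch idea concrete, the sketch below routes each read to a DPU based on a k-mer taken from the read. It is only an illustration under stated assumptions: the abstract does not say which k-mer feature is used, so the choice of the leading k-mer, the value of `K`, the 2-bit encoding, the modulo rule, and the names `encode_kmer`, `dpu_for_read` and `dispatch` are all hypothetical. Only the figure of 256 DPUs per module comes from the text.

```python
from collections import defaultdict

NUM_DPUS = 256  # from the text: DPUs on one 16 GB UPMEM DIMM module
K = 12          # hypothetical k-mer length, not specified in the abstract

# 2-bit encoding of the four nucleotides (a common convention, assumed here)
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer, 2 bits per base.

    Raises KeyError on any base other than A, C, G, T (e.g. 'N').
    """
    value = 0
    for base in kmer:
        value = (value << 2) | _CODE[base]
    return value


def dpu_for_read(read, k=K, num_dpus=NUM_DPUS):
    """Choose a DPU from the read's leading k-mer (assumed routing rule)."""
    if len(read) < k:
        raise ValueError("read is shorter than k")
    return encode_kmer(read[:k]) % num_dpus


def dispatch(reads, k=K, num_dpus=NUM_DPUS):
    """Group reads into per-DPU batches so each batch can be mapped independently."""
    batches = defaultdict(list)
    for read in reads:
        batches[dpu_for_read(read, k, num_dpus)].append(read)
    return dict(batches)


if __name__ == "__main__":
    reads = ["ACGTAAAA", "ACGTCCCC", "TTTTACGT"]
    for dpu, batch in sorted(dispatch(reads, k=4).items()):
        print(dpu, batch)
```

Under this assumed rule, reads sharing the same leading k-mer land on the same DPU, which is the one that would hold the matching slice of the index. Each DPU can then process its batch without consulting the others.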