Paper Information - MAPPING on UPMEM

MAPPING on UPMEM

This paper presents the implementation of a mapping algorithm on the UPMEM architecture. Mapping is a basic bioinformatics task that consists of finding the best location of millions of short DNA sequences on a full genome. The mapping can be constrained by a maximum number of differences between the DNA sequence and the region of the genome where high similarity has been found. UPMEM's Processing-In-Memory (PIM) solution consists of adding processing units to the DRAM, minimizing data access time and maximizing bandwidth in order to drastically accelerate data-intensive algorithms. A 16 GB UPMEM DIMM module comes with 256 UPMEM DRAM Processing Units (DPUs). The mapping algorithm implemented on the UPMEM architecture distributes a huge index across the DPU memories. DNA sequences are assigned to a specific DPU according to their k-mer features, so that millions of them can be mapped in parallel. Experiments on a human genome dataset show that a speed-up of 25 can be obtained with PIM compared to fast mapping software such as BWA, Bowtie2, or NextGenMap running 16 Intel threads. The experiments also highlight that data transfer from the storage device limits the performance of the implementation. Using SSD drives can boost the speed-up to 80.
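The abstract does not spell out how k-mer features select a DPU, so the following Python sketch only illustrates the general idea under stated assumptions: the leading k-mer of each read is packed into an integer and taken modulo the number of DPUs, the index is sharded by the same rule, and candidate locations are checked by counting mismatches against a maximum number of differences. The function names, the k-mer length `K`, the modulo rule, and the mismatch-only comparison are all hypothetical and are not taken from the paper; it runs on the host in plain Python and does not use the UPMEM SDK.

```python
from collections import defaultdict

NUM_DPUS = 256  # DPUs on one 16 GB UPMEM DIMM, as stated in the abstract
K = 12          # hypothetical k-mer length; the abstract does not give one

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_code(kmer):
    """Pack a k-mer of A/C/G/T into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | _CODE[base]
    return value


def dpu_for(seq, k=K, num_dpus=NUM_DPUS):
    """Choose a DPU from the leading k-mer (assumed rule, not the paper's)."""
    return kmer_code(seq[:k]) % num_dpus


def build_index(genome, k=K, num_dpus=NUM_DPUS):
    """Shard a k-mer -> genome positions index across the DPUs."""
    shards = [defaultdict(list) for _ in range(num_dpus)]
    for pos in range(len(genome) - k + 1):
        kmer = genome[pos:pos + k]
        if all(base in _CODE for base in kmer):  # skip k-mers with N etc.
            shards[dpu_for(kmer, k, num_dpus)][kmer].append(pos)
    return shards


def dispatch(reads, k=K, num_dpus=NUM_DPUS):
    """Group reads by the DPU that holds their seed k-mer."""
    batches = defaultdict(list)
    for read in reads:
        batches[dpu_for(read, k, num_dpus)].append(read)
    return batches


def map_read(read, genome, shard, k=K, max_diff=2):
    """Return (position, mismatches) of the best hit, or None.

    Only substitutions are counted; a hit is kept when its mismatch
    count does not exceed max_diff.
    """
    best = None
    for pos in shard.get(read[:k], []):
        window = genome[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the genome
        diff = sum(a != b for a, b in zip(read, window))
        if diff <= max_diff and (best is None or diff < best[1]):
            best = (pos, diff)
    return best
```

In this sketch each batch returned by `dispatch` would be processed against the matching entry of `build_index`, so every read only needs the shard held by one DPU. Reads are assumed to contain only A, C, G, and T in their first k bases.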
