SWMapper: Scalable Read Mapper on SunWay TaihuLight

With the rapid development of next-generation sequencing (NGS) technologies, high-throughput sequencing platforms continuously produce large amounts of short-read DNA data at low cost. Read mapping is a performance-critical task because it is one of the first stages of many types of NGS analysis pipelines. We present SWMapper, a scalable and efficient read mapper for the Sunway TaihuLight supercomputer. To achieve high performance on its heterogeneous architecture, we propose a number of optimization techniques centered around a memory-efficient succinct hash index data structure. These include seed filtration, duplicate removal, dynamic scheduling, asynchronous data transfer, and overlapping of I/O and computation. Furthermore, we present a vectorized version of the banded Myers algorithm for pairwise alignment that uses 256-bit vector registers to fully exploit the computational power of the SW26010 processor. Our performance evaluation shows that SWMapper, using all 4 compute groups of a single Sunway TaihuLight node, outperforms S-Aligner on the same hardware by a factor of 6.2. In addition, compared to the state-of-the-art CPU-based mappers RazerS3, BitMapper2, and Hobbes3 running on a 4-core Xeon W-2123v3 CPU, SWMapper achieves speedups of 26.5, 7.8, and 2.6, respectively. Our optimizations achieve an aggregate speedup of 11 over the naïve implementation on one compute group of an SW26010 processor, and SWMapper attains a strong-scaling efficiency of 74% on 128 compute groups.
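The banded Myers algorithm named above builds on Myers' bit-vector algorithm, which packs one column of the edit-distance dynamic-programming matrix into a machine word and updates it with a handful of bitwise operations per reference character. The sketch below shows only that underlying idea. It assumes a read of at most 64 bases and uses the unbanded, single-64-bit-word search variant. It is not SWMapper's vectorized banded kernel for the SW26010's 256-bit registers, and the function name `myers_min_edits` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (illustrative sketch, not SWMapper's kernel):
 * minimum number of edits (substitutions, insertions, deletions) needed to
 * match `pat` (length 1..64) against any substring of `txt`, using Myers'
 * bit-parallel algorithm. One 64-bit word encodes a whole DP column as
 * vertical +1/-1 deltas. Returns -1 if the pattern length is unsupported. */
static int myers_min_edits(const char *pat, const char *txt) {
    size_t m = strlen(pat);
    if (m == 0 || m > 64) return -1;          /* single-word variant only */

    uint64_t peq[256] = {0};                  /* per-character match masks */
    for (size_t i = 0; i < m; i++)
        peq[(unsigned char)pat[i]] |= 1ULL << i;

    uint64_t pv = ~0ULL;                      /* vertical +1 deltas */
    uint64_t mv = 0;                          /* vertical -1 deltas */
    uint64_t high = 1ULL << (m - 1);          /* bit of the last DP row */
    int score = (int)m;                       /* D[m][0] = m */
    int best = score;

    for (const char *t = txt; *t; t++) {
        uint64_t eq = peq[(unsigned char)*t];
        uint64_t xv = eq | mv;
        uint64_t xh = (((eq & pv) + pv) ^ pv) | eq;
        uint64_t ph = mv | ~(xh | pv);        /* horizontal +1 deltas */
        uint64_t mh = pv & xh;                /* horizontal -1 deltas */

        /* Track the value in the last row, i.e. the edit count so far. */
        if (ph & high) score++;
        else if (mh & high) score--;

        /* Shift in 0: the top DP row is all zeros, so a match may
         * start at any text position (search, not global, alignment). */
        ph <<= 1;
        mh <<= 1;
        pv = mh | ~(xv | ph);
        mv = ph & xv;

        if (score < best) best = score;
    }
    return best;
}
```

For example, `myers_min_edits("ACGT", "GGAGTGG")` returns 1, because the substring `AGT` needs one deletion. A mapper would typically call such a kernel only on candidate reference windows that survive seed filtration, and accept a candidate when the result is within the error threshold. A banded variant restricts the computation to a diagonal strip around the expected alignment.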