Direct approaches to exploit many-core architecture in bioinformatics

Current trends in computer programming seek solutions to the challenging task of porting and optimizing existing algorithms for many-core architectures with tens of Central Processing Units (CPUs). Yet the lack of standardized general-purpose parallel programming and porting methodologies is the main bottleneck in these developments. We have focused on bioinformatics applied to genomics in general, and on so-called "Next-Generation" Sequencing (NGS) in particular, in order to study the viability and cost of porting and optimizing well-known algorithms for a many-core architecture. Existing algorithms were implemented on the Tile64, a microprocessor containing 64 CPUs, each capable of running an independent Linux operating system. Three different approaches were explored: (i) implementation of the Needleman-Wunsch/Smith-Waterman pairwise aligner from scratch; (ii) direct translation of the Message Passing Interface (MPI) C++ ABySS assembly algorithm, with changes to the communication layer; and (iii) migration of the ClustalW tool, parallelizing only its most time-consuming stage. The performance-gain/development-cost tradeoffs indicate that the Tile64 microprocessor has the potential to increase bioinformatics performance on a standalone Personal Computer (PC) in an unprecedented way. Yet effective exploitation of these parallel implementations requires a detailed understanding of the peculiar characteristics of many-core hardware when migrating previously non-parallel source code.

Highlights:
- The computing power of the Tile64 many-core microprocessor can be exploited for NGS bioinformatics tasks.
- The Tile64 many-core CPU architecture works as a cluster of pico-computers, as shown with the MC64-NW/SW algorithm.
- MC64-ClustalW shows an important performance improvement with minor development effort.
- MC64-ABySS reveals that an efficient MPI-like API for Tile64 is essential for successfully porting most existing parallel code.
- Widespread adoption of many-core CPU technologies could lead to a new paradigm in programming methodologies in the coming years.
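
The abstract does not detail how MC64-NW/SW distributes work across tiles. As a minimal sketch, assuming a linear gap penalty and a toy match/mismatch scheme (the constants `MATCH`, `MISMATCH` and `GAP` and all function names are hypothetical), the Python below computes a Smith-Waterman local-alignment score twice: once row by row, and once in anti-diagonal (wavefront) order, which exposes the independence a many-core port can exploit. The Needleman-Wunsch global variant differs only in dropping the zero floor, initializing the first row and column with cumulative gap penalties, and taking the score from the bottom-right cell.

```python
"""Smith-Waterman local alignment score, filled row-wise and wavefront-wise.

Sketch only: the linear gap penalty and match/mismatch scores below are
hypothetical illustration values, not the MC64-NW/SW scoring model.
"""

MATCH, MISMATCH, GAP = 2, -1, -2  # hypothetical scoring parameters


def sub(a: str, b: str) -> int:
    """Substitution score for two residues."""
    return MATCH if a == b else MISMATCH


def sw_score_rowwise(s: str, t: str) -> int:
    """Reference version: fill H row by row, keeping only two rows in memory."""
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            cur[j] = max(0,
                         prev[j - 1] + sub(s[i - 1], t[j - 1]),  # diagonal
                         prev[j] + GAP,                           # gap in t
                         cur[j - 1] + GAP)                        # gap in s
            best = max(best, cur[j])
        prev = cur
    return best


def sw_score_wavefront(s: str, t: str) -> int:
    """Fill H one anti-diagonal (cells with i + j == d) at a time.

    Each cell on diagonal d reads only diagonals d-1 and d-2, so the inner
    loop has no dependencies between its iterations: on a many-core chip they
    could be split across cores, with a barrier between successive diagonals.
    """
    n, m = len(s), len(t)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best = 0
    for d in range(2, n + m + 1):
        # Valid cells satisfy 1 <= i <= n and 1 <= j = d - i <= m.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                          H[i - 1][j] + GAP,
                          H[i][j - 1] + GAP)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(sw_score_wavefront("TTACGTT", "ACG"))  # local hit "ACG": 3 matches
```

Both functions return the same score; only the fill order differs. Whether a per-diagonal split pays off on real hardware depends on synchronization and communication costs, which this sketch does not model.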

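For approach (iii), the abstract does not name the stage that was parallelized. Assuming it is the all-against-all pairwise comparison stage that feeds the guide tree (a common hotspot in ClustalW, whose N(N-1)/2 comparisons are mutually independent), the sketch below deals the sequence pairs onto a fixed number of workers and gathers the results into a symmetric distance matrix. The k-mer distance is a hypothetical stand-in for ClustalW's alignment-based distances, the function names are illustrative, and Python threads stand in for Tile64 cores; under CPython's global interpreter lock they show the work decomposition rather than a real speedup.

```python
"""Parallel all-against-all distance stage (illustrative sketch).

Assumptions, not taken from MC64-ClustalW: the hot stage is the N*(N-1)/2
pairwise comparisons; kmer_distance() is a stand-in metric; threads stand in
for Tile64 cores.
"""
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """1 - shared k-mers / size of the smaller k-mer set (stand-in metric)."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not ka or not kb:
        return 1.0  # sequences shorter than k share nothing by definition
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def partition_pairs(n: int, workers: int) -> list[list[tuple[int, int]]]:
    """Deal the independent (i, j) pairs round-robin onto `workers` queues."""
    queues: list[list[tuple[int, int]]] = [[] for _ in range(workers)]
    for idx, pair in enumerate(combinations(range(n), 2)):
        queues[idx % workers].append(pair)
    return queues


def distance_matrix(seqs: list[str], workers: int = 4) -> list[list[float]]:
    """Compute the symmetric distance matrix using `workers` parallel queues."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]

    def run(queue: list[tuple[int, int]]) -> list[tuple[int, int, float]]:
        # Each worker returns its results instead of writing shared state;
        # the caller then gathers them, like a message-passing gather step.
        return [(i, j, kmer_distance(seqs[i], seqs[j])) for i, j in queue]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(run, partition_pairs(n, workers)):
            for i, j, d in results:
                dist[i][j] = dist[j][i] = d
    return dist


if __name__ == "__main__":
    for row in distance_matrix(["ACGTACGT", "ACGTACGA", "TTTTTTTT"], workers=2):
        print(row)
```

Round-robin dealing balances the queues by pair count only; when sequence lengths vary widely, a dynamic work queue would balance the actual load better. The later guide-tree and progressive-alignment stages would remain sequential, in line with the strategy of parallelizing only the most time-consuming stage.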