Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment

MOTIVATION Bioinformatics algorithms and computing power are the main bottlenecks in analyzing the huge amounts of data generated by current technologies, such as 'next-generation' sequencing. At the same time, the most powerful microprocessors are now based on many-core chips, yet most applications cannot exploit that power because doing so requires parallelized algorithms. As an example of next-generation bioinformatics, we have developed from scratch a new parallelization of the Needleman-Wunsch (NW) sequence alignment algorithm for the 64-core Tile64 microprocessor. We discuss the unprecedented performance it offers for a standalone personal computer (PC): it computes optimal alignments up to 20 times faster than the non-parallelized version, thus saving valuable time. AVAILABILITY This algorithm is available as a free web service for the scientific community at http://www.sicuma.uma.es/multicore. The open source code is also available on that site.
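
For readers unfamiliar with why NW lends itself to parallelization, the following is a minimal sketch and not the authors' Tile64 implementation. It assumes a simple linear gap penalty and illustrative scoring values (`match`, `mismatch` and `gap` are hypothetical defaults, not taken from the paper). The score matrix is filled one anti-diagonal at a time: every cell on a given anti-diagonal depends only on cells from the two previous anti-diagonals, so in principle the cells of one anti-diagonal could be computed concurrently on separate cores. The sketch itself runs sequentially and only demonstrates that this ordering is valid.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman-Wunsch, linear gap penalty).

    The matrix is filled in anti-diagonal (wavefront) order. Scoring
    values are illustrative assumptions, not the paper's parameters.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]

    # Boundary conditions: aligning a prefix against an empty string.
    for i in range(1, n + 1):
        H[i][0] = i * gap
    for j in range(1, m + 1):
        H[0][j] = j * gap

    # Anti-diagonal d contains all interior cells (i, j) with i + j == d.
    for d in range(2, n + m + 1):
        # Cells in this inner loop are mutually independent; a parallel
        # version could distribute them across cores.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,  # match or mismatch
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
    return H[n][m]


if __name__ == "__main__":
    print(nw_score("GATTACA", "GATTACA"))
```

Only the score is computed here; recovering the alignment itself would additionally require a traceback through `H`.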
