GiSA: A Grid System for Genome Sequences Assembly

Sequencing genomes is a fundamental aspect of biological research. Since its introduction by Sanger et al. [2], shotgun sequencing has remained the mainstay of genome sequence assembly. This method randomly obtains sequence reads (e.g., subsequences of about 500 characters) from a genome and then assembles them into contigs based on significant overlaps among them. The whole-genome shotgun (WGS) approach generates sequence reads directly from a whole-genome library and uses computational techniques to reassemble them. A variety of assembly programs have been proposed and implemented, including PHRAP [3] (Green, 1994), CAP3 [4] (1999), and Celera [5] (2000). Because of the high computational complexity of assembly and the increasingly large size of the input data, these programs incur substantial time and space overhead. PHRAP [3], for instance, can only run as a stand-alone program and requires memory many times (usually more than 10 times) the size of the original sequence data. In realistic applications, the sequencing process can become unacceptably slow for lack of memory, even on a mainframe with a very large amount of RAM.
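To make the idea of assembling reads into contigs by overlap concrete, the following is a minimal sketch of a naive greedy overlap merge. It is an illustration only, not the algorithm used by PHRAP, CAP3, Celera, or GiSA. The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, and the sketch assumes error-free reads taken from a single strand.

```python
def overlap(a, b, min_len):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_len` long; otherwise return 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the largest overlap
    until no pair overlaps by at least `min_len` characters.

    Assumption (toy setting): reads are exact substrings of the genome,
    with no sequencing errors and no reverse complements.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        # All-pairs comparison: quadratic in the number of sequences,
        # which hints at why real assemblers are costly on large inputs.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    # Three overlapping reads from the made-up sequence ATGCGTACGTTAGC.
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
```

Even in this toy form, every merge step compares all pairs of sequences, so running time grows quickly with the number of reads.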

[1] X. Huang et al. CAP3: A DNA sequence assembly program. Genome Research, 1999.

[2] F. Sanger et al. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America, 1977.

[3] Eugene W. Myers et al. A whole-genome assembly of Drosophila. Science, 2000.

[4] E. Myers et al. Basic local alignment search tool. Journal of Molecular Biology, 1990.