UNION: An Efficient Mapping Tool Using UniMark with Non-overlapping Interval Indexing Strategy

Next-generation sequencing (NGS) has become a popular research area among biologists because it produces short biological sequences quickly, inexpensively, and accurately. NGS techniques have recently been improved to produce longer sequences, over 100 bp, with the same quality, accuracy, and speed. Tools designed for short sequences may therefore be unsuitable for these longer sequences. We propose a new tool, UNION, for re-sequencing applications that maps long sequences to a reference genome. UNION uses the UniMarker with a non-overlapping interval indexing strategy and a tool, CORAL, to perform sequence alignment. In our experiments, we randomly cut ten thousand sequences of length 512 bp from the Trichomonas genome and introduced mutations and sequencing errors into them to simulate different levels of similarity. We compared UNION with GMAP in terms of speed and accuracy, and UNION achieved better performance than GMAP.
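The experimental setup and the indexing idea can be illustrated with a minimal sketch. It is not the UNION implementation: the function names (`simulate_reads`, `build_index`, `candidate_start`), the substitution-only error model, the seed length `k`, and the offset-voting step are all assumptions made for illustration, and plain k-mers stand in for UniMarkers. The sketch cuts random fixed-length reads from a genome, mutates them at a chosen rate, indexes the genome only at non-overlapping intervals (positions 0, k, 2k, ...), and then lets every k-mer of a read vote for a candidate mapping position.

```python
import random
from collections import defaultdict

def simulate_reads(genome, n_reads, read_len, mutation_rate, rng):
    """Cut random substrings and substitute bases at the given rate.

    Hypothetical error model: substitutions only, no insertions or deletions.
    Returns a list of (true_start, read) pairs.
    """
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < mutation_rate:
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads

def build_index(genome, k):
    """Index only the k-mers that start at positions 0, k, 2k, ..."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index

def candidate_start(read, index, k):
    """Return the genome offset supported by the most seed hits, or None."""
    votes = defaultdict(int)
    for i in range(len(read) - k + 1):
        for pos in index.get(read[i:i + k], ()):
            votes[pos - i] += 1  # offset at which the read would begin
    return max(votes, key=votes.get) if votes else None

if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(50_000))
    index = build_index(genome, k=12)
    reads = simulate_reads(genome, 100, 512, 0.05, rng)
    hits = sum(candidate_start(r, index, 12) == s for s, r in reads)
    print(f"{hits} of {len(reads)} reads placed at their true position")
```

Because the index stores only every k-th k-mer, it holds roughly 1/k as many entries as an index of all overlapping k-mers, while each read is still scanned at every offset so that seeds are found regardless of alignment phase. A candidate position found this way would then be passed to an alignment step, which the sketch omits.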
