An Accurate Sequence Assembly Algorithm for Livestock, Plants and Microorganism Based on Spark

Sequence assembly is an important topic in bioinformatics research. Sequence assembly algorithms have long suffered from poor assembly precision and low efficiency. To address these two problems, this paper designs and implements a precise assembly algorithm, SA-BR-MR, which follows a strategy of finding the source of reads and is based on MapReduce and the Eulerian path algorithm. Computational results show that SA-BR-MR is more accurate than other algorithms. SA-BR-MR is also applied to 54 sequences randomly selected from NCBI, covering animals, plants and microorganisms, with lengths ranging from hundreds to tens of thousands of bases. The matching rate is 100% for all 54 sequences. For each species, the range of K for which the matching rate reaches 100% is summarized. To verify the range of K for hepatitis C virus (HCV) and related variants, eight randomly selected HCV variants are assembled. The results verify the correctness of K range of hepatitis C and rela...
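The Eulerian path idea mentioned in the abstract can be illustrated with a minimal single-machine sketch. This is not the SA-BR-MR implementation: it omits the MapReduce stage and the read-source strategy, and every function name here is hypothetical. It assumes error-free reads, treats each distinct k-mer as one edge between its (k-1)-mer prefix and suffix, and uses an assumed position-by-position definition of matching rate, since the abstract does not state how that rate is computed.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it.

    Assumption: each distinct k-mer contributes exactly one edge,
    so k-mers repeated in the genome are collapsed.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    # Start at the node with one more outgoing than incoming edge,
    # or at an arbitrary node if the graph is a cycle.
    start = next(
        (u for u in graph if len(graph[u]) - in_deg[u] == 1),
        next(iter(graph)),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(build_de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


def matching_rate(assembled, reference):
    """Assumed definition: fraction of positions that agree."""
    matches = sum(a == b for a, b in zip(assembled, reference))
    return matches / max(len(assembled), len(reference))


if __name__ == "__main__":
    reference = "ATGGCGTGCA"  # toy sequence, not from the paper
    reads = [reference[i:i + 6] for i in range(len(reference) - 5)]
    result = assemble(reads, k=4)
    print(result, matching_rate(result, reference))
```

In this toy case every 3-mer of the reference is distinct, so with K = 4 the graph is a single path and the reconstruction is unambiguous. With a smaller K, (K-1)-mers repeat and several Eulerian paths may exist, which is one plausible reason the usable range of K depends on the sequence.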
