Gene Sequence Assembly Algorithm Model Based on the DBG Strategy and Its Application

With the continuous development of sequencing technology, the volume of bioinformatics data has grown geometrically, and this mass of data places more stringent requirements on sequence assembly. Sequence assembly based on the DBG (De Bruijn graph) strategy is a key class of algorithms in bioinformatics and is widely used in gene sequence assembly. Current research on sequence assembly tends to focus on optimizing specific steps of a specific algorithm, and there is little research on highly abstract, domain-level algorithm frameworks. To some extent, this leads to redundancy among sequence assembly algorithms, and manually selecting an algorithm may introduce problems of its own. This paper analyzes the domain of DBG-based sequence assembly algorithms (DBGSA) and establishes a feature model of this domain. Based on the production programming method, the DBGSA algorithm components are interactively designed. With the support of the PAR platform, the DBGSA algorithm component library is formally implemented, and the component library is then used to assemble specific algorithms. This research adds domain-level work to the field of sequence assembly and implements a DBGSA component library from which specific sequence assembly algorithms can be assembled, ensuring the efficiency of algorithm development and the reliability of the assembled algorithms. It also provides a valuable reference for solving other problems in the domain of sequence assembly.
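As a rough illustration of the DBG strategy mentioned in the abstract, the following Python sketch builds a De Bruijn graph from the k-mers of a set of reads and spells out a sequence along an Eulerian path. It is not the DBGSA component library or anything generated with the PAR platform; the function names (`build_de_bruijn`, `eulerian_path`, `assemble`) are hypothetical, and the sketch assumes error-free reads that cover a sequence with no repeated (k-1)-mers. Practical assemblers add steps such as error correction and graph simplification that are omitted here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Return a graph mapping each (k-1)-mer to the (k-1)-mers that follow it.

    Every distinct k-mer found in the reads contributes one edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Walk every edge once (Hierholzer's algorithm).

    Assumes an Eulerian path exists; this is not checked.
    """
    in_deg = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_deg[node] += 1
    nodes = set(graph) | set(in_deg)
    if not nodes:
        return []
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {node: list(successors) for node, successors in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the De Bruijn graph."""
    path = eulerian_path(build_de_bruijn(reads, k))
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Three overlapping toy reads taken from the sequence ATGGCGTGCA.
    print(assemble(["ATGGCGT", "GGCGTGC", "CGTGCA"], 4))  # prints ATGGCGTGCA
```

The choice of k matters in practice: a small k creates more ambiguous branches in the graph, while a large k requires longer exact overlaps between reads.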
