Memory Optimization for Global Protein Network Alignment Using Pushdown Automata and De Bruijn Graph

Ongoing advances in Computational Biology (CB) research have generated massive Protein-Protein Interaction (PPI) data sets. The availability of PPI data for several organisms has prompted the development of computational methods for measuring, analyzing, modeling, comparing, clustering, and aligning biological networks. However, exact network comparison is computationally intractable, so several alternative methods have been used instead. Efficient use of a computing device's memory is therefore crucial when working with PPI data sets. We compare the memory usage of Pushdown Automata and of a de Bruijn graph based Bloom filter for global protein network alignment. The de Bruijn graph is regularly used in Next Generation Sequencing (NGS) for large-scale data sets, where de novo genome assemblers consume large amounts of memory. Both the Bloom filter and Pushdown Automata reduce memory usage. We observe that Pushdown Automata outperform the Bloom filter in memory savings but take more time. The results show that the Bloom filter software Mania performs a full de novo assembly of a human genome data set using 6.5 GB of memory in 27 hours, whereas the Pushdown Automata approach produces the same results using 1 GB of memory in 31 hours.
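As a rough illustration of the Bloom filter idea mentioned above, the following minimal Python sketch stores the k-mers of a few reads in a bit array and recovers de Bruijn graph edges by membership queries. It is not the implementation evaluated in this work: the names `BloomFilter`, `kmers`, `build_filter`, and `successors`, the SHA-256 based double hashing, and the default sizes are all assumptions chosen for illustration. The Pushdown Automata approach is not sketched because the abstract does not describe its construction.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with probabilistic membership (no false negatives)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Assumption: derive all bit positions from two base hashes
        # (double hashing) taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_filter(reads, k, num_bits=1 << 16, num_hashes=4):
    """Insert all k-mers of all reads into a new Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for km in kmers(read, k):
            bf.add(km)
    return bf


def successors(bf, kmer):
    """Candidate outgoing de Bruijn edges: extend by each base and query.

    False positives are possible, so this may return extra neighbors.
    """
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bf]


if __name__ == "__main__":
    bf = build_filter(["ACGTAC"], k=3)
    print("ACG" in bf)
    print(successors(bf, "ACG"))
```

The graph is never stored explicitly: only the bit array is kept, and edges are rediscovered by querying the four possible extensions of a k-mer. This is where the memory saving comes from, at the cost of occasional false-positive edges.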
