MSuPDA: A Memory Efficient Algorithm for Sequence Alignment

Abstract: Space complexity is a key open problem in DNA sequence alignment. Memory saving under a pushdown automaton (PDA) can help reduce the space an alignment occupies in computer memory. In the proposed process, an anchor seed (AS) is selected from a given data set of nucleotide base pairs for local sequence alignment. A quick splitting technique separates the AS from the DNA genome segments. The selected AS is placed in the PDA's input unit, and the whole set of DNA genome segments is placed on the PDA's stack. The AS from the input unit is then matched against the DNA genome segments on the stack. Under the PDA's control unit, nucleotides are popped from the stack whether they represent a match, a mismatch, or an indel. Each POP operation frees the memory cell occupied by the popped nucleotide base pair.
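The stack-and-pop idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `scan_stack_for_seed` and the `max_mismatches` parameter are hypothetical, a Python list stands in for the PDA stack, and indels are not modeled (only matches and mismatches are counted). The sketch assumes the genome is pushed so that its first base sits on top of the stack, and it keeps only the last `len(seed)` popped bases in memory.

```python
from collections import deque


def scan_stack_for_seed(genome, seed, max_mismatches=0):
    """Pop a genome off a stack and report where `seed` aligns.

    Returns (hits, remaining), where hits is a list of
    (start_position, mismatch_count) pairs and remaining is the
    number of bases still on the stack (0 after a full scan).
    Illustrative only: indels are not handled.
    """
    if not seed:
        raise ValueError("seed must not be empty")

    # Assumption: push in reverse so the first base is on top.
    stack = list(reversed(genome))
    # Only the most recent len(seed) popped bases are retained.
    window = deque(maxlen=len(seed))
    hits = []
    position = 0  # number of bases popped so far

    while stack:
        # POP removes the base from the stack; the list shrinks.
        window.append(stack.pop())
        position += 1
        if len(window) == len(seed):
            mismatches = sum(a != b for a, b in zip(window, seed))
            if mismatches <= max_mismatches:
                hits.append((position - len(seed), mismatches))

    return hits, len(stack)


hits, remaining = scan_stack_for_seed("ACGTACGTTAGC", "ACGT")
print(hits)       # [(0, 0), (4, 0)]
print(remaining)  # 0
```

In this sketch the stack is empty once the scan finishes, which mirrors the claim that popped cells no longer hold nucleotides. How much memory the runtime actually returns after each pop depends on the language and allocator, so the saving shown here is conceptual.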
