An integrated algorithm for local sequence alignment

Local sequence alignment (LSA) is an essential part of DNA sequencing. LSA helps establish facts in biological identification, criminal investigations, disease identification, drug design and research. The large volume of biological data makes efficient analysis difficult, and managing that data in limited space has become a serious issue. We subdivide the data sets into segments, both to reduce their size and to use memory efficiently. Integrating dynamic programming (DP) with the Chapman–Kolmogorov equations (CKE) makes the analysis faster. The subdivision step, called the data reducing process (DRP), is applied before DP and CKE. This approach needs less space than other methods, and its time requirement is also improved.
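
For orientation, the DP component that local alignment builds on can be sketched with the classical Smith-Waterman recurrence. This is a minimal illustration, not the DRP or CKE integration described above: the function name `smith_waterman_score` and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen for the example. It keeps only two rows of the DP table, which shows one simple way the memory footprint of the DP step can be held down.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Classical Smith-Waterman recurrence with a linear gap penalty.
    Only the previous and current DP rows are stored, so memory is
    O(len(b)) rather than O(len(a) * len(b)).
    Scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

The two-row variant returns only the score; recovering the aligned region itself would require storing traceback information or recomputing over the relevant segment.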
