A VOM based gene-finder that specializes in short genes

We present a new approach to gene finding based on a variable-order Markov (VOM) model. The VOM model generalizes the traditional fixed-order Markov model: it is more efficient in its parameterization and can therefore be trained on relatively short sequences. As a result, the proposed VOM gene-finder outperforms traditional gene-finders based on fifth-order Markov models on short, newly sequenced bacterial genomes.
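
To illustrate why a variable-order model needs fewer parameters than a fixed fifth-order one, the following is a minimal sketch, not the authors' implementation. It keeps a context only if it was seen often enough in training and otherwise backs off to a shorter suffix. The names `train_vom`, `prob`, and `log_likelihood`, the `min_count` pruning rule, and the add-one smoothing are all assumptions made for illustration; the abstract does not specify how contexts are selected or how probabilities are estimated.

```python
from collections import defaultdict
import math

ALPHABET = "ACGT"

def train_vom(sequences, max_order=5, min_count=2):
    """Count next-symbol occurrences for every context of length 0..max_order,
    then keep only contexts observed at least min_count times (hypothetical
    pruning rule; the empty context is always kept)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, sym in enumerate(seq):
            for k in range(max_order + 1):
                if i - k < 0:
                    break
                counts[seq[i - k:i]][sym] += 1
    model = {ctx: dict(nxt) for ctx, nxt in counts.items()
             if ctx == "" or sum(nxt.values()) >= min_count}
    model.setdefault("", {})  # guarantees that back-off always terminates
    return model

def prob(model, context, sym, max_order=5):
    """P(sym | context), using the longest retained suffix of the context
    and add-one smoothing over the four nucleotides (an assumption)."""
    ctx = context[-max_order:] if max_order > 0 else ""
    while ctx not in model:
        ctx = ctx[1:]  # back off to a shorter context
    nxt = model[ctx]
    total = sum(nxt.values())
    return (nxt.get(sym, 0) + 1) / (total + len(ALPHABET))

def log_likelihood(model, seq, max_order=5):
    """Sum of log-probabilities of each symbol given its preceding context."""
    return sum(math.log(prob(model, seq[:i], s, max_order))
               for i, s in enumerate(seq))

if __name__ == "__main__":
    model = train_vom(["ACGTACGTACGT"], max_order=2, min_count=2)
    print(len(model), "contexts retained")
    print(prob(model, "AC", "G", max_order=2))
    print(log_likelihood(model, "ACGT", max_order=2))
```

A full fifth-order model over four nucleotides has 4^5 = 1024 contexts whether or not the training data supports them, whereas the sketch stores only the contexts that actually occur often enough. A gene-finder built along these lines would typically train one model on coding and one on non-coding sequence and compare the two log-likelihoods for each candidate region; that scoring step is likewise an assumption here.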
