Bayesian Detection of Coding Regions in DNA/RNA Sequences Through Event Factoring

We describe a Bayesian inference method for identifying protein-coding regions (active or residual) in DNA or RNA sequences. Its main feature is that the conditional and a priori probabilities required by Bayes's formula are computed by factoring each event (a possible annotation of a nucleotide string) into a concatenation of shorter events that are assumed to be independent. This factoring lets us obtain fast but reliable estimates of these parameters from readily available databases, whereas estimating the probabilities of unfactored events would require databases and tables of astronomical size. Tests with natural and artificial genomes gave promising results.
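The factoring idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that a string is split into consecutive 3-mers, that each 3-mer is independent given the annotation, and that only two annotations ("coding" and "noncoding") compete. The probability tables, the priors, and the names `factored_log_likelihood` and `posterior` are hypothetical and chosen purely for illustration. Only the conditional probability is factored here; the a priori probabilities are simply supplied as numbers.

```python
import math
from itertools import product

# All 64 nucleotide triplets.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
STOPS = {"TAA", "TAG", "TGA"}

# Hypothetical toy tables (NOT estimated from any real database):
# non-coding DNA is modelled as uniform over triplets, while coding DNA
# makes in-frame stop codons very unlikely.
TABLES = {
    "noncoding": {c: 1.0 / 64 for c in CODONS},
    "coding": {c: (0.001 if c in STOPS else (1.0 - 0.003) / 61) for c in CODONS},
}


def factored_log_likelihood(seq, table, k=3):
    """log P(seq | annotation), factored into consecutive k-mers that are
    assumed independent; a trailing partial k-mer is ignored."""
    return sum(
        math.log(table[seq[i:i + k]])
        for i in range(0, len(seq) - k + 1, k)
    )


def posterior(seq, tables, priors):
    """Bayes's formula over the competing annotations, computed in log space."""
    log_joint = {
        a: math.log(priors[a]) + factored_log_likelihood(seq, tables[a])
        for a in tables
    }
    m = max(log_joint.values())  # subtract the maximum for numerical stability
    weights = {a: math.exp(v - m) for a, v in log_joint.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}


if __name__ == "__main__":
    priors = {"coding": 0.5, "noncoding": 0.5}  # assumed, for illustration only
    print(posterior("ATGGCCAAA", TABLES, priors))
    print(posterior("TAATAGTGA", TABLES, priors))
```

In this sketch each annotation needs only a 64-entry table, while a table indexed by whole strings of length n would need 4^n entries, which is the kind of size problem that factoring avoids.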