Positive sample only learning (PSOL) for predicting RNA genes in E. coli

RNA genes lack most of the signals used to identify protein-coding genes. A major difficulty for previous discriminative methods that distinguish functional RNA (fRNA) genes from other non-coding genomic sequences is that only positive examples of fRNAs are known: there are no confirmed negatives, only intergenic sequences that may be either positive or negative. To address this problem, we developed the "positive sample only learning" (PSOL) method. The method identifies the most likely negative examples in an unlabeled set and can therefore distinguish putative fRNA genes from other non-coding sequences. We compare RNA gene predictions made with the PSOL method against previous large-scale analyses of the E. coli K12 genome.
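
The idea of picking likely negatives from an unlabeled set can be illustrated with a minimal sketch. It is not the PSOL procedure itself, whose details are not given here. It assumes a simple stand-in heuristic: unlabeled feature vectors farthest from the centroid of the known positives are treated as the most likely negatives. The function names, the two-dimensional features, and all values are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def likely_negatives(positives, unlabeled, k):
    """Return indices of the k unlabeled vectors farthest from the positive centroid.

    Assumption: distance from the positive centroid is a stand-in score;
    the actual method may rank candidates differently.
    """
    center = centroid(positives)
    ranked = sorted(
        range(len(unlabeled)),
        key=lambda i: math.dist(unlabeled[i], center),
        reverse=True,  # farthest first
    )
    return ranked[:k]

# Hypothetical feature vectors for known fRNAs (positives) and
# unlabeled intergenic sequences; the numbers are made up.
positives = [[0.9, 0.8], [1.0, 0.9], [0.8, 1.0]]
unlabeled = [[0.9, 0.9], [0.1, 0.0], [0.5, 0.5], [0.0, 0.2]]

negatives = likely_negatives(positives, unlabeled, k=2)
print(negatives)
```

The selected indices could then serve as a provisional negative set for training a discriminative classifier against the positives, with the remaining unlabeled sequences left to be scored by that classifier.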