Detection of 3-periodicity for small genomic sequences based on AR technique

The major signal in protein-coding regions of genomic sequences is a periodicity at frequency f = 1/3 (3-periodicity). Methods that exploit this phenomenon, such as FFT-based methods, autocorrelation, and the mutual information function, rapidly lose effectiveness when used to detect 3-periodicity in small DNA sequences. This paper proposes an autoregressive (AR) technique as an alternative tool for this purpose, owing to its improved coding-region resolution for small data records. Theoretical analysis and experimental results show that the detection resolution of the AR technique is higher than that of Fourier methods for small DNA sequences. Sequence length and structure are the main factors that affect the performance of any detection method; AR methods, however, are more robust against variations in these factors. Unlike neural-network-based methods, the AR technique requires no a priori knowledge of the sequences. Hence, it appears to be a useful tool for detecting 3-periodicity (as well as other periodicities), repeats, and regulatory regions in unknown genomic sequences, especially small ones.
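As an illustration of the idea, the following is a minimal sketch of AR spectral estimation applied to a short DNA string. The abstract does not state which AR estimator, model order, or nucleotide-to-number mapping the paper uses, so everything here is an assumption: binary indicator sequences for A, C, G and T, a Yule-Walker fit with biased autocorrelation estimates, and a hypothetical default order of 6. The function names (`indicator`, `yule_walker`, `ar_psd`, `ar_spectrum`) are invented for this example.

```python
import numpy as np

def indicator(seq, base):
    """Binary indicator sequence: 1 where seq has `base`, else 0 (assumed mapping)."""
    return np.array([1.0 if ch == base else 0.0 for ch in seq.upper()])

def yule_walker(x, order):
    """Fit x[n] = sum_k a[k] x[n-k] + e[n] by solving the Yule-Walker equations."""
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order]; these keep R positive definite.
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])  # driving-noise variance
    return a, sigma2

def ar_psd(a, sigma2, freqs):
    """AR power spectrum sigma2 / |1 - sum_k a[k] exp(-j 2 pi f k)|^2."""
    k = np.arange(1, len(a) + 1)
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a
    return sigma2 / np.abs(transfer) ** 2

def ar_spectrum(seq, order=6, nfreq=301):
    """Sum of the AR spectra of the four indicator sequences on f in [0, 0.5]."""
    freqs = np.linspace(0.0, 0.5, nfreq)
    total = np.zeros(nfreq)
    for base in "ACGT":
        x = indicator(seq, base)
        if x.min() == x.max():  # constant indicator carries no spectral information
            continue
        a, sigma2 = yule_walker(x, order)
        total += ar_psd(a, sigma2, freqs)
    return freqs, total
```

Calling `ar_spectrum(seq)` on a short sequence and inspecting the returned spectrum near f = 1/3 gives a rough indication of 3-periodicity. Other AR estimators (for example Burg's method) or other numerical mappings could be substituted without changing the overall structure.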