Analysis of the predictability of time series obtained from genomic sequences by using several predictors

In previous papers, we used one-step-ahead predictors to compute recognition scores for genomic sequences, with each sequence coded as the distances between successive bases. The recognition scores then served as inputs to a hierarchical decision system. Because the relevance of these scores may depend on prediction quality, prediction performance must be assessed in a framework that accounts for the predictability of the analyzed time series. The aim of this paper is to determine which predictors are most suitable for genomic sequence identification. We analyze linear predictors (such as the linear combiner), neural predictors (RBF or MLP type), and neuro-fuzzy predictors (based on the Yamakawa model). Several measures of time series predictability are used, including the Hurst exponent, the autocorrelation function, and the eta metric. All predictors were tested and compared for prediction quality on sequences from the HIV-1 genome, using the mean square prediction error (MSPE), the direction test, and the Theil coefficient as performance measures. The results obtained with the different predictors are compared and discussed.
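
As an illustration only, the distance coding and the three performance measures named above might be sketched as follows. The abstract does not give exact definitions, so several choices here are assumptions: the distance series is taken per base (gaps between successive occurrences of the same base), the direction test is read as the fraction of steps whose predicted direction of change matches the actual one, and the Theil coefficient is taken in its U1 form. All function names are hypothetical.

```python
import math

def distance_series(seq, base):
    """Gaps between successive occurrences of `base` in `seq` (assumed coding)."""
    positions = [i for i, b in enumerate(seq) if b == base]
    return [q - p for p, q in zip(positions, positions[1:])]

def mspe(actual, predicted):
    """Mean square prediction error over aligned samples."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def direction_score(actual, predicted):
    """Fraction of steps where the predicted change (relative to the previous
    actual value) has the same sign as the actual change (assumed definition)."""
    hits = 0
    for t in range(1, len(actual)):
        real_change = actual[t] - actual[t - 1]
        pred_change = predicted[t] - actual[t - 1]
        if real_change * pred_change > 0 or (real_change == 0 and pred_change == 0):
            hits += 1
    return hits / (len(actual) - 1)

def theil_u1(actual, predicted):
    """Theil U1: RMSE divided by the sum of the root mean squares of both series
    (one of several variants; assumed here)."""
    n = len(actual)
    rmse = math.sqrt(mspe(actual, predicted))
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    rms_predicted = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse / (rms_actual + rms_predicted)

# Toy sequence, not HIV-1 data: occurrences of "A" at positions 0, 4, 5, 8.
gaps = distance_series("ACGTAACGA", "A")
```

Any one-step-ahead predictor (linear, neural, or neuro-fuzzy) could be scored with these functions by passing the observed distance series as `actual` and its predictions as `predicted`.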