Identifying DNA splice sites using patterns statistical properties and fuzzy neural networks

This study introduces a new approach for recognizing the boundaries between the parts of a DNA sequence that are retained after splicing and the parts that are spliced out. The basic idea is to derive a new dataset from the original data in order to improve the accuracy of well-known classification algorithms. The most accurate results are obtained with a derived dataset that consists of the most highly correlated features together with selected statistical properties of the DNA sequences. Furthermore, an adaptive network-based fuzzy inference system (ANFIS) trained on the derived dataset outperforms the well-known classification algorithms. The classification rate achieved by the new approach is 95.23 %, whereas Levenberg-Marquardt, generalized regression neural networks, radial basis functions and learning vector quantization obtain classification rates of 92.12 %, 86.75 %, 83.13 % and 84.51 %, respectively. Moreover, this approach can represent the DNA splice site problem in the form of if-then rules and hence provides insight into the properties of the problem.
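
The idea of a derived dataset can be illustrated with a minimal sketch. The abstract does not specify the encoding, the correlation measure or which statistical properties are used, so everything below is an assumption made for illustration: a one-hot encoding, Pearson correlation against the class label, and per-sequence nucleotide frequencies as the statistical properties. The function names and the toy sequences are hypothetical, and the ANFIS classifier itself is not shown.

```python
from collections import Counter
from statistics import mean, pstdev

NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA window as a flat 0/1 vector, four slots per position."""
    return [1.0 if base == n else 0.0 for base in seq for n in NUCLEOTIDES]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)

def top_correlated(encoded, labels, k):
    """Indices of the k features with the largest |correlation| to the label."""
    n_features = len(encoded[0])
    scores = [abs(pearson([row[j] for row in encoded], labels))
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def base_frequencies(seq):
    """Assumed 'statistical properties': relative frequency of A, C, G, T."""
    counts = Counter(seq)
    return [counts[n] / len(seq) for n in NUCLEOTIDES]

def derive_dataset(seqs, labels, k):
    """Keep the k most label-correlated features and append base frequencies."""
    encoded = [one_hot(s) for s in seqs]
    keep = top_correlated(encoded, labels, k)
    return [[row[j] for j in keep] + base_frequencies(s)
            for row, s in zip(encoded, seqs)]

# Toy data (hypothetical): label 1 marks windows with "GT" at positions 2-3.
seqs = ["AGGTAA", "CAGTGC", "TTGTAC", "ACCAGT", "GTTCAA", "CCAAGA"]
labels = [1, 1, 1, 0, 0, 0]
derived = derive_dataset(seqs, labels, k=2)
```

Each row of `derived` holds the selected features followed by four frequency values, and could then be passed to any classifier. In this toy setting the two selected features are the indicators for G at position 2 and T at position 3, since only those separate the two classes perfectly.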