Investigating an Artificial Immune System to strengthen protein structure prediction and protein coding region identification using the Cellular Automata classifier

Genes carry the instructions for making proteins, encoded as specific sequences of nucleotides in DNA molecules. However, the regions of a gene that code for proteins may occupy only a small part of the sequence. Identifying these coding regions plays a vital role in understanding the genes. In this paper we explore an Artificial Immune System (AIS) that can be used to strengthen the identification of protein coding regions in genomic DNA in changing environments, together with the Cellular Automata (CA) technique for protein structure prediction of small alpha/beta proteins using Rosetta. From an initial round of Rosetta sampling, we learn properties of the energy landscape that guide a subsequent round of sampling toward lower-energy structures. Three approaches to improving tertiary fold prediction using the genetic algorithm are discussed: refinement of the search strategy; combination of prediction and experiment; and inclusion of experimental data as selection criteria in the genetic algorithm. The proposed classifier was developed using a slight variant of the genetic algorithm. Good classifiers can be produced, especially when the number of antigens is increased. However, increasing the range of the antigens affects the fitness of the immune system. Experimental results confirm the scalability of the proposed AIS-FMACA-based classifier in handling large datasets irrespective of the number of classes, tuples and attributes. We note an increase in accuracy of more than 5.2% over existing standard algorithms that address this problem. This is the first algorithm to identify protein coding regions in mixed and also non-overlapping exon-intron boundary DNA sequences. The accuracy of protein structure prediction was also found to be comparable.
