Generation and evaluation of dimension-reduced amino acid parameter representations by artificial neural networks

Abstract. To process protein data, a numerical representation of each amino acid is often necessary. Many suitable parameters can be derived from experiments or from statistical analysis of databases. To use these sources of information quickly and efficiently, the relevant information must be extracted from these parameters and their number reduced. In this approach, established methods such as principal component analysis (PCA) are supplemented by a method based on symmetric neural networks. Two different parameter representations of amino acids are reduced from five and seven dimensions, respectively, to one, two, three, or four dimensions, using symmetric neural networks with either one or three hidden layers. General reduced parameter representations for amino acids can be created in this way. To demonstrate the usefulness of this approach, the reduced parameter sets are applied to the ab initio prediction of protein secondary structure from primary structure alone. Artificial neural networks are implemented and trained on a diverse set of 430 proteins from the PDB. With the reduced parameter representations, both training and prediction are substantially faster than with the complete parameter set, with no decrease in accuracy. The method is transferable to other amino acids or even to other molecular building blocks, such as nucleic acids, and therefore represents a general approach.
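The abstract does not specify the network details. Below is a minimal NumPy sketch of the general idea under our own assumptions. We interpret the "symmetric neural network" as an autoassociative (encoder/decoder) network. It is trained to reproduce a d-dimensional amino acid parameter vector through a k-unit bottleneck, and the k hidden activations serve as the reduced representation. PCA appears alongside as the linear baseline. The tanh hidden layer, linear output layer, plain gradient descent, learning rate, and the toy parameter table are all hypothetical choices, not the authors' setup, and only the one-hidden-layer variant is shown.

```python
import numpy as np

def standardize(X):
    """Scale each parameter (column) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_reduce(X, k):
    """Project rows of X onto the first k principal components (linear baseline)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def train_symmetric_net(X, k, epochs=5000, lr=0.02, seed=0):
    """Train a d-k-d autoassociative net (one tanh hidden layer of k units,
    linear output) to reproduce its input by full-batch gradient descent.
    Hypothetical helper; returns (encode, mean squared reconstruction error)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, k)); b1 = np.zeros(k)
    W2 = rng.normal(scale=0.5, size=(k, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # bottleneck (reduced) representation
        E = (H @ W2 + b2) - X             # reconstruction error
        gY = 2.0 * E / n                  # gradient of mean per-sample squared error
        gH = (gY @ W2.T) * (1.0 - H**2)   # backpropagate through tanh (uses old W2)
        W2 -= lr * (H.T @ gY); b2 -= lr * gY.sum(axis=0)
        W1 -= lr * (X.T @ gH); b1 -= lr * gH.sum(axis=0)
    encode = lambda Z: np.tanh(Z @ W1 + b1)
    mse = float(np.mean((encode(X) @ W2 + b2 - X) ** 2))
    return encode, mse

# Hypothetical toy table: 20 amino acids x 5 parameters, generated from
# 2 latent factors plus noise (NOT the parameter sets used in the paper)
rng = np.random.default_rng(1)
latent = rng.normal(size=(20, 2))
X = standardize(latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(20, 5)))

Z_pca = pca_reduce(X, 2)                  # 5 -> 2 dimensions, linear
encode, mse = train_symmetric_net(X, 2)
Z_net = encode(X)                         # 5 -> 2 dimensions, values in [-1, 1]
print(Z_pca.shape, Z_net.shape, round(mse, 3))
```

With a single hidden layer and a linear output layer, every reconstruction lies in a k-dimensional affine subspace, so this network cannot beat rank-k PCA in squared reconstruction error. Nonlinear compression beyond PCA generally requires additional hidden layers, as in the three-hidden-layer variant mentioned above.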