Neural network model of the genetic code is strongly correlated to the GES scale of amino acid transfer free energies.

A neural network trained to classify the 61 nucleotide triplets of the genetic code into 20 amino acid categories develops in its internal representation a pattern matching the relative cost of transferring amino acids with satisfied backbone hydrogen bonds from water to an environment of dielectric constant of roughly 2.0. Such environments are typically found in lipid membranes or in the interior of proteins. In learning the mapping between the codons and the categories, the network groups the amino acids according to the scale of transfer free energies developed by Engelman, Goldman and Steitz. Several other scales based on internal preference statistics also agree reasonably well with the network grouping. The network is able to relate the structure of the genetic code to quantifications of amino acid hydrophobicity-hydrophilicity more systematically than the numerous attempts made earlier. Due to its inherent non-linearity, the code is also shown to impose decisive constraints on algorithmic analysis of the protein coding potential of DNA.