A new approach to clustering the amino acids.

Each amino acid is represented by a vector of numerical measurements of the attributes volume, area, hydrophilicity, polarity, hydrogen bonding, shape, and charge. Inter-residue distances are then calculated according to common metrics, and we introduce a new clustering objective function derived from information-theoretic considerations. The arguments of the function are the distances between the objects to be clustered, in this case the amino acids. The residues are then partitioned into clusters by approximately solving an integer programming problem. The clusters obtained are found to be similar to the groups obtained in substitution/mutation studies. This supplies what is probably the strongest and most objective evidence to date that physico-chemical properties account for the viability of substitutions, and that the important similarities and differences are explained by a relatively small and simple set of properties.
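
As a rough illustration of the pipeline described above (attribute vectors, pairwise distances, partition by approximate optimization), the following is a minimal Python sketch. It is not the method of the paper: the information-theoretic objective is not stated in this abstract, so the sketch substitutes a simple stand-in objective (the sum of within-cluster pairwise distances) and a greedy local search in place of the integer-programming approximation. The function names (`standardize`, `distance_matrix`, `within_cluster_cost`, `local_search`), the choice of Euclidean distance, and the toy input vectors are all assumptions made for illustration; no real amino acid measurements are used.

```python
import math
from itertools import combinations


def standardize(vectors):
    """Scale each attribute to zero mean and unit variance across objects."""
    n = len(vectors)
    cols = list(zip(*vectors))
    means = [sum(c) / n for c in cols]
    # A constant attribute has zero spread; use 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(v, means, stds)] for v in vectors]


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances (an assumed metric)."""
    n = len(vectors)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(vectors[i], vectors[j])
    return d


def within_cluster_cost(labels, d):
    """Stand-in objective: total distance between objects sharing a cluster.

    Like the objective in the text, it depends only on inter-object distances,
    but it is NOT the information-theoretic function the paper introduces.
    """
    return sum(d[i][j]
               for i, j in combinations(range(len(labels)), 2)
               if labels[i] == labels[j])


def local_search(d, k, labels=None):
    """Greedy heuristic: move one object at a time while the cost decreases."""
    n = len(d)
    labels = list(labels) if labels is not None else [i % k for i in range(n)]
    improved = True
    while improved:
        improved = False
        for i in range(n):
            if labels.count(labels[i]) == 1:
                continue  # never empty a cluster
            best, best_cost = labels[i], within_cluster_cost(labels, d)
            for c in range(k):
                if c == labels[i]:
                    continue
                trial = labels[:i] + [c] + labels[i + 1:]
                trial_cost = within_cluster_cost(trial, d)
                if trial_cost < best_cost - 1e-12:
                    best, best_cost = c, trial_cost
            if best != labels[i]:
                labels[i] = best
                improved = True
    return labels


# Toy two-attribute vectors, made up for illustration (not residue data).
toy_vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels = local_search(distance_matrix(standardize(toy_vectors)), k=2)
print(toy_labels)
```

A local search of this kind only reaches a local optimum and depends on the starting partition, which is one reason an approximation to an integer program, as described in the text, may be preferable in practice.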