A hierarchical neural network for predicting protein functions

This paper introduces a modified feedforward neural network for the problem of predicting protein functions. Since this classification task is inherently hierarchical, two different architectures are proposed for the modified feedforward neural network, both mirroring the hierarchical structure of the classes (protein functions) to be predicted. The first approach consists of four feedforward neural networks in cascade, each taking as input the classification produced by the previous network; that is, the input to a network is the set of classes that could be assigned to the protein at the immediately higher (parent) level of the class hierarchy. The second approach extends the first by also feeding the attributes of the protein being classified into each sub-network. In both cases, two kinds of feedforward architectures were used: an Adaline network, composed of a single layer of adjustable weights, and an MLP (Multi-Layer Perceptron), composed of two layers of adjustable weights. Both approaches were compared with a baseline consisting of a single MLP that maps the input attributes directly to the classes at the lowest level of the hierarchy; this baseline MLP comprises an input layer, one hidden layer, and one output layer. The three approaches were compared on eight datasets, the first four involving the prediction of GPCR (G-Protein Coupled Receptor) functions and the remaining four involving the prediction of enzyme functions. The results show that a big-bang hierarchical neural network based on the MLP paradigm, using a top-down evaluation of new instances, performs better on hierarchical problems than its flat counterpart.
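
To make the cascaded design concrete, the following is a minimal sketch (not the authors' code) of the two hierarchical approaches, assuming PyTorch and hypothetical layer sizes and class counts. Approach 1 feeds each level only the parent-level class scores; approach 2 additionally concatenates the original protein attributes at every level.

```python
import torch
import torch.nn as nn


class HierarchicalCascade(nn.Module):
    """Cascade of per-level networks for hierarchical classification.

    classes_per_level: e.g. [5, 12, 30, 80] -- hypothetical class counts
    for the four hierarchy levels; use_attributes=False gives approach 1,
    use_attributes=True gives approach 2.
    """

    def __init__(self, n_attributes, classes_per_level, use_attributes=False):
        super().__init__()
        self.use_attributes = use_attributes
        self.levels = nn.ModuleList()
        prev = n_attributes  # the top-level network always sees the raw attributes
        for n_classes in classes_per_level:
            extra = n_attributes if (use_attributes and len(self.levels) > 0) else 0
            # One hidden layer plus an output layer, i.e. two layers of
            # adjustable weights (the Adaline variant would use a single
            # nn.Linear instead); hidden size 64 is an assumption.
            self.levels.append(nn.Sequential(
                nn.Linear(prev + extra, 64),
                nn.Sigmoid(),
                nn.Linear(64, n_classes),
            ))
            prev = n_classes  # the next level consumes this level's class scores

    def forward(self, attributes):
        outputs = []
        x = attributes
        for i, net in enumerate(self.levels):
            if self.use_attributes and i > 0:
                x = torch.cat([x, attributes], dim=-1)  # approach 2: add attributes
            x = torch.sigmoid(net(x))  # class scores passed down the cascade
            outputs.append(x)
        return outputs  # one prediction vector per hierarchy level
```

In this sketch, evaluating a new instance proceeds top-down: the first level is predicted from the protein attributes alone, and each deeper level conditions on the scores assigned at the parent level, which is the behavior the abstract attributes to the better-performing hierarchical MLP.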