Feature discovery under contextual supervision using mutual information

The author considers a neural network in which the inputs may be divided into two groups, termed primary inputs and contextual inputs. The goal of the network is to discover those linear functions of the primary inputs that are maximally related to the information contained in the contextual units. The strength of the relationship between the two sets of inputs is measured by their average mutual information. When the inputs follow a multivariate, elliptically symmetric probability model, this is equivalent to performing a canonical correlation analysis. A stochastic algorithm is introduced to achieve this analysis. Some theoretical details, including a convergence result, are presented, and some possible nonlinear extensions are discussed.
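As a rough illustration of the equivalence stated above, in the Gaussian (and more generally elliptically symmetric) case the mutual information between the two sets of projections is a function only of the canonical correlations ρᵢ, namely I = −½ Σᵢ log(1 − ρᵢ²). The sketch below (a batch computation with NumPy, not the stochastic algorithm of the paper; data and variable names are hypothetical) computes the canonical correlations of synthetic primary and contextual inputs and the corresponding Gaussian mutual information:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic primary (X) and contextual (Y) inputs sharing one latent source z.
n = 5000
z = rng.standard_normal(n)
X = np.column_stack([z + 0.5 * rng.standard_normal(n),
                     rng.standard_normal(n)])
Y = np.column_stack([z + 0.5 * rng.standard_normal(n),
                     rng.standard_normal(n)])

# Canonical correlation analysis: singular values of the whitened
# cross-covariance Cxx^{-1/2} Cxy Cyy^{-1/2} are the canonical correlations.
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx = Xc.T @ Xc / n
Cyy = Yc.T @ Yc / n
Cxy = Xc.T @ Yc / n
Wx = np.linalg.inv(np.linalg.cholesky(Cxx))  # acts as Cxx^{-1/2}
Wy = np.linalg.inv(np.linalg.cholesky(Cyy))  # acts as Cyy^{-1/2}
rho = np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)

# Gaussian mutual information between the two sets, from the
# canonical correlations: I = -1/2 * sum_i log(1 - rho_i^2).
mi = -0.5 * np.sum(np.log(1.0 - rho**2))
print("canonical correlations:", rho)
print("mutual information (nats):", mi)
```

By construction the first pair of variates shares the latent source z with correlation 1/1.25 = 0.8, so the leading canonical correlation should be near 0.8 and the second near zero; maximising the mutual information is therefore the same as maximising the canonical correlations, which is the equivalence the abstract invokes.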