A Kernel Method to Extract Common Features Based on Mutual Information

Kernel canonical correlation analysis (CCA) aims to extract common features from a pair of multivariate data sets by maximizing the linear correlation between nonlinear mappings of the data. However, kernel CCA tends to yield features that, despite being highly correlated, retain little information about the original multivariates, because it considers only statistics of the extracted features and the nonlinear mappings have many degrees of freedom. We propose a kernel method for common feature extraction based on mutual information that maximizes a new objective function. The objective function is a linear combination of two kinds of mutual information: one between the extracted features, and the other between each multivariate and its feature. A large value of the former makes the extracted features strongly dependent on each other, while the latter prevents the features from losing information about the multivariates they are extracted from. We maximize the objective function using parallel tempering MCMC to avoid becoming trapped in local maxima. We demonstrate the effectiveness of the proposed method through numerical experiments.
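To illustrate why parallel tempering helps with local maxima, the following is a minimal sketch on a one-dimensional toy problem. It is not the proposed method: the `objective` function is a hypothetical stand-in for the mutual-information objective, and the temperature ladder, step size, search bounds, and starting point are all assumptions chosen for illustration. Several Metropolis chains run at different temperatures, and adjacent chains occasionally exchange states so that the broad exploration of the hot chains can reach the cold chain.

```python
import math
import random


def objective(x):
    # Hypothetical toy objective (not the paper's): a local maximum near
    # x = -3 (height about 0.5) and the global maximum near x = 3 (about 1).
    return math.exp(-(x - 3.0) ** 2) + 0.5 * math.exp(-(x + 3.0) ** 2)


def parallel_tempering(f, temps, x0, n_steps=5000, step=1.0,
                       lo=-6.0, hi=6.0, seed=0):
    """Maximize f on [lo, hi]; chain i targets a density ~ exp(f(x) / temps[i])."""
    rng = random.Random(seed)
    xs = [x0] * len(temps)
    best_x, best_f = x0, f(x0)
    for _ in range(n_steps):
        # Metropolis random-walk update inside every chain.
        for i, t in enumerate(temps):
            cand = xs[i] + rng.gauss(0.0, step)
            if lo <= cand <= hi:  # proposals outside the bounds are rejected
                log_ratio = (f(cand) - f(xs[i])) / t
                if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                    xs[i] = cand
            fx = f(xs[i])
            if fx > best_f:
                best_x, best_f = xs[i], fx
        # Propose exchanging the states of one random adjacent pair of chains.
        i = rng.randrange(len(temps) - 1)
        log_swap = (f(xs[i + 1]) - f(xs[i])) * (1.0 / temps[i] - 1.0 / temps[i + 1])
        if log_swap >= 0 or rng.random() < math.exp(log_swap):
            xs[i], xs[i + 1] = xs[i + 1], xs[i]
    return best_x, best_f


# Assumed temperature ladder, coldest first; all chains start at the local maximum.
temps = [0.02, 0.1, 0.5, 2.5]
best_x, best_f = parallel_tempering(objective, temps, x0=-3.0)
print(f"best x = {best_x:.2f}, objective = {best_f:.3f}")
```

The coldest chain alone would rarely leave the neighborhood of x = -3, since moving toward x = 3 requires crossing a region where the objective is close to zero. The hottest chain sees a nearly flat landscape and crosses that region freely, and the swap moves pass good states down the ladder.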