Dimensionality reduction based on ICA for regression problems

In supervised learning and related data-analysis tasks, new features are often extracted from the original ones, both to reduce the dimension of the feature space and to improve performance. In this paper, we show how standard algorithms for independent component analysis (ICA) can be applied to extract features for regression problems. The advantage is that general-purpose ICA algorithms can then be used for feature extraction in regression, by maximizing the joint mutual information between the target variable and the new features. Using the new features, the dimension of the feature space can be greatly reduced without degrading regression performance.
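To make the idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a simplified two-step stand-in: run an ordinary (unsupervised) symmetric FastICA on the whitened inputs, then keep the components whose histogram-estimated mutual information with the target is largest. It scores each component on its own rather than maximizing the joint mutual information, and all function names, the synthetic data, and the bin count are illustrative assumptions.

```python
import numpy as np

def whiten(X):
    """Center X (n_samples, n_features) and rescale it to identity covariance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs / np.sqrt(eigvals)

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ W

def fast_ica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity on whitened data Z."""
    n, d = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((d, d)))
    for _ in range(n_iter):
        U = Z @ W.T                      # current component estimates, (n, d)
        G = np.tanh(U)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w for every row w.
        W_new = (G.T @ Z) / n - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row is unchanged up to sign.
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T

def hist_mutual_info(u, y, bins=16):
    """Plug-in histogram estimate (in nats) of I(u; y) for two 1-D arrays."""
    joint, _, _ = np.histogram2d(u, y, bins=bins)
    p = joint / joint.sum()
    pu = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (pu @ py)[mask])))

def ica_reduce(X, y, k):
    """Keep the k independent components with the highest estimated MI with y."""
    comps = fast_ica(whiten(X))
    scores = np.array([hist_mutual_info(comps[:, j], y)
                       for j in range(comps.shape[1])])
    order = np.argsort(scores)[::-1][:k]
    return comps[:, order], scores[order]

# Synthetic example (assumed data): four uniform sources are mixed linearly,
# and the target depends on the first source only.
rng = np.random.default_rng(42)
n = 2000
S = rng.uniform(-1.0, 1.0, size=(n, 4))
A = rng.standard_normal((4, 4))
X = S @ A.T
y = 2.0 * S[:, 0] + 0.1 * rng.standard_normal(n)

features, scores = ica_reduce(X, y, k=1)
print(features.shape, scores)
```

In this toy setting a single retained component should carry nearly all of the information about the target, which is the kind of dimension reduction the abstract describes. A method that maximizes the joint mutual information directly would use the target during extraction, not only when ranking components afterwards.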
