Feature Extraction via Neural Networks

A method for feature extraction that makes use of feedforward neural networks with a single hidden layer is presented. The topology of each network is determined by a network construction algorithm followed by a network pruning algorithm. Network construction starts with a single hidden unit; additional units are added only when they are needed to improve the network's predictive accuracy. Once a fully connected network has been constructed, irrelevant and redundant connections are removed by pruning. The hidden unit activations of the pruned network are the features extracted from the original dataset. Using artificial datasets, we illustrate how the method works and interpret the extracted features in terms of the original attributes of the datasets. We also discuss how the feature extraction method can be used in conjunction with other learning algorithms, such as decision tree methods, to obtain robust and effective classifiers.
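
To make the grow-then-prune pipeline concrete, the sketch below implements the three stages in plain NumPy: start with one hidden unit, add units while accuracy stays below a target, zero out weak connections, and read the hidden activations off as features. Everything here is an assumption for illustration: the class name, the growth trigger, the hyperparameters, and especially the pruning rule (simple magnitude thresholding, not the likelihood-based construction or penalty-function pruning the authors actually use).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConstructivePrunedNet:
    """Single-hidden-layer net grown one unit at a time, then pruned.

    A minimal sketch of the paper's pipeline; the growth trigger,
    the magnitude-based pruning rule, and all hyperparameters are
    illustrative assumptions, not the authors' implementation.
    """

    def __init__(self, n_inputs, lr=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(scale=0.5, size=(1, n_inputs))  # start with 1 hidden unit
        self.v = self.rng.normal(scale=0.5, size=1)              # hidden-to-output weights
        self.lr = lr

    def _forward(self, X):
        H = sigmoid(X @ self.W.T)   # hidden activations = candidate features
        y = sigmoid(H @ self.v)     # scalar output for binary classification
        return H, y

    def _train_epochs(self, X, t, epochs=2000):
        for _ in range(epochs):
            H, y = self._forward(X)
            err = y - t  # cross-entropy gradient at the output pre-activation
            self.v -= self.lr * H.T @ err / len(t)
            dH = np.outer(err, self.v) * H * (1 - H)
            self.W -= self.lr * dH.T @ X / len(t)

    def fit(self, X, t, target_acc=1.0, max_hidden=10):
        # Constructive phase: retrain, and add a unit while accuracy is short of target.
        while True:
            self._train_epochs(X, t)
            _, y = self._forward(X)
            acc = np.mean((y > 0.5) == t)
            if acc >= target_acc or self.W.shape[0] >= max_hidden:
                break
            self.W = np.vstack([self.W, self.rng.normal(scale=0.5, size=(1, X.shape[1]))])
            self.v = np.append(self.v, self.rng.normal(scale=0.5))

    def prune(self, threshold=0.1):
        # Pruning phase: drop small-magnitude connections (a stand-in for
        # the penalty-function approach used in the paper).
        self.W[np.abs(self.W) < threshold] = 0.0

    def extract_features(self, X):
        # The pruned network's hidden activations are the extracted features.
        H, _ = self._forward(X)
        return H
```

A toy usage on XOR-style data, where the extracted features could then be handed to another learner such as a decision tree, as the abstract suggests:

```python
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 0], dtype=float)
net = ConstructivePrunedNet(n_inputs=2)
net.fit(X, t)
net.prune()
features = net.extract_features(X)  # input to a downstream classifier
```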
