Feature selection in text categorization using the Baldwin effect

Text categorization is the problem of automatically assigning predefined categories to natural language texts. A major difficulty of this problem stems from the high dimensionality of its feature space. Reducing the dimensionality, that is, selecting a good subset of features without sacrificing accuracy, is of great importance if neural networks are to be successfully applied to the area. In this paper, we propose a neuro-genetic approach to feature selection in text categorization. Candidate feature subsets are evaluated by training three-layer feedforward neural networks. The Baldwin effect, which concerns the tradeoff between learning and evolution, is used in our research to guide and improve the GA-based evolution of feature subsets. Experimental results show that our neuro-genetic algorithm performs as well as, if not better than, the best neural-network results reported to date, while using fewer input features.
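The approach described above, a genetic algorithm that evolves binary feature masks and scores each mask with a trained classifier, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the data are synthetic, and a nearest-centroid classifier stands in for the three-layer feedforward network the paper trains on each candidate subset. Consistent with the Baldwinian scheme, the "learned" evaluator only shapes fitness; the learned parameters are never written back into the genome.

```python
import random

random.seed(0)

N_FEAT = 20  # total vocabulary/feature count in this toy setup

# Toy data: only features 0-3 carry the class signal; the rest are noise.
def make_point(label):
    x = [random.random() for _ in range(N_FEAT)]
    for i in range(4):
        x[i] = label + 0.3 * random.random()
    return x

DATA = [(make_point(y), y) for y in (0, 1) for _ in range(30)]

def fitness(mask):
    # Stand-in evaluator: nearest-centroid accuracy on the selected
    # features (the paper would train a feedforward net here instead).
    # A small penalty term favors smaller feature subsets.
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    cent = {}
    for y in (0, 1):
        pts = [x for x, lab in DATA if lab == y]
        cent[y] = [sum(p[i] for p in pts) / len(pts) for i in idx]
    def dist(x, c):
        return sum((x[i] - cj) ** 2 for i, cj in zip(idx, c))
    correct = sum(
        1 for x, y in DATA
        if min((0, 1), key=lambda lab: dist(x, cent[lab])) == y
    )
    return correct / len(DATA) - 0.01 * len(idx)

def evolve(pop_size=30, n_gen=40):
    # Each chromosome is a bitstring: bit i == 1 keeps feature i.
    pop = [[random.randint(0, 1) for _ in range(N_FEAT)]
           for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_FEAT)   # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(N_FEAT):             # bit-flip mutation
                if random.random() < 0.02:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("selected features:", sum(best), "fitness:", round(fitness(best), 3))
```

The same loop applies with the real evaluator swapped in: training a small network on each subset makes fitness evaluation the dominant cost, which is why keeping the subsets (and hence the input layer) small matters.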