Chinese Web page classification based on the CFS-GA feature selection algorithm

To reduce the dimensionality of the feature space and improve the precision of Chinese Web page classification, a feature selection method combining Correlation-based Feature Selection (CFS) and a Genetic Algorithm (GA) is applied. In the CFS-GA algorithm, each feature subset is represented as a chromosome with a binary encoding, and the CFS value serves as the GA's fitness function for evaluating each chromosome. The higher an individual's CFS value, the more likely it is to be passed on to the next generation. Because the GA searches globally, the algorithm can approach a globally optimal feature subset. Experiments were conducted on the Weka platform using the Chinese Web page dataset provided by Sogou Lab. The results show that the algorithm effectively reduces the dimensionality of the feature space and improves classification precision.
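The CFS-GA loop summarized above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The fitness is Hall's CFS merit, Merit_S = k·r_cf / sqrt(k + k(k-1)·r_ff), where k is the number of selected features, r_cf is the mean feature-class correlation, and r_ff is the mean feature-feature inter-correlation. The correlation measure (absolute Pearson correlation here, whereas Weka's CfsSubsetEval uses, for example, symmetrical uncertainty on discretized attributes), roulette-wheel selection, single-point crossover, bit-flip mutation, elitism, and every parameter value are assumptions, and the function names are hypothetical.

```python
import math
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def cfs_merit(chromosome, columns, labels):
    """CFS merit of the subset encoded by a 0/1 chromosome (bit i = keep feature i)."""
    idx = [i for i, bit in enumerate(chromosome) if bit]
    k = len(idx)
    if k == 0:
        return 0.0
    r_cf = sum(abs_pearson(columns[i], labels) for i in idx) / k
    pairs = [(i, j) for p, i in enumerate(idx) for j in idx[p + 1:]]
    r_ff = (sum(abs_pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def roulette(population, fitness, rng):
    """Fitness-proportional selection: a higher CFS merit gives a higher chance of being picked."""
    total = sum(fitness)
    if total == 0:
        return rng.choice(population)[:]
    pick, acc = rng.uniform(0, total), 0.0
    for chrom, fit in zip(population, fitness):
        acc += fit
        if acc >= pick:
            return chrom[:]
    return population[-1][:]


def cfs_ga(columns, labels, pop_size=20, generations=30,
           p_cross=0.8, p_mut=0.05, seed=0):
    """Search binary-encoded feature subsets with a GA whose fitness is the CFS merit."""
    rng = random.Random(seed)
    n = len(columns)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for gen in range(generations + 1):
        fitness = [cfs_merit(c, columns, labels) for c in population]
        for chrom, fit in zip(population, fitness):
            if fit > best_fit:
                best, best_fit = chrom[:], fit
        if gen == generations:
            break  # final population evaluated; stop before breeding again
        nxt = [best[:]]  # elitism (an assumption): keep the best subset found so far
        while len(nxt) < pop_size:
            a, b = roulette(population, fitness, rng), roulette(population, fitness, rng)
            if n > 1 and rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, n)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                for i in range(n):  # bit-flip mutation
                    if rng.random() < p_mut:
                        child[i] ^= 1
                if len(nxt) < pop_size:
                    nxt.append(child)
        population = nxt
    return best, best_fit


if __name__ == "__main__":
    labels = [0, 1, 0, 1, 1, 0, 1, 0]
    columns = [
        [0, 1, 0, 1, 1, 0, 1, 0],  # f0: identical to the label
        [0, 1, 0, 1, 0, 0, 1, 1],  # f1: agrees with the label 6 times out of 8
        [1, 0, 0, 1, 0, 1, 1, 0],  # f2: uncorrelated with the label
        [1, 1, 1, 1, 1, 1, 1, 1],  # f3: constant, carries no information
    ]
    best, fit = cfs_ga(columns, labels)
    print("selected features:", [i for i, bit in enumerate(best) if bit], "merit:", round(fit, 3))
```

On this toy data, f0 alone has the maximum merit of 1.0, while adding the partially redundant f1 lowers the merit to about 0.866 (1.5 / sqrt(3)), because the denominator penalizes inter-correlation among selected features. This redundancy penalty is what lets a higher CFS value favor compact, informative subsets.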