An Improved Genetic Algorithm for Text Clustering

The genetic algorithm (GA) is a self-adapted probability search method used to solve optimization problems, which has been applied widely in science and engineering. In this paper, we propose an improved variable string length genetic algorithm (IVGA) for text clustering. Our algorithm has been exploited for automatically evolving the optimal number of clusters as well as providing proper data set clustering. The chromosome is encoded by special indices to indicate the location of each gene. More effective version of evolutional steps can automatically adjust the influence between the diversity of the population and selective pressure during generations. The superiority of the improved genetic algorithm over conventional variable string length genetic algorithm (VGA) is demonstrated by providing proper text clustering.

[1]  Guan Yi A Survey of Document Clustering , 2006 .

[2]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Ujjwal Maulik,et al.  Genetic algorithm-based clustering technique , 2000, Pattern Recognit..

[4]  Ujjwal Maulik,et al.  Performance Evaluation of Some Clustering Algorithms and Validity Indices , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Minoru Sasaki,et al.  Spam detection using text clustering , 2005, 2005 International Conference on Cyberworlds (CW'05).

[6]  Xiaoqing Ding,et al.  Combining text clustering and retrieval for corpus adaptation , 2007, Electronic Imaging.

[7]  Xin Yao,et al.  Evolutionary programming made faster , 1999, IEEE Trans. Evol. Comput..

[8]  S. Bandyopadhyay,et al.  Nonparametric genetic clustering: comparison of validity indices , 2001, IEEE Trans. Syst. Man Cybern. Syst..