Performance analysis of Particle Swarm Optimization applied to unsupervised categorization of short texts

Nowadays there is a need to access online information such as abstracts, news, opinions, and product evaluations. That information is generally available on the web as short texts. Previous work has demonstrated the effectiveness of a discrete Particle Swarm Optimization algorithm, named CLUDIPSO, for clustering small short-text corpora. This article presents a preliminary study of the performance of CLUDIPSO on larger short-text corpora. The results were compared with those of the most representative state-of-the-art algorithms in the area. The experimental work gives strong evidence of the drawbacks of this algorithm when handling larger corpora. In light of this, some possible reasons for the poor behavior of CLUDIPSO on larger short-text corpora are discussed, and some alternatives for solving the observed problems are considered.
