Web Text Mining Using Harmony Search

In recent years, the Harmony Search (HS) algorithm has been applied to many problems in computer science and engineering. This chapter reviews the application of HS to web document clustering. Clustering is a problem of great practical importance that has been the focus of substantial research in several domains for decades. It is defined as the problem of partitioning data objects into groups such that objects in the same group are similar, while objects in different groups are dissimilar. Because documents are high-dimensional and sparse, clustering becomes more challenging when applied to web documents. Two HS-based algorithms for clustering web documents have been proposed in the literature and are reviewed in this chapter, along with three hybridizations of HS-based clustering with the K-means algorithm. It is shown that the HS method can outperform other methods in terms of solution quality and computational time.
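As a rough illustration of how HS can be cast as a clustering method, the sketch below searches over cluster assignments for a handful of toy documents. It is a generic textbook-style HS loop, not either of the algorithms reviewed in the chapter. The toy term-frequency vectors, the objective (mean cosine similarity of each document to its cluster centroid), the label-shift form of pitch adjustment, and the parameter names `hms`, `hmcr`, `par`, and `iters` are all assumptions made for this example.

```python
import math
import random

# Toy term-frequency vectors (assumed data): three documents per topic.
DOCS = [
    [3, 2, 0, 0], [2, 3, 0, 0], [4, 1, 0, 0],
    [0, 0, 3, 2], [0, 0, 1, 4], [0, 0, 2, 2],
]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def fitness(labels, docs, k):
    """Mean cosine similarity of each document to its cluster centroid.

    The per-cluster sum vector is used in place of the centroid, since
    cosine similarity does not depend on vector length.
    """
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    for lab, doc in zip(labels, docs):
        for j, x in enumerate(doc):
            sums[lab][j] += x
    total = sum(cosine(doc, sums[lab]) for lab, doc in zip(labels, docs))
    return total / len(docs)

def harmony_search(docs, k, hms=5, hmcr=0.9, par=0.3, iters=500, seed=0):
    """Generic HS over label vectors; one decision variable per document."""
    rng = random.Random(seed)
    n = len(docs)
    # Harmony memory: hms random assignments and their fitness values.
    memory = [[rng.randrange(k) for _ in range(n)] for _ in range(hms)]
    scores = [fitness(h, docs, k) for h in memory]
    for _ in range(iters):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                # Memory consideration: reuse a label stored for document i.
                lab = rng.choice(memory)[i]
                if rng.random() < par:
                    # Pitch adjustment (assumed form): shift to a neighboring label.
                    lab = (lab + rng.choice((-1, 1))) % k
            else:
                # Random selection from the full label range.
                lab = rng.randrange(k)
            new.append(lab)
        score = fitness(new, docs, k)
        worst = min(range(hms), key=scores.__getitem__)
        if score > scores[worst]:
            # Replace the worst stored harmony when the new one is better.
            memory[worst], scores[worst] = new, score
    best = max(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

if __name__ == "__main__":
    labels, score = harmony_search(DOCS, k=2)
    print(labels, round(score, 3))
```

Encoding a solution as one label per document keeps the example short. Real web collections are far larger and sparser, so a practical implementation would need sparse vectors and possibly a different solution encoding.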
