An Improved Semantic Smoothing Model for Model-Based Document Clustering

Semantic smoothing has recently been proposed as an efficient way to improve document clustering quality. However, the existing semantic smoothing model is not effective at improving clustering quality for partitional clustering. In this paper, inspired by the TF*IDF scheme and the background elimination strategy, we first introduce an improved semantic smoothing model that is suitable for both agglomerative and partitional clustering. Based on this improved model, we then present two model-based document clustering algorithms, one partitional and one agglomerative. Experimental results show that our algorithms improve cluster quality more effectively than previous methods.
