In this paper, we propose a new approximate clustering algorithm that improves the precision of top-down clustering. Top-down clustering was proposed by Iwayama et al. to improve clustering speed: the cluster tree is generated by sampling some documents, building a cluster from them, assigning the remaining documents to the nearest node, and, if the number of documents assigned to a node is large, repeating the sampling and clustering from the top down. To improve the precision of the top-down clustering method, we propose selecting documents by applying a genetic algorithm (GA) to decide a quasi-optimum layer, and using a minimum description length (MDL) criterion to evaluate the layer structure of a cluster tree.
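The basic top-down procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are assumed to be plain numeric vectors, "nearest" is assumed to mean smallest squared Euclidean distance, each sampled document simply becomes one child node, and the names `top_down_cluster`, `sample_size`, and `max_leaf` are hypothetical. The GA-based document selection and the MDL evaluation are not shown; sampling here is uniformly random.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_down_cluster(docs, ids=None, sample_size=2, max_leaf=3, rng=None):
    """Build a cluster tree top-down by repeated sampling and assignment.

    Assumptions (not from the paper): random sampling, squared Euclidean
    distance, and a fixed size threshold `max_leaf` for splitting a node.
    """
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2 so that nodes shrink")
    rng = rng or random.Random(0)
    if ids is None:
        ids = list(range(len(docs)))

    node = {"docs": list(ids), "children": []}
    # A node with few documents is kept as a leaf.
    if len(ids) <= max_leaf or len(ids) < sample_size:
        return node

    # Step 1: sample some documents; each one seeds a child node.
    seeds = rng.sample(ids, sample_size)
    groups = {s: [s] for s in seeds}

    # Step 2: assign every other document to its nearest seed.
    for i in ids:
        if i in groups:
            continue
        nearest = min(seeds, key=lambda s: sq_dist(docs[i], docs[s]))
        groups[nearest].append(i)

    # Step 3: recurse into children that are still large.
    node["children"] = [
        top_down_cluster(docs, g, sample_size, max_leaf, rng)
        for g in groups.values()
    ]
    return node


def leaves(node):
    """Return the document-id lists stored at the leaves of the tree."""
    if not node["children"]:
        return [node["docs"]]
    result = []
    for child in node["children"]:
        result.extend(leaves(child))
    return result


if __name__ == "__main__":
    docs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [21, 0]]
    tree = top_down_cluster(docs)
    print(leaves(tree))
```

Because the sampled documents are chosen at random in this sketch, the quality of the resulting tree depends on which documents happen to be drawn. That dependence is the weakness the proposed GA-based selection and MDL-based evaluation are meant to address.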
[1] Alfred V. Aho, et al. The Design and Analysis of Computer Algorithms. 1974.
[2] Takenobu Tokunaga, et al. A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values. ANLP, 1994.
[3] David E. Goldberg, et al. Genetic Algorithms in Search Optimization and Machine Learning. 1988.
[4] D. E. Goldberg, et al. Genetic Algorithms in Search. 1989.
[5] Takenobu Tokunaga, et al. Hierarchical Bayesian Clustering for Automatic Text Classification. IJCAI, 1995.