Incremental Document Clustering Using Multi-Representation Indexing Tree

Incremental document clustering is a powerful technique for large-scale topic discovery from an incrementally growing document collection. The Indexing Tree algorithm is efficient, but it tends to handle only spherical clusters. To address this problem, we present a novel Multi-Representation Indexing Tree (MRIT) algorithm that constructs a hierarchy accommodating arbitrarily shaped clusters while maintaining good performance. In contrast to the Indexing Tree algorithm, a cluster is decomposed into several sub-clusters and is represented as the union of these sub-clusters rather than by the center of the cluster. The similarity of a document to a cluster is given by the distance to the nearest neighbor among the cluster's representative points. Experimental results on a variety of domains demonstrate that our algorithm produces high-quality clusters, is insensitive to document input order, and is efficient in terms of computational time.
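
The difference between a single-center and a multi-representative cluster description can be illustrated with a minimal Python sketch. It is not the MRIT implementation: the function names, the use of Euclidean distance, the two-dimensional toy vectors, and the choice of sub-cluster centers as representative points are all assumptions made for illustration only.

```python
import math

def distance_to_cluster(doc, representatives):
    # Multi-representation rule: distance to the nearest representative point.
    return min(math.dist(doc, r) for r in representatives)

def centroid(points):
    # Single-center description of a cluster (the baseline being contrasted).
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def assign_multi(doc, clusters):
    # Pick the cluster whose nearest representative is closest to the document.
    return min(clusters, key=lambda name: distance_to_cluster(doc, clusters[name]))

def assign_center(doc, clusters):
    # Pick the cluster whose single center is closest to the document.
    return min(clusters, key=lambda name: math.dist(doc, centroid(clusters[name])))

# Hypothetical toy data: cluster "A" is elongated and is described by three
# sub-cluster representatives; cluster "B" is compact with one representative.
clusters = {
    "A": [[0.0, 0.0], [4.0, 0.0], [8.0, 0.0]],
    "B": [[8.0, 3.0]],
}
doc = [8.0, 1.0]

multi_choice = assign_multi(doc, clusters)
center_choice = assign_center(doc, clusters)
print(multi_choice, center_choice)
```

In this toy setting the document lies at the far end of the elongated cluster "A". The nearest-representative rule assigns it to "A", whereas comparing against a single center per cluster assigns it to "B", because the center of "A" sits in the middle of the elongated shape.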