Scalable construction of topic directory with nonparametric closed termset mining

A topic directory, e.g., Yahoo directory, provides a view of a document set at different levels of abstraction and is ideal for the interactive exploration and visualization of the document set. We present a method that dynamically generates a topic directory from a document set using a frequent closed termset mining algorithm. Our method shows experimental results of equal quality to recent document clustering methods and has additional benefits such as automatic generation of topic labels and determination of a clustering parameter.