Document Explorer: Discovering Knowledge in Document Collections

Document Explorer is a data mining system for document collections. Such a collection represents an application domain, and the primary goal of the system is to derive patterns that provide knowledge about this domain. The derived patterns can also be used to browse the collection. Document Explorer searches for patterns that capture relations between concepts of the domain. Patterns that have been verified as interesting are structured and presented in a visual user interface, which lets the user operate on the results to refine and redirect mining queries or to access the associated documents. The system offers preprocessing tools to construct or refine a knowledge base of domain concepts and to create an intermediate representation of the document collection, which is used by all subsequent data mining operations. The main pattern types the system can search for are frequent sets, associations, concept distributions, and keyword graphs. To let the user supply explicit bias, the system provides a dedicated query language for searching the vast implicit spaces of pattern instances that exist in the collection.
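
To make two of the pattern types concrete, the following is a minimal Python sketch of frequent concept sets and associations over a toy collection. It assumes each document has already been reduced to a set of concept labels, which is only a stand-in for the intermediate representation mentioned above. The function names, thresholds, and sample concepts are hypothetical, and the naive enumeration shown here is an illustration, not the system's actual algorithm or query language.

```python
from collections import Counter
from itertools import combinations


def frequent_concept_sets(docs, min_support, max_size=2):
    """Return concept sets that occur together in at least min_support documents.

    docs is a list of sets of concept labels (a hypothetical intermediate
    representation). Every subset up to max_size is enumerated naively.
    """
    counts = Counter()
    for concepts in docs:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(concepts), size):
                counts[frozenset(subset)] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


def associations(frequent, min_confidence):
    """Derive rules lhs -> rhs from frequent pairs.

    Confidence is support(lhs, rhs) / support(lhs). A frequent pair always
    has frequent single-concept subsets, so the lookup below cannot fail.
    """
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) != 2:
            continue
        for lhs in itemset:
            (rhs,) = itemset - {lhs}
            confidence = support / frequent[frozenset([lhs])]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy collection: each document is a set of (made-up) concept labels.
docs = [
    {"iran", "oil"},
    {"iran", "oil", "opec"},
    {"oil", "opec"},
    {"iran", "nuclear"},
]

frequent = frequent_concept_sets(docs, min_support=2)
rules = associations(frequent, min_confidence=0.9)
```

In this toy collection, "opec" never appears without "oil", so the only rule that survives the 0.9 confidence threshold is opec -> oil. Constraining the search by support and confidence thresholds is one simple form of the user-supplied bias the abstract describes.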
