Document retrieval for e-mail search and discovery using formal concept analysis

This paper describes a document discovery tool based on conceptual clustering by Formal Concept Analysis. The program lets users navigate e-mail through a visual lattice metaphor rather than a tree. It implements a virtual file structure over e-mail in which files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in e-mail discovery. The system provides more flexibility in retrieving stored e-mails than is normally available in e-mail clients. The paper also discusses how conceptual ontologies can leverage traditional document retrieval systems and aid knowledge discovery in document collections.
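To make the lattice idea concrete, the following is a minimal sketch of how formal concepts could be derived from a small e-mail collection. It is not the tool's implementation: the e-mail identifiers, the keywords, and the function names (`mails_with`, `common_attributes`, `formal_concepts`) are all hypothetical, and the brute-force enumeration over attribute subsets is only practical for toy inputs.

```python
from itertools import combinations

# Hypothetical toy context: e-mail ids mapped to the keywords assigned to them.
context = {
    "mail1": {"conference", "deadline"},
    "mail2": {"conference", "travel"},
    "mail3": {"conference", "deadline", "travel"},
    "mail4": {"invoice"},
}


def mails_with(attributes, context):
    """Return the e-mails that carry every attribute in `attributes`."""
    return {m for m, attrs in context.items() if attributes <= attrs}


def common_attributes(mails, context, all_attributes):
    """Return the attributes shared by every e-mail in `mails`.

    For an empty set of e-mails this is the full attribute set.
    """
    shared = set(all_attributes)
    for m in mails:
        shared &= context[m]
    return shared


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset."""
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(all_attributes) + 1):
        for subset in combinations(sorted(all_attributes), size):
            extent = mails_with(set(subset), context)
            intent = common_attributes(extent, context, all_attributes)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for extent, intent in sorted(formal_concepts(context),
                                 key=lambda c: (-len(c[0]), sorted(c[1]))):
        print(sorted(extent), sorted(intent))
```

In this toy context, `mail3` belongs to the extent of several concepts at once (for example, the ones for "conference", for "conference, deadline", and for "conference, travel"). This is the sense in which an e-mail can appear in multiple positions of the virtual file structure, whereas a tree of folders would force a single location.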
