The Efficient Computation of Complete and Concise Substring Scales with Suffix Trees

Strings are an important part of most multivalued contexts in real applications. Treating them conceptually requires defining substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However, these scales are either defined by hand or derived in a context-unaware manner (e.g., all words occurring in string values). We present an efficient algorithm based on suffix trees that produces complete and concise substring scales. Completeness ensures that every possible concept is formed, just as when the scale consists of all substrings. Conciseness ensures that the number of scale attributes (substrings) is less than the cumulative size of all string values. This algorithm is integrated in Camelis and illustrated on the set of all ICCS paper titles.
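To make the two properties concrete, the following is a naive brute-force sketch, not the suffix-tree algorithm of the paper. It assumes that an attribute's extent is the set of string values containing that substring, and it keeps one longest substring per distinct extent. The helper names (`all_substrings`, `substring_extents`, `reduced_scale`) and the three sample titles are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def all_substrings(s):
    """Return the set of non-empty substrings of s (quadratic enumeration)."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


def substring_extents(values):
    """Map each substring to the frozenset of indices of the values containing it."""
    extents = defaultdict(set)
    for idx, value in enumerate(values):
        for sub in all_substrings(value):
            extents[sub].add(idx)
    return {sub: frozenset(objs) for sub, objs in extents.items()}


def reduced_scale(values):
    """Keep one longest substring per distinct extent.

    Assumption for this sketch: ties on length are broken lexicographically.
    """
    best = {}
    for sub, ext in substring_extents(values).items():
        cur = best.get(ext)
        if cur is None or (len(sub), sub) > (len(cur), cur):
            best[ext] = sub
    return {sub: ext for ext, sub in best.items()}


# Hypothetical sample data, not taken from the ICCS title set.
titles = ["concept lattice", "concept graph", "suffix tree"]
full = substring_extents(titles)   # scale of all substrings
scale = reduced_scale(titles)      # reduced scale

# Completeness in this sketch: both scales yield the same set of extents.
same_extents = set(full.values()) == set(scale.values())
# Conciseness in this sketch: compare scale size with the cumulative string length.
total_length = sum(len(t) for t in titles)
print(same_extents, len(scale), len(full), total_length)
```

Because the enumeration is quadratic in the length of each string, this only works on small inputs; a suffix tree is what lets such a scale be computed efficiently.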
