Paper information - An Efficient Tool for Discovering Simple Combinatorial Patterns from Large Text Databases

In this poster, we demonstrate a prototype system for efficiently discovering combinatorial patterns, called proximity word-association patterns, from a collection of texts. The algorithm computes the best k-proximity d-word patterns in almost linear expected time in the total input length n, which is drastically faster than a straightforward algorithm with O(n^{2d+1}) time complexity.
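As a rough illustration of what matching such a pattern could involve, the sketch below checks one pattern against one text by brute force. The abstract does not define the pattern class, so the definition used here is an assumption: a k-proximity d-word pattern is taken to be d words that must occur in order, with at most k characters between the end of one word and the start of the next. The function name `matches` and its parameters are hypothetical, and this is not the near-linear algorithm the abstract refers to.

```python
def matches(text, words, k):
    """Return True if `words` occur in `text` in order, each word starting
    at most k characters after the end of the previous one.

    Assumed definition for illustration only; the source abstract does not
    spell out the pattern semantics.
    """
    def search(i, lo, hi):
        # All words placed: the pattern matches.
        if i == len(words):
            return True
        w = words[i]
        # Try every occurrence of words[i] that starts within [lo, hi].
        pos = text.find(w, lo)
        while pos != -1 and pos <= hi:
            end = pos + len(w)
            # The next word must start between `end` and `end + k`.
            if search(i + 1, end, end + k):
                return True
            pos = text.find(w, pos + 1)
        return False

    # The first word may start anywhere in the text.
    return search(0, 0, len(text))


if __name__ == "__main__":
    doc = "the quick brown fox"
    print(matches(doc, ["quick", "fox"], 7))
    print(matches(doc, ["quick", "fox"], 6))
```

In `doc`, seven characters separate the end of "quick" from the start of "fox", so under the assumed definition the pattern matches with k = 7 and fails with k = 6. Enumerating every candidate pattern and testing each one in this way is the kind of straightforward approach whose cost grows quickly with d.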

Hiroki Arimura | Setsuo Arikawa | Shinichi Shimozono | Atsushi Wataki | Ryoichi Fujino

[1] Ricardo A. Baeza-Yates, et al. An Algorithm for String Matching with a Sequence of don't Cares, 1991, Inf. Process. Lett.

[2] Kaizhong Zhang, et al. Combinatorial pattern discovery for scientific data: some preliminary results, 1994, SIGMOD '94.

[3] Shinichi Morishita, et al. Data mining using two-dimensional optimized association rules, 1996.

[4] Yasuhiko Morimoto, et al. Data mining using two-dimensional optimized association rules: scheme, algorithms, and visualization, 1996, SIGMOD '96.

[5] Linda Sellie, et al. Toward efficient agnostic learning, 1992, COLT '92.

[6] Hiroki Arimura, et al. Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words, 1998, ISAAC.

[7] Hiroki Arimura, et al. A Fast Algorithm for Discovering Optimal String Patterns in Large Text Databases, 1998, ALT.