Parallel Algorithms for Mining Sequential Associations: Issues and Challenges

Discovery of predictive sequential associations among events is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the possibly large number of mined associations demand efficient and scalable parallel algorithms. In this paper, we first present the concept of universal sequential associations. Developing parallel algorithms for discovering such associations can be quite challenging, depending on the nature of the input data and the timing constraints imposed on the desired associations. We discuss the challenging scenarios that can arise and propose four different parallel algorithms that cater to various situations. This paper is written to serve as a comprehensive account of the design issues and challenges involved in parallelizing sequential association discovery algorithms.

This work was supported by NSF grant ACI-9982274, by Army Research Office grant DA/DAAG55-98-10441, by Army High Performance Computing Research Center cooperative agreement number DAAH04-952-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute. Related papers are available via WWW at URL: http://www.cs.umn.edu/~kumar.

Department of Computer Science, University of Minnesota, Minneapolis, MN 55455
