Efficient Parallel Algorithms for Mining Associations

Mining hidden associations in large amounts of data has widespread applications in many practical domains, such as customer-oriented planning and marketing, telecommunication network monitoring, and the analysis of data from scientific experiments. The combinatorial complexity of the problem and the phenomenal growth in the sizes of available datasets motivate the need for efficient and scalable parallel algorithms. The design of such algorithms is challenging. This chapter presents an evolutionary and comparative review of many representative serial and parallel algorithms for discovering two kinds of associations. The first part of the chapter is devoted to non-sequential associations, which use the relationships between events that happen together. The second part is devoted to the more general and potentially more useful sequential associations, which use the temporal or sequential relationships between events. It is shown that many existing algorithms belong to a few categories determined by broader design strategies. Overall, the aim of the chapter is to provide a comprehensive account of the challenges and issues involved in effective parallel formulations of algorithms for discovering associations, and of how various existing algorithms try to handle them.
