A Fast Association Rule Mining Algorithm for Corpus

In this paper, we propose a new algorithm for efficiently mining association rules in a corpus. Compared with classical transactional association rule mining problems, a corpus contains a large number of items and far more itemsets, so traditional association rule mining algorithms cannot handle it efficiently. To address this issue, we design a new algorithm that combines inverted hashing with the advantages of the FP-Growth structure and takes the characteristics of a corpus into account. Experimental results demonstrate that the new algorithm substantially improves performance.
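
To make the inverted-index idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes each document is a list of words, treats each word as an item, and counts the support of a word pair by intersecting the sets of document ids that contain each word. The function names `build_inverted_index` and `frequent_pairs`, the toy documents, and the `min_support` threshold are all hypothetical choices made for illustration; the FP-Growth part of the method is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, words in enumerate(docs):
        for word in set(words):
            index[word].add(doc_id)
    return index


def frequent_pairs(docs, min_support):
    """Return {(a, b): support} for word pairs found in >= min_support docs."""
    index = build_inverted_index(docs)
    # Only words that are frequent on their own can form a frequent pair.
    frequent = sorted(w for w, ids in index.items() if len(ids) >= min_support)
    result = {}
    for a, b in combinations(frequent, 2):
        # Support of the pair = number of documents containing both words.
        support = len(index[a] & index[b])
        if support >= min_support:
            result[(a, b)] = support
    return result


# Toy corpus (illustrative data only).
docs = [
    ["data", "mining", "text"],
    ["data", "mining", "rules"],
    ["text", "rules", "data"],
]
pairs = frequent_pairs(docs, min_support=2)
print(pairs)
```

Because support is computed from posting sets rather than by rescanning every document for every candidate, the cost of counting a pair depends on how many documents contain its words, which is the property that makes inverted structures attractive when the vocabulary is large.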
