Dimensionality Reduction for Association Rule Mining

Selecting relevant features from a dataset is widely regarded as one of the major components of data mining [1], [3]. Data mining techniques become computationally expensive when applied to datasets containing irrelevant features. Dimensionality reduction (feature selection) algorithms are used to reduce the dimensionality of a dataset without significantly reducing the information content of the domain. Feature selection methods fall into two categories: supervised, where each instance is associated with a class label, and unsupervised, where instances carry no class label. Unsupervised feature selection is used as a pre-processing step for other machine learning techniques such as clustering, classification, or association rule mining, reducing the dimensionality of the domain space without much loss of information content. This paper presents three techniques for reducing the dimensionality of a dataset in the context of association rule mining. Two of them are useful to the classical approach, which treats rule mining as a single-objective problem, whereas the third is shown to be effective for multi-objective association rule mining.
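As an illustration of how unsupervised dimensionality reduction can serve as a pre-processing step for association rule mining, the sketch below filters out items whose support falls below a minimum threshold before mining, in the spirit of a frequency-count-based filter. The function name, threshold value, and transaction format are illustrative assumptions, not the paper's actual method or API.

```python
# Hedged sketch: a frequency-count-based filter that prunes infrequent
# items before association rule mining. Items that cannot reach the
# minimum support can never appear in a frequent itemset, so removing
# them shrinks the search space without discarding any minable rule.
from collections import Counter

def frequency_filter(transactions, min_support=0.2):
    """Drop items whose support is below min_support (a fraction).

    transactions: list of transactions, each a list of item labels.
    Returns the transactions with infrequent items removed.
    """
    n = len(transactions)
    # Count each item once per transaction (support = fraction of
    # transactions containing the item).
    counts = Counter(item for t in transactions for item in set(t))
    keep = {item for item, c in counts.items() if c / n >= min_support}
    return [[item for item in t if item in keep] for t in transactions]

transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk", "butter", "jam"],
    ["bread", "milk", "butter"],
    ["bread", "milk"],
]
# "jam" appears in 1 of 5 transactions (support 0.2 < 0.4), so it is pruned.
reduced = frequency_filter(transactions, min_support=0.4)
```

Because any item below the support threshold cannot belong to a frequent itemset, this pruning is lossless with respect to the rules a standard miner (e.g., Apriori) would discover at that threshold.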

[1] Jian Pei, et al. Mining frequent patterns without candidate generation, 2000, SIGMOD 2000.

[2] Huan Liu, et al. Feature Selection for Classification, 1997, Intell. Data Anal.

[3] Xiaoming Xu, et al. A parameterless feature ranking algorithm based on MI, 2008, Neurocomputing.

[4] Tomasz Imielinski, et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.

[5] Alberto Maria Segre, et al. Programs for Machine Learning, 1994.

[6] Claire Cardie, et al. Using Decision Trees to Improve Case-Based Learning, 1993, ICML.

[7] B. Nath, et al. Frequency Count Based Filter for Dimensionality Reduction, 2007, 15th International Conference on Advanced Computing and Communications (ADCOM 2007).

[8] Serkan Günal, et al. Subspace based feature selection for pattern recognition, 2008, Inf. Sci.

[9] Yonatan Aumann, et al. Borders: An Efficient Algorithm for Association Generation in Dynamic Databases, 1999, Journal of Intelligent Information Systems.

[10] J. Ross Quinlan, et al. C4.5: Programs for Machine Learning, 1992.

[11] Philip S. Yu, et al. Data Mining: An Overview from a Database Perspective, 1996, IEEE Trans. Knowl. Data Eng.

[12] Agma J. M. Traina, et al. Data pre-processing: a new algorithm for feature selection and data discretization, 2008, CSTST.

[13] Constantin Zopounidis, et al. Feature selection algorithms in classification problems: an experimental evaluation, 2005, Optim. Methods Softw.

[14] B. Nath, et al. Discovering Association Rules from Incremental Datasets, 2010.

[15] I. K. Fodor. A Survey of Dimension Reduction Techniques, 2002.

[16] R. Bellman. Dynamic Programming, 1957, Science.

[17] Huan Liu, et al. A Probabilistic Approach to Feature Selection - A Filter Solution, 1996, ICML.

[18] Heikki Mannila, et al. Fast Discovery of Association Rules, 1996, Advances in Knowledge Discovery and Data Mining.

[19] Larry A. Rendell, et al. The Feature Selection Problem: Traditional Methods and a New Algorithm, 1992, AAAI.

[20] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1998, VLDB 1998.

[21] Heikki Mannila, et al. Verkamo: Fast Discovery of Association Rules, 1996, KDD 1996.

[22] Thomas G. Dietterich, et al. Learning with Many Irrelevant Features, 1991, AAAI.

[23] Bhabesh Nath, et al. Multi-objective rule mining using genetic algorithms, 2004, Inf. Sci.

[24] Rajeev Motwani, et al. Dynamic itemset counting and implication rules for market basket data, 1997, SIGMOD '97.