A novel feature selection method for text classification using association rules and clustering

Readability and accuracy are two important properties of any good classifier. Because they offer acceptable accuracy, rapid training and high interpretability, associative classifiers have recently been used in many categorization tasks. In text classification, however, the high dimensionality of documents means that using every term as a feature significantly increases both training time and the number of rules produced. This paper proposes an associative classification algorithm for text that addresses this shortcoming with a feature selection phase, which retains only the important features, and a clustering phase based on class labels. Experimental results show that the proposed approach outperforms selected well-known classification algorithms in both efficiency and performance.
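To make the general idea concrete, the following is a minimal sketch of an associative text classifier with a pruning step. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes that documents are grouped by class label, that only single-term rules of the form term -> class are mined, and that support and confidence thresholds stand in for feature selection. The names `mine_rules`, `classify`, `min_support` and `min_confidence` are hypothetical.

```python
from collections import Counter, defaultdict


def mine_rules(docs, labels, min_support=0.5, min_confidence=0.7):
    """Return {term: (label, confidence)} for single-term class rules.

    Illustrative assumption: terms failing either threshold are dropped,
    which acts as a simple form of feature selection.
    """
    # Group documents by class label (each document becomes a set of terms).
    by_class = defaultdict(list)
    for tokens, label in zip(docs, labels):
        by_class[label].append(set(tokens))

    # Document frequency of every term over the whole corpus.
    term_df = Counter()
    for tokens in docs:
        term_df.update(set(tokens))

    rules = {}
    for label, group in by_class.items():
        class_df = Counter()
        for terms in group:
            class_df.update(terms)
        for term, count in class_df.items():
            support = count / len(group)        # share of class docs with term
            confidence = count / term_df[term]  # estimate of P(label | term)
            if support < min_support or confidence < min_confidence:
                continue
            # Keep only the most confident rule for each term.
            if term not in rules or confidence > rules[term][1]:
                rules[term] = (label, confidence)
    return rules


def classify(tokens, rules, default=None):
    """Predict the label whose matching rules have the largest total confidence."""
    scores = defaultdict(float)
    for term in set(tokens):
        if term in rules:
            label, confidence = rules[term]
            scores[label] += confidence
    return max(scores, key=scores.get) if scores else default


# Toy corpus, invented purely for illustration.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "score"],
    ["vote", "election", "party"],
    ["party", "vote", "team"],
]
labels = ["sport", "sport", "politics", "politics"]
rules = mine_rules(docs, labels)
```

In this toy corpus the term "team" appears in both classes, so its confidence falls below the threshold and no rule is produced for it. This shows how pruning uninformative terms keeps the rule set small.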
