High performance feature selection algorithms using filter method for cloud-based recommendation system

In a cloud-based recommendation system, feature selection is applied to reduce the high dimensionality of the cloud data. Feature selection improves the performance of the recommendation system without affecting its accuracy. In this paper, two filter-model-based algorithms, SFS and MSFS, are proposed to extract the features the recommendation system needs. The state-of-the-art Naive Bayes classification algorithm is used to evaluate the performance of the feature selection algorithms. The benchmark datasets Newsgroups, WebKB and Book Crossing are used for performance evaluation. The experimental results show that the proposed algorithms are superior to the existing feature selection algorithms T-Score, Information Gain and Chi-squared.
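To make the filter approach concrete, the following is a minimal sketch of one of the baseline filters named above, Chi-squared term scoring, followed by top-k selection. It is not SFS or MSFS, which this abstract does not specify. The toy corpus, the function names `chi2_scores` and `select_top_k`, and the choice to take the maximum score over classes are all assumptions made for illustration.

```python
from collections import Counter


def chi2_scores(docs, labels):
    """Score each term by its chi-squared statistic against the class labels.

    Each document is treated as a set of terms (presence/absence only).
    For every term the maximum statistic over all classes is kept; this
    aggregation is an illustrative assumption.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_counts = Counter(labels)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_counts.items():
            # 2x2 contingency table: term present/absent vs. class cls/other
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c = n_cls - a            # term absent, class cls
            d = n - n_cls - b        # term absent, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:                # a zero margin means no evidence either way
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus: two "tech" and two "books" documents.
docs = [
    ["cloud", "server", "the"],
    ["cloud", "storage", "the"],
    ["book", "novel", "the"],
    ["book", "author", "the"],
]
labels = ["tech", "tech", "books", "books"]

scores = chi2_scores(docs, labels)
selected = select_top_k(scores, 2)
```

Because a filter scores features independently of any classifier, the selected subset can then be passed to a classifier such as Naive Bayes for evaluation, as the abstract describes. In this toy corpus the term that appears in every document receives a score of zero and is never selected.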
