Feature weighting methods: A review

Abstract In recent decades, a wide portfolio of Feature Weighting (FW) methods has been proposed in the literature. Their main strength is the ability to transform the features so that each contributes to the metric of the Machine Learning (ML) algorithm in proportion to its estimated relevance for inferring the output pattern. Nevertheless, the large number of FW-related works makes it difficult to study this field systematically. Therefore, this paper proposes a global taxonomy for FW methods based on: (1) the learning approach (supervised or unsupervised), (2) the methodology used to calculate the weights (global or local), and (3) the feedback obtained from the ML algorithm when estimating the weights (filter or wrapper). For each level of the taxonomy, an extensive review of the state of the art is presented, followed by considerations and guidelines for selecting FW strategies with regard to significant aspects of real-world data analysis problems. Finally, a summary of conclusions and open challenges in the FW field is briefly outlined.
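
As an illustration of the idea that features contribute to the algorithm's metric in proportion to their estimated relevance, the following minimal sketch weights a Euclidean distance with one weight per feature. It is not taken from any method reviewed here. The function names `weighted_distance` and `correlation_weights` are hypothetical, and using the absolute Pearson correlation with the label as the relevance estimate is an assumption made only for this example. In the terms of the taxonomy above, it would roughly correspond to a supervised, global, filter-style scheme.

```python
import math


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: each squared difference is scaled by its feature weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def correlation_weights(X, y):
    """Hypothetical filter-style global weights (an assumption for illustration):
    absolute Pearson correlation of each feature with the label, normalised to sum to 1."""
    n = len(X)
    n_features = len(X[0])
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    raw = []
    for j in range(n_features):
        col = [row[j] for row in X]
        mean_c = sum(col) / n
        var_c = sum((v - mean_c) ** 2 for v in col)
        cov = sum((c - mean_c) * (v - mean_y) for c, v in zip(col, y))
        denom = math.sqrt(var_c * var_y)
        # A constant feature (or constant label) carries no information: weight 0.
        raw.append(abs(cov / denom) if denom > 0 else 0.0)
    total = sum(raw)
    if total == 0:
        # No feature is correlated with the label: fall back to uniform weights.
        return [1.0 / n_features] * n_features
    return [r / total for r in raw]


# Toy data: the first feature tracks the label, the second does not.
X = [[0.0, 5.0], [1.0, 3.0], [2.0, 5.0], [3.0, 3.0]]
y = [0, 0, 1, 1]
weights = correlation_weights(X, y)
print(weights)
print(weighted_distance(X[0], X[3], weights))
```

Because the weights are computed once from the data and never use feedback from the learning algorithm, the sketch is a filter; a wrapper would instead adjust the weights according to the algorithm's own performance.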
