Distributed Selection of Continuous Features in Multilabel Classification Using Mutual Information

Multilabel learning is a challenging task that demands scalable methods for large-scale data. Feature selection has been shown to improve multilabel accuracy while countering the curse of dimensionality in high-dimensional, scattered data. However, the growing complexity of multilabel feature selection, especially on continuous features, requires new approaches that manage data effectively and efficiently in distributed computing environments. This article proposes a distributed model, built on Apache Spark, that adapts mutual information (MI) to continuous features and multiple labels. Two approaches are presented: one based on MI maximization (MIM) and one based on minimum redundancy and maximum relevance (mRMR). The former selects the subset of features that maximizes the MI between the features and the labels, whereas the latter additionally minimizes the redundancy among the selected features. Experiments compare the distributed multilabel feature selection methods on 10 data sets using 12 metrics. Results validated through statistical analysis indicate that our methods outperform reference methods for distributed multilabel feature selection, and that MIM also reduces runtime by orders of magnitude.
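To make the difference between the two criteria concrete, here is a minimal single-machine sketch in Python. It is not the distributed Spark implementation described in this article. It assumes equal-width binning as the discretizer for continuous features, a plug-in (histogram) MI estimate in nats, and relevance defined as the mean MI between a feature and each binary label. The helper names (`discretize`, `mutual_information`, `relevance`, `mim`, `mrmr`) are hypothetical, and the mRMR variant uses the common greedy "relevance minus mean redundancy" form.

```python
import math
from collections import Counter


def discretize(values, bins=5):
    """Equal-width binning of a continuous feature (assumed discretizer)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in MI estimate, in nats, between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum p(a,b) * log(p(a,b) / (p(a) p(b))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def relevance(feature, labels):
    """Mean MI between a discretized feature and each binary label (assumed aggregation)."""
    return sum(mutual_information(feature, y) for y in labels) / len(labels)


def mim(features, labels, k):
    """MIM: score each feature independently and keep the k most relevant."""
    scores = [relevance(f, labels) for f in features]
    ranked = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def mrmr(features, labels, k):
    """Greedy mRMR: maximize relevance minus mean redundancy with the selected set."""
    rel = [relevance(f, labels) for f in features]
    selected = [max(range(len(features)), key=lambda j: rel[j])]
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            # Average MI between candidate j and the already selected features.
            red = sum(mutual_information(features[j], features[s])
                      for s in selected) / len(selected)
            if rel[j] - red > best_score:
                best, best_score = j, rel[j] - red
        selected.append(best)
    return selected


# Toy data: 8 samples, 4 continuous features, 2 binary labels.
labels = [[0, 0, 0, 0, 1, 1, 1, 1],
          [0, 0, 1, 1, 0, 0, 1, 1]]
raw = [[0.10, 0.20, 0.15, 0.05, 0.90, 0.80, 0.95, 0.85],  # tracks label 0
       [0.11, 0.21, 0.16, 0.06, 0.91, 0.81, 0.96, 0.86],  # near-copy of feature 0
       [0.10, 0.20, 0.90, 0.80, 0.15, 0.05, 0.95, 0.85],  # tracks label 1
       [0.10, 0.90, 0.20, 0.80, 0.15, 0.85, 0.05, 0.95]]  # independent of both labels
features = [discretize(f, bins=2) for f in raw]

print(mim(features, labels, k=2))   # → [0, 1]
print(mrmr(features, labels, k=2))  # → [0, 2]
```

In this toy data, feature 1 is a near-copy of feature 0. MIM scores features independently and therefore keeps both, whereas mRMR penalizes the redundancy and picks feature 2, which carries information about the second label.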
