A Distributed Rough Evidential K-NN Classifier: Integrating Feature Reduction and Classification

The Evidential K-Nearest Neighbor (EK-NN) classification rule provides a global treatment of imperfect knowledge in class labels, but still suffers from the curse of dimensionality as well as runtime and memory restrictions when performing nearest-neighbor search, in particular for large and high-dimensional data. To avoid the curse of dimensionality, this paper first proposes a rough evidential K-NN (REK-NN) classification rule in the framework of rough set theory. Based on a reformulated K-NN rough set model, REK-NN selects features and thus reduces complexity by minimizing a proposed neighborhood pignistic decision error rate, which considers both the Bayes decision error and spatial information among samples in feature space. In contrast to existing rough set-based feature selection methods, REK-NN is a synchronized rule rather than a stepwise one, in the sense that feature selection and learning are performed simultaneously. To further handle data with large sample sizes, we derive a distributed REK-NN method and implement it in Apache Spark. A theoretical analysis of the classifier's generalization error bound is finally presented. It is shown that the distributed REK-NN achieves good performance while drastically reducing the number of features and consuming less runtime and memory. Numerical experiments conducted on real-world datasets validate our conclusions.
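The EK-NN rule underlying this work (Denoeux, 1995) can be illustrated with a minimal sketch: each of the K nearest neighbors induces a simple mass function over the frame of discernment, the masses are pooled with Dempster's rule, and the decision is made on the pignistic probabilities. The parameter names `alpha` and `gamma` and the function `eknn_predict` are illustrative choices, not the paper's notation, and this sketch omits the feature selection and distributed aspects of REK-NN.

```python
import numpy as np

def eknn_predict(X_train, y_train, x, K=5, alpha=0.95, gamma=1.0, n_classes=2):
    """Evidential K-NN: combine neighbor-induced mass functions and
    decide on the pignistic probabilities."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared Euclidean distances
    nn = np.argsort(d2)[:K]                   # indices of K nearest neighbors
    # Start from the vacuous mass function: all mass on the full frame Omega.
    m = np.zeros(n_classes)                   # mass on singletons {omega_q}
    m_omega = 1.0                             # mass on Omega (ignorance)
    for i in nn:
        # Neighbor i supports its own class, more strongly when closer.
        mi = np.zeros(n_classes)
        mi[y_train[i]] = alpha * np.exp(-gamma * d2[i])
        mi_omega = 1.0 - mi[y_train[i]]
        # Dempster's rule: products landing on the same singleton or on
        # Omega are kept; mass on conflicting singletons is normalized out.
        new_m = m * mi + m * mi_omega + m_omega * mi
        new_omega = m_omega * mi_omega
        total = new_m.sum() + new_omega       # = 1 - conflict
        m, m_omega = new_m / total, new_omega / total
    # Pignistic transform: share the mass on Omega equally among classes.
    betp = m + m_omega / n_classes
    return int(np.argmax(betp)), betp
```

The pignistic decision error minimized by REK-NN builds on this `betp` output: features are evaluated by how well the resulting pignistic probabilities classify samples in their neighborhoods.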
