A Scalable Approach to Fuzzy Rough Nearest Neighbour Classification with Ordered Weighted Averaging Operators

Fuzzy rough sets have been applied successfully to classification tasks, in particular in combination with ordered weighted averaging (OWA) operators. Much research has gone into adapting algorithms to Big Data through parallelisation, but no concrete strategy yet exists for designing a Big Data classifier based on fuzzy rough sets. Existing Big Data approaches use fuzzy rough sets for feature and prototype selection, and have often not been evaluated on very large datasets. We fill this gap by presenting the first Big Data extension of an algorithm that uses fuzzy rough sets directly to classify test instances: a distributed implementation of fuzzy rough nearest neighbour classification with OWA operators (FRNN-OWA) in Apache Spark. Through a series of systematic tests on generated datasets, we demonstrate that it can achieve a speedup effectively equal to the number of computing cores used, which means that it can scale to arbitrarily large datasets.
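The abstract does not explain how FRNN-OWA scores a test instance or why that computation can be distributed. The Python sketch below shows one plausible decomposition under explicit assumptions. It is not the authors' Spark implementation. It assumes crisp class membership, an implicator whose negation is 1 - r, a range-normalised similarity, and additive (linearly decreasing) OWA weights with only k nonzero entries. Under these assumptions, the upper approximation of a class is an OWA soft maximum of the similarities to its members. The lower approximation is one minus an OWA soft maximum of the similarities to non-members. Because such an OWA operator depends only on the k largest similarities, each data partition can report its local top k per class, and merging those partial lists loses no information. The helper names (local_top_k, merge_top_k, classify) are hypothetical. Plain Python lists stand in for Spark partitions.

```python
import heapq
from collections import defaultdict
from functools import reduce
from typing import Dict, List, Tuple

# Hypothetical types for illustration: a training instance is (features, label);
# a partial result maps each class label to its k largest similarities.
Instance = Tuple[List[float], str]
TopK = Dict[str, List[float]]


def similarity(x: List[float], y: List[float], ranges: List[float]) -> float:
    """Assumed similarity: 1 minus the mean range-normalised attribute difference."""
    diffs = [abs(a - b) / r if r > 0 else 0.0 for a, b, r in zip(x, y, ranges)]
    return 1.0 - sum(diffs) / len(diffs)


def additive_weights(n: int) -> List[float]:
    """Linearly decreasing OWA weights that sum to 1 (largest weight first)."""
    return [2.0 * (n - i) / (n * (n + 1)) for i in range(n)]


def owa_soft_max(desc_values: List[float]) -> float:
    """OWA soft maximum of values already sorted in decreasing order."""
    weights = additive_weights(len(desc_values))
    return sum(w * v for w, v in zip(weights, desc_values))


def local_top_k(partition: List[Instance], tests: List[List[float]],
                k: int, ranges: List[float]) -> List[TopK]:
    """Map step: for each test instance, the k largest similarities per class in one partition."""
    results = []
    for y in tests:
        per_class: Dict[str, List[float]] = defaultdict(list)
        for x, label in partition:
            per_class[label].append(similarity(x, y, ranges))
        results.append({c: heapq.nlargest(k, sims) for c, sims in per_class.items()})
    return results


def merge_top_k(a: List[TopK], b: List[TopK], k: int) -> List[TopK]:
    """Reduce step: merge two partial results, keeping the k largest values per class."""
    merged = []
    for da, db in zip(a, b):
        merged.append({c: heapq.nlargest(k, da.get(c, []) + db.get(c, []))
                       for c in set(da) | set(db)})
    return merged


def classify(top_k: List[TopK], k: int) -> List[str]:
    """Predict the class with the highest average of lower and upper approximation."""
    predictions = []
    for d in top_k:
        best, best_score = "", float("-inf")
        for c in sorted(d):
            upper = owa_soft_max(d[c])
            # k largest similarities to instances outside class c
            others = heapq.nlargest(k, [s for o in d if o != c for s in d[o]])
            # OWA soft min of (1 - sim) equals 1 - OWA soft max of sim (weights sum to 1)
            lower = 1.0 - owa_soft_max(others) if others else 1.0
            score = (lower + upper) / 2.0
            if score > best_score:
                best, best_score = c, score
        predictions.append(best)
    return predictions


# Usage on a toy dataset split into two "partitions" (stand-ins for Spark partitions).
train = [([0.0, 0.0], "a"), ([1.0, 1.0], "b"), ([0.1, 0.2], "a"), ([0.9, 0.8], "b")]
tests = [[0.05, 0.1], [0.95, 0.9]]
ranges, k = [1.0, 1.0], 2
partials = [local_top_k(p, tests, k, ranges) for p in (train[:2], train[2:])]
merged = reduce(lambda a, b: merge_top_k(a, b, k), partials)
print(classify(merged, k))  # ['a', 'b']
```

Calling local_top_k on the whole training set at once gives the same predictions as the merged per-partition results. Only this short reduction step requires communication between workers, so the per-partition work can run in parallel.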