Scalable Approximate FRNN-OWA Classification

Fuzzy rough nearest neighbor classification with ordered weighted averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbor search. In practice, the query time complexity of exact nearest neighbor search algorithms in more than a handful of dimensions is near linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbors returned by hierarchical navigable small worlds (HNSW), a recent approximative nearest neighbor search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes within query times that are up to several orders of magnitude faster.

[1]  Theresa Beaubouef,et al.  Rough Sets , 2019, Lecture Notes in Computer Science.

[2]  Francisco Herrera,et al.  IFROWANN: Imbalanced Fuzzy-Rough Ordered Weighted Average Nearest Neighbor Classification , 2015, IEEE Transactions on Fuzzy Systems.

[3]  Chris Cornelis,et al.  A Scalable Approach to Fuzzy Rough Nearest Neighbour Classification with Ordered Weighted Averaging Operators , 2019, IJCRS.

[4]  Francisco Herrera,et al.  Fuzzy rough classifiers for class imbalanced multi-instance data , 2016, Pattern Recognit..

[5]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[6]  Chris Cornelis,et al.  A New Approach to Fuzzy-Rough Nearest Neighbour Classification , 2008, RSCTC.

[7]  Hasan M. Asfoor,et al.  Fuzzy Rough Set Approximations in Large Scale Information Systems , 2015 .

[8]  Chris Cornelis,et al.  Computing fuzzy rough approximations in large scale information systems , 2014, 2014 IEEE International Conference on Big Data (Big Data).

[9]  Leonid Boytsov,et al.  Engineering Efficient and Effective Non-metric Space Library , 2013, SISAP.

[10]  Chris Cornelis,et al.  Ordered Weighted Average Based Fuzzy Rough Sets , 2010, RSKT.

[11]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Vladimir Krylov,et al.  Approximate nearest neighbor algorithm based on navigable small world graphs , 2014, Inf. Syst..

[13]  Chris Cornelis,et al.  Weight selection strategies for ordered weighted average based fuzzy rough sets , 2019, Inf. Sci..

[14]  Chris Cornelis,et al.  Distributed fuzzy rough prototype selection for Big Data regression , 2015, 2015 Annual Conference of the North American Fuzzy Information Processing Society (NAFIPS) held jointly with 2015 5th World Conference on Soft Computing (WConSC).

[15]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[16]  P. Baldi,et al.  Searching for exotic particles in high-energy physics with deep learning , 2014, Nature Communications.

[17]  Pierre Baldi,et al.  Parameterized neural networks for high-energy physics , 2016, The European Physical Journal C.

[18]  D. Dubois,et al.  ROUGH FUZZY SETS AND FUZZY ROUGH SETS , 1990 .

[19]  Piotr Indyk,et al.  Nearest Neighbors in High-Dimensional Spaces , 2004, Handbook of Discrete and Computational Geometry, 2nd Ed..

[20]  Tianrui Li,et al.  Dynamical updating fuzzy rough approximations for hybrid data under the variation of attribute values , 2017, Inf. Sci..

[21]  Ronald R. Yager,et al.  On ordered weighted averaging aggregation operators in multicriteria decisionmaking , 1988, IEEE Trans. Syst. Man Cybern..

[22]  Qinghua Hu,et al.  On Robust Fuzzy Rough Set Models , 2012, IEEE Transactions on Fuzzy Systems.

[23]  Yury A. Malkov,et al.  Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Richard Jensen,et al.  Towards scalable fuzzy-rough feature selection , 2015, Inf. Sci..

[25]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[26]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[27]  Jianyu Huang,et al.  Performance optimization for the k-nearest neighbors kernel on x86 architectures , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[28]  Jiye Liang,et al.  Fuzzy-rough feature selection accelerator , 2015, Fuzzy Sets Syst..

[29]  Dun Liu,et al.  A fuzzy rough set approach for incremental feature selection on hybrid information systems , 2015, Fuzzy Sets Syst..

[30]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[31]  David G. Lowe,et al.  Scalable Nearest Neighbor Algorithms for High Dimensional Data , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  Chris Cornelis,et al.  A comprehensive study of implicator-conjunctor-based and noise-tolerant fuzzy rough sets: Definitions, properties and robustness analysis , 2015, Fuzzy Sets Syst..

[33]  Francisco Herrera,et al.  Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition: a fuzzy rough set approach , 2018, Knowledge and Information Systems.

[34]  Witold Pedrycz,et al.  Large-Scale Multimodality Attribute Reduction With Multi-Kernel Fuzzy Rough Sets , 2018, IEEE Transactions on Fuzzy Systems.