Paper Information - Parallel Approaches to Neighborhood Rough Sets: Classification and Feature Selection


The ever-increasing volume of data requires data mining algorithms to deliver not only high accuracy but also high performance, which is a real challenge for existing data analysis methods. Traditional algorithms, such as classification and feature selection under neighborhood rough sets, have proven very effective in real applications. Parallelizing these traditional algorithms is one way to meet the challenge. This paper presents the design of parallel approaches to neighborhood rough sets and their implementation for classification and feature selection. Two optimization strategies are proposed to improve the performance of the approaches: (1) the distributed cache is used to reduce I/O time; (2) most of the computation is placed in the Map phase, which reduces communication overhead. The experimental results show that the proposed algorithms scale well and that the speedup increases with the size of the data.
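The second strategy can be illustrated with a minimal single-process sketch. It is not the paper's implementation: the data, the `delta` radius, the nearest-sample fallback, and the function names `neighborhood_label`, `map_phase`, and `reduce_phase` are all assumptions made for illustration. The in-memory `TRAIN` list stands in for a training file that a real job would ship to every mapper through the distributed cache. Each map call classifies its own split locally and emits only a small pair of counts, so the reduce step has very little to receive.

```python
from collections import Counter
from math import dist

# Hypothetical training data; stands in for a file shipped via the distributed cache.
TRAIN = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]

def neighborhood_label(x, train, delta):
    """Majority label among training samples within distance delta of x."""
    votes = Counter(label for point, label in train if dist(x, point) <= delta)
    if not votes:
        # Fallback (an assumption): use the nearest sample when the neighborhood is empty.
        return min(train, key=lambda s: dist(x, s[0]))[1]
    return votes.most_common(1)[0][0]

def map_phase(split, train, delta):
    """Classify one input split locally and emit a single (correct, total) pair."""
    correct = sum(neighborhood_label(x, train, delta) == y for x, y in split)
    return correct, len(split)

def reduce_phase(partials):
    """Sum the small per-split pairs; only these pairs reach the reducer."""
    correct = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return correct / total

# Two hypothetical input splits of labeled test samples.
splits = [
    [((0.05, 0.1), "a"), ((0.95, 1.0), "b")],
    [((0.2, 0.1), "a"), ((0.5, 0.5), "b")],
]
partials = [map_phase(s, TRAIN, delta=0.3) for s in splits]
accuracy = reduce_phase(partials)
```

In this toy run the distance computations, which dominate the cost, all happen inside `map_phase`; the reducer only adds two pairs of integers.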
