Parallel Approaches to Neighborhood Rough Sets: Classification and Feature Selection

The ever-increasing volume of data requires data mining algorithms to deliver not only high accuracy but also high performance, which is a challenge for existing data analysis methods. Traditional algorithms, such as classification and feature selection under neighborhood rough sets, have proven very effective in real applications, and parallelizing them is one way to meet this challenge. In this paper, we present the design of parallel approaches to neighborhood rough sets and their implementation for classification and feature selection. Two optimization strategies are proposed to improve the performance of the approaches: (1) the distributed cache is used to reduce I/O time; (2) most of the computation is placed in the Map phase, which reduces communication overhead. The experimental results show that the proposed algorithms scale well and that the speedup increases as the size of the data grows.
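As a rough illustration of the two strategies, the following single-process Python sketch mimics a Map-heavy neighborhood classifier. It is not the implementation described in this paper: the names `TRAIN`, `DELTA`, `map_classify`, and `reduce_collect` are hypothetical, the module-level training set merely stands in for data that a real job would load once per node from the distributed cache, and the nearest-sample fallback for an empty neighborhood is an assumption made here for completeness.

```python
from collections import Counter
from math import dist

# Hypothetical stand-in for the distributed cache: a small labeled training
# set that every map task reads locally instead of fetching it per record.
TRAIN = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((1.0, 1.0), "b"), ((0.9, 1.1), "b")]
DELTA = 0.5  # neighborhood radius (illustrative value)


def map_classify(split, train=TRAIN, delta=DELTA):
    """Map task: classify each record in one input split.

    All distance computation and voting happens here, so only
    (record_id, label) pairs need to leave the mapper.
    """
    for rec_id, x in split:
        # Labels of training samples inside the delta-neighborhood of x.
        labels = [y for t, y in train if dist(x, t) <= delta]
        if not labels:
            # Assumption: fall back to the single nearest training sample.
            labels = [min(train, key=lambda ty: dist(x, ty[0]))[1]]
        yield rec_id, Counter(labels).most_common(1)[0][0]


def reduce_collect(mapped):
    """Reduce task: nothing left to compute, just gather the pairs."""
    return dict(pair for part in mapped for pair in part)


# Two input splits, as if handled by two independent map tasks.
splits = [[(0, (0.05, 0.1)), (1, (0.95, 1.0))], [(2, (0.4, 0.4))]]
predictions = reduce_collect(map_classify(s) for s in splits)
print(predictions)
```

Because each mapper emits one small pair per record, the data shuffled to the reducer is proportional to the number of test records rather than to the number of pairwise distances, which is the intuition behind moving the computation into the Map phase.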