LSIS: Large scale instance selection algorithm for big data

Recently, enormous volumes of data have been generated in information systems, and data mining faces the new challenge of transforming this “big data” into useful knowledge. To reduce “big data” to a manageable volume, we propose a large-scale instance selection algorithm that shrinks the initial dataset. This reduces both the time and the computational resources required by the learning process and improves the accuracy of the classifier model. Our experimental results demonstrate that the proposed algorithm scales well and efficiently processes large datasets by selecting the instances relevant to the classification problem. The results also show that instance selection contributes to classification accuracy.
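LSIS itself is not specified in this summary. For orientation only, here is a minimal sketch of a classic instance selection method, Hart's condensed nearest neighbor (CNN) rule. It is not the LSIS algorithm. It keeps only the training instances that 1-NN needs to classify the whole training set correctly. The function names `condensed_nearest_neighbor` and `nearest_label` are hypothetical. The sketch also assumes small in-memory data, numeric feature tuples, and Euclidean distance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_label(x: Point, store: List[Tuple[Point, int]]) -> int:
    """Return the label of the stored instance closest to x (1-NN, Euclidean)."""
    _, label = min(store, key=lambda item: math.dist(x, item[0]))
    return label


def condensed_nearest_neighbor(X: Sequence[Point], y: Sequence[int]) -> List[int]:
    """Hart's condensed nearest neighbor (CNN) rule (illustrative, not LSIS).

    Returns the sorted indices of a subset S such that 1-NN over S
    classifies every instance of (X, y) correctly.
    """
    if not X:
        return []
    selected = [0]          # seed the store with the first instance
    in_store = {0}
    changed = True
    while changed:          # repeat full passes until a pass adds nothing
        changed = False
        for i, (x, label) in enumerate(zip(X, y)):
            if i in in_store:
                continue
            store = [(X[j], y[j]) for j in selected]
            if nearest_label(x, store) != label:
                selected.append(i)  # absorb the misclassified instance
                in_store.add(i)
                changed = True
    return sorted(selected)


if __name__ == "__main__":
    X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
    y = [0, 0, 0, 1, 1, 1]
    S = condensed_nearest_neighbor(X, y)
    print(S)  # → [0, 3]
    print(f"kept {len(S)} of {len(X)} instances")
```

On the toy data, two well-separated clusters collapse to one representative each. The reduced set still classifies all six training points correctly. CNN targets size reduction, and each pass costs a quadratic number of distance computations. This cost is the kind of scalability problem that large-scale methods for big data aim to avoid.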