Speeding up multiple instance learning classification rules on GPUs

Multiple instance learning is a challenging task in supervised learning and data mining. However, algorithms become slow when learning from large-scale and high-dimensional data sets. Graphics processing units (GPUs) are being used to reduce the computing time of algorithms. This paper presents a GPU implementation of the G3P-MI algorithm for solving multiple instance problems using classification rules. The proposed GPU model can be distributed across multiple GPUs so that it scales to large-scale and high-dimensional data sets. The proposal is compared with the multi-threaded CPU algorithm, which uses Streaming SIMD Extensions parallelism, over a series of data sets. Experimental results show that computation time is significantly reduced and scalability is improved. Specifically, a speedup of up to 149× is achieved over the multi-threaded CPU algorithm when using four GPUs, and the rules interpreter runs over 108 billion genetic programming operations per second.
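The abstract does not spell out how a classification rule is scored against multiple instance data, so the following is only a minimal sketch under a stated assumption: the standard multiple instance assumption, where a bag is predicted positive if at least one of its instances satisfies the rule. All names here (`rule_covers_bag`, `confusion_counts`, `example_rule`) and the toy data are hypothetical and not taken from the paper. Each bag, and each instance within a bag, is checked independently, which is the kind of workload that lends itself to parallel evaluation.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]
Bag = List[Instance]
Rule = Callable[[Instance], bool]


def rule_covers_bag(rule: Rule, bag: Bag) -> bool:
    # Assumption: standard multiple instance hypothesis, i.e. a bag is
    # positive if any single instance satisfies the rule.
    return any(rule(instance) for instance in bag)


def confusion_counts(rule: Rule, bags: List[Bag],
                     labels: List[bool]) -> Tuple[int, int, int, int]:
    # Count bag-level outcomes as (tp, fp, tn, fn).
    tp = fp = tn = fn = 0
    for bag, label in zip(bags, labels):
        predicted = rule_covers_bag(rule, bag)
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def example_rule(x: Instance) -> bool:
    # Hypothetical rule: IF attr0 > 0.5 AND attr1 <= 2.0 THEN positive.
    return x[0] > 0.5 and x[1] <= 2.0


# Toy data: three bags with two attributes per instance.
bags = [
    [[0.1, 1.0], [0.9, 1.5]],  # second instance satisfies the rule
    [[0.2, 3.0], [0.4, 0.5]],  # no instance satisfies the rule
    [[0.7, 2.5]],              # attr1 is too large, so not covered
]
labels = [True, False, True]

counts = confusion_counts(example_rule, bags, labels)
print(counts)
```

A fitness measure for a rule could then be derived from these four counts; which measure the algorithm actually uses is not stated in the abstract.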
