Parallel approaches to machine learning - A comprehensive survey

The literature has long recorded efforts to use parallel algorithms and parallel architectures to improve performance, and machine learning is no exception. In fact, considerable effort has gone into this area over the past fifteen years. This report brings together and consolidates that work. It traces developments in this area from the inception of the idea in 1995, identifies distinct phases over the period 1995-2011, and highlights important achievements. When it comes to performance enhancement, GPU platforms have carved out a special niche for themselves. Their strength comes from the ability to speed up computations substantially through massively parallel architectures and programming models. Computationally intensive workloads such as image processing and gaming clearly stand to gain from parallel architectures. Studies also suggest that general-purpose tasks such as machine learning, graph traversal, and finite state machines are among the parallel applications of the future. MapReduce is another important technique that emerged during this period, and the literature reports it to be an important aid in improving the performance of machine learning algorithms on GPUs. The report summarizes this path of development.
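
MapReduce helps machine learning when an algorithm's expensive step can be written as independent per-partition computations (map) whose results are combined by a cheap, associative operation (reduce). The following is a minimal sketch, not taken from any surveyed system, that casts one k-means iteration in this form. The choice of k-means, the hypothetical function names (`map_shard`, `reduce_partials`, `kmeans_step`), and the in-memory list of shards are illustrative assumptions. The shards run sequentially here, whereas a real framework would dispatch them to separate cores or GPU threads.

```python
from functools import reduce
from typing import List, Tuple

Point = Tuple[float, ...]
# Per-cluster partial statistics: (vector sums, point counts).
Partial = Tuple[List[List[float]], List[int]]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_shard(shard: List[Point], centroids: List[Point]) -> Partial:
    """Map step: assign each point in one shard and accumulate per-cluster sums."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in shard:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def reduce_partials(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine two partial results by element-wise addition."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(shards: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """One k-means iteration as a map over shards followed by a reduce.

    Assumes at least one shard. Each map_shard call is independent, so a
    runtime could execute them in parallel; here they run sequentially.
    """
    partials = map(lambda s: map_shard(s, centroids), shards)
    sums, counts = reduce(reduce_partials, partials)
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(v / counts[j] for v in sums[j]) if counts[j] else centroids[j]
        for j in range(len(centroids))
    ]


# Usage: two shards, two clusters.
shards = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
centroids = [(0.0, 0.0), (10.0, 10.0)]
print(kmeans_step(shards, centroids))  # [(0.0, 0.5), (10.0, 10.5)]
```

Because the reduce step is plain element-wise addition, the result does not depend on how the data are split into shards (up to floating-point rounding). This is what lets a MapReduce runtime distribute the map calls freely across processors.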