A Resource Aware MapReduce Based Parallel SVM for Large Scale Image Classifications

Machine learning techniques have facilitated image retrieval by automatically classifying images and annotating them with keywords. Among these techniques, support vector machines (SVMs) are used extensively owing to their generalization properties. However, SVM training is computationally intensive, especially when the training dataset is large. This paper presents RASMO, a resource-aware, MapReduce-based parallel SVM algorithm for large-scale image classification. RASMO partitions the training dataset into smaller subsets and optimizes SVM training on them in parallel across a cluster of computers. A load balancing scheme based on a genetic algorithm is designed to optimize the performance of RASMO in heterogeneous computing environments. RASMO is evaluated in both experimental and simulation environments. The results show that the parallel SVM algorithm significantly reduces training time compared with the sequential minimal optimization (SMO) algorithm while maintaining a high level of classification accuracy.
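The abstract does not give the details of the load balancing scheme, so the following is only an illustrative sketch of how a genetic algorithm could assign training-data chunks to nodes of different speeds. The function names (`makespan`, `ga_balance`), the fitness measure (finish time of the slowest node), the selection, crossover and mutation operators, and all parameter values are assumptions made for this example, not the scheme used in RASMO.

```python
import random


def makespan(assignment, chunk_sizes, node_speeds):
    """Finish time of the slowest node for a chunk-to-node assignment.

    assignment[i] is the index of the node that processes chunk i.
    Processing time is modelled (an assumption) as size / speed.
    """
    load = [0.0] * len(node_speeds)
    for chunk, node in enumerate(assignment):
        load[node] += chunk_sizes[chunk] / node_speeds[node]
    return max(load)


def ga_balance(chunk_sizes, node_speeds, pop_size=30, generations=100,
               mutation_rate=0.1, seed=0):
    """Search for a low-makespan assignment with a simple genetic algorithm.

    Hypothetical operators: truncation selection that keeps the better
    half, one-point crossover, and per-gene random mutation.
    Requires at least two chunks and pop_size >= 4.
    """
    rng = random.Random(seed)
    n_chunks, n_nodes = len(chunk_sizes), len(node_speeds)

    def fitness(assignment):
        return makespan(assignment, chunk_sizes, node_speeds)

    # Random initial population; each individual maps chunk index -> node index.
    population = [[rng.randrange(n_nodes) for _ in range(n_chunks)]
                  for _ in range(pop_size)]

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[:pop_size // 2]  # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_chunks)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            for i in range(n_chunks):  # per-gene mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_nodes)
            children.append(child)
        population = survivors + children

    return min(population, key=fitness)


if __name__ == "__main__":
    chunks = [100] * 8          # eight equally sized data chunks (made-up values)
    speeds = [1.0, 2.0, 4.0]    # relative node speeds (made-up values)
    best = ga_balance(chunks, speeds)
    print(best, makespan(best, chunks, speeds))
```

Because the better half of each generation is carried over unchanged, the best makespan found never gets worse from one generation to the next. A real scheduler would also have to account for factors this toy cost model ignores, such as data transfer and disk I/O.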
