Distributed Consensus Reduced Support Vector Machine

Machine learning now performs astonishingly well in many different fields, and in general, the more training data available, the better the results. In many situations, however, data owners cannot or are not allowed to share their data because of legal issues or privacy concerns, even though pooling all the data as training data would yield a better model. In other situations, the dataset is so large that it is difficult to store on a single machine, and multiple computing units are needed to process it. To deal with both problems, we propose the distributed consensus reduced support vector machine (DCRSVM), a nonlinear model for binary classification. We solve the DCRSVM with the Alternating Direction Method of Multipliers (ADMM). In each iteration, every local worker updates its model by incorporating the information shared by the master, and the master fuses the local models reported by the workers. The workers share only their models in each iteration, never their data. In the end, the master produces a consensus model that is almost identical to the model obtained by pooling all the data together, which is not allowed in many real-world applications.
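The consensus-ADMM mechanism described above can be sketched in a few lines. This is not the DCRSVM itself: as a stand-in for the reduced smooth SVM loss, each worker minimizes a simple local squared loss on its private data block `(A_i, b_i)`, which keeps the local update a closed-form linear solve. All names (`consensus_admm`, `rho`, `iters`) and the loss choice are illustrative assumptions; only the message pattern (workers share models, the master averages them, data never leaves the workers) mirrors the paper's scheme.

```python
import numpy as np

def consensus_admm(As, bs, rho=1.0, iters=300):
    """Consensus ADMM sketch. Each worker i privately holds (A_i, b_i) and
    minimizes a local squared loss (a stand-in for the smooth SVM loss);
    only model vectors are exchanged, never the data itself."""
    n = As[0].shape[1]
    xs = [np.zeros(n) for _ in As]   # local worker models
    us = [np.zeros(n) for _ in As]   # scaled dual variables
    z = np.zeros(n)                  # master's consensus model
    for _ in range(iters):
        # Local update: each worker solves its own regularized problem,
        # pulled toward the consensus model z shared by the master.
        for i, (A, b) in enumerate(zip(As, bs)):
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(n),
                                    A.T @ b + rho * (z - us[i]))
        # Master fuses the reported local models into the consensus model.
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual update: each worker tracks its disagreement with the master.
        for i in range(len(As)):
            us[i] = us[i] + xs[i] - z
    return z
```

Because the local losses are additive, the consensus model converges to the solution of the pooled problem, which is the behavior the DCRSVM aims for without ever pooling the raw data.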
