An aggregate and iterative disaggregate algorithm with proven optimality in machine learning

We propose a clustering-based iterative algorithm for solving certain optimization problems in machine learning. The algorithm first aggregates the original data and solves the problem on the aggregated data, then gradually disaggregates the data in subsequent iterations. We apply the algorithm to common machine learning problems: least absolute deviation regression, support vector machines, and semi-supervised support vector machines. For each problem we derive model-specific data aggregation and disaggregation procedures. We also show optimality and convergence, and derive the optimality gap of the approximate solution at each iteration. A computational study is provided.
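The aggregate-then-disaggregate loop can be illustrated on least absolute deviation regression. The abstract does not spell out the model-specific procedures, so the following is only a minimal sketch under stated assumptions: each cluster is replaced by its centroid weighted by cluster size, the initial clustering is an arbitrary split into contiguous index blocks, and a cluster is split whenever its members' residuals have mixed signs. The names `weighted_lad` and `aid_lad` are hypothetical. Under these assumptions the aggregated objective is a lower bound on the original one (by the triangle inequality), so the difference `upper - lower` serves as an optimality gap that closes once every cluster has residuals of a single sign.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_lad(X, y, w):
    """Solve min_beta sum_k w[k] * |y[k] - X[k] @ beta| as a linear program."""
    k, p = X.shape
    # Variables: beta (p entries, free), then t (k entries) with t >= |residual|.
    cost = np.concatenate([np.zeros(p), w])
    A = np.block([[X, -np.eye(k)], [-X, -np.eye(k)]])
    b = np.concatenate([y, -y])
    bounds = [(None, None)] * p + [(0, None)] * k
    res = linprog(cost, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return res.x[:p], res.fun


def aid_lad(X, y, n_init=3, tol=1e-9):
    """Sketch of an aggregate / iterative-disaggregate loop for LAD regression.

    Assumed rules (not taken from the abstract): centroid aggregation with
    cluster-size weights, and splitting of clusters by residual sign.
    """
    clusters = np.array_split(np.arange(len(y)), n_init)  # crude initial clusters
    while True:
        # Aggregate: one size-weighted centroid per cluster.
        Xa = np.array([X[idx].mean(axis=0) for idx in clusters])
        ya = np.array([y[idx].mean() for idx in clusters])
        w = np.array([len(idx) for idx in clusters], dtype=float)
        beta, lower = weighted_lad(Xa, ya, w)  # lower bound on the original optimum

        r = y - X @ beta
        upper = np.abs(r).sum()  # original objective at the aggregated solution

        # Disaggregate: split every cluster whose residuals have mixed signs.
        new_clusters, split = [], False
        for idx in clusters:
            if (r[idx] > tol).any() and (r[idx] < -tol).any():
                new_clusters += [idx[r[idx] >= 0], idx[r[idx] < 0]]
                split = True
            else:
                new_clusters.append(idx)
        if not split:
            return beta, lower, upper, len(clusters)
        clusters = new_clusters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=200)
    X = np.column_stack([np.ones_like(x), x])  # intercept plus one feature
    y = 1.0 + 2.0 * x + rng.laplace(scale=0.3, size=200)
    beta, lower, upper, n_clusters = aid_lad(X, y)
    print("beta:", beta, "gap:", upper - lower, "clusters:", n_clusters)
```

Each pass either splits at least one cluster or stops, so the loop ends after at most as many iterations as there are data points. In this sketch the aggregated problem is typically solved on far fewer points than the original.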
