Optimization-based machine learning and data mining

Novel approaches for six important problems in machine learning, and two methods for solving linear programs, are introduced. Each machine learning problem is addressed by formulating it as an optimization problem. By using results based on theorems of the alternative for linear or convex functions, we are able to incorporate prior knowledge into function approximations or classifiers generated by linear combinations of linear or nonlinear kernels. We consider prior knowledge consisting of linear inequalities to be satisfied over multiple polyhedral regions, nonlinear inequalities to be satisfied over arbitrary regions, and nonlinear equalities to be satisfied over arbitrary regions. Each kind of prior knowledge leads to a different formulation, each with certain advantages. In privacy-preserving classification, data is divided into groups belonging to different entities that are unwilling to share their privately held data. By using a completely random matrix, we are able to construct public classifiers that do not reveal the privately held data, but have accuracy comparable to that of an ordinary support vector machine classifier trained on the entire data set. To address the problem of feature selection in clustering, we propose a modification of the objective function of a standard clustering algorithm that allows features to be eliminated. For feature selection in nonlinear kernel classification, we propose a mixed-integer algorithm that alternates between optimizing the continuous variables of an ordinary nonlinear support vector machine and optimizing integer variables that correspond to selecting or removing features from the classifier. We also propose a classifier based on proximity to two planes that are not required to be parallel. Finally, we tackle the multiple instance classification problem by formulating it as the minimization of a linear function subject to linear and bilinear constraints.
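As an illustration of the privacy-preserving idea described above, the following is a minimal sketch and not the formulation developed in this work. It assumes that each entity holds different rows of the data, that the random matrix B has Gaussian entries and fewer rows than there are features, that the kernel is linear, and that a regularized least-squares classifier stands in for the support vector machine. All names (make_private_data, fit_least_squares, classify) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10, 5  # n input features; the shared representation has k < n columns


def make_private_data(m):
    """Synthetic two-class data: labels in {-1, +1}, class means at -2 and +2."""
    y = rng.choice([-1.0, 1.0], size=m)
    X = rng.normal(size=(m, n)) + 2.0 * y[:, None]
    return X, y


# Two entities, each holding rows that it will not disclose.
X1, y1 = make_private_data(60)
X2, y2 = make_private_data(60)

# Random matrix agreed on by all entities (Gaussian entries are an assumption).
B = rng.normal(size=(k, n))

# Each entity publishes only its linear-kernel block K(X_i, B') = X_i B'.
Z1, Z2 = X1 @ B.T, X2 @ B.T


def fit_least_squares(Z, y, lam=1e-2):
    """Regularized least-squares fit of Z u - gamma ~ y (a stand-in for an SVM)."""
    H = np.hstack([Z, -np.ones((Z.shape[0], 1))])
    w = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)
    return w[:-1], w[-1]


# The public classifier is trained on the published blocks only.
u, gamma = fit_least_squares(np.vstack([Z1, Z2]), np.concatenate([y1, y2]))


def classify(X_new):
    """Public classifier: sign(K(x', B') u - gamma)."""
    return np.sign(X_new @ B.T @ u - gamma)


X_test, y_test = make_private_data(200)
acc = float(np.mean(classify(X_test) == y_test))
print(f"test accuracy using only the shared representation: {acc:.3f}")
```

Because B has fewer rows than the data has features in this sketch, the system X B' = Z is underdetermined in X, so a published block Z does not determine the private rows uniquely even when B is known.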
In several of these problems, the solution we propose involves solving linear programs. We consider sufficient conditions that allow us to determine whether a solution of a linear program obtained by either of two algorithms is exact. Both algorithms can be implemented using only a solver for linear systems of equations.