ε-Support Vector and Large-Scale Data Mining Problems

Data mining and knowledge discovery have made great progress during the last fifteen years. Classification, one of the major tasks of data mining, has wide business and scientific applications. Among the many methods proposed, mathematical-programming-based approaches have been shown to perform well in terms of classification accuracy, robustness, and efficiency. However, several difficult issues remain, and two of them are of particular interest to this research. First, because of computational complexity, it is challenging to find optimal solutions to mathematical programming problems built on large-scale datasets. Second, many mathematical programming problems require specialized solvers such as CPLEX or LINGO. The objective of this study is to propose solutions to these two problems. To this end, this paper proposes a mathematical programming model and applies it to classification problems, addressing two aspects of data mining algorithms: speed and scalability.