Maximizing margin quality and quantity

The large-margin principle has been widely applied to learn classifiers with good generalization power. While tremendous effort has been devoted to developing machine learning techniques that maximize margin quantity, little attention has been paid to ensuring margin quality. In this paper, we propose a new framework that aims to achieve superior generalizability by considering not only margin quantity but also margin quality. We derive an instantiation of the framework by deploying a max-min entropy principle to maximize margin quality, in addition to a traditional means of maximizing margin quantity, and we develop an iterative learning algorithm to solve this instantiation. We compared the algorithm with several widely used machine learning techniques (e.g., Support Vector Machines, decision trees, the naive Bayes classifier, and k-nearest neighbors) and with several other large-margin learners (e.g., RELIEF, Simba, G-flip, and LOGO) on a number of UCI machine learning datasets and gene expression datasets. The results demonstrate the effectiveness of the new framework and algorithm.
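For context, the RELIEF-style baselines mentioned above (RELIEF, Simba, LOGO) are built on the *hypothesis margin* of a sample: its distance to the nearest point of a different class (nearest miss) minus its distance to the nearest point of the same class (nearest hit). The sketch below is an illustrative reimplementation of that standard notion, not the paper's algorithm; the function name and the toy data are ours.

```python
import numpy as np

def hypothesis_margins(X, y):
    """Hypothesis margin of each sample: distance to its nearest
    miss (closest point of a different class) minus distance to its
    nearest hit (closest same-class point), as used by RELIEF-style
    large-margin learners. Assumes each class has >= 2 samples."""
    n = len(X)
    # Pairwise Euclidean distances; set the diagonal to inf so a
    # point is never its own nearest hit.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    margins = np.empty(n)
    for i in range(n):
        same = (y == y[i])
        margins[i] = D[i, ~same].min() - D[i, same].min()
    return margins

# Two well-separated classes: every margin should be positive,
# i.e., each point is closer to its own class than to the other.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(hypothesis_margins(X, y))
```

A positive margin means the sample would be classified correctly by a 1-nearest-neighbor rule; maximizing the sum (or a weighted version) of these margins is the "margin quantity" objective that the framework above augments with a margin-quality term.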
