Learning interpretable classification rules using sequential row sampling

In our previous work we presented an approach for learning interpretable classification rules using a Boolean compressed sensing formulation. The approach uses a linear programming (LP) relaxation to find interpretable (sparse) classification rules that achieve good generalization accuracy. However, the resulting LP becomes challenging for off-the-shelf solvers on problems with either a large number of samples or a large number of continuous features. We previously explored a screening approach that dramatically reduces the number of active features without sacrificing optimality. In this work we reduce the number of samples in a sequential setting, certifying a near-optimal solution while solving the LP on only a small fraction of the available data points. In a batch setting, this approach can dramatically reduce the computational complexity of the rule-learning LP formulation. In an online setting, we derive stochastic upper and lower bounds on the LP objective over unseen samples, which allow early stopping once we detect that the classifier will not change significantly with additional samples. The upper bounds are related to the learning-curve literature in machine learning, while our lower bounds appear to be new. Finally, we describe a fast approach for computing the complete regularization path, trading off rule interpretability against accuracy.
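
To make the sequential setting concrete, the sketch below grows the training set in batches, solves the rule-learning LP on the current subset, and stops once a Hoeffding-style bound on the held-out slack penalty certifies that further samples are unlikely to change the result much. This is a minimal illustration, not the paper's implementation: the LP is the standard relaxation of the Boolean compressed sensing rule learner (minimize total rule weight plus slack penalties), and the `batch`, `eps`, and `delta` parameters, as well as the choice of the clipped per-sample slack as the bounded statistic, are assumptions made for this example.

```python
# Illustrative sketch only; assumes a binary rule-atom matrix A (n x p),
# labels y in {0, 1}, and a Hoeffding-style stopping test on held-out slack.
import numpy as np
from scipy.optimize import linprog

def rule_lp(A, y, C=1.0):
    """Relaxed rule-learning LP on the given rows.

    Variables: w (p atom weights in [0, 1]) and xi (one slack per sample).
    Positive rows require A_i w + xi_i >= 1; negative rows A_i w <= xi_i.
    Objective: sum(w) + C * sum(xi)  (sparsity plus misclassification slack).
    """
    n, p = A.shape
    c = np.concatenate([np.ones(p), C * np.ones(n)])
    G_rows, h = [], []
    for i in range(n):
        row = np.zeros(p + n)
        if y[i] == 1:                 # -A_i w - xi_i <= -1
            row[:p] = -A[i]
            row[p + i] = -1.0
            h.append(-1.0)
        else:                         # A_i w - xi_i <= 0
            row[:p] = A[i]
            row[p + i] = -1.0
            h.append(0.0)
        G_rows.append(row)
    bounds = [(0, 1)] * p + [(0, None)] * n
    res = linprog(c, A_ub=np.array(G_rows), b_ub=np.array(h),
                  bounds=bounds, method="highs")
    return res.x[:p], res.fun

def sequential_fit(A, y, C=1.0, batch=200, eps=0.05, delta=0.05):
    """Grow the sample set in batches; stop early when a Hoeffding bound
    certifies the mean held-out slack of the current rule is within eps."""
    n = A.shape[0]
    idx = np.random.permutation(n)
    w = None
    for m in list(range(batch, n, batch)) + [n]:
        train, rest = idx[:m], idx[m:]
        w, _ = rule_lp(A[train], y[train], C)
        if len(rest) == 0:            # used all rows without certifying
            break
        # Clipped per-sample slack on unseen rows, bounded in [0, 1] so that
        # Hoeffding's inequality applies to its mean.
        margin = A[rest] @ w
        slack = np.where(y[rest] == 1,
                         np.maximum(1.0 - margin, 0.0),
                         np.minimum(margin, 1.0))
        bound = np.sqrt(np.log(2.0 / delta) / (2 * len(rest)))
        if slack.mean() + bound <= eps:   # certified: stop sampling rows
            break
    return w
```

In the paper's setting the stopping test would compare stochastic upper and lower bounds on the LP objective itself; here the held-out slack estimate stands in for those bounds to keep the sketch short.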
