Learning understandable classifier models.

[1]  Robert A. Lordo,et al.  Learning from Data: Concepts, Theory, and Methods , 2001, Technometrics.

[2]  Wee Kheng Leow,et al.  FERNN: An Algorithm for Fast Extraction of Rules from Neural Networks , 2004, Applied Intelligence.

[3]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[4]  Bart Baesens,et al.  Decompositional Rule Extraction from Support Vector Machines by Active Learning , 2009, IEEE Transactions on Knowledge and Data Engineering.

[5]  Martin Fodslette Møller,et al.  A scaled conjugate gradient algorithm for fast supervised learning , 1993, Neural Networks.

[6]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[7]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Pedro M. Domingos Knowledge Acquisition from Examples Via Multiple Models , 1997 .

[9]  Donald C. Wunsch,et al.  Neural network explanation using inversion , 2007, Neural Networks.

[10]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[11]  Tin Kam Ho,et al.  The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Randal E. Bryant Binary decision diagrams and beyond: enabling technologies for formal verification , 1995, ICCAD.

[13]  Christel Baier,et al.  A uniform framework for weighted decision diagrams and its implementation , 2008, International Journal on Software Tools for Technology Transfer.

[14]  Ron Kohavi,et al.  Bottom-Up Induction of Oblivious Read-Once Decision Graphs: Strengths and Limitations , 1994, AAAI.

[15]  Marc'Aurelio Ranzato,et al.  Sparse Feature Learning for Deep Belief Networks , 2007, NIPS.

[16]  Bart Baesens,et al.  ITER: An Algorithm for Predictive Regression Rule Extraction , 2006, DaWaK.

[17]  Jude W. Shavlik,et al.  Extracting refined rules from knowledge-based neural networks , 2004, Machine Learning.

[18]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[19]  Olcay Boz,et al.  Extracting decision trees from trained neural networks , 2002, KDD.

[20]  Ryszard S. Michalski,et al.  Knowledge acquisition by encoding expert rules versus computer induction from examples: a case study involving soybean pathology , 1999, Int. J. Hum. Comput. Stud..

[21]  Jacek M. Zurada,et al.  Top-Down Induction of Reduced Ordered Decision Diagrams from Neural Networks , 2011, ICANN.

[22]  Bart Baesens,et al.  Using Rule Extraction to Improve the Comprehensibility of Predictive Models , 2006 .

[23]  Vladimir Vapnik,et al.  An overview of statistical learning theory , 1999, IEEE Trans. Neural Networks.

[24]  Daniel Rivero,et al.  A New Approach to the Extraction of ANN Rules and to Their Generalization Capacity Through GP , 2004, Neural Computation.

[25]  Donald F. Specht,et al.  Probabilistic neural networks , 1990, Neural Networks.

[26]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[27]  Matti Pietikäinen,et al.  A comparative study of texture measures with classification based on featured distributions , 1996, Pattern Recognit..

[28]  Jenq-Neng Hwang,et al.  Nonparametric multivariate density estimation: a comparative study , 1994, IEEE Trans. Signal Process..

[29]  Randal E. Bryant,et al.  Verification of Arithmetic Circuits with Binary Moment Diagrams , 1995, 32nd Design Automation Conference.

[30]  Aapo Hyvärinen,et al.  Fast and robust fixed-point algorithms for independent component analysis , 1999, IEEE Trans. Neural Networks.

[31]  Wei Jia,et al.  Discriminant sparse neighborhood preserving embedding for face recognition , 2012, Pattern Recognit..

[32]  Johannes Fürnkranz,et al.  ROC ‘n’ Rule Learning—Towards a Better Understanding of Covering Algorithms , 2005, Machine Learning.

[33]  Bart Baesens,et al.  Decision Diagrams in Machine Learning: An Empirical Study on Real-Life Credit-Risk Data , 2004, Diagrams.

[34]  Robert K. Brayton,et al.  Heuristic Minimization of BDDs Using Don't Cares , 1994, 31st Design Automation Conference.

[35]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[36]  Randal E. Bryant,et al.  Verification of arithmetic circuits using binary moment diagrams , 2001, International Journal on Software Tools for Technology Transfer.

[37]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[38]  Sebastian Thrun,et al.  The MONK's Problems-A Performance Comparison of Different Learning Algorithms, CMU-CS-91-197, Sch , 1991 .

[39]  William W. Cohen Fast Effective Rule Induction , 1995, ICML.

[40]  Bart Baesens,et al.  Recursive Neural Network Rule Extraction for Data With Mixed Attributes , 2008, IEEE Transactions on Neural Networks.

[41]  Rudy Setiono Extracting M-of-N rules from trained neural networks , 2000, IEEE Trans. Neural Networks Learn. Syst..

[42]  J. Ross Quinlan,et al.  Learning logical definitions from relations , 1990, Machine Learning.

[43]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[44]  Massoud Pedram,et al.  Factored Edge-Valued Binary Decision Diagrams , 1997, Formal Methods Syst. Des..

[45]  Honglak Lee,et al.  Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations , 2009, ICML '09.

[46]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[47]  Fu Jie Huang,et al.  A Tutorial on Energy-Based Learning , 2006 .

[48]  M. C. Jones,et al.  A Brief Survey of Bandwidth Selection for Density Estimation , 1996 .

[49]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[50]  Jacek M. Zurada,et al.  Toward Better Understanding of Protein Secondary Structure: Extracting Prediction Rules , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[51]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[52]  Bu. Park,et al.  Rejoinder to "Practical performance of several data driven bandwidth selectors" , 1992 .

[53]  Jacek M. Zurada,et al.  Obtaining Full Regularization Paths for Robust Sparse Coding with Applications to Face Recognition , 2012, 2012 11th International Conference on Machine Learning and Applications.

[54]  Joachim Diederich,et al.  Eclectic Rule-Extraction from Support Vector Machines , 2005 .

[55]  Yoshua Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[56]  Ian H. Witten,et al.  Generating Accurate Rule Sets Without Global Optimization , 1998, ICML.

[57]  Mark Craven,et al.  Rule Extraction: Where Do We Go from Here? , 1999 .

[58]  R. Tibshirani,et al.  1-norm Support Vector Machines , 2003 .

[59]  Erkki Oja,et al.  Independent component analysis: algorithms and applications , 2000, Neural Networks.

[60]  Yaser S. Abu-Mostafa,et al.  Learning from hints in neural networks , 1990, J. Complex..

[61]  Jude W. Shavlik,et al.  Extracting Tree-Structured Representations of Trained Networks , 1995 .

[62]  Rich Caruana,et al.  An empirical comparison of supervised learning algorithms , 2006, ICML.

[63]  Ji Zhu,et al.  Boosting as a Regularized Path to a Maximum Margin Classifier , 2004, J. Mach. Learn. Res..

[64]  Russell Reed,et al.  Pruning algorithms-a survey , 1993, IEEE Trans. Neural Networks.

[65]  Jacek M. Zurada,et al.  Perturbation method for deleting redundant inputs of perceptron networks , 1997, Neurocomputing.

[66]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[67]  Shuzo Yajima,et al.  The Complexity of the Optimal Variable Ordering Problems of Shared Binary Decision Diagrams , 1993, ISAAC.

[68]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[69]  Bernhard E. Boser,et al.  A training algorithm for optimal margin classifiers , 1992, COLT '92.

[70]  Randal E. Bryant,et al.  Efficient implementation of a BDD package , 1991, DAC '90.

[71]  Jim Esch Computational Intelligence Methods For Rule-Based Data Understanding , 2004, Proc. IEEE.

[72]  P. Paatero Least squares formulation of robust non-negative factor analysis , 1997 .

[73]  Masumi Ishikawa,et al.  Structural learning with forgetting , 1996, Neural Networks.

[74]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[75]  Peter Sollich,et al.  Probabilistic Methods for Support Vector Machines , 1999, NIPS.

[76]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[77]  Honglak Lee,et al.  Unsupervised learning of hierarchical representations with convolutional deep belief networks , 2011, Commun. ACM.

[78]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[79]  Scott Sanner,et al.  Affine Algebraic Decision Diagrams (AADDs) and their Application to Structured Probabilistic Inference , 2005, IJCAI.

[80]  Jochen Bern,et al.  Boolean manipulation with free BDD's. First experimental results , 1994, Proceedings of European Design and Test Conference EDAC-ETC-EUROASIC.

[81]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[82]  Tiziano Villa,et al.  Exact Minimization of Binary Decision Diagrams Using Implicit Techniques , 1998, IEEE Trans. Computers.

[83]  Yoav Freund,et al.  Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.

[84]  Guillermo Sapiro,et al.  Supervised Dictionary Learning , 2008, NIPS.

[85]  Henrik Reif Andersen,et al.  Difference Decision Diagrams , 1999, CSL.

[86]  Fabio Somenzi,et al.  Symmetry detection and dynamic variable ordering of decision diagrams , 1994, ICCAD '94.

[87]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[88]  R. Michalski Attributional Calculus: A Logic and Representation Language for Natural Induction , 2004 .

[89]  David Haussler,et al.  Learnability and the Vapnik-Chervonenkis dimension , 1989, JACM.

[90]  Yung-Te Lai,et al.  Edge-valued binary decision diagrams for multi-level hierarchical verification , 1992, DAC '92.

[91]  Jacek M. Zurada,et al.  Review and performance comparison of SVM- and ELM-based classifiers , 2014, Neurocomputing.

[92]  Daniel T. Larose,et al.  Discovering Knowledge in Data: An Introduction to Data Mining , 2005 .

[93]  Zhi-Hua Zhou,et al.  Medical diagnosis with C4.5 rule preceded by artificial neural network ensemble , 2003, IEEE Transactions on Information Technology in Biomedicine.

[94]  Enrico Macii,et al.  Algebraic decision diagrams and their applications , 1993, Proceedings of 1993 International Conference on Computer Aided Design (ICCAD).

[95]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[96]  Marc'Aurelio Ranzato,et al.  Semi-supervised learning of compact document representations with deep networks , 2008, ICML '08.

[97]  Paulo J. G. Lisboa,et al.  Orthogonal search-based rule extraction (OSRE) for trained neural networks: a practical and efficient approach , 2006, IEEE Transactions on Neural Networks.

[98]  Peter Clark,et al.  The CN2 Induction Algorithm , 1989, Machine Learning.

[99]  P. J. Green,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[100]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[101]  H. Andersen An Introduction to Binary Decision Diagrams , 1997 .

[102]  Alberto L. Sangiovanni-Vincentelli,et al.  Learning Complex Boolean Functions: Algorithms and Applications , 1993, NIPS.

[103]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[104]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[105]  Chee Kheong Siew,et al.  Extreme learning machine: Theory and applications , 2006, Neurocomputing.

[106]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[107]  Geoffrey E. Hinton,et al.  Simplifying Neural Networks by Soft Weight-Sharing , 1992, Neural Computation.

[108]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[109]  Mee Young Park,et al.  L1‐regularization path algorithm for generalized linear models , 2007 .

[110]  Bart Baesens,et al.  Using Neural Network Rule Extraction and Decision Tables for Credit-Risk Evaluation , 2003, Manag. Sci..

[111]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[112]  Giovanna Castellano,et al.  An iterative pruning algorithm for feedforward neural networks , 1997, IEEE Trans. Neural Networks.

[113]  Peter A. Beerel,et al.  Safe BDD minimization using don't cares , 1997, DAC.

[114]  Jude Shavlik,et al.  Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks , 1990, AAAI.

[115]  Marc'Aurelio Ranzato,et al.  A Unified Energy-Based Framework for Unsupervised Learning , 2007, AISTATS.

[116]  Ron Kohavi,et al.  Oblivious Decision Trees, Graphs, and Top-Down Pruning , 1995, IJCAI.

[117]  S. Rosset,et al.  Piecewise linear regularized solution paths , 2007, 0708.2197.

[118]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Feature Hierarchies , 2009 .

[119]  Ke Huang,et al.  Sparse Representation for Signal Classification , 2006, NIPS.

[120]  Marek A. Perkowski,et al.  Multi-valued functional decomposition as a machine learning method , 1998, Proceedings. 1998 28th IEEE International Symposium on Multiple-Valued Logic (Cat. No.98CB36138).

[121]  Joydeep Ghosh,et al.  Symbolic Interpretation of Artificial Neural Networks , 1999, IEEE Trans. Knowl. Data Eng..

[122]  Jude W. Shavlik,et al.  Using Sampling and Queries to Extract Rules from Trained Neural Networks , 1994, ICML.

[123]  M. R. Osborne,et al.  On the LASSO and its Dual , 2000 .

[124]  Jochen Bern,et al.  Some heuristics for generating tree-like FBDD types , 1996, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[125]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[126]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[127]  Ying Wu,et al.  Robust 3D Action Recognition with Random Occupancy Patterns , 2012, ECCV.

[128]  Randal E. Bryant Graph-Based Algorithms for Boolean Function Manipulation , 1986 .

[129]  Robert Tibshirani,et al.  1-norm Support Vector Machines , 2003, NIPS.

[130]  Madhuri Jha ANN-DT : An Algorithm for Extraction of Decision Trees from Artificial Neural Networks , 2013 .

[131]  Guillermo Sapiro,et al.  Discriminative learned dictionaries for local image analysis , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[132]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[133]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[134]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[135]  Fabio Somenzi,et al.  Efficient manipulation of decision diagrams , 2001, International Journal on Software Tools for Technology Transfer.

[136]  Bart Baesens,et al.  Minerva: Sequential Covering for Rule Extraction , 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[137]  T.,et al.  Training Feedforward Networks with the Marquardt Algorithm , 2004 .

[138]  Hendrik Blockeel,et al.  Seeing the Forest Through the Trees: Learning a Comprehensible Model from an Ensemble , 2007, ECML.

[139]  LiMin Fu,et al.  Rule Generation from Neural Networks , 1994, IEEE Trans. Syst. Man Cybern. Syst..

[140]  Jacek M. Zurada,et al.  Introduction to artificial neural systems , 1992 .

[141]  Ingo Wegener,et al.  On the complexity of minimizing the OBDD size for incompletely specified functions , 1996, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[142]  Gregory J. Wolff,et al.  Optimal Brain Surgeon and general network pruning , 1993, IEEE International Conference on Neural Networks.

[143]  Richard Rudell Dynamic variable ordering for ordered binary decision diagrams , 1993, ICCAD.

[144]  Joachim Diederich,et al.  Survey and critique of techniques for extracting rules from trained artificial neural networks , 1995, Knowl. Based Syst..

[145]  J. Simonoff Multivariate Density Estimation , 1996 .

[146]  Xiaoyang Tan,et al.  Pattern Recognition , 2016, Communications in Computer and Information Science.

[147]  Brian R. Gaines,et al.  Transforming Rules and Trees into Comprehensible Knowledge Structures , 2000 .

[148]  Jacek M. Zurada,et al.  Extracting Rules From Neural Networks as Decision Diagrams , 2011, IEEE Transactions on Neural Networks.

[149]  Jorge Nocedal,et al.  Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization , 1997, TOMS.

[150]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[151]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[152]  Alberto L. Sangiovanni-Vincentelli,et al.  Using the minimum description length principle to infer reduced ordered decision graphs , 1996, Machine Learning.

[153]  Radford M. Neal Pattern Recognition and Machine Learning , 2007, Technometrics.

[154]  Vladimir Cherkassky,et al.  Learning from Data: Concepts, Theory, and Methods , 1998 .