Deep Broad Learning - Big Models for Big Data

Deep learning has demonstrated the power of detailed modeling of complex high-order (multivariate) interactions in data. For some learning tasks there is power in learning models that are not only Deep but also Broad. By Broad, we mean models that incorporate evidence from large numbers of features. This is of especial value in applications where many different features and combinations of features all carry small amounts of information about the class. The most accurate models will integrate all that information. In this paper, we propose an algorithm for Deep Broad Learning called DBL. The proposed algorithm has a tunable parameter n that specifies the depth of the model. It provides straightforward paths towards out-of-core learning for large data. We demonstrate that DBL learns models from large quantities of data with accuracy that is highly competitive with the state-of-the-art.
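A minimal sketch of the general idea the abstract describes, not the paper's DBL algorithm itself: generate "broad" features from every attribute combination of order n (the tunable depth), then train a discriminative linear model over them. The helper name `order_n_interactions`, the toy data, and the use of scikit-learn are assumptions made for illustration only.

```python
# Illustrative sketch only (not the paper's DBL implementation):
# build order-n interaction features over all attribute subsets ("broad"),
# then fit a discriminative logistic-regression model over them.
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder


def order_n_interactions(X, n):
    """Map each row of categorical data X to the joint values of every
    attribute subset of size n (n is the model's 'depth')."""
    X = np.asarray(X, dtype=object)
    cols = []
    for subset in combinations(range(X.shape[1]), n):
        # Joint value of the selected attributes, encoded as a string key.
        cols.append(["|".join(map(str, row[list(subset)])) for row in X])
    return np.array(cols).T  # shape: (num_rows, C(num_features, n))


# Toy categorical data: 4 attributes, binary class (purely synthetic).
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 4))
y = (X[:, 0] + X[:, 1] > 2).astype(int)

n = 2  # tunable depth: the order of the feature interactions
Z = order_n_interactions(X, n)
enc = OneHotEncoder(handle_unknown="ignore")
model = LogisticRegression(max_iter=1000)
model.fit(enc.fit_transform(Z), y)
print("training accuracy:", model.score(enc.transform(Z), y))
```

Raising n makes the model deeper (higher-order interactions), while keeping every attribute combination makes it broad; an out-of-core variant would accumulate the interaction counts in a streaming pass rather than materialising Z in memory.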
