Thesis: incremental methods for Bayesian network structure learning

Incremental learning was first motivated by the human ability to incorporate knowledge from new experiences, an ability considered worth programming into artificial agents. Today there are also practical (i.e., industrial) reasons for the growing interest in incremental algorithms. Companies across a very wide range of activities store huge amounts of data every day, and one-shot (batch) algorithms cannot easily process this continuous stream of incoming instances and incorporate it into a knowledge base within reasonable time and memory. We believe that incremental learning becomes particularly relevant in this setting, since incremental algorithms can revise existing models of the data without starting from scratch and without re-processing past data. We present two general heuristics for converting batch hill-climbing searchers into incremental ones. We believe that the first, which we call Traversal Operators in Correct Order (TOCO), is the most novel and original contribution. Consider a learned knowledge structure together with the learning path used to obtain it, in which the traversal operators are ordered by decreasing contribution to quality. The TOCO heuristic states that the structure needs to be revised only when new data changes the order of the traversal operators, and that in this case the structure is rebuilt starting from the first out-of-order operator in the path. The benefit of TOCO is therefore twofold: the model is revised only when new data invalidates it, and when it must be revised, the learning algorithm does not begin from scratch. The second heuristic, which we call Reduced Search Space (RSS), uses the knowledge gathered in previous learning steps and states that structures that had very low quality in past learning steps will still have low quality with respect to the
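
The TOCO check described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The names `first_unordered`, `toco_revise`, `gain`, and `resume_search` are hypothetical, and the sketch makes a strong simplifying assumption: the quality contribution of an operator is given by a function of the operator alone, re-evaluated on the updated data, whereas in a real hill-climbing search it would also depend on the structure built so far.

```python
from typing import Callable, Hashable, List, Optional, Sequence

Operator = Hashable  # hypothetical encoding, e.g. a label such as "A->B"


def first_unordered(path: Sequence[Operator],
                    gain: Callable[[Operator], float]) -> Optional[int]:
    """Return the smallest index k whose operator now contributes less than
    some operator that follows it in the path, or None if the path is still
    in decreasing order of contribution."""
    gains = [gain(op) for op in path]
    best_later = float("-inf")
    first = None
    # Scan right to left so best_later is the maximum gain after position k.
    for k in range(len(path) - 1, -1, -1):
        if gains[k] < best_later:
            first = k  # overwritten by smaller k, so the leftmost one wins
        best_later = max(best_later, gains[k])
    return first


def toco_revise(path: Sequence[Operator],
                gain: Callable[[Operator], float],
                resume_search: Callable[[List[Operator]], List[Operator]]
                ) -> List[Operator]:
    """Keep the path if its order still holds; otherwise keep the correctly
    ordered prefix and let the (assumed) search continue from there."""
    k = first_unordered(path, gain)
    if k is None:
        return list(path)                   # model not invalidated: no revision
    return resume_search(list(path[:k]))    # rebuild from the first unordered operator


# Usage with made-up gains: new data swaps the order of the last two operators.
path = ["A->B", "B->C", "C->D"]
new_gain = {"A->B": 5.0, "B->C": 2.0, "C->D": 4.0}


def greedy_resume(prefix: List[Operator]) -> List[Operator]:
    # Stand-in for the batch hill climber: append remaining operators by gain.
    rest = sorted((op for op in new_gain if op not in prefix),
                  key=new_gain.get, reverse=True)
    return prefix + rest


revised = toco_revise(path, new_gain.get, greedy_resume)
```

In this toy run only the suffix after `"A->B"` is searched again, which is the saving the heuristic aims for; when the gains keep their original order, `toco_revise` returns the path unchanged without calling the search at all.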
