Paper information - Learning relational probability trees

Learning relational probability trees

Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogeneous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance.
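
To make the two ideas in the abstract concrete, the sketch below shows (1) how an aggregation function can turn a variable-size bag of related values into a single binary split, and (2) how a randomization test can judge whether a split's score is larger than chance. It is an illustration under stated assumptions, not the RPT algorithm itself: the toy data, the function names, the chi-square score, and the thresholds are all hypothetical, and the test shown is a generic label-permutation test rather than the novel form of randomization test the abstract refers to.

```python
import random
from statistics import mean, mode

# Hypothetical toy data: each object to classify has a class label and a
# non-empty bag of attribute values collected from its related objects.
OBJECTS = [
    {"label": 1, "related": [34, 41, 29, 52]},
    {"label": 1, "related": [45, 38, 61]},
    {"label": 1, "related": [50, 47, 47, 39, 58]},
    {"label": 0, "related": [22, 25]},
    {"label": 0, "related": [31]},
    {"label": 0, "related": [27, 24, 30]},
]

# Aggregation functions named in the abstract (bags assumed non-empty).
AGGREGATORS = {"COUNT": len, "AVERAGE": mean, "MODE": mode}


def binary_feature(values, aggregator, threshold):
    """Propositionalize one bag of values into a True/False split."""
    return AGGREGATORS[aggregator](values) > threshold


def chi_square(features, labels):
    """Pearson chi-square statistic for a binary feature vs. a 0/1 label."""
    n = len(labels)
    score = 0.0
    for f in (True, False):
        for y in (0, 1):
            observed = sum(
                1 for fi, yi in zip(features, labels) if fi == f and yi == y
            )
            expected = features.count(f) * labels.count(y) / n
            if expected > 0:  # skip empty rows/columns of the table
                score += (observed - expected) ** 2 / expected
    return score


def randomization_p_value(features, labels, trials=1000, seed=0):
    """Generic permutation test: shuffle labels, count scores >= observed."""
    rng = random.Random(seed)
    observed = chi_square(features, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if chi_square(features, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


labels = [obj["label"] for obj in OBJECTS]
split = [binary_feature(obj["related"], "COUNT", 2) for obj in OBJECTS]
p_value = randomization_p_value(split, labels)
```

A tree learner built along these lines would evaluate many (aggregator, threshold) pairs per node and keep a split only if its `p_value` falls below a chosen significance level. Shuffling labels ignores the linkage structure of relational data, which is exactly the kind of bias the abstract says the RPT test is designed to adjust for.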
