Random Tessellation Forests

Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis-aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis-aligned cuts in the two-dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis-aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference, and provide random forest methods based on the RTP. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study, and analyse gene expression data of brain tissue, showing improved accuracies over other methods.
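To make the core idea concrete, the sketch below illustrates recursive space partitioning with non-axis-aligned (oblique) cuts, the kind of cutting the abstract describes. This is a minimal toy illustration, not the RTP's actual generative model (which places a measure over cutting hyperplanes and uses exponential lifetimes); all function and parameter names here (oblique_cut, max_depth, min_points) are hypothetical.

```python
# Illustrative sketch only: a toy recursive partitioner that splits points
# in a bounded domain with random oblique (non-axis-aligned) hyperplane cuts.
# It is NOT the Random Tessellation Process itself; it only shows the style
# of recursive, non-axis-aligned cutting that the RTP generalizes.
import numpy as np

rng = np.random.default_rng(0)

def oblique_cut(points, depth, max_depth=3, min_points=5):
    """Recursively split `points` with random oblique cuts.

    Returns a nested dict representing the binary partition tree.
    """
    if depth >= max_depth or len(points) < min_points:
        return {"leaf": True, "n": len(points)}

    # Draw a random unit normal and pass the cut through a randomly chosen
    # data point, giving a cut that is generally not axis-aligned.
    normal = rng.normal(size=points.shape[1])
    normal /= np.linalg.norm(normal)
    offset = points[rng.integers(len(points))] @ normal

    side = points @ normal <= offset
    left, right = points[side], points[~side]
    if len(left) == 0 or len(right) == 0:  # degenerate cut; stop here
        return {"leaf": True, "n": len(points)}

    return {
        "leaf": False,
        "normal": normal,
        "offset": offset,
        "left": oblique_cut(left, depth + 1, max_depth, min_points),
        "right": oblique_cut(right, depth + 1, max_depth, min_points),
    }

points = rng.uniform(size=(200, 2))  # toy data in the unit square
tree = oblique_cut(points, depth=0)
print(tree["leaf"], points.shape)
```

Restricting `normal` to a coordinate axis would recover axis-aligned cuts of the Mondrian-process kind; allowing arbitrary directions is what lets oblique partitions capture inter-dimensional dependence.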
