Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning

This review presents a unified, efficient model of random decision forests which can be applied to a number of machine learning, computer vision, and medical image analysis tasks. Our model extends existing forest-based techniques by unifying classification, regression, density estimation, manifold learning, semi-supervised learning, and active learning under the same decision forest framework. This allows the core implementation to be written and optimized only once and then applied to many diverse tasks. The proposed model may be used in either a discriminative or a generative way and may be applied to discrete or continuous, labeled or unlabeled data. The main contributions of this review are: (1) proposing a unified, probabilistic and efficient model for a variety of learning tasks; (2) demonstrating margin-maximizing properties of classification forests; (3) discussing probabilistic regression forests in comparison with other nonlinear regression algorithms; (4) introducing density forests for estimating probability density functions; (5) proposing an efficient algorithm for sampling from a density forest; (6) introducing manifold forests for nonlinear dimensionality reduction; (7) proposing new algorithms for transductive learning and active learning. Finally, we discuss how alternatives such as random ferns and extremely randomized trees stem from our more general forest model. This document is directed at both students who wish to learn the basics of decision forests and researchers interested in the new contributions. It presents both fundamental and novel concepts in a structured way, with many illustrative examples and real-world applications. Thorough comparisons with state-of-the-art algorithms such as support vector machines, boosting and Gaussian processes are presented, and their relative advantages and disadvantages are discussed. The many synthetic examples and existing commercial applications demonstrate the validity of the proposed model and its flexibility.
