Unsupervised random forest: a tutorial with case studies

Unsupervised methods, such as principal component analysis, have gained popularity and widespread acceptance in the chemometrics and applied statistics communities. Unsupervised random forest is another method capable of discovering underlying patterns in data, yet it has seen few applications in chemometrics. One possible reason is the belief that random forest can only be used in a supervised setting. This tutorial introduces the basic concepts of unsupervised random forest and illustrates several applications in chemometrics through worked examples. Copyright © 2016 John Wiley & Sons, Ltd.
