An experimental analysis of classifier ensembles for learning drifting concepts over time in autonomous outdoor robot navigation

Autonomous robot navigation in unstructured outdoor environments is a challenging and still unsolved area of active research. The navigation task requires identifying safe, traversable paths that allow the robot to progress toward a goal while avoiding obstacles. Stereo vision is an effective tool in the near field, but smooth long-range trajectory planning and fast driving also require an understanding of far-field terrain. One approach is to apply machine learning techniques that accomplish near-to-far learning by augmenting near-field stereo readings with learned classifications of the appearance of safe terrain and obstacles in the far field. A key problem with basic instantiations of this approach is that they cannot identify obstacles in the far field unless examples of those obstacles also appear in the near field of the same image, which is not always the case. This leads to a common failure mode in autonomous navigation in which the robot follows incorrect trajectories as a result of this short-sightedness. This thesis proposes to address the problem through classifier ensembles, which serve as a mechanism for storing previously learned terrain models. Such ensembles have been shown in the literature to improve predictive performance both in static environments and in the dynamic environments associated with this problem domain. Here, individual models are trained over time and added to an on-line model library as the robot navigates toward a goal. These stored models serve as memory and can be used for terrain classification of an incoming image. The key issues are model selection, choosing appropriate models from the library, and model combination, merging the outputs of the selected models. Several selection and combination methods are investigated: choosing the best K models from the library, Bayesian Model Averaging, and an adaptation of Ensemble Selection for use in dynamic environments. An extensive experimental evaluation is performed on novel hand-labeled datasets taken directly from the problem domain, which are shown to contain drifting concepts. Several baselines are considered, including the one-model-per-image approach used in the most basic form of near-to-far learning. The experimental results uncover important differences in the behavior of ensemble methods versus simpler approaches. In many scenarios, classifier ensembles are shown to increase far-field predictive performance compared to non-ensemble approaches, although this improvement was not statistically significant across all datasets and metrics.
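
The model-library mechanism described above can be made concrete with a short sketch. The code below is a minimal illustration, not the thesis implementation: it assumes scikit-learn-style classifiers, a hypothetical FIFO eviction policy, and accuracy-weighted averaging as the combination rule (the best-K variant; Bayesian Model Averaging would instead weight by posterior model probabilities, and Ensemble Selection would grow the selected subset greedily against a validation metric).

```python
# Minimal sketch of an on-line model library for near-to-far terrain
# classification. Illustrative only: names, eviction policy, and the
# accuracy-weighted combination rule are assumptions, not the thesis method.

import numpy as np
from sklearn.linear_model import LogisticRegression

class TerrainModelLibrary:
    def __init__(self, capacity=20, k=5):
        self.capacity = capacity   # maximum number of stored terrain models
        self.k = k                 # number of models selected per frame
        self.models = []

    def add_model(self, near_features, near_labels):
        """Train a model on stereo-labeled near-field pixels and store it."""
        # Assumes both classes (safe / obstacle) appear in the near field.
        model = LogisticRegression(max_iter=1000).fit(near_features, near_labels)
        self.models.append(model)
        if len(self.models) > self.capacity:
            self.models.pop(0)     # FIFO eviction; other policies are possible

    def classify_far_field(self, near_features, near_labels, far_features):
        """Select the best-K models on the current frame's near field and
        average their far-field traversability estimates, weighted by
        near-field accuracy."""
        scores = np.array([m.score(near_features, near_labels)
                           for m in self.models])
        top_k = np.argsort(scores)[-self.k:]   # indices of the best-K models
        weights = scores[top_k]
        votes = np.array([self.models[i].predict_proba(far_features)[:, 1]
                          for i in top_k])
        return weights @ votes / weights.sum()  # P(traversable) per far-field pixel
```

In use, each incoming frame would trigger one call to add_model with the current stereo-labeled near-field data, followed by classify_far_field to label the far-field pixels; the one-model-per-image baseline corresponds to a library of capacity 1 with K = 1.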
