Learning-based vision and its application to autonomous indoor navigation

Adaptation is critical to the autonomous navigation of mobile robots. Many adaptive mechanisms have been implemented, ranging from simple color thresholding to complicated learning with artificial neural networks (ANNs). The major focus of this thesis is machine learning for vision-based navigation. Two well-known vision-based navigation systems are ALVINN and ROBIN, developed at Carnegie Mellon University and the University of Maryland, respectively. ALVINN uses a two-layer feedforward neural network, while ROBIN relies on a radial basis function network (RBFN). Although current ANN-based methods have achieved great success in vision-based navigation, they have two major disadvantages: (1) The local minimum problem: the training of either a multilayer perceptron or a radial basis function network can get stuck at poor local minima. (2) The flexibility problem: after the system has been trained in certain road conditions, it is hard to make it adapt to new road conditions while retaining good performance for the road conditions it has already learned. This is sometimes termed the “memory loss” problem.

As part of our SHOSLIF (Self-organizing Hierarchical Optimal Subspace Learning and Inference Framework) effort, SHOSLIF-N (SHOSLIF for Navigation) treats vision-based navigation as a content-based retrieval problem. The four major components of SHOSLIF-N are: (1) Automatic feature derivation: instead of starting with random initial weights, the system employs either principal component analysis or linear discriminant analysis to derive features that are best suited to navigation tasks. (2) Nonparametric recursive partitioning regression: this regression, which is more flexible than the global parametric regression used in either ALVINN or ROBIN, is employed for direct input-to-output mapping and is realized with a recursive partition tree (RPT). (3) A self-organizing mechanism. (4) Low computational complexity: the recursive partition tree has logarithmic retrieval complexity and can be used to address the complexity issue in learning a large number of scenes. For a binary RPT, only the most dominant eigenvector from principal component analysis or linear discriminant analysis is needed to further partition each inner node. This leads to an efficient online incremental learning algorithm: the system learns or rejects a learning sample “on the fly” with real-time response.

Like ALVINN and ROBIN, the basic SHOSLIF-N maps a single-frame retinal input to an output steering signal. The system was successfully tested but exhibited limited capability in handling more complicated situations: when the number of different turns or corners increased beyond a certain point, the system sometimes failed to make the turn. One way to tackle this problem is to incorporate into the system state information that indicates the relative position between the robot and the oncoming corner or intersection. Therefore, state-based SHOSLIF-N is proposed and tested. This system incorporates states and uses a simple yet efficient visual attention mechanism that helps determine the correct state transitions.

With a set of fewer than 300 learning samples, state-based SHOSLIF-N has been successfully tested in indoor navigation on the 2nd and 3rd floors of our Engineering Building. Using a SUN Sparc-I and a frame grabber, both online incremental learning and autonomous navigation were done in real time. A comparative study with two ANN-based methods has shown the advantages of the system: faster learning and better performance for the tasks tested.
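The binary recursive partition tree described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (`dominant_direction`, `build_rpt`, `retrieve`), the median split rule, the `leaf_size` parameter, and the nearest-sample lookup inside a leaf are all assumptions made for illustration, and only the principal component analysis variant is shown. Each inner node is split along the most dominant eigenvector of its samples, estimated here by power iteration, so a query follows a single root-to-leaf path.

```python
def project(x, mean, direction):
    """Project x onto `direction` after subtracting `mean`."""
    return sum((xi - mi) * di for xi, mi, di in zip(x, mean, direction))


def dominant_direction(samples, iters=50):
    """Estimate the leading eigenvector of the sample covariance by power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in samples]
    v = [1.0 / (j + 1) for j in range(d)]  # fixed non-zero start vector (assumption)
    for _ in range(iters):
        # Apply the (unnormalized) covariance implicitly: w = sum_i (c_i . v) c_i
        coeffs = [sum(cj * vj for cj, vj in zip(c, v)) for c in centered]
        w = [sum(a * c[j] for a, c in zip(coeffs, centered)) for j in range(d)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm == 0.0:
            break  # no variance along v; keep the current vector
        v = [wj / norm for wj in w]
    return mean, v


def build_rpt(samples, outputs, leaf_size=2):
    """Recursively split samples at the median projection onto the dominant direction."""
    leaf = {"leaf": list(zip(samples, outputs))}
    if len(samples) <= leaf_size:
        return leaf
    mean, v = dominant_direction(samples)
    proj = [project(x, mean, v) for x in samples]
    threshold = sorted(proj)[len(proj) // 2]
    left = [i for i, p in enumerate(proj) if p < threshold]
    right = [i for i, p in enumerate(proj) if p >= threshold]
    if not left or not right:
        return leaf  # samples cannot be separated along this direction
    return {
        "mean": mean,
        "direction": v,
        "threshold": threshold,
        "left": build_rpt([samples[i] for i in left], [outputs[i] for i in left], leaf_size),
        "right": build_rpt([samples[i] for i in right], [outputs[i] for i in right], leaf_size),
    }


def retrieve(node, x):
    """Follow one root-to-leaf path, then return the output of the nearest stored sample."""
    while "leaf" not in node:
        p = project(x, node["mean"], node["direction"])
        node = node["left"] if p < node["threshold"] else node["right"]
    nearest = min(node["leaf"], key=lambda so: sum((a - b) ** 2 for a, b in zip(so[0], x)))
    return nearest[1]


# Toy data (hypothetical): 2-D "images" paired with a scalar steering signal.
samples = [(float(t), 0.5 * t) for t in range(16)]
outputs = [float(t) for t in range(16)]
tree = build_rpt(samples, outputs)
print(retrieve(tree, (5.1, 2.5)))
```

Because every split roughly halves the samples in a node, the number of comparisons per query grows with the depth of the tree rather than with the number of stored samples, which is the source of the logarithmic retrieval cost mentioned above. An incremental version would insert or reject one sample at a time instead of building the tree in a batch.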
