Towards automatic generation of object recognition programs

This paper discusses issues and techniques for automatically compiling object and sensor models into a visual recognition strategy for recognizing and locating an object in three-dimensional space from visual data. Historically, and even today, most successful model-based vision programs are handwritten; relevant knowledge of objects for recognition is extracted from examples of the object, tailored for the particular environment, and coded into the program by the implementors. If this is done properly, the resulting program is effective and efficient, but it requires a long development time and many vision experts. Automatic generation of recognition programs by compilation attempts to automate this process. In particular, it extracts from the object and sensor models those features that are useful for recognition, and the control sequence that must be applied to deal with possible variations in the object's appearance. The key components in automatic generation are: object modeling, sensor modeling, prediction of appearances, strategy generation, and program generation. An object model describes the geometric and photometric properties of an object to be recognized. A sensor model specifies the sensor characteristics needed to predict object appearances and variations of feature values. The appearances can be systematically grouped into aspects, where aspects are topologically equivalent classes with respect to the object features "visible" to the sensor. Once aspects are obtained, a recognition strategy is generated in the form of an interpretation tree from the aspects and their predicted feature values. An interpretation tree consists of two parts: a part which classifies an unknown region into one of the aspects, and a part which determines its precise attitude (position and orientation) within the classified aspect. Finally, the strategy is converted into an executable program by using object-oriented programming.
One major emphasis of this paper is that sensors, as well as objects, must be explicitly modeled in order to achieve the goal of automatic generation of reliable and efficient recognition programs. Actual creation of interpretation trees for two toy objects and their execution for recognition from a bin of parts are demonstrated.
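The grouping of appearances into aspects and the two-part interpretation tree described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the face names, viewing directions, feature names (`face_count`, `area`, `axis_deg`), thresholds, and the in-plane rotation estimate are all hypothetical values chosen for illustration, and the attitude step is reduced to a single angle rather than a full position and orientation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Union

# Hypothetical toy data: for each sampled viewing direction, the set of object
# faces that an assumed sensor model predicts as detectable.
VISIBLE_FACES = {
    "v0": frozenset({"top"}),
    "v1": frozenset({"top", "front"}),
    "v2": frozenset({"top", "front"}),
    "v3": frozenset({"front", "side"}),
}


def group_into_aspects(visible_faces):
    """Group viewing directions whose detectable face sets are identical."""
    aspects = defaultdict(list)
    for view, faces in visible_faces.items():
        aspects[faces].append(view)
    return dict(aspects)


@dataclass
class Leaf:
    """One aspect, plus a predicted value used for attitude determination."""
    aspect: frozenset
    model_axis_deg: float  # assumed in-plane direction of a reference axis


@dataclass
class Node:
    """Classification step: compare one predicted feature with a threshold."""
    feature: str
    threshold: float
    below: Union["Node", "Leaf"]
    above: Union["Node", "Leaf"]


# Hand-built tree for the toy aspects; thresholds are illustrative only.
TREE = Node(
    feature="face_count", threshold=1.5,
    below=Leaf(frozenset({"top"}), 0.0),
    above=Node(
        feature="area", threshold=50.0,
        below=Leaf(frozenset({"front", "side"}), 30.0),
        above=Leaf(frozenset({"top", "front"}), 90.0),
    ),
)


def classify(tree, features):
    """Part 1: descend the tree to classify a region into one aspect."""
    node = tree
    while isinstance(node, Node):
        if features[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.above
    return node


def estimate_rotation(leaf, observed_axis_deg):
    """Part 2 (simplified): in-plane rotation within the classified aspect."""
    return (observed_axis_deg - leaf.model_axis_deg) % 360.0


aspects = group_into_aspects(VISIBLE_FACES)
region = {"face_count": 2, "area": 72.0, "axis_deg": 120.0}
leaf = classify(TREE, region)
rotation = estimate_rotation(leaf, region["axis_deg"])
```

In this sketch the tree is written by hand; in the approach the abstract describes, both the aspects and the tree would be derived from the object and sensor models rather than entered manually.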
