Controlling view-based algorithms using approximate world models and action information

Most view-based vision algorithms rest on strong assumptions about the disposition of the objects in the image. To apply these algorithms safely to real-world image sequences, we propose dividing a vision system into two components. The first contains an approximate world model of the scene: a low-accuracy, coarse description of the objects and actions in the world. Approximate world models are constructed and updated by simple vision routines and by action information provided by an external source. The second component employs view-based algorithms to perform the required perceptual tasks; the selection and control of these view-based methods are determined by the information in the approximate world model. We demonstrate the approximate-world-model approach in a project to control cameras in a TV studio, where the external context is provided by a script.
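The two-component architecture described in the abstract can be sketched in code. This is a minimal, hypothetical illustration, not the paper's implementation: all names (`ApproximateWorldModel`, `select_view_based_method`, the action labels) are invented for exposition. It shows the key idea that a coarse world model, updated from both simple vision routines and external action information (such as a script), gates which view-based routine is run.

```python
# Hypothetical sketch of the two-component architecture: an approximate
# world model (coarse object/action state) selects which view-based
# algorithm to run. All names and action labels here are illustrative.

from dataclasses import dataclass, field

@dataclass
class ApproximateWorldModel:
    """Low-accuracy, coarse description of objects and actions in the scene."""
    objects: dict = field(default_factory=dict)  # name -> rough region (x, y, w, h)
    current_action: str = "idle"                 # action context from an external source

    def update_from_script(self, action: str) -> None:
        # Action information provided externally (e.g. a TV script).
        self.current_action = action

    def update_from_vision(self, name: str, region: tuple) -> None:
        # Simple vision routines contribute only coarse object positions.
        self.objects[name] = region


def select_view_based_method(model: ApproximateWorldModel) -> str:
    """Pick a view-based routine whose assumptions the model says currently hold.

    A frontal-face recognizer, for instance, assumes a roughly frontal face;
    the model's action context indicates when that assumption is safe.
    """
    if model.current_action == "actor-facing-camera":
        return "frontal-face-recognition"
    if model.current_action == "actor-walking":
        return "coarse-body-tracking"
    return "no-op"


if __name__ == "__main__":
    model = ApproximateWorldModel()
    model.update_from_script("actor-facing-camera")
    model.update_from_vision("actor", (120, 80, 60, 140))
    print(select_view_based_method(model))  # frontal-face-recognition
```

The design choice to illustrate is the separation of concerns: the world model never runs a view-based algorithm itself; it only accumulates coarse state, and a thin selection layer maps that state to the algorithm whose preconditions are satisfied.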
