Conceptual descriptions from monitoring and watching image sequences

Abstract: This paper contrasts two ways of forming conceptual descriptions from images. The first, called “monitoring”, simply follows the flow of data from images to interpretation and has little need for top-level control. The second, called “watching”, emphasizes top-level control and actively selects evidence for task-based descriptions of dynamic scenes. We examine how this difference in control affects the conceptual descriptions that are formed. First, we consider how motion verbs and the perception of events contribute to an effective representational scheme. We then discuss illustrated examples of computing conceptual descriptions from images in our implementations of the monitoring and watching systems. Finally, we discuss future plans and related work.
