Directing attention for traffic scene analysis

Computer vision systems that must operate in real time face one overriding problem before high-level recognition and analysis processes can be applied: real-time image capture at any useful image resolution and frame rate yields prodigious quantities of data. Image resolutions of 512×512 pixels or higher at 8 bits per pixel are commonly used for analysis, and a frame rate of 25 fps or more is desirable for real-time applications. This results in data rates in excess of 6.4 Mbytes/sec. Data rates of this level challenge even the most expensive processors, while relatively inexpensive processors have no chance of coping. Parallel-processor implementations of low-level image processing tasks are one approach to the problem; however, complete parallelism is not possible because it requires too many processors and connections. Hence the need to reduce the quantity of data requiring analysis becomes a priority in any low-cost vision system. One method of achieving this is to extract the regions of interest in the scene, as these usually constitute a considerably smaller proportion of the whole image. The interest of a region is, of course, a problem-specific measure. This paper therefore presents a problem-independent approach to directing the attention of a vision system to those regions in the scene which are likely to be of interest. The regions of non-interest can then be discarded, thereby drastically reducing the volume of data for further analysis.
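
The data-rate figure quoted above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the method presented in the paper: the function names are hypothetical, and the 10% region-of-interest proportion is an assumed value chosen only to illustrate the scale of the saving. The quoted 6.4 Mbytes/sec appears to correspond to 6400 Kbytes/sec with 1 Kbyte taken as 1024 bytes.

```python
def raw_data_rate(width, height, bits_per_pixel, fps):
    """Bytes per second produced by uncompressed image capture."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps


def reduced_data_rate(rate, roi_percent):
    """Bytes per second left if only roi_percent of each frame is kept.

    roi_percent is an illustrative assumption, not a figure from the paper.
    """
    return rate * roi_percent // 100


# Figures taken from the text: 512x512 pixels, 8 bits per pixel, 25 fps.
full_rate = raw_data_rate(512, 512, 8, 25)
print(full_rate)          # bytes per second for the whole image
print(full_rate // 1024)  # the same rate in Kbytes per second

# Assumed: regions of interest cover 10% of each frame.
print(reduced_data_rate(full_rate, 10))
```

Under the assumed 10% proportion, the volume of data passed to later analysis stages falls by a factor of ten; the actual saving depends on how much of the scene is judged to be of interest.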