Collisions between vehicles at urban and rural intersections account for nearly a third of all reported crashes in the United States. This has led to considerable interest at the federal level in developing an intelligent, low-cost system that can detect and prevent potential collisions in real time. We propose the development of a system that uses video cameras to continuously gather traffic data at intersections (e.g., vehicle speeds, positions, trajectories, accelerations/decelerations, vehicle sizes, and signal status) that might eventually be used for collision prediction. This paper describes some of the challenges that face such a system, as well as some of the possible solutions that are currently under investigation.

Introduction

Statistics from the crash database of the National Highway Traffic Safety Administration (NHTSA) show that in 1998 there were about 1.7 million vehicle crashes at intersections; these accounted for as much as 27% of all reported crashes for the year and resulted in about 6,700 fatalities [1], [2]. The problem is expected to get worse with the continued proliferation of urban sprawl and the corresponding increases in traffic volumes and travel distances. Hence, there is considerable interest at the federal level [3] in the design and implementation of intelligent, real-time systems that can use knowledge of current traffic conditions at an intersection and its vicinity to predict potential collisions or near-misses and issue suitable countermeasures. We call this the Intersection Collision Warning and Avoidance (ICWA) problem.

Effective solutions to the ICWA problem must deal with a number of complex issues:

- They must be able to integrate and synchronize temporal traffic information from a variety of sensors (e.g., multiple cameras from a computer vision-based system, radar, and GPS).
- They must process this information, detect collisions or near-misses, and issue countermeasures in real time (e.g., at 10–15 Hz).
- They must account for the various trajectories of the vehicles. For instance, at the intersection, the vehicles in question may be moving at right angles to each other, or they may be moving in opposite directions when one of them suddenly attempts a turn.
- They must account for different vehicle speeds and accelerations/decelerations in the vicinity of the intersection.
- They must be able to process large numbers of vehicles moving relatively slowly (e.g., a suburban intersection) as well as a few vehicles moving at high speed (e.g., a rural intersection).
- They must be able to distinguish between different types of vehicles (e.g., buses are longer than cars, so they have a larger collision profile and also make wider turns).
- They must account for pedestrians and cyclists crossing at the intersection (e.g., could these be treated as “vehicles” in their own right?).
- They must have effective means for communicating countermeasures.
- They must take into account other factors, such as the status of signals (if any) at the intersection and its vicinity, any signals issued by vehicles (e.g., flashing turn signals), the geometry of the intersection, current weather conditions (e.g., stopping distances in the winter are longer than in the summer), and the effect of the countermeasures themselves (e.g., does a suggested countermeasure such as a flashing warning light cause a vehicle to brake suddenly and create the potential for additional collisions?).

Developing a full-fledged system as discussed above is our long-term goal. We envision that such a system would consist of three interacting modules, as shown in Figure 1. At present, we have initiated this process with a technology feasibility study for a system that includes some of the above features. The system we are currently developing incorporates computer vision techniques to gather traffic and other data at intersections.
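As a rough illustration of how the three interacting modules could fit together, the sketch below shows a minimal processing loop running at a fixed rate. All class and field names here are illustrative placeholders, not part of the system described in this paper; the module bodies are stubs.

```python
import time
from dataclasses import dataclass

@dataclass
class TrackedEntity:
    # Hypothetical output record of the Vision Module.
    entity_id: int
    position: tuple      # (x, y) in a world frame, meters
    velocity: tuple      # (vx, vy) in m/s
    kind: str = "car"    # e.g., "car", "bus", "pedestrian"

class VisionModule:
    def update(self, frame):
        # Stub: a real module would segment and track vehicles here.
        return [TrackedEntity(0, (0.0, 0.0), (10.0, 0.0))]

class CollisionPredictionModule:
    def predict(self, entities):
        # Stub: return pairs of entity ids judged to be on a conflict course.
        return []

class CountermeasureModule:
    def issue(self, conflicts):
        for a, b in conflicts:
            print(f"warning: entities {a} and {b} on conflict course")

def run(vision, predictor, countermeasures, frames, rate_hz=10.0):
    """Process frames at a target rate (e.g., the 10-15 Hz mentioned above)."""
    period = 1.0 / rate_hz
    for frame in frames:
        start = time.monotonic()
        entities = vision.update(frame)
        conflicts = predictor.predict(entities)
        countermeasures.issue(conflicts)
        # Sleep off the remainder of the cycle to hold the target rate.
        time.sleep(max(0.0, period - (time.monotonic() - start)))
```

The point of the sketch is only the data flow: per-frame track records flow from vision to prediction, and predicted conflicts flow to the countermeasure stage within one cycle.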
We plan to test our solution both via computer simulations and via field tests at actual intersections, such as an intersection between a highway and a major county road in a suburb of Minneapolis, MN, and a suburban intersection in St. Paul, MN.

Figure 1. The complete ICWA system.

The focus of this paper is on the Vision Module and the Collision Prediction Module.

Previous Work

Our research group's earlier projects on pedestrian tracking; vehicle detection, tracking, and classification; and intersection control form the basis of the new system. In particular, O. Masoud, S. Gupte, and N. Papanikolopoulos have developed a method that can detect, track, and classify vehicles by establishing correspondences between vehicle entities and “blobs” (regions of motion) in the image. This technique has been used to collect data at weaving sections, where vehicles need to be tracked as they change lanes. Unlike commercially available systems (Nestor's CrossingGuard and TrafficVision, Trafficon, PEEK, Solo, Odetics' Vantage), our approach treats the vehicles as separate entities with specific geometric and kinetic properties and constraints. This allows us to follow them as they move from lane to lane. The weaving system has already been used by Dr. Eil Kwon (Minnesota Department of Transportation) to collect data from weaving sections. Understanding the characteristics of accident-prone intersections, along with the design of new operational guidelines, will help us to better monitor those intersections and thus help to prevent accidents.

The work closest to our approach is in the area of three-dimensional vision-based tracking. Three-dimensional tracking uses models for vehicles and aims to handle complex traffic situations and arbitrary configurations. A suitable application would be conceptual descriptions of traffic situations [5]. Robustness is more important than computational efficiency in such applications.
Kollnig and Nagel [6] developed a model-based system in which they proposed to use image gradients instead of edges for pose estimation. In another relevant piece of work, the same authors increased robustness by also utilizing optical flow during the tracking process. Nagel et al. [7] and Leuck and Nagel [8] extended the previous approach to estimate the steering angle of vehicles. This was a necessary extension to handle trucks with trailers, which were represented as multiple linked rigid polyhedra. Experimental results in Leuck and Nagel [8] compared the steering angle and velocity of a vehicle to ground truth, showing good performance. They also provided qualitative results for other vehicles, showing an average success rate of 77%. Tracking a single vehicle took 2–3 seconds per frame. Our proposed approach, in contrast, can handle a large number of vehicles in real time. Finally, ours is the only effort of which we are aware that does vehicle tracking using a set of cameras (and can track a vehicle as it moves from one camera's field of view to another's).

Our group has also done work on collision prediction, motivated by applications in air traffic control and robotics, albeit under rather simple assumptions [4]. For instance, given a collection of point-objects moving with different speeds along given trajectories, the algorithms in [4] can compute very rapidly the potential collisions and near-misses in the system (near-misses are based on a user-specified threshold distance up to which the moving points can approach each other safely). These methods have been extended to deal with entities modeled by rectangular bounding boxes and moving along orthogonal trajectories. The solutions are based on advanced algorithmic and data representation techniques from the field of computational geometry.
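To make the point-object formulation concrete, the sketch below computes the time and distance of closest approach for pairs of points moving with constant velocity and flags pairs whose minimum separation falls below the threshold. This is only the underlying geometric test, implemented as a naive all-pairs scan; the algorithms in [4] use computational-geometry data structures precisely to avoid examining every pair.

```python
import math

def closest_approach(p1, v1, p2, v2):
    """Time (t >= 0) and distance of closest approach of two point-objects
    moving with constant velocity. Works in the relative frame: minimize
    |p + v*t| where p = p1 - p2 and v = v1 - v2."""
    px, py = p1[0] - p2[0], p1[1] - p2[1]
    vx, vy = v1[0] - v2[0], v1[1] - v2[1]
    vv = vx * vx + vy * vy
    # If relative velocity is zero, the separation never changes.
    t = 0.0 if vv == 0.0 else max(0.0, -(px * vx + py * vy) / vv)
    return t, math.hypot(px + vx * t, py + vy * t)

def near_misses(objects, threshold):
    """All pairs whose minimum future separation falls below `threshold`.
    `objects` is a list of (position, velocity) tuples."""
    hits = []
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            (p1, v1), (p2, v2) = objects[i], objects[j]
            t, d = closest_approach(p1, v1, p2, v2)
            if d < threshold:
                hits.append((i, j, t, d))
    return hits
```

For example, two points approaching the origin at right angles (one from the west, one from the south, both 10 m away at 1 m/s) reach zero separation at t = 10 s, so they are reported as a collision pair for any positive threshold.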
Issues

While the technology for monitoring vehicles has been improving over the years, monitoring intersections and areas of congested traffic remains a very challenging problem. Several issues must be addressed in order to effectively monitor intersections and prevent accidents:

- Shadows. Vehicles cast shadows as they travel, sometimes on top of other vehicles. Separating a vehicle from a shadow is a challenging problem, as shadows are not necessarily uniformly dark and may be the same color as the vehicle being tracked. Most methods that have been used to date lack an effective means to model shadows.
- Occlusions. Occlusions occur when something obscures a vehicle on the road. This may be a stationary object such as a tree, or another vehicle. Using an elevated camera minimizes occlusions, but in the case of an intersection this is not desirable, as the goal is to monitor traffic in all directions.
- Stop-and-Go. In congested traffic and at intersections, vehicles must slow down to a stop. Our system must keep track of vehicles even when they are not moving, meaning that the usual tracking methods that separate moving objects from the background will fail.

Possible Solutions

The goal of our current work is to overcome these issues and create a system that can effectively predict vehicle collisions. At present, this system consists of a Vision Module for monitoring the intersection and a Collision Prediction Module to predict potentially dangerous situations.

Vision Module

The images captured by the camera(s) are analyzed in the Vision Module. The input to the Vision Module consists of a sequence of images; the outputs from the module are the positions and trajectories of the tracked entities. An adaptive background segmentation scheme (like that used by Stauffer et al. [11]) is used for learning a model of the background during the course of tracking. This model is used to make the background subtraction robust to changes in lighting conditions in the scene.
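The idea of an adaptive background model can be illustrated with a deliberately simplified sketch: a per-pixel running average rather than the per-pixel Gaussian mixtures of Stauffer et al. All parameter values below are illustrative, and this is not the scheme used in our module.

```python
class AdaptiveBackground:
    """Running-average background model (a simplified stand-in for the
    per-pixel mixture-of-Gaussians approach of Stauffer et al.)."""

    def __init__(self, first_frame, alpha=0.05, threshold=30.0):
        # Grayscale frames as 2-D lists of intensities.
        self.model = [[float(p) for p in row] for row in first_frame]
        self.alpha = alpha          # learning rate for background adaptation
        self.threshold = threshold  # intensity difference deemed "foreground"

    def apply(self, frame):
        """Return a binary foreground mask and update the model in place."""
        mask = []
        for y, row in enumerate(frame):
            mask_row = []
            for x, p in enumerate(row):
                fg = abs(p - self.model[y][x]) > self.threshold
                mask_row.append(1 if fg else 0)
                if not fg:
                    # Adapt only where the pixel looks like background, so
                    # gradual lighting changes are absorbed into the model
                    # while vehicles are not.
                    self.model[y][x] += self.alpha * (p - self.model[y][x])
            mask.append(mask_row)
        return mask
```

A bright blob against the learned background is flagged as foreground, while a small, scene-wide intensity shift (e.g., a passing cloud) falls below the threshold and is gradually folded into the model instead.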
In the next step, the individual foreground regions are extracted using a connected components method. A region tracking method is used for tracking the moving vehicles and pedestrians in the image. For tracking, two levels of abstraction are used: the blob level and the object level. In the blob
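The foreground-extraction step mentioned above can be sketched with a standard breadth-first connected components labeling over the binary foreground mask; this is an illustrative implementation of the general technique, not our actual module.

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected foreground regions ("blobs") in a binary mask.
    Returns a label image (0 = background, 1..n = blob ids) and the
    number of blobs found."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                # New blob: flood it with a fresh label via BFS.
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label
```

Each labeled region is then a candidate blob to be matched against vehicle or pedestrian entities at the object level of the tracker.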
References

[1] James U. Korein, et al. Robotics. IBM Syst. J., 2018.
[2] Y. Bar-Shalom. Tracking and data association, 1988.
[3] Michiel H. M. Smid, et al. Fast Algorithms for Collision and Proximity Problems Involving Moving Geometric Objects. Comput. Geom., 1994.
[4] Hans-Hellmut Nagel, et al. Matching Object Models to Segments from an Optical Flow Field. ECCV, 1996.
[5] J. A. Sethian, et al. A fast marching level set method for monotonically advancing fronts. Proceedings of the National Academy of Sciences of the United States of America, 1996.
[6] H. Leuck, et al. T3wT: Tracking Turning Trucks with Trailers, 1998.
[7] Hans-Hellmut Nagel, et al. Automatic differentiation facilitates OF-integration into steering-angle-based road vehicle tracking. Proc. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999.
[8] W. Eric L. Grimson, et al. Adaptive background mixture models for real-time tracking. Proc. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999.
[9] Osama Masoud, et al. Tracking and Analysis of Articulated Motion with an Application to Human Motion, 2000.
[10] Hans-Hellmut Nagel, et al. Incremental recognition of traffic situations from video image sequences. Image Vis. Comput., 2000.
[11] Wassim G. Najm, et al. Analysis of crossing path crashes, 2001.