Implementation of a Computer Vision Framework for Tracking and Visualizing Face Mask Usage in Urban Environments

The COVID-19 pandemic is an evolving situation in the United States, with the disease spreading at alarming rates. Public health-informed hygienic practices, including wearing face masks in public settings, can substantially reduce community transmission of COVID-19. Convolutional Neural Networks (CNNs) can be trained to classify whether people are wearing face masks with impressive accuracy. However, existing face mask datasets consist of clear, high-resolution, close-up images of individuals wearing face masks. These are unrepresentative of the lower-fidelity images of distant faces that are more common in urban camera footage. This paper proposes a practical deep learning computer vision framework for detecting and tracking people in public spaces and monitoring their use of face masks. This work creates a custom 6,000-image face mask dataset curated from over 50 hours of urban surveillance camera footage. CNN-based detectors trained on this dataset perform person detection and face mask classification. A multi-target tracking module then extracts individual trajectories from the frame-by-frame detections. By associating detected face masks with tracked individuals, the framework estimates overall face mask usage. The framework is deployed on several surveillance cameras along the Detroit RiverWalk, a 5-kilometer pedestrian park connecting various greenways, plazas, pavilions, and open green spaces along the Detroit River in Detroit, Michigan. Detection of park user types achieves an average precision of 89% or higher for most person classes, and the mask detector achieves an accuracy of 96%. An interactive web application visualizes the data and is used by park managers to inform management decisions and assess strategies for increasing face mask usage rates.
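The final step of the pipeline associates mask detections with tracked individuals to estimate an overall usage rate. The sketch below shows one way this could work. It is a hypothetical illustration, not the paper's method. It assumes per-frame person boxes that already carry a track ID, and face boxes labeled "mask" or "no_mask". Those label names, the head-region heuristic, and the `head_fraction` parameter are all assumptions. Each face is matched to a person whose box contains the face center in its upper portion. Each track then takes the majority label across its frames. The usage rate is the fraction of labeled tracks that are masked.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


@dataclass
class PersonTrack:
    track_id: int  # ID assigned by the (assumed) multi-target tracker
    frame: int
    box: Box


@dataclass
class FaceDetection:
    frame: int
    box: Box
    label: str  # hypothetical labels: "mask" or "no_mask"


def face_in_head_region(face: Box, person: Box, head_fraction: float = 0.3) -> bool:
    """Hypothetical heuristic: the face center must lie inside the person box,
    within its top `head_fraction` of height (image y grows downward)."""
    cx, cy = face.center()
    head_bottom = person.y1 + head_fraction * (person.y2 - person.y1)
    return person.x1 <= cx <= person.x2 and person.y1 <= cy <= head_bottom


def estimate_mask_usage(tracks, faces, head_fraction=0.3):
    """Return ({track_id: majority label}, masked fraction or None if no track has a face)."""
    faces_by_frame = defaultdict(list)
    for face in faces:
        faces_by_frame[face.frame].append(face)

    # Count mask/no_mask votes per track across all frames where a face matched.
    votes = defaultdict(Counter)
    for track in tracks:
        for face in faces_by_frame.get(track.frame, []):
            if face_in_head_region(face.box, track.box, head_fraction):
                votes[track.track_id][face.label] += 1

    # Majority vote per track; tracks with no matched face are excluded.
    per_track = {tid: counts.most_common(1)[0][0] for tid, counts in votes.items()}
    if not per_track:
        return per_track, None
    masked = sum(1 for label in per_track.values() if label == "mask")
    return per_track, masked / len(per_track)


if __name__ == "__main__":
    tracks = [
        PersonTrack(1, 0, Box(0, 0, 10, 30)),
        PersonTrack(1, 1, Box(1, 0, 11, 30)),
        PersonTrack(2, 0, Box(50, 0, 60, 30)),
        PersonTrack(3, 0, Box(100, 0, 110, 30)),  # no face seen: excluded
    ]
    faces = [
        FaceDetection(0, Box(3, 2, 7, 6), "mask"),
        FaceDetection(1, Box(4, 2, 8, 6), "mask"),
        FaceDetection(0, Box(53, 2, 57, 6), "no_mask"),
    ]
    print(estimate_mask_usage(tracks, faces))
```

Voting over a whole trajectory means a single misclassified frame does not flip a person's label. That matters for the low-resolution, distant faces the abstract describes. The sketch does not resolve a face box that falls inside two overlapping person boxes. A fuller implementation might assign each face to only the best-matching track.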