StreetAware: A High-Resolution Synchronized Multimodal Urban Scene Dataset

Limited access to high-quality data is an important barrier in the digital analysis of urban settings, including applications within computer vision and urban design. Diverse forms of data collected from sensors in areas of high activity in the urban environment, particularly at street intersections, are valuable resources for researchers interpreting the dynamics between vehicles, pedestrians, and the built environment. In this paper, we present a high-resolution audio, video, and LiDAR dataset of three urban intersections in Brooklyn, New York, totaling almost 8 unique hours. The data were collected with custom Reconfigurable Environmental Intelligence Platform (REIP) sensors that were designed to accurately synchronize multiple video and audio inputs. The resulting data are novel in that they are inclusively multimodal, multi-angular, high-resolution, and synchronized. We demonstrate four ways the data could be utilized: (1) to discover and locate occluded objects using multiple sensors and modalities, (2) to associate audio events with their respective visual representations using both video and audio modes, (3) to track the number of objects of each type in a scene over time, and (4) to measure pedestrian speed using multiple synchronized camera views. In addition to these use cases, our data are available for other researchers to carry out analyses that apply machine learning to understanding the urban environment (for which existing datasets may be inadequate), such as pedestrian-vehicle interaction modeling and pedestrian attribute recognition. Such analyses can help inform decisions made in the context of urban sensing and smart cities, including accessibility-aware urban design and Vision Zero initiatives.
