A2X: An Agent and Environment Interaction Benchmark for Multimodal Human Trajectory Prediction

In recent years, human trajectory prediction (HTP) has garnered attention in the computer vision literature. Although this task has much in common with the longstanding task of crowd simulation, HTP has borrowed little from crowd simulation, especially in terms of evaluation protocols. The key difference between the two tasks is that HTP is concerned with forecasting multiple steps at a time and capturing the multimodality of real human trajectories. Most HTP models are trained on the same few datasets, which feature small, transient interactions between real people and little to no interaction between people and the environment. Unsurprisingly, when tested on crowd egress scenarios, these models produce erroneous trajectories that accelerate too quickly and collide too frequently, but the metrics used in the HTP literature cannot convey these particular issues. To address these challenges, we propose (1) the A2X dataset, which contains simulated crowd egress and complex navigation scenarios that compensate for the lack of agent-to-environment interaction in existing real datasets, and (2) evaluation metrics that convey model performance with more reliability and nuance. A subset of these metrics are novel multiverse metrics, which are better suited to multimodal models than existing metrics. The dataset is available at: https://mubbasir.github.io/HTP-benchmark/.
