Use Information You Have Never Observed Together: Data Fusion as a Major Step Towards Realistic Test Scenarios

Scenario-based testing is a major pillar in the development and effectiveness assessment of automated driving systems. Test scenarios address different information layers and situations (normal driving, critical situations and accidents) and therefore draw on different databases. Systematically combining accident and/or normal driving databases into new synthetic databases can help to obtain scenarios that are as realistic as possible. This paper shows how statistical matching (SM) can be applied to fuse different categorical accident and traffic observation databases. The fusion is demonstrated in two use cases, each featuring several fusion methods. In use case 1, a synthetic database was generated from two accident data samples, and a random forest classifier correctly estimated 78.7% of the original values. The same fusion using distance hot deck reproduced only 67% of the original values but better preserved the marginal distributions. Use case 2 illustrates a real-world application in which accident data was fused with over 23,000 car trajectories at one intersection in Germany. The results show that SM is applicable to fusing categorical traffic databases. Future research needs to further investigate the combination of hot-deck methods and machine learning classifiers.
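To make the distance hot-deck idea concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes a simple mismatch (Hamming) distance on the categorical variables shared by both samples, and it breaks ties by taking the first donor found. For each recipient record, it copies the donor-specific variables from the nearest donor record. The function names (`distance_hot_deck`, `agreement`, `marginal`) and any variable names are hypothetical illustrations.

```python
from collections import Counter
from typing import Dict, List, Sequence

# A record maps (hypothetical) categorical variable names to category labels.
Record = Dict[str, str]


def hamming(a: Record, b: Record, keys: Sequence[str]) -> int:
    """Count the common variables on which two records disagree."""
    return sum(a[k] != b[k] for k in keys)


def distance_hot_deck(recipients: List[Record], donors: List[Record],
                      common: Sequence[str], specific: Sequence[str]) -> List[Record]:
    """Fuse two samples by nearest-donor imputation on the common variables.

    Assumption: Hamming distance on categorical variables; ties go to the
    first donor with minimal distance (Python's min() keeps the first minimum).
    """
    fused = []
    for r in recipients:
        best = min(donors, key=lambda d: hamming(r, d, common))
        record = dict(r)  # keep the recipient's own values
        for k in specific:
            record[k] = best[k]  # copy donor-only variables from the nearest donor
        fused.append(record)
    return fused


def agreement(imputed: List[Record], truth: List[Record], var: str) -> float:
    """Share of records whose imputed value equals the held-out true value."""
    hits = sum(i[var] == t[var] for i, t in zip(imputed, truth))
    return hits / len(truth)


def marginal(records: List[Record], var: str) -> Counter:
    """Category frequencies of one variable, for comparing marginal distributions."""
    return Counter(r[var] for r in records)
```

If the true values of the donor-specific variable are held out for the recipients, `agreement` measures the share of correctly reproduced original values. This is the kind of figure reported for use case 1. `marginal` lets one compare imputed and observed category frequencies. A classifier-based variant would replace the nearest-donor lookup with a model trained on the donor sample. That can raise agreement at the cost of distorting the marginals.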
