Machine Learning for Predicting the Impact Point of a Low Speed Vehicle Crash

Using time series data recorded inside the vehicle, this research predicts the point of impact of a low-speed crash by developing an automated machine learning approach for time series applications. After an initial data exploration, we discuss how to extract features from time series and compare different ways of selecting the most relevant ones. Of the 3,176 extracted features, 9 are selected and used for classification with a decision tree. To optimize the hyperparameters of the decision tree algorithm, we run a randomized search with 50,000 iterations. The modeling results are presented graphically and discussed. With a final prediction accuracy of 89% (76% under cross-validation), the optimized decision tree shows considerable potential for use in vehicle insurance processing, in particular for the automated settlement of low-speed crash damage claims.
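
The pipeline summarized above (feature extraction from time series, selection of a small feature subset, a decision tree, randomized hyperparameter search, cross-validated evaluation) can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn and NumPy, not the authors' implementation. The synthetic 3-axis signals, the hypothetical helpers `make_synthetic_crashes` and `extract_features`, the six summary statistics (a tiny stand-in for the thousands of features a tool such as tsfresh produces), the ANOVA F-test filter `SelectKBest` (swapped in for whatever selection method the paper settles on), and the parameter ranges are all assumptions. `n_iter` is 50 rather than 50,000 so that the example runs quickly.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)


def make_synthetic_crashes(n_per_class=40, length=200):
    """Hypothetical 3-axis acceleration traces (NOT real crash data).
    Class 0 = front, 1 = side, 2 = rear: the class decides which axis
    carries the impact pulse and with which sign."""
    t = np.arange(length)
    signals, labels = [], []
    for label in range(3):
        for _ in range(n_per_class):
            x = rng.normal(0.0, 0.3, size=(3, length))  # sensor noise
            centre = rng.integers(60, 140)               # random impact time
            pulse = 3.0 * np.exp(-((t - centre) ** 2) / 50.0)
            axis = 1 if label == 1 else 0
            sign = -1.0 if label == 2 else 1.0
            x[axis] += sign * pulse
            signals.append(x)
            labels.append(label)
    return np.stack(signals), np.array(labels)


def extract_features(signals):
    """Per-channel summary statistics; signals has shape
    (n_samples, n_channels, length). Returns (n_samples, 6 * n_channels)."""
    feats = [
        signals.mean(axis=2),
        signals.std(axis=2),
        signals.min(axis=2),
        signals.max(axis=2),
        signals.argmax(axis=2),        # time index of the peak
        (signals ** 2).sum(axis=2),    # absolute energy
    ]
    return np.concatenate(feats, axis=1)


X_raw, y = make_synthetic_crashes()
X = extract_features(X_raw)

# Selection sits inside the pipeline so it is refit on each training fold.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=9)),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
param_distributions = {
    "tree__max_depth": [2, 3, 4, 5, 6, None],
    "tree__min_samples_split": [2, 5, 10, 20],
    "tree__min_samples_leaf": [1, 2, 5, 10],
    "tree__criterion": ["gini", "entropy"],
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=50, cv=5,
                            random_state=0)

# Nested CV: the outer loop scores the whole search procedure.
outer_scores = cross_val_score(search, X, y, cv=5)
search.fit(X, y)  # final model tuned on all data
print(search.best_params_)
print(f"nested CV accuracy: {outer_scores.mean():.2f}")
```

Wrapping the search inside `cross_val_score` gives a nested estimate, so the reported accuracy is not inflated by tuning on the same folds. The gap between a single-split accuracy and a cross-validated one, as with 89% versus 76% above, is the kind of difference this guards against. The numbers printed for this synthetic data say nothing about the paper's results.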
