Automatic Clustering for Unsupervised Risk Diagnosis of Vehicle Driving for Smart Road

Early risk diagnosis and driving anomaly detection from vehicle streams are of great benefit to a range of advanced solutions for Smart Road and crash prevention, although there are intrinsic challenges, especially the lack of ground truth and the definition of multiple risk exposures. This study proposes a domain-specific automatic clustering method (termed Autocluster) that self-learns the optimal models for unsupervised risk assessment. It integrates the key steps of risk clustering into an auto-optimisable pipeline, including feature selection, algorithm selection, and hyperparameter auto-tuning. Firstly, based on surrogate conflict measures, indicator-guided feature extraction is conducted to construct temporal-spatial and kinematical risk features, and an elimination-based model reliance importance (EMRI) method is developed to select the useful features without supervision. Secondly, we propose the balanced Silhouette Index (bSI) to evaluate the internal quality of imbalanced clustering, and design a loss function that considers clustering performance in terms of internal quality, inter-cluster variation, and model stability. Thirdly, based on Bayesian optimisation, algorithm selection and hyperparameter auto-tuning are self-learned to generate the best clustering partitions. Various algorithms are comprehensively investigated. NGSIM vehicle trajectory data is used as the test bed. Findings show that Autocluster is reliable and promising for diagnosing multiple distinct risk exposures inherent to generalised driving behaviour. We also examine aspects of risk clustering such as algorithm heterogeneity, Silhouette analysis, and hierarchical clustering flows. In addition, Autocluster serves as a method for unsupervised multi-risk data labelling and indicator threshold calibration, and it is useful for tackling the challenges of imbalanced clustering without ground truth or a priori knowledge.
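The abstract does not give the formula for bSI, so the following is only an illustrative sketch of why a class-balanced Silhouette score matters for imbalanced clustering. It assumes one plausible reading: the per-sample Silhouette values of Rousseeuw (1987) are averaged within each cluster first, and the cluster means are then averaged with equal weight, so that a small high-risk cluster counts as much as a large low-risk one. The function names `silhouette_values`, `mean_silhouette`, and `balanced_silhouette` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import dist


def silhouette_values(points, labels):
    """Per-sample Silhouette s(i) = (b - a) / max(a, b).

    a is the mean distance to the other members of the sample's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    Requires at least two distinct labels.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("Silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton cluster: s(i) is defined as 0.
            values.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def mean_silhouette(points, labels):
    """Standard Silhouette Index: plain mean over all samples."""
    s = silhouette_values(points, labels)
    return sum(s) / len(s)


def balanced_silhouette(points, labels):
    """ASSUMED balanced variant: unweighted mean of per-cluster mean Silhouettes."""
    s = silhouette_values(points, labels)
    per_cluster = defaultdict(list)
    for value, lab in zip(s, labels):
        per_cluster[lab].append(value)
    cluster_means = [sum(v) / len(v) for v in per_cluster.values()]
    return sum(cluster_means) / len(cluster_means)


# Toy imbalanced data: a tight majority cluster and a loose minority cluster.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (3.0,), (5.0,)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
print("standard SI:", round(mean_silhouette(points, labels), 3))
print("balanced SI:", round(balanced_silhouette(points, labels), 3))
```

On data like this, the plain mean is dominated by the majority cluster, while the balanced variant gives the poorly separated minority cluster equal weight and therefore reports a lower score.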
