On Developing a Driver Identification Methodology Using In-Vehicle Data Recorders

Recently, cutting-edge technologies that facilitate data collection have emerged on a large scale. One of the most prominent is the in-vehicle data recorder (IVDR). There are multiple ways to assign an IVDR's data to the different drivers who share the same vehicle. Irrespective of their level of sophistication, all of these technologies still suffer from considerable limitations in accuracy. The purpose of this paper is to propose a methodology that can identify the driver of a given trip using historical trip-based data. To do so, an advanced machine learning pipeline is proposed. The main idea is to take advantage of highly available data (such as driver-labeled floating car data collected by an IVDR) to build a pattern-based algorithm able to identify a trip's driver category when the driver's true identity is unknown. This stepwise process includes feature generation/selection, multiple heterogeneous explanatory models, and an ensemble approach (i.e., stacked generalization) that combines those models to reduce their generalization error. Our goal is to provide an inexpensive alternative to existing driver identification technologies, one that can also complement or validate them. Experiments conducted on a real-world case study from Israel illustrate the potential of this idea: the approach obtained an accuracy of ~88% and a Cohen's Kappa agreement score of ~74%.
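The abstract reports both accuracy and Cohen's Kappa; the latter corrects the observed agreement p_o for the agreement p_e expected by chance, kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `cohen_kappa` and the toy driver labels are hypothetical and chosen only for illustration; they are not data from the case study.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa between true and predicted labels (illustrative sketch)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of trips whose predicted label matches the true one.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of the product of the marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences use one and the same single class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy labels for ten trips and two driver categories "A" and "B".
y_true = ["A"] * 5 + ["B"] * 5
y_pred = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "A"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={accuracy:.2f} kappa={kappa:.2f}")
```

In this toy case the accuracy is 0.80 while Kappa is 0.60, because half of the agreement would be expected by chance with two equally frequent categories. This is why Kappa is a useful companion to accuracy when driver categories are imbalanced.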
