In this thesis, driver-rated data is studied using data mining techniques. The rated data
consists of roughly 72 hours of data from seven drivers. The ultimate goal is to
identify patterns associated with high ratings and match them against a reference database
consisting of naturalistic driving data. Two segmentations of the drives are used: equal-length
subsegments and steering operations. An alternative, morphed standardised rating scale
is also proposed.
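
The equal-length segmentation mentioned above can be illustrated with a minimal sketch. The function name `equal_length_segments`, the `(samples x channels)` array layout, and the choice to drop a short trailing remainder are assumptions made for illustration; the abstract does not specify how the thesis handles these details.

```python
import numpy as np

def equal_length_segments(signal, segment_len):
    """Split a (samples x channels) array into consecutive windows.

    Hypothetical helper: a trailing remainder shorter than segment_len
    is dropped, which is an assumption, not the thesis's stated choice.
    """
    n_segments = signal.shape[0] // segment_len
    trimmed = signal[: n_segments * segment_len]
    # Result has shape (n_segments, segment_len, channels).
    return trimmed.reshape(n_segments, segment_len, *signal.shape[1:])
```

Each resulting window can then be summarised by features or by its covariance structure before rating prediction or matching.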
Two data mining approaches are applied. The first method uses an ensemble
classifier on features derived from the CAN data to predict the rating of each segment
of the data. The second method uses an outlier detection algorithm and a hierarchical
clustering approach on a distance metric based on the angles between the principal
variance components of the observations.
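
One common way to build a distance from angles between principal variance components is to compare the leading principal subspaces of two segments through their principal angles. The sketch below assumes this formulation; the number of components `k`, the function names, and the way the angles are combined into a single number are illustrative choices and may differ from the metric actually used in the thesis.

```python
import numpy as np

def principal_axes(segment, k=2):
    """Return the top-k principal directions (as columns) of a
    (samples x signals) segment."""
    centered = segment - segment.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def principal_angle_distance(seg_a, seg_b, k=2):
    """Distance built from the principal angles between the two
    k-dimensional principal subspaces (assumed formulation)."""
    qa = principal_axes(seg_a, k)
    qb = principal_axes(seg_b, k)
    # Singular values of qa.T @ qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    angles = np.arccos(np.clip(cosines, 0.0, 1.0))
    return float(np.sqrt(np.sum(angles ** 2)))
```

A matrix of such pairwise distances between segments could then be passed to a hierarchical clustering routine. Because the distance depends only on the orientation of the variance directions, two segments with similar covariance structure but different signal amplitudes end up close to each other.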
Using the ensemble classifier and general variables, a large proportion of the rating variance
can be explained when driver and route factors are included. Large rating values can be
identified well. For the standardised rating, the prediction of high values is worse, with
many false positives.
The matching of signals using the covariance structure works well. Using hierarchical
clustering, clusters with a standardised rating well above average can be obtained. Outliers
with high standardised rating are extracted and matched against a larger database. The
matches are few but similar to the original situations, because the matching is
strict.
In conclusion, the ensemble classifier works well for predicting rating when driver and
route factors are included. The covariance-based method performs well for situation
matching, and clusters with high rating can be identified. It also has the potential to be
used for extracting and matching more sophisticated patterns.