Variations on a theme

This paper introduces Probabilistic Topic Modeling (PTM) as a promising approach to the analysis of naturalistic driving data. Naturalistic driving data present an unprecedented opportunity to understand driver behavior, but novel strategies are needed to achieve a more complete picture of these datasets than the local, event-based analyses that currently dominate the field can provide. PTM is a text analysis method for uncovering word-based themes across documents. In this application, each drive was treated as a document, and words were created from speed and acceleration data using Symbolic Aggregate approXimation (SAX). A twenty-topic Latent Dirichlet Allocation (LDA) model was fit to the words from 10,705 documents (real-world drives) by 26 drivers. The resulting LDA model clustered the drives into meaningful topics. Topic membership probabilities were then used successfully as features in subsequent analyses to differentiate between healthy drivers and drivers with Obstructive Sleep Apnea.
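
To make the word-creation step concrete, the following is a minimal sketch of how a speed trace might be turned into SAX words that together form one document per drive. It is an illustration under stated assumptions, not the pipeline used in the paper: the window length, the number of segments, the four-letter alphabet, the use of non-overlapping windows, and the function names `sax_word` and `drive_to_document` are all hypothetical, and the sketch handles speed only, since the abstract does not say how speed and acceleration were combined.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Convert one numeric window into a SAX word (illustrative parameters)."""
    mu, sigma = mean(series), pstdev(series)
    # z-normalize the window; a constant window maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each equal-width segment
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints that cut a standard normal into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    # Each segment mean becomes the letter of the region it falls in
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def drive_to_document(speed, window=8, **kwargs):
    """Split a drive into non-overlapping windows, one SAX word per window.

    The window length and the lack of overlap are assumptions for this sketch.
    """
    return [sax_word(speed[i:i + window], **kwargs)
            for i in range(0, len(speed) - window + 1, window)]
```

For a toy trace that accelerates and then decelerates, `drive_to_document([0, 0, 1, 1, 2, 2, 3, 3, 3, 3, 2, 2, 1, 1, 0, 0])` returns `["abcd", "dcba"]`. Word lists of this kind, one per drive, could then be counted and passed to any standard LDA implementation to obtain per-drive topic membership probabilities.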