A classification of public transit users with smart card data based on time series distance metrics and a hierarchical clustering method

ABSTRACT Classifying the behavior of smart card users is important in public transit demand analysis because it provides an understanding of people’s sequence of activities within a period of time. However, classical metrics such as Euclidean distance are not appropriate for time-series classification. To address this problem, this article presents a method for classifying public transit smart card users’ daily transactions, which are represented as time series. The approach uses cross-correlation distance (CCD), hierarchical clustering, and subgroups by metric parameter to understand users’ temporal patterns. The clustering results are compared with those obtained using dynamic time warping (DTW) distance, a common measure of time-series distance. After a brief pedagogical example explaining the DTW and CCD concepts, a program is developed in R to validate the method on a real dataset of smart card transactions. The dataset covers use of the public transit system in the city of Gatineau in September 2013. The results demonstrate that CCD performs better than DTW at classifying the time series, and that the classification method identifies different daily behaviors among public transit users. The results will help transit authorities offer better services to diverse groups of smart card users.
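To make the two distances named in the abstract concrete, the following is a minimal Python sketch, not the authors' R program. It assumes one common definition of the cross-correlation distance, d = sqrt((1 - CC_0^2) / sum_{k=1..max_lag} CC_k^2), where CC_k is the sample cross-correlation at lag k; the abstract does not state the article's exact formula, so this choice is an assumption. The function names, the `max_lag` parameter, and the three toy hourly boarding profiles are hypothetical and chosen only for illustration.

```python
import math


def dtw_distance(x, y):
    """Classic dynamic-programming DTW with absolute difference as local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cross_correlation(x, y, lag):
    """Sample cross-correlation of equal-length series at a non-negative lag."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((x[i] - mx) * (y[i + lag] - my) for i in range(n - lag))
    den = math.sqrt(sum((v - mx) ** 2 for v in x)) * math.sqrt(
        sum((v - my) ** 2 for v in y)
    )
    return num / den if den else 0.0


def cc_distance(x, y, max_lag=3):
    """Cross-correlation distance under the assumed definition (see lead-in)."""
    cc0 = cross_correlation(x, y, 0)
    lagged = sum(cross_correlation(x, y, k) ** 2 for k in range(1, max_lag + 1))
    if lagged == 0:
        return float("inf")
    # Clamp at zero to absorb floating-point rounding when cc0 is close to 1.
    return math.sqrt(max(0.0, 1.0 - cc0 ** 2) / lagged)


def daily_profile(boarding_hours):
    """Hypothetical 24-slot profile: 1 if the card was used in that hour, else 0."""
    return [1.0 if h in boarding_hours else 0.0 for h in range(24)]


user_a = daily_profile({7, 17})   # commuter boarding at 07:00 and 17:00
user_b = daily_profile({8, 18})   # same pattern shifted one hour later
user_c = daily_profile({12})      # single midday boarding

for name, other in (("b", user_b), ("c", user_c)):
    print(name, "DTW:", dtw_distance(user_a, other),
          "CCD:", round(cc_distance(user_a, other), 3))
```

In this toy setting, both distances place the shifted commuter profile closer to `user_a` than the midday profile. A matrix of such pairwise distances is the kind of input a hierarchical clustering step would consume. Note that this CCD form only looks at lags in one direction, so it is not symmetric in its arguments.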
