Large-scale transit market segmentation with spatial-behavioural features

Abstract: Transit market segmentation enables transit providers to understand the commonalities and heterogeneities among different groups of passengers, so that they can cater for individual transit riders’ mobility needs. The problem has recently attracted great interest with the proliferation of automated data collection systems such as Smart Card Automated Fare Collection (AFC), which allow researchers to observe individual travel behaviours over a long time period. However, there is a need for an integrated market segmentation method that incorporates both spatial and behavioural features of individual transit passengers and that is efficient enough for large-scale implementation. This paper proposes a new algorithm, Spatial Affinity Propagation (SAP), based on the classical Affinity Propagation (AP) algorithm, to enable large-scale transit market segmentation with spatial-behavioural features. SAP segments transit passengers using two kinds of input: spatial geodetic coordinates, such that passengers in the same segment are located within immediate walking distance of each other, and behavioural features mined from AFC data. A comparison with AP and with popular algorithms in the literature shows that SAP provides nearly as good clustering performance as AP while being 52% more efficient in computation time. This efficient framework would enable transit operators to leverage the availability of AFC data to understand the commonalities and heterogeneities among different groups of passengers.
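To make the idea of combining a walking-distance constraint with behavioural features concrete, the following is a minimal sketch, not the SAP algorithm itself, whose details are not given in this abstract. It builds a pairwise similarity matrix of the kind that classical AP takes as input: the negative squared Euclidean distance between behavioural feature vectors, with pairs farther apart than a walking radius given a large negative similarity so that they are unlikely to share a segment. The function names, the 400 m radius, the penalty value, the passenger coordinates, and the two-element feature vectors are all illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def spatial_behavioural_similarity(passengers, walk_radius_m=400.0, far_penalty=-1e9):
    """Pairwise similarity matrix for (lat, lon, features) records.

    Assumption (hypothetical, not from the paper): pairs beyond walk_radius_m
    receive far_penalty; closer pairs receive the negative squared Euclidean
    distance between their behavioural feature vectors. The diagonal is left
    at 0.0 as a placeholder; AP would normally overwrite it with a preference.
    """
    n = len(passengers)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lat_i, lon_i, feat_i = passengers[i]
        for j in range(n):
            if i == j:
                continue
            lat_j, lon_j, feat_j = passengers[j]
            if haversine_m(lat_i, lon_i, lat_j, lon_j) > walk_radius_m:
                sim[i][j] = far_penalty
            else:
                sim[i][j] = -sum((a - b) ** 2 for a, b in zip(feat_i, feat_j))
    return sim


# Hypothetical passengers: (latitude, longitude, behavioural features).
passengers = [
    (-27.4700, 153.0200, (0.9, 0.1)),  # A
    (-27.4710, 153.0205, (0.8, 0.2)),  # B, roughly 120 m from A
    (-27.5500, 153.1000, (0.9, 0.1)),  # C, same behaviour as A but kilometres away
]
similarity = spatial_behavioural_similarity(passengers)
```

In this sketch, passengers A and C behave identically but are far apart, so their similarity is the penalty value, whereas A and B are neighbours and are compared on behaviour alone. The all-pairs loop is quadratic in the number of passengers and is meant only for illustration; it does not reflect whatever efficiency measures SAP uses at scale.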
