Decision Tree Based Station-Level Rail Transit Ridership Forecasting

Abstract

This paper presents a decision-tree-based model that forecasts rail transit ridership at the station level from the surrounding land-use patterns. Canonical correlation analysis (CCA) is used to identify key land-use variables by evaluating their degree of contribution to rail transit station demand, which effectively reduces the dimensionality and complexity of the decision tree. A full month of smart card data and a detailed regulatory land-use plan from Chongqing, China, were collected for model development and validation. The proposed model can target key land-use patterns and associate them with station boarding and alighting demand at a high level of accuracy. It can also reveal underlying rules linking rail transit station demand to land-use variables, and can assist in developing Transit-Oriented Development (TOD) plans that improve land use and transit operational efficiency.
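As a minimal sketch of the two-stage idea described in the abstract (CCA to screen land-use variables, then a decision tree on the retained ones), the Python below uses only the standard library. Several choices are assumptions, not the paper's method: each land-use variable is scored by its structure coefficient (its correlation with the first land-use canonical variate), only the first canonical pair is used, the top two variables are kept, and the tree is a generic CART-style regression tree that splits on squared-error reduction. The station data, the variable names (`residential`, `commercial`, `office`, `green_space`) and all helper functions are hypothetical.

```python
# Sketch under assumed choices (not the paper's): rank land-use variables by |structure
# coefficient| on the first canonical variate, keep the top 2, then fit a CART-style tree.
import math
from statistics import mean

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

def solve(m, v):
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    a = [row[:] + [v[i]] for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (a[r][n] - sum(a[r][k] * x[k] for k in range(r + 1, n))) / a[r][r]
    return x

def first_canonical(xcols, ycols):
    """First canonical correlation (p land-use vs 2 demand columns) and structure coefficients."""
    p = len(xcols)
    sxx = [[cov(a, b) for b in xcols] for a in xcols]
    sxy = [[cov(a, y) for y in ycols] for a in xcols]
    (s00, s01), (s10, s11) = [[cov(a, b) for b in ycols] for a in ycols]
    w = [solve(sxx, [sxy[i][k] for i in range(p)]) for k in range(2)]  # columns of Sxx^-1 Sxy
    b = [[sum(sxy[i][k] * w[l][i] for i in range(p)) for l in range(2)] for k in range(2)]
    d = s00 * s11 - s01 * s10
    inv = [[s11 / d, -s01 / d], [-s10 / d, s00 / d]]  # Syy^-1
    # M = Syy^-1 Syx Sxx^-1 Sxy; its eigenvalues are the squared canonical correlations.
    m = [[sum(inv[r][k] * b[k][c] for k in range(2)) for c in range(2)] for r in range(2)]
    tr, dm = m[0][0] + m[1][1], m[0][0] * m[1][1] - m[0][1] * m[1][0]
    lam = (tr + math.sqrt(max(tr * tr - 4 * dm, 0.0))) / 2
    # Both candidates are eigenvectors of M for lam (demand-side weights); keep the larger.
    cands = [[m[0][1], lam - m[0][0]], [lam - m[1][1], m[1][0]]]
    vec = max(cands, key=lambda v: abs(v[0]) + abs(v[1]))
    if abs(vec[0]) + abs(vec[1]) < 1e-12:  # M is a multiple of I: any direction works
        vec = [1.0, 0.0]
    a = [w[0][i] * vec[0] + w[1][i] * vec[1] for i in range(p)]  # land-use weights
    u = [sum(a[i] * xcols[i][s] for i in range(p)) for s in range(len(xcols[0]))]
    return math.sqrt(lam), [corr(col, u) for col in xcols]

def sse(ys):
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def grow(rows, ys, depth=0, max_depth=3, min_leaf=2):
    """Greedy regression tree; internal node = (feature, threshold, left, right)."""
    best = None
    if depth < max_depth and len(ys) >= 2 * min_leaf:
        for j in range(len(rows[0])):
            for t in sorted({r[j] for r in rows})[:-1]:
                left = [i for i, r in enumerate(rows) if r[j] <= t]
                right = [i for i, r in enumerate(rows) if r[j] > t]
                if min(len(left), len(right)) < min_leaf:
                    continue
                cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    if best is None or best[0] >= sse(ys):
        return mean(ys)  # leaf: mean demand of the stations that reach it
    _, j, t, left, right = best
    sub = [grow([rows[i] for i in idx], [ys[i] for i in idx], depth + 1, max_depth, min_leaf)
           for idx in (left, right)]
    return (j, t, sub[0], sub[1])

def predict(node, row):
    while isinstance(node, tuple):
        j, t, lo, hi = node
        node = lo if row[j] <= t else hi
    return node

# Synthetic toy data for 8 hypothetical stations (NOT from the paper).
land_use = {
    "residential": [60, 60, 60, 60, 40, 40, 40, 40],
    "commercial": [35, 35, 25, 25, 35, 35, 25, 25],
    "office": [24, 16, 24, 16, 24, 16, 24, 16],
    "green_space": [12, 12, 8, 8, 8, 8, 12, 12],
}
boarding = [113, 107, 109, 103, 91, 97, 87, 93]
alighting = [103, 89, 97, 95, 91, 77, 85, 83]

rho, structure = first_canonical(list(land_use.values()), [boarding, alighting])
ranked = sorted(zip(structure, land_use), key=lambda sv: -abs(sv[0]))
key_vars = [name for _, name in ranked[:2]]
rows = [list(r) for r in zip(*(land_use[v] for v in key_vars))]
tree = grow(rows, boarding, max_depth=2, min_leaf=2)
print(f"rho={rho:.3f} key={key_vars} station0_boarding={predict(tree, rows[0])}")
```

In this toy set `green_space` was built to be uncorrelated with both boarding and alighting, so its structure coefficient is essentially zero and it is screened out together with the weaker `commercial` variable; the tree then splits on `residential` first and `office` second. The cut-off k=2 and the tree depth are arbitrary here and would need tuning on real data.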
