Modeling bike availability in a bike-sharing system using machine learning

This paper models bike availability at San Francisco Bay Area Bike Share stations using machine learning algorithms. Random Forest (RF) and Least-Squares Boosting (LSBoost) were used as univariate regression algorithms, and Partial Least-Squares Regression (PLSR) was applied as a multivariate regression algorithm. The univariate models predict the number of available bikes at each station individually. PLSR was applied to reduce the number of required prediction models and to reflect the spatial correlation between stations in the network. The results show that the univariate models have lower prediction errors than the multivariate model. However, the multivariate model's results are reasonable for networks with a relatively large number of spatially correlated stations. The results also show that neighboring stations and the prediction horizon are significant predictors. The prediction horizon that produced the lowest prediction error was 15 minutes.
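The difference between the univariate and multivariate setups can be illustrated with a minimal sketch of how the training data might be arranged. This is not the authors' pipeline: the station names, the toy counts, the helper functions, and the choice of using every station's current count as a feature are all assumptions made for illustration. Each time step is assumed to be one sampling interval, so a 15-minute horizon corresponds to `horizon_steps=1` only if the data are sampled every 15 minutes.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, List[int]]  # hypothetical layout: station id -> bike counts over time


def make_univariate_sets(
    counts: Counts, horizon_steps: int
) -> Dict[str, Tuple[List[List[int]], List[int]]]:
    """Build one (X, y) pair per station, as a univariate regressor would need.

    X[t] holds the counts of all stations at time t (an assumed feature set);
    y[t] is the target station's count `horizon_steps` later.
    """
    stations = sorted(counts)
    n = len(counts[stations[0]])
    sets = {}
    for target in stations:
        X = [[counts[s][t] for s in stations] for t in range(n - horizon_steps)]
        y = [counts[target][t + horizon_steps] for t in range(n - horizon_steps)]
        sets[target] = (X, y)
    return sets


def make_multivariate_set(
    counts: Counts, horizon_steps: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Build a single (X, Y) pair, as a multivariate regressor would need.

    Each row of Y holds the future counts of all stations at once, so one
    model covers the whole network instead of one model per station.
    """
    stations = sorted(counts)
    n = len(counts[stations[0]])
    X = [[counts[s][t] for s in stations] for t in range(n - horizon_steps)]
    Y = [[counts[s][t + horizon_steps] for s in stations] for t in range(n - horizon_steps)]
    return X, Y


# Toy data for three hypothetical stations (not taken from the paper).
counts = {"A": [5, 4, 3, 4, 6], "B": [2, 3, 4, 3, 1], "C": [7, 7, 6, 5, 5]}

univariate = make_univariate_sets(counts, horizon_steps=1)
X_multi, Y_multi = make_multivariate_set(counts, horizon_steps=1)

print(len(univariate), "univariate training sets, one per station")
print(len(Y_multi[0]), "targets per row in the single multivariate set")
```

In this arrangement the univariate approach requires as many fitted models as there are stations, whereas the multivariate approach fits one model whose output is a vector, which is the trade-off the abstract describes.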
