Lasso Regression for Daily Rainfall Modeling at Citeko Station, Bogor, Indonesia

Abstract Rainfall is one of the most important climate components for an agrarian country with a tropical climate such as Indonesia. Many variables can affect rainfall intensity, including humidity, air temperature, wind speed, and wind direction. However, not all of these variables have a significant effect, so a modeling technique is needed that can select and shrink the predictor variables to yield a simpler model. One such technique is the Lasso (least absolute shrinkage and selection operator). The purpose of this study is to model rainfall intensity using Lasso regression and to determine which variable has the greatest influence on rainfall intensity. This paper also designs an application system that can be used to carry out Lasso regression. The response variable, daily rainfall at the Citeko observation station, Bogor, Indonesia, is assumed to be normally distributed. The rainfall intensity model involves 16 predictor variables, including maximum temperature, minimum temperature, average temperature, average humidity, sun exposure, maximum wind speed, wind direction at maximum wind speed, and eight categorical variables for the most frequent wind direction. The resulting Lasso regression model selects and shrinks the predictors, retaining only 9 variables. In addition, based on the AIC and the coefficient of determination, the Lasso regression performs better than classical multiple linear regression. The AIC is 5901.20 for the Lasso regression compared with 5911.43 for the classical model, and the coefficient of determination is 22.73% for the Lasso regression compared with 21.41% for the classical model. In the fitted Lasso regression model, the most frequent wind direction being north is the variable with the greatest influence on rainfall intensity.
The Lasso regression application has been successfully built and supports the user in uploading data, exploring the data and variables, running the Lasso regression analysis, and reporting the results.
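
The selection and shrinkage behavior described above, and the comparison with a classical linear model by AIC and coefficient of determination, can be illustrated with a minimal sketch. It is not the implementation used in this study: the data are synthetic, the penalty value `lam`, the coordinate descent solver, and the helper names (`soft_threshold`, `lasso_coordinate_descent`, `gaussian_aic`, `r_squared`) are all assumptions made for illustration, and the Gaussian AIC formula with the number of nonzero coefficients as the parameter count is one common convention that may differ from the one used here.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes its own effect.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual consistent with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


def gaussian_aic(y, y_hat, k):
    """AIC under a normal error model, up to an additive constant (assumed convention)."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * k


def r_squared(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in for the station data: 16 predictors, only 3 of which matter.
rng = np.random.default_rng(0)
n, p = 200, 16
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize the predictors
true_beta = np.zeros(p)
true_beta[[0, 3, 7]] = [2.0, -1.5, 1.0]
y = X @ true_beta + rng.normal(size=n)
y = y - y.mean()  # center the response

lam = 0.2  # hypothetical penalty; in practice chosen by cross-validation or similar
beta_lasso = lasso_coordinate_descent(X, y, lam)
beta_ols = lasso_coordinate_descent(X, y, 0.0)  # lam = 0 reduces to least squares

n_selected = np.count_nonzero(beta_lasso)
print("predictors kept by the Lasso:", n_selected, "of", p)
print("AIC  Lasso:", gaussian_aic(y, X @ beta_lasso, n_selected))
print("AIC  OLS  :", gaussian_aic(y, X @ beta_ols, p))
print("R^2  Lasso:", r_squared(y, X @ beta_lasso))
print("R^2  OLS  :", r_squared(y, X @ beta_ols))
```

Setting `lam` to zero recovers the classical least squares fit, which keeps all 16 coefficients; increasing `lam` drives more coefficients exactly to zero, which is the selection effect the abstract refers to.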