Agriculture Commodity Arrival Prediction using Remote Sensing Data: Insights and Beyond

In developing countries like India, agriculture plays an extremely important role in the lives of the population. In India, around 80\% of the population depends on agriculture or its by-products as the primary means of employment. Given this heavy dependence on agriculture, it is essential for the government to estimate market factors in advance and prepare for any deviation from those estimates. Commodity arrivals at markets are one such factor, and they are recorded at the district level throughout the country. The government uses historical data and short-term predictions of important variables such as arrivals, prices, and crop quality to take proactive steps and decide on policy measures. In this paper, we present a framework that combines short time series with remote sensing data to predict future commodity arrivals. The data are extremely high dimensional: the number of features exceeds the number of observations by multiple orders of magnitude. We use cascaded layers of dimensionality reduction techniques combined with regularized regression models for prediction. We present results for predicting arrivals at major markets and statewide prices for the `Tur' (red gram) crop in Karnataka, India. Our model outperforms popular ML techniques in many instances. It is scalable and time efficient, and it can be generalized to many other crops and regions. We draw multiple insights from the regression parameters, some of which are important to consider when predicting more complex quantities, such as prices, in the future. We also combine these insights to generate recommendations for different government organizations.
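The idea of cascading dimensionality reduction ahead of a regularized regression can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the method of the paper: the two reduction stages (a variance filter followed by PCA), the choice of ridge regression as the regularized model, the synthetic data, and all function names and parameter values are hypothetical.

```python
import numpy as np

def variance_filter(X, keep):
    """Stage 1 (assumed): indices of the `keep` highest-variance columns."""
    order = np.argsort(X.var(axis=0))[::-1][:keep]
    return np.sort(order)

def pca_fit(X, n_components):
    """Stage 2 (assumed): PCA via SVD; returns the column means and components."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def ridge_fit(Z, y, alpha):
    """Ridge regression in closed form with an unpenalized intercept."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    gram = Zc.T @ Zc + alpha * np.eye(Z.shape[1])
    coef = np.linalg.solve(gram, Zc.T @ (y - y_mean))
    return coef, y_mean - z_mean @ coef

# Synthetic stand-in for "few observations, very many features".
rng = np.random.default_rng(0)
n_obs, n_features = 40, 5000
X = rng.normal(size=(n_obs, n_features))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=n_obs)

# Chronological split: fit every stage on the earlier periods only.
train, test = slice(0, 30), slice(30, 40)
cols = variance_filter(X[train], keep=500)
mean, comps = pca_fit(X[train][:, cols], n_components=10)
coef, intercept = ridge_fit((X[train][:, cols] - mean) @ comps.T, y[train], alpha=1.0)

# Apply the fitted cascade to the held-out later periods.
y_pred = (X[test][:, cols] - mean) @ comps.T @ coef + intercept
```

Fitting each stage on the training periods only keeps information from later periods out of the reduction steps, which matters when the series is short.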
