A Hybrid Machine Learning Method and Its Application in Municipal Waste Prediction

Prediction methods that combine clustering and classification techniques can produce more accurate results than either technique alone, particularly on large datasets. This paper proposes a hybrid prediction method that combines weighted k-means clustering with linear regression. Weighted k-means first partitions the dataset into clusters; linear regression is then fitted on each cluster to build the final predictors. The method was applied to the problem of municipal waste prediction and evaluated on a dataset of 63,000 records. The results show that it outperforms linear regression and k-means clustering applied individually in terms of prediction accuracy and robustness. The prediction model is integrated into a decision support system for strategic and operational planning of waste and recycling services at the City of Calgary, Canada. Its intended use is to improve the utilization of resources such as personnel and vehicles.
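The two-stage idea (cluster first, then fit one linear regression per cluster) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the weights enter k-means, so the sketch assumes per-feature weights in the squared Euclidean distance. All function names (`weighted_kmeans`, `fit_hybrid`, `predict_hybrid`), the synthetic data, and the chosen weights are hypothetical and serve only as illustration.

```python
import numpy as np


def assign(Xw, centers):
    """Index of the nearest center for each (already scaled) row."""
    d = ((Xw[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def weighted_kmeans(X, k, feature_weights, n_iter=100, seed=0):
    """Lloyd's k-means with an ASSUMED feature-weighted distance.

    Scaling each feature by sqrt(weight) makes the plain squared
    Euclidean distance equal to the weighted one.
    """
    rng = np.random.default_rng(seed)
    scale = np.sqrt(np.asarray(feature_weights, dtype=float))
    Xw = X * scale
    centers = Xw[rng.choice(len(Xw), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(Xw, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = Xw[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, scale


def fit_linear(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def fit_hybrid(X, y, k, feature_weights):
    """Cluster the records, then fit one regression per cluster."""
    centers, scale = weighted_kmeans(X, k, feature_weights)
    labels = assign(X * scale, centers)
    global_coef = fit_linear(X, y)
    coefs = []
    for j in range(k):
        mask = labels == j
        # Fall back to the global model when a cluster is too small to fit.
        enough = mask.sum() > X.shape[1]
        coefs.append(fit_linear(X[mask], y[mask]) if enough else global_coef)
    return {"centers": centers, "scale": scale, "coefs": np.array(coefs)}


def predict_hybrid(model, X):
    """Route each record to its nearest cluster and apply that cluster's model."""
    labels = assign(X * model["scale"], model["centers"])
    A = np.column_stack([np.ones(len(X)), X])
    return (A * model["coefs"][labels]).sum(axis=1)


# Synthetic data (not the paper's dataset): two regimes with different slopes.
rng = np.random.default_rng(42)
x0 = np.concatenate([rng.uniform(0, 1, 50), rng.uniform(10, 11, 50)])
x1 = rng.uniform(0, 100, 100)
X = np.column_stack([x0, x1])
y = np.where(x0 < 5, 2 * x0, 50 - 3 * x0) + 0.1 * x1

# Weight 0 on x1 means only x0 drives the clustering (an arbitrary choice).
model = fit_hybrid(X, y, k=2, feature_weights=[1.0, 0.0])
hybrid_mse = np.mean((predict_hybrid(model, X) - y) ** 2)
global_mse = np.mean((predict_linear(fit_linear(X, y), X) - y) ** 2)
print("hybrid MSE:", hybrid_mse)
print("global MSE:", global_mse)
```

The comparison here is in-sample and on toy data, so it only shows why per-cluster models can help when the relationship between inputs and target differs across groups of records; it says nothing about the accuracy figures reported for the Calgary dataset.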
