A two-stage ensemble of diverse models for recognition of abnormal data in raw wind data

Wind energy integration research generally relies on complex sensors located at remote sites. The procedure for generating high-level synthetic information from databases containing large amounts of low-level data must therefore account for possible sensor failures and imperfect input data. Data-mining methods are widely used to model the relationship between wind farm power output and wind speed, which is important for wind power prediction, and incorrect or abnormal data strongly affect their results. To address this problem, the paper presents an empirical methodology that efficiently preprocesses and filters raw wind data using a two-stage ensemble of diverse models. First, abnormal features are extracted from the raw wind data, and the dataset is labeled according to the wind farm operation state records and the characteristics of typical abnormal data. Next, a two-stage classification model is built from Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) classifiers. In the first stage, an RF classifier is trained with the labeled dataset as input. In the second stage, a GBDT classifier is trained with the labeled dataset and the RF classification result as input. Finally, the testing set is predicted separately by the two trained models, and the average of the values predicted by the RF model and the GBDT model is taken as the final result. The methodology was tested successfully on data collected from a large wind farm in northeast China.
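The two-stage procedure described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (a toy power curve with injected near-zero-output records), the two features and all hyperparameters are hypothetical, and the use of out-of-fold RF probabilities as the extra GBDT input is an assumption made here to limit label leakage, since the abstract does not say how the RF result is passed to the second stage.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

rng = np.random.default_rng(0)

# Hypothetical toy data: wind speed (m/s) and normalized power output.
n = 2000
speed = rng.uniform(0.0, 25.0, n)
curve = np.clip((speed - 3.0) / 9.0, 0.0, 1.0) ** 3  # toy curve, not a real turbine
power = np.clip(curve + rng.normal(0.0, 0.03, n), 0.0, 1.0)

# Inject synthetic abnormal records: near-zero output despite sufficient wind.
abnormal = (rng.random(n) < 0.15) & (speed > 8.0)
power[abnormal] = rng.uniform(0.0, 0.05, abnormal.sum())

X = np.column_stack([speed, power])
y = abnormal.astype(int)  # 1 = abnormal, 0 = normal
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Stage 1: RF trained on the labeled data. Out-of-fold probabilities
# (an assumption of this sketch) serve as the RF result for stage 2.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf_oof = cross_val_predict(rf, X_train, y_train, cv=5, method="predict_proba")[:, 1]
rf.fit(X_train, y_train)

# Stage 2: GBDT trained on the labeled data plus the RF result.
gbdt = GradientBoostingClassifier(random_state=0)
gbdt.fit(np.column_stack([X_train, rf_oof]), y_train)

# Prediction: score the test set with both models and average the scores.
rf_test = rf.predict_proba(X_test)[:, 1]
gbdt_test = gbdt.predict_proba(np.column_stack([X_test, rf_test]))[:, 1]
final_score = (rf_test + gbdt_test) / 2.0
final_label = (final_score >= 0.5).astype(int)  # 0.5 threshold is an assumption

accuracy = float((final_label == y_test).mean())
print(f"held-out accuracy on toy data: {accuracy:.3f}")
```

Records flagged with `final_label == 1` would then be filtered out before fitting a power curve or forecasting model. On real data, the features would come from the abnormal-feature extraction step and the labels from the operation state records, neither of which is reproduced here.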
