Automatic Detection and Correction of Multi-class Classification Errors Using System Whole-part Relationships

Real-world dynamic systems such as physical and atmosphere-ocean systems often exhibit a hierarchical system-subsystem structure. However, the paradigm of making this hierarchical/modular structure, and the rich properties it encodes, a “first-class citizen” of machine learning algorithms is largely absent from the literature. Furthermore, traditional data mining approaches focus on designing new classifiers or ensembles of classifiers, while few studies address detecting and correcting the prediction errors of existing forecasting (or classification) algorithms. In this paper, we propose DETECTOR, a hierarchical method for detecting and correcting forecast errors by exploiting the whole-part relationships between the target system and non-target systems. Experimental results show that DETECTOR can successfully detect and correct forecasting errors made by state-of-the-art classifier ensemble techniques and traditional single-classifier methods at an average rate of 22%, corresponding to an 11% average increase in forecasting accuracy, in seasonal forecasting of hurricanes and landfalling hurricanes in the North Atlantic and of North African rainfall.
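
To make the idea of using a whole-part relationship for error detection concrete, the following is a toy sketch under stated assumptions; it is not DETECTOR, whose actual procedure is not described here. It assumes a non-target "whole" (e.g., the seasonal class of all North Atlantic hurricanes) and a target "part" (e.g., the class of landfalling hurricanes), both labeled with hypothetical classes such as "low", "normal", and "high". Historical co-occurrences of (whole, part) classes are counted; a target prediction whose pairing with the predicted whole class was seen fewer than `min_count` times is flagged as a likely error and replaced by the part class most often observed with that whole class. The function names `fit_joint` and `detect_and_correct` and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def fit_joint(whole_labels, part_labels):
    """Count how often each (whole class, part class) pair co-occurs historically.

    Hypothetical helper: 'whole' is a non-target system, 'part' is the target.
    """
    joint = defaultdict(Counter)
    for whole, part in zip(whole_labels, part_labels):
        joint[whole][part] += 1
    return joint


def detect_and_correct(joint, whole_pred, part_pred, min_count=1):
    """Return (possibly corrected part class, whether a correction was made).

    A part prediction is treated as suspect when its pairing with the predicted
    whole class occurred fewer than `min_count` times in the historical data.
    This co-occurrence rule is an illustrative assumption, not DETECTOR itself.
    """
    seen = joint.get(whole_pred)  # .get avoids creating an empty entry
    if not seen:
        # No historical evidence for this whole class: leave the prediction alone.
        return part_pred, False
    if seen[part_pred] >= min_count:
        # The (whole, part) combination is historically plausible.
        return part_pred, False
    # Implausible combination: fall back to the most frequent part class
    # observed together with the predicted whole class.
    return seen.most_common(1)[0][0], True
```

For example, if past seasons with a "low" hurricane class never had a "high" landfalling class, a forecast pairing those two would be flagged and the landfalling class corrected to the one most often seen in "low" seasons. A frequency lookup is used here only because it is the simplest way to encode a whole-part constraint; a real system could instead use a learned conditional model.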
