Improving prediction models applied in systems monitoring natural hazards and machinery

A method of combining three analytic techniques, namely regression rule induction, the k-nearest neighbors method, and time series forecasting by means of the ARIMA methodology, is presented. The main objective of combining these techniques was to decrease the forecasting error in problems concerning natural hazard and machinery monitoring in coal mines. The M5 algorithm was applied as the basic method of developing prediction models. Despite the intensive development of regression rule induction algorithms and fuzzy-neural systems, the M5 algorithm still offers generalization ability competitive with these systems and an unmatched model-building time. In the paper, two solutions designed to decrease the mean square error of the obtained rules are presented. The first introduces into the set of conditional variables a so-called meta-variable (by analogy with constructive induction) whose values are determined by an autoregressive or ARIMA model. The second shows that restricting the data set on which the M5 algorithm operates, by means of the k-nearest neighbors method, can also reduce the error. Moreover, three application examples of the presented solutions, using data collected by natural hazard and machinery monitoring systems in coal mines, are described. In the Appendix, results of analyses of several benchmark data sets are given to supplement the presented results.

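The following Python sketch is only a rough illustration of the two ideas summarized above, not the paper's implementation: scikit-learn's DecisionTreeRegressor stands in for the M5 model-tree learner (M5 is not available in scikit-learn), statsmodels' ARIMA supplies the meta-variable, and all data, column layouts, window sizes, and parameters are synthetic assumptions.

```python
# Minimal sketch, assuming a regression tree as a stand-in for M5 and
# synthetic sensor data; names and parameters are illustrative only.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)

# Synthetic monitoring history: target series y(t) plus two sensor channels.
n = 300
t = np.arange(n)
y = np.sin(t / 10.0) + 0.1 * rng.standard_normal(n)
X_sensors = np.column_stack([np.cos(t / 10.0), rng.standard_normal(n)])

# --- Idea 1: meta-variable from an ARIMA model -------------------------------
# Fit ARIMA on the training part of the target series and use its one-step
# predictions / forecasts as an extra conditional attribute (meta-variable).
split = 250
arima = ARIMA(y[:split], order=(2, 0, 1)).fit()
meta_train = arima.predict(start=1, end=split - 1)   # in-sample one-step predictions
meta_test = arima.forecast(steps=n - split)          # out-of-sample forecasts

X_train = np.column_stack([X_sensors[1:split], meta_train])
y_train = y[1:split]
X_test = np.column_stack([X_sensors[split:], meta_test])
y_test = y[split:]

# --- Idea 2: restrict the training set with k-NN before rule induction -------
# For each query example, induce the model only on its k nearest training examples.
k = 50
nn = NearestNeighbors(n_neighbors=k).fit(X_train)
preds = []
for x in X_test:
    _, idx = nn.kneighbors(x.reshape(1, -1))
    local_model = DecisionTreeRegressor(max_depth=4).fit(X_train[idx[0]], y_train[idx[0]])
    preds.append(local_model.predict(x.reshape(1, -1))[0])

rmse = np.sqrt(np.mean((np.array(preds) - y_test) ** 2))
print(f"local-model RMSE with ARIMA meta-variable: {rmse:.3f}")
```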