Massively parallel feature extraction framework application in predicting dangerous seismic events

In this paper, we introduce an automated mechanism for knowledge discovery from data streams. As part of this work, we present a new approach to building classifier ensembles from a wide variety of models. We also describe an innovative, highly scalable feature extraction and selection framework designed for the MapReduce programming model, and we apply this framework to build an ensemble of classifiers that takes into account both the quality and the diversity of the individual models. The effectiveness of the solution was verified through participation in an open data mining competition on predicting periods of increased seismic activity that cause life-threatening accidents in coal mines. The submitted solution obtained the highest AUC score among all solutions uploaded by the 106 participating research teams.
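To make the idea of weighing both quality and diversity concrete, the following is a minimal illustrative sketch and not the procedure used in the paper. It assumes that quality is measured by AUC on held-out labels, that diversity is one minus the mean absolute Pearson correlation with the models already selected, and that the two are combined with a hypothetical `diversity_weight`. The function names (`auc`, `pearson`, `select_ensemble`) are likewise introduced here for illustration only.

```python
from itertools import product
from statistics import mean


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def pearson(a, b):
    """Pearson correlation of two equally long score vectors (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / (va * vb) ** 0.5


def select_ensemble(labels, predictions, size, diversity_weight=0.5):
    """Greedy selection (assumed rule): seed with the best-AUC model, then repeatedly
    add the model maximising AUC + diversity_weight * (1 - mean |corr| with chosen)."""
    quality = {name: auc(labels, p) for name, p in predictions.items()}
    chosen = [max(quality, key=quality.get)]
    while len(chosen) < min(size, len(predictions)):
        def gain(name):
            diversity = 1 - mean(abs(pearson(predictions[name], predictions[c]))
                                 for c in chosen)
            return quality[name] + diversity_weight * diversity
        remaining = [n for n in predictions if n not in chosen]
        chosen.append(max(remaining, key=gain))
    return chosen


def average_scores(predictions, chosen):
    """Combine the selected models by averaging their scores per example."""
    return [mean(vals) for vals in zip(*(predictions[c] for c in chosen))]
```

Under this assumed rule, a near-duplicate of an already selected model contributes almost no diversity, so a somewhat weaker but less correlated model can be preferred. Setting `diversity_weight=0` reduces the selection to ranking by AUC alone.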
