Explanation-aware feature selection using symbolic time series abstraction: Approaches and experiences in a petro-chemical production context

Explanation-aware methods are crucial for supporting the interpretation, assessment, and application of data mining models. This paper presents an approach for explanation-aware feature selection and assessment using symbolic abstractions of time series. To this end, we use the Symbolic Aggregate approXimation (SAX) method to abstract the time series data before it is used in data mining models. We investigate several approaches and discuss our experiences in the context of petro-chemical production.
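To make the abstraction step concrete, the following is a minimal sketch of the generic SAX procedure (z-normalization, piecewise aggregate approximation, then discretization against equiprobable Gaussian breakpoints). It is not the implementation used in the paper: the function name `sax`, its parameters, and the default four-letter alphabet are assumptions chosen for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Return a SAX word for a numeric series (illustrative sketch only)."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # Step 1: z-normalize so the values are comparable to a standard normal.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        z = [0.0] * n  # a constant series carries no shape information
    else:
        z = [(x - mu) / sigma for x in series]

    # Step 2: piecewise aggregate approximation (PAA), i.e. the mean of
    # each of n_segments contiguous, nearly equal-sized frames.
    bounds = [(i * n) // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # Step 3: breakpoints that cut the standard normal distribution into
    # len(alphabet) equiprobable regions; each PAA value maps to one symbol.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


if __name__ == "__main__":
    # A steadily rising signal maps to an ascending word.
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4))
```

For the rising example series, the sketch yields the word `abcd`. Each symbol corresponds to a readable value range (here, roughly "low" through "high"), which is the kind of abstraction that can make features easier to explain to domain experts.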
