A Hybrid HSIC-ACO Algorithm for Variable Selection in Process Engineering

Data mining and machine learning techniques are increasingly applied in process engineering. Successful applications include fault detection and the development of data-driven models. While fault detection supports steady operation of the plant, data-driven models can be employed for robust prediction of structure-activity relationships. Many of these models require nonlinear classification techniques, whose success relies on integrating informative domain knowledge into the methods concerned. In this study, we propose a hybrid Ant Colony Optimization (ACO) based variable selection approach, used in conjunction with Support Vector Machines (SVM), to determine informative subsets of process variables that may help detect faults efficiently and make the fault detection model more robust. In addition, we employ a Hilbert-Schmidt Independence Criterion (HSIC) based variable ranking heuristic to guide ACO towards more promising regions of the search space. HSIC-ACO was tested on the benchmark Tennessee Eastman Process challenge and on large-scale QSAR prediction data collected from relevant sources. Our results demonstrate improved fault detection and structure-activity prediction capabilities using the HSIC-ACO algorithm.
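To make the HSIC-based ranking heuristic concrete, the following is a minimal sketch and not the implementation used in this study. It scores each variable with the standard biased empirical HSIC estimate, tr(KHLH)/(n-1)^2, and then turns the scores into ACO selection probabilities. The Gaussian kernel on each variable, the delta kernel on class labels, the bandwidth `sigma`, the exponents `alpha` and `beta`, and all function names are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for an (n, d) array (kernel choice is an assumption)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def rank_variables_by_hsic(X, y, sigma=1.0):
    """Score each column of X by its HSIC dependence on the class labels y."""
    y = np.asarray(y)
    # Delta kernel on labels: 1 if two samples share a class, else 0.
    L = (y[:, None] == y[None, :]).astype(float)
    scores = np.array(
        [hsic(rbf_kernel(X[:, [j]], sigma), L) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)  # most dependent variable first
    return order, scores


def selection_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Standard ACO random-proportional rule with HSIC scores as the heuristic.

    Using HSIC as the heuristic term in this exact form is an assumption.
    """
    eta = np.maximum(heuristic, 1e-12)  # guard against zero scores
    weights = (pheromone ** alpha) * (eta ** beta)
    return weights / weights.sum()
```

Under these assumptions, an ant would sample variables for its candidate subset according to `selection_probabilities`, and each subset would then be evaluated by an SVM classifier to update the pheromone values.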
