Classification and concentration prediction of combustible gas based on BPNN and PCA

Detection of combustible gases is important for reducing human mortality and disability in both civil and military settings. In this paper, a method for detecting the combustible gases acetone and ethanol is proposed, based on a back propagation neural network (BPNN) and principal component analysis (PCA). The gas data were collected using an array of metal oxide semiconductor (MOS) gas sensors exposed to combustible gas mixtures of different concentrations. Low- and high-frequency-domain features were extracted to build a 432-dimensional feature vector. PCA was then used to reduce the feature vector from 432 to 11 dimensions while retaining 99% of the information. The results show that the binary classification accuracy of the BPNN reaches 100% on the training, validation and test sets when distinguishing combustible gas from air. For concentration prediction based on BPNN and PCA, the mean and variance of the error were 0.004±0.008. These results demonstrate that the proposed method is effective for classification and concentration prediction of combustible gas.

Introduction

Recently, the production and application of combustible gases have been increasing with industrialization. The detection of combustible gases is very important for avoiding gas leakage and serious accidents [1-3]. In particular, the quantitative analysis and classification of combustible gas mixtures is an active research topic [4-7]. Artificial neural networks have proved to be a powerful tool for concentration estimation [5, 7-9]. Zhao et al. applied the back propagation (BP) method and a radial basis function (RBF) neural network to the data analysis of metal oxide gas sensors and arrays, and obtained good concentration prediction accuracy [7]. Zhang et al. studied concentration estimation of indoor contaminants for air quality monitoring in dwellings using chaos-based optimization of a BPNN, and integrated it into a self-designed portable E-nose instrument [8].
Zhang et al. studied the concentration estimation of multiple kinds of chemicals using a multilayer perceptron (MLP) neural network and achieved good accuracy and convergence [9]. Ziyatdinov et al. proposed a chemical sensing system based on an array of 16 metal-oxide gas sensors and used PCA to analyze the differences in the first three principal components [5]. Although many methods based on artificial neural networks have been proposed, few studies address binary classification and concentration prediction of combustible gases by combining BPNN and PCA, especially for high-dimensional feature vectors such as those in Ref. 5. In this work, binary classification and concentration prediction of combustible gas were studied using BPNN and PCA. The paper is organized as follows: Section 2 describes the materials and methods, including data collection, feature extraction and an introduction to BPNN; Section 3 presents the results and discussion on classification between combustible gas and pure air by BPNN and on concentration prediction of combustible gas by BPNN and PCA; Section 4 presents the conclusions.

2nd Workshop on Advanced Research and Technology in Industry Applications (WARTIA 2016). © 2016. The authors. Published by Atlantis Press.

Materials and methods

Data collection: The measured data were collected using a chemical sensing system based on an array of 16 metal-oxide gas sensors and an external mechanical ventilator that simulates the biological respiration cycle. The 12 tested gas classes formed a relatively broad combination of two analytes, acetone and ethanol, in binary mixtures. Three concentration doses, 0.1, 0.3 and 1 vol.%, were used to prepare dilutions of the pure analytes in water. The same dilutions were used to generate the gas mixtures.
The gas classes included samples of pure ethanol ('lab' attribute eth-0.1, eth-0.3 and eth-1), samples of pure acetone (ace-0.1, ace-0.3 and ace-1), samples of binary mixtures of ethanol and acetone (ace-0.1-eth-0.1, ace-0.1-eth-0.3, ace-0.3-eth-0.1, ace-0.1-eth-1 and ace-1-eth-0.1) and samples of water dilutions without any analyte (air), giving a total of 12 classes. The choice of these analytes and concentrations was not dictated by any particular application constraint, except that the selected sensor models show consistent and diverse responses among the gas classes. The raw data of each sample contain 16 time-series (one per sensor). Each time-series was recorded for 5 min at a sampling rate of 25 Hz (samples per second), giving 7500 data points per time-series. The total number of attributes per sample in the raw data is therefore 16 × 7500 = 120000. More details about the experiment and data collection are available in Ref. 5 and at http://archive.ics.uci.edu/ml/datasets.html.

Feature extraction: The raw signals were pre-processed with a median filter and then passed through two 3rd-order Butterworth filters, a low-pass filter (cut-off frequency 0.01 Hz) and a high-pass filter (pass-frequency 0.07 Hz), to generate the low- and high-frequency signals, respectively. The filtered signals were then divided into segments, one per respiratory cycle. The amplitudes of the high- and low-frequency signals in each cycle were taken as two features. A cycle-independent feature per measurement was also introduced, defined as the maximum of the low-frequency signal over the course of the measurement. Further details of the feature extraction are given in Ref. 5. The feature data set includes three types of features extracted from each time-series. Each time-series (one per sensor) is associated with 1 maximum feature, 13 high-frequency features and 13 low-frequency features (the latter two correspond to the first 13 respiration cycles).
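The per-sensor feature layout described above can be illustrated with a minimal sketch. It assumes that the low- and high-frequency signals have already been filtered, that every respiration cycle has the same fixed length (`cycle_len`, a hypothetical parameter), and that "amplitude" means the peak-to-peak value within a cycle. The function names, the feature ordering and the synthetic signals are illustrative only and are not taken from Ref. 5.

```python
import math

N_SENSORS = 16        # sensors in the array (from the text)
N_CYCLES = 13         # respiration cycles kept per time-series (from the text)
SAMPLE_RATE_HZ = 25   # sampling rate (from the text)
DURATION_S = 5 * 60   # 5 min recording (from the text)


def peak_to_peak(segment):
    """Amplitude of one cycle, ASSUMED here to be max minus min."""
    return max(segment) - min(segment)


def sensor_features(low, high, cycle_len):
    """Return 1 + 13 + 13 = 27 features for one sensor.

    low, high : already-filtered low-/high-frequency signals (lists of floats)
    cycle_len : samples per respiration cycle (hypothetical fixed length)
    """
    # Cycle-independent feature: maximum of the low-frequency signal.
    feats = [max(low)]
    # Per-cycle amplitudes: high-frequency first, then low-frequency
    # (this ordering is an assumption).
    for signal in (high, low):
        for c in range(N_CYCLES):
            seg = signal[c * cycle_len:(c + 1) * cycle_len]
            feats.append(peak_to_peak(seg))
    return feats


def sample_features(lows, highs, cycle_len):
    """Concatenate the per-sensor features of all sensors into one vector."""
    vec = []
    for low, high in zip(lows, highs):
        vec.extend(sensor_features(low, high, cycle_len))
    return vec


# Synthetic stand-in signals; real data would come from the filtered sensor array.
n_points = SAMPLE_RATE_HZ * DURATION_S   # 7500 points per time-series
cycle_len = 500                          # hypothetical: 500 samples per cycle
t = [i / SAMPLE_RATE_HZ for i in range(n_points)]
low = [1.0 + 0.001 * x for x in t]                          # slow drift
high = [0.1 * math.sin(2 * math.pi * x / 20.0) for x in t]  # periodic part

vec = sample_features([low] * N_SENSORS, [high] * N_SENSORS, cycle_len)
print(N_SENSORS * n_points)  # 120000 raw attributes per sample
print(len(vec))              # 432 features per sample
```

With 16 sensors and 27 features per sensor, the sketch reproduces the stated sizes of 120000 raw attributes and 432 features per sample; real recordings would need cycle boundaries taken from the ventilator rather than a fixed `cycle_len`.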
The total number of attributes per sample in the feature data set is 16 × 27 = 432.

Back Propagation Neural Network (BPNN): BPNN is a kind of neural network commonly used in prediction, pattern classification, data mining, etc., without requiring prior knowledge about the problem. Figure 1 shows a three-layer BPNN topology, which includes an input layer, a hidden layer and an output layer. The input layer is the feature layer, into which the feature vectors are fed one by one. The hidden layer is linked to the input and output layers by weights and performs a computation through a transfer function, such as a pure linear function or a sigmoid function. The output layer is the target layer, which outputs the prediction results of the BPNN model.

Fig. 1. Three-layer BPNN topology

To improve the prediction accuracy of the BPNN model, the network must be trained. The ANN is trained with a set of input and known output pairs called the training set. At the beginning of the training process, the network weights are initialized with the data provided from the laboratory. An error back propagation algorithm is then used to adjust the weights until the prediction error is acceptable. The error back propagation algorithm is as follows.

Step 1: Initialize the weights and thresholds, and specify the error tolerance, maximum number of iterations and other parameters.
Step 2: Randomly select the kth sample and input it into the neural network. Feature vector and target