Feature selection based on mutual information

Machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) have in recent years proved effective at predicting reservoir properties when compared with traditional empirical methods. Despite this, such models suffer considerably in the face of uncertain data, which is a common characteristic of well log datasets. Sources of uncertainty in well log datasets include a missing scale, data interpretation problems, and measurement error. Feature selection aims at selecting a feature subset that is relevant to the property being predicted. In this paper, a feature selection method based on a mutual information criterion is proposed. The strength of the method lies in choosing the threshold for the typical greedy feedforward method of feature selection according to a statistically sound criterion. Experimental results indicate that the proposed method is capable of improving the performance of the machine learning models, both in prediction accuracy and in reduced training time.
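
As a rough illustration of the idea, the sketch below shows one way a greedy selection loop with a statistically chosen stopping threshold could look. It is not the paper's implementation: the abstract does not name the mutual information estimator or the statistical criterion, so the sketch assumes a simple histogram estimate, a permutation test at level `alpha` as the threshold, and an mRMR-style redundancy penalty when ranking candidates. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Plug-in histogram estimate of I(x; y) in nats for two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def permutation_threshold(x, y, n_perm=200, alpha=0.05, bins=8, rng=None):
    """(1 - alpha) quantile of the MI obtained after shuffling y (assumed criterion)."""
    rng = np.random.default_rng(rng)
    null = [mutual_information(x, rng.permutation(y), bins) for _ in range(n_perm)]
    return float(np.quantile(null, 1.0 - alpha))


def greedy_mi_selection(X, y, n_perm=200, alpha=0.05, bins=8, seed=0):
    """Add features one at a time; stop when the best candidate is not significant."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    relevance = [mutual_information(X[:, j], y, bins) for j in range(n_features)]
    selected, remaining = [], list(range(n_features))

    while remaining:
        def score(j):
            # Relevance to the target minus mean redundancy with chosen features.
            if not selected:
                return relevance[j]
            redundancy = np.mean(
                [mutual_information(X[:, j], X[:, s], bins) for s in selected]
            )
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        threshold = permutation_threshold(X[:, best], y, n_perm, alpha, bins, rng)
        if relevance[best] <= threshold:
            break  # best candidate is indistinguishable from chance: stop
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in for well log features: only columns 0 and 2 drive the target.
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(2000, 5))
y_demo = X_demo[:, 0] + X_demo[:, 2] + 0.1 * data_rng.normal(size=2000)
chosen = greedy_mi_selection(X_demo, y_demo)
print("selected feature indices:", chosen)
```

On this synthetic data the two informative columns should be picked first. Because the threshold is a permutation quantile, an irrelevant feature can still pass occasionally, at a rate governed by `alpha`. The histogram estimator is chosen here only for brevity; a kernel or nearest-neighbour estimator could be substituted without changing the loop.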
