Research on Hybrid Feature Selection Method Based on Iterative Approximation Markov Blanket

Basic experimental data on traditional Chinese medicine materials are generally obtained by high-performance liquid chromatography and mass spectrometry. Such data are typically high-dimensional with few samples and contain many irrelevant and redundant features, which complicates in-depth exploration of the material-basis information of Chinese medicine. This paper proposes a hybrid feature selection method based on an iterative approximate Markov blanket (CI_AMB). The method first uses the maximal information coefficient (MIC) to measure the correlation between each feature and the target variable and filters out irrelevant features according to this criterion. An iterative approximate Markov blanket strategy then analyzes the redundancy among the remaining features, eliminates the redundant ones, and finally selects an effective feature subset. Comparative experiments on basic experimental data of traditional Chinese medicine materials and on several public UCI datasets show that, compared with Lasso, XGBoost, and the classic approximate Markov blanket method, the proposed method selects a smaller number of features with stronger explanatory power.
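
To make the two-stage procedure concrete, the sketch below illustrates it in Python: an MIC-based relevance filter followed by approximate-Markov-blanket redundancy removal. It is a minimal illustration, not the authors' exact CI_AMB algorithm: the `relevance_threshold` value, the Yu & Liu-style redundancy test, and the use of the `minepy` package as the MIC estimator are assumptions introduced here.

```python
import numpy as np
from minepy import MINE  # assumed third-party MIC estimator (pip install minepy)


def mic(x, y):
    """Maximal information coefficient between two 1-D arrays (via minepy)."""
    m = MINE(alpha=0.6, c=15)
    m.compute_score(x, y)
    return m.mic()


def ci_amb_sketch(X, y, relevance_threshold=0.1):
    """Illustrative hybrid selection: MIC relevance filter + AMB redundancy removal.

    X: (n_samples, n_features) array; y: (n_samples,) target.
    Returns the indices of the selected features.
    """
    n_features = X.shape[1]

    # Stage 1: relevance filter -- drop features whose MIC with the target
    # falls below the (assumed) threshold, i.e. the irrelevant features.
    relevance = np.array([mic(X[:, j], y) for j in range(n_features)])
    selected = [j for j in range(n_features) if relevance[j] >= relevance_threshold]

    # Stage 2: redundancy removal -- visit features from most to least relevant;
    # a weaker feature g is dropped when a stronger feature f is at least as
    # strongly related to g as g is to the target (a Yu & Liu style approximate
    # Markov blanket test, with MIC as the correlation measure).
    selected.sort(key=lambda j: relevance[j], reverse=True)
    for f in list(selected):
        if f not in selected:  # f may already have been removed as redundant
            continue
        for g in list(selected):
            if g == f or relevance[g] > relevance[f]:
                continue
            if mic(X[:, f], X[:, g]) >= relevance[g]:
                selected.remove(g)

    return sorted(selected)
```

The threshold and the MINE parameters (alpha, c) are placeholders that would need tuning on the high-dimensional, small-sample data the paper targets, and the strongest-first visiting order is only one plausible reading of the iterative elimination strategy.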
