An improved feature selection algorithm based on Markov blanket

For decades, coronary artery heart disease (CHD) has been one of the most serious threats to human health. Syndrome pattern mining is one of the approaches researchers have taken to combat this disease. Its central problem is to establish the correspondence between a syndrome and a subset of symptoms, which can be addressed with feature selection techniques. Feature selection is a critical step in classification, here used to assign symptoms to syndromes, and it can improve both the speed and the accuracy of classification. In this paper, we propose a novel feature selection algorithm based on the Markov blanket and information gain (MB-IGFS) for the symptom classification problem. In particular, we give a new and intuitive measure of conditional independence between features and class labels, which is more accurate and easier to calculate. For evaluation, experiments were conducted on the Breast Cancer Wisconsin (Diagnostic) Data Set. The results suggest that, compared with other feature selection methods, MB-IGFS is effective and efficient in eliminating irrelevant and redundant features. We then used MB-IGFS to obtain optimal symptom subsets for both the Solid ZHENG and Virtual ZHENG syndromes. We conclude that MB-IGFS appears to be an attractive solution for symptom classification applications.
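To make the notions of irrelevant and redundant features concrete, the following is a minimal sketch of textbook information gain and its conditional form on discrete toy data. It is not the MB-IGFS measure itself, which the abstract does not specify; the function names (`entropy`, `information_gain`, `conditional_information_gain`) and the toy data are assumptions introduced purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(C; F) = H(C) - H(C | F) for a discrete feature F and class C."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(C | F): entropy of the labels within each feature value, weighted by frequency.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def conditional_information_gain(feature, labels, given):
    """IG(C; F | G): information gain of F about C, averaged over the values of G.

    A value near zero suggests F adds nothing about C once G is known,
    which is the intuition behind treating F as redundant given G.
    """
    n = len(labels)
    groups = {}
    for x, y, g in zip(feature, labels, given):
        xs, ys = groups.setdefault(g, ([], []))
        xs.append(x)
        ys.append(y)
    return sum(len(ys) / n * information_gain(xs, ys) for xs, ys in groups.values())


# Hypothetical toy data: four samples, two classes.
labels = ["a", "a", "b", "b"]
informative = [1, 1, 0, 0]   # perfectly predicts the class
irrelevant = [1, 0, 1, 0]    # carries no information about the class
redundant = [1, 1, 0, 0]     # duplicate of `informative`

print(information_gain(informative, labels))
print(information_gain(irrelevant, labels))
print(conditional_information_gain(redundant, labels, informative))
```

In this sketch an irrelevant feature has zero information gain on its own, while a redundant feature has high information gain on its own but zero gain once conditioned on the feature it duplicates. A Markov blanket based selector aims to discard both kinds.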
