Feature selection for Bayesian network classifiers using the MDL-FS score

When constructing a Bayesian network classifier from data, more or less redundant features included in a dataset may bias the classifier and, as a consequence, result in relatively poor classification accuracy. In this paper, we study the problem of selecting appropriate subsets of features for such classifiers. To this end, we propose a new definition of the concept of redundancy in noisy data. For comparing alternative classifiers, we use the Minimum Description Length for Feature Selection (MDL-FS) function that we introduced previously. Our function differs from the well-known MDL function in that it captures a classifier's conditional log-likelihood. We show that the MDL-FS function serves to identify redundancy at different levels and is able to eliminate redundant features from different types of classifiers. We support our theoretical findings by comparing the feature-selection behaviour of the various functions in a practical setting. Our results indicate that the MDL-FS function is better suited to the task of feature selection than MDL, as it often yields classifiers of equal or better performance with significantly fewer attributes.
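The distinction the abstract draws between MDL and MDL-FS can be illustrated with a minimal sketch: both scores combine a parameter-count penalty with a fit term, but MDL fits the joint log-likelihood log P(x, c) while a conditional score of the MDL-FS kind fits the conditional log-likelihood log P(c | x). The sketch below assumes a naive Bayes classifier over binary features with Laplace smoothing; all function names and the toy dataset are illustrative, not taken from the paper.

```python
import math
from collections import Counter

# Toy binary dataset: rows are (x1, x2, class). x2 duplicates x1,
# so it is redundant given x1.
data = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 1), (0, 0, 1), (1, 1, 0)]

def nb_params(rows):
    """Naive Bayes parameters with Laplace smoothing (binary features)."""
    class_counts = Counter(r[-1] for r in rows)
    n = len(rows)
    prior = {c: class_counts[c] / n for c in class_counts}
    n_feats = len(rows[0]) - 1
    # cond[c][i][v] = P(X_i = v | C = c)
    cond = {}
    for c in class_counts:
        cond[c] = []
        for i in range(n_feats):
            cnt = Counter(r[i] for r in rows if r[-1] == c)
            total = class_counts[c] + 2  # Laplace smoothing, binary feature
            cond[c].append({v: (cnt[v] + 1) / total for v in (0, 1)})
    return prior, cond

def log_lik(rows, prior, cond):
    """Joint log-likelihood sum over log P(x, c) -- the MDL fit term."""
    ll = 0.0
    for *x, c in rows:
        ll += math.log(prior[c])
        for i, v in enumerate(x):
            ll += math.log(cond[c][i][v])
    return ll

def cond_log_lik(rows, prior, cond):
    """Conditional log-likelihood sum over log P(c | x) -- the MDL-FS fit term."""
    cll = 0.0
    for *x, c in rows:
        joint = {k: prior[k] * math.prod(cond[k][i][v] for i, v in enumerate(x))
                 for k in prior}
        cll += math.log(joint[c] / sum(joint.values()))
    return cll

def score(rows, conditional=False):
    """Penalty (in bits) minus fit (in bits); lower is better."""
    prior, cond = nb_params(rows)
    n = len(rows)
    n_feats = len(rows[0]) - 1
    k = 1 + 2 * n_feats  # free parameters: 1 prior + 1 per (feature, class)
    penalty = 0.5 * k * math.log2(n)
    fit = cond_log_lik(rows, prior, cond) if conditional else log_lik(rows, prior, cond)
    return penalty - fit / math.log(2)

mdl = score(data, conditional=False)
mdl_fs = score(data, conditional=True)
print(f"MDL-style score    = {mdl:.2f} bits")
print(f"MDL-FS-style score = {mdl_fs:.2f} bits")
```

Because log P(x, c) = log P(c | x) + log P(x) and log P(x) is never positive, the conditional fit term is always at least as large as the joint one; the conditional score therefore spends its "budget" only on how well the network discriminates the class, which is why it can flag features that merely model the feature distribution as redundant.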
