Markov Blanket Discovery in Positive-Unlabelled and Semi-supervised Data

Markov blanket discovery algorithms are important for two reasons: they are the main building block of constraint-based algorithms for learning Bayesian network structure, and they provide a way to derive the optimal feature set in filter feature selection. Learning from partially labelled data is likewise a crucial and demanding area of machine learning, and extending techniques from fully supervised to partially supervised scenarios is difficult. Many algorithms exist for deriving the Markov blanket of a fully supervised node, but the partially labelled problem is far harder, and the literature lacks principled approaches. Our work derives a generalization of conditional independence tests to partially labelled binary target variables, covering the two main partially labelled scenarios: positive-unlabelled and semi-supervised. The result is a significantly deeper understanding of how to control false negative errors in Markov blanket discovery procedures and of how unlabelled data can help.
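For orientation, the fully supervised building block mentioned above can be sketched in a few lines. This is a minimal sketch of the standard G-statistic for testing whether a discrete feature X is conditionally independent of a target Y given a conditioning variable Z. It is not the generalization derived in this work. The function name `g_statistic` and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def g_statistic(x, y, z):
    """Return (G, dof) for testing X independent of Y given Z on discrete samples.

    Illustrative helper for the fully labelled case only.
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)

    # Plug-in estimate of the conditional mutual information I(X;Y|Z) in nats.
    cmi = 0.0
    for (xi, yi, zi), count in n_xyz.items():
        ratio = count * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        cmi += (count / n) * math.log(ratio)

    # The G-statistic is 2N times the estimated conditional mutual information.
    g = 2.0 * n * cmi
    # Degrees of freedom of the asymptotic chi-squared null distribution.
    dof = (len(set(x)) - 1) * (len(set(y)) - 1) * len(set(z))
    return g, dof


# Toy data (assumed): X and Y are unrelated in the first case, identical in the second.
z = [0, 0, 0, 0]
g_indep, dof_indep = g_statistic([0, 0, 1, 1], [0, 1, 0, 1], z)
g_dep, dof_dep = g_statistic([0, 0, 1, 1], [0, 0, 1, 1], z)
```

Under the null hypothesis of conditional independence, G is asymptotically chi-squared distributed with the returned degrees of freedom, so a feature is kept in the candidate blanket when G exceeds the critical value at the chosen significance level. A false negative occurs when a truly dependent feature fails this test.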
