Learning Bayesian Networks from Ordinal Data

Bayesian networks are a powerful framework for studying the dependency structure of variables in a complex system. How a Bayesian network is learned depends closely on the type of data at hand. Ordinal data, such as stages of cancer, rating-scale survey questions, and letter grades for exams, are ubiquitous in applied research, yet existing structure learning methods mainly target continuous and nominal data. In this work, we propose an iterative score-and-search method, the Ordinal Structural EM (OSEM) algorithm, for learning Bayesian networks from ordinal data. Unlike traditional approaches designed for nominal data, OSEM explicitly respects the ordering among the categories. More precisely, we assume that the ordinal variables arise from marginally discretizing a set of latent Gaussian variables whose dependence structure follows a directed acyclic graph. We then adopt the Structural EM algorithm and derive closed-form scoring functions for efficient graph search. Through simulation studies, we show that the OSEM algorithm outperforms the alternatives considered, and we analyze the factors that may influence learning accuracy. Finally, we demonstrate the practicality of our method with a real-world application to psychological survey data from 408 patients with co-morbid symptoms of obsessive-compulsive disorder and depression.
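The generative assumption stated above (ordinal variables obtained by marginally discretizing latent Gaussian variables that follow a directed acyclic graph) can be illustrated with a minimal simulation sketch. This is not the OSEM algorithm itself, only the data model it assumes. The function names, the edge weights, the unit noise variance, and the threshold values are all hypothetical choices made for illustration.

```python
import numpy as np


def sample_latent_gaussian_dag(weights, n, rng):
    """Draw n samples from a linear Gaussian DAG.

    weights[i, j] != 0 encodes an edge i -> j. The nodes are assumed to be
    listed in topological order, so weights is strictly upper triangular.
    Unit-variance noise is an illustrative assumption.
    """
    d = weights.shape[0]
    z = np.zeros((n, d))
    for j in range(d):
        # Each node is a linear function of its parents plus Gaussian noise.
        z[:, j] = z[:, :j] @ weights[:j, j] + rng.standard_normal(n)
    return z


def discretize(z, thresholds):
    """Cut each latent column at its own thresholds.

    A variable with K - 1 thresholds yields ordinal levels 0, ..., K - 1,
    and larger latent values never map to a lower level.
    """
    columns = [
        np.searchsorted(np.sort(np.asarray(t, dtype=float)), z[:, j])
        for j, t in enumerate(thresholds)
    ]
    return np.column_stack(columns)


# Hypothetical three-node chain 0 -> 1 -> 2 in the latent space.
weights = np.array([
    [0.0, 0.8, 0.0],
    [0.0, 0.0, -0.6],
    [0.0, 0.0, 0.0],
])
# Hypothetical cut points: 3, 2, and 4 ordinal levels respectively.
thresholds = [[-0.5, 0.5], [0.0], [-1.0, 0.0, 1.0]]

rng = np.random.default_rng(0)
z = sample_latent_gaussian_dag(weights, n=500, rng=rng)
x = discretize(z, thresholds)
```

Here `x` plays the role of the observed ordinal data set, while `z` stays hidden. A structure learning method in this setting sees only `x` and has to recover the graph encoded in `weights`.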
