A New Equilibrium Criterion for Learning the Cardinality of Latent Variables

Mining high-dimensional datasets extracted from real-world problems is challenging because of the large feature space. Latent variables reduce the dimensionality of this space by summarizing groups of highly dependent features; they simplify the construction of probabilistic models and clarify the semantics of the inferred knowledge. Learning these variables for Bayesian networks, the most generic probabilistic models, is problematic: there is no direct way to determine their cardinalities, yet the precision of the inferred model depends heavily on how accurately each latent variable's cardinality is estimated. Choosing too small a value yields an over-generalized model with a high rate of information loss, while too high a cardinality tends to over-fit the data, produce overly complex latent variables, and burden the parameter learning of the probabilistic model. In this paper, we propose a new criterion based on mutual information and log-likelihood, called the equilibrium criterion. We validate its efficiency, both mathematically and experimentally, for estimating the cardinality of a latent variable, and we demonstrate its performance in finding the hidden cause of a set of observed variables. The experimental analysis shows that our method succeeds in restoring the original cardinality of intentionally deleted variables in known networks.
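The abstract does not give the criterion's closed form, but its two ingredients are standard. The sketch below is a rough Python illustration, not the authors' algorithm: for each candidate cardinality k it fits a naive latent class model (one hidden cause H with k states and conditionally independent observed children) by EM, computes the model log-likelihood and the total mutual information between H and its children, and picks the smallest k at which both quantities stop improving. The model choice, the helper names (fit_latent_class, total_mi, select_cardinality), and the levelling-off thresholds are all illustrative assumptions.

```python
# A minimal sketch of latent-variable cardinality selection combining the two
# ingredients named in the abstract (mutual information and log-likelihood).
# The decision rule below is an assumed, simplified reading of an
# "equilibrium" between fit and information, not the paper's exact criterion.
import numpy as np

rng = np.random.default_rng(0)


def fit_latent_class(X, k, n_states, n_iter=200, seed=0):
    """EM for a naive latent class model: P(H) and P(X_j | H), H with k states.

    X is an (n, m) integer array of categorical observations taking values in
    {0, ..., n_states - 1}. Returns (pi, theta, loglik), theta: (m, k, n_states).
    """
    n, m = X.shape
    local_rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)
    theta = local_rng.dirichlet(np.ones(n_states), size=(m, k))
    for _ in range(n_iter):
        # E-step: log responsibilities log P(H = h | x_n), up to normalization.
        log_resp = np.tile(np.log(pi), (n, 1))
        for j in range(m):
            log_resp += np.log(theta[j][:, X[:, j]]).T
        log_norm = np.logaddexp.reduce(log_resp, axis=1, keepdims=True)
        resp = np.exp(log_resp - log_norm)
        # M-step: re-estimate pi and theta, with light smoothing.
        pi = resp.mean(axis=0)
        for j in range(m):
            counts = np.stack([resp[X[:, j] == x].sum(axis=0)
                               for x in range(n_states)], axis=1)  # (k, n_states)
            theta[j] = (counts + 1e-3) / (counts + 1e-3).sum(axis=1, keepdims=True)
    return pi, theta, log_norm.sum()


def total_mi(pi, theta):
    """Sum over children of I(H; X_j), computed from the fitted parameters."""
    mi = 0.0
    for j in range(theta.shape[0]):
        joint = pi[:, None] * theta[j]            # P(H = h, X_j = x)
        indep = pi[:, None] * joint.sum(axis=0)   # P(H = h) * P(X_j = x)
        nz = joint > 1e-12
        mi += (joint[nz] * np.log(joint[nz] / indep[nz])).sum()
    return mi


def select_cardinality(X, n_states, k_max=6, tol_ll=1e-3, tol_mi=1e-2):
    """Return the smallest k at which both curves level off (assumed rule)."""
    lls, mis = [], []
    for k in range(1, k_max + 1):
        pi, theta, ll = fit_latent_class(X, k, n_states)
        lls.append(ll)
        mis.append(total_mi(pi, theta))
    for i in range(1, k_max):
        ll_gain = (lls[i] - lls[i - 1]) / abs(lls[0])  # relative LL gain
        if ll_gain < tol_ll and mis[i] - mis[i - 1] < tol_mi:
            return i  # gains from i -> i+1 are negligible; keep cardinality i
    return k_max


# Demo: 4 binary children generated from a true 3-state hidden cause.
true_pi = np.array([0.5, 0.3, 0.2])
true_theta = rng.dirichlet(np.ones(2) * 0.3, size=(4, 3))  # sharp CPDs
H = rng.choice(3, size=2000, p=true_pi)
X = np.stack([(rng.random(2000) < true_theta[j, H, 1]).astype(int)
              for j in range(4)], axis=1)
print("selected cardinality:", select_cardinality(X, n_states=2))
```

In practice one would run EM from several random restarts per candidate k, since a poor local optimum can depress both curves and bias the selection.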
