Trees and beyond: exploiting and improving tree-structured graphical models

Probabilistic models commonly assume that variables are independent of one another conditioned on a subset of other variables. Graphical models provide a powerful framework for encoding such conditional independence structure over a large collection of random variables. A special class of graphical models with significant theoretical and practical importance is the class of tree-structured graphical models. Tree models have several advantages: they can be learned efficiently from data, their structures are often intuitive, and inference in tree models is highly efficient. However, tree models make strong conditional independence assumptions, which substantially limit their modeling power. This thesis exploits the advantages of tree-structured graphical models and considers modifications that overcome their limitations.

To improve the modeling accuracy of tree models, we consider latent trees, in which some nodes represent the original (observed) variables of interest while others represent latent variables added during the learning procedure. The appeal of such models is clear: the additional latent variables significantly increase the modeling power, and inference on trees is scalable with or without latent variables. We propose two computationally efficient and statistically consistent algorithms for learning latent trees, and compare them to other methods through extensive numerical experiments on various latent tree models.

We then exploit the advantages of tree models in modeling the contextual information of an image. Object co-occurrences and spatial relationships can be important cues for recognizing and localizing object instances. We develop tree-based context models and demonstrate that their simplicity enables us to integrate many sources of contextual information efficiently. In addition to object recognition, we are interested in using context models to detect objects that are out of their normal context.
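One reason tree models "can be easily learned given data" is the classical Chow-Liu procedure: the maximum-likelihood tree over discrete variables is a maximum-weight spanning tree under pairwise empirical mutual information. The sketch below illustrates that idea only; the function name `chow_liu_tree` and its interface are ours, not from the thesis, and the thesis's own latent-tree algorithms are more involved.

```python
import numpy as np
from itertools import combinations

def chow_liu_tree(samples):
    """Learn a maximum-likelihood tree over discrete variables (Chow-Liu sketch).

    samples: (n_samples, n_vars) integer array.
    Returns a list of edges (i, j) forming a maximum-weight spanning tree
    under pairwise empirical mutual information.
    """
    n, d = samples.shape

    def mutual_info(x, y):
        # Empirical mutual information between two discrete columns.
        mi = 0.0
        for a in np.unique(x):
            for b in np.unique(y):
                pxy = np.mean((x == a) & (y == b))
                px, py = np.mean(x == a), np.mean(y == b)
                if pxy > 0:
                    mi += pxy * np.log(pxy / (px * py))
        return mi

    weights = {(i, j): mutual_info(samples[:, i], samples[:, j])
               for i, j in combinations(range(d), 2)}

    # Kruskal's algorithm on edges sorted by decreasing weight
    # gives the maximum-weight spanning tree.
    parent = list(range(d))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for (i, j) in sorted(weights, key=weights.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
        if len(edges) == d - 1:
            break
    return edges
```

The quadratic pass over variable pairs dominates the cost, which is what makes fully observed tree learning so cheap relative to general structure learning.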
This task requires precise and careful modeling of object relationships, so we use a latent tree for object co-occurrences. Many of the latent variables can be interpreted as scene categories, capturing higher-order dependencies among object categories.

Tree-structured graphical models have been widely used in multi-resolution (MR) modeling. In the last part of the thesis, we move beyond trees and propose a new modeling framework that allows additional dependency structure at each scale of an MR tree model. We focus mainly on MR models with jointly Gaussian variables, and assume that variables at each scale have sparse covariance structure (as opposed to the fully uncorrelated structure of MR trees) conditioned on variables at other scales. We develop efficient inference algorithms that are based partly on inference on the embedded MR tree and partly on local filtering at each scale. In addition, we present methods for learning such models given data at the finest scale by formulating a convex optimization problem. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
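The flavor of inference that alternates between a tractable embedded structure and corrections for the remaining dependencies can be illustrated, in spirit, as a matrix-splitting iteration for the Gaussian information form J x = h (cf. the embedded-trees idea). The sketch below is a simplified illustration under our own assumptions, not the thesis's algorithm: `tree_mask` selects the tractable (e.g. embedded-tree) entries of J, and the dense solve stands in for what would, in practice, be linear-time tree inference.

```python
import numpy as np

def embedded_splitting_solve(J, h, tree_mask, n_iter=200):
    """Richardson-style iteration for Gaussian inference J x = h.

    Splits the information matrix J into a tractable part J_T
    (entries selected by the boolean tree_mask) and a remainder K,
    then iterates x <- J_T^{-1} (h - K x).  Converges when the
    splitting is contractive, i.e. rho(J_T^{-1} K) < 1.
    """
    J_T = np.where(tree_mask, J, 0.0)   # tractable (tree) part
    K = J - J_T                          # off-tree remainder
    x = np.zeros_like(h)
    for _ in range(n_iter):
        # A real implementation would exploit the tree structure of J_T
        # to make this solve linear-time; a dense solve suffices here.
        x = np.linalg.solve(J_T, h - K @ x)
    return x
```

At a fixed point, J_T x = h - K x, i.e. J x = h, so the iteration (when it converges) recovers the exact posterior means.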
