Mode Estimation for High Dimensional Discrete Tree Graphical Models

This paper studies the following problem: given samples from a high-dimensional discrete distribution, we want to estimate the leading (δ, ρ)-modes of the underlying distribution. A point is defined to be a (δ, ρ)-mode if it is a local maximum of the density within a δ-neighborhood under metric ρ. As we increase the "scale" parameter δ, the neighborhood size increases and the total number of modes monotonically decreases. The sequence of (δ, ρ)-modes reveals intrinsic topographical information about the underlying distribution. Although the mode-finding problem is generally intractable in high dimensions, this paper shows that if the distribution can be well approximated by a tree graphical model, mode characterization becomes significantly easier. An efficient algorithm with provable theoretical guarantees is proposed and applied to data analysis and multiple-prediction tasks.
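To make the definition concrete, the sketch below enumerates (δ, ρ)-modes by brute force for a small discrete distribution. It assumes ρ is the Hamming distance and uses a hypothetical 4-variable binary tree model; the tree, its probabilities, and the helper names (`tree_prob`, `delta_modes`) are illustrative assumptions, and the exhaustive search is not the paper's algorithm. It also assumes there are no ties in probability, so a mode is taken to be a point strictly more probable than every other point in its δ-neighborhood.

```python
import itertools


def hamming(x, y):
    """Hamming distance: number of coordinates where x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def tree_prob(x, parent, root_p, cond):
    """P(x) for a directed tree model rooted at node 0 (illustrative).

    Assumes parent[j] < j for every j >= 1, root_p[v] = P(x_0 = v), and
    cond[j][u][v] = P(x_j = v | x_parent[j] = u).
    """
    p = root_p[x[0]]
    for j in range(1, len(x)):
        p *= cond[j][x[parent[j]]][x[j]]
    return p


def delta_modes(prob, delta, dist=hamming):
    """Brute-force (delta, rho)-modes: states strictly more probable than
    every other state within distance delta. The cost is quadratic in the
    number of states, i.e. exponential in the number of variables."""
    states = list(prob)
    return [
        x for x in states
        if all(prob[x] > prob[y] for y in states if y != x and dist(x, y) <= delta)
    ]


# Hypothetical 4-variable binary tree with edges 0-1, 1-2, 1-3.
parent = [None, 0, 1, 1]
root_p = [0.6, 0.4]
cond = [
    None,                      # node 0 is the root
    [[0.9, 0.1], [0.2, 0.8]],  # P(x1 | x0)
    [[0.7, 0.3], [0.1, 0.9]],  # P(x2 | x1)
    [[0.8, 0.2], [0.3, 0.7]],  # P(x3 | x1)
]

# Joint probability of every configuration, in itertools.product order.
prob = {
    x: tree_prob(x, parent, root_p, cond)
    for x in itertools.product([0, 1], repeat=4)
}

# Modes at every scale from delta = 0 up to the maximum Hamming distance.
modes_by_delta = {d: delta_modes(prob, d) for d in range(5)}

if __name__ == "__main__":
    for d, modes in modes_by_delta.items():
        print(d, len(modes), modes)
```

For these parameters the mode count falls from 16 at δ = 0 (every point is trivially a mode) to 2 for δ = 1, 2, 3, namely (0, 0, 0, 0) and (1, 1, 1, 1), and to 1 at δ = 4, with each set of modes contained in the previous one. Because the search compares every pair of states, its cost grows exponentially with the number of variables, which is the intractability the abstract refers to.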
