Paper information - Exact Inference Algorithms and Their Optimization in Bayesian Clustering

Exact Inference Algorithms and Their Optimization in Bayesian Clustering

Clustering is a central task in computational statistics. Its aim is to divide observed data into groups of items, based on the similarity of their features. Among various approaches to clustering, Bayesian model-based clustering has recently gained popularity. Many existing works are based on stochastic sampling methods. This work is concerned with exact, exponential-time algorithms for the Bayesian model-based clustering task. In particular, we consider the exact computation of two summary statistics: the number of clusters, and the pairwise incidence of items in the same cluster. We present an implemented algorithm that computes these statistics substantially faster than direct enumeration of the possible partitions would. The method is practically applicable to data sets of up to approximately 25 items. We apply a variant of the exact inference method to graphical models where a given variable may have up to four parent variables. The parent variables can then have up to 16 value combinations, and the task is to cluster these combinations and find those that lead to similar conditional probability tables. Further contributions of this work are related to number theory. We show that a novel combination of addition chains and additive bases provides the optimal arrangement of multiplications when the task is to use repeated multiplication starting from a given number or entity, but only a certain kind of function of the successive powers is required. This arrangement speeds up the computation of the posterior distribution for the number of clusters. The same arrangement method can be applied to other multiplicative tasks, for example, in matrix multiplication. We also present new algorithmic results related to finding extremal additive bases. Before this work, the extremal additive bases were known up to length 23. We have computed them up to length 24 in the unrestricted case, and up to length 41 in the restricted case.
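To give a sense of why direct enumeration of partitions becomes impractical at this scale, the following minimal sketch counts the set partitions of n items (the Bell number) using the Bell triangle. It is an illustration only and is not the algorithm presented in the work; the function name `bell_number` is a hypothetical helper introduced here.

```python
def bell_number(n: int) -> int:
    """Return the number of ways to partition a set of n items.

    Uses the Bell triangle: each new row starts with the last entry of
    the previous row, and every further entry is the sum of its left
    neighbour and the entry above that neighbour.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    row = [1]  # row 0 of the Bell triangle; its first entry is B(0)
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]  # the first entry of row n is B(n)


if __name__ == "__main__":
    # Illustrative sizes: 16 matches the parent value combinations and
    # 25 matches the approximate data set limit named in the abstract.
    for size in (5, 10, 16, 25):
        print(size, bell_number(size))
```

For 25 items the count is roughly 4.6 × 10^18 partitions, which suggests why an exact method needs to avoid visiting each partition individually.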

Jukka Kohonen

[1] Svein Mossige. Algorithms for computing the $h$-range of the postage stamp problem, 1981.

[2] Samuel S. Wagstaff, et al. Additive h-bases for n, 1979.

[3] A new upper bound for finite additive bases, 2005, math/0503241.

[4] P. Green, et al. Bayesian Model-Based Clustering Procedures, 2007.

[5] John P. Robinson, et al. Some Extremal Postage Stamp Bases, 2010.

[6] Jukka Corander, et al. BAPS 2: enhanced possibilities for the analysis of genetic population structure, 2004, Bioinform.

[7] Jukka Corander, et al. Addition Chains Meet Postage Stamps: Reducing the Number of Multiplications, 2013, J. Integer Seq.

[8] A. Stöhr, et al. Gelöste und ungelöste Fragen über Basen der natürlichen Zahlenreihe. I., 1955.

[9] Jeffrey A. Barnett, et al. A Postage Stamp Problem, 1980.

[10] Arnulf Von Mrose. Untere Schranken für die Reichweiten von Extremalbasen fester Ordnung, 1979.

[11] Hans Rohrbach. Ein Beitrag zur additiven Zahlentheorie, 1937.