Probabilistic document model for automated document composition

We present a new paradigm for automated document composition based on a unified, generative probabilistic document model (PDM). The model formally incorporates key design variables such as content pagination, the relative arrangement of page elements, and possible page edits. These design choices are modeled jointly as coupled random variables (a Bayesian network), with uncertainty expressed through their probability distributions. The joint probability distribution of the network assigns higher probability to good design choices. Given this model, we show that the general document layout problem reduces to probabilistic inference over the Bayesian network. We show that the inference task can be accomplished efficiently, scaling linearly with the content in the best case. We provide a useful specialization of the general model and use it to illustrate the advantages of soft probabilistic encodings over hard one-way constraints in specifying design aesthetics.
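
To make the idea of layout as inference concrete, the following is a minimal sketch and not the PDM itself. It assumes a much simpler setting than the abstract describes: content is a sequence of blocks with known heights, the only design variable is where pages break, and each page is scored by an unnormalized Gaussian preference for being filled close to a target. The names `log_page_score` and `map_pagination` and the parameters `target`, `sigma`, and `max_items` are hypothetical, chosen for illustration only. Under these assumptions, the most probable pagination can be found by dynamic programming over a chain of page-break variables.

```python
import math


def log_page_score(fill, target=1.0, sigma=0.15):
    """Log of an unnormalized Gaussian preference for a page fill near `target`.

    This is a soft encoding (an assumption of this sketch): an overfull or
    underfull page is penalized smoothly instead of being forbidden.
    """
    return -((fill - target) ** 2) / (2.0 * sigma ** 2)


def map_pagination(heights, page_height=1.0, max_items=None):
    """Find the highest-scoring split of `heights` into consecutive pages.

    Returns (best_log_score, pages), where pages is a list of (start, end)
    half-open index ranges into `heights`.
    """
    n = len(heights)
    best = [-math.inf] * (n + 1)  # best[i]: best log score for the first i blocks
    back = [0] * (n + 1)          # back[i]: start index of the last page in that solution
    best[0] = 0.0

    for end in range(1, n + 1):
        total = 0.0
        # Try every candidate last page heights[start:end], growing it backwards.
        for start in range(end - 1, -1, -1):
            if max_items is not None and end - start > max_items:
                break
            total += heights[start]
            score = best[start] + log_page_score(total / page_height)
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers to recover the page ranges.
    pages = []
    end = n
    while end > 0:
        start = back[end]
        pages.append((start, end))
        end = start
    pages.reverse()
    return best[n], pages


if __name__ == "__main__":
    # Two blocks that slightly overflow one page: the soft score keeps them
    # together, where a hard "fill <= 1" constraint would force a split.
    print(map_pagination([0.6, 0.5]))
```

In this sketch, passing a fixed `max_items` bounds the inner loop, so the running time grows linearly with the number of blocks; without it the loop is quadratic. The example call also shows the design choice the abstract argues for: a slightly overfull page receives a small penalty and still wins over two half-empty pages, which a hard constraint could not express.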
