Optimal pagination techniques for automatic typesetting systems

This thesis considers how to use a computer to break a document into pages suitable for printing. Although this problem is easy to solve when the document consists of just text, it becomes complicated when footnotes, displays, and figures are introduced. These elements add some freedom of choice in the way breaks are chosen, since the white space around the displays and the exact placement of the figures can be decided by the pagination algorithm. Out of the many possible ways to paginate such a document, the pagination algorithm should pick the one that is in some sense optimal. The approach taken here is to define a badness function that depends on the way the document is broken up, and then to design an algorithm that finds a pagination minimizing the value of this function. The document is modelled by two lists, the text list and the figure list. Each item in the text list is one of four kinds: a 'box', corresponding to something that will print, such as a line of text; a 'glue' item, corresponding to the white space between the lines; a 'penalty' item, corresponding to a legal place to break the list; or a 'citation', marking a reference to one of the figures. The items in the figure list indicate the size of each figure, and by how much each figure is allowed to stretch or shrink. This model is based on the one used in the TeX typesetting system. The optimizing pagination algorithm uses dynamic programming to find, for each i, j, and k, the best way to put the first i lines of text and the first j figures onto the first k pages; to make the program run in a reasonable amount of time, this calculation includes only those subproblems that are feasible, i.e., likely to lead to a solution with a small badness value. The badness function must be chosen carefully in order to get a problem that can be solved by these techniques. For certain simple badness functions, the pagination problem is NP-complete; two such functions are described in the thesis.
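The dynamic program over (i, j, k) can be illustrated with a minimal Python sketch. It is not the algorithm of the thesis: it makes several simplifying assumptions that are not taken from the text. Each text line is a rigid box of positive height with a legal break after it, glue and penalties are ignored, figures neither stretch nor shrink, a figure may not appear on a page earlier than the one holding its citation, and the badness of a page is the square of its unused height. The names Figure, page_badness, paginate, and max_page_badness are hypothetical, and the optional threshold is only a crude stand-in for the feasibility test described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    height: int          # rigid height (assumption: no stretch or shrink)
    cited_on_line: int   # index of the text line that cites this figure


def page_badness(used, page_height):
    """Assumed badness: squared unused height; None if the page overflows."""
    if used > page_height:
        return None
    return (page_height - used) ** 2


def paginate(line_heights, figures, page_height, max_page_badness=None):
    """Return (total badness, breaks) or None if no pagination exists.

    breaks is a tuple of (i, j) pairs: after page k, the first i lines
    and the first j figures have been placed. All heights must be positive.
    """
    n, m = len(line_heights), len(figures)
    # frontier maps (i, j) to the best (cost, breaks) using exactly k pages.
    frontier = {(0, 0): (0, ())}
    best = None
    for _k in range(1, n + m + 1):  # every page holds at least one item
        nxt = {}
        for (i0, j0), (cost, breaks) in frontier.items():
            for i in range(i0, n + 1):
                text_used = sum(line_heights[i0:i])
                if text_used > page_height:
                    break  # more lines only make the overflow worse
                for j in range(j0, m + 1):
                    if (i, j) == (i0, j0):
                        continue  # an empty page is not allowed
                    # The newest figure must be cited within the first i lines.
                    if j > j0 and figures[j - 1].cited_on_line >= i:
                        break
                    used = text_used + sum(f.height for f in figures[j0:j])
                    bad = page_badness(used, page_height)
                    if bad is None:
                        break  # more figures only make the overflow worse
                    if max_page_badness is not None and bad > max_page_badness:
                        continue  # prune pages judged too loose
                    cand = (cost + bad, breaks + ((i, j),))
                    if (i, j) not in nxt or cand[0] < nxt[(i, j)][0]:
                        nxt[(i, j)] = cand
        if (n, m) in nxt and (best is None or nxt[(n, m)][0] < best[0]):
            best = nxt[(n, m)]
        if not nxt:
            break
        frontier = nxt
    return best


# Six unit-height lines, one figure of height 2 cited on the last line,
# pages of height 4.
result = paginate([1] * 6, [Figure(height=2, cited_on_line=5)], page_height=4)
```

In this example the citation constraint forces the figure onto the second page, so the only zero-badness pagination puts four lines on page one and the remaining two lines plus the figure on page two. Storing the states of each page count in a dictionary means that unreachable (i, j) pairs are never examined, which loosely mirrors the idea of restricting the calculation to feasible subproblems.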