Suppose a directed graph has its arcs stored in secondary memory, and we wish to compute its transitive closure, also storing the result in secondary memory. We assume that an amount of main memory capable of holding <italic>s</italic> “values” is available, and that <italic>s</italic> lies between <italic>n</italic>, the number of nodes of the graph, and <italic>e</italic>, the number of arcs. The cost measure we use for algorithms is the <italic>I/O complexity</italic> of Kung and Hong, where we count 1 every time a value is moved into main memory from secondary memory, or vice versa.
In the dense case, where <italic>e</italic> is close to <italic>n</italic><supscrpt>2</supscrpt>, we show that I/O equal to <italic>O</italic>(<italic>n</italic><supscrpt>3</supscrpt>/√<italic>s</italic>) is sufficient to compute the transitive closure of an <italic>n</italic>-node graph, using main memory of size <italic>s</italic>. Moreover, it is necessary for any algorithm that is “standard,” in a sense to be defined precisely in the paper. Roughly, “standard” means that paths are constructed only by concatenating arcs and previously discovered paths. This class includes the usual algorithms that work for the generalization of transitive closure to semiring problems. For the sparse case, we show that I/O equal to <italic>O</italic>(<italic>n</italic><supscrpt>2</supscrpt>√(<italic>e</italic>/<italic>s</italic>)) is sufficient, although the algorithm we propose meets our definition of “standard” only if the underlying graph is acyclic. We also show that Ω(<italic>n</italic><supscrpt>2</supscrpt>√(<italic>e</italic>/<italic>s</italic>)) is necessary for any standard algorithm in the sparse case. That settles the I/O complexity of the sparse/acyclic case, for standard algorithms. It is unknown whether this complexity can be achieved in the sparse, cyclic case by a standard algorithm, and it is unknown whether the bound can be beaten by nonstandard algorithms.
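The dense-case upper bound is achieved by a blocked variant of the classical Warshall algorithm. As a minimal, memory-resident sketch (not the paper's algorithm: the stated O(<italic>n</italic><supscrpt>3</supscrpt>/√<italic>s</italic>) I/O bound additionally requires processing the matrix in √<italic>s</italic> × √<italic>s</italic> tiles that fit in main memory), the underlying "standard" computation looks like:

```python
def transitive_closure(adj):
    """Warshall's algorithm on a boolean adjacency matrix.

    This is 'standard' in the paper's sense: entry (i, j) is set only
    by concatenating a path i -> k with a path k -> j.  Here the whole
    matrix is assumed to fit in main memory; a blocked variant moves
    sqrt(s) x sqrt(s) tiles between main and secondary memory.
    """
    n = len(adj)
    reach = [row[:] for row in adj]          # copy; do not mutate input
    for k in range(n):                       # intermediate node
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True   # path i -> k -> j exists
    return reach
```

On the three-node chain 0 → 1 → 2, the closure gains the pair (0, 2).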
We then consider a special kind of standard algorithm, in which paths are constructed only by concatenating arcs and old paths, never by concatenating two old paths. This restriction seems essential if we are to take advantage of sparseness. Unfortunately, we show that almost another factor of <italic>n</italic> I/O is necessary and sufficient. That is, there is an algorithm in this class using I/O <italic>O</italic>(<italic>n</italic><supscrpt>3</supscrpt>√(<italic>e</italic>/<italic>s</italic>)) for arbitrary sparse graphs, including cyclic ones. Moreover, every algorithm in the restricted class must use Ω(<italic>n</italic><supscrpt>3</supscrpt>√(<italic>e</italic>/<italic>s</italic>)/log<supscrpt>3</supscrpt> <italic>n</italic>) I/O on some cyclic graphs.
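The restricted class can be illustrated by a simple semi-naive iteration, sketched below in Python under the assumption that the arc set fits in memory as pairs of node identifiers: each round extends previously discovered paths by a single arc, and two old paths are never joined.

```python
def closure_by_arc_extension(arcs):
    """Transitive closure in the restricted 'standard' class.

    New paths are formed only by concatenating an old path (u, v)
    with a single arc (v, y); old paths are never concatenated with
    each other.  This is the semi-naive fixpoint iteration, shown
    here without any attention to I/O blocking.
    """
    arcs = set(arcs)
    paths = set(arcs)          # all paths discovered so far
    frontier = set(arcs)       # paths discovered in the last round
    while frontier:
        new = set()
        for (u, v) in frontier:
            for (x, y) in arcs:
                if v == x and (u, y) not in paths:
                    new.add((u, y))    # old path u->v plus arc v->y
        paths |= new
        frontier = new
    return paths
```

On the directed 3-cycle 1 → 2 → 3 → 1, every ordered pair (including each (v, v)) is reachable, so the closure contains all nine pairs.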
[1] Henry S. Warren. A modification of Warshall's algorithm for the transitive closure of binary relations. Commun. ACM, 1975.
[2] Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, Volume II. Principles of Computer Science Series, 1988.
[3] J. Hopcroft and R. Tarjan. Algorithm 447: efficient algorithms for graph manipulation. Commun. ACM, 1973.
[4] Raghu Ramakrishnan et al. Efficient Transitive Closure Algorithms. VLDB, 1988.
[5] J. Hopcroft and R. Tarjan. Efficient algorithms for graph manipulation. 1971.
[6] Stephen Warshall. A Theorem on Boolean Matrices. J. ACM, 1962.
[7] Edward G. Coffman et al. Organizing matrices and matrix operations for paged memory systems. Commun. ACM, 1969.
[8] Don Coppersmith and Shmuel Winograd. Matrix multiplication via arithmetic progressions. STOC, 1987.
[9] Alfred V. Aho, John E. Hopcroft, and Jeffrey D. Ullman. The Design and Analysis of Computer Algorithms. 1974.
[10] H. V. Jagadish et al. Direct Algorithms for Computing the Transitive Closure of Database Relations. VLDB, 1987.
[11] Jia-Wei Hong and H. T. Kung. I/O complexity: The red-blue pebble game. STOC, 1981.