Asymptotic Estimation of the Average Number of Terminal States in DAWGs

Abstract. Following the work of A. Blumer, A. Ehrenfeucht and D. Haussler, we obtain an asymptotic estimate of the average number of terminal states in the suffix directed acyclic word graph (DAWG, also called the suffix automaton) under a Bernoulli model. We first derive an expression for the average from the structure of the DAWG. Using a Mellin transform, we then obtain an asymptotic expansion of the form ln(n)/ln(A) + C(A) + F(n), where n is the length of the word, A is the alphabet size, C(A) is a function of A, and F is an oscillating function of small amplitude. Finally, we compare the theoretical results with experimental results.
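
To make the quantity under study concrete, the following minimal sketch builds a suffix automaton with the standard online construction, counts its terminal states, and averages that count over random words. It rests on assumptions not stated in the abstract: letters are drawn uniformly and independently (one reading of the Bernoulli model), and the initial state is counted as terminal because the empty suffix ends there. The function names `count_terminal_states` and `average_terminal_states` are illustrative only. Since the abstract does not give C(A) or F(n), only the leading term ln(n)/ln(A) is printed for comparison.

```python
import math
import random


def count_terminal_states(word):
    """Build the suffix automaton of `word` and count its terminal states.

    Assumption: the initial state counts as terminal (empty suffix).
    """
    trans = [{}]    # outgoing transitions of each state
    link = [-1]     # suffix link of each state (-1 for the initial state)
    length = [0]    # length of the longest word reaching each state
    last = 0        # state recognizing the whole word read so far

    for ch in word:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)

        # Add the missing transitions on `ch` along the suffix-link path.
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]

        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[q] == length[p] + 1:
                link[cur] = q
            else:
                # Split q: the clone keeps the shorter words of q.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    # Terminal states are exactly those on the suffix-link path from `last`.
    count = 0
    state = last
    while state != -1:
        count += 1
        state = link[state]
    return count


def average_terminal_states(n, alphabet_size, trials=200, seed=0):
    """Monte Carlo average over uniformly random words of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        word = [rng.randrange(alphabet_size) for _ in range(n)]
        total += count_terminal_states(word)
    return total / trials


if __name__ == "__main__":
    A = 2
    for n in (100, 1000, 10000):
        avg = average_terminal_states(n, A, trials=50)
        print(n, avg, math.log(n) / math.log(A))
```

Under these assumptions, the gap between the simulated average and the printed leading term would be expected to stay bounded as n grows, reflecting the C(A) + F(n) part of the expansion.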