Efficient generation of lexical analysers

General tools for the automatic generation of lexical analysers, such as LEX [1], convert a specification consisting of a set of regular expressions into a deterministic finite automaton. The main algorithms involved are subset construction, state minimization and table compression. Even when these algorithms do not exhibit their worst‐case time behaviour, they are still quite expensive. This paper shows how to solve the problem in linear time for practical cases, resulting in a significant speed‐up. The idea is to combine the automaton introduced by Aho and Corasick [2] with the direct computation of an efficient table representation. Besides the algorithm, we present experimental results for the scanner generator Rex [3], which uses this technique.
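The following minimal Python sketch illustrates the Aho and Corasick idea. It covers only the special case where every token is a literal keyword, which is an assumption: the paper's generator handles full regular expressions. The sketch builds the keyword trie, computes failure links breadth-first, and fills in a complete transition function directly, so a deterministic automaton results without subset construction or state minimization. The names `build_keyword_dfa` and `scan` are hypothetical. The dictionary-per-state table is only a stand-in and is not the compressed table representation the paper describes.

```python
from collections import deque


def build_keyword_dfa(keywords):
    """Hypothetical sketch: Aho-Corasick trie plus failure links, turned into a DFA.

    Assumption: tokens are literal keywords, not general regular expressions.
    Returns (delta, accept): delta[state][ch] is the next state for every
    character of the keywords' alphabet; accept[state] is the set of keywords
    that end when the automaton enters that state.
    """
    alphabet = sorted({ch for word in keywords for ch in word})
    goto = [{}]        # trie edges: goto[state][ch] -> state
    accept = [set()]   # keywords recognised on entering a state

    # Phase 1: insert every keyword into a trie (linear in total keyword length).
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                accept.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        accept[state].add(word)

    # Phase 2: breadth-first, compute failure links and fill in the complete
    # transition function. A missing trie edge borrows the transition of the
    # failure state, which is shallower and therefore already complete.
    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    queue = deque([0])
    while queue:
        state = queue.popleft()
        for ch in alphabet:
            if ch in goto[state]:
                child = goto[state][ch]
                # Children of the root fail back to the root.
                fail[child] = delta[fail[state]][ch] if state else 0
                accept[child] |= accept[fail[child]]
                delta[state][ch] = child
                queue.append(child)
            else:
                delta[state][ch] = delta[fail[state]][ch] if state else 0
    return delta, accept


def scan(text, keywords):
    """Run the DFA over text and report (start, keyword) for every occurrence."""
    delta, accept = build_keyword_dfa(keywords)
    state, matches = 0, []
    for i, ch in enumerate(text):
        # Characters outside the keyword alphabet send the DFA back to the start state.
        state = delta[state].get(ch, 0)
        for word in accept[state]:
            matches.append((i - len(word) + 1, word))
    return sorted(matches)
```

For example, `scan("ushers", ["he", "she", "his", "hers"])` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]`. Building the automaton takes time proportional to the number of states times the alphabet size. For a fixed alphabet, this is linear in the total length of the keywords.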