Myths and Facts about the Efficient Implementation of Finite Automata and Lexical Analysis

Finite automata and their application in lexical analysis play an important role in many areas of computer science, particularly in compiler construction. We measured 12 scanners built with different implementation strategies and found that their execution times differed by a factor of 74. Our analysis of the algorithms, together with run-time statistics on cache misses and instruction frequencies, reveals substantive differences in code locality and certain kinds of overhead typical of specific implementation strategies. Some traditional claims about writing “fast” scanners could not be confirmed. Finally, we suggest an improved scanner generator.
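
The abstract does not name the implementation strategies that were measured. As a rough illustration of what "different implementation strategies" can mean for the same automaton, the sketch below contrasts two commonly discussed ones: a table-driven scanner, which looks up each transition in an array, and a directly coded scanner, which writes each state out as control flow. The token classes, the tiny automaton (identifiers and unsigned integers), and all function names are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical token classes and helper names; not taken from the paper.
IDENT, NUMBER = "IDENT", "NUMBER"


def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isalpha() or ch == "_":
        return 0  # letter
    if ch.isdigit():
        return 1  # digit
    return 2      # anything else


# Table-driven strategy: state x character class -> next state (-1 = stop).
# States: 0 = start, 1 = inside identifier, 2 = inside number.
TABLE = [
    [1, 2, -1],
    [1, 1, -1],
    [-1, 2, -1],
]
ACCEPT = {1: IDENT, 2: NUMBER}


def scan_table(text, pos):
    """Return (kind, end) for the longest token starting at pos, or None."""
    state, i = 0, pos
    while i < len(text):
        nxt = TABLE[state][char_class(text[i])]
        if nxt < 0:
            break
        state, i = nxt, i + 1
    return (ACCEPT[state], i) if state in ACCEPT else None


def scan_direct(text, pos):
    """Same automaton, with each state written out as control flow."""
    i, n = pos, len(text)
    if i < n and (text[i].isalpha() or text[i] == "_"):
        i += 1
        while i < n and (text[i].isalpha() or text[i].isdigit() or text[i] == "_"):
            i += 1
        return (IDENT, i)
    if i < n and text[i].isdigit():
        i += 1
        while i < n and text[i].isdigit():
            i += 1
        return (NUMBER, i)
    return None
```

Both functions accept the same language and return the same results. They differ only in how the transitions are represented, which is the kind of difference that can affect code locality and per-character overhead in a compiled implementation.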
