The High Cost of Opens in the UNIX Environment

Hierarchically structured file systems modeled after the UNIX file system are rapidly becoming a de facto standard. Understanding and improving the behavior of such file systems has been hampered by a lack of information on how they are used. In this paper we briefly describe data on file and directory reference patterns collected from a 4.2BSD UNIX system. We then use these data to investigate the importance of the name lookup overhead involved in opening and using files. We find that, in the absence of caching, name resolution accounts for the majority of the cost of accessing files. Simulations show that whole-directory caching is an effective strategy for decreasing this overhead.
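To make the name-resolution overhead and the idea of whole-directory caching concrete, the following Python sketch replays a trace of opens and counts how many directory scans a cache of complete directories would avoid. It is a toy model, not the simulator used in this study. It assumes absolute path names, one directory scan per intermediate path component, a cache that holds entire directories keyed by path, and LRU replacement. The names `WholeDirectoryCache`, `directories_searched`, and `simulate` are hypothetical.

```python
from collections import OrderedDict


class WholeDirectoryCache:
    """Hypothetical LRU cache holding the complete contents of recently scanned directories."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # directory path -> cached directory contents
        self.hits = 0
        self.misses = 0

    def lookup(self, directory):
        """Return True if the directory is cached; otherwise count a scan and cache it."""
        if directory in self.entries:
            self.entries.move_to_end(directory)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[directory] = True  # placeholder standing in for the full listing
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used directory
        return False


def directories_searched(path):
    """Directories scanned to resolve an absolute path, e.g. /usr/bin/ls -> /, /usr, /usr/bin."""
    parts = [p for p in path.split("/") if p]
    dirs = ["/"]
    for i in range(1, len(parts)):
        dirs.append("/" + "/".join(parts[:i]))
    return dirs


def simulate(opens, capacity):
    """Replay a trace of opens; return (cache hits, directory scans still required)."""
    cache = WholeDirectoryCache(capacity)
    for path in opens:
        for d in directories_searched(path):
            cache.lookup(d)
    return cache.hits, cache.misses


if __name__ == "__main__":
    trace = ["/usr/bin/ls", "/usr/bin/cc", "/etc/passwd"]
    print(simulate(trace, capacity=0))  # no caching: every component costs a scan
    print(simulate(trace, capacity=3))  # shared prefixes such as /usr/bin are reused
```

With this three-open trace, resolution touches eight directories in total. Without a cache all eight require a scan, while a three-directory cache satisfies four of them, because opens that share a path prefix reuse directories that were already read.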