High-performance file system design

File systems and I/O subsystems should be smart: they can analyze how they are being used and tune themselves dynamically to improve their performance. A file system should select caching and disk placement strategies on a per-file basis, and it should also apply system-wide disk reorganization strategies. For example, it should be able to reorganize the data on disk automatically during idle periods so that performance improves during later periods of peak load. This dissertation presents the design and analysis of iPcress, a prototype of a next-generation file system. iPcress is a smart, high-performance, reliable file system that uses statistical information collected on a per-file basis to tune itself. It provides a framework within which the file system can perform various optimizations automatically. It is also extensible: other optimization techniques can be incorporated easily, so the system can evolve. In addition, iPcress can incorporate a variety of file access and placement techniques and dynamically choose the best combination of techniques for each file. As a sample smart optimization, the dissertation describes clustering active disk data in the center of the disk, which increases disk throughput by up to 30%.
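The idea behind clustering active data in the center of the disk can be illustrated with a small model. The sketch below makes several assumptions: it uses the classical "organ-pipe" arrangement as a stand-in for whatever placement policy iPcress actually uses. In that arrangement, the hottest item goes in the middle and progressively colder items alternate outward. It also treats the disk as a row of equal-sized slots (roughly, cylinders), assumes seek cost is proportional to slot distance, and assumes successive accesses are independent. The function names and the access counts are hypothetical. The counts stand in for the per-file statistics the file system collects.

```python
def organ_pipe_layout(access_counts, num_slots):
    """Place the most-accessed item in the center slot and alternate
    progressively colder items to its right and left (organ-pipe order)."""
    # Hottest first; ties broken by name so the result is deterministic.
    hottest_first = sorted(access_counts, key=lambda item: (-access_counts[item], item))
    center = (num_slots - 1) // 2
    layout = [None] * num_slots  # None marks an unused slot
    for rank, item in enumerate(hottest_first[:num_slots]):
        offset = (rank + 1) // 2
        # rank 0 -> center, 1 -> center+1, 2 -> center-1, 3 -> center+2, ...
        slot = center + offset if rank % 2 == 1 else center - offset
        layout[slot] = item
    return layout


def expected_seek_distance(layout, access_counts):
    """Mean slot distance between two independent accesses, with each slot
    weighted by its share of all accesses (a crude proxy for arm travel)."""
    total = sum(access_counts[item] for item in layout if item is not None)
    weights = [access_counts[item] / total if item is not None else 0.0
               for item in layout]
    return sum(wi * wj * abs(i - j)
               for i, wi in enumerate(weights)
               for j, wj in enumerate(weights))


# Hypothetical per-file access counts gathered during a busy period.
counts = {"a": 50, "b": 20, "c": 15, "d": 10, "e": 5}

hot_center = organ_pipe_layout(counts, 5)             # ['e', 'c', 'a', 'b', 'd']
hot_edge = sorted(counts, key=lambda f: -counts[f])   # hottest file at slot 0

print(hot_center)
print(expected_seek_distance(hot_center, counts))
print(expected_seek_distance(hot_edge, counts))
```

In this toy example, the center-weighted layout lowers the modeled mean seek distance from 1.27 slots to about 1.015 slots. The model ignores rotational latency, real seek-time curves, and the cost of moving data during idle periods, all of which a real reorganizer would have to weigh. It is not meant to reproduce the measured 30% throughput gain.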
