Amino: Extending ACID Semantics to the File System

An organization’s data is often its most valuable asset, but today’s file systems provide few facilities to ensure its safety. Databases, on the other hand, have long provided transactions. Transactions are useful because they provide atomicity, consistency, isolation, and durability (ACID). Many applications could make use of these semantics, but databases have a wide variety of non-standard interfaces. As a result, applications like mail servers currently perform elaborate error handling to ensure atomicity and consistency, because doing so is easier and more portable than using a DBMS. Most editors write changes to a temporary file and then rename the temporary file over the original to replace the content atomically, but there is no standard method to atomically update two related files (e.g., /etc/shadow and /etc/passwd).

A transaction-oriented programming model provides three key benefits: (1) complex error-handling code is avoided, because failed operations can simply be aborted; (2) concurrent accesses behave as if they were serialized (thereby preventing time-of-check-to-time-of-use security vulnerabilities); and (3) once a transaction is committed, it will not be lost due to software or hardware failures. We believe that file systems should export transactions as a first-class service. In this way, applications can continue to use the simple, flexible, and pervasive POSIX API to access their data, but do so in a transactionally protected environment. Furthermore, other applications that do not need such transactional protections can still access the data without any changes.

To support transactions properly, both the file system and the OS itself must handle transactions. File systems must log redo and undo information, so that transactions can be applied or aborted. The OS also needs to support transactions: traditional caches (e.g., the page cache or directory-name-lookup cache) return data to user-space applications without consulting the file system.
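
The write-to-a-temporary-file-then-rename idiom mentioned above for single-file updates can be sketched as follows. This is a minimal illustration in Python, not code from Amino; the helper name `atomic_write` is an assumption made for this example. It relies on the POSIX guarantee that a rename within one file system replaces the target atomically.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the contents of `path` with `data` via a temporary file and rename.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the rename
    # stays within a single file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to stable storage
        os.replace(tmp_path, path)  # rename over the original
    except BaseException:
        # If anything failed before the rename, remove the temporary file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers of `path` see either the old contents or the new contents, never a mixture. The idiom covers only one file at a time, which is the limitation the text points out: calling `atomic_write` twice for two related files still leaves a window in which only the first has been updated. Note also that the temporary file is created with restrictive permissions, so a real tool would need to copy the original file's mode and ownership.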
Caches that bypass the file system create several problems for transactional file systems. If the database management system does not mediate all accesses, then the ACID properties cannot be guaranteed. If a transaction is aborted, then the OS caches will contain stale data, violating atomicity. If the database is not consulted before accessing an object, then it cannot perform proper locking, violating isolation. This dictates that an OS which supports transactions must have database-managed caches.

On further examination, database-managed caches are only a first step. When accesses are mediated through the database’s caches, the file system is consistent and the user-level application can interact with it transactionally, but other OS components do not share this consistent state. For example, if a file is created by opening it with the O_CREAT option, then a new file descriptor is created and inserted into the OS’s process control block (PCB). If the transaction is later aborted, then the PCB will point to a file descriptor for a non-existent file. Clearly, these issues demand support for transactions in the OS proper. This means that it is useful for the OS to extend transactions as far as possible, including to the application, so that its data structures can be kept consistent with the file system.

Status. We have designed a prototype file system that exports ACID transactions to user-level applications, while preserving the ubiquitous and convenient POSIX interface. In our prototype ACID file system, called Amino, unmodified applications operate without any changes. For these unmodified applications, each system call is transaction protected. Using Amino, application developers can protect an arbitrary sequence of system calls using transactions by inserting simple BEGIN, COMMIT, and ABORT calls.
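
To make the BEGIN, COMMIT, and ABORT bracket concrete, the following toy sketch shows how an undo log lets a failed multi-file update be rolled back. It is an assumption-laden illustration, not Amino's interface: the class `TxStore`, its methods, and the helper `add_user` are hypothetical names, the "files" live in an in-memory dictionary, and there is no redo log, locking, or durability.

```python
_MISSING = object()  # marks "this name did not exist before the transaction"


class TxStore:
    """Toy in-memory store with undo logging (hypothetical, not Amino's API)."""

    def __init__(self):
        self.files = {}
        self._undo = None  # list of (name, old_value) while a transaction is open

    def begin(self):
        if self._undo is not None:
            raise RuntimeError("transaction already open")
        self._undo = []

    def write(self, name, data):
        if self._undo is None:
            raise RuntimeError("no open transaction")
        # Record the old value before overwriting so that abort can restore it.
        self._undo.append((name, self.files.get(name, _MISSING)))
        self.files[name] = data

    def commit(self):
        if self._undo is None:
            raise RuntimeError("no open transaction")
        self._undo = None  # drop the undo records; the changes stay

    def abort(self):
        if self._undo is None:
            raise RuntimeError("no open transaction")
        # Apply the undo records in reverse order.
        for name, old in reversed(self._undo):
            if old is _MISSING:
                self.files.pop(name, None)
            else:
                self.files[name] = old
        self._undo = None


def add_user(store, passwd_line, shadow_line, fail=False):
    """Update two related 'files' as one unit; `fail` simulates an error."""
    store.begin()
    try:
        store.write("passwd", store.files.get("passwd", "") + passwd_line)
        if fail:
            raise OSError("simulated failure between the two updates")
        store.write("shadow", store.files.get("shadow", "") + shadow_line)
        store.commit()
        return True
    except OSError:
        store.abort()  # no hand-written cleanup: the undo log restores both files
        return False
```

The error path in `add_user` contains no file-specific recovery code, which is the first benefit the text attributes to a transactional model. A real system would additionally have to undo state held outside the store, such as the file descriptor example above.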
Other transactional file systems, such as QuickSilver [3], either run only on specialized OSes designed from the ground up for transactions or change the file access API (e.g., QuickSilver cannot perform random writes to a file, and WinFS [1] changes the file access API to one that uses items instead of files), preventing existing applications from accessing the data.

We built our prototype on top of the Berkeley Database (BDB), a time-tested and reliable embedded database. Our prototype intercepts system calls at the ABI level via a ptrace monitor to provide OS services to any application from user level. Because we intercept operations at the ABI, any existing application can be run through our monitor. More importantly, we are operating above the OS, so we can use database-managed caches and transactionally protect all of the relevant file system structures (including PCBs). To protect these structures, we use recoverable virtual memory (RVM) [2]. Our RVM system improves on previous systems by allowing nested transactions and transparently logging updates via page protection.

Our performance evaluation shows that ACID semantics can be added to applications with acceptable overheads. When Amino adds atomicity, consistency, and isolation functionality to an application, it has an overhead of 19.8% over Ext3 due to the monitoring infrastructure. When atomicity, consistency, isolation, and durability are provided, it is up to 39.8% faster than Ext3, because BDB’s balanced tree structure has improved locality and is optimized for durable performance.
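
The nested transactions mentioned for the RVM layer can be illustrated with a small sketch. This is not Amino's RVM implementation: the class `NestedTxMemory` and its methods are hypothetical, and updates are logged explicitly in `set` rather than transparently via page protection. It shows only the nesting rule commonly used for such systems, in which an inner abort undoes just the inner changes and an inner commit hands its undo records to the parent.

```python
class NestedTxMemory:
    """Toy recoverable memory with nested transactions (hypothetical sketch)."""

    def __init__(self):
        self.data = {}
        self._logs = []  # one undo log per open nesting level, innermost last

    def begin(self):
        self._logs.append([])

    def set(self, key, value):
        if not self._logs:
            raise RuntimeError("no open transaction")
        # Log whether the key existed and its old value, then update in place.
        self._logs[-1].append((key, key in self.data, self.data.get(key)))
        self.data[key] = value

    def commit(self):
        if not self._logs:
            raise RuntimeError("no open transaction")
        log = self._logs.pop()
        if self._logs:
            # Inner commit: the parent inherits the undo records, so a later
            # abort of the parent still rolls these changes back.
            self._logs[-1].extend(log)

    def abort(self):
        if not self._logs:
            raise RuntimeError("no open transaction")
        log = self._logs.pop()
        for key, existed, old in reversed(log):
            if existed:
                self.data[key] = old
            else:
                self.data.pop(key, None)
```

For example, after `begin(); set("a", 1); begin(); set("a", 2); abort()`, the value of `"a"` is back to 1 while the outer transaction remains open and can still commit or abort.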