HathiTrust: A Research Library at Web Scale

Research libraries have a mission to build collections that will meet the research needs of their user communities over time, to curate these collections to ensure perpetual access, and to facilitate intellectual and physical access to these collections as effectively as possible. Recent mass digitization projects as well as financial pressures and limited space to store print collections have created a new environment and new challenges for large research libraries. This paper will describe one approach to these challenges: HathiTrust, a shared digital repository owned and operated by a partnership of more than forty major libraries.

The activities of research libraries in the next five to 10 years will define the role of libraries in the digital age. The library community must now ensure that these collections not only retain their research value in a digital platform, but also realize their potential as users adjust their information needs and expectations.
--HathiTrust FAQ, July 2010 (www.hathitrust.org/faq)

In an era of mass digitization of library collections, research libraries are confronting an array of new challenges to continuing their traditional role as stewards of library collections. How will libraries ensure perpetual preservation of these sometimes massive new digital library collections, a promise Google does not make? How will libraries provide wide access to their digital collections in an appropriate manner, unbeholden to commercial interests and in support of the activities of scholars? What new possibilities for services are opened up by digital formats, and how can libraries bring those new services to their user communities? How do these new large digital collections relate to print collections, and what opportunities are available for libraries to coordinate collection management between print and digital materials?
This paper will consider these challenges and then describe how HathiTrust, a shared digital repository owned and operated by a partnership of more than forty major research libraries, offers answers to some of these questions and an opportunity for libraries to collectively explore this new territory.

Literature Review

Alongside lively reporting and debate in prominent popular news sources and magazines regarding Google Books and the outcomes of mass digitization projects, researchers have explored the implications of mass digitization for libraries and the collaborative possibilities for addressing the challenges of digital preservation, access, support for scholarly research, and collection management in light of new, massive digital collections. (1) The specter of commercial hosting of research library content by Google, juxtaposed with the responsibility of libraries to uphold their users' right to access information and their mission to preserve it, is a theme addressed by a number of researchers and library leaders. Hahn concluded that "it may be foolish to expect that commercial companies will share librarians' values and commitment to digitized material preservation" and that "research libraries alone will be held accountable for fulfilling that vital preservation mission." (2) In 2008, Brantley, then executive director of the Digital Library Federation, urged libraries to "trade for our own account" because libraries "stand for what no other organization in this world can: the fundamental right of access to information, and the compulsion to preserve it for future generations." (3) Leetaru made a case that the output of mass digitization is "access digitization" rather than "preservation digitization." (4) He acknowledged that placing responsibility for long-term storage with libraries "is a legitimate argument, especially in light of Microsoft's recent withdrawal from book digitization," but concluded that the academic community has so far failed to provide good access service for mass digitized books. …