Big data at scale for digital humanities: An architecture for the HathiTrust Research Center

Big Data in the humanities is a new phenomenon that is expected to revolutionize the process of humanities research. The HathiTrust Research Center (HTRC) is a cyberinfrastructure that supports humanities research on big humanities data. The HTRC has been designed to make the technology serve the researcher: to make the content easy to find, to make the research tools efficient and effective, to allow researchers to customize their environment, to allow researchers to combine their own data with that of the HTRC, and to allow researchers to contribute tools. The architecture has multiple layers of abstraction, providing a secure, scalable, extensible, and generalizable interface for both human and computational users.

Authors: Stacy T. Kowalczyk (Dominican University, USA); Yiming Sun (Indiana University, USA); Zong Peng (Indiana University, USA); Beth Plale (Indiana University, USA); Aaron Todd (Indiana University, USA); Loretta Auvil (University of Illinois, USA); Craig Willis (University of Illinois, USA); Jiaan Zeng (Indiana University, USA); Milinda Pathirage (Indiana University, USA); Samitha Liyanage (Indiana University, USA); Guangchen Ruan (Indiana University, USA); J. Stephen Downie (University of Illinois, USA)

DOI: 10.4018/978-1-4666-4699-5.ch011
