Developing Sustainable Data Services in Cyberinfrastructure for Higher Education: Requirements and Lessons Learned

The University of California, San Diego (UC San Diego) Research Cyberinfrastructure (RCI) program provides long-term, high-quality services in centralized storage, colocation, computing, data curation, networking and technical expertise. To help define data storage needs and set priorities, the RCI data services (RCIDS) team conducted a series of interviews with faculty and senior staff members between September 2012 and February 2013. A total of 50 groups from 29 separate departments and organized research units (ORUs) participated in the interviews, representing more than 600 UC San Diego researchers. Their diverse datasets, ranging from human genomic sequences and marine natural products to cosmological simulations, are shared with hundreds of thousands of users worldwide. From these interviews, we identified the top 10 requirements for data services and the top 5 existing challenges and risks reported by UC San Diego researchers. Based on these requirements, the RCIDS team recommends that a Network Attached Storage (NAS) data service be deployed first, supported by a sustainable business model. Additional services will be developed through further discussion with the research community and in light of emerging cloud computing technologies. We provide an extensive discussion of the implementation plan, cloud-based data services, and the lessons learned in building sustainable e-science infrastructure for higher education research.