Gateways to Discovery: Cyberinfrastructure for the Long Tail of Science

NSF-funded computing centers have primarily focused on delivering high-performance computing resources to academic researchers with the most computationally demanding applications. But now that computational science is pervasive, there is a need for infrastructure that serves many more researchers and disciplines than just those at the peak of the HPC pyramid. Here we describe SDSC's Comet system, which is scheduled for production in January 2015 and is designed to address the needs of a much larger and broader science community: the "long tail of science." Comet will have a peak performance of 2 petaflop/s, delivered mostly by Intel's next-generation Xeon processors. It will also include large-memory and GPU-accelerated nodes, node-local flash storage, 7 PB of Performance Storage, and 6 PB of Durable Storage. These features, together with high-performance virtualization, will enable users to run complex, heterogeneous workloads on a single integrated resource.
