In recent years, big data analysis has been widely applied in many research fields, including biology, physics, transportation, and materials science. Although the demand for big data migration and analysis on campus IT infrastructures is increasing dramatically, several technical challenges remain to be addressed. First, frequent big data transmission between the storage systems of different research groups imposes a heavy burden on a regular campus network. Second, current campus IT infrastructure is not designed to fully utilize its hardware capacity for big data migration and analysis. Last but not least, running big data applications on large-scale high-performance computing facilities is not straightforward, especially for researchers and engineers in non-IT disciplines. We develop a campus IT cyberinfrastructure for big data migration and analysis, called BIC-LSU, which consists of a task-aware Clos OpenFlow network, high-performance cache storage servers, customized high-performance transfer applications, a lightweight control framework for manipulating existing big data storage systems and job scheduling systems, and a comprehensive social-networking-enabled web portal. BIC-LSU achieves 40 Gb/s disk-to-disk big data transmission, maintains a short average transmission task completion time, enables converged control over commonly deployed storage and job scheduling systems, and makes big data analysis easier through a universal, user-friendly interface. The BIC-LSU software has minimal dependencies and is highly extensible. Other research institutes can easily customize and deploy BIC-LSU as an augmented service on their existing IT infrastructures.