Dimensioning storage and computing clusters for efficient high throughput computing

Scientific experiments are producing huge amounts of data, and the size of their datasets and total volume of data continues increasing. These data are then processed by researchers belonging to large scientific collaborations, with the Large Hadron Collider being a good example. The focal point of scientific data centers has shifted from efficiently coping with PetaByte scale storage to deliver quality data processing throughput. The dimensioning of the internal components in High Throughput Computing (HTC) data centers is of crucial importance to cope with all the activities demanded by the experiments, both the online (data acceptance) and the offline (data processing, simulation and user analysis). This requires a precise setup involving disk and tape storage services, a computing cluster and the internal networking to prevent bottlenecks, overloads and undesired slowness that lead to losses cpu cycles and batch jobs failures. In this paper we point out relevant features for running a successful data storage and processing service in an intensive HTC environment.