Infrastructure

Our HPC cluster is a computing environment designed for high-performance computing workloads. It consists of a dedicated head node for management and coordination and 22 compute nodes, which together provide 1032 CPU cores for parallel processing. Across the cluster, a total of 27 TB of RAM is available for data-intensive applications, and 6 GPUs accelerate suitable workloads. Resource allocation and job scheduling are handled by the Slurm workload manager.
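
As a brief illustration of how resources are requested through Slurm, the sketch below shows a minimal batch script. It is illustrative only: the core, memory, and GPU counts, the module-free setup, and the script name `my_analysis.py` are assumptions, not a prescribed workflow for this cluster.

```bash
#!/bin/bash
# Minimal Slurm batch script (illustrative sketch; values below are assumptions).
#SBATCH --job-name=example      # job name shown in the queue
#SBATCH --nodes=1               # run on a single compute node
#SBATCH --ntasks=1              # one task (process)
#SBATCH --cpus-per-task=8       # request 8 CPU cores for that task
#SBATCH --mem=64G               # request 64 GB of RAM
#SBATCH --gres=gpu:1            # request one GPU (only satisfiable on GPU nodes)
#SBATCH --time=02:00:00         # wall-clock limit of two hours

# Replace with the actual application; "my_analysis.py" is a placeholder.
srun python my_analysis.py
```

Such a script is submitted with `sbatch`, and Slurm places the job on a node with enough free cores, memory, and, if requested, GPUs.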

The HPC cluster is attached to a high-performance Ceph storage cluster with ~4 PB of raw storage capacity.



HPC Cluster:

| Node | CPU cores | Memory (RAM) | GPU |
|------|-----------|--------------|-----|
| HPE DL580 (head, CPU, GPU) | 72 | 3 TB | 2x Nvidia A40 48GB |
| HPE DL580 (compute, CPU) | 64 | 3 TB | |
| 8x HPE XL220n (compute, CPU) | 48 (each) | 1 TB | |
| 10x HPE XL230a (compute, CPU) | 44 (each) | 1 TB | |
| 2x HPE XL290n (compute, CPU, GPU) | 48 (each) | 1 TB | 1x Nvidia A100 80GB (each) |
| 1x HPE DL385 (compute, CPU, GPU) | 48 | 1 TB | 2x Nvidia RTX8000 48GB |

Ceph Storage Cluster:

| Node | CPU cores | Memory (RAM) | Storage |
|------|-----------|--------------|---------|
| 12x HPE Apollo 4200 (OSD) | 24 | 512 GB | ~4 PB (raw) |
| 3x HPE DL360 (MON, MDS) | 16 | 512 GB | |
| 1x HPE DL20 (frontend) | 6 | 32 GB | |