Our HPC cluster is a shared environment for high-performance computing workloads. It comprises 23 nodes: a dedicated head node for management and coordination, plus 22 compute nodes. Together these nodes provide 1,104 CPU cores for parallel processing and 27 TB of RAM for data-intensive applications, and 6 GPUs are distributed across the cluster to accelerate suitable workloads. Resource allocation and job scheduling are handled by the Slurm job scheduler.
The HPC cluster is attached to a high-performance Ceph storage cluster with ~4 PB of raw storage capacity.
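As an illustration of how work is typically submitted to a Slurm-managed cluster like ours, below is a minimal sketch of a batch script. The partition name, GRES type, and application are hypothetical placeholders, not the actual configuration of our cluster.

```bash
#!/bin/bash
# Minimal Slurm batch script sketch. Partition and GRES names are
# hypothetical placeholders; the real names are site-specific.
#SBATCH --job-name=example-job
#SBATCH --partition=gpu           # hypothetical GPU partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12        # request 12 of the node's CPU cores
#SBATCH --mem=64G                 # request 64 GB of the node's RAM
#SBATCH --gres=gpu:1              # request one GPU on the node
#SBATCH --time=04:00:00           # wall-clock limit

# Run the application on the allocated resources
srun ./my_application             # placeholder executable
```

Such a script would be submitted with `sbatch job.sh`; Slurm queues the job and starts it once the requested cores, memory, and GPU become available.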
HPC Cluster:
| Node | CPU cores | Memory (RAM) | GPU |
|---|---|---|---|
| HPE DL580 (head, CPU, GPU) | 72 | 3 TB | 2x Nvidia A40 48 GB |
| HPE DL580 (compute, CPU) | 64 | 3 TB | – |
| 8x HPE XL220n (compute, CPU) | 48 (each) | 1 TB (each) | – |
| 10x HPE XL230a (compute, CPU) | 44 (each) | 1 TB (each) | – |
| 2x HPE XL290n (compute, CPU, GPU) | 48 (each) | 1 TB (each) | 1x Nvidia A100 80 GB (each) |
| 1x HPE DL385 (compute, CPU, GPU) | 48 | 1 TB | 2x Nvidia RTX8000 48 GB |
Ceph Storage Cluster:
| Node | CPU cores | Memory (RAM) | Storage |
|---|---|---|---|
| 12x HPE Apollo 4200 (OSD) | 24 (each) | 512 GB (each) | ~4 PB (raw, total) |
| 3x HPE DL360 (MON, MDS) | 16 (each) | 512 GB (each) | – |
| 1x HPE DL20 (frontend) | 6 | 32 GB | – |
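Assuming the Ceph capacity is exposed to the HPC nodes as a CephFS mount (the MDS role in the table above suggests this), a quick way to see how much of it is visible from a login or compute node is a standard `df` query; the mount point below is a hypothetical placeholder.

```bash
# Show capacity and current usage of the Ceph-backed filesystem as seen from
# an HPC node. /mnt/cephfs is a hypothetical mount point, not the actual path
# used on our cluster.
df -h /mnt/cephfs
```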