HPC¶
Preamble¶
Context and intents:
- Tension between performance and portability (hardware-dependent)
- Difference between portability and reproducibility
- We won't talk about:
  - Workflow (link to another card)
  - Floating-point computation (link)
  - Deterministic distributed computations (link)
- We focus here on deployment issues
References:
- https://hal.science/hal-03010231v1
Deployment¶
Deployment and available tools¶
- goal: reduce variability as much as possible (can't possibly eliminate it because supercomputers differ too much)
- commonly-used tools vs. reproducibility in time and space (PC vs. supercomputer, supercomputer vs. supercomputer)
  - modules
  - Spack, EasyBuild -> link to "package managers"
  - conda
  - Guix
  - Singularity, Apptainer, Docker, podman, PCOCC (TGCC: https://github.com/cea-hpc/pcocc) -> link to "containers" & "container images"
- link to "Performance portability"
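As a starting point for comparing machines, the following minimal sketch lists which of the tools above are visible on the current machine. The executable names (for example `eb` for EasyBuild) and the use of the `MODULESHOME` variable to detect environment modules are assumptions about typical installations, not something this card specifies; `available_tools` is a hypothetical helper name.

```python
import os
import shutil

# Executable names are assumptions about typical installs
# ("eb" is assumed to be the EasyBuild command-line entry point).
TOOL_COMMANDS = {
    "Spack": "spack",
    "EasyBuild": "eb",
    "conda": "conda",
    "Guix": "guix",
    "Singularity": "singularity",
    "Apptainer": "apptainer",
    "Docker": "docker",
    "podman": "podman",
    "PCOCC": "pcocc",
}


def available_tools(which=shutil.which, environ=os.environ):
    """Map each detected tool name to the path (or prefix) where it was found."""
    found = {name: which(command) for name, command in TOOL_COMMANDS.items()}
    # "module" is usually a shell function rather than an executable, so this
    # sketch looks for the MODULESHOME variable instead (an assumption).
    found["modules"] = environ.get("MODULESHOME")
    return {name: path for name, path in found.items() if path}


if __name__ == "__main__":
    for name, path in sorted(available_tools().items()):
        print(f"{name}: {path}")
```

Running the same script on a workstation and on a supercomputer gives a quick view of which deployment routes both machines share.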
Performance portability (MPI, CPUs, GPUs)¶
- MPI stack
  - drivers for the right hardware
  - "bring your own MPI" (containers) vs. using the vendor-provided MPI
- CPUs
  - optimizing for the right set of vector instructions (AVX2, AVX-512, etc.)
- GPUs
  - CUDA vs. HIP/ROCm
  - for CUDA
    - deployment issue: rely on the local libcuda.so
    - build software against the right CUDA version for the target machine
    - produce code targeting the right GPU(s) (H100, etc.)
  - for HIP/ROCm
    - can "bring your own HIP/ROCm stack"
    - ?
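The two hardware-dependent points above (vector instruction sets and the reliance on a local `libcuda.so`) can be probed before deploying. The sketch below is only illustrative and assumes a Linux x86 host: it reads the `flags` line of `/proc/cpuinfo` and tries to load the driver library through `ctypes`. The function names and the fallback library name `libcuda.so.1` are assumptions made for this example.

```python
import ctypes
import ctypes.util
import re

# Instruction-set flags of interest, as spelled in /proc/cpuinfo on Linux x86.
VECTOR_FLAGS = ("sse4_2", "avx", "avx2", "avx512f")


def vector_extensions(cpuinfo_text):
    """Return the entries of VECTOR_FLAGS advertised in a /proc/cpuinfo dump."""
    match = re.search(r"^flags\s*:\s*(.*)$", cpuinfo_text, flags=re.MULTILINE)
    advertised = set(match.group(1).split()) if match else set()
    return [flag for flag in VECTOR_FLAGS if flag in advertised]


def local_cuda_driver():
    """Return the name of the host CUDA driver library if it loads, else None."""
    # Fallback name is an assumption about how the driver library is installed.
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        ctypes.CDLL(name)
    except OSError:
        return None
    return name


def report(cpuinfo_path="/proc/cpuinfo"):
    """Collect both probes; a missing cpuinfo file yields an empty flag list."""
    try:
        with open(cpuinfo_path) as handle:
            text = handle.read()
    except OSError:
        text = ""
    return {
        "vector_extensions": vector_extensions(text),
        "cuda_driver": local_cuda_driver(),
    }


if __name__ == "__main__":
    print(report())
```

A binary built with AVX-512 enabled should only be deployed where `avx512f` appears in the report, and a CUDA application (containerized or not) needs `cuda_driver` to be present on the host.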