HPC

Preamble

Context and intents:

  • Tension between performance and portability (hardware-dependent)
  • Difference between portability and reproducibility
  • We won't talk about
    • Workflow (link to another card)
    • Floating-point computation (link)
    • deterministic distributed computations (link)
  • We focus here on deployment issues

References

  • https://hal.science/hal-03010231v1

Deployment

Deployment and available tools

  • goal: reduce variability as much as possible (can't possibly eliminate it because supercomputers differ too much)
  • commonly used tools vs. reproducibility in time and space (personal computer vs. supercomputer, supercomputer vs. supercomputer)
    • modules
    • Spack, EasyBuild -> link to "package managers"
    • conda
    • Guix
    • Singularity, Apptainer, Docker, podman, PCOCC (TGCC: https://github.com/cea-hpc/pcocc) -> link to "containers" & "container images"
      • link to "Performance portability"
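
To compare two deployments (PC vs. supercomputer, or two supercomputers), it can help to first record which of the tools above are active in the current session. The following is a minimal sketch, not a complete provenance tool. The function name `describe_deployment` and the `MARKERS` table are hypothetical, and the environment variable names are assumptions about what each tool usually sets; check them against the versions installed on your site.

```python
import os

# Environment variables that each tool usually sets when active.
# These names are assumptions; verify them on your own machines.
MARKERS = {
    "modules": "LOADEDMODULES",        # Environment Modules / Lmod
    "spack": "SPACK_ENV",              # active Spack environment
    "conda": "CONDA_PREFIX",           # active conda environment
    "guix": "GUIX_ENVIRONMENT",        # inside `guix shell`
    "apptainer": "APPTAINER_CONTAINER",
    "singularity": "SINGULARITY_CONTAINER",
}


def describe_deployment(env=None):
    """Return {tool: value} for every marker variable that is set and non-empty."""
    env = os.environ if env is None else env
    found = {}
    for tool, var in MARKERS.items():
        value = env.get(var, "")
        if value:
            found[tool] = value
    # Loaded modules are stored as a colon-separated list.
    if "modules" in found:
        found["modules"] = found["modules"].split(":")
    return found
```

Calling `describe_deployment()` on each machine and diffing the two results gives a first, coarse view of how the software stacks differ. It says nothing about package versions inside a container image or a Guix profile, which need the tool's own export commands.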

Performance portability (MPI, CPUs, GPUs)

  • MPI stack
    • drivers for the right hardware
    • "bring your own MPI" (containers) vs. using the vendor-provided MPI
  • CPUs
    • optimizing for the right set of vector instructions (AVX2, AVX-512, etc.)
  • GPUs
    • CUDA vs. HIP/ROCm
    • for CUDA
      • deployment issue: applications rely on the local libcuda.so, which ships with the NVIDIA driver on the target machine
      • build software against the right CUDA version for the target machine
      • produce code targeting the right GPU(s) (H100, etc.)
    • for HIP/ROCm
      • can "bring your own HIP/ROCm stack"
      • ?
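
Before choosing build flags or a CUDA version, one can probe what the target node actually offers. The sketch below assumes an x86 Linux node where `/proc/cpuinfo` has a `flags` line, and it loads the local `libcuda.so` through `ctypes` to call the driver API function `cuDriverGetVersion`. The helper names `cpu_vector_flags` and `cuda_driver_version` and the `VECTOR_FLAGS` tuple are hypothetical, and the sketch assumes `cuDriverGetVersion` can be called without a prior `cuInit`.

```python
import ctypes

# Flags of interest; extend as needed (names as they appear in /proc/cpuinfo).
VECTOR_FLAGS = ("avx2", "avx512f")


def cpu_vector_flags(cpuinfo_text):
    """Return which VECTOR_FLAGS appear in the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in VECTOR_FLAGS if flag in present]
    return []


def cuda_driver_version():
    """Return (major, minor) supported by the local libcuda.so, or None if unavailable."""
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            lib = ctypes.CDLL(name)
        except OSError:
            continue  # library not found under this name
        version = ctypes.c_int(0)
        try:
            status = lib.cuDriverGetVersion(ctypes.byref(version))
        except AttributeError:
            return None  # symbol missing: not a usable driver library
        if status != 0:  # 0 is CUDA_SUCCESS
            return None
        # The driver encodes the version as 1000 * major + 10 * minor.
        return version.value // 1000, (version.value % 1000) // 10
    return None
```

On a compute node, `cpu_vector_flags(open("/proc/cpuinfo").read())` lists the supported vector extensions among those checked, and `cuda_driver_version()` reports the newest CUDA version the installed driver supports, or `None` on a node without an NVIDIA driver. Run the probe on a compute node, not the login node, since the two often have different hardware.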