Computing Resources - Training Platform

Target group

  • Scientific research, companies, public institutions

Your requirements

  • Use of high-performance computing (HPC) resources
  • Need for GPU-based systems for training tasks
  • Direct resource access via shell and batch system

Our offer

To support the applications and services of the AI service centre, we provide a GPU-based HPC system with current NVIDIA A100 and H100 GPUs for training tasks in research, development and technology.

The compute nodes are connected to each other, and to the storage resources that are also provided, via a high-performance InfiniBand network.

The training system is installed in Göttingen and consists of 35 nodes with 4 NVIDIA A100 SXM4 GPUs each (80 GB HBM2e memory per GPU) and 11 nodes with 4 NVIDIA H100 SXM5 GPUs each (94 GB HBM2e memory per GPU). The GPUs within each node are connected via NVLink, and the nodes are connected by an InfiniBand HDR fabric (2 × 200 Gbit/s per node).
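
To put these figures in perspective, the following minimal Python sketch adds up the aggregate capacity of the system. It uses only the node counts, GPU counts, memory sizes and link rates stated above. The class and function names (`NodeType`, `summarize`) are illustrative and not part of any KISSKI tooling, and the bandwidth value is the nominal link rate, not a measured throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One homogeneous group of nodes (illustrative helper, not a KISSKI API)."""
    name: str
    nodes: int
    gpus_per_node: int
    gpu_memory_gb: int  # memory per GPU, in GB

    @property
    def total_gpus(self) -> int:
        return self.nodes * self.gpus_per_node

    @property
    def total_gpu_memory_gb(self) -> int:
        return self.total_gpus * self.gpu_memory_gb


# Figures taken from the system description above.
PARTITIONS = [
    NodeType("A100 SXM4", nodes=35, gpus_per_node=4, gpu_memory_gb=80),
    NodeType("H100 SXM5", nodes=11, gpus_per_node=4, gpu_memory_gb=94),
]

# InfiniBand HDR: 2 links of 200 Gbit/s per node (nominal rate).
LINKS_PER_NODE = 2
LINK_GBIT_PER_S = 200


def summarize(partitions):
    """Return aggregate node, GPU, memory and per-node bandwidth figures."""
    node_gbit = LINKS_PER_NODE * LINK_GBIT_PER_S
    return {
        "nodes": sum(p.nodes for p in partitions),
        "gpus": sum(p.total_gpus for p in partitions),
        "gpu_memory_gb": sum(p.total_gpu_memory_gb for p in partitions),
        "node_bandwidth_gbit_s": node_gbit,
        "node_bandwidth_gbyte_s": node_gbit / 8,  # 8 bits per byte
    }


if __name__ == "__main__":
    for key, value in summarize(PARTITIONS).items():
        print(f"{key}: {value}")
```

In total this gives 46 nodes with 184 GPUs and 15,336 GB of GPU memory, with a nominal 400 Gbit/s (50 GB/s) of InfiniBand bandwidth per node.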

Software, models and data can be installed via self-service or integrated via the KISSKI catalogues. The system can be used either through direct access or as the technical basis for the KISSKI services.

Requirements

Direct use of the training platform requires a current SSH client and, depending on the application, a local X server.
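
As a rough illustration of what this means on the client side, the Python sketch below assembles an OpenSSH command line, optionally with X11 forwarding for graphical applications. The user name and host name are placeholders, not real KISSKI login details, and `build_ssh_command` is a hypothetical helper. The sketch assumes an OpenSSH client, whose `-X` option enables X11 forwarding and whose `-i` option selects a private key file.

```python
import shlex


def build_ssh_command(user, host, x11=False, key_file=None):
    """Build an OpenSSH argument list (hypothetical helper for illustration)."""
    cmd = ["ssh"]
    if x11:
        cmd.append("-X")  # OpenSSH: enable X11 forwarding (needs a local X server)
    if key_file:
        cmd += ["-i", key_file]  # OpenSSH: private key file to authenticate with
    cmd.append(f"{user}@{host}")
    return cmd


if __name__ == "__main__":
    # Placeholder account and host; replace with the details you were given.
    cmd = build_ssh_command("jdoe", "login.example.org", x11=True)
    print(shlex.join(cmd))
```

The list returned by `build_ssh_command` can be passed directly to `subprocess.run` to open the session. X11 forwarding is only needed for applications that open graphical windows.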

For indirect use of the computing resources through the services offered by KISSKI, the requirements of the individual service apply.

Service type

Hardware

Contact person

Christian Boehme

Planned start date

immediately