System setup for AI workloads on ROCm
2025-09-30
Before you begin training or inference on AMD Instinct™ GPUs, complete the following system setup and validation steps to ensure optimal performance.
Prerequisite system validation
First, confirm that your system meets all software and hardware prerequisites. See Prerequisite system validation before running AI workloads.
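As a quick first check before the full prerequisite validation, you can confirm that the amdgpu kernel driver has created the device nodes ROCm uses: /dev/kfd for compute and one /dev/dri/renderD* render node per GPU. The following is a minimal sketch under that assumption. The helper name missing_rocm_device_nodes is hypothetical, and this check does not replace the steps in the linked prerequisite guide.

```python
from pathlib import Path
from typing import List


def missing_rocm_device_nodes(dev_root: Path = Path("/dev")) -> List[str]:
    """Hypothetical helper: list expected GPU device nodes absent under dev_root."""
    missing = []
    # /dev/kfd is the compute interface exposed by the amdgpu kernel driver
    if not (dev_root / "kfd").exists():
        missing.append(str(dev_root / "kfd"))
    # Each GPU also gets a DRI render node such as /dev/dri/renderD128;
    # globbing a missing directory simply yields no matches
    if not any((dev_root / "dri").glob("renderD*")):
        missing.append(str(dev_root / "dri" / "renderD*"))
    return missing


if __name__ == "__main__":
    absent = missing_rocm_device_nodes()
    if absent:
        print("Missing device nodes:", ", ".join(absent))
    else:
        print("ROCm device nodes are present")
```

The dev_root parameter exists only so the check can be pointed at a test directory; on a real system, leave it at the default.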
Docker images for AMD Instinct GPUs
AMD provides prebuilt Docker images for AMD Instinct™ MI300X and MI325X GPUs. These images include ROCm-enabled deep learning frameworks and essential software components. They support single-node and multi-node configurations and are ready for training and inference workloads out of the box.
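A container can only use the GPUs if the host's GPU device nodes are passed through to it. The sketch below assembles a docker run command with the device and group options commonly used for ROCm containers (--device=/dev/kfd, --device=/dev/dri, --group-add video, --ipc=host, and an unconfined seccomp profile). The helper build_rocm_docker_cmd is hypothetical, and the image tag rocm/pytorch:latest is only an example. Check the documentation for the specific prebuilt image you pull, because individual images may recommend additional flags.

```python
import shlex
from typing import List


def build_rocm_docker_cmd(image: str) -> List[str]:
    """Hypothetical helper: return `docker run` arguments that expose AMD GPUs."""
    return [
        "docker", "run", "-it", "--rm",
        # Kernel Fusion Driver (compute) and DRI render nodes for the GPUs
        "--device=/dev/kfd",
        "--device=/dev/dri",
        # Let the container user access the GPU device nodes
        "--group-add", "video",
        # Share the host IPC namespace; frameworks use shared memory between workers
        "--ipc=host",
        "--security-opt", "seccomp=unconfined",
        image,
    ]


if __name__ == "__main__":
    # Example tag only; substitute the image that matches your workload.
    print(shlex.join(build_rocm_docker_cmd("rocm/pytorch:latest")))
```

The sketch prints the command instead of running it so that you can review the flags first. To launch the container, pass the list to subprocess.run.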
Multi-node training
For instructions on enabling multi-node training, see Multi-node setup for AI workloads.
System optimization and validation
Before running workloads, verify that the system is configured correctly and operating at peak efficiency. Recommended steps include:
Disabling NUMA auto-balancing
Running system benchmarks to validate hardware performance
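On Linux, automatic NUMA balancing is controlled by the kernel.numa_balancing sysctl, which is exposed as /proc/sys/kernel/numa_balancing. A value of 0 means the feature is disabled. The following is a minimal sketch for checking and disabling it. The function names are hypothetical, and writing to the real /proc file requires root. A change made this way does not persist across reboots, so persist it through your distribution's sysctl configuration if needed.

```python
import sys
from pathlib import Path

# Linux exposes the automatic NUMA balancing setting through this sysctl file.
NUMA_BALANCING_PATH = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_disabled(path: Path = NUMA_BALANCING_PATH) -> bool:
    """Hypothetical helper: True when auto-balancing is off (the file reads 0)."""
    # Any nonzero value means some form of balancing is enabled
    return path.read_text().strip() == "0"


def disable_numa_balancing(path: Path = NUMA_BALANCING_PATH) -> None:
    """Hypothetical helper: write 0 to turn balancing off (root for real /proc)."""
    path.write_text("0\n")


if __name__ == "__main__":
    if not NUMA_BALANCING_PATH.exists():
        sys.exit("numa_balancing not found; the kernel may lack NUMA balancing support")
    if numa_balancing_disabled():
        print("NUMA auto-balancing is disabled")
    else:
        print("NUMA auto-balancing is enabled; call disable_numa_balancing() as root")
```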
For details on running system health checks, see System health benchmarks for AI workloads.