Training a model with Primus and Megatron-LM

2026-06-30

Applies to Linux

Primus is a unified and flexible training framework for AMD Instinct GPUs designed to support multiple training engine backends – including Megatron – to deliver scalable, high-performance model training. Performance acceleration is powered by Primus Turbo and ROCm libraries.

Note

The rocm/pytorch-training Docker Hub registry will be deprecated soon in favor of rocm/primus. The rocm/primus Docker containers will cover PyTorch training ecosystem frameworks, including Megatron-LM and torchtitan.

Primus with Megatron is designed to replace the ROCm Megatron-LM training workflow. To learn how to migrate workloads from Megatron-LM to Primus with Megatron, see Migrating workloads to Primus (Megatron backend) from Megatron-LM.

AMD provides a ready-to-use Docker image for MI355X, MI350X, MI325X, and MI300X GPUs that contains the essential components for Primus, ROCm, and Megatron-LM.

rocm/primus:v26.4

Software component	Version
ROCm	7.14.0a20260608
PyTorch	2.12.0+git7e98855
Python	3.12.3
Transformer Engine	2.14.0.dev0+e6ede467
Flash Attention	2.8.3
hipBLASLt	1.4.0-c2fafc16
Triton	3.7.0+gitb4e20bbe
RCCL	2.28.9

Supported models

The following models are pre-optimized for performance on AMD Instinct GPUs. Some instructions, commands, and training examples in this documentation might vary by model.

Model	Variants
Meta Llama	Llama 3.3 70B, Llama 3.1 8B, Llama 3.1 70B, Llama 2 7B, Llama 2 70B, Llama 3.1 405B
AMD Zebra-Llama	Zebra-Llama 1B, Zebra-Llama 3B, Zebra-Llama 8B
OpenAI GPT-OSS	GPT-OSS-20B, GPT-OSS-120B
DeepSeek	DeepSeek-V2-Lite
Mistral AI	Mixtral 8x7B, Mixtral 8x22B
Qwen	Qwen 3 32B SFT, Qwen 3 32B LoRA, Qwen 3 30B A3B, Qwen 2.5 7B, Qwen 2.5 72B, Qwen3 235B (A22B)

Note

Some models, such as Llama, require an external license agreement through a third party (for example, Meta).

System validation

Before running AI workloads, it’s important to validate that your AMD hardware is configured correctly and performing optimally.

If you have already validated your system settings, including aspects like NUMA auto-balancing, you can skip this step. Otherwise, complete the procedures in the System validation and optimization guide to properly configure your system settings before starting training.

To test for optimal performance, consult the recommended System health benchmarks. This suite of tests will help you verify and fine-tune your system’s configuration.
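As a quick spot check of the NUMA auto-balancing setting mentioned above, the following sketch reads the Linux kernel switch for it. The helper name numa_balancing_status is hypothetical, and the sketch assumes the kernel exposes the setting at /proc/sys/kernel/numa_balancing. The System validation and optimization guide remains the authority on the recommended values.

```shell
# Hypothetical helper (not part of Primus or ROCm): report whether
# NUMA auto-balancing is enabled on this host.
# Assumption: the setting lives in /proc/sys/kernel/numa_balancing,
# where 0 means disabled and any other value means enabled.
numa_balancing_status() {
    nb_file="${1:-/proc/sys/kernel/numa_balancing}"
    if [ ! -r "$nb_file" ]; then
        # The file is absent on kernels built without NUMA balancing.
        echo "unknown"
        return 0
    fi
    case "$(cat "$nb_file")" in
        0) echo "disabled" ;;
        *) echo "enabled" ;;
    esac
}

numa_balancing_status
```

The optional argument only exists so the helper can be pointed at a different file when testing it.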

Environment setup

Use the following instructions to set up the environment, configure the script to train models, and reproduce the benchmark results on AMD Instinct GPUs.

Pull the Docker image

Pull the rocm/primus:v26.4 Docker image from Docker Hub.
```
docker pull rocm/primus:v26.4
```

Launch the Docker container.

```
docker run -it \
    --device /dev/dri \
    --device /dev/kfd \
    --device /dev/infiniband \
    --network host --ipc host \
    --group-add video \
    --cap-add SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --privileged \
    -v $HOME:/userHome \
    --shm-size 128G \
    --name primus_training_env \
    rocm/primus:v26.4
```
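The docker run command passes three device paths to the container. If the launch fails, one thing worth ruling out is a device path that does not exist on the host. The sketch below is a minimal pre-flight check under that assumption. The helper name check_device_nodes is hypothetical and is not part of Docker, ROCm, or Primus.

```shell
# Hypothetical helper: verify that every path given as an argument
# exists on the host. Prints each missing path to stderr and returns
# 1 if at least one is missing, 0 otherwise.
check_device_nodes() {
    cdn_missing=0
    for cdn_node in "$@"; do
        if [ ! -e "$cdn_node" ]; then
            echo "missing device node: $cdn_node" >&2
            cdn_missing=1
        fi
    done
    return "$cdn_missing"
}

# The three paths are the ones passed with --device in the command above.
check_device_nodes /dev/dri /dev/kfd /dev/infiniband \
    || echo "resolve the missing device nodes before running docker run"
```

A host without an RDMA-capable network adapter may not have /dev/infiniband, so a missing entry there is a property of the host and does not necessarily indicate a driver problem.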

Use these commands if you exit the primus_training_env container and need to return to it.

```
docker start primus_training_env
docker exec -it primus_training_env bash
```

The Docker container hosts the Primus repository at tag release/v26.4.

Configuration

Primus defines a training configuration in YAML for each model in examples/megatron/configs.

For example, to update training parameters for Llama 3.3 70B, you can update examples/megatron/configs/llama3.3_70B-pretrain.yaml. Training configuration YAML files for the other models follow a similar naming convention and are located in the same directory:

Model	Configuration file (in examples/megatron/configs)
Llama 3.3 70B	llama3.3_70B-pretrain.yaml
Llama 3.1 8B	llama3.1_8B-pretrain.yaml
Llama 3.1 70B	llama3.1_70B-pretrain.yaml
Llama 2 7B	llama2_7B-pretrain.yaml
Llama 2 70B	llama2_70B-pretrain.yaml
Llama 3.1 405B	llama3.1_405B-FP8-pretrain.yaml
Zebra-Llama 1B	zebra_llama_1b-pretrain.yaml
Zebra-Llama 3B	zebra_llama_3b-pretrain.yaml
Zebra-Llama 8B	zebra_llama_8b-pretrain.yaml
GPT-OSS-20B	gpt_oss_20B-BF16-pretrain.yaml
GPT-OSS-120B	gpt_oss_120B-BF16-pretrain.yaml
DeepSeek-V2-Lite	deepseek_v2_lite-pretrain.yaml
Mixtral 8x7B	mixtral_8x7B_v0.1-pretrain.yaml
Mixtral 8x22B	mixtral_8x22B_v0.1-BF16-pretrain.yaml
Qwen 3 32B SFT	qwen3_32b_sft_posttrain.yaml
Qwen 3 32B LoRA	qwen3_32b_lora_posttrain.yaml
Qwen 3 30B A3B	qwen3_30B_A3B-pretrain.yaml
Qwen 2.5 7B	qwen2.5_7B-BF16-pretrain.yaml
Qwen 2.5 72B	qwen2.5_72B-pretrain.yaml
Qwen3 235B (A22B)	qwen3_235B_A22B-BF16-pretrain.yaml
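Because the exact file name differs from model to model, it can be easier to list the configuration files than to guess a name. The following sketch does that with find. The helper name list_model_configs is hypothetical, and the default directory is the examples/megatron/configs path named above, relative to the root of the Primus repository.

```shell
# Hypothetical helper: list training configuration YAML files, optionally
# filtered by a substring of the file name (for example "llama" or "qwen").
# Assumption: it is run from the root of the Primus repository, or the
# configuration directory is passed explicitly as the first argument.
list_model_configs() {
    lmc_dir="${1:-examples/megatron/configs}"
    lmc_filter="${2:-}"
    if [ ! -d "$lmc_dir" ]; then
        echo "config directory not found: $lmc_dir" >&2
        return 1
    fi
    # Search recursively so the helper still works if the files are
    # grouped into subdirectories.
    find "$lmc_dir" -type f -name "*${lmc_filter}*.yaml" | sort
}

# Usage, from the repository root:
#   list_model_configs examples/megatron/configs llama
```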

Note

See Key options for more information on configuration options.

Dataset options

You can use either mock data or real data for training.

Mock data can be useful for testing and validation. Use the mock_data field to toggle between mock and real data. The default value is true, which enables mock data.
```
mock_data: true
```
If you’re using a real dataset, update the train_data_path field to point to the location of your dataset.
```
mock_data: false
train_data_path: /path/to/your/dataset
```
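Ensure that the dataset files are accessible inside the Docker container, for example by placing them under a directory that is mounted into it, such as the host $HOME directory, which the docker run command in Environment setup mounts at /userHome.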
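If you switch between mock and real data often, the edit can be scripted. The sketch below rewrites the two fields with sed while keeping their indentation. The helper name use_real_data is hypothetical, and the sketch assumes that mock_data and train_data_path each already appear on their own line in the configuration file; it does not add a field that is missing.

```shell
# Hypothetical helper: set mock_data to false and point train_data_path
# at a dataset, editing the YAML file in place.
# Assumptions: both keys already exist on their own lines, and the
# dataset path contains no "|" or "&" characters (they are special to
# the sed expressions used here).
use_real_data() {
    urd_config="$1"
    urd_path="$2"
    urd_tmp="$(mktemp)" || return 1
    sed -E \
        -e "s|^([[:space:]]*mock_data:).*|\1 false|" \
        -e "s|^([[:space:]]*train_data_path:).*|\1 ${urd_path}|" \
        "$urd_config" > "$urd_tmp" \
        && cat "$urd_tmp" > "$urd_config"   # keeps the original file's permissions
    urd_status=$?
    rm -f "$urd_tmp"
    return "$urd_status"
}

# Usage, with the placeholder path from the example above:
#   use_real_data examples/megatron/configs/llama3.1_8B-pretrain.yaml /path/to/your/dataset
```

Working on a copy of the configuration file first makes it easy to compare the result against the original before training.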

Tokenizer

Set the HF_TOKEN environment variable to a Hugging Face access token that has permission to access the tokenizer for each model.

```
# Export your HF_TOKEN in the workspace
export HF_TOKEN=<your_hftoken>
```
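A missing token typically surfaces only once the tokenizer download is attempted, so a small guard at the top of a launch script can fail earlier with a clearer message. This is a minimal sketch; the helper name require_hf_token is hypothetical and is not provided by Primus.

```shell
# Hypothetical helper: stop early if HF_TOKEN is unset or empty.
# It only checks that the variable has a value; it does not verify
# that the token is valid or has access to a given model.
require_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export it before starting training." >&2
        return 1
    fi
    return 0
}

# Usage at the top of a launch script:
#   require_hf_token || exit 1
```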

Previous versions

See Megatron-LM training performance testing version history to find documentation for previous releases of the Primus with Megatron-LM training recipe.
