Stanford Megatron-LM on ROCm#

2025-07-16

5 min read time

Applies to Linux

Stanford Megatron-LM is a large-scale language model training framework based on NVIDIA's Megatron-LM. It is designed to train massive transformer-based language models efficiently using model and data parallelism.

For hardware, software, and third-party framework compatibility between ROCm and Stanford Megatron-LM, see the ROCm compatibility matrix.

Note

Stanford Megatron-LM is supported on ROCm 6.3.0.

Install Stanford Megatron-LM#

To install Stanford Megatron-LM on ROCm, you have the following options:

Use a prebuilt Docker image with Stanford Megatron-LM pre-installed#

The recommended way to set up a Stanford Megatron-LM environment and avoid potential installation issues is with Docker. The tested, prebuilt image includes Stanford Megatron-LM, PyTorch, ROCm, and other dependencies.

Prebuilt Docker images with Stanford Megatron-LM configured for ROCm 6.3.0 are available on Docker Hub.

  1. Pull the Docker image:

    docker pull rocm/stanford-megatron-lm:stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
    
  2. Launch and connect to the container:

    docker run -it --rm \
    --privileged -v ./:/app \
    --network=host --device=/dev/kfd \
    --device=/dev/dri --group-add video \
    --name=my_megatron --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --ipc=host --shm-size 16G \
    rocm/stanford-megatron-lm:stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
    
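    Once inside the container, you can optionally confirm that PyTorch detects your AMD GPUs. ROCm builds of PyTorch expose HIP devices through the standard torch.cuda API, so the reported device count should match the number of GPUs on your system:

    # quick sanity check inside the running container
    python -c 'import torch; print(torch.cuda.is_available(), torch.cuda.device_count())'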

Build your own Docker image#

  1. Download a base Docker image with the correct ROCm and PyTorch versions:

    docker pull rocm/pytorch:rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0
    
  2. Start a Docker container using the downloaded image:

    docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0
    
  3. Set up dependencies:

    pip install regex nltk pybind11
    
  4. Clone the ROCm/Stanford-Megatron-LM repository and apply the ROCm patch:

    git clone https://github.com/ROCm/Stanford-Megatron-LM.git
    cd Stanford-Megatron-LM
    git checkout rocm_6_3_patch
    ./apply_patch
    
  5. Install Stanford Megatron-LM:

    python setup.py install
    
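    As a quick sanity check, you can try importing the package. This assumes the fork keeps the top-level module name megatron from upstream Megatron-LM:

    # should print the installed package location if the build succeeded
    python -c 'import megatron; print(megatron.__file__)'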

Set up your datasets#

The following instructions explain how to use the pretrain_gpt.sh script to run the Stanford Megatron-LM training process. If you are working with other model sizes, the process is similar, but you will need a different script.

  1. Once inside the Stanford Megatron-LM directory, you can download the BooksCorpus or Oscar dataset. The following commands download the Oscar dataset together with the GPT-2 vocabulary and merges files:

    • Oscar dataset

    • gpt2-vocab.json

    • gpt2-merges.txt

    cd tools
    wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
    xz -d oscar-1GB.jsonl.xz
    wget -O gpt2-vocab.json https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json
    wget -O gpt2-merges.txt https://huggingface.co/mkhalifa/gpt2-biographies/resolve/main/merges.txt
    
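    Optionally, inspect the first record to confirm the download. The file is in JSON Lines format, and preprocess_data.py reads the text field of each record by default:

    # print the start of the first JSON record
    head -n 1 oscar-1GB.jsonl | cut -c 1-300
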
  2. Pre-process the datasets:

    export BASE_SRC_PATH=$(dirname $(pwd))
    export BASE_DATA_PATH=${BASE_SRC_PATH}/tools
    python preprocess_data.py \
    --input ${BASE_DATA_PATH}/oscar-1GB.jsonl \
    --output-prefix ${BASE_DATA_PATH}/my-gpt2 \
    --vocab-file ${BASE_DATA_PATH}/gpt2-vocab.json \
    --dataset-impl mmap \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file ${BASE_DATA_PATH}/gpt2-merges.txt \
    --append-eod \
    --workers 8 \
    --chunk-size 10
    
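    With --dataset-impl mmap, preprocessing produces an indexed binary dataset: a .bin file holding the token data and a companion .idx index file. You can verify that both were created; their shared prefix is the DATA_PATH used in the next step:

    # expect my-gpt2_text_document.bin and my-gpt2_text_document.idx
    ls ${BASE_DATA_PATH}/my-gpt2_text_document.*
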
  3. Set up the pre-training script

    Edit the pretrain_gpt.sh file so that the following variables point to the correct locations:

    DATA_PATH=${BASE_DATA_PATH}/my-gpt2_text_document
    CHECKPOINT_PATH=${BASE_DATA_PATH}/checkpoints
    VOCAB_FILE=${BASE_DATA_PATH}/gpt2-vocab.json
    MERGE_FILE=${BASE_DATA_PATH}/gpt2-merges.txt
    
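    The checkpoint directory may not exist yet. Creating it up front is a small precaution against a failure at save time (the script itself may also create it):

    mkdir -p ${BASE_DATA_PATH}/checkpoints
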
  4. Run the pre-training script

    From the project root directory, execute the following command:

    ./examples/pretrain_gpt.sh
    
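    To keep a record of the run for later inspection, you can also capture the output to a log file:

    ./examples/pretrain_gpt.sh 2>&1 | tee pretrain_gpt.log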

Test the Stanford Megatron-LM installation#

If you used a prebuilt Docker image from the AMD ROCm Docker Hub, running the Stanford Megatron-LM unit tests to validate your installation is optional.

To fully validate your installation, run the unit tests manually as follows:

Note

Run the following command from the Stanford Megatron-LM root directory. Running the unit tests requires 8 GPUs because they exercise distributed functionality.

torchrun --nproc_per_node=8 -m pytest tests/ 2>&1 | tee "stanford_megatron_lm_tests.log"
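
If the tests fail to initialize, confirm that all 8 GPUs are visible with rocm-smi, which is included in the ROCm stack:

rocm-smi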

Run a Stanford Megatron-LM example#

The recommended example is pretrain_gpt.sh. For detailed steps, refer to the ROCm AI blog.

The blog post focuses on Megablocks, but you can run Stanford Megatron-LM by following the same steps.