Stanford Megatron-LM on ROCm#
2025-07-16
Stanford Megatron-LM is a large-scale language model training framework based on NVIDIA's Megatron-LM. It is designed to train massive transformer-based language models efficiently using model and data parallelism.
For hardware, software, and third-party framework compatibility between ROCm and Stanford Megatron-LM, see the ROCm compatibility documentation.
Note
Stanford Megatron-LM is supported on ROCm 6.3.0.
Install Stanford Megatron-LM#
To install Stanford Megatron-LM on ROCm, you have the following options:
Use a prebuilt Docker image (recommended)
Build your own Docker image
Use a prebuilt Docker image with Stanford Megatron-LM pre-installed#
The recommended way to set up a Stanford Megatron-LM environment and avoid potential installation issues is with Docker. The tested, prebuilt image includes Stanford Megatron-LM, PyTorch, ROCm, and other dependencies.
Prebuilt Docker images with Stanford Megatron-LM configured for ROCm 6.3.0 are available on Docker Hub.
Pull the Docker image:
docker pull rocm/stanford-megatron-lm:stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
Launch and connect to the container:
docker run -it --rm \
  --privileged -v ./:/app \
  --network=host --device=/dev/kfd \
  --device=/dev/dri --group-add video \
  --name=my_megatron --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  --ipc=host --shm-size 16G \
  rocm/stanford-megatron-lm:stanford-megatron-lm85f95ae_rocm6.3.0_ubuntu24.04_py3.12_pytorch2.4.0
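Optionally, once inside the container, you can run a quick sanity check to confirm that the GPUs and the preinstalled ROCm build of PyTorch are visible. This is a minimal check and not part of the official instructions:
# List the AMD GPUs visible inside the container
rocm-smi
# Confirm that PyTorch detects the GPUs
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"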
Build your own Docker image#
Download a base Docker image with the correct ROCm and PyTorch version
docker pull rocm/pytorch:rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0
Start a Docker container using the downloaded image
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/pytorch:rocm6.3_ubuntu24.04_py3.12_pytorch_release_2.4.0
Set up dependencies
pip install regex nltk pybind11
Clone the ROCm/Stanford-Megatron-LM repository and apply the ROCm patch
git clone https://github.com/ROCm/Stanford-Megatron-LM.git
cd Stanford-Megatron-LM
git checkout rocm_6_3_patch
./apply_patch
Install Stanford Megatron-LM
python setup.py install
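To confirm the installation succeeded, you can try importing the package from the repository root. This check is a suggestion and assumes the project installs under the megatron module name:
python -c "import megatron; print(megatron.__file__)"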
Set up your datasets#
The following instructions explain how to use the pretrain_gpt.sh script to run the Megatron-LM training process. If you are working with other model sizes, the process is similar, but you will need to use a different script.
Once inside the Stanford Megatron-LM directory, you can download the BooksCorpus dataset or the OSCAR dataset using the helper scripts stored in the dataset directory. The following example downloads the OSCAR dataset along with the gpt2-vocab.json and gpt2-merges.txt tokenizer files:
cd tools
wget https://huggingface.co/bigscience/misc-test-data/resolve/main/stas/oscar-1GB.jsonl.xz
xz -d oscar-1GB.jsonl.xz
wget -O gpt2-vocab.json https://huggingface.co/openai-community/gpt2/resolve/main/vocab.json
wget -O gpt2-merges.txt https://huggingface.co/mkhalifa/gpt2-biographies/resolve/main/merges.txt
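As an optional sanity check, not part of the original instructions, you can verify that the decompressed dataset and tokenizer files are present and that the dataset is valid JSON lines:
ls -lh oscar-1GB.jsonl gpt2-vocab.json gpt2-merges.txt
head -n 1 oscar-1GB.jsonl | python3 -c "import json, sys; json.loads(sys.stdin.readline()); print('ok')"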
Pre-process the datasets
export BASE_SRC_PATH=$(dirname $(pwd))
export BASE_DATA_PATH=${BASE_SRC_PATH}/tools
python preprocess_data.py \
    --input ${BASE_DATA_PATH}/oscar-1GB.jsonl \
    --output-prefix ${BASE_DATA_PATH}/my-gpt2 \
    --vocab-file ${BASE_DATA_PATH}/gpt2-vocab.json \
    --dataset-impl mmap \
    --tokenizer-type GPT2BPETokenizer \
    --merge-file ${BASE_DATA_PATH}/gpt2-merges.txt \
    --append-eod \
    --workers 8 \
    --chunk-size 10
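If preprocessing succeeds, it typically writes an indexed binary dataset next to the prefix you specified. The exact file names below are an assumption based on the my-gpt2 prefix and the default text field of the JSON input; they match the DATA_PATH used in the next step:
# Check that the preprocessed dataset files were created
ls -lh ${BASE_DATA_PATH}/my-gpt2_text_document.bin ${BASE_DATA_PATH}/my-gpt2_text_document.idx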
Set up the pre-training script
Edit the pretrain_gpt.sh file so that it points to the correct locations:
DATA_PATH=${BASE_DATA_PATH}/my-gpt2_text_document
CHECKPOINT_PATH=${BASE_DATA_PATH}/checkpoints
VOCAB_FILE=${BASE_DATA_PATH}/gpt2-vocab.json
MERGE_FILE=${BASE_DATA_PATH}/gpt2-merges.txt
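For reference, the variables above are consumed by the training invocation inside the script. The following is an illustrative sketch of the kind of argument block Megatron-LM GPT pretraining scripts typically pass; the exact launcher, flags, and values in your checkout may differ, so treat it only as a guide to where the paths and model-size settings appear:
# Illustrative only -- do not copy over the flags already set in your pretrain_gpt.sh
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --micro-batch-size 4 \
    --global-batch-size 32 \
    --seq-length 1024 \
    --max-position-embeddings 1024 \
    --train-iters 5000 \
    --lr 0.00015 \
    --lr-decay-style cosine \
    --data-path $DATA_PATH \
    --vocab-file $VOCAB_FILE \
    --merge-file $MERGE_FILE \
    --save $CHECKPOINT_PATH \
    --load $CHECKPOINT_PATH \
    --split 949,50,1 \
    --fp16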
Run the pre-training script
From the project root directory, execute the following command:
./examples/pretrain_gpt.sh
Test the Stanford Megatron-LM installation#
If you used a prebuilt Docker image from the AMD ROCm Docker Hub, running the Stanford Megatron-LM unit tests to validate your installation is optional.
To run unit tests manually and validate your installation fully, follow these steps:
Note
Run the following from the Stanford Megatron-LM root. Running the unit tests requires 8 GPUs because they test distributed functionality.
torchrun --nproc_per_node=8 -m pytest tests/ 2>&1 | tee "stanford_megatron_lm_tests.log"
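The tee command above writes the full test output to stanford_megatron_lm_tests.log, so you can review the result summary afterwards, for example:
# Show the last reported pass/fail lines from the test log
grep -E "passed|failed|error" stanford_megatron_lm_tests.log | tail -n 20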
Run a Stanford Megatron-LM example#
The recommended example is pretrain_gpt.sh.
For detailed steps, refer to the ROCm AI blog.
The blog mentions Megablocks, but you can run Stanford Megatron-LM with the same steps.