vLLM inference#

2026-02-25

4 min read time

Applies to Linux

The ROCm-enabled vLLM Docker image offers a prebuilt, optimized environment for large language model (LLM) inference on AMD Instinct MI355X, MI350X, MI325X, and MI300X GPUs. The image integrates ROCm, PyTorch, and vLLM with optimizations tailored for AMD Instinct data center GPUs, enabling consistent and reproducible inference deployments.

What’s new#

  • For vLLM release notes on model support, hardware and performance improvements, and other highlights, see the vLLM Releases page on GitHub.

  • For the latest inference and deployment guides, use the upstream vLLM documentation at docs.vllm.ai.

Get started#

For a consistent and portable inference environment, use Docker. vLLM provides the vllm/vllm-openai-rocm Docker image for deployment on AMD GPUs. Use the following command to pull the latest image from Docker Hub.

docker pull vllm/vllm-openai-rocm:latest

After pulling the Docker image, follow the vLLM usage documentation: Using vLLM.
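As a starting point, the server can be launched with a command along these lines. This is a minimal sketch assuming a ROCm host; the model name, port, and mounted cache path are illustrative choices, not requirements:

```shell
# Launch the OpenAI-compatible vLLM server inside the ROCm container.
# --device /dev/kfd and --device /dev/dri expose the AMD GPUs to the
# container; the model name and port below are examples only.
docker run -it --rm \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --ipc=host \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai-rocm:latest \
  --model meta-llama/Llama-3.1-8B-Instruct
```

Once the server is up, it exposes an OpenAI-compatible API, so a quick smoke test can be a plain HTTP request, for example `curl http://localhost:8000/v1/models`.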

Previous versions#

Current inference and deployment guides are maintained in the upstream vLLM documentation at docs.vllm.ai.

Legacy versions of this documentation, which provide instructions for inference performance testing on select models, are available in the vLLM inference performance testing version history. For more information, see the Use AMD’s Docker images note in the vLLM documentation.