FlashInfer on ROCm installation#

2025-10-02

5 min read time

Applies to Linux

FlashInfer is a library and kernel generator for Large Language Models (LLMs) that provides high-performance implementations of graphics processing unit (GPU) kernels.

This topic covers installation. To learn more about FlashInfer on ROCm, including its use cases, recommendations, and hardware and software compatibility, see FlashInfer compatibility.

Note

FlashInfer is supported on ROCm 6.4.1.

Install FlashInfer on ROCm#

To install FlashInfer on ROCm, you have the following options:

Use a prebuilt Docker image with FlashInfer pre-installed#

Docker is the recommended method to set up a FlashInfer environment, and it avoids potential installation issues. The tested, prebuilt image includes FlashInfer, PyTorch, ROCm, and other dependencies.

  1. Pull the Docker image:

    docker pull rocm/flashinfer:flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7
    
  2. Launch and connect to the container:

    docker run -it --rm \
    --privileged -v ./:/app \
    --network=host --device=/dev/kfd \
    --device=/dev/dri --group-add video \
    --name=my_flashinfer --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --ipc=host --shm-size 16G \
    rocm/flashinfer:flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7
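    

    Once inside the container, activate the micromamba environment described in Test the FlashInfer installation and run a quick sanity check in a Python interpreter. This is a minimal sketch; the exact version strings depend on the image.

    # Quick sanity check inside the container
    import torch
    import flashinfer  # the import succeeding confirms FlashInfer is available

    print(torch.__version__)              # PyTorch build shipped in the image
    print(torch.cuda.is_available())      # True when the GPU is visible to the container
    print(torch.cuda.get_device_name(0))  # name of the first visible AMD GPU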
    

Build your own Docker image#

FlashInfer supports the ROCm platform, and you can also run it by setting up your own Docker container from scratch. A Dockerfile is provided in the ROCm/flashinfer repository to help you get started.

  1. Clone the ROCm/flashinfer repository:

    git clone https://github.com/ROCm/flashinfer.git
    
  2. Enter the directory and build the Docker image:

    cd flashinfer
    docker build -t rocm/flashinfer:flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7 .
    
  3. Run the Docker container:

    docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/flashinfer:flashinfer-0.2.5_rocm6.4_ubuntu24.04_py3.12_pytorch2.7
    
  4. The previous step creates a Docker container with FlashInfer pre-installed. During the image build, the Dockerfile also sets up a micromamba environment named flashinfer-py3.12-torch2.7.1-rocm6.4.1.

Install FlashInfer using pip#

Use a base PyTorch Docker image and follow these steps to install FlashInfer using pip.

  1. Pull the base ROCm PyTorch Docker image:

    docker pull rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
    
  2. Launch and connect to the container, replacing <container name> with a name of your choice:

    docker run -it --privileged --network=host --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --ipc=host --shm-size 128G --name=<container name> rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
    
  3. After setting up the container, install FlashInfer from the AMD-hosted PyPI repository:

    pip install flashinfer==0.2.5.post10 --extra-index-url=https://pypi.amd.com/simple
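    

    To confirm the installation, query the installed version from Python. This is a minimal check using only the standard library; the exact version string depends on the wheel you installed.

    # Confirm the installed FlashInfer package version
    from importlib.metadata import version
    print(version("flashinfer"))  # for example, 0.2.5.post10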
    

Test the FlashInfer installation#

Once the Docker container is running, test the FlashInfer installation by following these steps:

  1. Activate the micromamba environment:

    Note

    If you followed Install FlashInfer using pip, you do not need to activate the micromamba environment. If you followed Use a prebuilt Docker image with FlashInfer pre-installed or Build your own Docker image, don’t forget this step.

    micromamba activate flashinfer-py3.12-torch2.7.1-rocm6.4.1
    
  2. Enter the FlashInfer directory:

    Note

    If you followed Install FlashInfer using pip, clone the repository first:

    git clone https://github.com/ROCm/flashinfer.git
    
    cd flashinfer/
    
  3. Run the example provided in the flashinfer/examples directory. This example runs Batch Decode and then verifies the output.

    python examples/test_batch_decode_example.py
    
  4. Check the output. If FlashInfer is installed correctly, the script prints:

    PASS
    
    You can now use FlashInfer in your projects.

Run a FlashInfer example#

The ROCm/flashinfer repository includes example code that you can run with FlashInfer. Once FlashInfer is installed, save the following code snippet to a Python script and run it to try it out.

  1. Save the following code snippet to a Python file:

    import torch
    import flashinfer
    
    kv_len = 2048
    num_kv_heads = 32
    head_dim = 128
    
    # KV cache tensors: [kv_len, num_kv_heads, head_dim] (NHD layout), fp16, on GPU 0
    k = torch.randn(kv_len, num_kv_heads, head_dim).half().to(0)
    v = torch.randn(kv_len, num_kv_heads, head_dim).half().to(0)
    
    # decode attention: a single query token attends over the whole KV cache
    
    num_qo_heads = 32
    q = torch.randn(num_qo_heads, head_dim).half().to(0)  # [num_qo_heads, head_dim]
    
    o = flashinfer.single_decode_with_kv_cache(q, k, v) # decode attention without RoPE on-the-fly
    o_rope_on_the_fly = flashinfer.single_decode_with_kv_cache(q, k, v, pos_encoding_mode="ROPE_LLAMA") # decode with LLaMA style RoPE on-the-fly
    
  2. Run the script, replacing <example_name> with the name of your file:

    python <example_name>.py
    
  3. The script computes single-request decode attention with and without LLaMA-style RoPE applied on the fly. To verify the result, you can append the optional correctness check sketched below.
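
The following snippet is an optional correctness check you can append to the script. It is a minimal sketch that recomputes decode attention in plain PyTorch and compares it to the FlashInfer output o. It assumes the defaults used in the example above: the [kv_len, num_kv_heads, head_dim] KV layout, an attention scale of 1/sqrt(head_dim), and no grouped-query mapping (num_qo_heads equals num_kv_heads here).

    # Optional reference check: recompute decode attention in plain PyTorch
    # and compare it to the FlashInfer output `o` from the example above.
    scale = head_dim ** -0.5
    # scores[h, t] = dot(q[h], k[t, h]) * scale
    scores = torch.einsum("hd,thd->ht", q.float(), k.float()) * scale
    probs = torch.softmax(scores, dim=-1)
    # ref[h] = sum over t of probs[h, t] * v[t, h]
    ref = torch.einsum("ht,thd->hd", probs, v.float())

    # fp16 kernels accumulate in a different order, so use a loose tolerance
    print(torch.allclose(o.float(), ref, atol=1e-2, rtol=1e-2))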