AMD ROCm Documentation

Release Notes for the latest version of AMD ROCm.

Installation Guide

Programming Guide

This guide provides documentation on the ROCm programming model and programming interface. It describes the hardware implementation and provides guidance on how to achieve maximum performance. The appendices include:

  • a list of all ROCm-enabled devices

  • a detailed description of all extensions to the C language

  • listings of supported mathematical functions

  • C++ features supported in host and device code

  • technical specifications of various devices

  • an introduction to the low-level driver API

The ROCm stack offers options for multiple programming languages

Heterogeneous Compute (HC) programming, installation requirements, methods to install on various platforms, and how to build it from source

Built-in Macros, HCC Profiler mode, and API Documentation

HIP programming, installation requirements, methods to install on various platforms, and how to build it from source

HIP Porting, Debugging, Bugs, FAQ and other aspects of HIP

OpenCL Architecture, AMD Implementation, Profiling, and other aspects of OpenCL

Performance and optimization for various device types such as GCN devices

GCN ISA Manuals

  • GCN 1.1 - ISA manual for Hawaii (Sea Islands Series Instruction Set Architecture)

  • GCN 2.0 - ISA manual for Fiji and Polaris (AMD Accelerated Parallel Processing technology)

  • Vega - Covers the “Vega” Instruction Set Architecture, program organization, the mode register, and more.

  • Inline GCN ISA Assembly Guide - Covers various concepts of AMDGCN Assembly, DS Permute Instructions, Parameters to a Kernel, GPR Counting.

ROCm API References

ROCm Tools


Provides details on AOMP, a scripted build of LLVM and supporting software. It supports OpenMP target offload on AMD GPUs. Because AOMP is a Clang/LLVM compiler, it also supports GPU offloading with HIP, CUDA, and OpenCL.


Provides details on ROCm Validation Suite (RVS), a system administrator’s and cluster manager’s tool for detecting and troubleshooting common problems affecting AMD GPU(s) running in a high-performance computing environment, enabled using the ROCm software stack on a compatible platform.

ROCm Libraries

This section provides details on rocFFT. It is AMD’s software library for computing fast Fourier transforms (FFTs), written in HIP. In addition to AMD GPU devices, it can also be compiled with the CUDA compiler using HIP tools to run on Nvidia GPU devices.
This section provides details on rocBLAS. It is a library for BLAS on ROCm, implemented in the HIP programming language and optimized for AMD’s latest discrete GPUs.
This section provides details on hipBLAS. It is a BLAS marshalling library with multiple supported backends. hipBLAS exports an interface that does not require the client to change. Currently, it supports rocBLAS and cuBLAS as backends.
This section provides details on hcRNG. It is a software library that implements uniform random number generators targeting AMD heterogeneous hardware via the HCC compiler runtime.
This section provides details on Eigen. It is a C++ template library that provides linear algebra for matrices, vectors, numerical solvers, and related algorithms.
This section provides details on clFFT. It is a software library that contains FFT functions written in OpenCL. clFFT also supports running on CPU devices to facilitate debugging and heterogeneous programming.
This section provides details on clBLAS. It makes it easier for developers to utilize the inherent performance and power efficiency benefits of heterogeneous computing.
This section provides details on clSPARSE. It is an OpenCL library that implements sparse linear algebra routines.
This section provides details on clRNG. It is a library for uniform random number generation in OpenCL.
This section provides details on hcFFT. It hosts the HCC-based FFT library and targets GPU acceleration of FFT routines on AMD devices.
This section provides details on Tensile. It is a tool for creating a benchmark-driven backend library for GEMMs, N-dimensional tensor contractions, and other operations that multiply two multi-dimensional objects together on a GPU.
This section provides details on rocALUTION. It is a sparse linear algebra library with a focus on exploring fine-grained parallelism, targeting modern processors and accelerators including multi/many-core CPU and GPU platforms. It can be seen as middleware between different parallel backends and application-specific packages.
This section provides details on rocSPARSE. It is a library that contains basic linear algebra subroutines for sparse matrices and vectors, written in HIP for GPU devices. It is designed to be used from C and C++ code.
This section provides details on rocThrust. It is a parallel algorithm library.
This section provides details on hipCUB. It is a thin wrapper library on top of rocPRIM or CUB. It enables developers to port projects that use the CUB library to the HIP layer and run them on AMD hardware.
This section provides details on the ROCm SMI library. The ROCm System Management Interface Library, or ROCm SMI library, is part of the Radeon Open Compute (ROCm) software stack. It is a C library for Linux that provides a user-space interface for applications to monitor and control GPU applications.
This section provides details on the ROCm Communication Collectives Library (RCCL). It is a stand-alone library of standard collective communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, and reduce-scatter.
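
The collective routines named above can be illustrated with a small CPU-only sketch. This is not the RCCL API: the function names, the list-of-lists representation (one inner list per rank), and the choice of summation as the reduction operator are all assumptions made for illustration. The sketch only shows what data each rank would hold after each collective.

```python
from typing import List

Vector = List[float]


def reduce(buffers: List[Vector]) -> Vector:
    """Element-wise sum of every rank's buffer (what a root rank would hold)."""
    return [sum(column) for column in zip(*buffers)]


def all_reduce(buffers: List[Vector]) -> List[Vector]:
    """Every rank ends up with the full element-wise sum."""
    total = reduce(buffers)
    return [list(total) for _ in buffers]


def broadcast(buffers: List[Vector], root: int = 0) -> List[Vector]:
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


def all_gather(buffers: List[Vector]) -> List[Vector]:
    """Every rank ends up with all buffers concatenated in rank order."""
    gathered = [value for buf in buffers for value in buf]
    return [list(gathered) for _ in buffers]


def reduce_scatter(buffers: List[Vector]) -> List[Vector]:
    """Rank r ends up with the r-th equal-sized chunk of the element-wise sum.

    Assumes the buffer length is divisible by the number of ranks.
    """
    total = reduce(buffers)
    chunk = len(total) // len(buffers)
    return [total[r * chunk:(r + 1) * chunk] for r in range(len(buffers))]


if __name__ == "__main__":
    ranks = [[1.0, 2.0], [3.0, 4.0]]  # two hypothetical ranks
    print(all_reduce(ranks))
    print(reduce_scatter(ranks))
```

A real collective library performs these exchanges between GPUs without staging the data in a single process; the sketch is only a reference for the resulting data layout.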

This section provides information on AMD’s graph optimization engine.

ROCm Compiler SDK

This section provides a complete description of LLVM, including an introduction, code objects, code conventions, and source languages.
This section describes the application binary interface (ABI) provided by the AMD implementation of the HSA runtime. It also provides details on kernels, AMD queues, and signals.
This section provides an overview of the ROCm Device Library, along with instructions for building and testing it.
This section covers the user-mode API interfaces and libraries that host applications need to launch compute kernels on available HSA ROCm kernel agents. It also includes installation and infrastructure details for ROCr.

ROCm System Management

ROCm System Management Interface: a complete guide to using the rocm-smi tool.
This section provides information on the sysfs file structure and the system details captured in sysfs.
KFD Kernel Topology is the system file structure that describes AMD GPU information such as nodes, memory, caches, and IO links.
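
As a rough sketch of how the topology information might be read from user space, the snippet below walks a topology directory and parses each node's `properties` file. The path `/sys/class/kfd/kfd/topology/nodes`, the "name value" line layout, and the helper names `parse_properties` and `read_topology` are assumptions for illustration; the actual layout can differ across kernel and ROCm versions, so treat this as a starting point rather than a reference.

```python
from pathlib import Path
from typing import Dict

# Assumed location of the KFD topology nodes; may differ on a given system.
TOPOLOGY_NODES = Path("/sys/class/kfd/kfd/topology/nodes")


def parse_properties(text: str) -> Dict[str, int]:
    """Parse 'name value' lines into a dict, skipping anything else."""
    props: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            props[parts[0]] = int(parts[1])
    return props


def read_topology(root: Path = TOPOLOGY_NODES) -> Dict[str, Dict[str, int]]:
    """Return {node name: properties} for every readable node under root."""
    nodes: Dict[str, Dict[str, int]] = {}
    if not root.is_dir():
        return nodes  # no KFD topology exposed on this machine
    for node_dir in sorted(root.iterdir()):
        prop_file = node_dir / "properties"
        if prop_file.is_file():
            try:
                nodes[node_dir.name] = parse_properties(prop_file.read_text())
            except OSError:
                continue  # for example, insufficient permissions
    return nodes


if __name__ == "__main__":
    for name, props in read_topology().items():
        # simd_count is typically 0 for CPU-only nodes (assumption).
        print(name, props.get("simd_count"))
```

On a machine without the KFD driver loaded, `read_topology()` simply returns an empty dictionary.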

ROCm Virtualization & Containers

This section describes PCIe passthrough on KVM. The KVM-based instructions assume a headless host with an input/output memory management unit (IOMMU) to pass peripheral devices such as a GPU to guest virtual machines.
ROCm-Docker is a framework for building the software layers defined in the Radeon Open Compute Platform into portable Docker images. This section provides detailed information on ROCm-Docker.

Remote Device Programming

ROCmRDMA is a solution designed to allow third-party kernel drivers to use DMA access to GPU memory. Complete information on ROCmRDMA is documented in this section.
This section gives information related to UCX, including how to install and run it.
This section gives information related to MPI.
This section gives information related to IPC.

Deep Learning on ROCm

This section provides details on ROCm Deep Learning concepts.
The porting guide highlights the key differences between the current cuDNN and MIOpen APIs.
This section provides a detailed chart of the frameworks supported by ROCm, along with repository details.
This section documents tutorials for different deep learning frameworks.

System Level Debug

This section provides details on system-level debugging and commands for issues encountered while using ROCm.


This section provides details on a few HIP concepts and related topics.

ROCm Glossary

The ROCm Glossary highlights key concepts and summarizes how they work.