ROCm 6.3.2 release notes#
2025-01-28
The release notes provide a summary of notable changes since the previous ROCm release.
Note
If you’re using Radeon™ PRO or Radeon GPUs in a workstation setting with a display connected, see the Use ROCm on Radeon GPUs documentation to verify compatibility and system requirements.
Release highlights#
The following are notable improvements in ROCm 6.3.2. For changes to individual components, see Detailed component changes.
ROCm documentation updates#
ROCm documentation continues to be updated to provide clearer and more comprehensive guidance for a wider variety of user needs and use cases.
Documentation about ROCm compatibility with deep learning frameworks has been added. These topics outline ROCm-enabled features for each deep learning framework, key ROCm libraries that can influence the capabilities, validated Docker image tags, and features supported across the available ROCm and framework versions. For more information, see:
The HIP C++ language extensions and Kernel language C++ support topics have been reorganized to make them easier to find and review. The topics have also been enhanced with new content.
Operating system and hardware support changes#
ROCm 6.3.2 adds support for Azure Linux 3.0 (kernel: 6.6). Azure Linux is supported only on AMD Instinct accelerators. For more information, see Azure Linux installation.
See the Compatibility matrix for more information about operating system and hardware compatibility.
ROCm components#
The following table lists the versions of ROCm components for ROCm 6.3.2, including any version changes from 6.3.1 to 6.3.2. Click the component’s updated version to go to a list of its changes. Each component’s source code is available on GitHub.
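Category | Group | Name | Version
---|---|---|---
Libraries | Machine learning and computer vision | Composable Kernel | 1.1.0
Libraries | Machine learning and computer vision | MIGraphX | 2.11.0
Libraries | Machine learning and computer vision | MIOpen | 3.3.0
Libraries | Machine learning and computer vision | MIVisionX | 3.1.0
Libraries | Machine learning and computer vision | rocAL | 2.1.0
Libraries | Machine learning and computer vision | rocDecode | 0.8.0
Libraries | Machine learning and computer vision | rocJPEG | 0.6.0
Libraries | Machine learning and computer vision | rocPyDecode | 0.2.0
Libraries | Machine learning and computer vision | RPP | 1.9.1
Libraries | Communication | RCCL | 2.21.5
Libraries | Math | hipBLAS | 2.3.0
Libraries | Math | hipBLASLt | 0.10.0
Libraries | Math | hipFFT | 1.0.17
Libraries | Math | hipfort | 0.5.0 ⇒ 0.5.1
Libraries | Math | hipRAND | 2.11.1
Libraries | Math | hipSOLVER | 2.3.0
Libraries | Math | hipSPARSE | 3.1.2
Libraries | Math | hipSPARSELt | 0.2.2
Libraries | Math | rocALUTION | 3.2.1
Libraries | Math | rocBLAS | 4.3.0
Libraries | Math | rocFFT | 1.0.31
Libraries | Math | rocRAND | 3.2.0
Libraries | Math | rocSOLVER | 3.27.0
Libraries | Math | rocSPARSE | 3.3.0
Libraries | Math | rocWMMA | 1.6.0
Libraries | Math | Tensile | 4.42.0
Libraries | Primitives | hipCUB | 3.3.0
Libraries | Primitives | hipTensor | 1.4.0
Libraries | Primitives | rocPRIM | 3.3.0
Libraries | Primitives | rocThrust | 3.3.0
Tools | System management | AMD SMI | 24.7.1
Tools | System management | ROCm Data Center Tool | 0.3.0
Tools | System management | rocminfo | 1.0.0
Tools | System management | ROCm SMI | 7.4.0
Tools | System management | ROCmValidationSuite | 1.1.0
Tools | Performance | ROCm Bandwidth Test | 1.4.0
Tools | Performance | ROCm Compute Profiler | 3.0.0
Tools | Performance | ROCm Systems Profiler | 0.1.0 ⇒ 0.1.1
Tools | Performance | ROCProfiler | 2.0.0 ⇒ 2.0.0
Tools | Performance | ROCprofiler-SDK | 0.5.0 ⇒ 0.5.0
Tools | Performance | ROCTracer | 4.1.0
Tools | Development | HIPIFY | 18.0.0
Tools | Development | ROCdbgapi | 0.77.0
Tools | Development | ROCm CMake | 0.14.0
Tools | Development | ROCm Debugger (ROCgdb) | 15.2
Tools | Development | ROCr Debug Agent | 2.0.3
Compilers | | HIPCC | 1.1.1
Compilers | | llvm-project | 18.0.0
Runtimes | | HIP | 6.3.1 ⇒ 6.3.2
Runtimes | | ROCr Runtime | 1.14.0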
Detailed component changes#
The following sections describe key changes to ROCm components.
HIP (6.3.2)#
Added#
Tracking of Heterogeneous System Architecture (HSA) handlers:
Adds an atomic counter to track the outstanding HSA handlers.
Waits on CPU for the callbacks if the number exceeds the defined value.
Code to capture Architected Queueing Language (AQL) packets for HIP graph memory copy nodes between host and device. HIP enqueues AQL packets during graph launch.
Control to use system pool implementation in runtime commands handling. By default, it is disabled.
A new path to avoid `WaitAny` calls in `AsyncEventsLoop`. The new path is selected by default.
Runtime control on decrement counter only if the event is popped. There is a new way to restore dead signals cleanup for the old path.
New logic in the runtime to track the age of events from the kernel mode driver.
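The outstanding-handler tracking described in the first item of this list can be pictured with a small standard C++ sketch. This is not HIP or HSA code: the class `OutstandingHandlerTracker`, its methods, and `run_demo` are hypothetical names chosen for illustration, and the limit is an arbitrary value. It only shows the general idea of counting callbacks that have been registered but not yet completed, and making the submitting CPU thread wait once the count reaches a configured value.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not a HIP or HSA API): counts callbacks that have been
// registered but have not finished, and blocks the submitter at the limit.
class OutstandingHandlerTracker {
public:
    explicit OutstandingHandlerTracker(std::size_t limit) : limit_(limit) {
        assert(limit > 0 && "limit must allow at least one handler");
    }

    // Called by the submitting thread before it registers a new handler.
    void on_submit() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Wait on the CPU until the outstanding count is below the limit.
        cv_.wait(lock, [this] { return outstanding_.load() < limit_; });
        const std::size_t now = outstanding_.fetch_add(1) + 1;
        if (now > peak_) peak_ = now;  // recorded only for the demo below
    }

    // Called at the end of each callback.
    void on_complete() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            outstanding_.fetch_sub(1);
        }
        cv_.notify_one();
    }

    // Lock-free read of the current count, for example for logging.
    std::size_t outstanding() const { return outstanding_.load(); }

    std::size_t peak() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return peak_;
    }

private:
    const std::size_t limit_;
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::atomic<std::size_t> outstanding_{0};
    std::size_t peak_ = 0;
};

// Simulates `total` callbacks, each finishing on its own thread, and returns
// the highest number that were outstanding at the same time.
std::size_t run_demo(std::size_t limit, std::size_t total) {
    OutstandingHandlerTracker tracker(limit);
    std::vector<std::thread> workers;
    workers.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
        tracker.on_submit();  // may block until an earlier callback completes
        workers.emplace_back([&tracker] { tracker.on_complete(); });
    }
    for (auto& w : workers) w.join();
    return tracker.peak();
}
```

In this sketch the count never exceeds the limit, because the submitter waits before incrementing. How the actual runtime defines its threshold and where it waits is not described in these notes.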
Optimized#
HSA callback performance. The HIP runtime creates and submits commands in the queue and interacts with HSA through a callback function. HIP waits for the CPU status from HSA to optimize the handling of events, profiling, commands, and HSA signals for higher performance.
Runtime optimization which combines all logic of `WaitAny` in a single processing loop and avoids extra memory allocations or reference counting. The runtime won’t spin on the CPU if all events are busy.
Multi-threaded dispatches for performance improvement.
Command submissions and processing between CPU and GPU by introducing a way to limit the software batch size.
Switch to `std::shared_mutex` in the bookkeeping logic for streams accessed from multiple threads simultaneously, for performance improvement in specific customer applications.
`std::shared_mutex` is used in memory object mapping, for performance improvement.
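As a rough illustration of why `std::shared_mutex` can help read-heavy bookkeeping such as a memory object map, here is a minimal C++17 sketch. It is not taken from the HIP runtime: `MemObjectMap` and its methods are hypothetical, and the map from an address to an allocation size is an assumed stand-in for whatever the runtime actually stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical registry (not HIP runtime code): maps the start address of an
// allocation to its size in bytes.
class MemObjectMap {
public:
    // Writers take the lock exclusively.
    void insert(std::uintptr_t addr, std::size_t size) {
        assert(size > 0 && "zero-sized allocations are not recorded");
        std::unique_lock<std::shared_mutex> lock(mutex_);
        map_[addr] = size;
    }

    // Returns true if an entry was removed.
    bool erase(std::uintptr_t addr) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        return map_.erase(addr) > 0;
    }

    // Readers take the lock in shared mode, so concurrent lookups from
    // several threads do not serialize against each other.
    std::optional<std::size_t> find(std::uintptr_t addr) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = map_.find(addr);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::uintptr_t, std::size_t> map_;
};
```

With a plain `std::mutex`, every lookup would block every other lookup. The shared lock tends to pay off when lookups greatly outnumber insertions and removals, which is presumably the case the release note has in mind.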
Resolved issues#
Race condition in multi-threaded producer/consumer scenario with `hipMallocFromPoolAsync`.
Segmentation fault with `hipStreamLegacy` while using the API `hipStreamWaitEvent`.
Usage of `hipStreamLegacy` in HIP event record.
A soft hang in graph execution process from HIP user object. The fix handles the release of graph execution object properly considering synchronization on the device/stream. The user application now behaves the same with `hipUserObject` on both the AMD ROCm and NVIDIA CUDA platforms.
hipfort (0.5.1)#
Added#
Support for building with LLVM Flang.
Resolved issues#
Fixed the exported `hipfort::hipsparse` CMake target.
ROCm Systems Profiler (0.1.1)#
Resolved issues#
Fixed an error when building from source on some SUSE and RHEL systems when using the `ROCPROFSYS_BUILD_DYNINST` option.
ROCProfiler (2.0.0)#
Changed#
Replaced `CU_UTILIZATION` metric with `SIMD_UTILIZATION` for better accuracy.
Resolved issues#
Fixed the `VALUBusy` and `SALUBusy` activity metrics for accuracy on MI300.
ROCprofiler-SDK (0.5.0)#
Added#
Support for system-wide collection of SQ counters across all HSA processes.
Changed#
`rocprofiler_sample_device_counting_service` API updated to return counter output immediately when called in synchronous mode.
ROCm known issues#
ROCm known issues are noted on GitHub. For known issues related to individual components, review the Detailed component changes.
ROCm resolved issues#
The following are previously known issues resolved in this release. For resolved issues related to individual components, review the Detailed component changes.
TransferBench packages not functional#
An issue with TransferBench packages not being compiled properly has been fixed. For more information, see GitHub issue #4081.
ROCm Compute Profiler CTest failure in CI#
When running the ROCm Compute Profiler (`rocprof-compute`) CTest in the Azure CI environment, the `rocprof-compute` execution test failed. This issue was due to an outdated test file that was not renamed (`omniperf` to `rocprof-compute`), and the `ROCM_PATH` environment variable not being set in the Azure CI environment, resulting in the tool being unable to extract chip information as expected.
This issue has been fixed in the ROCm 6.3.2 release. See GitHub issue #4085.
MIVisionX memory access fault in Canny edge detection#
An issue where Canny edge detection kernels accessed out-of-bounds memory locations while computing gradient intensities on edge pixels has been fixed. This issue was isolated to Canny-specific use cases on Instinct MI300 series accelerators. See GitHub issue #4086.
AMD VCN instability with rocDecode#
A firmware crash on gfx942 devices when AMD Video Core Next (VCN) was used for rocDecode operations has been resolved.
ROCm upcoming changes#
The following changes to the ROCm software stack are anticipated for future releases.
AMDGPU wavefront size compiler macro deprecation#
The `__AMDGCN_WAVEFRONT_SIZE__` macro will be deprecated in an upcoming release. It is recommended to remove any use of this macro. For more information, see AMDGPU support.
HIPCC Perl scripts deprecation#
The HIPCC Perl scripts (`hipcc.pl` and `hipconfig.pl`) will be removed in an upcoming release.