Index

A
AccVGPR
Active cycle
ALU
AMD device architecture
AMD SMI
AMDGPU assembly
AMDGPU intermediate representation
Arithmetic bandwidth
Arithmetic intensity

B
Bank conflict
Branch efficiency

C
Compute unit versioning
Compute units
Compute-bound
CU utilization

D
Data movement engine

G
GCD
GFX IP
GFX IP major version
GFX IP minor version
Global memory
GPU RAM (VRAM)
Graphics L1 cache
Grid

H
HIP C++ language extension
HIP compiler
HIP kernel
HIP memory hierarchy
HIP runtime API
HIP runtime compiler
HIP thread hierarchy

I
Infinity Cache (L3 cache)
Issue efficiency

L
L0 instruction cache
L0 scalar cache
L0 vector cache
L1 instruction cache
L1 scalar cache
L1 vector cache
L2 cache
Latency hiding
Little's Law
LLVM target name
Load/store unit
Local data share

M
Matrix cores (MFMA units)
Memory bandwidth
Memory coalescing
Memory-bound

O
Occupancy
Overhead

P
Peak rate
Pipe utilization

R
Register file
Register pressure
Registers
ROCgdb
ROCm and LLVM binary utilities
ROCm programming model
ROCm software platform
rocprofv3
Roofline model

S
SALU
SGPR
SGPR file
SIMD core
Special function unit

V
VALU
VGPR
VGPR file

W
Wavefront (Warp)
Wavefront divergence
Wavefront execution state
Wavefront scheduler
Wavefront size
WGP
Work-group (Block)
Work-item (Thread)

X
XCD