Index

A | B | C | D | E | F | G | H | I | K | L | M | N | O | P | R | S | T | U | V | W

A
  Add+Multiply
  alignment
  arithmetic logic unit

B
  bank conflict
  batched GEMM
  block Size
  block tile
  blockwise_copy (C++ function)
  blockwise_gemm_xdl (C++ function)

C
  clear (C++ function)
  Col2Im
  compute unit
  coordinate transformation primitives
  copy (C++ function), [1]

D
  dense tensor
  depth (C++ function), [1], [2], [3], [4], [5]
  descriptor
  device
  dilation

E
  elementwise
  epilogue

F
  fast changing dimension
  fused add multiply

G
  GEMM
  GEMV
  general matrix multiply
  general matrix vector multiplication
  get (C++ function), [1], [2], [3]
  GGEMM
  global memory
  grid
  grouped GEMM

H
  host
  host-device transfer

I
  Im2Col
  inner dimension
  input

K
  kernel

L
  launch parameters
  layout (C++ function)
  Layout (C++ struct)
  LDS
  LDS banks
  load tile
  local data share

M
  make_blockwise_gemm_xdl_c_local_partition (C++ function)
  make_blockwise_gemm_xdl_c_vgpr (C++ function)
  make_layout (C++ function), [1]
  make_local_partition (C++ function), [1]
  make_local_tile (C++ function), [1]
  make_register_tensor (C++ function)
  make_tensor (C++ function)
  matrix core
  matrix fused multiply-add
  memory coalescing
  MemoryTypeEnum (C++ type)
  MFMA

N
  naive GEMM

O
  occupancy
  operation
  outer dimension

P
  pad (C++ function)
  padding
  permute
  pinned memory
  pipeline
  policy
  problem
  problem shape

R
  rank (C++ function), [1], [2], [3], [4], [5]
  reference kernel
  register

S
  scalar general purpose register
  SGPR
  shape (C++ function), [1]
  SIMD
  SIMT
  single-instruction, multi-data
  single-instruction, multi-thread
  size (C++ function), [1], [2], [3], [4], [5], [6]
  slice (C++ function), [1], [2]
  sparse tensor
  Split-K GEMM
  store tile
  stride

T
  Tensor (C++ struct)
  tile
  tile distribution
  tile partitioner
  tile programming API
  tile window
  transpose

U
  unmerge (C++ function)
  user customized tile pipeline
  user customized tile pipeline optimization

V
  vanilla GEMM
  vector
  vector general purpose register
  VGEMM
  VGPR

W
  wave tile
  wavefront
  work group
  work-item