HIP terminology comparison with OpenCL, CUDA, C++ AMP and HCC
Term | CUDA | HIP | HC | C++AMP | OpenCL |
Device | int deviceId | int deviceId | hc::accelerator | concurrency::accelerator | cl_device_id |
Queue | cudaStream_t | hipStream_t | hc::accelerator_view | concurrency::accelerator_view | cl_command_queue |
Event | cudaEvent_t | hipEvent_t | hc::completion_future | concurrency::completion_future | cl_event |
Memory | void * | void * | void *; hc::array; hc::array_view | concurrency::array_view | cl_mem |
Thread index | threadIdx.x | hipThreadIdx_x | t_idx.local[0] | t_idx.local[0] | get_local_id(0) |
Block index | blockIdx.x | hipBlockIdx_x | t_idx.tile[0] | t_idx.tile[0] | get_group_id(0) |
Block dim | blockDim.x | hipBlockDim_x | t_ext.tile_dim[0] | t_idx.tile_dim0 | get_local_size(0) |
Grid dim | gridDim.x | hipGridDim_x | t_ext[0] | t_ext[0] | get_global_size(0) |
Device Function | __device__ | __device__ | [[hc]] (detected automatically in many cases) | restrict(amp) | Implied in device compilation |
Host Function | __host__ (default) | __host__ (default) | [[cpu]] (default) | restrict(cpu) (default) | Implied in host compilation |
Host + Device Function | __host__ __device__ | __host__ __device__ | [[hc]] [[cpu]] | restrict(amp,cpu) | No equivalent |
Kernel Launch | <<< >>> | hipLaunchKernelGGL | hc::parallel_for_each | concurrency::parallel_for_each | clEnqueueNDRangeKernel |
Global Memory | __global__ | __global__ | Unnecessary/Implied | Unnecessary/Implied | __global |
Group Memory | __shared__ | __shared__ | tile_static | tile_static | __local |
Constant | __constant__ | __constant__ | Unnecessary/Implied | Unnecessary/Implied | __constant |
Barrier | __syncthreads | __syncthreads | t_idx.barrier.wait() | t_idx.barrier.wait() | barrier(CLK_LOCAL_MEM_FENCE) |
Atomic Builtins | atomicAdd | atomicAdd | hc::atomic_fetch_add | concurrency::atomic_fetch_add | atomic_add |
Precise Math | cos(f) | cos(f) | hc::precise_math::cos(f) | concurrency::precise_math::cos(f) | cos(f) |
Fast Math | __cosf(f) | __cosf(f) | hc::fast_math::cos(f) | concurrency::fast_math::cos(f) | native_cos(f) |
Vector | float4 | float4 | hc::short_vector::float4 | concurrency::graphics::float_4 | float4 |
Notes
For HC and C++AMP, assume a captured tiled_extent named “t_ext” and a captured extent named “ext”. These languages pass information to the kernel through captured variables rather than special built-in functions, so the exact variable names may vary.
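The captured-variable style can be sketched in plain host C++. The block below is only an illustration under stated assumptions: `MockTiledExtent`, `MockTiledIndex`, `mock_parallel_for_each` and `rebuild_global_indices` are hypothetical stand-ins written for this example, not types or functions from HC or C++AMP, and the "launch" is a serial loop.

```cpp
#include <vector>

// Hypothetical mock of a tiled extent (NOT the real hc:: or concurrency:: type).
struct MockTiledExtent {
    int size0;      // total number of threads in dimension 0
    int tile_dim0;  // threads per tile in dimension 0
};

// Hypothetical mock of the per-thread index object passed to the kernel.
struct MockTiledIndex {
    int global0;  // position in the whole extent
    int local0;   // position inside the tile (the table's thread index)
    int tile0;    // which tile this thread belongs to (the table's block index)
};

// Serial stand-in for parallel_for_each: calls the kernel once per thread.
template <typename Kernel>
void mock_parallel_for_each(const MockTiledExtent& t_ext, Kernel kernel) {
    for (int g = 0; g < t_ext.size0; ++g) {
        kernel(MockTiledIndex{g, g % t_ext.tile_dim0, g / t_ext.tile_dim0});
    }
}

// The kernel is a lambda. It reads the tile size from the captured t_ext
// instead of calling a built-in such as blockDim.x or get_local_size(0).
inline std::vector<int> rebuild_global_indices(int total, int tile) {
    MockTiledExtent t_ext{total, tile};
    std::vector<int> out(total);
    mock_parallel_for_each(t_ext, [&out, t_ext](const MockTiledIndex& t_idx) {
        out[t_idx.global0] = t_idx.tile0 * t_ext.tile_dim0 + t_idx.local0;
    });
    return out;
}
```

Because the name `t_ext` is chosen by whoever writes the lambda, a real kernel could capture the same object under any other name, which is why the table entries for HC and C++AMP are naming conventions rather than fixed identifiers.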
The indexing functions (starting with thread index) show the terminology for a 1D grid. Some APIs use the reverse order of xyz / 012 indexing for 3D grids.
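As a rough host-side sketch of how the 1D indexing terms combine, the following C++ computes a global index from the thread index, block index and block dimension. `LaunchCoord`, `global_index` and `xyz_to_012_reversed` are hypothetical names introduced for this example. The 3D helper assumes an array-style API in which index 0 is the slowest-varying dimension, so that it pairs with z rather than x; check the specific API before relying on that ordering.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the per-thread built-ins listed in the table.
struct LaunchCoord {
    std::size_t thread_idx;  // threadIdx.x / hipThreadIdx_x / get_local_id(0)
    std::size_t block_idx;   // blockIdx.x  / hipBlockIdx_x  / get_group_id(0)
    std::size_t block_dim;   // blockDim.x  / hipBlockDim_x  / get_local_size(0)
};

// Global 1D position of a thread: block offset plus position inside the block.
constexpr std::size_t global_index(const LaunchCoord& c) {
    return c.block_idx * c.block_dim + c.thread_idx;
}

// Assumed reverse mapping for 3D grids: (x, y, z) becomes {z, y, x}, so the
// fastest-varying dimension x lands at position 2 and z lands at position 0.
inline std::array<std::size_t, 3> xyz_to_012_reversed(std::size_t x,
                                                      std::size_t y,
                                                      std::size_t z) {
    return {{z, y, x}};
}
```

For example, thread 3 of block 2 with 256 threads per block has global index 2 * 256 + 3 = 515.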
HC allows tile dimensions to be specified at runtime, while C++AMP requires that tile dimensions be specified at compile time. Thus the HC syntax for tile dimensions is t_ext.tile_dim[0], while the C++AMP syntax is t_ext.tile_dim0.
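The difference between the two spellings can be illustrated with two mock types in plain C++. This is a sketch under assumptions: `StaticTiledExtent`, `DynamicTiledExtent` and `num_tiles` are hypothetical names invented here and do not reproduce the real HC or C++AMP class definitions. The first type fixes the tile size as a template argument (compile time); the second stores it in an ordinary array member (runtime).

```cpp
// Hypothetical mock in the C++AMP style: the tile size is a template argument,
// so it is a compile-time constant exposed as tile_dim0.
template <int D0>
struct StaticTiledExtent {
    static constexpr int tile_dim0 = D0;
    int extent0;  // total number of threads in dimension 0
};

// Hypothetical mock in the HC style: the tile size is stored in a runtime
// array and read as tile_dim[0].
struct DynamicTiledExtent {
    int extent0;      // total number of threads in dimension 0
    int tile_dim[3];  // threads per tile in each dimension
};

// Number of tiles needed to cover the extent, rounding up.
template <int D0>
constexpr int num_tiles(const StaticTiledExtent<D0>& e) {
    return (e.extent0 + D0 - 1) / D0;
}

inline int num_tiles(const DynamicTiledExtent& e) {
    return (e.extent0 + e.tile_dim[0] - 1) / e.tile_dim[0];
}
```

With the static form, changing the tile size means instantiating a different type and recompiling; with the dynamic form the same code can run with a tile size chosen while the program is running.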