Load and store callbacks

rocFFT includes experimental functionality to call user-defined device functions when loading input from global memory at the transform start or when storing output to global memory at the transform end.

These optional user-defined callback functions can be supplied to the library using rocfft_execution_info_set_load_callback() and rocfft_execution_info_set_store_callback().

Device functions supplied as callbacks must load and store element data types appropriate for the transform being executed.

Transform type                          Load element type   Store element type
--------------------------------------  ------------------  ------------------
Complex-to-complex, half-precision      _Float16_2          _Float16_2
Complex-to-complex, single-precision    float2              float2
Complex-to-complex, double-precision    double2             double2
Real-to-complex, single-precision       float               float2
Real-to-complex, half-precision         _Float16            _Float16_2
Real-to-complex, double-precision       double              double2
Complex-to-real, half-precision         _Float16_2          _Float16
Complex-to-real, single-precision       float2              float
Complex-to-real, double-precision       double2             double

The callback function signatures must match the specifications below.

Tdata load_callback(Tdata* buffer, size_t offset, void* callback_data, void* shared_memory);
void store_callback(Tdata* buffer, size_t offset, Tdata element, void* callback_data, void* shared_memory);

The parameters for the functions are as follows:

  • Tdata: The data type of each element being loaded from the input or stored to the output.

  • buffer: Pointer to the input (for load callbacks) or output (for store callbacks) in device memory that was passed to rocfft_execute().

  • offset: The offset of the location being read from or written to. This counts by elements from the buffer pointer.

  • element: For store callbacks only, the element to be stored.

  • callback_data: A pointer value accepted by rocfft_execution_info_set_load_callback() and rocfft_execution_info_set_store_callback() which is passed through to the callback function.

  • shared_memory: A pointer to the shared memory requested when the callback is set. Shared memory is not supported, so this parameter is always null.
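The signatures and parameters above can be illustrated with a minimal sketch. It assumes Tdata = float, which per the table corresponds to the load side of a single-precision real-to-complex transform and the store side of a single-precision complex-to-real transform. The names scale_params, scale_on_load, scale_on_store, and the CB_DEVICE macro are hypothetical and not part of rocFFT. The macro expands to __device__ under hipcc and to nothing otherwise, so that the callback logic can also be compiled and checked with an ordinary host compiler.

```cpp
#include <cstddef>

// Under hipcc the callbacks must be device functions. With a plain host
// compiler the qualifier is dropped so the logic can be exercised on the CPU.
// CB_DEVICE is a hypothetical helper macro, not a rocFFT or HIP name.
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>
#define CB_DEVICE __device__
#else
#define CB_DEVICE
#endif

// Hypothetical user data, reached through the callback_data pointer.
struct scale_params
{
    float factor;
};

// Load callback with Tdata = float (single-precision real-to-complex input).
CB_DEVICE float scale_on_load(float* buffer, size_t offset, void* callback_data, void* shared_memory)
{
    (void)shared_memory; // always null according to the parameter description
    const scale_params* params = static_cast<const scale_params*>(callback_data);
    return buffer[offset] * params->factor; // offset counts elements, not bytes
}

// Store callback with Tdata = float (single-precision complex-to-real output).
CB_DEVICE void scale_on_store(float* buffer, size_t offset, float element, void* callback_data, void* shared_memory)
{
    (void)shared_memory;
    const scale_params* params = static_cast<const scale_params*>(callback_data);
    buffer[offset] = element * params->factor;
}
```

In a real HIP build, the data behind callback_data would need to live in device-accessible memory, and the device address of each function would typically be obtained on the host (for example with hipMemcpyFromSymbol on a __device__ function-pointer variable) before being passed to rocfft_execution_info_set_load_callback() or rocfft_execution_info_set_store_callback().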

Callback functions are called exactly once for each element being loaded or stored in a transform. Multiple kernels can be launched to decompose a transform, which means that separate kernels might call the load and store callbacks for a transform if both are specified.
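Because each callback is invoked once per element, its logic can be checked on the host by driving it over every offset exactly once. The sketch below is a host-side emulation only and says nothing about how rocFFT schedules its kernels. The names window_params, windowed_load, and load_all_once are hypothetical, and the callback is written as a plain host function (on the device it would carry the __device__ qualifier). It assumes a contiguous, single-batch real input, so that offset coincides with the sample index.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical user data: a window with one coefficient per input element.
struct window_params
{
    const float* window;
};

// Load callback logic with Tdata = float. The offset is used to index both
// the input buffer and the parallel window array.
float windowed_load(float* buffer, size_t offset, void* callback_data, void* shared_memory)
{
    (void)shared_memory; // always null according to the parameter description
    const window_params* params = static_cast<const window_params*>(callback_data);
    return buffer[offset] * params->window[offset];
}

// Function pointer type matching the load callback signature for float.
using load_fn = float (*)(float*, size_t, void*, void*);

// Host-side stand-in for a transform's load phase: calls the callback
// exactly once for each of the count elements.
std::vector<float> load_all_once(load_fn callback, float* buffer, size_t count, void* callback_data)
{
    std::vector<float> loaded(count);
    for(size_t i = 0; i < count; ++i)
        loaded[i] = callback(buffer, i, callback_data, nullptr);
    return loaded;
}
```

A call such as load_all_once(windowed_load, data, n, &params) returns the windowed samples the transform would consume, which makes it possible to unit-test callback arithmetic without a GPU.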

Callback functions are only supported for transforms that do not use planar format for input or output.