copy.hpp File Reference


Functions

template<typename DimAccessOrderTuple , index_t VectorDim, index_t ScalarPerVector, typename SrcTensorType , typename DstTensorType >
__device__ void copy (const SrcTensorType &src_tensor, DstTensorType &dst_tensor)
 Perform optimized copy between two tensor partitions (threadwise copy). Tensors must have the same size. More...
 
template<typename SrcTensorType , typename DstTensorType >
__host__ __device__ void copy (const SrcTensorType &src_tensor, DstTensorType &dst_tensor)
 Perform generic copy between two tensor partitions (threadwise copy). Tensors must have the same size. More...
 
template<typename DimAccessOrderTuple , index_t VectorDim, index_t ScalarPerVector, typename SrcTensorType , typename DstTensorType , typename ThreadShape , typename ThreadUnrolledDesc >
__device__ void blockwise_copy (const SrcTensorType &src_tensor, DstTensorType &dst_tensor, [[maybe_unused]] const Layout< ThreadShape, ThreadUnrolledDesc > &thread_layout)
 Perform optimized blockwise copy between two tensors. Tensors must have the same size. More...
 

Function Documentation

◆ blockwise_copy()

template<typename DimAccessOrderTuple , index_t VectorDim, index_t ScalarPerVector, typename SrcTensorType , typename DstTensorType , typename ThreadShape , typename ThreadUnrolledDesc >
__device__ void blockwise_copy ( const SrcTensorType &  src_tensor,
DstTensorType &  dst_tensor,
[[maybe_unused]] const Layout< ThreadShape, ThreadUnrolledDesc > &  thread_layout 
)

Perform optimized blockwise copy between two tensors. Tensors must have the same size.

Note
Vgpr and Sgpr tensors are not currently supported.
Template Parameters
DimAccessOrderTuple: Tuple with the dimension access order.
VectorDim: Dimension along which reads and writes are vectorized.
ScalarPerVector: Number of scalars per vectorized read/write.
Parameters
src_tensor: Source tensor.
dst_tensor: Destination tensor.
thread_layout: Thread layout per dimension for the copy.
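
A minimal usage sketch (hedged): the helper below, the 4x64 thread layout, and the vectorization settings are illustrative assumptions, not part of this header. It assumes the functions live in the ck::wrapper namespace (as the file path suggests) and that src_tensor and dst_tensor are same-size wrapper tensors (e.g. a global-memory tile and an LDS tile) created elsewhere.

    // Sketch only: tile extents are assumed divisible by the thread layout
    // and by ScalarPerVector along the vectorized dimension.
    using AccessOrder = ck::Tuple<ck::Number<0>, ck::Number<1>>; // dim 0 outer, dim 1 inner

    template <typename SrcTensorType, typename DstTensorType>
    __device__ void copy_block_tile(const SrcTensorType& src_tensor, DstTensorType& dst_tensor)
    {
        // Hypothetical 4 x 64 (= 256-thread) row-major layout; the exact
        // make_layout call is an assumption based on the wrapper utilities.
        const auto thread_layout = ck::wrapper::make_layout(
            ck::make_tuple(ck::Number<4>{}, ck::Number<64>{}),
            ck::make_tuple(ck::Number<64>{}, ck::Number<1>{}));
        ck::wrapper::blockwise_copy<AccessOrder, /*VectorDim=*/1, /*ScalarPerVector=*/4>(
            src_tensor, dst_tensor, thread_layout);
    }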

◆ copy() [1/2]

template<typename DimAccessOrderTuple , index_t VectorDim, index_t ScalarPerVector, typename SrcTensorType , typename DstTensorType >
__device__ void copy ( const SrcTensorType &  src_tensor,
DstTensorType &  dst_tensor 
)

Perform optimized copy between two tensor partitions (threadwise copy). Tensors must have the same size.

Template Parameters
DimAccessOrderTuple: Tuple with the dimension access order.
VectorDim: Dimension along which reads and writes are vectorized.
ScalarPerVector: Number of scalars per vectorized read/write.
Parameters
src_tensor: Source tensor.
dst_tensor: Destination tensor.
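
For illustration, a hedged sketch of how the template parameters map onto a call; the access order and vector settings below are assumptions chosen for a row-major 2D partition, and the wrapping helper is hypothetical.

    // Sketch only: src_tensor/dst_tensor stand for same-size per-thread
    // partitions obtained elsewhere in the kernel.
    using AccessOrder = ck::Tuple<ck::Number<0>, ck::Number<1>>; // dim 0 outer, dim 1 inner

    template <typename SrcTensorType, typename DstTensorType>
    __device__ void copy_thread_tile(const SrcTensorType& src_tensor, DstTensorType& dst_tensor)
    {
        // Vectorize along the contiguous dimension (assumed to be dim 1 here),
        // moving 4 scalars per read/write.
        ck::wrapper::copy<AccessOrder, /*VectorDim=*/1, /*ScalarPerVector=*/4>(src_tensor, dst_tensor);
    }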

◆ copy() [2/2]

template<typename SrcTensorType , typename DstTensorType >
__host__ __device__ void copy ( const SrcTensorType &  src_tensor,
DstTensorType &  dst_tensor 
)

Perform generic copy between two tensor partitions (threadwise copy). Tensors must have the same size.

Parameters
src_tensor: Source tensor.
dst_tensor: Destination tensor.
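
Because this overload takes no tuning parameters, a call is a plain element-wise copy. A minimal sketch (the wrapping helper is hypothetical; the namespace is assumed from the file path):

    // Sketch only: no access-order or vectorization template arguments are
    // needed, and the overload is callable from host as well as device code.
    template <typename SrcTensorType, typename DstTensorType>
    __host__ __device__ void copy_tile(const SrcTensorType& src_tensor, DstTensorType& dst_tensor)
    {
        ck::wrapper::copy(src_tensor, dst_tensor); // element-by-element copy
    }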