This page contains proposed changes for a future release of ROCm. Read the latest Linux release of ROCm documentation for your production environments.

rocPRIM 4.1.0 Documentation

Search

Configuring the kernel

template<unsigned int BlockSize, unsigned int ItemsPerThread, unsigned int MaxSharedKeyBytes>
struct search_config : public rocprim::detail::search_config_params

Configuration of device-level search/find_end.

Template Parameters:
  • BlockSize – number of threads in a block.

  • ItemsPerThread – number of items processed by each thread.

  • MaxSharedKeyBytes – maximum number of bytes for which a shared key is used.

search

template<class Config = default_config, class InputIterator1, class InputIterator2, class OutputIterator, class BinaryFunction = rocprim::equal_to<typename std::iterator_traits<InputIterator1>::value_type>>
inline hipError_t rocprim::search(void *temporary_storage, size_t &storage_size, InputIterator1 input, InputIterator2 keys, OutputIterator output, size_t size, size_t keys_size, BinaryFunction compare_function = BinaryFunction(), hipStream_t stream = 0, bool debug_synchronous = false)

Searches for the first occurrence of the sequence.

Searches the input for the first occurrence of a key sequence, according to a particular comparison function. If the sequence is found, the index of its first item in the input is written to output. Otherwise, the size of the input is written to output.

Overview

  • The contents of the inputs are not altered by the function.

  • Returns the required size of temporary_storage in storage_size if temporary_storage is a null pointer.

  • Accepts custom compare_functions for search across the device.

  • Streams in graph capture mode are supported.
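
To illustrate what a custom compare_function changes, the following is a minimal host-side sketch, not rocPRIM code. It uses std::search from the C++ standard library as a stand-in for the device search, and the names same_parity, first_match_same_parity and EXAMPLE_HOST_DEVICE are hypothetical, introduced only for this illustration. It assumes that a function object passed to a device-level primitive must be callable from device code, so the call operator is annotated when compiled with hipcc and left plain otherwise.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Annotate for device use only when compiling with hipcc (assumption: the
// comparator has to be device-callable when handed to a device primitive).
#ifdef __HIPCC__
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
#define EXAMPLE_HOST_DEVICE
#endif

// Hypothetical comparator: two values match when they have the same parity.
// It does not modify its arguments and returns a value convertible to bool.
struct same_parity
{
    EXAMPLE_HOST_DEVICE
    bool operator()(const unsigned int& a, const unsigned int& b) const
    {
        return (a % 2u) == (b % 2u);
    }
};

// Host-side stand-in: index of the first position where every key item
// matches the corresponding input item under same_parity, or input.size()
// when there is no such position.
std::size_t first_match_same_parity(const std::vector<unsigned int>& input,
                                    const std::vector<unsigned int>& keys)
{
    auto it = std::search(input.begin(), input.end(),
                          keys.begin(), keys.end(), same_parity{});
    return static_cast<std::size_t>(it - input.begin());
}
```

With the input [ 6, 3, 5, 4, 1, 8, 2, 5, 4, 1 ] and the key [ 7, 2, 9 ], an equality comparison finds nothing, whereas same_parity matches the odd, even, odd pattern starting at index 2.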

Example

In this example a device-level search is performed where input values are represented by an array of unsigned integers and the key is also an array of unsigned integers.

The full example is on GitHub.

#include <rocprim/rocprim.hpp>

// Prepare input and output (declare pointers, allocate device memory etc.)
size_t size;           // e.g., 10
size_t key_size;       // e.g., 3
unsigned int * input;  // e.g., [ 6, 3, 5, 4, 1, 8, 2, 5, 4, 1 ]
unsigned int * key;    // e.g., [ 5, 4, 1 ]
unsigned int * output; // e.g., empty array of size 1

size_t temporary_storage_size_bytes;
void * temporary_storage_ptr = nullptr;
// Get required size of the temporary storage
rocprim::search(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, key, output, size, key_size
);

// allocate temporary storage
hipMalloc(&temporary_storage_ptr, temporary_storage_size_bytes);

// perform search
rocprim::search(
    temporary_storage_ptr, temporary_storage_size_bytes,
    input, key, output, size, key_size
);
// output:   [ 2 ]
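
When validating a device result such as the one above, a host-side reference can be handy. The sketch below is an assumption-laden illustration rather than part of rocPRIM: reference_search and verify_doc_example are hypothetical helper names, and std::search is used to reproduce the documented behavior (index of the first match, or the input size when there is no match). The behavior for an empty key range is whatever std::search does and is not claimed to match the device function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of rocPRIM: host reference for the documented
// semantics. Returns the index of the first occurrence of `keys` in `input`,
// or input.size() if `keys` does not occur.
template <class T, class BinaryFunction = std::equal_to<T>>
std::size_t reference_search(const std::vector<T>& input,
                             const std::vector<T>& keys,
                             BinaryFunction compare = BinaryFunction())
{
    // std::search returns input.end() when nothing matches, so the
    // distance from begin() is then exactly input.size().
    auto it = std::search(input.begin(), input.end(),
                          keys.begin(), keys.end(), compare);
    return static_cast<std::size_t>(it - input.begin());
}

// Reproduces the arrays from the example above on the host.
void verify_doc_example()
{
    const std::vector<unsigned int> input{6, 3, 5, 4, 1, 8, 2, 5, 4, 1};
    const std::vector<unsigned int> key{5, 4, 1};
    assert(reference_search(input, key) == 2); // matches "output: [ 2 ]"
}
```

In a test, the single value copied back from output could be compared against reference_search for the same host data.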

Template Parameters:
  • Config – [optional] configuration of the primitive, must be default_config or search_config.

  • InputIterator1 – [inferred] random-access iterator type of the input range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • InputIterator2 – [inferred] random-access iterator type of the key range. Must meet the requirements of a C++ InputIterator concept. It can be a simple pointer type.

  • OutputIterator – [inferred] random-access iterator type of the output range. Must meet the requirements of a C++ OutputIterator concept. It can be a simple pointer type.

  • BinaryFunction – [inferred] Type of binary function that accepts two arguments of the value type of InputIterator1 and returns a value convertible to bool. Default type is rocprim::equal_to<typename std::iterator_traits<InputIterator1>::value_type>.

Parameters:
  • temporary_storage – [in] pointer to a device-accessible temporary storage. When a null pointer is passed, the required allocation size (in bytes) is written to storage_size and the function returns without performing the search.

  • storage_size – [inout] reference to a size (in bytes) of temporary_storage.

  • input – [in] iterator to the input range.

  • keys – [in] iterator to the key range.

  • output – [out] iterator to the output range. The output is one element.

  • size – [in] number of elements in the input range.

  • keys_size – [in] number of elements in the key range.

  • compare_function – [in] binary operation function object that will be used for comparison. The signature of the function should be equivalent to the following: bool f(const T &a, const T &b);. The signature does not need to have const &, but the function object must not modify the objects passed to it. The comparator must meet the C++ named requirement BinaryPredicate. The default value is BinaryFunction().

  • stream – [in] [optional] HIP stream object. Default is 0 (default stream).

  • debug_synchronous – [in] [optional] If true, synchronization after every kernel launch is forced in order to check for errors. Default value is false.

Returns:

hipSuccess (0) after successful search; otherwise a HIP runtime error of type hipError_t.

© 2025 Advanced Micro Devices, Inc