rocRAND programming guide#
This topic discusses some issues to consider when using rocRAND in your application.
Generator types#
There are two main generator classes in rocRAND:
Pseudo-Random Number Generators (PRNGs)
Quasi-Random Number Generators (QRNGs)
The following pseudo-random number generators are available:
XORWOW
MRG32K3A
MTGP32
Philox 4x32-10
MRG31K3P
LFSR113
MT19937
ThreeFry 2x32-20, 4x32-20, 2x64-20, and 4x64-20
The following quasi-random number generators are available:
Sobol32
Sobol64
Scrambled Sobol32
Scrambled Sobol64
Ordering#
rocRAND generators can be configured to change how the results are ordered in global memory. The ordering can be used, for example, to trade performance against reproducibility. The following ordering types are available:
ROCRAND_ORDERING_PSEUDO_BEST
ROCRAND_ORDERING_PSEUDO_DEFAULT
ROCRAND_ORDERING_PSEUDO_SEEDED
ROCRAND_ORDERING_PSEUDO_LEGACY
ROCRAND_ORDERING_PSEUDO_DYNAMIC
ROCRAND_ORDERING_QUASI_DEFAULT
ROCRAND_ORDERING_PSEUDO_DEFAULT and ROCRAND_ORDERING_QUASI_DEFAULT are the default ordering types for pseudo- and quasi-random number generators, respectively. ROCRAND_ORDERING_PSEUDO_DEFAULT is the same as ROCRAND_ORDERING_PSEUDO_BEST and ROCRAND_ORDERING_PSEUDO_LEGACY.
ROCRAND_ORDERING_PSEUDO_DYNAMIC indicates that rocRAND can change the output ordering to obtain the best performance for a particular generator on a particular GPU. Using this ordering, the generated sequences can vary between GPU models and rocRAND versions. For more information about generating these configurations, see Kernel configurations for dynamic ordering. ROCRAND_ORDERING_PSEUDO_DYNAMIC is not supported for generators created with rocrand_create_generator_host.
ROCRAND_ORDERING_PSEUDO_LEGACY indicates that rocRAND should generate values in a backward-compatible way. When this type is set, rocRAND generates exactly the same sequences across releases.
All supported orderings for all generators are listed below:
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | The same as |
| | There are \(131072\) generators in total, each of which is separated by \(2^{67}\) values. The results are generated in an interleaved fashion. The result at offset \(n\) in memory is generated from offset \((n\;\mathrm{mod}\; 131072) \cdot 2^{67} + \lfloor n / 131072 \rfloor\) in the XORWOW sequence for a particular seed. |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | There are \(131072\) generators in total, each of which is separated by \(2^{76}\) values. The results are generated in an interleaved fashion. The result at offset \(n\) in memory is generated from offset \((n\;\mathrm{mod}\; 131072) \cdot 2^{76} + \lfloor n / 131072 \rfloor\) in the MRG32K3A sequence for a particular seed. |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | There are \(512\) generators in total, each of which generates \(256\) values per iteration. Blocks of \(256\) elements from generators are concatenated to form the output. The result at offset \(n\) in memory is generated from generator \(\lfloor n / 256\rfloor\;\mathrm{mod}\; 512\). |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | There is only one Philox generator, and the result at offset \(n\) is the \(n\)-th value from this generator. |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | The Mersenne Twister sequence is generated from \(8192\) generators in total, each of which is separated by \(2^{1000}\) values. Each generator generates \(8\) elements per iteration. The result at offset \(n\) is generated from generator \((\lfloor n / 8\rfloor\;\mathrm{mod}\; 8192) \cdot 2^{1000} + \lfloor n / (8 \cdot 8192) \rfloor + \lfloor n / 8 \rfloor\). |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | There are \(131072\) generators in total, each of which is separated by \(2^{72}\) values. The results are generated in an interleaved fashion. The result at offset \(n\) in memory is generated from offset \((n\;\mathrm{mod}\; 131072) \cdot 2^{72} + \lfloor n / 131072 \rfloor\) in the MRG31K3P sequence for a particular seed. |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | There are \(131072\) generators in total, each of which is separated by \(2^{55}\) values. The results are generated in an interleaved fashion. The result at offset \(n\) in memory is generated from offset \((n\;\mathrm{mod}\; 131072) \cdot 2^{55} + \lfloor n / 131072 \rfloor\) in the LFSR113 sequence for a particular seed. |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The same as |
| | The same as |
| | There is only one ThreeFry generator, and the result at offset \(n\) is the \(n\)-th value from this generator. |
| | The ordering depends on the GPU that is used. |
| Ordering | |
|---|---|
| | The (scrambled) 32- and 64-bit Sobol quasi-random number generators generate the result from \(d\) dimensions by flattening them into the output. The result at offset \(n\) in memory is generated from offset \(n\;\mathrm{mod}\; d\) in dimension \(\lfloor n / d \rfloor\), where \(d\) is the generator’s number of dimensions. |
Using rocRAND in HIP Graphs#
rocRAND supports capturing random number generation with HIP Graphs. However, the construction, initialization, and cleanup of the generator objects must take place outside of the recorded section. See the following example (error handling is omitted for brevity):
#include <hip/hip_runtime.h>
#include <rocrand/rocrand.h>

int main()
{
    // Allocate the device buffers that receive the generated values.
    size_t size = 1000;
    float* data_0;
    unsigned int* data_1;
    hipMalloc(&data_0, sizeof(*data_0) * size);
    hipMalloc(&data_1, sizeof(*data_1) * size);

    hipStream_t stream;
    hipStreamCreateWithFlags(&stream, hipStreamNonBlocking);

    // Create and initialize the generator outside of the recorded section.
    rocrand_generator generator;
    rocrand_create_generator(&generator, ROCRAND_RNG_PSEUDO_DEFAULT);
    rocrand_set_stream(generator, stream);
    rocrand_initialize_generator(generator);

    // Record only the generation calls. hipStreamEndCapture returns the
    // captured graph, so no separate hipGraphCreate call is needed.
    hipGraph_t graph;
    hipStreamBeginCapture(stream, hipStreamCaptureModeGlobal);
    rocrand_generate_normal(generator, data_0, size, 10.0F, 2.0F);
    rocrand_generate_poisson(generator, data_1, size, 3);
    hipStreamEndCapture(stream, &graph);

    // Instantiate the graph, launch it, and wait for it to finish.
    hipGraphExec_t instance;
    hipGraphInstantiate(&instance, graph, nullptr, nullptr, 0);
    hipGraphLaunch(instance, stream);
    hipStreamSynchronize(stream);

    // Clean up, again outside of the recorded section.
    hipGraphExecDestroy(instance);
    rocrand_destroy_generator(generator);
    hipStreamDestroy(stream);
    hipGraphDestroy(graph);
    hipFree(data_1);
    hipFree(data_0);
    return 0;
}