API

This section provides a detailed list of the library API.

Host Utility Functions

template<typename DataType>
void rocalution::allocate_host(int64_t n, DataType **ptr)

Allocate buffer on the host.

allocate_host allocates a buffer on the host.

Parameters:
  • n[in] number of elements the buffer needs to be allocated for

  • ptr[out] pointer to the position in memory where the buffer should be allocated; it is expected that *ptr == NULL

Template Parameters:

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.

template<typename DataType>
void rocalution::free_host(DataType **ptr)

Free buffer on the host.

free_host deallocates a buffer on the host. *ptr will be set to NULL after successful deallocation.

Parameters:

ptr[inout] pointer to the position in memory where the buffer should be deallocated; it is expected that *ptr != NULL

Template Parameters:

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.

template<typename DataType>
void rocalution::set_to_zero_host(int64_t n, DataType *ptr)

Set a host buffer to zero.

set_to_zero_host sets a host buffer to zero.

Parameters:
  • n[in] number of elements

  • ptr[inout] pointer to the host buffer

Template Parameters:

DataType – can be char, int, unsigned int, float, double, std::complex<float> or std::complex<double>.
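
The pointer contract of the three host utility functions can be illustrated with a minimal sketch. The `sketch_*` functions below are hypothetical stand-ins written for this illustration, not the rocALUTION implementation; they only mirror the documented pre- and postconditions (`*ptr == NULL` before allocation, `*ptr` reset to NULL after deallocation).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins that mirror the documented contract; they are not
// the rocALUTION implementation.
template <typename DataType>
void sketch_allocate_host(int64_t n, DataType** ptr)
{
    assert(ptr != nullptr && *ptr == nullptr); // documented: *ptr == NULL on entry
    if(n > 0)
    {
        *ptr = new DataType[n];
    }
}

template <typename DataType>
void sketch_set_to_zero_host(int64_t n, DataType* ptr)
{
    // DataType() is zero for all of the listed types
    std::fill(ptr, ptr + n, DataType());
}

template <typename DataType>
void sketch_free_host(DataType** ptr)
{
    assert(ptr != nullptr && *ptr != nullptr); // documented: *ptr != NULL on entry
    delete[] *ptr;
    *ptr = nullptr; // documented: *ptr is NULL after deallocation
}
```

With the library itself, the same call sequence would use allocate_host, set_to_zero_host and free_host with identical argument shapes.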

double rocalution::rocalution_time(void)

Return current time in microseconds.

Backend Manager

int rocalution::init_rocalution(int rank = -1, int dev_per_node = 1)

Initialize rocALUTION platform.

init_rocalution defines a backend descriptor with information about the hardware and its specifications. All objects created after that contain a copy of this descriptor. If the specifications of the global descriptor are changed (e.g. a different number of threads is set) and new objects are created, only the new objects will use the new configuration.

For control, the library provides the functions set_device_rocalution(), set_omp_threads_rocalution(), set_omp_affinity_rocalution() and set_omp_threshold_rocalution(), which are described below.

Example
#include <rocalution/rocalution.hpp>

using namespace rocalution;

int main(int argc, char* argv[])
{
    init_rocalution();

    // ...

    stop_rocalution();

    return 0;
}

Parameters:
  • rank[in] specifies the MPI rank when running in a multi-node environment

  • dev_per_node[in] number of accelerator devices per node in a multi-GPU environment
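
The copy semantics of the backend descriptor described above (objects keep the configuration that was active when they were created) can be modelled with a small sketch. `SketchDescriptor`, `sketch_global_descriptor` and `SketchObject` are hypothetical names for this illustration only; the real descriptor holds more information than a thread count.

```cpp
#include <cassert>

// Hypothetical, simplified descriptor; the real one holds more fields.
struct SketchDescriptor
{
    int omp_threads = 1;
};

// Global descriptor, standing in for the one init_rocalution() sets up.
inline SketchDescriptor& sketch_global_descriptor()
{
    static SketchDescriptor desc;
    return desc;
}

// Every object copies the global descriptor at construction time, so later
// changes to the global descriptor do not affect existing objects.
struct SketchObject
{
    SketchDescriptor local;
    SketchObject()
        : local(sketch_global_descriptor())
    {
    }
};
```

An object created before the global thread count is changed keeps the old value, while an object created afterwards sees the new one.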

int rocalution::stop_rocalution(void)

Shutdown rocALUTION platform.

stop_rocalution shuts down the rocALUTION platform.

void rocalution::set_device_rocalution(int dev)

Set the accelerator device.

set_device_rocalution lets the user select the accelerator device that is supposed to be used for the computation.

Parameters:

dev[in] accelerator device ID for computation

void rocalution::set_omp_threads_rocalution(int nthreads)

Set number of OpenMP threads.

The number of threads which rocALUTION will use can be set with set_omp_threads_rocalution or by the global OpenMP environment variable (for Unix-like OS this is OMP_NUM_THREADS). During the initialization phase, the library provides affinity thread-core mapping:

  • If the number of cores (including SMT cores) is greater than or equal to two times the number of threads, then all the threads can occupy every second core ID (e.g. 0, 2, 4, \(\ldots\)). This avoids having two threads working on the same physical core when SMT is enabled.

  • If the number of threads is less than or equal to the number of cores (including SMT), and the previous clause is false, then the threads can occupy every core ID (e.g. 0, 1, 2, 3, \(\ldots\)).

  • If none of the above criteria is met, then the default thread-core mapping is used (typically set by the OS).
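
The three rules above can be written down as a small decision function. This is a sketch of the documented rule only, not the library's affinity code; `sketch_thread_core_map` is a hypothetical helper, and an empty result stands for "use the default OS mapping".

```cpp
#include <cassert>
#include <vector>

// Returns the core ID assigned to each thread, or an empty vector when the
// default (OS-chosen) mapping applies. Hypothetical helper for illustration.
std::vector<int> sketch_thread_core_map(int ncores, int nthreads)
{
    int stride = 0;
    if(ncores >= 2 * nthreads)
    {
        stride = 2; // rule 1: every second core ID
    }
    else if(nthreads <= ncores)
    {
        stride = 1; // rule 2: every core ID
    }

    std::vector<int> cores;
    if(stride == 0)
    {
        return cores; // rule 3: default mapping
    }
    for(int t = 0; t < nthreads; ++t)
    {
        cores.push_back(t * stride);
    }
    return cores;
}
```

For example, 4 threads on 8 cores (SMT included) map to cores 0, 2, 4, 6, while 6 threads on the same machine map to cores 0 through 5.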

Note

The thread-core mapping is available only on Unix-like operating systems.

Note

The user can disable thread affinity by calling set_omp_affinity_rocalution() before initializing the library (i.e. before init_rocalution()).

Parameters:

nthreads[in] number of OpenMP threads

void rocalution::set_omp_affinity_rocalution(bool affinity)

Enable/disable OpenMP host affinity.

set_omp_affinity_rocalution enables / disables OpenMP host affinity.

Parameters:

affinity[in] boolean to turn on/off OpenMP host affinity

void rocalution::set_omp_threshold_rocalution(int threshold)

Set OpenMP threshold size.

When working on a small problem, you might observe that the OpenMP host backend is (slightly) slower than using no OpenMP. This is mainly attributed to the small amount of work each thread performs and the large overhead of forking/joining threads. This can be avoided with the OpenMP threshold size parameter in rocALUTION. The default threshold is set to 10000, which means that all matrices of this size or smaller will use only one thread (regardless of the number of OpenMP threads set in the system). The threshold can be modified with set_omp_threshold_rocalution.

Parameters:

threshold[in] OpenMP threshold size
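
The threshold rule can be summarised in one line of code. The helper below is a hypothetical sketch of the documented behaviour (sizes up to and including the threshold run on one thread), not the library's internal logic.

```cpp
#include <cassert>
#include <cstdint>

// Number of threads used for a problem of the given size, following the
// documented rule. Hypothetical helper; 10000 is the documented default.
int sketch_threads_for_size(int64_t size, int nthreads, int64_t threshold = 10000)
{
    return (size <= threshold) ? 1 : nthreads;
}
```

With the default threshold, a problem of size 10000 runs on a single thread and a problem of size 10001 uses all configured threads.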

void rocalution::info_rocalution(void)

Print info about rocALUTION.

info_rocalution prints information about the rocALUTION platform.

void rocalution::info_rocalution(const struct Rocalution_Backend_Descriptor &backend_descriptor)

Print info about specific rocALUTION backend descriptor.

info_rocalution prints information about the rocALUTION platform of the specific backend descriptor.

Parameters:

backend_descriptor[in] rocALUTION backend descriptor

void rocalution::disable_accelerator_rocalution(bool onoff = true)

Disable/Enable the accelerator.

If you want to disable the accelerator (without re-compiling the code), you need to call disable_accelerator_rocalution before init_rocalution().

Parameters:

onoff[in] boolean to turn on/off the accelerator

void rocalution::_rocalution_sync(void)

Sync rocALUTION.

_rocalution_sync blocks the host until all active asynchronous transfers are completed (this is a global sync).

Base Rocalution

template<typename ValueType>
class BaseRocalution : public rocalution::RocalutionObj

Base class for all operators and vectors.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::Operator< ValueType >, rocalution::Vector< ValueType >

Public Functions

virtual void MoveToAccelerator(void) = 0

Move the object to the accelerator backend.

virtual void MoveToHost(void) = 0

Move the object to the host backend.

virtual void MoveToAcceleratorAsync(void)

Move the object to the accelerator backend with async move.

virtual void MoveToHostAsync(void)

Move the object to the host backend with async move.

virtual void Sync(void)

Synchronize the object (wait for the async move to complete).

virtual void CloneBackend(const BaseRocalution<ValueType> &src)

Clone the backend descriptor from another object.

With CloneBackend, the backend can be cloned without copying any data. This is especially useful if several objects should reside on the same backend but keep their original data.

Example
LocalVector<ValueType> vec;
LocalMatrix<ValueType> mat;

// Allocate and initialize vec and mat
// ...

LocalVector<ValueType> tmp;
// By cloning backend, tmp and vec will have the same backend as mat
tmp.CloneBackend(mat);
vec.CloneBackend(mat);

// The following matrix vector multiplication will be performed on the backend
// selected in mat
mat.Apply(vec, &tmp);

Parameters:

src[in] Object from which the backend should be cloned.

virtual void Info(void) const = 0

Print object information.

Info can print object information about any rocALUTION object. This information consists of object properties and backend data.

Example
mat.Info();
vec.Info();

virtual void Clear(void) = 0

Clear (free all data) the object.

Operator

template<typename ValueType>
class Operator : public rocalution::BaseRocalution<ValueType>

Operator class.

The Operator class defines the generic interface for applying an operator (e.g. matrix or stencil) from/to global and local vectors.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::GlobalMatrix< ValueType >, rocalution::LocalMatrix< ValueType >, rocalution::LocalStencil< ValueType >

Public Functions

virtual int64_t GetM(void) const = 0

Return the number of rows in the matrix/stencil.

virtual int64_t GetN(void) const = 0

Return the number of columns in the matrix/stencil.

virtual int64_t GetNnz(void) const = 0

Return the number of non-zeros in the matrix/stencil.

virtual int64_t GetLocalM(void) const

Return the number of rows in the local matrix/stencil.

virtual int64_t GetLocalN(void) const

Return the number of columns in the local matrix/stencil.

virtual int64_t GetLocalNnz(void) const

Return the number of non-zeros in the local matrix/stencil.

virtual int64_t GetGhostM(void) const

Return the number of rows in the ghost matrix/stencil.

virtual int64_t GetGhostN(void) const

Return the number of columns in the ghost matrix/stencil.

virtual int64_t GetGhostNnz(void) const

Return the number of non-zeros in the ghost matrix/stencil.

virtual void Transpose(void)

Transpose the operator.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are local vectors.

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are local vectors.

virtual void Apply(const GlobalVector<ValueType> &in, GlobalVector<ValueType> *out) const

Apply the operator, out = Operator(in), where in and out are global vectors.

virtual void ApplyAdd(const GlobalVector<ValueType> &in, ValueType scalar, GlobalVector<ValueType> *out) const

Apply and add the operator, out += scalar * Operator(in), where in and out are global vectors.
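
The difference between Apply (overwrite) and ApplyAdd (scaled accumulate) can be shown with a tiny dense operator. `SketchDenseOperator` is a hypothetical type for this illustration and uses std::vector instead of the library's vector classes; only the two update formulas are taken from the descriptions above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense operator (row-major storage) used only to illustrate
// the two update formulas; it is not a rocALUTION class.
struct SketchDenseOperator
{
    std::size_t         rows;
    std::size_t         cols;
    std::vector<double> a; // rows * cols entries, row-major

    // out += scalar * Operator(in)
    void ApplyAdd(const std::vector<double>& in, double scalar, std::vector<double>* out) const
    {
        assert(in.size() == cols && out->size() == rows);
        for(std::size_t i = 0; i < rows; ++i)
        {
            double row_sum = 0.0;
            for(std::size_t j = 0; j < cols; ++j)
            {
                row_sum += a[i * cols + j] * in[j];
            }
            (*out)[i] += scalar * row_sum;
        }
    }

    // out = Operator(in)
    void Apply(const std::vector<double>& in, std::vector<double>* out) const
    {
        out->assign(rows, 0.0);
        ApplyAdd(in, 1.0, out);
    }
};
```

For the 2x2 matrix [[1, 2], [3, 4]] and in = (1, 1), Apply yields (3, 7); a following ApplyAdd with scalar 2 adds (6, 14) to that result.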

Vector

template<typename ValueType>
class Vector : public rocalution::BaseRocalution<ValueType>

Vector class.

The Vector class defines the generic interface for local and global vectors.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>

Subclassed by rocalution::LocalVector< int >, rocalution::GlobalVector< ValueType >, rocalution::LocalVector< ValueType >

virtual void CopyFrom(const LocalVector<ValueType> &src)

Copy vector from another vector.

CopyFrom copies values from another vector.

Example
LocalVector<ValueType> vec1, vec2;

// Allocate and initialize vec1 and vec2
// ...

// Move vec1 to accelerator
vec1.MoveToAccelerator();

// Now, vec1 is on the accelerator (if available)
// and vec2 is on the host

// Copying vec2 to vec1 (or vice versa) will move data between host and
// accelerator backend
vec1.CopyFrom(vec2);

Note

This function allows cross-platform copying. One of the objects could be allocated on the accelerator backend.

Parameters:

src[in] Vector from which values should be copied.

virtual void CopyFrom(const GlobalVector<ValueType> &src)

Copy vector from another vector.

Parameters:

src[in] Vector from which values should be copied.

virtual void CloneFrom(const LocalVector<ValueType> &src)

Clone the vector.

CloneFrom clones the entire vector, with data and backend descriptor from another Vector.

Example
LocalVector<ValueType> vec;

// Allocate and initialize vec (host or accelerator)
// ...

LocalVector<ValueType> tmp;

// By cloning vec, tmp will have identical values and will be on the same
// backend as vec
tmp.CloneFrom(vec);

Parameters:

src[in] Vector to clone from.

virtual void CloneFrom(const GlobalVector<ValueType> &src)

Clone the vector.

Parameters:

src[in] Vector to clone from.

Public Functions

virtual int64_t GetSize(void) const = 0

Return the size of the vector.

virtual int64_t GetLocalSize(void) const

Return the size of the local vector.

virtual bool Check(void) const = 0

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. whether the values are neither infinite nor NaN (not a number).

Return values:
  • true – if the vector is ok (empty vector is also ok).

  • false – if there is something wrong with the values.
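
The sanity check described above can be sketched for a plain std::vector as follows. `sketch_check` is a hypothetical helper, not the library's Check implementation; it only encodes the documented rule (no infinities, no NaN, empty vectors pass).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns true if every value is finite (neither infinity nor NaN).
// An empty vector is considered valid, as documented.
bool sketch_check(const std::vector<double>& values)
{
    for(double v : values)
    {
        if(!std::isfinite(v))
        {
            return false;
        }
    }
    return true;
}
```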

virtual void Clear(void) = 0

Clear (free all data) the object.

virtual void Zeros(void) = 0

Set all values of the vector to 0.

virtual void Ones(void) = 0

Set all values of the vector to 1.

virtual void SetValues(ValueType val) = 0

Set all values of the vector to the given value.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1)) = 0

Fill the vector with random values from the interval [a, b].

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1)) = 0

Fill the vector with random values from a normal distribution.

virtual void ReadFileASCII(const std::string &filename) = 0

Read vector from ASCII file.

Read a vector from ASCII file.

Example
LocalVector<ValueType> vec;
vec.ReadFileASCII("my_vector.dat");

Parameters:

filename[in] name of the file containing the ASCII data.

virtual void WriteFileASCII(const std::string &filename) const = 0

Write vector to ASCII file.

Write a vector to ASCII file.

Example
LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileASCII("my_vector.dat");

Parameters:

filename[in] name of the file to write the ASCII data to.

virtual void ReadFileBinary(const std::string &filename) = 0

Read vector from binary file.

Read a vector from binary file. For details on the format, see WriteFileBinary().

Example
LocalVector<ValueType> vec;
vec.ReadFileBinary("my_vector.bin");

Parameters:

filename[in] name of the file containing the data.

virtual void WriteFileBinary(const std::string &filename) const = 0

Write vector to binary file.

Write a vector to binary file.

The binary format contains a header, the rocALUTION version and the vector data as follows:

// Header
out << "#rocALUTION binary vector file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// Vector data
out.write((char*)&size, sizeof(int));
out.write((char*)vec_val, size * sizeof(double));

Example
LocalVector<ValueType> vec;

// Allocate and fill vec
// ...

vec.WriteFileBinary("my_vector.bin");

Note

Vector values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters:

filename[in] name of the file to write the data to.
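
The layout described above can be reproduced with a standalone writer. This is a sketch based only on the snippet in this section (text header line, `int` version, `int` size, values in double precision); `sketch_write_vector_binary` and `sketch_expected_bytes` are hypothetical helpers, and the encoding of the version number is not specified here, so the caller passes it in.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes header, version, size and values in the order shown above.
void sketch_write_vector_binary(std::ostream& out, int version, const std::vector<double>& values)
{
    // Header
    out << "#rocALUTION binary vector file" << std::endl;

    // Version (the actual encoding is an assumption left to the caller)
    out.write(reinterpret_cast<const char*>(&version), sizeof(int));

    // Vector data, always in double precision
    int size = static_cast<int>(values.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(int));
    out.write(reinterpret_cast<const char*>(values.data()), size * sizeof(double));
}

// Total number of bytes the layout occupies for n values.
std::size_t sketch_expected_bytes(std::size_t n)
{
    return std::string("#rocALUTION binary vector file\n").size() + 2 * sizeof(int)
           + n * sizeof(double);
}
```

Writing into a std::ostringstream opened in binary mode is a convenient way to inspect the resulting byte stream without touching the file system.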

virtual void CopyFromAsync(const LocalVector<ValueType> &src)

Async copy from another local vector.

virtual void CopyFromFloat(const LocalVector<float> &src)

Copy values from another local float vector.

virtual void CopyFromDouble(const LocalVector<double> &src)

Copy values from another local double vector.

virtual void CopyFrom(const LocalVector<ValueType> &src, int64_t src_offset, int64_t dst_offset, int64_t size)

Copy vector from another vector with offsets and size.

CopyFrom copies values with specific source and destination offsets and sizes from another vector.

Note

This function allows cross-platform copying. One of the objects could be allocated on the accelerator backend.

Parameters:
  • src[in] Vector from which values should be copied.

  • src_offset[in] source offset.

  • dst_offset[in] destination offset.

  • size[in] number of entries to be copied.
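
The role of the two offsets can be sketched with std::vector. The helper below is hypothetical and assumes the natural reading of the parameter names: entry `src_offset + i` of the source is written to entry `dst_offset + i` of the destination, for `i` in `[0, size)`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Copies `size` entries from src (starting at src_offset) into dst
// (starting at dst_offset). Hypothetical helper for illustration.
void sketch_copy_from(std::vector<double>&       dst,
                      const std::vector<double>& src,
                      int64_t                    src_offset,
                      int64_t                    dst_offset,
                      int64_t                    size)
{
    assert(src_offset >= 0 && dst_offset >= 0 && size >= 0);
    assert(src_offset + size <= static_cast<int64_t>(src.size()));
    assert(dst_offset + size <= static_cast<int64_t>(dst.size()));

    std::copy(src.begin() + src_offset, src.begin() + src_offset + size, dst.begin() + dst_offset);
}
```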

virtual void AddScale(const LocalVector<ValueType> &x, ValueType alpha)

Perform vector update of type this = this + alpha * x.

virtual void AddScale(const GlobalVector<ValueType> &x, ValueType alpha)

Perform vector update of type this = this + alpha * x.

virtual void ScaleAdd(ValueType alpha, const LocalVector<ValueType> &x)

Perform vector update of type this = alpha * this + x.

virtual void ScaleAdd(ValueType alpha, const GlobalVector<ValueType> &x)

Perform vector update of type this = alpha * this + x.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta)

Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta)

Perform vector update of type this = alpha * this + x * beta.

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, int64_t src_offset, int64_t dst_offset, int64_t size)

Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, int64_t src_offset, int64_t dst_offset, int64_t size)

Perform vector update of type this = alpha * this + x * beta with offsets.

virtual void ScaleAdd2(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, const LocalVector<ValueType> &y, ValueType gamma)

Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void ScaleAdd2(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, const GlobalVector<ValueType> &y, ValueType gamma)

Perform vector update of type this = alpha * this + x * beta + y * gamma.

virtual void Scale(ValueType alpha) = 0

Perform vector scaling this = alpha * this.
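
Because the update functions differ only in where the scalars are applied, it can help to see the formulas side by side. The free functions below are hypothetical element-wise sketches on std::vector, with `self` standing in for `this`; they are not the library's implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// self = self + alpha * x
void sketch_add_scale(Vec& self, const Vec& x, double alpha)
{
    assert(self.size() == x.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = self[i] + alpha * x[i];
}

// self = alpha * self + x
void sketch_scale_add(Vec& self, double alpha, const Vec& x)
{
    assert(self.size() == x.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i];
}

// self = alpha * self + x * beta
void sketch_scale_add_scale(Vec& self, double alpha, const Vec& x, double beta)
{
    assert(self.size() == x.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta;
}

// self = alpha * self + x * beta + y * gamma
void sketch_scale_add2(
    Vec& self, double alpha, const Vec& x, double beta, const Vec& y, double gamma)
{
    assert(self.size() == x.size() && self.size() == y.size());
    for(std::size_t i = 0; i < self.size(); ++i)
        self[i] = alpha * self[i] + x[i] * beta + y[i] * gamma;
}
```

Note the argument order: AddScale takes the vector first and the scalar second, while ScaleAdd takes the scalar first.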

virtual ValueType Dot(const LocalVector<ValueType> &x) const

Compute dot (scalar) product, return this^T x.

virtual ValueType Dot(const GlobalVector<ValueType> &x) const

Compute dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const LocalVector<ValueType> &x) const

Compute non-conjugate dot (scalar) product, return this^T x.

virtual ValueType DotNonConj(const GlobalVector<ValueType> &x) const

Compute non-conjugate dot (scalar) product, return this^T x.
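
For real value types the two products coincide; they differ only for complex vectors. The sketch below illustrates the distinction under an assumption that is not stated in this section: that Dot conjugates the entries of `this` (the usual BLAS `dotc` convention), while DotNonConj conjugates nothing. Both functions are hypothetical helpers on std::vector.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using CVec = std::vector<std::complex<double>>;

// Conjugated dot product: sum(conj(self[i]) * x[i]).
// Which operand is conjugated is an assumption of this sketch.
std::complex<double> sketch_dot(const CVec& self, const CVec& x)
{
    assert(self.size() == x.size());
    std::complex<double> sum(0.0, 0.0);
    for(std::size_t i = 0; i < self.size(); ++i)
        sum += std::conj(self[i]) * x[i];
    return sum;
}

// Non-conjugated dot product: sum(self[i] * x[i]).
std::complex<double> sketch_dot_non_conj(const CVec& self, const CVec& x)
{
    assert(self.size() == x.size());
    std::complex<double> sum(0.0, 0.0);
    for(std::size_t i = 0; i < self.size(); ++i)
        sum += self[i] * x[i];
    return sum;
}
```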

virtual ValueType Norm(void) const = 0

Compute \(L_2\) norm of the vector, return = sqrt(this^T this)

virtual ValueType Reduce(void) const = 0

Reduce the vector.

virtual ValueType InclusiveSum(void) = 0

Compute inclusive sum.

virtual ValueType InclusiveSum(const LocalVector<ValueType> &vec)

Compute inclusive sum.

virtual ValueType InclusiveSum(const GlobalVector<ValueType> &vec)

Compute inclusive sum.

virtual ValueType ExclusiveSum(void) = 0

Compute exclusive sum.

virtual ValueType ExclusiveSum(const LocalVector<ValueType> &vec)

Compute exclusive sum.

virtual ValueType ExclusiveSum(const GlobalVector<ValueType> &vec)

Compute exclusive sum.
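
As a reminder of the terminology, the sketch below shows inclusive and exclusive sums in the usual prefix-sum sense, assuming that is what these functions compute. The helpers are hypothetical, return new vectors for clarity, and say nothing about the return value of the library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// out[i] = in[0] + ... + in[i]
std::vector<int> sketch_inclusive_sum(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin());
    return out;
}

// out[i] = in[0] + ... + in[i - 1], with out[0] = 0
std::vector<int> sketch_exclusive_sum(const std::vector<int>& in)
{
    std::vector<int> out(in.size());
    int              running = 0;
    for(std::size_t i = 0; i < in.size(); ++i)
    {
        out[i] = running;
        running += in[i];
    }
    return out;
}
```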

virtual ValueType Asum(void) const = 0

Compute the sum of absolute values of the vector, return = sum(|this|)

virtual int64_t Amax(ValueType &value) const = 0

Compute the absolute max of the vector, return = index(max(|this|))
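
The reductions Norm, Asum and Amax can be sketched on a std::vector as follows. The helpers are hypothetical; in particular, returning the absolute value through `value`, picking the first index on ties, and returning -1 for an empty vector are assumptions of this sketch rather than documented behaviour.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// L2 norm: sqrt(sum(v[i] * v[i]))
double sketch_norm(const std::vector<double>& v)
{
    double sum = 0.0;
    for(double x : v)
        sum += x * x;
    return std::sqrt(sum);
}

// Sum of absolute values: sum(|v[i]|)
double sketch_asum(const std::vector<double>& v)
{
    double sum = 0.0;
    for(double x : v)
        sum += std::fabs(x);
    return sum;
}

// Index of the entry with the largest absolute value; the absolute value
// itself is written to `value` (assumption). Returns -1 for an empty vector.
int64_t sketch_amax(const std::vector<double>& v, double& value)
{
    int64_t index = -1;
    value         = 0.0;
    for(std::size_t i = 0; i < v.size(); ++i)
    {
        double a = std::fabs(v[i]);
        if(index < 0 || a > value)
        {
            index = static_cast<int64_t>(i);
            value = a;
        }
    }
    return index;
}
```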

virtual void PointWiseMult(const LocalVector<ValueType> &x)

Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const GlobalVector<ValueType> &x)

Perform point-wise multiplication (element-wise) of this = this * x.

virtual void PointWiseMult(const LocalVector<ValueType> &x, const LocalVector<ValueType> &y)

Perform point-wise multiplication (element-wise) of this = x * y.

virtual void PointWiseMult(const GlobalVector<ValueType> &x, const GlobalVector<ValueType> &y)

Perform point-wise multiplication (element-wise) of this = x * y.

virtual void Power(double power) = 0

Perform a power operation on the vector.

Local Matrix

template<typename ValueType>
class LocalMatrix : public rocalution::Operator<ValueType>

LocalMatrix class.

A LocalMatrix is called local because it always stays on a single system. The system can contain several CPUs via a UMA or NUMA memory system, or it can contain an accelerator.

A number of matrix formats are supported. These are CSR, BCSR, MCSR, COO, DIA, ELL, HYB, and DENSE.

Note

For CSR type matrices, the column indices must be sorted in increasing order. For COO matrices, the row indices must be sorted in increasing order. The function Check can be used to check whether a matrix contains valid data. For CSR and COO matrices, the function Sort can be used to sort the column or row indices, respectively.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>
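
The ordering requirement for CSR matrices from the note above can be checked on raw arrays with a few lines of code. `sketch_csr_columns_sorted` is a hypothetical helper, not the library's Check function; it assumes the standard CSR layout (a row pointer array with nrow + 1 entries) and treats duplicate column indices within a row as invalid, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if the column indices of every row are strictly increasing.
// row_ptr has nrow + 1 entries; row i owns col_ind[row_ptr[i] .. row_ptr[i+1]).
bool sketch_csr_columns_sorted(const std::vector<int>& row_ptr, const std::vector<int>& col_ind)
{
    for(std::size_t i = 0; i + 1 < row_ptr.size(); ++i)
    {
        for(int j = row_ptr[i] + 1; j < row_ptr[i + 1]; ++j)
        {
            if(col_ind[j - 1] >= col_ind[j])
            {
                return false;
            }
        }
    }
    return true;
}
```

For a 2x2 matrix with row pointers {0, 2, 3}, the column array {0, 1, 1} is valid, whereas {1, 0, 1} is not because the first row is unsorted.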

void AllocateCSR(const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol)

Allocate a local matrix with name and sizes.

The local matrix allocation functions require a name for the object (for information purposes only) and the corresponding number of non-zero elements, number of rows and number of columns. Furthermore, depending on the matrix format, additional parameters are required.

Example
LocalMatrix<ValueType> mat;

mat.AllocateCSR("my CSR matrix", 456, 100, 100);
mat.Clear();

mat.AllocateCOO("my COO matrix", 200, 100, 100);
mat.Clear();

void AllocateBCSR(const std::string &name, int64_t nnzb, int64_t nrowb, int64_t ncolb, int blockdim)

Allocate a local matrix with name and sizes.

void AllocateMCSR(const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol)

Allocate a local matrix with name and sizes.

void AllocateCOO(const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol)

Allocate a local matrix with name and sizes.

void AllocateDIA(const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol, int ndiag)

Allocate a local matrix with name and sizes.

void AllocateELL(const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol, int max_row)

Allocate a local matrix with name and sizes.

void AllocateHYB(const std::string &name, int64_t ell_nnz, int64_t coo_nnz, int ell_max_row, int64_t nrow, int64_t ncol)

Allocate a local matrix with name and sizes.

void AllocateDENSE(const std::string &name, int64_t nrow, int64_t ncol)

Allocate a local matrix with name and sizes.

void SetDataPtrCOO(int **row, int **col, ValueType **val, std::string name, int64_t nnz, int64_t nrow, int64_t ncol)

Initialize a LocalMatrix on the host with externally allocated data.

SetDataPtr functions have direct access to the raw data via pointers. Already allocated data can be set by passing their pointers.

Example
// Allocate a CSR matrix
int* csr_row_ptr   = new int[100 + 1];
int* csr_col_ind   = new int[345];
ValueType* csr_val = new ValueType[345];

// Fill the CSR matrix
// ...

// rocALUTION local matrix object
LocalMatrix<ValueType> mat;

// Set the CSR matrix data; the csr_row_ptr, csr_col_ind and csr_val pointers
// become invalid
mat.SetDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val, "my_matrix", 345, 100, 100);

Note

Setting data pointers will leave the original pointers empty (set to NULL).

void SetDataPtrCSR(PtrType **row_offset, int **col, ValueType **val, std::string name, int64_t nnz, int64_t nrow, int64_t ncol)

Initialize a LocalMatrix on the host with externally allocated data.

void SetDataPtrBCSR(int **row_offset, int **col, ValueType **val, std::string name, int64_t nnzb, int64_t nrowb, int64_t ncolb, int blockdim)

Initialize a LocalMatrix on the host with externally allocated data.

void SetDataPtrMCSR(int **row_offset, int **col, ValueType **val, std::string name, int64_t nnz, int64_t nrow, int64_t ncol)

Initialize a LocalMatrix on the host with externally allocated data.

void SetDataPtrELL(int **col, ValueType **val, std::string name, int64_t nnz, int64_t nrow, int64_t ncol, int max_row)

Initialize a LocalMatrix on the host with externally allocated data.

void SetDataPtrDIA(int **offset, ValueType **val, std::string name, int64_t nnz, int64_t nrow, int64_t ncol, int num_diag)

Initialize a LocalMatrix on the host with externally allocated data.

void SetDataPtrDENSE(ValueType **val, std::string name, int64_t nrow, int64_t ncol)

Initialize a LocalMatrix on the host with externally allocated data.

void LeaveDataPtrCOO(int **row, int **col, ValueType **val)

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrCSR(PtrType **row_offset, int **col, ValueType **val)#

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrBCSR(int **row_offset, int **col, ValueType **val, int &blockdim)#

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrMCSR(int **row_offset, int **col, ValueType **val)#

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrELL(int **col, ValueType **val, int &max_row)#

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrDIA(int **offset, ValueType **val, int &num_diag)#

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

void LeaveDataPtrDENSE(ValueType **val)#

Leave a LocalMatrix to host pointers.

LeaveDataPtr functions have direct access to the raw data via pointers. A LocalMatrix object can leave its raw data to host pointers. This will leave the LocalMatrix empty.

Example
// rocALUTION CSR matrix object
LocalMatrix<ValueType> mat;

// Allocate the CSR matrix
mat.AllocateCSR("my_matrix", 345, 100, 100);

// Fill CSR matrix
// ...

int* csr_row_ptr   = NULL;
int* csr_col_ind   = NULL;
ValueType* csr_val = NULL;

// Get (steal) the data from the matrix, this will leave the local matrix
// object empty
mat.LeaveDataPtrCSR(&csr_row_ptr, &csr_col_ind, &csr_val);

Public Functions

virtual void Info(void) const#

Shows simple info about the matrix.

unsigned int GetFormat(void) const#

Return the matrix format id (see matrix_formats.hpp)

int GetBlockDimension(void) const#

Return the matrix block dimension.

virtual int64_t GetM(void) const#

Return the number of rows in the local matrix.

virtual int64_t GetN(void) const#

Return the number of columns in the local matrix.

virtual int64_t GetNnz(void) const#

Return the number of non-zeros in the local matrix.

bool Check(void) const#

Perform a sanity check of the matrix.

Checks whether the matrix contains valid data, i.e. whether the values are neither infinite nor NaN (not a number) and whether the structure of the matrix is correct (e.g. indices cannot be negative, CSR and COO matrices have to be sorted, etc.).

Return values:
  • true – if the matrix is ok (empty matrix is also ok).

  • false – if there is something wrong with the structure or values.
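
To make the listed conditions concrete, here is a minimal standalone sketch of such a sanity check for CSR arrays. The function name `csr_is_valid` and the requirement that column indices be strictly increasing within a row are assumptions of this example; the checks actually performed by Check may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of a CSR sanity check (hypothetical helper, not the library routine).
bool csr_is_valid(int nrow, int ncol,
                  const std::vector<int>&    row_ptr,
                  const std::vector<int>&    col_ind,
                  const std::vector<double>& val)
{
    if(nrow < 0 || ncol < 0 || col_ind.size() != val.size())
        return false;
    // An empty matrix is treated as valid.
    if(nrow == 0 && row_ptr.empty())
        return col_ind.empty();
    if(row_ptr.size() != static_cast<std::size_t>(nrow) + 1 || row_ptr[0] != 0)
        return false;

    // Row offsets must be non-decreasing and end at the number of non-zeros.
    for(int i = 0; i < nrow; ++i)
        if(row_ptr[i] > row_ptr[i + 1])
            return false;
    if(static_cast<std::size_t>(row_ptr[nrow]) != col_ind.size())
        return false;

    for(int i = 0; i < nrow; ++i)
    {
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            // Indices must be in range and sorted within each row.
            if(col_ind[k] < 0 || col_ind[k] >= ncol)
                return false;
            if(k > row_ptr[i] && col_ind[k - 1] >= col_ind[k])
                return false;
            // Values must be neither infinite nor NaN.
            if(!std::isfinite(val[k]))
                return false;
        }
    }
    return true;
}
```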

virtual void Clear(void)#

Clear (free) the matrix.

void Zeros(void)#

Set all matrix values to zero.

void Scale(ValueType alpha)#

Scale all values in the matrix.

void ScaleDiagonal(ValueType alpha)#

Scale the diagonal entries of the matrix with alpha, all diagonal elements must exist.

void ScaleOffDiagonal(ValueType alpha)#

Scale the off-diagonal entries of the matrix with alpha, all diagonal elements must exist.

void AddScalar(ValueType alpha)#

Add a scalar to all matrix values.

void AddScalarDiagonal(ValueType alpha)#

Add alpha to the diagonal entries of the matrix, all diagonal elements must exist.

void AddScalarOffDiagonal(ValueType alpha)#

Add alpha to the off-diagonal entries of the matrix, all diagonal elements must exist.

void ExtractSubMatrix(int64_t row_offset, int64_t col_offset, int64_t row_size, int64_t col_size, LocalMatrix<ValueType> *mat) const#

Extract a sub-matrix with row/col_offset and row/col_size.

void ExtractSubMatrices(int row_num_blocks, int col_num_blocks, const int *row_offset, const int *col_offset, LocalMatrix<ValueType> ***mat) const#

Extract an array of non-overlapping sub-matrices. row/col_num_blocks define the number of blocks for rows/columns; row/col_offset have sizes row/col_num_blocks+1, where [i+1]-[i] defines the size of the i-th sub-matrix.

void ExtractDiagonal(LocalVector<ValueType> *vec_diag) const#

Extract the diagonal values of the matrix into a LocalVector.

void ExtractInverseDiagonal(LocalVector<ValueType> *vec_inv_diag) const#

Extract the inverse (reciprocal) diagonal values of the matrix into a LocalVector.
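
As an illustration of what the reciprocal diagonal means for a CSR matrix, the sketch below computes 1/a_ii for every row. The helper name `inverse_diagonal` and the choice to leave the entry at 0 when a row stores no diagonal element are assumptions of this example, not documented library behaviour.

```cpp
#include <cassert>
#include <vector>

// Sketch: reciprocal of the diagonal entries of a CSR matrix.
std::vector<double> inverse_diagonal(const std::vector<int>&    row_ptr,
                                     const std::vector<int>&    col_ind,
                                     const std::vector<double>& val)
{
    assert(!row_ptr.empty());
    const int nrow = static_cast<int>(row_ptr.size()) - 1;

    std::vector<double> inv(nrow, 0.0); // stays 0 if no diagonal is stored
    for(int i = 0; i < nrow; ++i)
    {
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col_ind[k] == i)
            {
                assert(val[k] != 0.0); // a zero diagonal cannot be inverted
                inv[i] = 1.0 / val[k];
                break;
            }
        }
    }
    return inv;
}
```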

void ExtractU(LocalMatrix<ValueType> *U, bool diag) const#

Extract the upper triangular matrix.

void ExtractL(LocalMatrix<ValueType> *L, bool diag) const#

Extract the lower triangular matrix.

void Permute(const LocalVector<int> &permutation)#

Perform (forward) permutation of the matrix.

void PermuteBackward(const LocalVector<int> &permutation)#

Perform (backward) permutation of the matrix.

void CMK(LocalVector<int> *permutation) const#

Create permutation vector for CMK reordering of the matrix.

The Cuthill-McKee ordering reduces the bandwidth of a given sparse matrix.

Example
LocalVector<int> cmk;

mat.CMK(&cmk);
mat.Permute(cmk);

Parameters:

permutation[out] permutation vector for CMK reordering
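
For readers unfamiliar with the ordering, the following standalone sketch shows the classic Cuthill-McKee idea on a CSR sparsity pattern: a breadth-first traversal that starts from a low-degree node and visits neighbours in order of increasing degree. It assumes a structurally symmetric pattern and uses the convention perm[new] = old; both the helper names and this convention are assumptions of the example and need not match how CMK and Permute are implemented in the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <numeric>
#include <vector>

// Cuthill-McKee sketch; returns perm with perm[new_index] = old_index.
std::vector<int> cuthill_mckee(const std::vector<int>& row_ptr,
                               const std::vector<int>& col_ind)
{
    assert(!row_ptr.empty());
    const int n = static_cast<int>(row_ptr.size()) - 1;
    auto by_degree = [&](int a, int b) {
        return row_ptr[a + 1] - row_ptr[a] < row_ptr[b + 1] - row_ptr[b];
    };

    // Candidate start nodes, lowest degree first.
    std::vector<int> starts(n);
    std::iota(starts.begin(), starts.end(), 0);
    std::stable_sort(starts.begin(), starts.end(), by_degree);

    std::vector<int>  perm;
    std::vector<bool> visited(n, false);
    perm.reserve(n);

    for(int start : starts)
    {
        if(visited[start])
            continue;
        visited[start]   = true;
        std::size_t head = perm.size();
        perm.push_back(start);

        // Breadth-first traversal; perm doubles as the queue.
        while(head < perm.size())
        {
            const int        v = perm[head++];
            std::vector<int> next;
            for(int k = row_ptr[v]; k < row_ptr[v + 1]; ++k)
            {
                const int w = col_ind[k];
                if(!visited[w])
                {
                    visited[w] = true;
                    next.push_back(w);
                }
            }
            std::stable_sort(next.begin(), next.end(), by_degree);
            perm.insert(perm.end(), next.begin(), next.end());
        }
    }
    return perm;
}

// Bandwidth of the pattern after renumbering with perm.
int bandwidth(const std::vector<int>& row_ptr,
              const std::vector<int>& col_ind,
              const std::vector<int>& perm)
{
    const int        n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<int> new_index(n);
    for(int i = 0; i < n; ++i)
        new_index[perm[i]] = i;

    int bw = 0;
    for(int i = 0; i < n; ++i)
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            bw = std::max(bw, std::abs(new_index[i] - new_index[col_ind[k]]));
    return bw;
}
```

The reverse ordering (see RCMK) is obtained from the same traversal by reversing the resulting permutation.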

void RCMK(LocalVector<int> *permutation) const#

Create permutation vector for reverse CMK reordering of the matrix.

The Reverse Cuthill-McKee ordering reduces the bandwidth of a given sparse matrix.

Example
LocalVector<int> rcmk;

mat.RCMK(&rcmk);
mat.Permute(rcmk);

Parameters:

permutation[out] permutation vector for reverse CMK reordering

void ConnectivityOrder(LocalVector<int> *permutation) const#

Create permutation vector for connectivity reordering of the matrix.

Connectivity ordering returns a permutation that sorts the matrix by the number of non-zero entries per row.

Example
LocalVector<int> conn;

mat.ConnectivityOrder(&conn);
mat.Permute(conn);

Parameters:

permutation[out] permutation vector for connectivity reordering
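
A minimal sketch of such an ordering, assuming CSR row offsets as input: rows are ranked by their number of non-zeros. The helper name `connectivity_order`, the ascending direction and the perm[new] = old convention are assumptions of this example; the source does not state which direction the library uses.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sketch: order rows by their number of non-zero entries (ascending).
std::vector<int> connectivity_order(const std::vector<int>& row_ptr)
{
    assert(!row_ptr.empty());
    const int n = static_cast<int>(row_ptr.size()) - 1;

    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);

    // stable_sort keeps the original order among rows of equal length.
    std::stable_sort(perm.begin(), perm.end(), [&](int a, int b) {
        return row_ptr[a + 1] - row_ptr[a] < row_ptr[b + 1] - row_ptr[b];
    });
    return perm;
}
```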

void MultiColoring(int &num_colors, int **size_colors, LocalVector<int> *permutation) const#

Perform multi-coloring decomposition of the matrix.

The Multi-Coloring algorithm builds a permutation (coloring of the matrix) in a way such that no two adjacent nodes in the sparse matrix have the same color.

Example
LocalVector<int> mc;
int num_colors;
int* block_colors = NULL;

mat.MultiColoring(num_colors, &block_colors, &mc);
mat.Permute(mc);

Parameters:
  • num_colors[out] number of colors

  • size_colors[out] pointer to array that holds the number of nodes for each color

  • permutation[out] permutation vector for multi-coloring reordering
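
The coloring idea can be sketched with a simple greedy scheme: each node receives the smallest color not used by an already colored neighbour, and the permutation groups nodes by color. This is a generic greedy coloring on a structurally symmetric CSR pattern, written for illustration; the `Coloring` struct and function name are hypothetical, and the algorithm used by MultiColoring may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Result of the sketch; field names mirror the parameters described above.
struct Coloring
{
    int              num_colors = 0;
    std::vector<int> color;       // color of each node
    std::vector<int> size_colors; // number of nodes per color
    std::vector<int> permutation; // nodes grouped by color, perm[new] = old
};

Coloring greedy_multicoloring(const std::vector<int>& row_ptr,
                              const std::vector<int>& col_ind)
{
    assert(!row_ptr.empty());
    const int n = static_cast<int>(row_ptr.size()) - 1;

    Coloring c;
    c.color.assign(n, -1);

    for(int v = 0; v < n; ++v)
    {
        // Mark colors already taken by neighbours; the extra slot is always free.
        std::vector<bool> used(c.num_colors + 1, false);
        for(int k = row_ptr[v]; k < row_ptr[v + 1]; ++k)
        {
            const int w = col_ind[k];
            if(w != v && c.color[w] >= 0)
                used[c.color[w]] = true;
        }
        int pick = 0;
        while(used[pick])
            ++pick;
        c.color[v]   = pick;
        c.num_colors = std::max(c.num_colors, pick + 1);
    }

    c.size_colors.assign(c.num_colors, 0);
    for(int col = 0; col < c.num_colors; ++col)
        for(int v = 0; v < n; ++v)
            if(c.color[v] == col)
            {
                ++c.size_colors[col];
                c.permutation.push_back(v);
            }
    return c;
}
```

After permuting with such an ordering, the diagonal blocks that belong to a single color contain no couplings between different nodes, which is what makes color-wise parallel updates possible.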

void MaximalIndependentSet(int &size, LocalVector<int> *permutation) const#

Perform maximal independent set decomposition of the matrix.

The Maximal Independent Set algorithm finds a set of maximal size whose elements do not depend on other elements in this set.

Example
LocalVector<int> mis;
int size;

mat.MaximalIndependentSet(size, &mis);
mat.Permute(mis);

Parameters:
  • size[out] number of independent sets

  • permutation[out] permutation vector for maximal independent set reordering
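
A greedy construction gives a feel for the decomposition: a node joins the set if none of its neighbours has joined before, and its neighbours are then excluded. The sketch assumes a structurally symmetric CSR pattern; the function name is hypothetical, and in this example `size` is the number of nodes in the set and the permutation lists set members first, which are assumptions rather than documented behaviour.

```cpp
#include <cassert>
#include <vector>

// Greedy maximal independent set sketch; returns perm with set members first.
std::vector<int> greedy_mis(const std::vector<int>& row_ptr,
                            const std::vector<int>& col_ind,
                            int&                    size)
{
    assert(!row_ptr.empty());
    const int n = static_cast<int>(row_ptr.size()) - 1;

    enum State { Undecided, InSet, Excluded };
    std::vector<State> state(n, Undecided);

    for(int v = 0; v < n; ++v)
    {
        if(state[v] != Undecided)
            continue;
        state[v] = InSet;
        // Neighbours of a set member can no longer join the set.
        for(int k = row_ptr[v]; k < row_ptr[v + 1]; ++k)
        {
            const int w = col_ind[k];
            if(w != v && state[w] == Undecided)
                state[w] = Excluded;
        }
    }

    std::vector<int> perm;
    perm.reserve(n);
    for(int v = 0; v < n; ++v)
        if(state[v] == InSet)
            perm.push_back(v);
    size = static_cast<int>(perm.size());
    for(int v = 0; v < n; ++v)
        if(state[v] != InSet)
            perm.push_back(v);
    return perm;
}
```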

void ZeroBlockPermutation(int &size, LocalVector<int> *permutation) const#

Return a permutation for saddle-point problems (zero diagonal entries)

For saddle-point problems (i.e. matrices with zero diagonal entries), the Zero Block Permutation maps all zero-diagonal elements to the last block of the matrix.

Example
LocalVector<int> zbp;
int size;

mat.ZeroBlockPermutation(size, &zbp);
mat.Permute(zbp);

Parameters:
  • size[out]

  • permutation[out] permutation vector for zero block permutation
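
The mapping can be sketched as a stable partition of the row indices: rows with a non-zero diagonal entry come first, rows with a zero (or missing) diagonal entry go to the last block. The function name and the perm[new] = old convention are hypothetical, and the source does not describe `size`; in this sketch it is taken to be the number of rows with a non-zero diagonal, which is an assumption.

```cpp
#include <cassert>
#include <vector>

// Sketch: move rows with zero diagonal to the end of the ordering.
std::vector<int> zero_block_permutation(const std::vector<int>&    row_ptr,
                                        const std::vector<int>&    col_ind,
                                        const std::vector<double>& val,
                                        int&                       size)
{
    assert(!row_ptr.empty());
    const int n = static_cast<int>(row_ptr.size()) - 1;

    // Detect rows that store a non-zero diagonal entry.
    std::vector<bool> has_diag(n, false);
    for(int i = 0; i < n; ++i)
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            if(col_ind[k] == i && val[k] != 0.0)
                has_diag[i] = true;

    std::vector<int> perm;
    perm.reserve(n);
    for(int i = 0; i < n; ++i)
        if(has_diag[i])
            perm.push_back(i);
    size = static_cast<int>(perm.size()); // assumed meaning, see lead-in
    for(int i = 0; i < n; ++i)
        if(!has_diag[i])
            perm.push_back(i);
    return perm;
}
```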

void ILU0Factorize(void)#

Perform ILU(0) factorization.

void LUFactorize(void)#

Perform LU factorization.

void ILUTFactorize(double t, int maxrow)#

Perform ILU(t,m) factorization based on threshold and maximum number of elements per row.

void ILUpFactorize(int p, bool level = true)#

Perform ILU(p) factorization based on power.

void LUAnalyse(void)#

Analyse the structure (level-scheduling)

void LUAnalyseClear(void)#

Delete the analysed data (see LUAnalyse)

void LUSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Solve LU out = in; if a level-scheduling analysis is provided (see LUAnalyse), the graph traversal is performed in parallel.

void ICFactorize(LocalVector<ValueType> *inv_diag)#

Perform IC(0) factorization.

void LLAnalyse(void)#

Analyse the structure (level-scheduling)

void LLAnalyseClear(void)#

Delete the analysed data (see LLAnalyse)

void LLSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Solve LL^T out = in; if a level-scheduling analysis is provided (see LLAnalyse), the graph traversal is performed in parallel.

void LLSolve(const LocalVector<ValueType> &in, const LocalVector<ValueType> &inv_diag, LocalVector<ValueType> *out) const#

Solve LL^T out = in; if a level-scheduling analysis is provided (see LLAnalyse), the graph traversal is performed in parallel.

void LAnalyse(bool diag_unit = false)#

Analyse the structure (level-scheduling) L-part.

  • diag_unit == true the diag is 1;

  • diag_unit == false the diag is 0;

void LAnalyseClear(void)#

Delete the analysed data (see LAnalyse) L-part.

void LSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Solve L out = in; if a level-scheduling analysis is provided (see LAnalyse), the graph traversal is performed in parallel.
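
The relation between the analysis step and the solve can be sketched on a CSR lower-triangular matrix. `lower_levels` assigns each row a level such that rows on the same level do not depend on each other (this is what makes a parallel traversal possible), and `lower_solve` is a plain sequential forward substitution. Both helpers are hypothetical; in this sketch `diag_unit == true` means an implicit unit diagonal and `false` means the stored diagonal is used, which is an assumption of the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Level of row i = 1 + highest level among the rows it depends on.
std::vector<int> lower_levels(const std::vector<int>& row_ptr,
                              const std::vector<int>& col_ind)
{
    assert(!row_ptr.empty());
    const int        n = static_cast<int>(row_ptr.size()) - 1;
    std::vector<int> level(n, 0);
    for(int i = 0; i < n; ++i)
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            if(col_ind[k] < i)
                level[i] = std::max(level[i], level[col_ind[k]] + 1);
    return level;
}

// Sequential forward substitution, solving L * out = in.
std::vector<double> lower_solve(const std::vector<int>&    row_ptr,
                                const std::vector<int>&    col_ind,
                                const std::vector<double>& val,
                                const std::vector<double>& in,
                                bool                       diag_unit)
{
    const int n = static_cast<int>(row_ptr.size()) - 1;
    assert(static_cast<int>(in.size()) == n);

    std::vector<double> out(n, 0.0);
    for(int i = 0; i < n; ++i)
    {
        double sum  = in[i];
        double diag = diag_unit ? 1.0 : 0.0;
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            const int j = col_ind[k];
            if(j < i)
                sum -= val[k] * out[j]; // strictly lower part
            else if(j == i && !diag_unit)
                diag = val[k]; // stored diagonal
        }
        assert(diag != 0.0);
        out[i] = sum / diag;
    }
    return out;
}
```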

void UAnalyse(bool diag_unit = false)#

Analyse the structure (level-scheduling) U-part.

  • diag_unit == true the diag is 1;

  • diag_unit == false the diag is 0;

void UAnalyseClear(void)#

Delete the analysed data (see UAnalyse) U-part.

void USolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Solve U out = in; if a level-scheduling analysis is provided (see UAnalyse), the graph traversal is performed in parallel.

void Householder(int idx, ValueType &beta, LocalVector<ValueType> *vec) const#

Compute Householder vector.

void QRDecompose(void)#

QR Decomposition.

void QRSolve(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Solve QR out = in.

void Invert(void)#

Matrix inversion using QR decomposition.

void ReadFileMTX(const std::string &filename)#

Read matrix from MTX (Matrix Market Format) file.

Read a matrix from Matrix Market Format file.

Example
LocalMatrix<ValueType> mat;
mat.ReadFileMTX("my_matrix.mtx");

Parameters:

filename[in] name of the file containing the MTX data.

void WriteFileMTX(const std::string &filename) const#

Write matrix to MTX (Matrix Market Format) file.

Write a matrix to Matrix Market Format file.

Example
LocalMatrix<ValueType> mat;

// Allocate and fill mat
// ...

mat.WriteFileMTX("my_matrix.mtx");

Parameters:

filename[in] name of the file to write the MTX data to.

void ReadFileCSR(const std::string &filename)#

Read matrix from CSR (rocALUTION binary format) file.

Read a CSR matrix from binary file. For details on the format, see WriteFileCSR().

Example
LocalMatrix<ValueType> mat;
mat.ReadFileCSR("my_matrix.csr");

Parameters:

filename[in] name of the file containing the data.

void WriteFileCSR(const std::string &filename) const#

Write CSR matrix to binary file.

Write a CSR matrix to binary file.

The binary format contains a header, the rocALUTION version and the matrix data as follows

// Header
out << "#rocALUTION binary csr file" << std::endl;

// rocALUTION version
out.write((char*)&version, sizeof(int));

// CSR matrix data
out.write((char*)&m, sizeof(int));
out.write((char*)&n, sizeof(int));
out.write((char*)&nnz, sizeof(int64_t));
out.write((char*)csr_row_ptr, (m + 1) * sizeof(int));
out.write((char*)csr_col_ind, nnz * sizeof(int));
out.write((char*)csr_val, nnz * sizeof(double));

Example
LocalMatrix<ValueType> mat;

// Allocate and fill mat
// ...

mat.WriteFileCSR("my_matrix.csr");

Note

The matrix values array is always stored in double precision (e.g. double or std::complex<double>).

Parameters:

filename[in] name of the file to write the data to.
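
As a hedged illustration of the layout listed above, the sketch below writes and reads that sequence of fields through standard C++ streams. It follows the listing literally (int row offsets, real double values, host byte order); the `CsrData` struct and both function names are hypothetical, and files produced by a particular library version may need additional handling that is not covered here.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the fields of the binary layout.
struct CsrData
{
    int                 version = 0;
    int                 m       = 0;
    int                 n       = 0;
    int64_t             nnz     = 0;
    std::vector<int>    row_ptr;
    std::vector<int>    col_ind;
    std::vector<double> val;
};

void write_csr(std::ostream& out, const CsrData& a)
{
    assert(a.row_ptr.size() == static_cast<std::size_t>(a.m) + 1);
    assert(a.col_ind.size() == static_cast<std::size_t>(a.nnz));
    assert(a.val.size() == static_cast<std::size_t>(a.nnz));

    out << "#rocALUTION binary csr file" << std::endl;
    out.write(reinterpret_cast<const char*>(&a.version), sizeof(int));
    out.write(reinterpret_cast<const char*>(&a.m), sizeof(int));
    out.write(reinterpret_cast<const char*>(&a.n), sizeof(int));
    out.write(reinterpret_cast<const char*>(&a.nnz), sizeof(int64_t));
    out.write(reinterpret_cast<const char*>(a.row_ptr.data()),
              static_cast<std::streamsize>((a.m + 1) * sizeof(int)));
    out.write(reinterpret_cast<const char*>(a.col_ind.data()),
              static_cast<std::streamsize>(a.nnz * sizeof(int)));
    out.write(reinterpret_cast<const char*>(a.val.data()),
              static_cast<std::streamsize>(a.nnz * sizeof(double)));
}

bool read_csr(std::istream& in, CsrData& a)
{
    std::string header;
    if(!std::getline(in, header) || header != "#rocALUTION binary csr file")
        return false;

    in.read(reinterpret_cast<char*>(&a.version), sizeof(int));
    in.read(reinterpret_cast<char*>(&a.m), sizeof(int));
    in.read(reinterpret_cast<char*>(&a.n), sizeof(int));
    in.read(reinterpret_cast<char*>(&a.nnz), sizeof(int64_t));
    if(!in || a.m < 0 || a.n < 0 || a.nnz < 0)
        return false;

    a.row_ptr.resize(static_cast<std::size_t>(a.m) + 1);
    a.col_ind.resize(static_cast<std::size_t>(a.nnz));
    a.val.resize(static_cast<std::size_t>(a.nnz));
    in.read(reinterpret_cast<char*>(a.row_ptr.data()),
            static_cast<std::streamsize>(a.row_ptr.size() * sizeof(int)));
    in.read(reinterpret_cast<char*>(a.col_ind.data()),
            static_cast<std::streamsize>(a.col_ind.size() * sizeof(int)));
    in.read(reinterpret_cast<char*>(a.val.data()),
            static_cast<std::streamsize>(a.val.size() * sizeof(double)));
    return static_cast<bool>(in);
}
```

A `std::stringstream` is enough for an in-memory round trip; for files, open the streams with `std::ios::binary`.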

virtual void MoveToAccelerator(void)#

Move all data (i.e. move the matrix) to the accelerator.

virtual void MoveToAcceleratorAsync(void)#

Move all data (i.e. move the matrix) to the accelerator asynchronously.

virtual void MoveToHost(void)#

Move all data (i.e. move the matrix) to the host.

virtual void MoveToHostAsync(void)#

Move all data (i.e. move the matrix) to the host asynchronously.

virtual void Sync(void)#

Synchronize the matrix.

void CopyFrom(const LocalMatrix<ValueType> &src)#

Copy matrix from another LocalMatrix.

CopyFrom copies values and structure from another local matrix. Source and destination matrix should be in the same format.

Example
LocalMatrix<ValueType> mat1, mat2;

// Allocate and initialize mat1 and mat2
// ...

// Move mat1 to accelerator
// mat1.MoveToAccelerator();

// Now, mat1 is on the accelerator (if available)
// and mat2 is on the host

// Copy mat1 to mat2 (or vice versa) will move data between host and
// accelerator backend
mat1.CopyFrom(mat2);

Note

This function allows cross platform copying. One of the objects could be allocated on the accelerator backend.

Parameters:

src[in] Local matrix where values and structure should be copied from.

void CopyFromAsync(const LocalMatrix<ValueType> &src)#

Async copy matrix (values and structure) from another LocalMatrix.

void CloneFrom(const LocalMatrix<ValueType> &src)#

Clone the matrix.

CloneFrom clones the entire matrix, including values, structure and backend descriptor from another LocalMatrix.

Example
LocalMatrix<ValueType> mat;

// Allocate and initialize mat (host or accelerator)
// ...

LocalMatrix<ValueType> tmp;

// By cloning mat, tmp will have identical values and structure and will be on
// the same backend as mat
tmp.CloneFrom(mat);

Parameters:

src[in] LocalMatrix to clone from.

void UpdateValuesCSR(ValueType *val)#

Update CSR matrix entries only, structure will remain the same.

void CopyFromCSR(const PtrType *row_offsets, const int *col, const ValueType *val)#

Copy (import) CSR matrix described in three arrays (offsets, columns, values). The object data has to be allocated (call AllocateCSR first)

void CopyToCSR(PtrType *row_offsets, int *col, ValueType *val) const#

Copy (export) CSR matrix described in three arrays (offsets, columns, values). The output arrays have to be allocated.

void CopyFromCOO(const int *row, const int *col, const ValueType *val)#

Copy (import) COO matrix described in three arrays (rows, columns, values). The object data has to be allocated (call AllocateCOO first)

void CopyToCOO(int *row, int *col, ValueType *val) const#

Copy (export) COO matrix described in three arrays (rows, columns, values). The output arrays have to be allocated.

void CopyFromHostCSR(const PtrType *row_offset, const int *col, const ValueType *val, const std::string &name, int64_t nnz, int64_t nrow, int64_t ncol)#

Allocates and copies (imports) a host CSR matrix.

If the CSR matrix data pointers are only accessible as constant, the user can create a LocalMatrix object and pass const CSR host pointers. The LocalMatrix is then allocated and the data is copied to the backend on which the object is located.

Parameters:
  • row_offset[in] CSR matrix row offset pointers.

  • col[in] CSR matrix column indices.

  • val[in] CSR matrix values array.

  • name[in] Matrix object name.

  • nnz[in] Number of non-zero elements.

  • nrow[in] Number of rows.

  • ncol[in] Number of columns.

void CreateFromMap(const LocalVector<int> &map, int64_t n, int64_t m)#

Create a restriction matrix operator based on an int vector map.

void CreateFromMap(const LocalVector<int> &map, int64_t n, int64_t m, LocalMatrix<ValueType> *pro)#

Create a restriction and prolongation matrix operator based on an int vector map.

void ConvertToCSR(void)#

Convert the matrix to CSR structure.

void ConvertToMCSR(void)#

Convert the matrix to MCSR structure.

void ConvertToBCSR(int blockdim)#

Convert the matrix to BCSR structure.

void ConvertToCOO(void)#

Convert the matrix to COO structure.

void ConvertToELL(void)#

Convert the matrix to ELL structure.

void ConvertToDIA(void)#

Convert the matrix to DIA structure.

void ConvertToHYB(void)#

Convert the matrix to HYB structure.

void ConvertToDENSE(void)#

Convert the matrix to DENSE structure.

void ConvertTo(unsigned int matrix_format, int blockdim = 1)#

Convert the matrix to specified matrix ID format.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Perform matrix-vector multiplication, out = this * in.

Example
// rocALUTION structures
LocalMatrix<T> A;
LocalVector<T> x;
LocalVector<T> y;

// Allocate matrices and vectors
A.AllocateCSR("my CSR matrix", 456, 100, 100);
x.Allocate("x", A.GetN());
y.Allocate("y", A.GetM());

// Fill data in A matrix and x vector

A.Apply(x, &y);
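
For reference, the operation out = this * in on a CSR matrix amounts to the following loop. This is a plain host-side sketch with a hypothetical function name, not the library kernel.

```cpp
#include <cassert>
#include <vector>

// Sketch of a CSR matrix-vector product, out = A * in.
std::vector<double> csr_apply(const std::vector<int>&    row_ptr,
                              const std::vector<int>&    col_ind,
                              const std::vector<double>& val,
                              const std::vector<double>& in)
{
    assert(!row_ptr.empty());
    const int nrow = static_cast<int>(row_ptr.size()) - 1;

    std::vector<double> out(nrow, 0.0);
    for(int i = 0; i < nrow; ++i)
    {
        double sum = 0.0;
        // Accumulate the non-zeros of row i against the input vector.
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            assert(col_ind[k] >= 0 && col_ind[k] < static_cast<int>(in.size()));
            sum += val[k] * in[col_ind[k]];
        }
        out[i] = sum;
    }
    return out;
}
```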

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const#

Perform matrix-vector multiplication, out = scalar * this * in.

Example
// rocALUTION structures
LocalMatrix<T> A;
LocalVector<T> x;
LocalVector<T> y;

// Allocate matrices and vectors
A.AllocateCSR("my CSR matrix", 456, 100, 100);
x.Allocate("x", A.GetN());
y.Allocate("y", A.GetM());

// Fill data in A matrix and x vector

T scalar = 2.0;
A.ApplyAdd(x, scalar, &y);

void SymbolicPower(int p)#

Perform symbolic computation (structure only) of \(|this|^p\).

void MatrixAdd(const LocalMatrix<ValueType> &mat, ValueType alpha = static_cast<ValueType>(1), ValueType beta = static_cast<ValueType>(1), bool structure = false)#

Perform matrix addition, this = alpha*this + beta*mat.

  • if structure==false the sparsity pattern of the matrix is not changed;

  • if structure==true a new sparsity pattern is computed

void MatrixMult(const LocalMatrix<ValueType> &A, const LocalMatrix<ValueType> &B)#

Multiply two matrices, this = A * B.

void DiagonalMatrixMult(const LocalVector<ValueType> &diag)#

Multiply the matrix with diagonal matrix (stored in LocalVector), same as DiagonalMatrixMultR().

void DiagonalMatrixMultL(const LocalVector<ValueType> &diag)#

Multiply the matrix with diagonal matrix (stored in LocalVector), this=diag*this.

void DiagonalMatrixMultR(const LocalVector<ValueType> &diag)#

Multiply the matrix with diagonal matrix (stored in LocalVector), this=this*diag.

void TripleMatrixProduct(const LocalMatrix<ValueType> &R, const LocalMatrix<ValueType> &A, const LocalMatrix<ValueType> &P)#

Triple matrix product C=RAP.

void Gershgorin(ValueType &lambda_min, ValueType &lambda_max) const#

Compute the spectrum approximation with Gershgorin circles theorem.
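
The Gershgorin circle theorem states that every eigenvalue lies in at least one disc centred at a diagonal entry a_ii with radius equal to the sum of |a_ij| over the off-diagonal entries of row i. For a real matrix this yields the bounds computed in the sketch below; the function name is hypothetical and the library routine may handle complex values or missing diagonals differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Sketch: Gershgorin bounds for a real CSR matrix.
void gershgorin_bounds(const std::vector<int>&    row_ptr,
                       const std::vector<int>&    col_ind,
                       const std::vector<double>& val,
                       double&                    lambda_min,
                       double&                    lambda_max)
{
    assert(row_ptr.size() > 1);
    const int nrow = static_cast<int>(row_ptr.size()) - 1;

    lambda_min = std::numeric_limits<double>::infinity();
    lambda_max = -std::numeric_limits<double>::infinity();

    for(int i = 0; i < nrow; ++i)
    {
        double center = 0.0; // a_ii, zero if not stored
        double radius = 0.0; // sum of |a_ij| for j != i
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
        {
            if(col_ind[k] == i)
                center = val[k];
            else
                radius += std::fabs(val[k]);
        }
        lambda_min = std::min(lambda_min, center - radius);
        lambda_max = std::max(lambda_max, center + radius);
    }
}
```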

void Compress(double drop_off)#

Delete all entries in the matrix for which abs(a_ij) <= drop_off; the diagonal elements are never deleted.
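
The rule can be sketched as an in-place filter over CSR arrays: an entry survives if it lies on the diagonal or if its magnitude exceeds drop_off. The function name is hypothetical and the sketch is limited to real values.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch: drop small off-diagonal entries of a CSR matrix in place.
void csr_compress(std::vector<int>&    row_ptr,
                  std::vector<int>&    col_ind,
                  std::vector<double>& val,
                  double               drop_off)
{
    assert(!row_ptr.empty());
    const int nrow = static_cast<int>(row_ptr.size()) - 1;

    int out = 0; // write position, never ahead of the read position
    for(int i = 0; i < nrow; ++i)
    {
        const int begin = row_ptr[i];
        const int end   = row_ptr[i + 1];
        row_ptr[i]      = out;
        for(int k = begin; k < end; ++k)
        {
            // Keep diagonal entries unconditionally.
            if(col_ind[k] == i || std::fabs(val[k]) > drop_off)
            {
                col_ind[out] = col_ind[k];
                val[out]     = val[k];
                ++out;
            }
        }
    }
    row_ptr[nrow] = out;
    col_ind.resize(out);
    val.resize(out);
}
```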

virtual void Transpose(void)#

Transpose the matrix.

void Transpose(LocalMatrix<ValueType> *T) const#

Transpose the matrix.

void Sort(void)#

Sort the matrix indices.

Sorts the matrix by indices.

  • For CSR matrices, column indices are sorted.

  • For COO matrices, row indices are sorted.
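
For the CSR case, sorting means reordering the entries of each row by column index while keeping every value attached to its index. A minimal sketch, with a hypothetical function name:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch: sort the entries of every CSR row by column index.
void csr_sort_rows(const std::vector<int>& row_ptr,
                   std::vector<int>&       col_ind,
                   std::vector<double>&    val)
{
    assert(!row_ptr.empty());
    assert(col_ind.size() == val.size());
    const int nrow = static_cast<int>(row_ptr.size()) - 1;

    std::vector<std::pair<int, double>> row;
    for(int i = 0; i < nrow; ++i)
    {
        // Gather (column, value) pairs of row i.
        row.clear();
        for(int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            row.emplace_back(col_ind[k], val[k]);

        std::sort(row.begin(), row.end());

        // Write the sorted pairs back in place.
        int k = row_ptr[i];
        for(const auto& entry : row)
        {
            col_ind[k] = entry.first;
            val[k]     = entry.second;
            ++k;
        }
    }
}
```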

void Key(long int &row_key, long int &col_key, long int &val_key) const#

Compute a unique hash key for the matrix arrays.

It is typically hard to tell whether two matrices have the same structure (and values). To help with this, rocALUTION provides a keying function that generates three keys, one each for the row index, column index and values arrays.

Parameters:
  • row_key[out] row index array key

  • col_key[out] column index array key

  • val_key[out] values array key

void ReplaceColumnVector(int idx, const LocalVector<ValueType> &vec)#

Replace a column vector of a matrix.

void ReplaceRowVector(int idx, const LocalVector<ValueType> &vec)#

Replace a row vector of a matrix.

void ExtractColumnVector(int idx, LocalVector<ValueType> *vec) const#

Extract values from a column of a matrix to a vector.

void ExtractRowVector(int idx, LocalVector<ValueType> *vec) const#

Extract values from a row of a matrix to a vector.

void AMGConnect(ValueType eps, LocalVector<int> *connections) const#

Strong couplings for aggregation-based AMG.

void AMGAggregate(const LocalVector<int> &connections, LocalVector<int> *aggregates) const#

Plain aggregation - Modification of a greedy aggregation scheme from Vanek (1996)

void AMGPMISAggregate(const LocalVector<int> &connections, LocalVector<int> *aggregates) const#

Parallel aggregation - Parallel maximal independent set aggregation scheme from Bell, Dalton, & Olsen (2012)

void AMGSmoothedAggregation(ValueType relax, const LocalVector<int> &aggregates, const LocalVector<int> &connections, LocalMatrix<ValueType> *prolong, int lumping_strat = 0) const#

Interpolation scheme based on smoothed aggregation from Vanek (1996)

void AMGAggregation(const LocalVector<int> &aggregates, LocalMatrix<ValueType> *prolong) const#

Aggregation-based interpolation scheme.

void RSCoarsening(float eps, LocalVector<int> *CFmap, LocalVector<bool> *S) const#

Ruge Stueben coarsening.

void RSPMISCoarsening(float eps, LocalVector<int> *CFmap, LocalVector<bool> *S) const#

Parallel maximal independent set coarsening for RS AMG.

void RSDirectInterpolation(const LocalVector<int> &CFmap, const LocalVector<bool> &S, LocalMatrix<ValueType> *prolong) const#

Ruge Stueben Direct Interpolation.

void RSExtPIInterpolation(const LocalVector<int> &CFmap, const LocalVector<bool> &S, bool FF1, LocalMatrix<ValueType> *prolong) const#

Ruge Stueben Ext+i Interpolation.

void FSAI(int power, const LocalMatrix<ValueType> *pattern)#

Factorized Sparse Approximate Inverse assembly for given system matrix power pattern or external sparsity pattern.

void SPAI(void)#

SParse Approximate Inverse assembly for given system matrix pattern.

void InitialPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const#

Initial Pairwise Aggregation scheme.

void InitialPairwiseAggregation(const LocalMatrix<ValueType> &mat, ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const#

Initial Pairwise Aggregation scheme for split matrices.

void FurtherPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const#

Further Pairwise Aggregation scheme.

void FurtherPairwiseAggregation(const LocalMatrix<ValueType> &mat, ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const#

Further Pairwise Aggregation scheme for split matrices.

void CoarsenOperator(LocalMatrix<ValueType> *Ac, int nrow, int ncol, const LocalVector<int> &G, int Gsize, const int *rG, int rGsize) const#

Build coarse operator for pairwise aggregation scheme.

Local Stencil#

template<typename ValueType>
class LocalStencil : public rocalution::Operator<ValueType>#

LocalStencil class.

A LocalStencil is called local, because it will always stay on a single system. The system can contain several CPUs via UMA or NUMA memory system or it can contain an accelerator.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>

Public Functions

LocalStencil(unsigned int type)#

Initialize a local stencil with a type.

virtual void Info() const#

Shows simple info about the stencil.

int64_t GetNDim(void) const#

Return the dimension of the stencil.

virtual int64_t GetM(void) const#

Return the number of rows in the local stencil.

virtual int64_t GetN(void) const#

Return the number of columns in the local stencil.

virtual int64_t GetNnz(void) const#

Return the number of non-zeros in the local stencil.

void SetGrid(int size)#

Set the stencil grid size.

virtual void Clear()#

Clear (free) the stencil.

virtual void Apply(const LocalVector<ValueType> &in, LocalVector<ValueType> *out) const#

Perform stencil-vector multiplication, out = this * in.

virtual void ApplyAdd(const LocalVector<ValueType> &in, ValueType scalar, LocalVector<ValueType> *out) const#

Perform stencil-vector multiplication, out = scalar * this * in.

virtual void MoveToAccelerator(void)#

Move all data (i.e. move the stencil) to the accelerator.

virtual void MoveToHost(void)#

Move all data (i.e. move the stencil) to the host.

Global Matrix#

template<typename ValueType>
class GlobalMatrix : public rocalution::Operator<ValueType>#

GlobalMatrix class.

A GlobalMatrix is called global, because it can stay on a single or on multiple nodes in a network. For this type of communication, MPI is used.

A number of matrix formats are supported. These are CSR, BCSR, MCSR, COO, DIA, ELL, HYB, and DENSE.

Note

For CSR type matrices, the column indices must be sorted in increasing order. For COO matrices, the row indices must be sorted in increasing order. The function Check can be used to check whether a matrix contains valid data. For CSR and COO matrices, the function Sort can be used to sort the column or row indices, respectively.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>

Public Functions

explicit GlobalMatrix(const ParallelManager &pm)#

Initialize a global matrix with a parallel manager.

virtual int64_t GetM(void) const#

Return the number of rows in the global matrix.

virtual int64_t GetN(void) const#

Return the number of columns in the global matrix.

virtual int64_t GetNnz(void) const#

Return the number of non-zeros in the global matrix.

virtual int64_t GetLocalM(void) const#

Return the number of rows in the interior matrix.

virtual int64_t GetLocalN(void) const#

Return the number of columns in the interior matrix.

virtual int64_t GetLocalNnz(void) const#

Return the number of non-zeros in the interior matrix.

virtual int64_t GetGhostM(void) const#

Return the number of rows in the ghost matrix.

virtual int64_t GetGhostN(void) const#

Return the number of columns in the ghost matrix.

virtual int64_t GetGhostNnz(void) const#

Return the number of non-zeros in the ghost matrix.

unsigned int GetFormat(void) const#

Return the global matrix format id (see matrix_formats.hpp)

virtual void MoveToAccelerator(void)#

Move all data (i.e. move the part of the global matrix stored on this rank) to the accelerator.

virtual void MoveToHost(void)#

Move all data (i.e. move the part of the global matrix stored on this rank) to the host.

virtual void Info(void) const#

Shows simple info about the matrix.

virtual bool Check(void) const#

Perform a sanity check of the matrix.

Checks whether the matrix contains valid data, i.e. whether the values are neither infinite nor NaN (not a number) and whether the structure of the matrix is correct (e.g. indices cannot be negative, CSR and COO matrices have to be sorted, etc.).

Return values:
  • true – if the matrix is ok (empty matrix is also ok).

  • false – if there is something wrong with the structure or values.

void AllocateCSR(const std::string &name, int64_t local_nnz, int64_t ghost_nnz)#

Allocate CSR Matrix.

void AllocateCOO(const std::string &name, int64_t local_nnz, int64_t ghost_nnz)#

Allocate COO Matrix.

virtual void Clear(void)#

Clear (free) the matrix.

void SetParallelManager(const ParallelManager &pm)#

Set the parallel manager of a global matrix.

void SetDataPtrCSR(PtrType **local_row_offset, int **local_col, ValueType **local_val, PtrType **ghost_row_offset, int **ghost_col, ValueType **ghost_val, std::string name, int64_t local_nnz, int64_t ghost_nnz)#

Initialize a CSR matrix on the host with externally allocated data.

void SetDataPtrCOO(int **local_row, int **local_col, ValueType **local_val, int **ghost_row, int **ghost_col, ValueType **ghost_val, std::string name, int64_t local_nnz, int64_t ghost_nnz)#

Initialize a COO matrix on the host with externally allocated data.

void SetLocalDataPtrCSR(PtrType **row_offset, int **col, ValueType **val, std::string name, int64_t nnz)#

Initialize a CSR matrix on the host with externally allocated local data.

void SetLocalDataPtrCOO(int **row, int **col, ValueType **val, std::string name, int64_t nnz)#

Initialize a COO matrix on the host with externally allocated local data.

void SetGhostDataPtrCSR(PtrType **row_offset, int **col, ValueType **val, std::string name, int64_t nnz)#

Initialize a CSR matrix on the host with externally allocated ghost data.

void SetGhostDataPtrCOO(int **row, int **col, ValueType **val, std::string name, int64_t nnz)#

Initialize a COO matrix on the host with externally allocated ghost data.

void LeaveDataPtrCSR(PtrType **local_row_offset, int **local_col, ValueType **local_val, PtrType **ghost_row_offset, int **ghost_col, ValueType **ghost_val)#

Leave a CSR matrix to host pointers.

void LeaveDataPtrCOO(int **local_row, int **local_col, ValueType **local_val, int **ghost_row, int **ghost_col, ValueType **ghost_val)#

Leave a COO matrix to host pointers.

void LeaveLocalDataPtrCSR(PtrType **row_offset, int **col, ValueType **val)#

Leave a local CSR matrix to host pointers.

void LeaveLocalDataPtrCOO(int **row, int **col, ValueType **val)#

Leave a local COO matrix to host pointers.

void LeaveGhostDataPtrCSR(PtrType **row_offset, int **col, ValueType **val)#

Leave a CSR ghost matrix to host pointers.

void LeaveGhostDataPtrCOO(int **row, int **col, ValueType **val)#

Leave a COO ghost matrix to host pointers.

void CloneFrom(const GlobalMatrix<ValueType> &src)#

Clone the entire matrix (values, structure and backend descriptor) from another GlobalMatrix.

void CopyFrom(const GlobalMatrix<ValueType> &src)#

Copy matrix (values and structure) from another GlobalMatrix.

void ConvertToCSR(void)#

Convert the matrix to CSR structure.

void ConvertToMCSR(void)#

Convert the matrix to MCSR structure.

void ConvertToBCSR(int blockdim)#

Convert the matrix to BCSR structure.

void ConvertToCOO(void)#

Convert the matrix to COO structure.

void ConvertToELL(void)#

Convert the matrix to ELL structure.

void ConvertToDIA(void)#

Convert the matrix to DIA structure.

void ConvertToHYB(void)#

Convert the matrix to HYB structure.

void ConvertToDENSE(void)#

Convert the matrix to DENSE structure.

void ConvertTo(unsigned int matrix_format, int blockdim = 1)#

Convert the matrix to specified matrix ID format.

virtual void Apply(const GlobalVector<ValueType> &in, GlobalVector<ValueType> *out) const#

Perform matrix-vector multiplication, out = this * in;.

virtual void ApplyAdd(const GlobalVector<ValueType> &in, ValueType scalar, GlobalVector<ValueType> *out) const#

Perform matrix-vector multiplication and add the result to out, out = out + scalar * this * in;.

virtual void Transpose(void)#

Transpose the matrix.

void Transpose(GlobalMatrix<ValueType> *T) const#

Transpose the matrix and store the result in T.

void TripleMatrixProduct(const GlobalMatrix<ValueType> &R, const GlobalMatrix<ValueType> &A, const GlobalMatrix<ValueType> &P)#

Triple matrix product C=RAP.

void ReadFileMTX(const std::string &filename)#

Read matrix from MTX (Matrix Market Format) file.

void WriteFileMTX(const std::string &filename) const#

Write matrix to MTX (Matrix Market Format) file.

void ReadFileCSR(const std::string &filename)#

Read matrix from CSR (ROCALUTION binary format) file.

void WriteFileCSR(const std::string &filename) const#

Write matrix to CSR (ROCALUTION binary format) file.

void Sort(void)#

Sort the matrix indices.

Sorts the matrix by indices.

  • For CSR matrices, column values are sorted.

  • For COO matrices, row indices are sorted.
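For the CSR case, sorting means ordering the column indices inside each row while every value stays attached to its column. The following is a minimal standalone sketch on plain `std::vector` arrays, not the library's implementation; the function name `sort_csr_rows` and the ascending order are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: sort the column indices of every CSR row in ascending
// order and move the values along with them.
void sort_csr_rows(const std::vector<int64_t>& row_offset,
                   std::vector<int>&           col,
                   std::vector<double>&        val)
{
    assert(col.size() == val.size());

    for(std::size_t row = 0; row + 1 < row_offset.size(); ++row)
    {
        const int64_t begin = row_offset[row];
        const int64_t end   = row_offset[row + 1];

        // Order the positions of this row by their column index
        std::vector<int64_t> order(end - begin);
        std::iota(order.begin(), order.end(), begin);
        std::sort(order.begin(), order.end(),
                  [&](int64_t a, int64_t b) { return col[a] < col[b]; });

        // Gather columns and values in the new order, then write them back
        std::vector<int>    sorted_col;
        std::vector<double> sorted_val;
        for(int64_t idx : order)
        {
            sorted_col.push_back(col[idx]);
            sorted_val.push_back(val[idx]);
        }
        std::copy(sorted_col.begin(), sorted_col.end(), col.begin() + begin);
        std::copy(sorted_val.begin(), sorted_val.end(), val.begin() + begin);
    }
}
```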

void ExtractInverseDiagonal(GlobalVector<ValueType> *vec_inv_diag) const#

Extract the inverse (reciprocal) diagonal values of the matrix into a GlobalVector.

void Scale(ValueType alpha)#

Scale all the values in the matrix.

void InitialPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const#

Initial Pairwise Aggregation scheme.

void FurtherPairwiseAggregation(ValueType beta, int &nc, LocalVector<int> *G, int &Gsize, int **rG, int &rGsize, int ordering) const#

Further Pairwise Aggregation scheme.

void CoarsenOperator(GlobalMatrix<ValueType> *Ac, int nrow, int ncol, const LocalVector<int> &G, int Gsize, const int *rG, int rGsize) const#

Build coarse operator for pairwise aggregation scheme.

void CreateFromMap(const LocalVector<int> &map, int64_t n, int64_t m, GlobalMatrix<ValueType> *pro)#

Create a restriction and prolongation matrix operator based on an int vector map.

void RSCoarsening(float eps, LocalVector<int> *CFmap, LocalVector<bool> *S) const#

Ruge Stueben coarsening.

void RSPMISCoarsening(float eps, LocalVector<int> *CFmap, LocalVector<bool> *S) const#

Parallel maximal independent set coarsening for RS AMG.

void RSDirectInterpolation(const LocalVector<int> &CFmap, const LocalVector<bool> &S, GlobalMatrix<ValueType> *prolong) const#

Ruge Stueben Direct Interpolation.

void RSExtPIInterpolation(const LocalVector<int> &CFmap, const LocalVector<bool> &S, bool FF1, GlobalMatrix<ValueType> *prolong) const#

Ruge Stueben Ext+i Interpolation.

Local Vector#

template<typename ValueType>
class LocalVector : public rocalution::Vector<ValueType>#

LocalVector class.

A LocalVector is called local, because it will always stay on a single system. The system can contain several CPUs via UMA or NUMA memory system or it can contain an accelerator.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>


ValueType &operator[](int64_t i)#

Access operator (only for host data)

The elements in the vector can be accessed via [] operators, when the vector is allocated on the host.

Example
// rocALUTION local vector object
LocalVector<ValueType> vec;

// Allocate vector
vec.Allocate("my_vector", 100);

// Initialize vector with 1
vec.Ones();

// Set even elements to -1
for(int64_t i = 0; i < vec.GetSize(); i += 2)
{
  vec[i] = -1;
}

Parameters:

i[in] access data at index i

Returns:

value at index i

const ValueType &operator[](int64_t i) const#

Access operator (only for host data)

The elements in the vector can be accessed via [] operators, when the vector is allocated on the host.

Example
// rocALUTION local vector object
LocalVector<ValueType> vec;

// Allocate vector
vec.Allocate("my_vector", 100);

// Initialize vector with 1
vec.Ones();

// Set even elements to -1
for(int64_t i = 0; i < vec.GetSize(); i += 2)
{
  vec[i] = -1;
}

Parameters:

i[in] access data at index i

Returns:

value at index i


virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta)#

Perform scalar-vector multiplication and add another scaled vector (i.e. axpby), this = alpha * this + beta * x;.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

T alpha = 2.0;
T beta = -1.0;
y.ScaleAddScale(alpha, x, beta);

virtual void ScaleAddScale(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, int64_t src_offset, int64_t dst_offset, int64_t size)#

Perform scalar-vector multiplication and add another scaled vector (i.e. axpby) on a sub-range, this = alpha * this + beta * x; size elements are updated, starting at dst_offset in this and at src_offset in x.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

T alpha = 2.0;
T beta = -1.0;
y.ScaleAddScale(alpha, x, beta);

virtual void ScaleAdd2(ValueType alpha, const LocalVector<ValueType> &x, ValueType beta, const LocalVector<ValueType> &y, ValueType gamma)#

Perform vector update of type this = alpha*this + x*beta + y*gamma.
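The update formula can be illustrated element by element. This is a standalone sketch on `std::vector<double>`, not library code; the free function `scale_add2` is a hypothetical stand-in for the member function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the update: self = alpha*self + beta*x + gamma*y
void scale_add2(std::vector<double>&       self,
                double                     alpha,
                const std::vector<double>& x,
                double                     beta,
                const std::vector<double>& y,
                double                     gamma)
{
    assert(self.size() == x.size() && self.size() == y.size());

    for(std::size_t i = 0; i < self.size(); ++i)
    {
        self[i] = alpha * self[i] + beta * x[i] + gamma * y[i];
    }
}
```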


virtual ValueType InclusiveSum(void)#

Compute inclusive sum of vector.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T sum = y.InclusiveSum();

Given the starting vector this = [1, 1, 1, 1], the vector after the inclusive sum is this = [1, 2, 3, 4] and the function returns 4.

virtual ValueType InclusiveSum(const LocalVector<ValueType> &vec)#

Compute inclusive sum of vector.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T sum = y.InclusiveSum();

Given the starting vector this = [1, 1, 1, 1], the vector after the inclusive sum is this = [1, 2, 3, 4] and the function returns 4.


virtual ValueType ExclusiveSum(void)#

Compute exclusive sum of vector.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T sum = y.ExclusiveSum();

Given the starting vector this = [1, 1, 1, 1], the vector after the exclusive sum is this = [0, 1, 2, 3] and the function returns 3.

virtual ValueType ExclusiveSum(const LocalVector<ValueType> &vec)#

Compute exclusive sum of vector.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T sum = y.ExclusiveSum();

Given the starting vector this = [1, 1, 1, 1], the vector after the exclusive sum is this = [0, 1, 2, 3] and the function returns 3.
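The difference between the inclusive and the exclusive sum can be sketched with two short loops. This is a standalone model on `std::vector<double>`, not the library's implementation; the function names are hypothetical, and returning 0 for an empty vector is an assumption. Passing the same vector for both arguments gives the in-place behaviour.

```cpp
#include <cassert>
#include <vector>

// Inclusive prefix sum of src written to self; entry i is the sum of
// src[0..i]. Returns the last entry of the result.
double inclusive_sum(std::vector<double>& self, const std::vector<double>& src)
{
    assert(self.size() == src.size());

    double running = 0.0;
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        running += src[i];
        self[i] = running;
    }
    return self.empty() ? 0.0 : self.back();
}

// Exclusive prefix sum of src written to self; entry i is the sum of the
// entries before i. Returns the last entry of the result.
double exclusive_sum(std::vector<double>& self, const std::vector<double>& src)
{
    assert(self.size() == src.size());

    double running = 0.0;
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        const double old = src[i]; // read first, so self and src may alias
        self[i]          = running;
        running += old;
    }
    return self.empty() ? 0.0 : self.back();
}
```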


virtual void PointWiseMult(const LocalVector<ValueType> &x)#

Perform pointwise multiplication of vector.

Perform pointwise multiplication of vector components with the vector components of x, this = this * x. Alternatively, one can also perform pointwise multiplication of vector components of x with vector components of y and set that to the current ‘this’ vector, this = x * y.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

y.PointWiseMult(x);

virtual void PointWiseMult(const LocalVector<ValueType> &x, const LocalVector<ValueType> &y)#

Perform pointwise multiplication of vector.

Perform pointwise multiplication of vector components with the vector components of x, this = this * x. Alternatively, one can also perform pointwise multiplication of vector components of x with vector components of y and set that to the current ‘this’ vector, this = x * y.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

y.PointWiseMult(x);

Public Functions

virtual void MoveToAccelerator(void)#

Move all data (i.e. move the vector) to the accelerator.

virtual void MoveToAcceleratorAsync(void)#

Move all data (i.e. move the vector) to the accelerator asynchronously.

virtual void MoveToHost(void)#

Move all data (i.e. move the vector) to the host.

virtual void MoveToHostAsync(void)#

Move all data (i.e. move the vector) to the host asynchronously.

virtual void Sync(void)#

Synchronize the vector.

virtual void Info(void) const#

Shows simple info about the vector.

virtual int64_t GetSize(void) const#

Return the size of the vector.

virtual bool Check(void) const#

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. that the values are neither infinity nor NaN (not a number).

Return values:
  • true – if the vector is ok (empty vector is also ok).

  • false – if there is something wrong with the values.

void Allocate(std::string name, int64_t size)#

Allocate a local vector with name and size.

The local vector allocation function requires a name of the object (this is only for information purposes) and corresponding size description for vector objects.

Example
LocalVector<ValueType> vec;

vec.Allocate("my vector", 100);
vec.Clear();

Parameters:
  • name[in] object name

  • size[in] number of elements in the vector

void SetDataPtr(ValueType **ptr, std::string name, int64_t size)#

Initialize a LocalVector on the host with externally allocated data.

SetDataPtr has direct access to the raw data via pointers. Already allocated data can be set by passing the pointer.

Example
// Allocate vector
ValueType* ptr_vec = new ValueType[200];

// Fill vector
// ...

// rocALUTION local vector object
LocalVector<ValueType> vec;

// Set the vector data, ptr_vec will become invalid
vec.SetDataPtr(&ptr_vec, "my_vector", 200);

Note

Setting data pointer will leave the original pointer empty (set to NULL).

void LeaveDataPtr(ValueType **ptr)#

Leave a LocalVector to host pointers.

LeaveDataPtr has direct access to the raw data via pointers. A LocalVector object can leave its raw data to a host pointer. This will leave the LocalVector empty.

Example
// rocALUTION local vector object
LocalVector<ValueType> vec;

// Allocate the vector
vec.Allocate("my_vector", 100);

// Fill vector
// ...

ValueType* ptr_vec = NULL;

// Get (steal) the data from the vector, this will leave the local vector object empty
vec.LeaveDataPtr(&ptr_vec);
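The pointer hand-over performed by SetDataPtr() and LeaveDataPtr() can be modelled with a small owning type. This is a sketch under assumptions, not the library's code: `OwnedBuffer` is a hypothetical type, and the precondition that the output pointer is NULL before LeaveDataPtr() is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical owner type that mimics the pointer hand-over described above.
struct OwnedBuffer
{
    double* data = nullptr;
    int64_t size = 0;

    OwnedBuffer()                              = default;
    OwnedBuffer(const OwnedBuffer&)            = delete;
    OwnedBuffer& operator=(const OwnedBuffer&) = delete;
    ~OwnedBuffer() { delete[] data; }

    // Adopt *ptr without copying; the caller's pointer is set to nullptr.
    void SetDataPtr(double** ptr, int64_t n)
    {
        assert(ptr != nullptr && *ptr != nullptr && n > 0);
        delete[] data; // drop any previously owned buffer
        data = *ptr;
        size = n;
        *ptr = nullptr;
    }

    // Hand the buffer back to the caller; this object becomes empty.
    void LeaveDataPtr(double** ptr)
    {
        assert(ptr != nullptr && *ptr == nullptr);
        *ptr = data;
        data = nullptr;
        size = 0;
    }
};
```

After LeaveDataPtr() the caller owns the buffer again and is responsible for releasing it.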

virtual void Clear()#

Clear (free) the vector.

virtual void Zeros()#

Set the values of the vector to zero.

virtual void Ones()#

Set the values of the vector to one.

virtual void SetValues(ValueType val)#

Set the values of the vector to given argument.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1))#

Set the values of the vector to random uniformly distributed values between a and b (by default between -1 and 1)

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1))#

Set the values of the vector to random normally distributed values with the given mean and variance (by default mean 0 and variance 1)

virtual void ReadFileASCII(const std::string &filename)#

Read LocalVector from ASCII file.

virtual void WriteFileASCII(const std::string &filename) const#

Write LocalVector to ASCII file.

virtual void ReadFileBinary(const std::string &filename)#

Read LocalVector from binary file.

virtual void WriteFileBinary(const std::string &filename) const#

Write LocalVector to binary file.

virtual void CopyFrom(const LocalVector<ValueType> &src)#

Copy vector (values and structure) from another LocalVector.

virtual void CopyFromAsync(const LocalVector<ValueType> &src)#

Async copy from another local vector.

virtual void CopyFromFloat(const LocalVector<float> &src)#

Copy values from another local float vector.

virtual void CopyFromDouble(const LocalVector<double> &src)#

Copy values from another local double vector.

virtual void CopyFrom(const LocalVector<ValueType> &src, int64_t src_offset, int64_t dst_offset, int64_t size)#

Copy from another vector.

void CopyFromPermute(const LocalVector<ValueType> &src, const LocalVector<int> &permutation)#

Copy a vector under permutation (forward permutation)

void CopyFromPermuteBackward(const LocalVector<ValueType> &src, const LocalVector<int> &permutation)#

Copy a vector under permutation (backward permutation)
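The two permutation directions can be illustrated as scatter and gather. The convention used here (forward writes entry i to position permutation[i], backward reads entry i from position permutation[i]) is an assumption of this sketch, since this section does not state it; the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Forward: entry i of src moves to position perm[i] of the result.
std::vector<double> copy_permute(const std::vector<double>& src,
                                 const std::vector<int>&    perm)
{
    assert(src.size() == perm.size());

    std::vector<double> dst(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        dst[perm[i]] = src[i];
    }
    return dst;
}

// Backward: entry i of the result is taken from position perm[i] of src.
std::vector<double> copy_permute_backward(const std::vector<double>& src,
                                          const std::vector<int>&    perm)
{
    assert(src.size() == perm.size());

    std::vector<double> dst(src.size());
    for(std::size_t i = 0; i < src.size(); ++i)
    {
        dst[i] = src[perm[i]];
    }
    return dst;
}
```

Under this convention the backward permutation undoes the forward one when both use the same permutation vector.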

virtual void CloneFrom(const LocalVector<ValueType> &src)#

Clone the entire vector (values, structure and backend descriptor) from another LocalVector.

void CopyFromData(const ValueType *data)#

Copy (import) vector.

Copy (import) vector data that is described in one array (values). The object data has to be allocated with Allocate(), using the corresponding size of the data, first.

Parameters:

data[in] data to be imported.

void CopyFromHostData(const ValueType *data)#

Copy (import) vector from host data.

Copy (import) vector data that is described in one host array (values). The object data has to be allocated with Allocate(), using the corresponding size of the data, first.

Parameters:

data[in] data to be imported from host.

void CopyToData(ValueType *data) const#

Copy (export) vector.

Copy (export) vector data that is described in one array (values). The output array has to be allocated first, using the corresponding size of the data. The size can be obtained with GetSize().

Parameters:

data[out] exported data.

void CopyToHostData(ValueType *data) const#

Copy (export) vector to host data.

Copy (export) vector data that is described in one array (values). The output array has to be allocated on the host first, using the corresponding size of the data. The size can be obtained with GetSize().

Parameters:

data[out] exported data on host.

void Permute(const LocalVector<int> &permutation)#

Perform in-place permutation (forward) of the vector.

void PermuteBackward(const LocalVector<int> &permutation)#

Perform in-place permutation (backward) of the vector.

void Restriction(const LocalVector<ValueType> &vec_fine, const LocalVector<int> &map)#

Restriction operator based on restriction mapping vector.

void Prolongation(const LocalVector<ValueType> &vec_coarse, const LocalVector<int> &map)#

Prolongation operator based on restriction mapping vector.
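A mapping vector assigns every fine entry to one coarse entry. The sketch below shows one aggregation-style interpretation: restriction sums the fine entries of each aggregate and prolongation copies the coarse value back. This convention, the treatment of negative map entries as unmapped, and the function names are all assumptions, since this section does not spell them out.

```cpp
#include <cassert>
#include <vector>

// Restriction: every fine entry is added to the coarse entry it maps to.
std::vector<double> restrict_by_map(const std::vector<double>& fine,
                                    const std::vector<int>&    map,
                                    std::size_t                n_coarse)
{
    assert(fine.size() == map.size());

    std::vector<double> coarse(n_coarse, 0.0);
    for(std::size_t i = 0; i < fine.size(); ++i)
    {
        if(map[i] >= 0) // negative entries are treated as unmapped
        {
            assert(static_cast<std::size_t>(map[i]) < n_coarse);
            coarse[map[i]] += fine[i];
        }
    }
    return coarse;
}

// Prolongation: every fine entry copies the coarse entry it maps to.
std::vector<double> prolong_by_map(const std::vector<double>& coarse,
                                   const std::vector<int>&    map)
{
    std::vector<double> fine(map.size(), 0.0);
    for(std::size_t i = 0; i < map.size(); ++i)
    {
        if(map[i] >= 0)
        {
            assert(static_cast<std::size_t>(map[i]) < coarse.size());
            fine[i] = coarse[map[i]];
        }
    }
    return fine;
}
```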

virtual void AddScale(const LocalVector<ValueType> &x, ValueType alpha)#

Perform scalar-vector multiplication and add it to another vector, this = this + alpha * x;.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

T alpha = 2.0;
y.AddScale(x, alpha);

virtual void ScaleAdd(ValueType alpha, const LocalVector<ValueType> &x)#

Perform scalar-vector multiplication and add another vector, this = alpha * this + x;.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

T alpha = 2.0;
y.ScaleAdd(alpha, x);

virtual void Scale(ValueType alpha)#

Scale vector, this = alpha * this;.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T alpha = 2.0;
y.Scale(alpha);

virtual ValueType Dot(const LocalVector<ValueType> &x) const#

Perform dot product.

Perform dot product of ‘this’ vector and the vector x. In the case of complex types, this performs conjugate dot product.

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

y.Dot(x);

virtual ValueType DotNonConj(const LocalVector<ValueType> &x) const#

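Perform non-conjugate dot product.

Perform dot product of ‘this’ vector and the vector x without conjugation (relevant when ValueType is complex).

Example
// rocALUTION structures
LocalVector<T> x;
LocalVector<T> y;

// Allocate vectors
x.Allocate("x", 100);
y.Allocate("y", 100);

x.Ones();
y.Ones();

y.DotNonConj(x);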
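For complex types, the conjugate and the non-conjugate dot product give different results. The sketch below contrasts the two on `std::complex<double>`. Which operand is conjugated is not stated in this section; conjugating the first operand is an assumption of this sketch, and the function names are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <vector>

using cplx = std::complex<double>;

// Conjugate dot product: sum of conj(a[i]) * b[i].
// Conjugating the first operand is an assumption made here.
cplx dot_conj(const std::vector<cplx>& a, const std::vector<cplx>& b)
{
    assert(a.size() == b.size());

    cplx sum(0.0, 0.0);
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        sum += std::conj(a[i]) * b[i];
    }
    return sum;
}

// Non-conjugate dot product: sum of a[i] * b[i].
cplx dot_non_conj(const std::vector<cplx>& a, const std::vector<cplx>& b)
{
    assert(a.size() == b.size());

    cplx sum(0.0, 0.0);
    for(std::size_t i = 0; i < a.size(); ++i)
    {
        sum += a[i] * b[i];
    }
    return sum;
}
```

For real value types both functions return the same result.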

virtual ValueType Norm(void) const#

Compute L2 (Euclidean) norm of vector.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T norm2 = y.Norm();

virtual ValueType Reduce(void) const#

Reduce (sum) the vector components.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T sum = y.Reduce();

virtual ValueType Asum(void) const#

Compute absolute value sum of vector components.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

y.Scale(-1.0);
T sum = y.Asum();

virtual int64_t Amax(ValueType &value) const#

Compute maximum absolute value component of vector.

Example
// rocALUTION structures
LocalVector<T> y;

// Allocate vectors
y.Allocate("y", 100);

y.Ones();

T max;
int64_t index = y.Amax(max);
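The signature of Amax() returns an int64_t and takes the value by reference. A plausible reading, sketched below on `std::vector<double>`, is that the index of the entry with the largest absolute value is returned and that absolute value is written to the reference argument. This reading and the function name `amax` are assumptions of the sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Returns the index of the entry with the largest absolute value and writes
// that absolute value to `value`. The first such entry wins on ties.
int64_t amax(const std::vector<double>& v, double& value)
{
    assert(!v.empty());

    int64_t index = 0;
    value         = std::abs(v[0]);
    for(std::size_t i = 1; i < v.size(); ++i)
    {
        if(std::abs(v[i]) > value)
        {
            value = std::abs(v[i]);
            index = static_cast<int64_t>(i);
        }
    }
    return index;
}
```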

virtual void Power(double power)#

Take the power of each vector component.

void GetIndexValues(const LocalVector<int> &index, LocalVector<ValueType> *values) const#

Get indexed values.

void SetIndexValues(const LocalVector<int> &index, const LocalVector<ValueType> &values)#

Set indexed values.

void AddIndexValues(const LocalVector<int> &index, const LocalVector<ValueType> &values)#

Add indexed values.
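The three indexed operations correspond to gather, scatter and scatter-add. A standalone sketch on `std::vector`, with hypothetical free functions standing in for the member functions:

```cpp
#include <cassert>
#include <vector>

// Gather: values[i] = self[index[i]]
void get_index_values(const std::vector<double>& self,
                      const std::vector<int>&    index,
                      std::vector<double>&       values)
{
    values.resize(index.size());
    for(std::size_t i = 0; i < index.size(); ++i)
    {
        values[i] = self[index[i]];
    }
}

// Scatter: self[index[i]] = values[i]
void set_index_values(std::vector<double>&       self,
                      const std::vector<int>&    index,
                      const std::vector<double>& values)
{
    assert(index.size() == values.size());
    for(std::size_t i = 0; i < index.size(); ++i)
    {
        self[index[i]] = values[i];
    }
}

// Scatter-add: self[index[i]] += values[i]
void add_index_values(std::vector<double>&       self,
                      const std::vector<int>&    index,
                      const std::vector<double>& values)
{
    assert(index.size() == values.size());
    for(std::size_t i = 0; i < index.size(); ++i)
    {
        self[index[i]] += values[i];
    }
}
```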

void GetContinuousValues(int64_t start, int64_t end, ValueType *values) const#

Get continuous indexed values.

void SetContinuousValues(int64_t start, int64_t end, const ValueType *values)#

Set continuous indexed values.

void ExtractCoarseMapping(int64_t start, int64_t end, const int *index, int nc, int *size, int *map) const#

Extract coarse boundary mapping.

void ExtractCoarseBoundary(int64_t start, int64_t end, const int *index, int nc, int *size, int *boundary) const#

Extract coarse boundary index.

void Sort(LocalVector<ValueType> *sorted, LocalVector<int> *perm = NULL) const#

Out-of-place radix sort that can also obtain the permutation.
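An out-of-place sort that also reports the permutation can be sketched as follows. Note that this sketch uses `std::stable_sort` on an index array rather than a radix sort. The ascending order and the convention sorted[i] = in[perm[i]] are assumptions, and `sort_with_perm` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sort `in` into `sorted` (ascending). If perm is not null, it receives the
// permutation such that sorted[i] == in[(*perm)[i]].
void sort_with_perm(const std::vector<double>& in,
                    std::vector<double>&       sorted,
                    std::vector<int>*          perm = nullptr)
{
    assert(&in != &sorted); // out-of-place: input and output must differ

    // Sort the indices by the values they refer to
    std::vector<int> order(in.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return in[a] < in[b]; });

    sorted.resize(in.size());
    for(std::size_t i = 0; i < in.size(); ++i)
    {
        sorted[i] = in[order[i]];
    }

    if(perm != nullptr)
    {
        *perm = order;
    }
}
```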

Global Vector#

template<typename ValueType>
class GlobalVector : public rocalution::Vector<ValueType>#

GlobalVector class.

A GlobalVector is called global, because it can stay on a single or on multiple nodes in a network. For this type of communication, MPI is used.

Template Parameters:

ValueType – can be int, float, double, std::complex<float> or std::complex<double>

Public Functions

explicit GlobalVector(const ParallelManager &pm)#

Initialize a global vector with a parallel manager.

virtual void MoveToAccelerator(void)#

Move all data (i.e. move the part of the global vector stored on this rank) to the accelerator.

virtual void MoveToHost(void)#

Move all data (i.e. move the part of the global vector stored on this rank) to the host.

virtual void Info(void) const#

Shows simple info about the vector.

virtual bool Check(void) const#

Perform a sanity check of the vector.

Checks whether the vector contains valid data, i.e. that the values are neither infinity nor NaN (not a number).

Return values:
  • true – if the vector is ok (empty vector is also ok).

  • false – if there is something wrong with the values.

virtual int64_t GetSize(void) const#

Return the size of the global vector.

virtual int64_t GetLocalSize(void) const#

Return the size of the interior part of the global vector.

virtual void Allocate(std::string name, int64_t size)#

Allocate a global vector with name and size.

virtual void Clear(void)#

Clear (free) the vector.

void SetParallelManager(const ParallelManager &pm)#

Set the parallel manager of a global vector.

virtual void Zeros(void)#

Set all vector interior values to zero.

virtual void Ones(void)#

Set all vector interior values to ones.

virtual void SetValues(ValueType val)#

Set the values of the interior vector to given argument.

virtual void SetRandomUniform(unsigned long long seed, ValueType a = static_cast<ValueType>(-1), ValueType b = static_cast<ValueType>(1))#

Set the values of the interior vector to random uniformly distributed values between a and b (by default between -1 and 1)

virtual void SetRandomNormal(unsigned long long seed, ValueType mean = static_cast<ValueType>(0), ValueType var = static_cast<ValueType>(1))#

Set the values of the interior vector to random normally distributed values with the given mean and variance (by default mean 0 and variance 1)

virtual void CloneFrom(const GlobalVector<ValueType> &src)#

Clone the entire vector (values, structure and backend descriptor) from another GlobalVector.

ValueType &operator[](int64_t i)#

Access operator (only for host data)

const ValueType &operator[](int64_t i) const#

Access operator (only for host data)

void SetDataPtr(ValueType **ptr, std::string name, int64_t size)#

Initialize the local part of a global vector with externally allocated data.

void LeaveDataPtr(ValueType **ptr)#

Get a pointer to the data from the local part of a global vector and free the global vector object.

virtual void CopyFrom(const GlobalVector<ValueType> &src)#

Copy vector (values and structure) from another GlobalVector.

virtual void ReadFileASCII(const std::string &filename)#

Read GlobalVector from ASCII file. This method reads the current rank's interior vector from the file.

virtual void WriteFileASCII(const std::string &filename) const#

Write GlobalVector to ASCII file. This method writes the current rank's interior vector to the file.

virtual void ReadFileBinary(const std::string &filename)#

Read GlobalVector from binary file. This method reads the current rank's interior vector from the file.

virtual void WriteFileBinary(const std::string &filename) const#

Write GlobalVector to binary file. This method writes the current rank's interior vector to the file.

virtual void AddScale(const GlobalVector<ValueType> &x, ValueType alpha)#

Perform scalar-vector multiplication and add it to another vector, this = this + alpha * x;.

virtual void ScaleAdd(ValueType alpha, const GlobalVector<ValueType> &x)#

Perform scalar-vector multiplication and add another vector, this = alpha * this + x;.

virtual void ScaleAdd2(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta, const GlobalVector<ValueType> &y, ValueType gamma)#

Perform vector update of type this = alpha*this + x*beta + y*gamma.

virtual void ScaleAddScale(ValueType alpha, const GlobalVector<ValueType> &x, ValueType beta)#

Perform scalar-vector multiplication and add another scaled vector (i.e. axpby), this = alpha * this + beta * x;.

virtual void Scale(ValueType alpha)#

Scale vector, this = alpha * this;.

virtual ValueType Dot(const GlobalVector<ValueType> &x) const#

Perform dot product.

virtual ValueType DotNonConj(const GlobalVector<ValueType> &x) const#

Perform non-conjugate dot product (relevant when ValueType is complex).

virtual ValueType Norm(void) const#

Compute L2 (Euclidean) norm of vector.

virtual ValueType Reduce(void) const#

Reduce (sum) the vector components.

virtual ValueType InclusiveSum(void)#

Compute inclusive sum of vector.

virtual ValueType InclusiveSum(const GlobalVector<ValueType> &vec)#

Compute inclusive sum of vector.

virtual ValueType ExclusiveSum(void)#

Compute exclusive sum of vector.

virtual ValueType ExclusiveSum(const GlobalVector<ValueType> &vec)#

Compute exclusive sum of vector.

virtual ValueType Asum(void) const#

Compute absolute value sum of vector components.

virtual int64_t Amax(ValueType &value) const#

Compute maximum absolute value component of vector.

virtual void PointWiseMult(const GlobalVector<ValueType> &x)#

Perform pointwise multiplication of vector.

virtual void PointWiseMult(const GlobalVector<ValueType> &x, const GlobalVector<ValueType> &y)#

Perform pointwise multiplication of vector.

virtual void Power(double power)#

Take the power of each vector component.

void Restriction(const GlobalVector<ValueType> &vec_fine, const LocalVector<int> &map)#

Restriction operator based on restriction mapping vector.

void Prolongation(const GlobalVector<ValueType> &vec_coarse, const LocalVector<int> &map)#

Prolongation operator based on restriction mapping vector.

Base Classes#

template<typename ValueType>
class BaseMatrix#
template<typename ValueType>
class BaseStencil#
template<typename ValueType>
class BaseVector#
template<typename ValueType>
class HostMatrix#
template<typename ValueType>
class HostStencil#
template<typename ValueType>
class HostVector#
template<typename ValueType>
class AcceleratorMatrix#
template<typename ValueType>
class AcceleratorStencil#
template<typename ValueType>
class AcceleratorVector#

Parallel Manager#

class ParallelManager : public rocalution::RocalutionObj#

Parallel Manager class.

The parallel manager class handles the communication and the mapping of the global operators. Each global operator and vector needs to be initialized with a valid parallel manager in order to perform any operation. For many distributed simulations, the underlying operator is already distributed. This information needs to be passed to the parallel manager.

Public Functions

void SetMPICommunicator(const void *comm)#

Set the MPI communicator.

void Clear(void)#

Clear all allocated resources.

inline const void *GetComm(void) const#

Return communicator.

inline int GetRank(void) const#

Return rank.

int64_t GetGlobalNrow(void) const#

Return the global number of rows.

int64_t GetGlobalNcol(void) const#

Return the global number of columns.

int64_t GetLocalNrow(void) const#

Return the local number of rows.

int64_t GetLocalNcol(void) const#

Return the local number of columns.

int GetNumReceivers(void) const#

Return the number of receivers.

int GetNumSenders(void) const#

Return the number of senders.

int GetNumProcs(void) const#

Return the number of involved processes.

int64_t GetGlobalRowBegin(int rank = -1) const#

Return the global row begin.

int64_t GetGlobalRowEnd(int rank = -1) const#

Return the global row end.

int64_t GetGlobalColumnBegin(int rank = -1) const#

Return the global column begin.

int64_t GetGlobalColumnEnd(int rank = -1) const#

Return the global column end.

void SetGlobalNrow(int64_t nrow)#

Initialize the global number of rows.

void SetGlobalNcol(int64_t ncol)#

Initialize the global number of columns.

void SetLocalNrow(int64_t nrow)#

Initialize the local number of rows.

void SetLocalNcol(int64_t ncol)#

Initialize the local number of columns.

void SetBoundaryIndex(int size, const int *index)#

Set all boundary indices of this rank's process.

const int *GetBoundaryIndex(void) const#

Get all boundary indices of this rank's process.

const int64_t *GetGhostToGlobalMap(void) const#

Get ghost to global mapping for this rank.

void SetReceivers(int nrecv, const int *recvs, const int *recv_offset)#

Set the number of processes the current process receives data from, the array of those processes, and the offsets where the boundary for each process ‘receiver’ starts.

void SetSenders(int nsend, const int *sends, const int *send_offset)#

Set the number of processes the current process sends data to, the array of those processes, and the offsets where the ghost part for each process ‘sender’ starts.

void LocalToGlobal(int proc, int local, int &global)#

Mapping local to global.

void GlobalToLocal(int global, int &proc, int &local)#

Mapping global to local.

bool Status(void) const#

Check sanity status of parallel manager.

void ReadFileASCII(const std::string &filename)#

Read file that contains all relevant parallel manager data.

void WriteFileASCII(const std::string &filename) const#

Write file that contains all relevant parallel manager data.

Solvers#

template<class OperatorType, class VectorType, typename ValueType>
class Solver : public rocalution::RocalutionObj#

Base class for all solvers and preconditioners.

Most solvers can be applied to the linear operators LocalMatrix, LocalStencil and GlobalMatrix, i.e. they can run locally (on a shared memory system) or in a distributed manner (on a cluster) via MPI. The only exception is the AMG (Algebraic Multigrid) solver, which has two versions (one for the LocalMatrix and one for the GlobalMatrix class). The only purely local solvers (which do not support global/MPI operations) are the mixed-precision defect-correction solver and all direct solvers.

All solvers need three template parameters: the operator type, the vector type and the scalar type.

The Solver class is purely virtual and provides an interface for

  • SetOperator() to set the operator \(A\), i.e. the user can pass the matrix here.

  • Build() to build the solver (including preconditioners, sub-solvers, etc.). The user needs to specify the operator first before calling Build().

  • Solve() to solve the system \(Ax = b\). The user needs to pass a right-hand-side \(b\) and a vector \(x\), where the solution will be obtained.

  • Print() to show solver information.

  • ReBuildNumeric() to only re-build the solver numerically (if possible).

  • MoveToHost() and MoveToAccelerator() to offload the solver (including preconditioners and sub-solvers) to the host/accelerator.
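The call order implied by this interface (set the operator, build, then solve) can be illustrated with a toy solver. `DiagonalSolver` below is a hypothetical miniature for diagonal operators stored as a `std::vector<double>`; it only mirrors the method names listed above and is not part of the library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the interface: a solver for diagonal operators.
class DiagonalSolver
{
public:
    // Remember the operator; the caller keeps it alive while the solver is used.
    void SetOperator(const std::vector<double>& diag)
    {
        op_    = &diag;
        built_ = false;
    }

    // Precompute the reciprocal of every diagonal entry.
    void Build()
    {
        assert(op_ != nullptr); // the operator must be set before Build()
        inv_.clear();
        for(double d : *op_)
        {
            inv_.push_back(1.0 / d);
        }
        built_ = true;
    }

    // Solve diag * x = rhs.
    void Solve(const std::vector<double>& rhs, std::vector<double>* x) const
    {
        assert(built_ && x != nullptr && rhs.size() == inv_.size());
        x->resize(rhs.size());
        for(std::size_t i = 0; i < rhs.size(); ++i)
        {
            (*x)[i] = inv_[i] * rhs[i];
        }
    }

    // Free all data held by the solver.
    void Clear()
    {
        inv_.clear();
        built_ = false;
        op_    = nullptr;
    }

private:
    const std::vector<double>* op_ = nullptr;
    std::vector<double>        inv_;
    bool                       built_ = false;
};
```

Splitting Build() from Solve() lets the setup cost be paid once when the same operator is solved against several right-hand sides.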

Template Parameters:

Subclassed by rocalution::IterativeLinearSolver< OperatorTypeH, VectorTypeH, ValueTypeH >, rocalution::DirectLinearSolver< OperatorType, VectorType, ValueType >, rocalution::IterativeLinearSolver< OperatorType, VectorType, ValueType >, rocalution::Preconditioner< OperatorType, VectorType, ValueType >

Public Functions

void SetOperator(const OperatorType &op)#

Set the Operator of the solver.

virtual void ResetOperator(const OperatorType &op)#

Reset the operator; see ReBuildNumeric()

virtual void Print(void) const = 0#

Print information about the solver.

virtual void Solve(const VectorType &rhs, VectorType *x) = 0#

Solve Operator x = rhs.

virtual void SolveZeroSol(const VectorType &rhs, VectorType *x)#

Solve Operator x = rhs, setting initial x = 0.

virtual void Clear(void)#

Clear (free all local data) the solver.

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void BuildMoveToAcceleratorAsync(void)#

Build the solver and move it to the accelerator asynchronously.

virtual void Sync(void)#

Synchronize the solver.

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void MoveToHost(void)#

Move all data (i.e. move the solver) to the host.

virtual void MoveToAccelerator(void)#

Move all data (i.e. move the solver) to the accelerator.

virtual void Verbose(int verb = 1)#

Provide verbose output of the solver.

  • verb = 0 -> no output

  • verb = 1 -> print info about the solver (start, end)

  • verb = 2 -> print (iter, residual) via iteration control

inline void FlagPrecond(void)#

Mark this solver as being a preconditioner.

inline void FlagSmoother(void)#

Mark this solver as being a smoother.

Iterative Linear Solvers#

template<class OperatorType, class VectorType, typename ValueType>
class IterativeLinearSolver : public rocalution::Solver<OperatorType, VectorType, ValueType>#

Base class for all linear iterative solvers.

The iterative solvers are controlled by an iteration control object, which monitors the convergence properties of the solver, i.e. maximum number of iterations, relative tolerance, absolute tolerance and divergence tolerance. The iteration control can also record the residual history and store it in an ASCII file.

All iterative solvers are controlled based on

  • Absolute stopping criteria, when \(|r_{k}|_{L_{p}} < \epsilon_{abs}\)

  • Relative stopping criteria, when \(|r_{k}|_{L_{p}} / |r_{1}|_{L_{p}} \leq \epsilon_{rel}\)

  • Divergence stopping criteria, when \(|r_{k}|_{L_{p}} / |r_{1}|_{L_{p}} \geq \epsilon_{div}\)

  • Maximum number of iteration \(N\), when \(k = N\)

where \(k\) is the current iteration, \(r_{k}\) the residual for the current iteration \(k\) (i.e. \(r_{k} = b - Ax_{k}\)) and \(r_{1}\) the starting residual (i.e. \(r_{1} = b - Ax_{init}\)). In addition, a minimum number of iterations \(M\) can be specified. In this case, the solver will not stop iterating before \(k \geq M\).

The \(L_{p}\) norm is used for the computation, where \(p\) can be 1, 2 or \(\infty\). The norm is selected with SetResidualNorm(), passing 1 for \(L_{1}\), 2 for \(L_{2}\) and 3 for \(L_{\infty}\). When \(L_{\infty}\) is selected, the index of the maximum value can be obtained with GetAmaxResidualIndex(). If this function is called and \(L_{\infty}\) was not selected, it returns -1.
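The norm selection described above can be illustrated with a small standalone sketch. It is not rocALUTION code: `NormResult` and `residual_norm` are hypothetical names, and the residual is assumed to be a plain `std::vector<double>`. Only the 1/2/3 encoding and the -1 return for a non-\(L_{\infty}\) norm are taken from the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of rocALUTION).
struct NormResult
{
    double       value;      // value of the selected norm
    std::int64_t amax_index; // index of max |r_i|; -1 unless L_inf was selected
};

// resnorm uses the encoding described for SetResidualNorm():
// 1 = L1, 2 = L2, 3 = L_inf.
NormResult residual_norm(const std::vector<double>& r, int resnorm)
{
    assert(resnorm >= 1 && resnorm <= 3);
    NormResult out{0.0, -1};

    for(std::size_t i = 0; i < r.size(); ++i)
    {
        const double a = std::fabs(r[i]);
        if(resnorm == 1)
            out.value += a; // sum of absolute values
        else if(resnorm == 2)
            out.value += a * a; // sum of squares, square root taken below
        else if(out.amax_index < 0 || a > out.value)
        {
            out.value      = a; // running maximum and its position
            out.amax_index = static_cast<std::int64_t>(i);
        }
    }
    if(resnorm == 2)
        out.value = std::sqrt(out.value);
    return out;
}
```

For the residual (3, -4), this sketch yields 7 for \(L_{1}\), 5 for \(L_{2}\), and 4 with index 1 for \(L_{\infty}\).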

The criterion that was reached can be queried with GetSolverStatus(), which returns

  • 0, if no criterion has been reached yet

  • 1, if absolute tolerance has been reached

  • 2, if relative tolerance has been reached

  • 3, if divergence tolerance has been reached

  • 4, if the maximum number of iterations has been reached
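As a rough illustration of how the stopping criteria and status codes fit together, the following standalone sketch maps a residual norm to a status value. `IterationSettings` and `check_status` are hypothetical names, and the order in which the criteria are tested is an assumption; the library's iteration control may evaluate them differently.

```cpp
#include <cassert>

// Hypothetical settings container (not a rocALUTION type).
struct IterationSettings
{
    double abs_tol;  // epsilon_abs
    double rel_tol;  // epsilon_rel
    double div_tol;  // epsilon_div
    int    min_iter; // M
    int    max_iter; // N
};

// Returns a status code in the style documented for GetSolverStatus().
// k is the current iteration, res the current residual norm and
// init_res the norm of the starting residual.
int check_status(const IterationSettings& s, int k, double res, double init_res)
{
    assert(init_res > 0.0);

    if(k < s.min_iter)
        return 0; // never stop before the minimum iteration count
    if(res < s.abs_tol)
        return 1; // absolute tolerance reached
    if(res / init_res <= s.rel_tol)
        return 2; // relative tolerance reached
    if(res / init_res >= s.div_tol)
        return 3; // divergence detected
    if(k >= s.max_iter)
        return 4; // iteration budget exhausted
    return 0;     // keep iterating
}
```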

Template Parameters:

Subclassed by rocalution::BaseMultiGrid< OperatorType, VectorType, ValueType >, rocalution::BiCGStab< OperatorType, VectorType, ValueType >, rocalution::BiCGStabl< OperatorType, VectorType, ValueType >, rocalution::CG< OperatorType, VectorType, ValueType >, rocalution::CR< OperatorType, VectorType, ValueType >, rocalution::Chebyshev< OperatorType, VectorType, ValueType >, rocalution::FCG< OperatorType, VectorType, ValueType >, rocalution::FGMRES< OperatorType, VectorType, ValueType >, rocalution::FixedPoint< OperatorType, VectorType, ValueType >, rocalution::GMRES< OperatorType, VectorType, ValueType >, rocalution::IDR< OperatorType, VectorType, ValueType >, rocalution::QMRCGStab< OperatorType, VectorType, ValueType >

Public Functions

void Init(double abs_tol, double rel_tol, double div_tol, int max_iter)#

Initialize the solver with absolute/relative/divergence tolerance and maximum number of iterations.

void Init(double abs_tol, double rel_tol, double div_tol, int min_iter, int max_iter)#

Initialize the solver with absolute/relative/divergence tolerance and minimum/maximum number of iterations.

void InitMinIter(int min_iter)#

Set the minimum number of iterations.

void InitMaxIter(int max_iter)#

Set the maximum number of iterations.

void InitTol(double abs, double rel, double div)#

Set the absolute/relative/divergence tolerance.

void SetResidualNorm(int resnorm)#

Set the residual norm to \(L_1\), \(L_2\) or \(L_\infty\) norm.

  • resnorm = 1 -> \(L_1\) norm

  • resnorm = 2 -> \(L_2\) norm

  • resnorm = 3 -> \(L_\infty\) norm

void RecordResidualHistory(void)#

Record the residual history.

void RecordHistory(const std::string &filename) const#

Write the history to file.

virtual void Verbose(int verb = 1)#

Set the solver verbosity output.

virtual void Solve(const VectorType &rhs, VectorType *x)#

Solve Operator x = rhs.

virtual void SetPreconditioner(Solver<OperatorType, VectorType, ValueType> &precond)#

Set a preconditioner of the linear solver.

virtual int GetIterationCount(void)#

Return the iteration count.

virtual double GetCurrentResidual(void)#

Return the current residual.

virtual int GetSolverStatus(void)#

Return the current status.

virtual int64_t GetAmaxResidualIndex(void)#

Return the index of the residual vector entry with the largest absolute value when using the \(L_\infty\) norm.

template<class OperatorType, class VectorType, typename ValueType>
class FixedPoint : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Fixed-Point Iteration Scheme.

The Fixed-Point iteration scheme is based on additive splitting of the matrix \(A = M + N\). The scheme reads

\[ x_{k+1} = M^{-1} (b - N x_{k}). \]
It can also be reformulated as a weighted defect correction scheme
\[ x_{k+1} = x_{k} - \omega M^{-1} (Ax_{k} - b). \]
The inversion of \(M\) can be performed by preconditioners (Jacobi, Gauss-Seidel, ILU, etc.) or by any type of solvers.
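The weighted defect correction form can be sketched in a few lines of standalone C++. This is an illustration only, not the library implementation: it assumes a small dense matrix stored as nested `std::vector`s and picks the Jacobi splitting \(M = \mathrm{diag}(A)\); `relaxed_jacobi` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // small dense matrix, for illustration only

// Weighted defect correction x_{k+1} = x_k - omega * M^{-1} (A x_k - b),
// with the assumed splitting M = diag(A) (Jacobi), starting from x_0 = 0.
Vec relaxed_jacobi(const Mat& A, const Vec& b, double omega, int iters)
{
    const std::size_t n = b.size();
    assert(A.size() == n);

    Vec x(n, 0.0), d(n, 0.0);
    for(int k = 0; k < iters; ++k)
    {
        // d = M^{-1} (A x_k - b)
        for(std::size_t i = 0; i < n; ++i)
        {
            assert(std::fabs(A[i][i]) > 0.0);
            double Ax = 0.0;
            for(std::size_t j = 0; j < n; ++j)
                Ax += A[i][j] * x[j];
            d[i] = (Ax - b[i]) / A[i][i];
        }
        // x_{k+1} = x_k - omega * d
        for(std::size_t i = 0; i < n; ++i)
            x[i] -= omega * d[i];
    }
    return x;
}
```

With \(\omega = 1\) this reduces to the plain scheme \(x_{k+1} = M^{-1}(b - Nx_{k})\), since \(A = M + N\).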

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

void SetRelaxation(ValueType omega)#

Set relaxation parameter \(\omega\).

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

virtual void SolveZeroSol(const VectorType &rhs, VectorType *x)#

Solve Operator x = rhs, setting initial x = 0.

template<class OperatorTypeH, class VectorTypeH, typename ValueTypeH, class OperatorTypeL, class VectorTypeL, typename ValueTypeL>
class MixedPrecisionDC : public rocalution::IterativeLinearSolver<OperatorTypeH, VectorTypeH, ValueTypeH>#

Mixed-Precision Defect Correction Scheme.

The Mixed-Precision solver is based on a defect-correction scheme. The current implementation of the library uses host-based correction in double precision and accelerator computation in single precision. The solver implements the scheme

\[ x_{k+1} = x_{k} + A^{-1} r_{k}, \]
where the computation of the residual \(r_{k} = b - Ax_{k}\) and the update \(x_{k+1} = x_{k} + d_{k}\) are performed on the host in double precision. The solution of the residual system \(Ad_{k} = r_{k}\) is performed on the accelerator in single precision. In addition to the setup functions of the iterative solver, the user needs to specify the inner (\(Ad_{k} = r_{k}\)) solver.
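The split between double and single precision can be illustrated with a standalone, host-only sketch. It makes several assumptions that are not part of the library: the matrix is small and dense, the inner solver is a fixed number of Jacobi sweeps carried out in `float`, and `mixed_precision_dc` is a hypothetical name. No accelerator is involved here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using VecD = std::vector<double>;
using VecF = std::vector<float>;

// Defect correction: residual and update in double, inner solve in float.
VecD mixed_precision_dc(const std::vector<VecD>& A, const VecD& b,
                        int outer_iters, int inner_sweeps)
{
    const std::size_t n = b.size();
    assert(A.size() == n);

    // Single-precision copy of the operator for the inner solver.
    std::vector<VecF> Af(n, VecF(n));
    for(std::size_t i = 0; i < n; ++i)
        for(std::size_t j = 0; j < n; ++j)
            Af[i][j] = static_cast<float>(A[i][j]);

    VecD x(n, 0.0);
    VecF r(n), d(n), d_next(n);
    for(int k = 0; k < outer_iters; ++k)
    {
        // r_k = b - A x_k, accumulated in double, then demoted to float.
        for(std::size_t i = 0; i < n; ++i)
        {
            double acc = b[i];
            for(std::size_t j = 0; j < n; ++j)
                acc -= A[i][j] * x[j];
            r[i] = static_cast<float>(acc);
        }
        // Inner solve A d_k = r_k in float (Jacobi sweeps starting at d = 0).
        std::fill(d.begin(), d.end(), 0.0f);
        for(int s = 0; s < inner_sweeps; ++s)
        {
            for(std::size_t i = 0; i < n; ++i)
            {
                float acc = r[i];
                for(std::size_t j = 0; j < n; ++j)
                    if(j != i)
                        acc -= Af[i][j] * d[j];
                d_next[i] = acc / Af[i][i];
            }
            d.swap(d_next);
        }
        // x_{k+1} = x_k + d_k, accumulated in double.
        for(std::size_t i = 0; i < n; ++i)
            x[i] += static_cast<double>(d[i]);
    }
    return x;
}
```

Although each correction \(d_{k}\) is only accurate to single precision, the double-precision residual lets the outer loop recover a solution close to double-precision accuracy.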

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

void Set(Solver<OperatorTypeL, VectorTypeL, ValueTypeL> &Solver_L)#

Set the inner solver for \(Ad_{k} = r_{k}\).

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class Chebyshev : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Chebyshev Iteration Scheme.

The Chebyshev Iteration scheme (also known as acceleration scheme) is similar to the CG method but requires the minimum and maximum eigenvalues of the operator. [1]
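To show where the two eigenvalue bounds enter, here is a standalone sketch of an unpreconditioned Chebyshev iteration in its textbook three-term form. It is not the library's implementation: it assumes a small dense symmetric positive definite matrix, a zero initial guess, and bounds with \(0 < \lambda_{min} < \lambda_{max}\); `chebyshev`, `matvec` and the other names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // small dense matrix, for illustration only

Vec matvec(const Mat& A, const Vec& v)
{
    Vec out(v.size(), 0.0);
    for(std::size_t i = 0; i < v.size(); ++i)
        for(std::size_t j = 0; j < v.size(); ++j)
            out[i] += A[i][j] * v[j];
    return out;
}

// Chebyshev iteration; lambda_min and lambda_max bound the spectrum of A.
Vec chebyshev(const Mat& A, const Vec& b, double lambda_min, double lambda_max, int iters)
{
    assert(0.0 < lambda_min && lambda_min < lambda_max);
    const std::size_t n = b.size();

    const double theta = 0.5 * (lambda_max + lambda_min); // center of the interval
    const double delta = 0.5 * (lambda_max - lambda_min); // half width
    const double sigma = theta / delta;

    Vec x(n, 0.0), r = b, d(n);
    double rho = 1.0 / sigma;
    for(std::size_t i = 0; i < n; ++i)
        d[i] = r[i] / theta; // first step is a Richardson step

    for(int k = 0; k < iters; ++k)
    {
        const Vec Ad = matvec(A, d);
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += d[i];
            r[i] -= Ad[i];
        }
        const double rho_new = 1.0 / (2.0 * sigma - rho);
        for(std::size_t i = 0; i < n; ++i)
            d[i] = rho_new * rho * d[i] + (2.0 * rho_new / delta) * r[i];
        rho = rho_new;
    }
    return x;
}
```

Unlike CG, the loop needs no inner products; all coefficients follow from the two bounds.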

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

void Set(ValueType lambda_min, ValueType lambda_max)#

Set the minimum and maximum eigenvalues of the operator.

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

Krylov Subspace Solvers#

template<class OperatorType, class VectorType, typename ValueType>
class BiCGStab : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Bi-Conjugate Gradient Stabilized Method.

The Bi-Conjugate Gradient Stabilized method is a variation of CGS and solves sparse (non-)symmetric linear systems \(Ax=b\). [11]
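For orientation, the textbook unpreconditioned BiCGStab recurrence can be sketched as follows. This is a standalone illustration on a small dense matrix, not the library code; `bicgstab`, `matvec` and `dot` are hypothetical names, the initial guess is zero, and breakdown situations (such as a vanishing inner product with the shadow residual) are not handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // small dense matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& v)
{
    Vec out(v.size(), 0.0);
    for(std::size_t i = 0; i < v.size(); ++i)
        for(std::size_t j = 0; j < v.size(); ++j)
            out[i] += A[i][j] * v[j];
    return out;
}

Vec bicgstab(const Mat& A, const Vec& b, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);

    Vec x(n, 0.0), r = b, r_hat = b; // r_hat: fixed shadow residual
    Vec p(n, 0.0), v(n, 0.0), s(n, 0.0), t;
    double rho = 1.0, alpha = 1.0, omega = 1.0;

    for(int k = 0; k < max_iter && std::sqrt(dot(r, r)) >= tol; ++k)
    {
        const double rho_new = dot(r_hat, r);
        const double beta    = (rho_new / rho) * (alpha / omega);
        for(std::size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * (p[i] - omega * v[i]);

        v     = matvec(A, p);
        alpha = rho_new / dot(r_hat, v);
        for(std::size_t i = 0; i < n; ++i)
            s[i] = r[i] - alpha * v[i];

        // Stabilizing step: omega minimizes the norm of s - omega * A s.
        t               = matvec(A, s);
        const double tt = dot(t, t);
        omega           = (tt > 0.0) ? dot(t, s) / tt : 0.0;

        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i] + omega * s[i];
            r[i] = s[i] - omega * t[i];
        }
        rho = rho_new;
    }
    return x;
}
```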

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class BiCGStabl : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Bi-Conjugate Gradient Stabilized (l) Method.

The Bi-Conjugate Gradient Stabilized (l) method is a generalization of BiCGStab for solving sparse (non-)symmetric linear systems \(Ax=b\). It minimizes residuals over \(l\)-dimensional Krylov subspaces. The degree \(l\) can be set with SetOrder(). [4]

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

virtual void SetOrder(int l)#

Set the order.

template<class OperatorType, class VectorType, typename ValueType>
class CG : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Conjugate Gradient Method.

The Conjugate Gradient method is the best known iterative method for solving sparse symmetric positive definite (SPD) linear systems \(Ax=b\). It is based on orthogonal projection onto the Krylov subspace \(\mathcal{K}_{m}(r_{0}, A)\), where \(r_{0}\) is the initial residual. The method can be preconditioned, where the approximation should also be SPD. [11]
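The textbook unpreconditioned CG recurrence is short enough to show in full. The following is a standalone sketch on a small dense SPD matrix with a zero initial guess, not the library implementation; `cg`, `matvec` and `dot` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // small dense matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& v)
{
    Vec out(v.size(), 0.0);
    for(std::size_t i = 0; i < v.size(); ++i)
        for(std::size_t j = 0; j < v.size(); ++j)
            out[i] += A[i][j] * v[j];
    return out;
}

// Conjugate Gradient for an SPD matrix A, starting from x = 0.
Vec cg(const Mat& A, const Vec& b, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);

    Vec x(n, 0.0), r = b, p = b;
    double rr = dot(r, r);

    for(int k = 0; k < max_iter && std::sqrt(rr) >= tol; ++k)
    {
        const Vec    Ap    = matvec(A, p);
        const double alpha = rr / dot(p, Ap); // exact line search along p
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rr_new = dot(r, r);
        const double beta   = rr_new / rr; // keeps directions A-conjugate
        for(std::size_t i = 0; i < n; ++i)
            p[i] = r[i] + beta * p[i];
        rr = rr_new;
    }
    return x;
}
```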

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void BuildMoveToAcceleratorAsync(void)#

Build the solver and move it to the accelerator asynchronously.

virtual void Sync(void)#

Synchronize the solver.

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class CR : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Conjugate Residual Method.

The Conjugate Residual method is an iterative method for solving sparse symmetric positive semi-definite linear systems \(Ax=b\). It is a Krylov subspace method and differs from the much more popular Conjugate Gradient method in that the system matrix is not required to be positive definite. The method can be preconditioned, where the approximation should also be SPD or positive semi-definite. [11]
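The difference to CG shows up in the inner products: the textbook CR recurrence uses \((r, Ar)\) and \((Ap, Ap)\) instead of \((r, r)\) and \((p, Ap)\). A standalone sketch, assuming a small dense symmetric matrix and a zero initial guess, could look like this. `cr`, `matvec` and `dot` are hypothetical names and this is not the library implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // small dense matrix, for illustration only

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& v)
{
    Vec out(v.size(), 0.0);
    for(std::size_t i = 0; i < v.size(); ++i)
        for(std::size_t j = 0; j < v.size(); ++j)
            out[i] += A[i][j] * v[j];
    return out;
}

// Conjugate Residual for a symmetric matrix A, starting from x = 0.
Vec cr(const Mat& A, const Vec& b, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);

    Vec x(n, 0.0), r = b, p = b;
    Vec Ar  = matvec(A, r);
    Vec Ap  = Ar;
    double rAr = dot(r, Ar);

    for(int k = 0; k < max_iter && std::sqrt(dot(r, r)) >= tol; ++k)
    {
        const double alpha = rAr / dot(Ap, Ap); // minimizes the residual norm
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        Ar                   = matvec(A, r);
        const double rAr_new = dot(r, Ar);
        const double beta    = rAr_new / rAr;
        for(std::size_t i = 0; i < n; ++i)
        {
            p[i]  = r[i] + beta * p[i];
            Ap[i] = Ar[i] + beta * Ap[i]; // A p updated without another product
        }
        rAr = rAr_new;
    }
    return x;
}
```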

Template Parameters:

Public Functions

virtual void Print(void) const#

Print information about the solver.

virtual void Build(void)#

Build the solver (data allocation, structure and numerical computation)

virtual void ReBuildNumeric(void)#

Rebuild the solver only with numerical computation (no allocation or data structure computation)

virtual void Clear(void)#

Clear (free all local data) the solver.

template<class OperatorType, class VectorType, typename ValueType>
class FCG : public rocalution::IterativeLinearSolver<OperatorType, VectorType, ValueType>#

Flexible Conjugate Gradient Method.

The Flexible Conjugate Gradient method is an iterative method for solving sparse symmetric positive definite linear systems \(Ax=b\). It is similar to the Conjugate Gradient method, with the only difference that it allows the preconditioner \(M^{-1}\) to change between iterations. This is especially helpful if the operation \(M^{-1}x\) is the result of another iterative process and therefore not a constant operator. [9]
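One way to tolerate a changing preconditioner is to A-orthogonalize each new search direction explicitly against the previous one. The standalone sketch below shows this truncated variant on a small dense SPD matrix. It is an assumption-laden illustration, not the library's algorithm: `fcg`, `Precond` and `jacobi_sweeps` are hypothetical names, and the example preconditioner simply varies its number of Jacobi sweeps with the iteration index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>; // small dense matrix, for illustration only
// z = M_k^{-1} r; the second argument is the iteration index k.
using Precond = std::function<Vec(const Vec&, int)>;

double dot(const Vec& a, const Vec& b)
{
    double s = 0.0;
    for(std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& v)
{
    Vec out(v.size(), 0.0);
    for(std::size_t i = 0; i < v.size(); ++i)
        for(std::size_t j = 0; j < v.size(); ++j)
            out[i] += A[i][j] * v[j];
    return out;
}

// Example building block for a non-constant preconditioner:
// a given number of Jacobi sweeps on A z = r, starting from z = 0.
Vec jacobi_sweeps(const Mat& A, const Vec& r, int sweeps)
{
    const std::size_t n = r.size();
    Vec z(n, 0.0), z_next(n, 0.0);
    for(int s = 0; s < sweeps; ++s)
    {
        for(std::size_t i = 0; i < n; ++i)
        {
            double acc = r[i];
            for(std::size_t j = 0; j < n; ++j)
                if(j != i)
                    acc -= A[i][j] * z[j];
            z_next[i] = acc / A[i][i];
        }
        z.swap(z_next);
    }
    return z;
}

Vec fcg(const Mat& A, const Vec& b, const Precond& precond, double tol, int max_iter)
{
    const std::size_t n = b.size();
    assert(A.size() == n);

    Vec x(n, 0.0), r = b, p(n, 0.0), Ap(n, 0.0);
    double pAp = 0.0;

    for(int k = 0; k < max_iter && std::sqrt(dot(r, r)) >= tol; ++k)
    {
        const Vec z = precond(r, k);
        // Make the new direction A-orthogonal to the previous one.
        const double beta = (k > 0) ? -dot(z, Ap) / pAp : 0.0;
        for(std::size_t i = 0; i < n; ++i)
            p[i] = z[i] + beta * p[i];

        Ap  = matvec(A, p);
        pAp = dot(p, Ap);
        const double alpha = dot(p, r) / pAp;
        for(std::size_t i = 0; i < n; ++i)
        {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
    }
    return x;
}
```

A preconditioner that alternates between one and two sweeps could be passed as `[&A](const Vec& r, int k) { return jacobi_sweeps(A, r, 1 + k % 2); }`.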

Template Parameters:
  • OperatorType