# 3.3. LAPACK Functions#

LAPACK routines solve complex numerical linear algebra problems. The functions are grouped into the categories that follow.

Note

Throughout the APIs’ descriptions, we use the following notations:

• x[i] stands for the i-th element of vector x, while A[i,j] represents the element in the i-th row and j-th column of matrix A. Indices are 1-based, i.e. x[1] is the first element of x.

• If X is a real vector or matrix, $$X^T$$ indicates its transpose; if X is complex, then $$X^H$$ represents its conjugate transpose. When X could be real or complex, we use X’ to indicate X transposed or X conjugate transposed, accordingly.

• x_i and $$x_i$$ denote the same quantity; we use $$x_i$$ when displaying mathematical equations, and x_i in the text describing the function parameters.
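As an illustration of the 1-based notation, the sketch below maps A[i,j] and x[i] onto 0-based C++ arrays. It assumes the column-major storage customary for LAPACK-style interfaces, with lda as the distance between consecutive columns; the helper names `elem_offset` and `vec_elem` are hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Offset of the 1-based element A[i,j] inside a column-major array with
// leading dimension lda (column-major storage is assumed here).
std::size_t elem_offset(int i, int j, int lda)
{
    assert(i >= 1 && j >= 1 && i <= lda);
    return static_cast<std::size_t>(i - 1)
         + static_cast<std::size_t>(j - 1) * static_cast<std::size_t>(lda);
}

// Read the 1-based element x[i] of a vector stored in a 0-based C++ array.
double vec_elem(const double* x, int i)
{
    assert(i >= 1);
    return x[i - 1];
}
```

With lda = 4, for instance, A[3,2] sits at offset (3 - 1) + (2 - 1) * 4 = 6 from the start of the array.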

## 3.3.1. Triangular factorizations#

### 3.3.1.1. rocsolver_<type>potf2()#

rocblas_status rocsolver_zpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTF2 computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form:

$\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}$

U is an upper triangular matrix and L is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A.

• info[out]

pointer to a rocblas_int on the GPU.

If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.
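To make the roles of uplo, lda and info concrete, here is a minimal host-side sketch of an unblocked Cholesky factorization for the real double-precision case. It is not the library's implementation: `ref_potf2` is a hypothetical name, the matrix is assumed to be column-major, and the returned integer plays the role of the info output.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical CPU reference. upper == true selects A = U'U, otherwise
// A = LL'. Only the selected triangle of A is read and overwritten.
// Returns 0 on success, or i > 0 if the leading minor of order i is not
// positive definite (the factorization stops there).
int ref_potf2(bool upper, int n, double* A, int lda)
{
    assert(n >= 0 && lda >= n);
    for (int j = 0; j < n; ++j) {
        // Diagonal entry: a_jj minus the squares already eliminated.
        double d = A[j + j * lda];
        for (int k = 0; k < j; ++k) {
            const double v = upper ? A[k + j * lda] : A[j + k * lda];
            d -= v * v;
        }
        if (!(d > 0.0))
            return j + 1;              // 1-based order of the failing minor
        d = std::sqrt(d);
        A[j + j * lda] = d;
        // Remaining entries of row j of U, or of column j of L.
        for (int i = j + 1; i < n; ++i) {
            double s = upper ? A[j + i * lda] : A[i + j * lda];
            for (int k = 0; k < j; ++k)
                s -= upper ? A[k + j * lda] * A[k + i * lda]
                           : A[j + k * lda] * A[i + k * lda];
            if (upper) A[j + i * lda] = s / d;
            else       A[i + j * lda] = s / d;
        }
    }
    return 0;
}
```

For the 2-by-2 matrix with rows (4, 2) and (2, 3) and uplo lower, the sketch leaves L with rows (2, 0) and (1, sqrt(2)) in the lower triangle and does not touch the upper triangle. For the matrix with rows (1, 2) and (2, 1) it returns 2, since the leading minor of order 2 is negative.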

### 3.3.1.2. rocsolver_<type>potf2_batched()#

rocblas_status rocsolver_zpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTF2_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_j = U_j'U_j & \: \text{if uplo is upper, or}\\ A_j = L_jL_j' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_j$$ is an upper triangular matrix and $$L_j$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of each A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of matrix A_j.

• A[inout]

array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the matrices A_j to be factored. On exit, the upper or lower triangular factors.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A_j.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful factorization of matrix A_j. If info[j] = i > 0, the leading minor of order i of A_j is not positive definite. The j-th factorization stopped at this point.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
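The array-of-pointers layout used by the batched variants can be pictured with the following host-only sketch. In actual use the matrices reside on the GPU; here everything lives in host memory purely to show the indexing. The helper names, the 0-based batch index j, and the contiguous demo layout (matrix j starting at element j*lda*n) are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collect one pointer per matrix from a contiguous buffer in which
// matrix j starts at element j * lda * n (hypothetical demo layout).
std::vector<double*> make_batch_pointers(double* base, int n, int lda, int batch_count)
{
    assert(n >= 0 && lda >= n && batch_count >= 0);
    std::vector<double*> ptrs(static_cast<std::size_t>(batch_count));
    for (int j = 0; j < batch_count; ++j)
        ptrs[j] = base + static_cast<std::size_t>(j) * lda * n;
    return ptrs;
}

// Access the 1-based element A_j[i,k] through the pointer array
// (column-major storage, 0-based batch index j).
double& batch_elem(double* const A[], int j, int i, int k, int lda)
{
    return A[j][(i - 1) + static_cast<std::size_t>(k - 1) * lda];
}
```

Because each matrix is reached through its own pointer, the matrices do not have to be contiguous or equally spaced; the contiguous buffer above is only a convenience for the example.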

### 3.3.1.3. rocsolver_<type>potf2_strided_batched()#

rocblas_status rocsolver_zpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTF2_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_j = U_j'U_j & \: \text{if uplo is upper, or}\\ A_j = L_jL_j' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_j$$ is an upper triangular matrix and $$L_j$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of each A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of matrix A_j.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the matrices A_j to be factored. On exit, the upper or lower triangular factors.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful factorization of matrix A_j. If info[j] = i > 0, the leading minor of order i of A_j is not positive definite. The j-th factorization stopped at this point.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
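In the strided layout a single base pointer and strideA locate every matrix. The sketch below shows the address arithmetic on the host; `std::ptrdiff_t` stands in for rocblas_stride, the batch index j is taken as 0-based, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Pointer to the start of matrix A_j: consecutive matrices are
// strideA elements apart.
double* strided_matrix(double* A, std::ptrdiff_t strideA, int j)
{
    return A + strideA * j;
}

// 1-based element A_j[i,k] of a strided batch stored column-major.
double& strided_elem(double* A, int lda, std::ptrdiff_t strideA, int j, int i, int k)
{
    assert(i >= 1 && k >= 1 && i <= lda);
    return strided_matrix(A, strideA, j)[(i - 1) + static_cast<std::ptrdiff_t>(k - 1) * lda];
}
```

With n = 2, lda = 3 and strideA = 10, element A_1[2,2] is found at offset 10 + (2 - 1) + (2 - 1) * 3 = 14 from the base pointer; the four elements between lda*n = 6 and strideA = 10 are padding.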

### 3.3.1.4. rocsolver_<type>potrf()#

rocblas_status rocsolver_zpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_cpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_dpotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *info)#
rocblas_status rocsolver_spotrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *info)#

POTRF computes the Cholesky factorization of a real symmetric (complex Hermitian) positive definite matrix A.

(This is the blocked version of the algorithm).

The factorization has the form:

$\begin{split} \begin{array}{cl} A = U'U & \: \text{if uplo is upper, or}\\ A = LL' & \: \text{if uplo is lower.} \end{array} \end{split}$

U is an upper triangular matrix and L is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the matrix A to be factored. On exit, the lower or upper triangular factor.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A.

• info[out]

pointer to a rocblas_int on the GPU.

If info = 0, successful factorization of matrix A. If info = i > 0, the leading minor of order i of A is not positive definite. The factorization stopped at this point.
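The text does not spell out the blocking scheme. Purely as an illustration, one common right-looking formulation for the lower case partitions A around a leading diagonal block $$A_{11}$$ of some block size (a hypothetical tuning parameter):

$\begin{split} A = \begin{bmatrix} A_{11} & A_{21}' \\ A_{21} & A_{22} \end{bmatrix} = \begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix} \begin{bmatrix} L_{11}' & L_{21}' \\ 0 & L_{22}' \end{bmatrix} \end{split}$

Equating blocks gives the three steps of one iteration:

$\begin{split} \begin{array}{cl} A_{11} = L_{11}L_{11}' & \: \text{(unblocked factorization of the diagonal block)}\\ L_{21} = A_{21}(L_{11}')^{-1} & \: \text{(triangular solve)}\\ A_{22} - L_{21}L_{21}' = L_{22}L_{22}' & \: \text{(update, then repeat on the trailing block)} \end{array} \end{split}$

Most of the arithmetic then falls in the triangular solve and the trailing update, which are matrix-matrix operations. Whether the library uses this exact variant is not stated here.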

### 3.3.1.5. rocsolver_<type>potrf_batched()#

rocblas_status rocsolver_zpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *info, const rocblas_int batch_count)#

POTRF_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_j = U_j'U_j & \: \text{if uplo is upper, or}\\ A_j = L_jL_j' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_j$$ is an upper triangular matrix and $$L_j$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of each A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of matrix A_j.

• A[inout]

array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the matrices A_j to be factored. On exit, the upper or lower triangular factors.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A_j.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful factorization of matrix A_j. If info[j] = i > 0, the leading minor of order i of A_j is not positive definite. The j-th factorization stopped at this point.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
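Since info is an array with one entry per matrix, a caller typically inspects it after the call. The sketch below assumes the array has already been copied back to host memory (the copy itself is not shown) and uses a 0-based batch index; `failed_instances` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Given a host copy of the info array, return the 0-based batch indices j
// whose factorization did not complete (info[j] > 0).
std::vector<int> failed_instances(const int* info, int batch_count)
{
    assert(batch_count >= 0);
    std::vector<int> failed;
    for (int j = 0; j < batch_count; ++j)
        if (info[j] > 0)
            failed.push_back(j);
    return failed;
}
```

For each index j returned, info[j] gives the order of the leading minor of A_j that was found not to be positive definite.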

### 3.3.1.6. rocsolver_<type>potrf_strided_batched()#

rocblas_status rocsolver_zpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dpotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_spotrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *info, const rocblas_int batch_count)#

POTRF_STRIDED_BATCHED computes the Cholesky factorization of a batch of real symmetric (complex Hermitian) positive definite matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form:

$\begin{split} \begin{array}{cl} A_j = U_j'U_j & \: \text{if uplo is upper, or}\\ A_j = L_jL_j' & \: \text{if uplo is lower.} \end{array} \end{split}$

$$U_j$$ is an upper triangular matrix and $$L_j$$ is lower triangular.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the factorization is upper or lower triangular. If uplo indicates lower (or upper), then the upper (or lower) part of each A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of matrix A_j.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the matrices A_j to be factored. On exit, the upper or lower triangular factors.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful factorization of matrix A_j. If info[j] = i > 0, the leading minor of order i of A_j is not positive definite. The j-th factorization stopped at this point.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.1.7. rocsolver_<type>getf2()#

rocblas_status rocsolver_zgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetf2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETF2 computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

$A = PLU$

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to rocblas_int. Array on the GPU of dimension min(m,n).

The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

• info[out]

pointer to a rocblas_int on the GPU.

If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
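The following host-side sketch shows what an unblocked LU factorization with partial pivoting produces in A, ipiv and info. It is a plain reference loop for the real double-precision case, not the library's optimized implementation; `ref_getf2` is a hypothetical name and column-major storage is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical CPU reference. Overwrites A with L (strictly below the
// diagonal, unit diagonal implied) and U (on and above the diagonal),
// fills ipiv with 1-based pivot rows, and returns the info value:
// 0 on success, or the 1-based index of the first zero pivot.
int ref_getf2(int m, int n, double* A, int lda, int* ipiv)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    const int kmax = (m < n) ? m : n;
    int info = 0;
    for (int k = 0; k < kmax; ++k) {
        // Partial pivoting: largest magnitude in column k, rows k..m-1.
        int p = k;
        for (int i = k + 1; i < m; ++i)
            if (std::fabs(A[i + k * lda]) > std::fabs(A[p + k * lda]))
                p = i;
        ipiv[k] = p + 1;
        if (A[p + k * lda] == 0.0) {
            if (info == 0) info = k + 1;   // remember the first zero pivot
            continue;                      // column is already eliminated
        }
        if (p != k)                        // interchange full rows k and p
            for (int j = 0; j < n; ++j)
                std::swap(A[k + j * lda], A[p + j * lda]);
        for (int i = k + 1; i < m; ++i) {
            A[i + k * lda] /= A[k + k * lda];          // multiplier l_ik
            for (int j = k + 1; j < n; ++j)
                A[i + j * lda] -= A[i + k * lda] * A[k + j * lda];
        }
    }
    return info;
}
```

For the matrix with rows (1, 2) and (3, 4), the sketch swaps the two rows (ipiv = {2, 2}), stores the multiplier 1/3 below the diagonal, and leaves U with rows (3, 4) and (0, 2/3). For the singular matrix with rows (1, 2) and (2, 4) it returns 2.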

### 3.3.1.8. rocsolver_<type>getf2_batched()#

rocblas_status rocsolver_zgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = P_jL_jU_j$

where $$P_j$$ is a permutation matrix, $$L_j$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_j$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all matrices A_j in the batch.

• A[inout]

array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is min(m,n). Elements of ipiv_j are 1-based indices. For each instance A_j in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_j was interchanged with row ipiv_j[i]. Matrix P_j of the factorization can be derived from ipiv_j.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
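The pivot vectors of all matrices share one array, separated by strideP. A minimal sketch of reading ipiv_j[i] from a host copy of that array follows; `std::ptrdiff_t` stands in for rocblas_stride, j is taken as 0-based, and `pivot_of` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Row that row i of A_j was interchanged with, i.e. ipiv_j[i] for 1-based i.
// Vector ipiv_j starts strideP * j elements after the start of ipiv.
int pivot_of(const int* ipiv, std::ptrdiff_t strideP, int j, int i)
{
    assert(i >= 1);
    return ipiv[strideP * j + (i - 1)];
}
```

With min(m,n) = 2 and strideP = 3, each pivot vector occupies two entries followed by one unused padding entry.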

### 3.3.1.9. rocsolver_<type>getf2_strided_batched()#

rocblas_status rocsolver_zgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetf2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETF2_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the unblocked Level-2-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with small and mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = P_jL_jU_j$

where $$P_j$$ is a permutation matrix, $$L_j$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_j$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is min(m,n). Elements of ipiv_j are 1-based indices. For each instance A_j in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_j was interchanged with row ipiv_j[i]. Matrix P_j of the factorization can be derived from ipiv_j.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
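The sizes of A and ipiv are described as depending on the strides. For the normal use case only (strideA >= lda*n and strideP >= min(m,n)), the smallest buffers that hold the whole batch can be computed as sketched below. The helper names are hypothetical, and other stride choices are outside the scope of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Smallest element count that holds batch_count items of `item` elements
// each, laid out `stride` elements apart (assumes stride >= item).
std::size_t strided_extent(std::size_t item, std::size_t stride, int batch_count)
{
    assert(batch_count >= 0 && stride >= item);
    if (batch_count == 0) return 0;
    return stride * static_cast<std::size_t>(batch_count - 1) + item;
}

// Elements needed for the matrices: each A_j occupies lda * n elements.
std::size_t matrix_buffer_elems(int n, int lda, std::size_t strideA, int batch_count)
{
    return strided_extent(static_cast<std::size_t>(lda) * n, strideA, batch_count);
}

// Elements needed for the pivots: each ipiv_j occupies min(m,n) elements.
std::size_t pivot_buffer_elems(int m, int n, std::size_t strideP, int batch_count)
{
    return strided_extent(static_cast<std::size_t>(std::min(m, n)), strideP, batch_count);
}
```

The last item needs no trailing padding, which is why the extent is stride * (batch_count - 1) + item rather than stride * batch_count.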

### 3.3.1.10. rocsolver_<type>getrf()#

rocblas_status rocsolver_zgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_cgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_dgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#
rocblas_status rocsolver_sgetrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)#

GETRF computes the LU factorization of a general m-by-n matrix A using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization has the form

$A = PLU$

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix A to be factored. On exit, the factors L and U from the factorization. The unit diagonal elements of L are not stored.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to rocblas_int. Array on the GPU of dimension min(m,n).

The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= i <= min(m,n), the row i of the matrix was interchanged with row ipiv[i]. Matrix P of the factorization can be derived from ipiv.

• info[out]

pointer to a rocblas_int on the GPU.

If info = 0, successful exit. If info = i > 0, U is singular. U[i,i] is the first zero pivot.
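The description of ipiv notes that P can be derived from it. One way to do so, sketched here on the host, is to replay the row interchanges in order on an identity row map. `row_permutation` is a hypothetical helper and the result uses 0-based row numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Turn the 1-based interchange sequence ipiv (length k = min(m,n)) into a
// 0-based row map perm, such that row r of the product L*U equals row
// perm[r] of the original matrix A.
std::vector<int> row_permutation(const int* ipiv, int k, int m)
{
    assert(k >= 0 && k <= m);
    std::vector<int> perm(static_cast<std::size_t>(m));
    std::iota(perm.begin(), perm.end(), 0);
    for (int i = 0; i < k; ++i) {
        assert(ipiv[i] >= 1 && ipiv[i] <= m);
        std::swap(perm[i], perm[ipiv[i] - 1]);   // replay interchange i
    }
    return perm;
}
```

Because A = PLU, the permutation matrix P has a one in row perm[r] of column r and zeros elsewhere. For ipiv = {3, 3, 3} and m = 3, the row map is {2, 0, 1}.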

### 3.3.1.11. rocsolver_<type>getrf_batched()#

rocblas_status rocsolver_zgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = P_jL_jU_j$

where $$P_j$$ is a permutation matrix, $$L_j$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_j$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all matrices A_j in the batch.

• A[inout]

array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is min(m,n). Elements of ipiv_j are 1-based indices. For each instance A_j in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_j was interchanged with row ipiv_j[i]. Matrix P_j of the factorization can be derived from ipiv_j.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
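On exit each A_j holds both factors in one array, with the unit diagonal of L_j left implicit. The host-side sketch below separates one such array into explicit L and U, including the trapezoidal cases m > n and m < n. `unpack_lu` is a hypothetical helper, and column-major storage is assumed for the input and both outputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Split the overwritten m-by-n array A (leading dimension lda) into
// L (m-by-k, unit diagonal restored) and U (k-by-n), with k = min(m,n).
// L and U are column-major with leading dimensions m and k respectively.
void unpack_lu(int m, int n, const double* A, int lda,
               std::vector<double>& L, std::vector<double>& U)
{
    assert(m >= 0 && n >= 0 && lda >= m);
    const int k = std::min(m, n);
    L.assign(static_cast<std::size_t>(m) * k, 0.0);
    U.assign(static_cast<std::size_t>(k) * n, 0.0);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i) {
            const double a = A[i + static_cast<std::size_t>(j) * lda];
            if (i > j)                       // strictly below the diagonal: L
                L[i + static_cast<std::size_t>(j) * m] = a;
            else                             // on or above the diagonal: U
                U[i + static_cast<std::size_t>(j) * k] = a;
        }
    for (int d = 0; d < k; ++d)
        L[d + static_cast<std::size_t>(d) * m] = 1.0;   // implicit unit diagonal
}
```

For a batched call, the same routine would be applied to a host copy of each A_j in turn.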

### 3.3.1.12. rocsolver_<type>getrf_strided_batched()#

rocblas_status rocsolver_zgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_cgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_dgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#
rocblas_status rocsolver_sgetrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)#

GETRF_STRIDED_BATCHED computes the LU factorization of a batch of general m-by-n matrices using partial pivoting with row interchanges.

(This is the blocked Level-3-BLAS version of the algorithm. An optimized internal implementation without rocBLAS calls could be executed with mid-size matrices if optimizations are enabled (default option). For more details, see the “Tuning rocSOLVER performance” section of the Library Design Guide).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = P_jL_jU_j$

where $$P_j$$ is a permutation matrix, $$L_j$$ is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and $$U_j$$ is upper triangular (upper trapezoidal if m < n).

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the factors L_j and U_j from the factorizations. The unit diagonal elements of L_j are not stored.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is min(m,n). Elements of ipiv_j are 1-based indices. For each instance A_j in the batch and for 1 <= i <= min(m,n), the row i of the matrix A_j was interchanged with row ipiv_j[i]. Matrix P_j of the factorization can be derived from ipiv_j.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= min(m,n).

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, U_j is singular. U_j[i,i] is the first zero pivot.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
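As an illustration of how P_j can be derived from ipiv_j, the following host-side sketch turns one pivot vector into an explicit row permutation. It assumes that the ipiv array has already been copied from the GPU into host memory. The helper name `pivots_to_permutation` is hypothetical and is not part of rocSOLVER.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not a rocSOLVER function). Given a host copy of the
// ipiv array, returns perm such that row i of (L_j * U_j) equals row perm[i]
// of the original A_j. The returned indices are 0-based.
std::vector<int> pivots_to_permutation(const std::vector<int>& ipiv_host,
                                       std::int64_t strideP, int j, int m, int n)
{
    assert(j >= 0 && m >= 0 && n >= 0 && strideP >= 0);
    const int k = (m < n) ? m : n;  // length of ipiv_j
    const std::size_t start = static_cast<std::size_t>(j) * static_cast<std::size_t>(strideP);
    assert(start + k <= ipiv_host.size());
    const int* ipiv_j = ipiv_host.data() + start;

    std::vector<int> perm(m);
    std::iota(perm.begin(), perm.end(), 0);  // start from the identity
    for (int i = 0; i < k; ++i) {
        // Row i+1 was interchanged with row ipiv_j[i] (both 1-based).
        std::swap(perm[i], perm[ipiv_j[i] - 1]);
    }
    return perm;
}
```

The interchanges are replayed in the order i = 1, ..., min(m,n), so later swaps act on rows that earlier swaps may already have moved.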

### 3.3.1.13. rocsolver_<type>sytf2()

rocblas_status rocsolver_zsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_csytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dsytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_ssytf2(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

SYTF2 computes the factorization of a symmetric indefinite matrix $$A$$ using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}$

where $$U$$ or $$L$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D(k)$$.

Specifically, $$U$$ and $$L$$ are computed as

$\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D(k)$$, and $$P(k)$$ is a permutation matrix defined by $$ipiv[k]$$. If we let $$s$$ denote the order of block $$D(k)$$, then $$U(k)$$ and $$L(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D(k)$$ is stored in $$A[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A$$. If $$s = 2$$ and uplo is upper, then $$D(k)$$ is stored in $$A[k-1,k-1]$$, $$A[k-1,k]$$, and $$A[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A$$. If $$s = 2$$ and uplo is lower, then $$D(k)$$ is stored in $$A[k,k]$$, $$A[k+1,k]$$, and $$A[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A.

• ipiv[out]

pointer to rocblas_int. Array on the GPU of dimension n.

The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

• info[out]

pointer to a rocblas_int on the GPU.

If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.
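The encoding of 1-by-1 and 2-by-2 blocks in ipiv can be decoded on the host. The sketch below walks a host copy of ipiv in the direction described above (k decreasing for upper, increasing for lower) and lists the diagonal blocks of D. The type `DiagBlock` and the function `diagonal_blocks` are hypothetical names introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical type, not part of rocSOLVER.
struct DiagBlock {
    int first;  // 1-based index of the first row/column of the block
    int order;  // 1 or 2
};

// Lists the diagonal blocks of D, sorted by increasing index, from a host
// copy of ipiv (length n). A positive entry marks a 1-by-1 block; a pair of
// equal negative entries marks a 2-by-2 block.
std::vector<DiagBlock> diagonal_blocks(const std::vector<int>& ipiv, bool upper)
{
    const int n = static_cast<int>(ipiv.size());
    std::vector<DiagBlock> blocks;
    if (upper) {
        for (int k = n; k >= 1;) {  // k decreases from n to 1
            if (ipiv[k - 1] > 0) {
                blocks.push_back({k, 1});
                k -= 1;
            } else {
                assert(k >= 2 && ipiv[k - 2] == ipiv[k - 1]);
                blocks.push_back({k - 1, 2});
                k -= 2;
            }
        }
        std::reverse(blocks.begin(), blocks.end());
    } else {
        for (int k = 1; k <= n;) {  // k increases from 1 to n
            if (ipiv[k - 1] > 0) {
                blocks.push_back({k, 1});
                k += 1;
            } else {
                assert(k < n && ipiv[k] == ipiv[k - 1]);
                blocks.push_back({k, 2});
                k += 2;
            }
        }
    }
    return blocks;
}
```

The walk has to follow the step size of each block. Testing neighbouring entries for equality is not enough, because two adjacent 2-by-2 blocks may carry the same negative pivot value.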

### 3.3.1.14. rocsolver_<type>sytf2_batched()

rocblas_status rocsolver_zsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytf2_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTF2_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_j = U_j D_j U_j^T & \: \text{or}\\ A_j = L_j D_j L_j^T & \end{array} \end{split}$

where $$U_j$$ or $$L_j$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_j$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_j(k)$$.

Specifically, $$U_j$$ and $$L_j$$ are computed as

$\begin{split} \begin{array}{cl} U_j = P_j(n) U_j(n) \cdots P_j(k) U_j(k) \cdots & \: \text{and}\\ L_j = P_j(1) L_j(1) \cdots P_j(k) L_j(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_j(k)$$, and $$P_j(k)$$ is a permutation matrix defined by $$ipiv_j[k]$$. If we let $$s$$ denote the order of block $$D_j(k)$$, then $$U_j(k)$$ and $$L_j(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_j(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_j(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_j(k)$$ is stored in $$A_j[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is upper, then $$D_j(k)$$ is stored in $$A_j[k-1,k-1]$$, $$A_j[k-1,k]$$, and $$A_j[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is lower, then $$D_j(k)$$ is stored in $$A_j[k,k]$$, $$A_j[k+1,k]$$, and $$A_j[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_j$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the upper or lower part of each matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of all matrices A_j in the batch.

• A[inout]

array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the symmetric matrices A_j to be factored. On exit, the block diagonal matrices D_j and the multipliers needed to compute U_j or L_j.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is n. Elements of ipiv_j are 1-based indices. For 1 <= k <= n, if ipiv_j[k] > 0 then rows and columns k and ipiv_j[k] were interchanged and D_j[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_j[k] = ipiv_j[k-1] < 0 and uplo is upper (or ipiv_j[k] = ipiv_j[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_j[k] (or rows and columns k+1 and -ipiv_j[k]) were interchanged and D_j[k-1,k-1] to D_j[k,k] (or D_j[k,k] to D_j[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, D_j is singular. D_j[i,i] is the first diagonal zero.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
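After a batched call, each entry of info has to be inspected separately. The following sketch scans a host copy of the info array and reports the instances whose D_j is singular. The name `singular_instances` is hypothetical, and the batch index j is taken to be the 0-based offset into the C array, which is an assumption of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of rocSOLVER. info_host is a host copy of
// the info array (batch_count entries). Returns (j, i) pairs meaning that
// instance j reported D_j[i,i] as its first zero diagonal element
// (j is the 0-based array offset, i is 1-based).
std::vector<std::pair<int, int>> singular_instances(const std::vector<int>& info_host)
{
    std::vector<std::pair<int, int>> failed;
    for (std::size_t j = 0; j < info_host.size(); ++j) {
        assert(info_host[j] >= 0);  // only 0 and i > 0 are described above
        if (info_host[j] > 0) {
            failed.emplace_back(static_cast<int>(j), info_host[j]);
        }
    }
    return failed;
}
```

An empty result means that every factorization in the batch exited successfully.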

### 3.3.1.15. rocsolver_<type>sytf2_strided_batched()

rocblas_status rocsolver_zsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytf2_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTF2_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_j = U_j D_j U_j^T & \: \text{or}\\ A_j = L_j D_j L_j^T & \end{array} \end{split}$

where $$U_j$$ or $$L_j$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_j$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_j(k)$$.

Specifically, $$U_j$$ and $$L_j$$ are computed as

$\begin{split} \begin{array}{cl} U_j = P_j(n) U_j(n) \cdots P_j(k) U_j(k) \cdots & \: \text{and}\\ L_j = P_j(1) L_j(1) \cdots P_j(k) L_j(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_j(k)$$, and $$P_j(k)$$ is a permutation matrix defined by $$ipiv_j[k]$$. If we let $$s$$ denote the order of block $$D_j(k)$$, then $$U_j(k)$$ and $$L_j(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_j(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_j(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_j(k)$$ is stored in $$A_j[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is upper, then $$D_j(k)$$ is stored in $$A_j[k-1,k-1]$$, $$A_j[k-1,k]$$, and $$A_j[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is lower, then $$D_j(k)$$ is stored in $$A_j[k,k]$$, $$A_j[k+1,k]$$, and $$A_j[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_j$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the upper or lower part of each matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of all matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the symmetric matrices A_j to be factored. On exit, the block diagonal matrices D_j and the multipliers needed to compute U_j or L_j.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction on the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is n. Elements of ipiv_j are 1-based indices. For 1 <= k <= n, if ipiv_j[k] > 0 then rows and columns k and ipiv_j[k] were interchanged and D_j[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_j[k] = ipiv_j[k-1] < 0 and uplo is upper (or ipiv_j[k] = ipiv_j[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_j[k] (or rows and columns k+1 and -ipiv_j[k]) were interchanged and D_j[k-1,k-1] to D_j[k,k] (or D_j[k,k] to D_j[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, D_j is singular. D_j[i,i] is the first diagonal zero.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
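Because the sizes of A and ipiv depend on strideA and strideP, it can help to compute the required element counts before allocating. The sketch below does this for non-negative strides. The function name `strided_extent` is hypothetical, and the restriction to stride >= 0 is an assumption of this example rather than a requirement stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of rocSOLVER. Returns the smallest number of
// elements a strided buffer must hold so that batch_count items of item_size
// elements each, starting `stride` elements apart, all fit.
// Assumes stride >= 0, so the last item is the one that reaches furthest.
std::int64_t strided_extent(std::int64_t item_size, std::int64_t stride,
                            std::int64_t batch_count)
{
    assert(item_size >= 0 && stride >= 0 && batch_count >= 0);
    if (batch_count == 0 || item_size == 0) {
        return 0;
    }
    return (batch_count - 1) * stride + item_size;
}
```

With these assumptions, A needs `strided_extent(lda * n, strideA, batch_count)` elements, ipiv needs `strided_extent(n, strideP, batch_count)` integers, and info needs batch_count integers.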

### 3.3.1.16. rocsolver_<type>sytrf()

rocblas_status rocsolver_zsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_csytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_dsytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)
rocblas_status rocsolver_ssytrf(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, rocblas_int *ipiv, rocblas_int *info)

SYTRF computes the factorization of a symmetric indefinite matrix $$A$$ using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A = U D U^T & \: \text{or}\\ A = L D L^T & \end{array} \end{split}$

where $$U$$ or $$L$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D(k)$$.

Specifically, $$U$$ and $$L$$ are computed as

$\begin{split} \begin{array}{cl} U = P(n) U(n) \cdots P(k) U(k) \cdots & \: \text{and}\\ L = P(1) L(1) \cdots P(k) L(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D(k)$$, and $$P(k)$$ is a permutation matrix defined by $$ipiv[k]$$. If we let $$s$$ denote the order of block $$D(k)$$, then $$U(k)$$ and $$L(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D(k)$$ is stored in $$A[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A$$. If $$s = 2$$ and uplo is upper, then $$D(k)$$ is stored in $$A[k-1,k-1]$$, $$A[k-1,k]$$, and $$A[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A$$. If $$s = 2$$ and uplo is lower, then $$D(k)$$ is stored in $$A[k,k]$$, $$A[k+1,k]$$, and $$A[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the upper or lower part of the matrix A is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the symmetric matrix A to be factored. On exit, the block diagonal matrix D and the multipliers needed to compute U or L.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of A.

• ipiv[out]

pointer to rocblas_int. Array on the GPU of dimension n.

The vector of pivot indices. Elements of ipiv are 1-based indices. For 1 <= k <= n, if ipiv[k] > 0 then rows and columns k and ipiv[k] were interchanged and D[k,k] is a 1-by-1 diagonal block. If, instead, ipiv[k] = ipiv[k-1] < 0 and uplo is upper (or ipiv[k] = ipiv[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv[k] (or rows and columns k+1 and -ipiv[k]) were interchanged and D[k-1,k-1] to D[k,k] (or D[k,k] to D[k+1,k+1]) is a 2-by-2 diagonal block.

• info[out]

pointer to a rocblas_int on the GPU.

If info = 0, successful exit. If info = i > 0, D is singular. D[i,i] is the first diagonal zero.
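The storage rules for D(k) given above can be used to rebuild D as a dense matrix on the host. The following sketch assumes column-major storage (the LAPACK convention) and host copies of the factored A and of ipiv. The function name `extract_D` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of rocSOLVER. A_host is a column-major host
// copy of the factored matrix (lda*n elements); ipiv is a host copy of the
// pivot vector (length n). Returns D as a dense n-by-n column-major matrix.
template <typename T>
std::vector<T> extract_D(const std::vector<T>& A_host, int lda,
                         const std::vector<int>& ipiv, bool upper)
{
    const int n = static_cast<int>(ipiv.size());
    assert(lda >= n && A_host.size() >= static_cast<std::size_t>(lda) * n);
    std::vector<T> D(static_cast<std::size_t>(n) * n, T(0));
    // 1-based accessors into the column-major arrays.
    auto a = [&](int i, int k) { return A_host[static_cast<std::size_t>(k - 1) * lda + (i - 1)]; };
    auto d = [&](int i, int k) -> T& { return D[static_cast<std::size_t>(k - 1) * n + (i - 1)]; };

    for (int k = 1; k <= n; ++k) {
        d(k, k) = a(k, k);  // diagonal entries of every block
    }
    if (upper) {
        for (int k = n; k >= 1;) {
            if (ipiv[k - 1] > 0) { k -= 1; continue; }
            assert(k >= 2);
            d(k - 1, k) = d(k, k - 1) = a(k - 1, k);  // 2-by-2 block at (k-1, k)
            k -= 2;
        }
    } else {
        for (int k = 1; k <= n;) {
            if (ipiv[k - 1] > 0) { k += 1; continue; }
            assert(k < n);
            d(k + 1, k) = d(k, k + 1) = a(k + 1, k);  // 2-by-2 block at (k, k+1)
            k += 2;
        }
    }
    return D;
}
```

Since D is symmetric, the single stored off-diagonal entry of a 2-by-2 block is mirrored into both positions.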

### 3.3.1.17. rocsolver_<type>sytrf_batched()

rocblas_status rocsolver_zsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrf_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *const A[], const rocblas_int lda, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTRF_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_j = U_j D_j U_j^T & \: \text{or}\\ A_j = L_j D_j L_j^T & \end{array} \end{split}$

where $$U_j$$ or $$L_j$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_j$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_j(k)$$.

Specifically, $$U_j$$ and $$L_j$$ are computed as

$\begin{split} \begin{array}{cl} U_j = P_j(n) U_j(n) \cdots P_j(k) U_j(k) \cdots & \: \text{and}\\ L_j = P_j(1) L_j(1) \cdots P_j(k) L_j(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_j(k)$$, and $$P_j(k)$$ is a permutation matrix defined by $$ipiv_j[k]$$. If we let $$s$$ denote the order of block $$D_j(k)$$, then $$U_j(k)$$ and $$L_j(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_j(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_j(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_j(k)$$ is stored in $$A_j[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is upper, then $$D_j(k)$$ is stored in $$A_j[k-1,k-1]$$, $$A_j[k-1,k]$$, and $$A_j[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is lower, then $$D_j(k)$$ is stored in $$A_j[k,k]$$, $$A_j[k+1,k]$$, and $$A_j[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_j$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the upper or lower part of each matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of all matrices A_j in the batch.

• A[inout]

array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the symmetric matrices A_j to be factored. On exit, the block diagonal matrices D_j and the multipliers needed to compute U_j or L_j.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is n. Elements of ipiv_j are 1-based indices. For 1 <= k <= n, if ipiv_j[k] > 0 then rows and columns k and ipiv_j[k] were interchanged and D_j[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_j[k] = ipiv_j[k-1] < 0 and uplo is upper (or ipiv_j[k] = ipiv_j[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_j[k] (or rows and columns k+1 and -ipiv_j[k]) were interchanged and D_j[k-1,k-1] to D_j[k,k] (or D_j[k,k] to D_j[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, D_j is singular. D_j[i,i] is the first diagonal zero.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
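The batched variants take an array of pointers, one per matrix. A minimal sketch of building such an array is shown below for the case where all matrices sit back to back in one buffer. The name `make_batch_pointers` is hypothetical. In a real program `base` would be device memory (for example obtained from hipMalloc), and, as far as we understand, the pointer array itself would also have to be copied to the GPU before being passed as A; both points are assumptions that should be checked against the library's user guide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of rocSOLVER. Computes one pointer per
// matrix for a buffer that stores batch_count matrices contiguously, each
// occupying lda*n elements. No memory is dereferenced here, so base may be
// a device pointer.
template <typename T>
std::vector<T*> make_batch_pointers(T* base, int n, int lda, int batch_count)
{
    assert(n >= 0 && lda >= n && batch_count >= 0);
    const std::size_t matrix_elems = static_cast<std::size_t>(lda) * n;
    std::vector<T*> ptrs(batch_count);
    for (int j = 0; j < batch_count; ++j) {
        ptrs[j] = base + j * matrix_elems;  // start of A_j
    }
    return ptrs;
}
```

The matrices do not have to be contiguous for the batched interface; the contiguous layout is chosen here only to keep the example short.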

### 3.3.1.18. rocsolver_<type>sytrf_strided_batched()

rocblas_status rocsolver_zsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_csytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_dsytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)
rocblas_status rocsolver_ssytrf_strided_batched(rocblas_handle handle, const rocblas_fill uplo, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_int *ipiv, const rocblas_stride strideP, rocblas_int *info, const rocblas_int batch_count)

SYTRF_STRIDED_BATCHED computes the factorization of a batch of symmetric indefinite matrices using Bunch-Kaufman diagonal pivoting.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} \begin{array}{cl} A_j = U_j D_j U_j^T & \: \text{or}\\ A_j = L_j D_j L_j^T & \end{array} \end{split}$

where $$U_j$$ or $$L_j$$ is a product of permutation and unit upper/lower triangular matrices (depending on the value of uplo), and $$D_j$$ is a symmetric block diagonal matrix with 1-by-1 and 2-by-2 diagonal blocks $$D_j(k)$$.

Specifically, $$U_j$$ and $$L_j$$ are computed as

$\begin{split} \begin{array}{cl} U_j = P_j(n) U_j(n) \cdots P_j(k) U_j(k) \cdots & \: \text{and}\\ L_j = P_j(1) L_j(1) \cdots P_j(k) L_j(k) \cdots & \end{array} \end{split}$

where $$k$$ decreases from $$n$$ to 1 (increases from 1 to $$n$$) in steps of 1 or 2, depending on the order of block $$D_j(k)$$, and $$P_j(k)$$ is a permutation matrix defined by $$ipiv_j[k]$$. If we let $$s$$ denote the order of block $$D_j(k)$$, then $$U_j(k)$$ and $$L_j(k)$$ are unit upper/lower triangular matrices defined as

$\begin{split} U_j(k) = \left[ \begin{array}{ccc} I_{k-s} & v & 0 \\ 0 & I_s & 0 \\ 0 & 0 & I_{n-k} \end{array} \right] \end{split}$

and

$\begin{split} L_j(k) = \left[ \begin{array}{ccc} I_{k-1} & 0 & 0 \\ 0 & I_s & 0 \\ 0 & v & I_{n-k-s+1} \end{array} \right]. \end{split}$

If $$s = 1$$, then $$D_j(k)$$ is stored in $$A_j[k,k]$$ and $$v$$ is stored in the upper/lower part of column $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is upper, then $$D_j(k)$$ is stored in $$A_j[k-1,k-1]$$, $$A_j[k-1,k]$$, and $$A_j[k,k]$$, and $$v$$ is stored in the upper parts of columns $$k-1$$ and $$k$$ of $$A_j$$. If $$s = 2$$ and uplo is lower, then $$D_j(k)$$ is stored in $$A_j[k,k]$$, $$A_j[k+1,k]$$, and $$A_j[k+1,k+1]$$, and $$v$$ is stored in the lower parts of columns $$k$$ and $$k+1$$ of $$A_j$$.

Parameters:
• handle[in] rocblas_handle.

• uplo[in]

rocblas_fill.

Specifies whether the upper or lower part of each matrix A_j is stored. If uplo indicates lower (or upper), then the upper (or lower) part of A_j is not used.

• n[in]

rocblas_int. n >= 0.

The number of rows and columns of all matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the symmetric matrices A_j to be factored. On exit, the block diagonal matrices D_j and the multipliers needed to compute U_j or L_j.

• lda[in]

rocblas_int. lda >= n.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction on the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to rocblas_int. Array on the GPU (the size depends on the value of strideP).

Contains the vectors of pivot indices ipiv_j (corresponding to A_j). Dimension of ipiv_j is n. Elements of ipiv_j are 1-based indices. For 1 <= k <= n, if ipiv_j[k] > 0 then rows and columns k and ipiv_j[k] were interchanged and D_j[k,k] is a 1-by-1 diagonal block. If, instead, ipiv_j[k] = ipiv_j[k-1] < 0 and uplo is upper (or ipiv_j[k] = ipiv_j[k+1] < 0 and uplo is lower), then rows and columns k-1 and -ipiv_j[k] (or rows and columns k+1 and -ipiv_j[k]) were interchanged and D_j[k-1,k-1] to D_j[k,k] (or D_j[k,k] to D_j[k+1,k+1]) is a 2-by-2 diagonal block.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use case is strideP >= n.

• info[out]

pointer to rocblas_int. Array of batch_count integers on the GPU.

If info[j] = 0, successful exit for factorization of A_j. If info[j] = i > 0, D_j is singular. D_j[i,i] is the first diagonal zero.

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
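To locate a single element A_j[i,k] inside the strided buffer, the batch stride and the leading dimension are combined. The sketch below assumes column-major storage (the LAPACK convention), 1-based i and k as in the notation of this section, and a 0-based batch index j. The function name `strided_index` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of rocSOLVER. Returns the offset, in
// elements from the start of the buffer A, of element A_j[i,k].
// i and k are 1-based; j is the 0-based position of the matrix in the batch.
std::int64_t strided_index(std::int64_t j, std::int64_t i, std::int64_t k,
                           std::int64_t lda, std::int64_t strideA)
{
    assert(j >= 0 && i >= 1 && k >= 1 && i <= lda);
    return j * strideA + (k - 1) * lda + (i - 1);
}
```

For example, with lda = 4 and strideA = 12, element A_2[3,2] (third matrix of the batch, counting from j = 0) sits at offset 2*12 + 1*4 + 2 = 30.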

## 3.3.2. Orthogonal factorizations

### 3.3.2.1. rocsolver_<type>geqr2()

rocblas_status rocsolver_zgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqr2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQR2 computes a QR factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}$

where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_1H_2\cdots H_k, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i v_i'$

where the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
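To make the implicit representation of Q concrete, the following host-side sketch multiplies the Householder matrices H_1, ..., H_k together for the real double-precision case. It assumes column-major storage and host copies of the factored A and of the Householder scalars. The function name `form_Q` is hypothetical, and the dense O(m^3) loops are meant for illustration, not performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of rocSOLVER (real case only).
// A_host: column-major factored matrix (lda*n elements).
// tau:    host copy of the ipiv array (Householder scalars).
// Returns Q = H_1 H_2 ... H_k as an m-by-m column-major matrix.
std::vector<double> form_Q(const std::vector<double>& A_host, int lda,
                           const std::vector<double>& tau, int m, int n)
{
    const int k = (m < n) ? m : n;
    assert(lda >= m && static_cast<int>(tau.size()) >= k);
    assert(A_host.size() >= static_cast<std::size_t>(lda) * n);

    std::vector<double> Q(static_cast<std::size_t>(m) * m, 0.0);
    for (int r = 0; r < m; ++r) Q[r + static_cast<std::size_t>(r) * m] = 1.0;

    std::vector<double> v(m), w(m);
    for (int i = 0; i < k; ++i) {  // Q <- Q * H_(i+1)
        // v: zeros above position i, 1 at position i, stored entries below.
        for (int r = 0; r < m; ++r)
            v[r] = (r < i) ? 0.0 : (r == i) ? 1.0 : A_host[r + static_cast<std::size_t>(i) * lda];
        // w = Q * v
        for (int r = 0; r < m; ++r) {
            double s = 0.0;
            for (int c = 0; c < m; ++c) s += Q[r + static_cast<std::size_t>(c) * m] * v[c];
            w[r] = s;
        }
        // Q <- Q - tau_i * w * v^T, which equals Q * (I - tau_i v v^T)
        for (int c = 0; c < m; ++c)
            for (int r = 0; r < m; ++r)
                Q[r + static_cast<std::size_t>(c) * m] -= tau[i] * w[r] * v[c];
    }
    return Q;
}
```

For complex types the update would use the conjugate transpose of v instead of the plain transpose.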

### 3.3.2.2. rocsolver_<type>geqr2_batched()

rocblas_status rocsolver_zgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqr2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQR2_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}$

where $$R_j$$ is upper triangular (upper trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction on the value of strideP. Normal use case is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
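
The relationship between strideP, batch_count and the storage behind ipiv can be sketched on the host as follows. This is a minimal illustration in plain C, not part of rocSOLVER: the helper names `min_dim`, `ipiv_min_elems` and `ipiv_vector` are hypothetical, a signed 64-bit integer stands in for rocblas_stride, the batch index j is 0-based as in C, and the size formula assumes the usual non-overlapping layout with strideP >= min(m,n).

```c
#include <stdint.h>

/* k = min(m, n): the number of Householder scalars per matrix. */
static int64_t min_dim(int64_t m, int64_t n) { return m < n ? m : n; }

/* Number of elements the ipiv buffer must hold, ASSUMING the "normal use"
   case strideP >= min(m,n). The last vector needs only min(m,n) elements,
   not a full stride. Returns 0 for an empty batch. */
static int64_t ipiv_min_elems(int64_t m, int64_t n,
                              int64_t strideP, int64_t batch_count)
{
    if (batch_count <= 0)
        return 0;
    return (batch_count - 1) * strideP + min_dim(m, n);
}

/* Start of vector ipiv_j (0-based j) inside the buffer. */
static double *ipiv_vector(double *ipiv, int64_t strideP, int64_t j)
{
    return ipiv + j * strideP;
}
```

For example, with m = 5, n = 3 and batch_count = 4, a tightly packed layout (strideP = 3) needs 12 elements, while strideP = 8 needs 27.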

### 3.3.2.3. rocsolver_<type>geqr2_strided_batched()#

rocblas_status rocsolver_zgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqr2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQR2_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}$

where $$R_j$$ is upper triangular (upper trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
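
How lda and strideA combine to locate a single element can be sketched on the host. The helpers below are hypothetical (not rocSOLVER functions) and assume column-major storage, which is what the leading-dimension parameter implies; indices are 0-based here, whereas the descriptions above are 1-based, and a signed 64-bit integer stands in for rocblas_stride.

```c
#include <stdint.h>

/* Linear offset of element A_j[r, c] in a strided batch, ASSUMING
   column-major storage with leading dimension lda. j, r, c are 0-based. */
static int64_t strided_elem_offset(int64_t j, int64_t r, int64_t c,
                                   int64_t lda, int64_t strideA)
{
    return j * strideA + c * lda + r;
}

/* Copy matrix j of the batch into a dense m-by-n column-major buffer
   (leading dimension m), e.g. after the batch was copied back to the host. */
static void gather_matrix(const double *A, int64_t lda, int64_t strideA,
                          int64_t j, int m, int n, double *out)
{
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < m; ++r)
            out[r + (int64_t)c * m] =
                A[strided_elem_offset(j, r, c, lda, strideA)];
}
```

With lda = 3 and strideA = 8, element [0,1] of the second matrix (j = 1) sits at offset 8 + 3 + 0 = 11.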

### 3.3.2.4. rocsolver_<type>geqrf()#

rocblas_status rocsolver_zgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeqrf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQRF computes a QR factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} R\\ 0 \end{array}\right] \end{split}$

where R is upper triangular (upper trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_1H_2\cdots H_k, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i v_i'$

where the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.
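
As a worked check of this definition, restricted to the real case and writing $$\tau$$ for ipiv[i] (a symbol introduced here only for brevity): $$H_i$$ is symmetric, so it is orthogonal exactly when $$H_i^2 = I$$. Expanding the square gives

$H_i^2 = I - 2\tau \, v_i v_i^T + \tau^2 (v_i^T v_i) \, v_i v_i^T = I - \tau \left(2 - \tau \, v_i^T v_i\right) v_i v_i^T$

which equals I when $$\tau = 0$$ (so that $$H_i = I$$) or when $$\tau = 2 / (v_i^T v_i)$$. Since $$v_i[i] = 1$$ implies $$v_i^T v_i \geq 1$$, a nonzero real scalar of this form satisfies $$0 < \tau \leq 2$$.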

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and above the diagonal contain the factor R; the elements below the diagonal are the last m - i elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
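
The packed output can be expanded into an explicit Q on the host, which is useful for checking a factorization on small inputs. The following is a minimal sketch for real double precision only, not rocSOLVER code: `form_q_from_qr` is a hypothetical helper, `tau` stands for a host copy of ipiv, storage is assumed column-major, and indices are 0-based.

```c
#include <stddef.h>

/* Form the m-by-m matrix Q = H_1 H_2 ... H_k from GEQRF-style output
   that has been copied back to the host (real double precision only).
   A   : column-major, leading dimension lda; the entries below the
         diagonal in column i hold v_i[i+1 .. m-1]; v_i[i] = 1 is implied.
   tau : the k Householder scalars (host copy of ipiv).
   Q   : output, column-major with leading dimension m. */
static void form_q_from_qr(int m, int k, const double *A, int lda,
                           const double *tau, double *Q)
{
    /* Start from the identity. */
    for (int c = 0; c < m; ++c)
        for (int r = 0; r < m; ++r)
            Q[r + (size_t)c * m] = (r == c) ? 1.0 : 0.0;

    /* Apply H_k first and H_1 last, each from the left: Q <- H_i Q. */
    for (int i = k - 1; i >= 0; --i) {
        const double *v = A + (size_t)i * lda; /* column i of A */
        for (int c = 0; c < m; ++c) {
            double *q = Q + (size_t)c * m;
            /* w = v_i^T q, using v_i[i] = 1 and v_i[r] = 0 for r < i. */
            double w = q[i];
            for (int r = i + 1; r < m; ++r)
                w += v[r] * q[r];
            /* q <- q - tau_i * w * v_i */
            q[i] -= tau[i] * w;
            for (int r = i + 1; r < m; ++r)
                q[r] -= tau[i] * w * v[r];
        }
    }
}
```

Multiplying the resulting Q by the upper-triangular part of A (padded with zero rows to m rows) should reproduce the original matrix up to rounding.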

### 3.3.2.5. rocsolver_<type>geqrf_batched()#

rocblas_status rocsolver_zgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}$

where $$R_j$$ is upper triangular (upper trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
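
The batched variants take an array of pointers rather than one base pointer plus a stride. When the matrices happen to live in a single allocation, the pointer table can be derived from the base address as sketched below. `fill_batch_pointers` is a hypothetical helper written in plain C; it only performs pointer arithmetic and never dereferences the matrix data. Whether the pointer table itself must also be placed in GPU memory before the call is not stated in this section, so treat that step as something to confirm separately.

```c
#include <stddef.h>

/* Fill a table of batch_count pointers so that ptrs[j] addresses matrix
   A_j inside one contiguous allocation whose matrices start `stride`
   elements apart. In real use `base` would be a GPU address; here it is
   only used for address computation. */
static void fill_batch_pointers(double *base, size_t stride,
                                int batch_count, double **ptrs)
{
    for (int j = 0; j < batch_count; ++j)
        ptrs[j] = base + (size_t)j * stride;
}
```

The pointers need not be equally spaced in general; this equal-stride table is just the simplest case.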

### 3.3.2.6. rocsolver_<type>geqrf_strided_batched()#

rocblas_status rocsolver_zgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeqrf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQRF_STRIDED_BATCHED computes the QR factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} R_j\\ 0 \end{array}\right] \end{split}$

where $$R_j$$ is upper triangular (upper trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}H_{j_2}\cdots H_{j_k}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the diagonal contain the factor R_j. The elements below the diagonal are the last m - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
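
The constraints listed for these parameters can be checked on the host before a call is issued. The sketch below is an assumption-laden illustration: the enum `arg_check` and both functions are hypothetical, the result codes are not rocblas_status values, and nothing here describes how the library itself validates or reports invalid arguments.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch (NOT rocblas_status values). */
typedef enum {
    ARGS_OK = 0,
    ARGS_BAD_SIZE,  /* m < 0 or n < 0      */
    ARGS_BAD_LDA,   /* lda < m             */
    ARGS_BAD_BATCH  /* batch_count < 0     */
} arg_check;

/* Check the documented constraints: m >= 0, n >= 0, lda >= m,
   batch_count >= 0. The strides carry no restriction, so they are
   not part of this check. */
static arg_check check_geqrf_strided_batched_args(int m, int n, int lda,
                                                  int batch_count)
{
    if (m < 0 || n < 0)
        return ARGS_BAD_SIZE;
    if (lda < m)
        return ARGS_BAD_LDA;
    if (batch_count < 0)
        return ARGS_BAD_BATCH;
    return ARGS_OK;
}

/* Returns 1 when both strides match the "normal use" case:
   strideA >= lda*n and strideP >= min(m,n). */
static int uses_normal_strides(int m, int n, int lda,
                               int64_t strideA, int64_t strideP)
{
    int64_t k = (m < n) ? m : n;
    return strideA >= (int64_t)lda * n && strideP >= k;
}
```

Keeping the stride test separate reflects the text above: smaller strides are permitted, they are just not the normal use case.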

### 3.3.2.7. rocsolver_<type>gerq2()#

rocblas_status rocsolver_zgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQ2 computes an RQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q$

where R is upper triangular (upper trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_1'H_2' \cdots H_k', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i v_i'$

where the last n-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
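
The position of R inside the output array depends on whether m or n is larger. Both cases described for A reduce to one test on 0-based indices: element [r, c] belongs to R when c - r >= n - m. The host-side sketch below uses that test to copy out the m-by-n matrix [0 R]; `extract_rq_factor` is a hypothetical helper, real double precision and column-major storage are assumed, and the input is taken to be a host copy of A.

```c
#include <stddef.h>

/* Copy the m-by-n matrix [0 R] out of RQ-style packed output.
   A   : column-major, leading dimension lda (host copy of the result).
   out : column-major, leading dimension m; entries outside R are zeroed. */
static void extract_rq_factor(int m, int n, const double *A, int lda,
                              double *out)
{
    for (int c = 0; c < n; ++c) {
        for (int r = 0; r < m; ++r) {
            /* On or above the (m-n)-th subdiagonal when m >= n, or the
               (n-m)-th superdiagonal when n > m: c - r >= n - m. */
            int in_r = (c - r) >= (n - m);
            out[r + (size_t)c * m] = in_r ? A[r + (size_t)c * lda] : 0.0;
        }
    }
}
```

For m = 2, n = 3 this keeps entries [0,1], [0,2] and [1,2]; for m = 3, n = 2 it drops only entry [2,0].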

### 3.3.2.8. rocsolver_<type>gerq2_batched()#

rocblas_status rocsolver_zgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j$

where $$R_j$$ is upper triangular (upper trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last n-i elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.9. rocsolver_<type>gerq2_strided_batched()#

rocblas_status rocsolver_zgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQ2_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j$

where $$R_j$$ is upper triangular (upper trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last n-i elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.10. rocsolver_<type>gerqf()#

rocblas_status rocsolver_zgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgerqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GERQF computes an RQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

$A = \left[\begin{array}{cc} 0 & R \end{array}\right] Q$

where R is upper triangular (upper trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_1'H_2' \cdots H_k', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i v_i'$

where the last n-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.

### 3.3.2.11. rocsolver_<type>gerqf_batched()#

rocblas_status rocsolver_zgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j$

where $$R_j$$ is upper triangular (upper trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last n-i elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.12. rocsolver_<type>gerqf_strided_batched()#

rocblas_status rocsolver_zgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgerqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GERQF_STRIDED_BATCHED computes the RQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} 0 & R_j \end{array}\right] Q_j$

where $$R_j$$ is upper triangular (upper trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_1}'H_{j_2}' \cdots H_{j_k}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last n-i elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and above the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor R_j; the elements below the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.13. rocsolver_<type>geql2()#

rocblas_status rocsolver_zgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)#
rocblas_status rocsolver_cgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)#
rocblas_status rocsolver_dgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)#
rocblas_status rocsolver_sgeql2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)#

GEQL2 computes a QL factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}$

where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_kH_{k-1}\cdots H_1, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i v_i'$

where the last m-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
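
As an illustration of the storage scheme described for A, the exit contents for m = 3, n = 2 and for m = 2, n = 3 have the shapes shown below. The labels are placeholders introduced here: l marks an entry of L and v marks a stored Householder-vector entry.

$\begin{split} \left[\begin{array}{cc} v & v\\ l & v\\ l & l \end{array}\right] \quad \text{and} \quad \left[\begin{array}{ccc} l & l & v\\ l & l & l \end{array}\right] \end{split}$

In the first case L starts on the first subdiagonal (m - n = 1); in the second it reaches up to the first superdiagonal (n - m = 1).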

### 3.3.2.14. rocsolver_<type>geql2_batched()#

rocblas_status rocsolver_zgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_dgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_sgeql2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)#

GEQL2_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}$

where $$L_j$$ is lower triangular (lower trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last m-i elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.15. rocsolver_<type>geql2_strided_batched()

rocblas_status rocsolver_zgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeql2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQL2_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}$

where $$L_j$$ is lower triangular (lower trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last m-i elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
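The strided addressing implied by lda and strideA can be sketched as below. `strided_elem_offset` is a hypothetical helper, not a rocSOLVER function; it assumes column-major storage, 1-based row and column indices as in the text, a 0-based batch index j, and uses `long long` in place of `rocblas_stride`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of rocSOLVER): 0-based offset, in elements,
   of A_j[row, col] inside a strided batch buffer. Matrices are column-major
   with leading dimension lda; A_j starts j*stride_a elements into the
   buffer. row and col are 1-based, j is 0-based (assumption). */
size_t strided_elem_offset(int j, int row, int col, int lda, long long stride_a)
{
    assert(j >= 0 && row >= 1 && row <= lda && col >= 1);
    return (size_t)j * (size_t)stride_a      /* skip the first j matrices   */
         + (size_t)(col - 1) * (size_t)lda   /* skip the first col-1 columns */
         + (size_t)(row - 1);                /* position within the column  */
}
```

For example, with lda = 4 and strideA = 12, element A_1[3,2] sits at offset 12 + 4 + 2 = 18.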

### 3.3.2.16. rocsolver_<type>geqlf()

rocblas_status rocsolver_zgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgeqlf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GEQLF computes a QL factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

$\begin{split} A = Q\left[\begin{array}{c} 0\\ L \end{array}\right] \end{split}$

where L is lower triangular (lower trapezoidal if m < n), and Q is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_kH_{k-1}\cdots H_1, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i v_i'$

where the last m-i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
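The two cases in the description of A on exit reduce to a single test: element A[row,col] belongs to L exactly when row - col >= m - n. The host-side sketch below illustrates this for real double data that has already been copied back from the GPU; `in_ql_factor` and `copy_ql_factor` are hypothetical helpers, not rocSOLVER functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if A[row,col] (1-based) lies on or below the
   (m-n)-th subdiagonal (m >= n) or the (n-m)-th superdiagonal (n > m),
   i.e. inside the factor L of a QL factorization. */
int in_ql_factor(int m, int n, int row, int col)
{
    assert(row >= 1 && row <= m && col >= 1 && col <= n);
    return (row - col) >= (m - n);
}

/* Hypothetical helper: copy the L part of a factored column-major matrix A
   into out and set every other entry (the Householder vector storage) to 0. */
void copy_ql_factor(int m, int n, const double *A, int lda, double *out, int ldo)
{
    for (int col = 1; col <= n; ++col)
        for (int row = 1; row <= m; ++row) {
            size_t src = (size_t)(col - 1) * (size_t)lda + (size_t)(row - 1);
            size_t dst = (size_t)(col - 1) * (size_t)ldo + (size_t)(row - 1);
            out[dst] = in_ql_factor(m, n, row, col) ? A[src] : 0.0;
        }
}
```

For a 3-by-2 matrix, only A[2,1], A[3,1] and A[3,2] belong to L; the remaining entries hold Householder vector data.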

### 3.3.2.17. rocsolver_<type>geqlf_batched()

rocblas_status rocsolver_zgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}$

where $$L_j$$ is lower triangular (lower trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last m-i elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
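The array-of-pointers argument A can be pictured with the sketch below, which carves one contiguous buffer into batch_count blocks of lda*n elements. It is an illustration only: `fill_batch_pointers` is a hypothetical helper, and ordinary host memory stands in for the GPU allocations that a real call would require.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of rocSOLVER): make ptrs[j] point at the
   start of the j-th block of lda*n elements inside one contiguous buffer,
   so that each pointer addresses one column-major matrix A_j. */
void fill_batch_pointers(double *base, int lda, int n, int batch_count,
                         double **ptrs)
{
    assert(lda >= 0 && n >= 0 && batch_count >= 0);
    size_t block = (size_t)lda * (size_t)n;
    for (int j = 0; j < batch_count; ++j)
        ptrs[j] = base + (size_t)j * block;
}
```

Packing the matrices back to back like this is a choice of the sketch; the batched interface only requires that each pointer refer to its own lda*n array.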

### 3.3.2.18. rocsolver_<type>geqlf_strided_batched()

rocblas_status rocsolver_zgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgeqlf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GEQLF_STRIDED_BATCHED computes the QL factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$\begin{split} A_j = Q_j\left[\begin{array}{c} 0\\ L_j \end{array}\right] \end{split}$

where $$L_j$$ is lower triangular (lower trapezoidal if m < n), and $$Q_j$$ is an m-by-m orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}H_{j_{k-1}}\cdots H_{j_1}, \quad \text{with} \: k = \text{min}(m,n)$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i} v_{j_i}'$

where the last m-i elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the (m-n)-th subdiagonal (when m >= n) or the (n-m)-th superdiagonal (when n > m) contain the factor L_j; the elements above the sub/superdiagonal are the first i - 1 elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
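The reflector formula H = I - ipiv[i] * v v' used by the QL routines can be applied to a vector without ever forming H. The following is a minimal CPU sketch for the real case only (so v' is the plain transpose); `apply_householder` is a hypothetical name and the code is not how rocSOLVER implements the operation.

```c
#include <assert.h>

/* Hypothetical helper: overwrite y (length m) with (I - tau * v * v^T) * y.
   Real case only. Computing the dot product first avoids forming the
   m-by-m matrix H explicitly. */
void apply_householder(int m, double tau, const double *v, double *y)
{
    assert(m >= 0);
    double dot = 0.0;
    for (int r = 0; r < m; ++r)
        dot += v[r] * y[r];          /* dot = v^T y */
    for (int r = 0; r < m; ++r)
        y[r] -= tau * dot * v[r];    /* y = y - tau * (v^T y) * v */
}
```

With v = (1, 0) and tau = 2, the reflector flips the sign of the first component, so y = (3, 4) becomes (-3, 4).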

### 3.3.2.19. rocsolver_<type>gelq2()

rocblas_status rocsolver_zgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelq2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQ2 computes an LQ factorization of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The factorization has the form

$A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q$

where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_k'H_{k-1}' \cdots H_1', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i' v_i$

where the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
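The storage convention for the LQ Householder vectors can be made concrete with a small host-side sketch. It rebuilds the full vector v_i from row i of a factored matrix that has been copied back from the GPU. `lq_householder_vector` is a hypothetical helper, and real double data in column-major order is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rebuild the Householder vector v_i (length n) from
   the factored column-major matrix A. Following the text, the first i-1
   entries are zero, v_i[i] = 1, and the last n-i entries are stored in
   row i of A to the right of the diagonal. i is 1-based. */
void lq_householder_vector(int n, int i, const double *A, int lda, double *v)
{
    assert(i >= 1 && i <= n && i <= lda);
    for (int c = 1; c <= n; ++c) {
        if (c < i)
            v[c - 1] = 0.0;
        else if (c == i)
            v[c - 1] = 1.0;   /* implicit unit entry, not stored in A */
        else
            v[c - 1] = A[(size_t)(c - 1) * (size_t)lda + (size_t)(i - 1)];
    }
}
```

For a 2-by-3 matrix this yields v_1 = (1, A[1,2], A[1,3]) and v_2 = (0, 1, A[2,3]).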

### 3.3.2.20. rocsolver_<type>gelq2_batched()

rocblas_status rocsolver_zgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j$

where $$L_j$$ is lower triangular (lower trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i}$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.21. rocsolver_<type>gelq2_strided_batched()

rocblas_status rocsolver_zgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelq2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQ2_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j$

where $$L_j$$ is lower triangular (lower trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i}$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.22. rocsolver_<type>gelqf()

rocblas_status rocsolver_zgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, rocblas_double_complex *ipiv)
rocblas_status rocsolver_cgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, rocblas_float_complex *ipiv)
rocblas_status rocsolver_dgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *ipiv)
rocblas_status rocsolver_sgelqf(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *ipiv)

GELQF computes an LQ factorization of a general m-by-n matrix A.

(This is the blocked version of the algorithm).

The factorization has the form

$A = \left[\begin{array}{cc} L & 0 \end{array}\right] Q$

where L is lower triangular (lower trapezoidal if m > n), and Q is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q = H_k'H_{k-1}' \cdots H_1', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_i$$ is given by

$H_i = I - \text{ipiv}[i] \cdot v_i' v_i$

where the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on and below the diagonal contain the factor L; the elements above the diagonal are the last n - i elements of Householder vector v_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• ipiv[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars.
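Since the factor L occupies the elements on and below the diagonal of A on exit, it can be separated from the Householder data with a short host-side routine. The sketch below assumes real double data already copied back from the GPU; `extract_lq_factor` is a hypothetical helper, not a rocSOLVER function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the factor L out of a factored column-major
   matrix A into an m-by-min(m,n) column-major array L with leading
   dimension ldl. Entries above the diagonal, which hold Householder vector
   data in A, are written as zero. */
void extract_lq_factor(int m, int n, const double *A, int lda,
                       double *L, int ldl)
{
    int k = (m < n) ? m : n;
    assert(lda >= m && ldl >= m);
    for (int c = 0; c < k; ++c)
        for (int r = 0; r < m; ++r) {
            size_t src = (size_t)c * (size_t)lda + (size_t)r;
            size_t dst = (size_t)c * (size_t)ldl + (size_t)r;
            L[dst] = (r >= c) ? A[src] : 0.0;
        }
}
```

For m <= n the result is an m-by-m lower triangular matrix; for m > n it is the m-by-n lower trapezoidal factor.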

### 3.3.2.23. rocsolver_<type>gelqf_batched()

rocblas_status rocsolver_zgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j$

where $$L_j$$ is lower triangular (lower trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i}$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.

### 3.3.2.24. rocsolver_<type>gelqf_strided_batched()

rocblas_status rocsolver_zgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_double_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, rocblas_float_complex *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgelqf_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *ipiv, const rocblas_stride strideP, const rocblas_int batch_count)

GELQF_STRIDED_BATCHED computes the LQ factorization of a batch of general m-by-n matrices.

(This is the blocked version of the algorithm).

The factorization of matrix $$A_j$$ in the batch has the form

$A_j = \left[\begin{array}{cc} L_j & 0 \end{array}\right] Q_j$

where $$L_j$$ is lower triangular (lower trapezoidal if m > n), and $$Q_j$$ is an n-by-n orthogonal/unitary matrix represented as the product of Householder matrices

$Q_j = H_{j_k}'H_{j_{k-1}}' \cdots H_{j_1}', \quad \text{with} \: k = \text{min}(m,n).$

Each Householder matrix $$H_{j_i}$$ is given by

$H_{j_i} = I - \text{ipiv}_j[i] \cdot v_{j_i}' v_{j_i}$

where the first i-1 elements of Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the elements on and below the diagonal contain the factor L_j. The elements above the diagonal are the last n - i elements of Householder vector v_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction for the value of strideA. Normal use case is strideA >= lda*n.

• ipiv[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors ipiv_j of corresponding Householder scalars.

• strideP[in]

rocblas_stride.

Stride from the start of one vector ipiv_j to the next one ipiv_(j+1). There is no restriction for the value of strideP. Normal use is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
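The row-vector form H = I - ipiv[i] * v' v used by the LQ routines can be checked numerically on the CPU. In the real case, a reflector with a nonzero scalar is orthogonal exactly when the scalar equals 2 / (v v'), and H is then its own inverse. The sketch below forms H explicitly and measures how far H*H is from the identity. Both function names are hypothetical, and the code only illustrates the formula, not rocSOLVER internals.

```c
#include <assert.h>

/* Hypothetical helper: form the n-by-n matrix H = I - tau * v^T v, where v
   is a row vector of length n. H is stored column-major with leading
   dimension n. Real case only. */
void form_row_reflector(int n, double tau, const double *v, double *H)
{
    assert(n >= 0);
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < n; ++r)
            H[r + c * n] = (r == c ? 1.0 : 0.0) - tau * v[r] * v[c];
}

/* Hypothetical helper: largest absolute entry of H*H - I. A value near
   zero means H is (numerically) its own inverse. */
double reflector_involution_error(int n, const double *H)
{
    double worst = 0.0;
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < n; ++r) {
            double s = 0.0;
            for (int p = 0; p < n; ++p)
                s += H[r + p * n] * H[p + c * n];
            s -= (r == c) ? 1.0 : 0.0;
            if (s < 0.0)
                s = -s;
            if (s > worst)
                worst = s;
        }
    return worst;
}
```

For v = (1, 0.5, 0.25) and tau = 2 / 1.3125, the reported error is at the level of rounding noise; with tau = 0 the reflector is the identity and the error is exactly zero.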

## 3.3.3. Problem and matrix reductions

### 3.3.3.1. rocsolver_<type>gebd2()

rocblas_status rocsolver_zgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebd2(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBD2 computes the bidiagonal form of a general m-by-n matrix A.

(This is the unblocked version of the algorithm).

The bidiagonal form is given by:

$B = Q' A P$

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

$\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_n\: \text{and} \: P = G_1G_2\cdots G_{n-1}, & \: \text{if}\: m \geq n, \:\text{or}\\ Q = H_1H_2\cdots H_{m-1}\: \text{and} \: P = G_1G_2\cdots G_{m}, & \: \text{if}\: m < n. \end{array} \end{split}$

Each Householder matrix $$H_i$$ and $$G_i$$ is given by

$\begin{split} \begin{array}{cl} H_i = I - \text{tauq}[i] \cdot v_i v_i', & \: \text{and}\\ G_i = I - \text{taup}[i] \cdot u_i' u_i. \end{array} \end{split}$

If m >= n, the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$; while the first i elements of the Householder vector $$u_i$$ are zero, and $$u_i[i+1] = 1$$. If m < n, the first i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i+1] = 1$$; while the first i-1 elements of the Householder vector $$u_i$$ are zero, and $$u_i[i] = 1$$.

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the elements on the diagonal and superdiagonal (if m >= n), or subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal are the last m - i elements of Householder vector v_i, and the elements above the superdiagonal are the last n - i - 1 elements of Householder vector u_i. If m < n, the elements below the subdiagonal are the last m - i - 1 elements of Householder vector v_i, and the elements above the diagonal are the last n - i elements of Householder vector u_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• D[out]

pointer to real type. Array on the GPU of dimension min(m,n).

The diagonal elements of B.

• E[out]

pointer to real type. Array on the GPU of dimension min(m,n)-1.

The off-diagonal elements of B.

• tauq[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars associated with matrix Q.

• taup[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars associated with matrix P.
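The outputs D and E are enough to rebuild the square bidiagonal part of B on the host. The sketch below assumes real double data copied back from the GPU; `assemble_bidiagonal` is a hypothetical helper, not a rocSOLVER function. It places E on the superdiagonal when m >= n and on the subdiagonal when m < n.

```c
#include <assert.h>

/* Hypothetical helper: build the k-by-k bidiagonal block of B, with
   k = min(m,n), from the diagonal D (length k) and the off-diagonal E
   (length k-1). B is column-major with leading dimension k. It is upper
   bidiagonal if m >= n and lower bidiagonal if m < n. */
void assemble_bidiagonal(int m, int n, const double *D, const double *E,
                         double *B)
{
    int k = (m < n) ? m : n;
    assert(m >= 0 && n >= 0);
    for (int idx = 0; idx < k * k; ++idx)
        B[idx] = 0.0;
    for (int i = 0; i < k; ++i) {
        B[i + i * k] = D[i];
        if (i + 1 < k) {
            if (m >= n)
                B[i + (i + 1) * k] = E[i];   /* superdiagonal entry */
            else
                B[(i + 1) + i * k] = E[i];   /* subdiagonal entry */
        }
    }
}
```

With D = (1, 2) and E = (5), a 3-by-2 input gives the upper bidiagonal block [1 5; 0 2], while a 2-by-3 input gives the lower bidiagonal block [1 0; 5 2].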

### 3.3.3.2. rocsolver_<type>gebd2_batched()

rocblas_status rocsolver_zgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *const A[], const rocblas_int lda, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm).

For each instance in the batch, the bidiagonal form is given by:

$B_j = Q_j' A_j P_j$

where $$B_j$$ is upper bidiagonal if m >= n and lower bidiagonal if m < n, and $$Q_j$$ and $$P_j$$ are orthogonal/unitary matrices represented as the product of Householder matrices

$\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_n}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_{n-1}}, & \: \text{if}\: m \geq n, \:\text{or}\\ Q_j = H_{j_1}H_{j_2}\cdots H_{j_{m-1}}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_m}, & \: \text{if}\: m < n. \end{array} \end{split}$

Each Householder matrix $$H_{j_i}$$ and $$G_{j_i}$$ is given by

$\begin{split} \begin{array}{cl} H_{j_i} = I - \text{tauq}_j[i] \cdot v_{j_i} v_{j_i}', & \: \text{and}\\ G_{j_i} = I - \text{taup}_j[i] \cdot u_{j_i}' u_{j_i}. \end{array} \end{split}$

If m >= n, the first i-1 elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$; while the first i elements of the Householder vector $$u_{j_i}$$ are zero, and $$u_{j_i}[i+1] = 1$$. If m < n, the first i elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i+1] = 1$$; while the first i-1 elements of the Householder vector $$u_{j_i}$$ are zero, and $$u_{j_i}[i] = 1$$.
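
As an illustration of the reflector formula above, the following minimal sketch applies a single real Householder matrix H = I - tau * v * v' to a vector on the host. The function name `apply_householder` is hypothetical and not part of rocSOLVER; the sketch covers only the real double-precision case and assumes `v` is given in full (leading zeros and the unit entry included).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not a rocSOLVER function).
   Overwrites x with H*x, where H = I - tau * v * v^T (real case).
   Both v and x have len elements. */
void apply_householder(size_t len, double tau, const double *v, double *x)
{
    assert(v != NULL && x != NULL);

    /* w = v^T x */
    double w = 0.0;
    for (size_t k = 0; k < len; ++k)
        w += v[k] * x[k];

    /* x = x - tau * w * v */
    for (size_t k = 0; k < len; ++k)
        x[k] -= tau * w * v[k];
}
```

The matrix H is never formed explicitly; one dot product and one scaled update are enough, which is why only the scalar and the vector need to be stored.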

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

Array of pointers to type. Each pointer points to an array on the GPU of dimension lda*n.

On entry, the m-by-n matrices A_j to be factored. On exit, the diagonal and superdiagonal (if m >= n) or the diagonal and subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal in the i-th column are the last m - i elements of Householder vector v_(j_i), and the elements above the superdiagonal in the i-th row are the last n - i - 1 elements of Householder vector u_(j_i). If m < n, the elements below the subdiagonal in the i-th column are the last m - i - 1 elements of Householder vector v_(j_i), and the elements above the diagonal in the i-th row are the last n - i elements of Householder vector u_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• D[out]

pointer to real type. Array on the GPU (the size depends on the value of strideD).

The diagonal elements of B_j.

• strideD[in]

rocblas_stride.

Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction on the value of strideD. The normal use case is strideD >= min(m,n).

• E[out]

pointer to real type. Array on the GPU (the size depends on the value of strideE).

The off-diagonal elements of B_j.

• strideE[in]

rocblas_stride.

Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction on the value of strideE. The normal use case is strideE >= min(m,n)-1.

• tauq[out]

pointer to type. Array on the GPU (the size depends on the value of strideQ).

Contains the vectors tauq_j of Householder scalars associated with matrices Q_j.

• strideQ[in]

rocblas_stride.

Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction on the value of strideQ. The normal use case is strideQ >= min(m,n).

• taup[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors taup_j of Householder scalars associated with matrices P_j.

• strideP[in]

rocblas_stride.

Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction on the value of strideP. The normal use case is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
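
The stride parameters above determine how large the D, E, tauq and taup buffers need to be. The sketch below computes the smallest strides that satisfy the "normal use case" conditions, together with a buffer length of stride times batch_count elements for each array. The struct and function names are hypothetical, plain `size_t` and `int` stand in for the rocBLAS types so that the sketch does not depend on the library headers, and the sizing rule is an assumption for tightly packed, non-overlapping vectors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of rocSOLVER). */
typedef struct {
    size_t stride_d, stride_e, stride_q, stride_p; /* elements between batch items */
    size_t len_d, len_e, len_q, len_p;             /* total elements per buffer    */
} gebd2_batch_layout;

/* Smallest "normal use" strides for tightly packed output vectors. */
gebd2_batch_layout gebd2_packed_layout(int m, int n, int batch_count)
{
    assert(m >= 0 && n >= 0 && batch_count >= 0);

    size_t k = (size_t)(m < n ? m : n); /* min(m,n) */
    size_t bc = (size_t)batch_count;

    gebd2_batch_layout layout;
    layout.stride_d = k;                 /* D_j has min(m,n) elements     */
    layout.stride_e = k > 0 ? k - 1 : 0; /* E_j has min(m,n)-1 elements   */
    layout.stride_q = k;                 /* tauq_j has min(m,n) elements  */
    layout.stride_p = k;                 /* taup_j has min(m,n) elements  */

    layout.len_d = layout.stride_d * bc;
    layout.len_e = layout.stride_e * bc;
    layout.len_q = layout.stride_q * bc;
    layout.len_p = layout.stride_p * bc;
    return layout;
}
```

For example, with m = 5, n = 3 and batch_count = 4, this gives strides of 3, 2, 3 and 3 and buffer lengths of 12, 8, 12 and 12 elements. Larger strides are also valid if padding between batch items is wanted.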

### 3.3.3.3. rocsolver_<type>gebd2_strided_batched()

rocblas_status rocsolver_zgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_cgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, rocblas_float_complex *tauq, const rocblas_stride strideQ, rocblas_float_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_dgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, const rocblas_stride strideA, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, double *tauq, const rocblas_stride strideQ, double *taup, const rocblas_stride strideP, const rocblas_int batch_count)
rocblas_status rocsolver_sgebd2_strided_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, const rocblas_stride strideA, float *D, const rocblas_stride strideD, float *E, const rocblas_stride strideE, float *tauq, const rocblas_stride strideQ, float *taup, const rocblas_stride strideP, const rocblas_int batch_count)

GEBD2_STRIDED_BATCHED computes the bidiagonal form of a batch of general m-by-n matrices.

(This is the unblocked version of the algorithm.)

For each instance in the batch, the bidiagonal form is given by:

$B_j = Q_j' A_j P_j$

where $$B_j$$ is upper bidiagonal if m >= n and lower bidiagonal if m < n, and $$Q_j$$ and $$P_j$$ are orthogonal/unitary matrices represented as the product of Householder matrices

$\begin{split} \begin{array}{cl} Q_j = H_{j_1}H_{j_2}\cdots H_{j_n}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_{n-1}}, & \: \text{if}\: m >= n, \:\text{or}\\ Q_j = H_{j_1}H_{j_2}\cdots H_{j_{m-1}}\: \text{and} \: P_j = G_{j_1}G_{j_2}\cdots G_{j_m}, & \: \text{if}\: m < n. \end{array} \end{split}$

Each Householder matrix $$H_{j_i}$$ and $$G_{j_i}$$ is given by

$\begin{split} \begin{array}{cl} H_{j_i} = I - \text{tauq}_j[i] \cdot v_{j_i} v_{j_i}', & \: \text{and}\\ G_{j_i} = I - \text{taup}_j[i] \cdot u_{j_i}' u_{j_i}. \end{array} \end{split}$

If m >= n, the first i-1 elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i] = 1$$; while the first i elements of the Householder vector $$u_{j_i}$$ are zero, and $$u_{j_i}[i+1] = 1$$. If m < n, the first i elements of the Householder vector $$v_{j_i}$$ are zero, and $$v_{j_i}[i+1] = 1$$; while the first i-1 elements of the Householder vector $$u_{j_i}$$ are zero, and $$u_{j_i}[i] = 1$$.
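
To make the zero/one pattern described above concrete, take m = 4 and n = 3 (so m >= n, with three reflectors in $$Q_j$$ and two in $$P_j$$). Listing the elements of each vector, with $$*$$ standing for an entry that is in general nonzero, the Householder vectors have the form

$\begin{split} \begin{array}{lll} v_{j_1} = (1, *, *, *), & v_{j_2} = (0, 1, *, *), & v_{j_3} = (0, 0, 1, *),\\ u_{j_1} = (0, 1, *), & u_{j_2} = (0, 0, 1). & \end{array} \end{split}$

Each $$v_{j_i}$$ has m elements and each $$u_{j_i}$$ has n elements; note that $$u_{j_2}$$ has no free entries in this example.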

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of all the matrices A_j in the batch.

• n[in]

rocblas_int. n >= 0.

The number of columns of all the matrices A_j in the batch.

• A[inout]

pointer to type. Array on the GPU (the size depends on the value of strideA).

On entry, the m-by-n matrices A_j to be factored. On exit, the diagonal and superdiagonal (if m >= n) or the diagonal and subdiagonal (if m < n) contain the bidiagonal form B_j. If m >= n, the elements below the diagonal in the i-th column are the last m - i elements of Householder vector v_(j_i), and the elements above the superdiagonal in the i-th row are the last n - i - 1 elements of Householder vector u_(j_i). If m < n, the elements below the subdiagonal in the i-th column are the last m - i - 1 elements of Householder vector v_(j_i), and the elements above the diagonal in the i-th row are the last n - i elements of Householder vector u_(j_i).

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of matrices A_j.

• strideA[in]

rocblas_stride.

Stride from the start of one matrix A_j to the next one A_(j+1). There is no restriction on the value of strideA. The normal use case is strideA >= lda*n.

• D[out]

pointer to real type. Array on the GPU (the size depends on the value of strideD).

The diagonal elements of B_j.

• strideD[in]

rocblas_stride.

Stride from the start of one vector D_j to the next one D_(j+1). There is no restriction on the value of strideD. The normal use case is strideD >= min(m,n).

• E[out]

pointer to real type. Array on the GPU (the size depends on the value of strideE).

The off-diagonal elements of B_j.

• strideE[in]

rocblas_stride.

Stride from the start of one vector E_j to the next one E_(j+1). There is no restriction on the value of strideE. The normal use case is strideE >= min(m,n)-1.

• tauq[out]

pointer to type. Array on the GPU (the size depends on the value of strideQ).

Contains the vectors tauq_j of Householder scalars associated with matrices Q_j.

• strideQ[in]

rocblas_stride.

Stride from the start of one vector tauq_j to the next one tauq_(j+1). There is no restriction on the value of strideQ. The normal use case is strideQ >= min(m,n).

• taup[out]

pointer to type. Array on the GPU (the size depends on the value of strideP).

Contains the vectors taup_j of Householder scalars associated with matrices P_j.

• strideP[in]

rocblas_stride.

Stride from the start of one vector taup_j to the next one taup_(j+1). There is no restriction on the value of strideP. The normal use case is strideP >= min(m,n).

• batch_count[in]

rocblas_int. batch_count >= 0.

Number of matrices in the batch.
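
With the strided layout described above, individual entries of a batch item can be located by simple index arithmetic once the buffers have been copied back to the host. The sketch below assumes column-major storage, the 1-based element indices used in this documentation, and a 0-based batch index j; both function names are hypothetical and not part of rocSOLVER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linear offset of A_j[i,k] inside a strided batch.
   i (row) and k (column) are 1-based; j (batch index) is 0-based. */
size_t strided_matrix_offset(size_t j, size_t i, size_t k,
                             size_t lda, size_t stride_a)
{
    assert(i >= 1 && k >= 1 && i <= lda);
    return j * stride_a + (k - 1) * lda + (i - 1);
}

/* Hypothetical helper: linear offset of the i-th element (1-based) of the
   j-th vector (0-based) in D, E, tauq or taup, given that array's stride. */
size_t strided_vector_offset(size_t j, size_t i, size_t stride)
{
    assert(i >= 1);
    return j * stride + (i - 1);
}
```

For instance, with lda = 4 and strideA = 12, the element in row 3 and column 2 of the third matrix (j = 2) is found at offset 2*12 + 1*4 + 2 = 30.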

### 3.3.3.4. rocsolver_<type>gebrd()

rocblas_status rocsolver_zgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *A, const rocblas_int lda, double *D, double *E, rocblas_double_complex *tauq, rocblas_double_complex *taup)
rocblas_status rocsolver_cgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *A, const rocblas_int lda, float *D, float *E, rocblas_float_complex *tauq, rocblas_float_complex *taup)
rocblas_status rocsolver_dgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, double *A, const rocblas_int lda, double *D, double *E, double *tauq, double *taup)
rocblas_status rocsolver_sgebrd(rocblas_handle handle, const rocblas_int m, const rocblas_int n, float *A, const rocblas_int lda, float *D, float *E, float *tauq, float *taup)

GEBRD computes the bidiagonal form of a general m-by-n matrix A.

(This is the blocked version of the algorithm.)

The bidiagonal form is given by:

$B = Q' A P$

where B is upper bidiagonal if m >= n and lower bidiagonal if m < n, and Q and P are orthogonal/unitary matrices represented as the product of Householder matrices

$\begin{split} \begin{array}{cl} Q = H_1H_2\cdots H_n\: \text{and} \: P = G_1G_2\cdots G_{n-1}, & \: \text{if}\: m >= n, \:\text{or}\\ Q = H_1H_2\cdots H_{m-1}\: \text{and} \: P = G_1G_2\cdots G_{m}, & \: \text{if}\: m < n. \end{array} \end{split}$

Each Householder matrix $$H_i$$ and $$G_i$$ is given by

$\begin{split} \begin{array}{cl} H_i = I - \text{tauq}[i] \cdot v_i v_i', & \: \text{and}\\ G_i = I - \text{taup}[i] \cdot u_i' u_i. \end{array} \end{split}$

If m >= n, the first i-1 elements of the Householder vector $$v_i$$ are zero, and $$v_i[i] = 1$$; while the first i elements of the Householder vector $$u_i$$ are zero, and $$u_i[i+1] = 1$$. If m < n, the first i elements of the Householder vector $$v_i$$ are zero, and $$v_i[i+1] = 1$$; while the first i-1 elements of the Householder vector $$u_i$$ are zero, and $$u_i[i] = 1$$.
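
As a small illustration of the two shapes of B, write $$d_i$$ for its diagonal entries and $$e_i$$ for its off-diagonal entries. For m = 4, n = 3 (upper bidiagonal) and for m = 3, n = 4 (lower bidiagonal), B has the form

$B = \begin{pmatrix} d_1 & e_1 & 0 \\ 0 & d_2 & e_2 \\ 0 & 0 & d_3 \\ 0 & 0 & 0 \end{pmatrix} \: \text{if}\: m = 4,\, n = 3, \qquad B = \begin{pmatrix} d_1 & 0 & 0 & 0 \\ e_1 & d_2 & 0 & 0 \\ 0 & e_2 & d_3 & 0 \end{pmatrix} \: \text{if}\: m = 3,\, n = 4.$

In both cases there are min(m,n) = 3 diagonal entries and min(m,n) - 1 = 2 off-diagonal entries.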

Parameters:
• handle[in] rocblas_handle.

• m[in]

rocblas_int. m >= 0.

The number of rows of the matrix A.

• n[in]

rocblas_int. n >= 0.

The number of columns of the matrix A.

• A[inout]

pointer to type. Array on the GPU of dimension lda*n.

On entry, the m-by-n matrix to be factored. On exit, the diagonal and superdiagonal (if m >= n) or the diagonal and subdiagonal (if m < n) contain the bidiagonal form B. If m >= n, the elements below the diagonal in the i-th column are the last m - i elements of Householder vector v_i, and the elements above the superdiagonal in the i-th row are the last n - i - 1 elements of Householder vector u_i. If m < n, the elements below the subdiagonal in the i-th column are the last m - i - 1 elements of Householder vector v_i, and the elements above the diagonal in the i-th row are the last n - i elements of Householder vector u_i.

• lda[in]

rocblas_int. lda >= m.

Specifies the leading dimension of A.

• D[out]

pointer to real type. Array on the GPU of dimension min(m,n).

The diagonal elements of B.

• E[out]

pointer to real type. Array on the GPU of dimension min(m,n)-1.

The off-diagonal elements of B.

• tauq[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars associated with matrix Q.

• taup[out]

pointer to type. Array on the GPU of dimension min(m,n).

The Householder scalars associated with matrix P.
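
The storage scheme described for A can be sketched as follows: given a host copy of A as returned by the factorization, the function below rebuilds the full Householder vector v_i, restoring the leading zeros and the implicit unit entry. The function name `gebrd_unpack_v` is hypothetical and not part of rocSOLVER; the sketch assumes column-major storage, a 1-based reflector index i, and the real double-precision case only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper (not a rocSOLVER function).
   Fills v (length m) with Householder vector v_i, read from a host copy
   of the factored matrix A (column-major, leading dimension lda).
   i is 1-based: 1 <= i <= n if m >= n, and 1 <= i <= m-1 if m < n. */
void gebrd_unpack_v(int m, int n, const double *A, int lda, int i, double *v)
{
    assert(A != NULL && v != NULL);
    assert(m >= 0 && n >= 0 && lda >= m);
    assert(i >= 1 && i <= n);

    /* Row (1-based) of the implicit unit entry:
       v_i[i] = 1 if m >= n, and v_i[i+1] = 1 if m < n. */
    int unit = (m >= n) ? i : i + 1;
    assert(unit <= m);

    for (int r = 1; r <= m; ++r) {
        if (r < unit)
            v[r - 1] = 0.0;                               /* leading zeros       */
        else if (r == unit)
            v[r - 1] = 1.0;                               /* implicit unit entry */
        else
            v[r - 1] = A[(size_t)(i - 1) * lda + (r - 1)]; /* stored in column i */
    }
}
```

The vectors u_i can be recovered in the same way by reading along row i instead of column i, with the unit entry at position i+1 if m >= n and at position i if m < n.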

### 3.3.3.5. rocsolver_<type>gebrd_batched()

rocblas_status rocsolver_zgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_double_complex *const A[], const rocblas_int lda, double *D, const rocblas_stride strideD, double *E, const rocblas_stride strideE, rocblas_double_complex *tauq, const rocblas_stride strideQ, rocblas_double_complex *taup, const rocblas_stride strideP, const rocblas_int batch_count)#
rocblas_status rocsolver_cgebrd_batched(rocblas_handle handle, const rocblas_int m, const rocblas_int n, rocblas_float_complex *const A[], const rocblas_int lda, float *D, const