TensorFlow compatibility#
2025-01-09
14 min read time
TensorFlow is an open-source library for solving machine learning, deep learning, and AI problems. It can solve many problems across different sectors and industries but primarily focuses on neural network training and inference. It is one of the most popular and in-demand frameworks and is very active in open-source contribution and development.
The official TensorFlow repository includes full ROCm support. AMD maintains a TensorFlow ROCm repository in order to quickly add bug fixes, updates, and support for the latest ROCM versions.
ROCm TensorFlow release:
Offers Docker images with ROCm and TensorFlow pre-installed.
ROCm TensorFlow repository: ROCm/tensorflow-upstream
See the ROCm TensorFlow installation guide to get started.
Official TensorFlow release:
Official TensorFlow repository: tensorflow/tensorflow
See the TensorFlow API versions list.
Note
The official TensorFlow documentation does not cover ROCm support. Use the ROCm documentation for installation instructions for Tensorflow on ROCm. See TensorFlow on ROCm.
Docker image compatibility#
AMD validates and publishes ready-made TensorFlow images with ROCm backends on Docker Hub. The following Docker image tags and associated inventories are validated for ROCm 6.3.1. Click the icon to view the image on Docker Hub.
Docker image |
TensorFlow |
Dev |
Python |
TensorBoard |
---|---|---|---|---|
rocm/tensorflow | dev |
|||
rocm/tensorflow | dev |
|||
rocm/tensorflow | dev |
|||
rocm/tensorflow | dev |
|||
rocm/tensorflow | dev |
Critical ROCm libraries for TensorFlow#
TensorFlow depends on multiple components and the supported features of those components can affect the TensorFlow ROCm supported feature set. The versions in the following table refer to the first TensorFlow version where the ROCm library was introduced as a dependency.
ROCm library |
Version |
Purpose |
Used in |
---|---|---|---|
2.3.0 |
Provides GPU-accelerated Basic Linear Algebra Subprograms (BLAS) for matrix and vector operations. |
Accelerates operations like |
|
0.10.0 |
Extends hipBLAS with additional optimizations like fused kernels and integer tensor cores. |
Optimizes matrix multiplications and linear algebra operations used in layers like dense, convolutional, and RNNs in TensorFlow. |
|
3.3.0 |
Provides a C++ template library for parallel algorithms for reduction, scan, sort and select. |
Supports operations like |
|
1.0.17 |
Accelerates Fast Fourier Transforms (FFT) for signal processing tasks. |
Used for operations like signal processing, image filtering, and certain types of neural networks requiring FFT-based transformations. |
|
2.3.0 |
Provides GPU-accelerated direct linear solvers for dense and sparse systems. |
Optimizes linear algebra functions such as solving systems of linear equations, often used in optimization and training tasks. |
|
3.1.2 |
Optimizes sparse matrix operations for efficient computations on sparse data. |
Accelerates sparse matrix operations in models with sparse weight matrices or activations, commonly used in neural networks. |
|
3.3.0 |
Provides optimized deep learning primitives such as convolutions, pooling, normalization, and activation functions. |
Speeds up convolutional neural networks (CNNs) and other layers. Used
in TensorFlow for layers like |
|
2.21.5 |
Optimizes for multi-GPU communication for operations like AllReduce and Broadcast. |
Distributed data parallel training ( |
|
3.3.0 |
Provides a C++ template library for parallel algorithms like sorting, reduction, and scanning. |
Reduction operations like |
Supported and unsupported features#
The following section maps supported data types and GPU-accelerated TensorFlow features to their minimum supported ROCm and TensorFlow versions.
Data types#
The data type of a tensor is specified using the dtype
attribute or
argument, and TensorFlow supports a wide range of data types for different use
cases.
The basic, single data types of tf.dtypes are as follows:
Data type |
Description |
Since TensorFlow |
Since ROCm |
---|---|---|---|
|
16-bit bfloat (brain floating point). |
1.0.0 |
1.7 |
|
Boolean. |
1.0.0 |
1.7 |
|
128-bit complex. |
1.0.0 |
1.7 |
|
64-bit complex. |
1.0.0 |
1.7 |
|
64-bit (double precision) floating-point. |
1.0.0 |
1.7 |
|
16-bit (half precision) floating-point. |
1.0.0 |
1.7 |
|
32-bit (single precision) floating-point. |
1.0.0 |
1.7 |
|
64-bit (double precision) floating-point. |
1.0.0 |
1.7 |
|
16-bit (half precision) floating-point. |
2.0.0 |
2.0 |
|
Signed 16-bit integer. |
1.0.0 |
1.7 |
|
Signed 32-bit integer. |
1.0.0 |
1.7 |
|
Signed 64-bit integer. |
1.0.0 |
1.7 |
|
Signed 8-bit integer. |
1.0.0 |
1.7 |
|
Signed quantized 16-bit integer. |
1.0.0 |
1.7 |
|
Signed quantized 32-bit integer. |
1.0.0 |
1.7 |
|
Signed quantized 8-bit integer. |
1.0.0 |
1.7 |
|
Unsigned quantized 16-bit integer. |
1.0.0 |
1.7 |
|
Unsigned quantized 8-bit integer. |
1.0.0 |
1.7 |
|
Handle to a mutable, dynamically allocated resource. |
1.0.0 |
1.7 |
|
Variable-length string, represented as byte array. |
1.0.0 |
1.7 |
|
Unsigned 16-bit (word) integer. |
1.0.0 |
1.7 |
|
Unsigned 32-bit (dword) integer. |
1.5.0 |
1.7 |
|
Unsigned 64-bit (qword) integer. |
1.5.0 |
1.7 |
|
Unsigned 8-bit (byte) integer. |
1.0.0 |
1.7 |
|
Data of arbitrary type (known at runtime). |
1.4.0 |
1.7 |
Features#
This table provides an overview of key features in TensorFlow and their availability in ROCm.
Module |
Description |
Since TensorFlow |
Since ROCm |
---|---|---|---|
|
Operations for matrix and tensor computations, such as
|
1.4 |
1.8.2 |
|
GPU-accelerated building blocks for deep learning models, such as 2D
convolutions with |
1.0 |
1.8.2 |
|
GPU-accelerated functions for image preprocessing and augmentations,
such as resize images with |
1.1 |
1.8.2 |
|
GPU acceleration for Keras layers and models, including dense layers
( |
1.4 |
1.8.2 |
|
GPU-accelerated mathematical operations, such as sum across dimensions
with |
1.5 |
1.8.2 |
|
Functions for spectral analysis and signal transformations. |
1.13 |
2.1 |
|
GPU-accelerated data preprocessing for efficient input pipelines,
Prefetching with |
1.4 |
1.8.2 |
|
Enabling to scale computations across multiple devices on a single machine or across multiple machines. |
1.13 |
2.1 |
|
GPU-accelerated random number generation |
1.12 |
1.9.2 |
|
Enables dynamic tensor manipulation on GPUs. |
1.0 |
1.8.2 |
|
GPU-accelerated sparse matrix manipulations. |
1.9 |
1.9.0 |
|
GPU-accelerated NumPy-like API for numerical computations. |
2.4 |
4.1.1 |
|
Handling of variable-length sequences and ragged tensors with GPU support. |
1.13 |
2.1 |
|
Enable GPU-accelerated functions in optimization. |
1.14 |
2.4 |
|
Quantized operations for inference, accelerated on GPUs. |
1.12 |
1.9.2 |
Distributed library features#
Enables developers to scale computations across multiple devices on a single machine or across multiple machines.
Feature |
Description |
Since TensorFlow |
Since ROCm |
---|---|---|---|
|
Synchronous training across multiple workers using mirrored variables. |
2.0 |
3.0 |
|
Synchronous training across multiple GPUs on one machine. |
1.5 |
2.5 |
|
Efficiently trains models on Google TPUs. |
1.9 |
❌ |
|
Asynchronous training using parameter servers for variable management. |
2.1 |
4.0 |
|
Keeps variables on a single device and performs computation on multiple devices. |
2.3 |
4.1 |
|
Synchronous training across multiple devices and hosts. |
1.14 |
3.5 |
Distribution Strategies API |
High-level API to simplify distributed training configuration and execution. |
1.10 |
3.0 |
Unsupported TensorFlow features#
The following are GPU-accelerated TensorFlow features not currently supported by ROCm.
Feature |
Description |
Since TensorFlow |
---|---|---|
Mixed Precision with TF32 |
Mixed precision with TF32 is used for matrix multiplications, convolutions, and other linear algebra operations, particularly in deep learning workloads like CNNs and transformers. |
2.4 |
|
Efficiently trains models on Google TPUs. |
1.9 |
Use cases and recommendations#
The Training a Neural Collaborative Filtering (NCF) Recommender on an AMD GPU blog post discusses training an NCF recommender system using TensorFlow. It explains how NCF improves traditional collaborative filtering methods by leveraging neural networks to model non-linear user-item interactions. The post outlines the implementation using the recommenders library, focusing on the use of implicit data (for example, user interactions like viewing or purchasing) and how it addresses challenges like the lack of negative values.
The Creating a PyTorch/TensorFlow code environment on AMD GPUs blog post provides instructions for creating a machine learning environment for PyTorch and TensorFlow on AMD GPUs using ROCm. It covers steps like installing the libraries, cloning code repositories, installing dependencies, and troubleshooting potential issues with CUDA-based code. Additionally, it explains how to HIPify code (port CUDA code to HIP) and manage Docker images for a better experience on AMD GPUs. This guide aims to help data scientists and ML practitioners adapt their code for AMD GPUs.
For more use cases and recommendations, see the ROCm Tensorflow blog posts.