# Using NHWC Batch Normalization with PyTorch
This topic explains how to use NHWC Batch Normalization for MIOpen operations in PyTorch. NHWC is a deep-learning memory format that has certain performance advantages over traditional memory formats.
For information about installing and using PyTorch with ROCm, see PyTorch on ROCm. For a list of the ROCm components and features that PyTorch supports, see PyTorch compatibility. For more background on using PyTorch and ROCm for AI tasks, see Training a model with PyTorch for ROCm.
## NHWC versus NCHW
NHWC (also known as "channels last") and NCHW are two types of memory formats for deep learning. They describe how multidimensional (nD) arrays are mapped to a linear (one-dimensional) memory address space.

- **NCHW** (number of samples, channels, height, width): This is the default data layout, in which channels are stored separately from one another. The height and width information is stored after the channels.
- **NHWC** (number of samples, height, width, channels): In this alternative format, channels are stored next to each other, after the height and width information.

The performance of NHWC is better than that of NCHW and is close to that observed when using a blocked memory format. NHWC is also easier to work with for common operations.
For more information about these memory formats, see the PyTorch memory format documentation and the Intel Extension for PyTorch GitHub.
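The layout difference is easy to see from tensor strides. Here is a minimal CPU-only sketch using PyTorch's `channels_last` memory format (the tensor shape is an arbitrary example):

```python
import torch

# NCHW (default, contiguous) tensor: 2 samples, 3 channels, 4x4 spatial.
x = torch.randn(2, 3, 4, 4)
print(x.stride())  # (48, 16, 4, 1): each channel is a separate 4x4 plane

# NHWC ("channels last"): same shape and data, different strides.
y = x.to(memory_format=torch.channels_last)
print(y.stride())  # (48, 1, 12, 3): the 3 channel values of a pixel are adjacent
print(y.is_contiguous(memory_format=torch.channels_last))  # True
```

Note that `y.shape` is still reported as `(2, 3, 4, 4)`; only the physical ordering in memory changes.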
## Batch Normalization
Batch Normalization (also known as Batchnorm or BatchNorm) enables higher learning rates and reduces initialization overhead by normalizing layer inputs. Ordinarily, the distribution of the inputs to each layer changes as the parameters to the previous layer change. This makes it more difficult to train deep learning models and leads to lower learning rates. With Batch Normalization, normalization is part of the architecture and is performed for each training batch.
For more information on Batch Normalization, see Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.
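The normalization described above can be sketched in a few lines of PyTorch (the layer size and input statistics are arbitrary examples): in training mode, `nn.BatchNorm2d` normalizes each channel of the batch to roughly zero mean and unit variance before applying its learned scale and shift.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(num_features=8)          # one scale/shift pair per channel
inp = torch.randn(4, 8, 16, 16) * 5.0 + 3.0  # inputs far from zero mean/unit var

out = bn(inp)  # training mode: normalize with the batch statistics

# With the default initialization (gamma=1, beta=0), each output channel
# has approximately zero mean and unit variance over the batch.
print(out.mean(dim=(0, 2, 3)))                 # all close to 0
print(out.var(dim=(0, 2, 3), unbiased=False))  # all close to 1
```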
## Enabling or disabling NHWC Batch Normalization for MIOpen using PyTorch
The PyTorch open-source tensor library provides support for using NHWC Batch Normalization with MIOpen. In addition to Batch Normalization, NHWC support is also available for convolution and other MIOpen features.
NHWC Batch Normalization support in MIOpen can be used in a PyTorch environment using ROCm 7.0 or later. This configuration supports 2D and 3D NHWC Batch Normalization. 1D Batch Normalization is not applicable to the NHWC format.
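The 1D restriction follows from how PyTorch defines the layout: `channels_last` applies to 4D tensors (and `channels_last_3d` to 5D tensors), so the 3D `(N, C, L)` inputs of 1D Batch Normalization have no NHWC variant. A small sketch (shapes are arbitrary examples):

```python
import torch

# channels_last is defined for 4D (NHWC) and 5D (NDHWC) tensors only.
x4d = torch.randn(2, 3, 4, 4).to(memory_format=torch.channels_last)
x5d = torch.randn(2, 3, 4, 4, 4).to(memory_format=torch.channels_last_3d)
print(x4d.is_contiguous(memory_format=torch.channels_last))     # True
print(x5d.is_contiguous(memory_format=torch.channels_last_3d))  # True

# A 3D BatchNorm1d-style input (N, C, L) cannot be converted.
try:
    torch.randn(2, 3, 4).to(memory_format=torch.channels_last)
except RuntimeError:
    print("channels_last requires a rank-4 tensor")
```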
### PyTorch branch support
The ROCm/pytorch PyTorch images support NHWC Batch Normalization. ROCm 7.0 or later is required.
The following PyTorch branches support the NHWC Batch Normalization feature:

- In the `release/2.7` PyTorch branch, NHWC Batch Normalization support in MIOpen is enabled by default. To use the native Batch Normalization implementation with this image instead, set this environment variable:

  ```shell
  PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=0
  ```

- In the `release/2.6` PyTorch branch, NHWC Batch Normalization support in MIOpen is disabled by default. To enable NHWC Batch Normalization for this image, set this environment variable:

  ```shell
  PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=1
  ```
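The variable can be set per invocation or exported for the whole session. A small shell sketch (the inline `python -c` command simply echoes the variable back to confirm the child process sees it):

```shell
# Set the variable for a single run; here a one-liner prints the value
# that the child Python process actually receives.
PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=1 \
  python -c 'import os; print(os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM"])'

# Or export it so that every subsequent command in the session sees it.
export PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=1
```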
For information about installing and using PyTorch on ROCm, see PyTorch on ROCm.
## Supported configurations
The following table shows the Batch Normalization support for NHWC and NCHW with various data types and modes. It also indicates which backend is used with the `PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM` environment variable enabled and disabled.
**Note:** Mixed mode means that the Batch Normalization module has a different data type than the inputs, for example, an input or gradient data type of FP16 or BF16 and a Batch Normalization type of FP32. If the Batch Normalization module has the same data type as the inputs, for instance, an input or gradient data type of FP32 and a Batch Normalization module that is also FP32, the mode is "not mixed".
| Input data type | Memory format | Mode | Mixed/not mixed | Backend with NHWC Batch Normalization enabled | Backend with NHWC Batch Normalization disabled |
|---|---|---|---|---|---|
| FP32 | NCHW | 1D/2D/3D | not mixed | MIOpen | MIOpen |
| FP32 | NHWC | 2D/3D | not mixed | native | MIOpen |
| FP16 | NCHW | 1D/2D/3D | mixed | MIOpen | MIOpen |
| FP16 | NCHW | 1D/2D/3D | not mixed | native | native |
| FP16 | NHWC | 2D/3D | mixed | native | MIOpen |
| FP16 | NHWC | 2D/3D | not mixed | native | native |
| BF16 | NCHW | 1D/2D/3D | mixed | MIOpen (*) | MIOpen (*) |
| BF16 | NCHW | 1D/2D/3D | not mixed | native | native |
| BF16 | NHWC | 2D/3D | mixed | native | MIOpen |
| BF16 | NHWC | 2D/3D | not mixed | native | native |

(*) MIOpen is used with ROCm 6.4 and later; otherwise, the native backend is used.
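To make the table's "mixed" column concrete, here is a small CPU-side sketch (with arbitrary example shapes) of a mixed data type pairing: FP16 inputs against an FP32 `nn.BatchNorm2d` module.

```python
import torch
import torch.nn as nn

# "Mixed" rows in the table: the input (FP16 or BF16) differs in data
# type from the Batch Normalization module (FP32).
bn = nn.BatchNorm2d(8)  # weight, bias, and running stats default to FP32
fp16_inp = torch.randn(4, 8, 16, 16, dtype=torch.float16)

print(bn.weight.dtype)  # torch.float32
print(fp16_inp.dtype)   # torch.float16 -> mixed with the FP32 module

# "Not mixed" rows: input and module share the FP32 data type.
fp32_inp = fp16_inp.float()
print(bn.weight.dtype == fp32_inp.dtype)  # True
```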
## Disabling MIOpen for Batch Normalization in PyTorch
In some situations, you might not want to use MIOpen as the backend for Batch Normalization operations. To disable the use of MIOpen with Batch Normalization, wrap the relevant calls in `torch.backends.cudnn.flags(enabled=False)` (on ROCm, PyTorch applies the cuDNN flags to MIOpen), as in this example:

```python
import torch
import torch.nn as nn

size = (4, 8, 16, 16)  # example (N, C, H, W) input size
inp = torch.randn(size, device="cuda", requires_grad=True)
grad = torch.randn(size, device="cuda", requires_grad=False)
mod = nn.BatchNorm2d(inp.size(1), device="cuda")

# This context disables MIOpen for the two calls below;
# the native batchnorm implementation is used instead.
with torch.backends.cudnn.flags(enabled=False):
    out = mod(inp)
    out.backward(grad)
```
## Verifying NHWC Batch Normalization use with MIOpen
For some operations, it can be difficult to determine the backend and memory format used. To verify whether MIOpen is being used and whether the memory format is NHWC or NCHW, run your program with the `MIOPEN_ENABLE_LOGGING_CMD=1` environment variable set. Here is an example command:

```shell
MIOPEN_ENABLE_LOGGING_CMD=1 python test_nn.py -v -k test_batchnorm_cudnn_nhwc
```
The output might look like this:

```
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 4 -c 8 -H 2 -W 2 -m 1 --forw 1 -b 0 -r 1 -s 1 --layout NHWC
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 4 -c 8 -H 2 -W 2 -m 1 --forw 0 -b 1 -s 1 --layout NHWC
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 4 -c 8 -H 2 -W 2 -m 1 --forw 1 -b 0 -r 1 -s 1 --layout NCHW
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 4 -c 8 -H 2 -W 2 -m 1 --forw 0 -b 1 -s 1 --layout NCHW
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 2 -c 8 -H 8 -W 1 -m 1 --forw 1 -b 0 -r 1 -s 1 --layout NHWC
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 2 -c 8 -H 8 -W 1 -m 1 --forw 0 -b 1 -s 1 --layout NHWC
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 2 -c 8 -H 8 -W 1 -m 1 --forw 1 -b 0 -r 1 -s 1 --layout NCHW
MIOpen(HIP): Command [LogCmdBNorm] ./bin/MIOpenDriver bnorm -n 2 -c 8 -H 8 -W 1 -m 1 --forw 0 -b 1 -s 1 --layout NCHW
```
Each line corresponds to a different command or operation. The `./bin/MIOpenDriver` string indicates that MIOpen was used for the operation. The `--layout` parameter shows whether NHWC or NCHW was used; for example, `--layout NHWC` means the NHWC memory format was used.
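As a complement to the MIOpen log, the memory format can also be checked from Python: when the NHWC path is taken end to end, a channels-last input produces a channels-last output. A minimal sketch (arbitrary example shapes; on CPU this exercises the native backend only):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
nhwc_inp = torch.randn(4, 8, 2, 2).to(memory_format=torch.channels_last)
out = bn(nhwc_inp)

# The NHWC layout is propagated through the Batch Normalization module.
print(out.is_contiguous(memory_format=torch.channels_last))  # True
```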
## Running Batch Normalization tests
Several test suites are available for Batch Normalization. To test Batch Normalization training using both NHWC and NCHW in 2D, run the following command:

```shell
python test_nn.py -v -k test_batchnorm_2D_train
```

To test Batch Normalization training using both NHWC and NCHW in 3D, run the following command:

```shell
python test_nn.py -v -k test_batchnorm_3D_train
```

To test Batch Normalization inference for 2D using both memory formats, use this command:

```shell
python test_nn.py -v -k test_batchnorm_2D_inference
```

To test the same functionality for 3D, use this command:

```shell
python test_nn.py -v -k test_batchnorm_3D_inference
```
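As a quick standalone sanity check in the spirit of these suites, the two memory formats should produce numerically equivalent Batch Normalization results. A CPU sketch with arbitrary example shapes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(8)
nchw = torch.randn(4, 8, 16, 16)
nhwc = nchw.to(memory_format=torch.channels_last)  # same values, NHWC layout

out_nchw = bn(nchw)
out_nhwc = bn(nhwc)

# Training-mode outputs depend only on the batch statistics, so the two
# layouts should agree to within floating-point tolerance.
print(torch.allclose(out_nchw, out_nhwc, atol=1e-6))  # True
```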