ROCGDB provides support for systems that have heterogeneous agents associated with commercially available AMD GPU devices (see Debugging Heterogeneous Programs) when the AMD ROCm platform is installed.
The following AMD GPU architectures are supported:
Displayed as ‘vega10’ by ROCGDB and denoted as ‘gfx900’ by the compiler.
Displayed as ‘vega20’ by ROCGDB and denoted as ‘gfx906’ by the compiler.
Displayed as ‘arcturus’ by ROCGDB and denoted as ‘gfx908’ by the compiler.
Displayed as ‘aldebaran’ by ROCGDB and denoted as ‘gfx90a’ by the compiler.
Displayed as ‘navi10’ by ROCGDB and denoted as ‘gfx1010’ by the compiler.
Displayed as ‘navi12’ by ROCGDB and denoted as ‘gfx1011’ by the compiler.
Displayed as ‘navi14’ by ROCGDB and denoted as ‘gfx1012’ by the compiler.
Displayed as ‘sienna_cichlid’ by ROCGDB and denoted as ‘gfx1030’ by the compiler.
Displayed as ‘navy_flounder’ by ROCGDB and denoted as ‘gfx1031’ by the compiler.
ROCGDB supports the following source languages:
The HIP Programming Language is supported.
When compiling, the -g option should be used to produce debugging information suitable for use by ROCGDB. The --offload-arch option is used to specify the AMD GPU chips that the executable is required to support. For example, to compile a HIP program that can utilize “Vega 10” and “Vega 7nm” AMD GPU devices, with no optimization:
hipcc -O0 -g --offload-arch=gfx900 --offload-arch=gfx906 bit_extract.cpp -o bit_extract
The AMD ROCm compiler maps HIP source language device function work-items to the lanes of an AMD GPU wavefront, which are represented in ROCGDB as heterogeneous lanes.
Assembly code kernels are supported.
Other languages, including OpenCL and Fortran, are currently supported as the minimal pseudo-language, provided they are compiled specifying at least the AMD GPU Code Object V3 and DWARF 4 formats. See Unsupported Languages.
ROCGDB requires a compatible AMD GPU device driver to be installed. A warning message is displayed if either the device driver version or the version of the debug support it implements is unsupported. For example,
amd-dbgapi: warning: AMD GPU driver's version 1.6 not supported (version 2.x where x >= 1 required)
amd-dbgapi: warning: AMD GPU driver's debug support version 9.0 not supported (version 10.x where x >= 1 required)
ROCGDB will continue to function except no AMD GPU debugging will be possible.
ROCGDB requires each agent to have compatible firmware installed by the device driver. A warning message is displayed if unsupported firmware is detected. For example,
amd-dbgapi: warning: AMD GPU gpu_id 17619's firmware version 458 not supported (version >= 555 required)
ROCGDB will continue to function except no AMD GPU debugging will be possible on the agent.
ROCGDB requires a compatible AMD ROCm runtime to be loaded in order to detect AMD GPU code objects and wavefronts. A warning message is displayed if an unsupported AMD ROCm runtime is detected, or there is an error or restriction that prevents debugging. For example,
amd_dbgapi: warning: AMD GPU runtime's r_debug::r_version 5 not supported (r_debug::r_version >= 6 required)
ROCGDB will continue to function except no AMD GPU debugging will be possible.
AMD GPU heterogeneous agents are not listed by the ‘info agents’ command until the inferior has started executing the program.
Debugging an agent is unsupported if it has an architecture that is not supported by ROCGDB or the AMD GPU device driver, or the firmware version is not supported by ROCGDB.
The AMD GPU heterogeneous queue types reported by the ‘info queues’ command are:
An HSA AQL queue. The ‘(Single)’ suffix indicates it uses the single-producer protocol, ‘(Multi)’ suffix indicates the multi-producer protocol, and ‘(Coop)’ suffix indicates the multi-producer cooperative dispatch protocol.
An AMD PM4 queue.
A DMA queue.
An XGMI queue.
AMD GPU supports the following address spaces for the ‘info dispatches’ command:
Per work-group storage.
Per work-item storage.
The ‘info dispatches’ command uses the following BNF syntax for AMD GPU heterogeneous dispatch fences:
fence     ::== [ barrier ] [ separator ] [ acquire ] [ separator ] [ release ]
separator ::== "|"
barrier   ::== "B"
acquire   ::== "A" scope
release   ::== "R" scope
scope     ::== system | agent
system    ::== "s"
agent     ::== "a"
Where:
separator
The elements are separated by ‘|’.
barrier
If present, indicates that the next heterogeneous packet will not be initiated until the heterogeneous dispatch completes.
acquire
Indicates an acquire memory fence was performed before initiating the heterogeneous dispatch.
release
Indicates a release memory fence will be performed when the heterogeneous dispatch completes.
system
Indicates the memory fence is performed at the system memory scope.
agent
Indicates the memory fence is performed at the agent memory scope.
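As an illustration of the fence grammar above, the following Python sketch decodes a fence string into its components. The helper name parse_fence is hypothetical and not part of ROCGDB; this is only a minimal reading of the BNF.

```python
import re

# Hypothetical helper illustrating the dispatch-fence grammar; not a ROCGDB API.
# fence ::== [ barrier ] [ "|" ] [ acquire ] [ "|" ] [ release ]
FENCE_RE = re.compile(r"^(?P<barrier>B)?\|?(?P<acquire>A[sa])?\|?(?P<release>R[sa])?$")

SCOPES = {"s": "system", "a": "agent"}

def parse_fence(text):
    m = FENCE_RE.match(text)
    if not m:
        raise ValueError(f"not a valid fence string: {text!r}")
    result = {"barrier": m.group("barrier") is not None}
    for part in ("acquire", "release"):
        token = m.group(part)  # e.g. "As" or "Ra", or None if absent
        result[part] = SCOPES[token[1]] if token else None
    return result

print(parse_fence("B|As|Ra"))
# {'barrier': True, 'acquire': 'system', 'release': 'agent'}
```

A fence such as ‘B|As|Ra’ therefore reads: barrier present, acquire fence at system scope, release fence at agent scope.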
An AMD GPU wavefront is represented in ROCGDB as a thread.
An AMD GPU wavefront can enter the halt state by executing an S_SETHALT 1 instruction.
When a wavefront is in the halt state, it executes no further instructions. In addition, a wavefront that is associated with a queue that is in the queue error state (see AMD GPU Signals) is inhibited from executing further instructions. Continuing such wavefronts will not hit any breakpoints nor report completion of a single step command. If necessary, ‘Ctrl-C’ can be used to cancel the command.
Note that some AMD GPU architectures may have restrictions on providing information about AMD GPU wavefronts created when ROCGDB is not attached (see AMD GPU Attaching Restrictions).
When scheduler-locking is in effect (see set scheduler-locking), new wavefronts created by the resumed thread (either CPU thread or GPU wavefront) are held in the halt state.
AMD GPU supports the following reggroup values for the ‘info registers reggroup …’ command:
The number of scalar and vector registers is configured when a wavefront is created. Only allocated registers are displayed.
Scalar registers are reported as 32-bit signed integer values.
Vector registers are reported as a wavefront size vector of signed 32-bit values.
The pc register is reported as a function pointer value.
The exec register is reported as a wavefront size-bit unsigned integer value.
The vcc and xnack_mask pseudo registers are reported as wavefront size-bit unsigned integer values.
The flat_scratch pseudo register is reported as a 64-bit unsigned integer value.
The mode, status, and trapsts registers are reported as flag values. For example,
(gdb) p $mode
$1 = [ FP_ROUND.32=NEAREST_EVEN FP_ROUND.64_16=NEAREST_EVEN FP_DENORM.32=FLUSH_NONE FP_DENORM.64_16=FLUSH_NONE DX10_CLAMP IEEE CSP=0 ]
(gdb) p $status
$2 = [ SPI_PRIO=0 USER_PRIO=0 TRAP_EN VCCZ VALID ]
(gdb) p $trapsts
$3 = [ EXCP_CYCLE=0 DP_RATE=FULL ]
Use the ‘ptype’ command to see the type of any register.
The ‘info sharedlibrary’ command will show the AMD GPU code objects together with the CPU code objects. For example:
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007fd120664ac0  0x00007fd120682790  Yes (*)     /lib64/ld-linux-x86-64.so.2
...
0x00007fd0125d8ec0  0x00007fd015f21630  Yes (*)     /opt/rocm-3.5.0/hip/lib/../../lib/libamd_comgr.so
0x00007fd11d74e870  0x00007fd11d75a868  Yes (*)     /lib/x86_64-linux-gnu/libtinfo.so.5
0x00007fd11d001000  0x00007fd11d00173c  Yes         file:///home/rocm/examples/bit_extract#offset=6477&size=10832
0x00007fd11d008000  0x00007fd11d00adc0  Yes (*)     memory://95557/mem#offset=0x7fd0083e7f60&size=41416
(*): Shared library is missing debugging information.
(gdb)
The code object path for AMD GPU code objects is shown as a URI (Uniform Resource Identifier) with a syntax defined by the following BNF:
code_object_uri ::== file_uri | memory_uri
file_uri        ::== "file://" file_path [ range_specifier ]
memory_uri      ::== "memory://" process_id range_specifier
range_specifier ::== [ "#" | "?" ] "offset=" number "&" "size=" number
file_path       ::== URI_ENCODED_OS_FILE_PATH
process_id      ::== DECIMAL_NUMBER
number          ::== HEX_NUMBER | DECIMAL_NUMBER | OCTAL_NUMBER
Where:
number
A C integral literal where hexadecimal values are prefixed by ‘0x’ or ‘0X’, and octal values by ‘0’.
file_path
The file’s path specified as a URI encoded UTF-8 string. In URI encoding, every character that is not in the regular expression ‘[a-zA-Z0-9/_.~-]’ is encoded as two uppercase hexadecimal digits preceded by ‘%’. Directories in the path are separated by ‘/’.
offset
A 0-based byte offset to the start of the code object. For a file URI, it is from the start of the file specified by the file_path, and if omitted defaults to 0. For a memory URI, it is the memory address and is required.
size
The number of bytes in the code object. For a file URI, if omitted it defaults to the size of the file. It is required for a memory URI.
process_id
The identity of the process owning the memory. For Linux it is the C unsigned integral decimal literal for the process pid.
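The URI grammar above can be sketched in Python. The helper name parse_code_object_uri is illustrative and not a ROCGDB API; note that in the ‘info sharedlibrary’ example the memory URI carries an extra ‘/mem’ component after the process id, which this sketch simply strips.

```python
from urllib.parse import unquote

# Hypothetical parser for the code-object URI grammar; not part of ROCGDB.
def parse_code_object_uri(uri):
    if uri.startswith("file://"):
        kind, rest = "file", uri[len("file://"):]
    elif uri.startswith("memory://"):
        kind, rest = "memory", uri[len("memory://"):]
    else:
        raise ValueError(f"unsupported scheme: {uri!r}")
    # The range specifier may be introduced by '#' or '?'.
    for sep in "#?":
        if sep in rest:
            head, _, query = rest.partition(sep)
            break
    else:
        head, query = rest, ""
    fields = {"kind": kind}
    if kind == "file":
        fields["path"] = unquote(head)        # undo %XX percent-encoding
    else:
        fields["pid"] = int(head.split("/", 1)[0])
    for item in filter(None, query.split("&")):
        key, _, value = item.partition("=")
        fields[key] = int(value, 0)           # base 0: accepts 0x-hex or decimal
    return fields

print(parse_code_object_uri(
    "file:///home/rocm/examples/bit_extract#offset=6477&size=10832"))
# {'kind': 'file', 'path': '/home/rocm/examples/bit_extract', 'offset': 6477, 'size': 10832}
```

The same function handles the in-memory form, e.g. ‘memory://95557/mem#offset=0x7fd0083e7f60&size=41416’, where the offset is a required memory address.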
AMD GPU code objects are loaded into each AMD GPU device separately. The ‘info sharedlibrary’ command will therefore show the same code object loaded multiple times. As a consequence, setting a breakpoint in AMD GPU code will result in multiple breakpoints if there are multiple AMD GPU devices.
If the source language runtime defers loading code objects until kernels are launched, then setting breakpoints may result in pending breakpoints that will be set when the code object is finally loaded.
The AMD GPU heterogeneous entities have the following target identifier formats:
The AMD GPU agent target identifier agent_systag string has the following format:
AMDGPU Agent (GPUID target-agent-id)
It is used in the ‘Target ID’ column of the ‘info agents’ command and is available using the $_agent_systag convenience variable.
The AMD GPU queue target identifier queue_systag string has the following format:
AMDGPU Queue agent-id:queue-id (QID target-queue-id)
It is used in the ‘Target ID’ column of the ‘info queues’ command and is available using the $_queue_systag convenience variable.
The AMD GPU dispatch target identifier dispatch_systag string has the following format:
AMDGPU Dispatch agent-id:queue-id:dispatch-id (PKID target-packet-id)
It is used in the ‘Target ID’ column of the ‘info dispatches’ command and is available using the $_dispatch_systag convenience variable. The target-packet-id corresponds to the dispatch packet that initiated the dispatch.
The AMD GPU thread target identifier (systag) string has the following format:
AMDGPU Wave agent-id:queue-id:dispatch-id:wave-id (work-group-x,work-group-y,work-group-z)/work-group-thread-index
It is used in the ‘Target ID’ column of the ‘info threads’ command and is available using the $_thread_systag convenience variable.
The AMD GPU lane target identifier (lane_systag) string has the following format:
AMDGPU Lane agent-id:queue-id:dispatch-id:wave-id/lane-index (work-group-x,work-group-y,work-group-z)[work-item-x,work-item-y,work-item-z]
It is used in the ‘Target ID’ column of the ‘info lanes’ command and is available using the $_lane_systag convenience variable.
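The wave target identifier format above can be split into its fields with a short Python sketch. The helper name parse_wave_systag is illustrative, not a ROCGDB API, and it assumes all identifiers are known numbers; as noted under the attaching restrictions later in this section, wavefronts created before ROCGDB attached may show ‘?’ for some fields, which this sketch does not handle.

```python
import re

# Hypothetical parser for the wave systag format; not part of ROCGDB.
# Format: AMDGPU Wave agent:queue:dispatch:wave (x,y,z)/work-group-thread-index
WAVE_RE = re.compile(
    r"AMDGPU Wave (?P<agent>\d+):(?P<queue>\d+):(?P<dispatch>\d+):(?P<wave>\d+) "
    r"\((?P<x>\d+),(?P<y>\d+),(?P<z>\d+)\)/(?P<index>\d+)"
)

def parse_wave_systag(systag):
    m = WAVE_RE.fullmatch(systag)
    if not m:
        raise ValueError(f"not a wave systag: {systag!r}")
    return {key: int(value) for key, value in m.groupdict().items()}

print(parse_wave_systag("AMDGPU Wave 1:1:1:1 (0,0,0)/0"))
# {'agent': 1, 'queue': 1, 'dispatch': 1, 'wave': 1, 'x': 0, 'y': 0, 'z': 0, 'index': 0}
```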
The AMD GPU heterogeneous entities have the following convenience variables:
$_dispatch_pos
The string returned by the $_dispatch_pos debugger convenience variable has the following format:
(work-group-x,work-group-y,work-group-z)/work-group-thread-index
$_thread_workgroup_pos
The string returned by the $_thread_workgroup_pos debugger convenience variable has the following format:
work-group-thread-index
$_lane_workgroup_pos
The string returned by the $_lane_workgroup_pos debugger convenience variable has the following format:
[work-item-x,work-item-y,work-item-z]
Where:
target-agent-id, target-queue-id, target-dispatch-id, target-wave-id
The AMD GPU target agent identifier, queue identifier, dispatch identifier, and wave identifier respectively. The identifiers are global across all inferiors.
agent-id, queue-id
The AMD GPU target driver agent identifier and queue identifier respectively. The identifiers are per process.
target-packet-id
The AMD GPU target driver packet identifier. The identifier is per queue.
work-group-x, work-group-y, work-group-z
The grid position of the thread’s work-group within the heterogeneous dispatch.
work-group-thread-index
The thread’s number within the heterogeneous work-group.
lane-index
The heterogeneous lane index within the thread.
work-item-x, work-item-y, work-item-z
The position of the heterogeneous lane’s work-item within the heterogeneous work-group.
AMD GPU heterogeneous agents support the following address spaces:
global
The default global virtual address space.
region
The per heterogeneous agent shared address space (GDS (Global Data Store)) (see AMD GPU region Address Space Restrictions).
group
The per heterogeneous work-group shared address space (LDS (Local Data Store)).
private
The per heterogeneous lane private address space (Scratch).
generic
The generic address space that can access the global, group, or private address spaces (Flat).
The AMD GPU architecture default address space is global.
The maint print address-spaces command can be used to display the AMD GPU architecture address spaces.
AMD GPU wavefronts can raise the following signals when executing instructions:
SIGILL
Execution of an illegal instruction.
SIGTRAP
Execution of an S_TRAP instruction other than:
S_TRAP 1, which is used by ROCGDB to insert breakpoints.
S_TRAP 2, which raises SIGABRT.
Note that S_TRAP 3 only raises a signal when ROCGDB is attached to the inferior. Otherwise, it is treated as a no-operation. The compiler generates S_TRAP 3 for the llvm.debugtrap intrinsic.
SIGABRT
Execution of an S_TRAP 2 instruction. The compiler generates S_TRAP 2 for the llvm.trap intrinsic, which is used for assertions.
SIGFPE
Execution of a floating point or integer instruction detects a condition that is enabled to raise a signal. The conditions include:
By default, these conditions are not enabled to raise signals. The ‘set $mode’ command can be used to change the AMD GPU wavefront’s register that has bits controlling which conditions are enabled to raise signals. The ‘print $trapsts’ command can be used to inspect which conditions have been detected even if they are not enabled to raise a signal.
SIGBUS
Execution of an instruction that accessed global memory using an address that is outside the virtual address range.
SIGSEGV
Execution of an instruction that accessed a global memory page that is either not mapped or accessed with incompatible permissions.
If a single instruction raises more than one signal, they will be reported one at a time each time the wavefront is continued.
If any of these signals are delivered to the wavefront, it will cause the wavefront to enter the halt state and cause the AMD ROCm runtime to put the associated queue into the queue error state. All wavefronts associated with a queue that is in the queue error state are inhibited from executing further instructions, even if they are not in the halt state. In addition, when the AMD ROCm runtime puts a queue into the queue error state, it may invoke an application registered callback that could either abort the application or delete the queue, which will delete any wavefronts associated with the queue.
The ROCGDB signal-related commands (see Signals) can be used to control when a signal is delivered to the inferior, what signal is delivered to the inferior, and even if a signal should not be delivered to the inferior.
If the ‘signal’ or ‘queue-signal’ commands are used to deliver a signal other than those listed above to an AMD GPU wavefront, then the following error will be displayed when the wavefront is resumed:
Resuming with signal signal is not supported by this agent.
The wavefront will not be resumed and no signal will be delivered. Use the ‘signal’ or ‘queue-signal’ commands to change the signal to deliver, or use ‘signal 0’ or ‘queue-signal 0’ to suppress delivering a signal.
Note that some AMD GPU architectures may have restrictions on suppressing the delivery of signals to a wavefront (see AMD GPU Signal Restrictions).
A wavefront can report memory violation and address watch access events. However, the program location at which they are reported may be after the machine instruction that caused them. This can result in the reported source statement being incorrect. The following commands can be used to control this behavior:
set amdgpu precise-memory mode
‘set amdgpu precise-memory’ controls how AMD GPU devices detect memory violations and address watch events. Where mode can be:
off
The program location may not be immediately after the instruction that caused the memory violation or address watch event. This is the default.
on
Requests that the program location will be immediately after the instruction that caused a memory violation or address watch event. Enabling this mode may make the AMD GPU device execution significantly slower as it has to wait for each memory operation to complete before executing the next instruction.
For example:
(gdb) set amdgpu precise-memory off
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is off
(gdb)
If a memory violation or address watch access event is reported for an AMD GPU thread that supports controlling precise memory detection when the mode is ‘off’, then the message includes an indication that the position may not be accurate. For example:
(gdb) run
Warning: precise memory violation signal reporting is not enabled, reported location may not be accurate. See "show amdgpu precise-memory".

Thread 6 "bit_extract" received signal SIGSEGV, Segmentation fault.
0x00007ffee6a0a028 in bit_extract_kernel (C_d=<optimized out>, A_d=<optimized out>, N=<optimized out>) at bit_extract.cpp:38
38        size_t offset = (hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x);
(gdb)
The precise memory mode cannot be enabled until the inferior is started or attached. If at that time all AMD GPU devices accessible to the inferior support the ‘on’ mode, then it is enabled. For example:
(gdb) set amdgpu precise-memory on
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is on (currently disabled)
(gdb) run
...
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is on (currently enabled)
(gdb)
Alternatively, if at that time any of the AMD GPU devices accessible to the inferior do not support the ‘on’ mode, then a warning is reported and the mode is not enabled. For example:
(gdb) set amdgpu precise-memory on
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is on (currently disabled)
(gdb) run
AMDGPU precise memory access reporting could not be enabled
(gdb)
If the inferior is already executing when setting the ‘on’ mode, then a warning will be reported immediately. For example:
(gdb) set amdgpu precise-memory on
AMDGPU precise memory access reporting could not be enabled
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is on (currently disabled)
(gdb)
Otherwise, setting the ‘on’ mode will enable it immediately. For example:
(gdb) set amdgpu precise-memory on
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is on (currently enabled)
(gdb)
The set amdgpu precise-memory parameter is per-inferior. When an inferior forks and a child inferior is created as a result, the child inferior inherits the parameter value of the parent inferior.
show amdgpu precise-memory
‘show amdgpu precise-memory’ displays the currently requested AMD GPU precise memory setting. If ‘on’ has been requested, the message also indicates if it is currently enabled. For example:
(gdb) show amdgpu precise-memory
AMDGPU precise memory access reporting is on (currently disabled)
(gdb)
The ‘set debug amdgpu log-level level’ command can be used to enable diagnostic messages for the AMD GPU target. The ‘show debug amdgpu log-level’ command displays the current AMD GPU target log level. See set debug amdgpu.
For example, the following will enable information messages and send the log to a new file:
(gdb) set debug amdgpu log-level info
(gdb) set logging overwrite
(gdb) set logging file log.out
(gdb) set logging debugredirect on
(gdb) set logging on
If you want to print the log to both the console and a file, omit the ‘set logging debugredirect on’ command. See Logging Output.
The ‘show amdgpu version’ command can be used to print the ROCdbgapi library version and file path. For example:
(gdb) show amdgpu version
ROCdbgapi 0.56.0 (0.56.0-rocm-rel-4.5-56)
Loaded from `/opt/rocm-4.5.0/lib/librocm-dbgapi.so.0'
ROCGDB AMD GPU support is currently a prototype and has the following restrictions. Future releases aim to address these restrictions.
The command line interface supports the ‘info agents’, ‘info queues’, ‘info dispatches’, ‘queue find’, and ‘dispatch find’ commands. However, these commands have no Python bindings.
The debugger convenience variable $_wave_id is available, which returns a string that has the format:
(work-group-x,work-group-y,work-group-z)/work-group-thread-index
Where:
work-group-x, work-group-y, work-group-z
The grid position of the thread’s work-group within the heterogeneous dispatch.
work-group-thread-index
The thread’s number within the heterogeneous work-group.
Error while mapping shared library sections: `file:///rocm/bit_extract#offset=6751&size=3136': ELF file ABI version (0) is not supported.
Using tbreak to set a temporary breakpoint means only one thread reports the breakpoint, and the other threads hitting the breakpoint will be continued. A similar effect can be achieved by deleting the breakpoint manually when it is hit.
amd-dbgapi: warning: At least one agent is busy (debugging may be enabled by another process)
ROCGDB will continue to function except no AMD GPU debugging will be possible.
The Linux cgroups facility can be used to limit which AMD GPU devices are used by a process. In order for a ROCGDB process to access the AMD GPU devices of the process it is debugging, the AMD GPU devices must be included in the ROCGDB process cgroup.
Therefore, multiple ROCGDB processes can each debug a process provided the cgroups specify disjoint sets of AMD GPU devices. However, a single ROCGDB process cannot debug multiple inferiors that use AMD GPU devices even if those inferiors have cgroups that specify disjoint AMD GPU devices. This is because the ROCGDB process must have all the AMD GPU devices in its cgroups and so will attempt to enable debugging for all AMD GPU devices for all inferiors it is debugging.
It is suggested to use Docker rather than cgroups directly to limit the AMD GPU devices visible inside a container:
The render-minor-number for a device can be obtained by looking at the ‘drm_render_minor’ field value from:
cat /sys/class/kfd/kfd/topology/nodes/<node-number>/properties
All processes running in the container will see the same subset of devices. By having two containers with non-overlapping sets of AMD GPUs, it is possible to use ROCGDB in both containers at the same time since each AMD GPU device will only have one ROCGDB process accessing it.
For example:
docker run -it --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  --device=/dev/kfd --device=/dev/dri/card0 --device=/dev/dri/renderD128 \
  --group-add render ubuntu:20.04 /bin/bash
If source line positions are used that only correspond to source lines in unloaded code objects, then ROCGDB may not set pending breakpoints, and instead set breakpoints in unpredictable places of the loaded code objects if they contain code from the same file. This can result in unexpected breakpoint hits being reported. When the code object containing the source lines is loaded, the incorrect breakpoints will be removed and replaced by the correct ones. This problem can be avoided by only setting breakpoints in unloaded code objects using symbol or function names.
The HIP_ENABLE_DEFERRED_LOADING environment variable can be used to disable deferred code object loading by the HIP runtime. This ensures all code objects will be loaded when the inferior reaches the beginning of the main function.
For example,
export HIP_ENABLE_DEFERRED_LOADING=0
Note: If deferred code object loading is disabled and the application performs a fork, then the program may crash.
Note: Disabling deferred code object loading can result in errors being reported when executing ROCGDB due to open file limitations when the application contains a large number of embedded device code objects. With deferred code object loading enabled, only the device code objects actually invoked are loaded, and so ROCGDB opens fewer files.
The AMD GPU supported architectures provide a maximum of 4 hardware write watchpoints. Precise read watchpoints or access watchpoints are not supported.
The x86 architecture provides 4 hardware watchpoints that can each monitor up to 8 bytes.
When ROCGDB is used with x86 and AMD GPU devices, hardware watchpoints are therefore limited to at most 4 write watchpoints that have a collective size of up to 32 bytes. The collective size is calculated by adding the size of each watchpoint rounded up to a multiple of 8 bytes. Software emulation will be used for watchpoints that exceed the hardware limitations.
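The collective-size rule above (each watchpoint's size rounded up to a multiple of 8 bytes before summing) can be sketched as follows. The helper names are illustrative, not part of ROCGDB; the limits of 4 watchpoints and 32 bytes are those stated in the text.

```python
# Sketch of the hardware watchpoint limits described above; illustrative only.
def collective_size(watchpoint_sizes):
    """Sum of each watchpoint size rounded up to a multiple of 8 bytes."""
    return sum((size + 7) // 8 * 8 for size in watchpoint_sizes)

def fits_in_hardware(watchpoint_sizes, max_watchpoints=4, max_bytes=32):
    """True if the request fits the 4-watchpoint, 32-byte collective limit."""
    return (len(watchpoint_sizes) <= max_watchpoints
            and collective_size(watchpoint_sizes) <= max_bytes)

print(collective_size([4, 8, 1]))          # 8 + 8 + 8 = 24
print(fits_in_hardware([4, 8, 1]))         # True
print(fits_in_hardware([8, 8, 8, 8, 8]))   # False: five watchpoints exceed the limit
```

For watchpoints exceeding these limits, software emulation is used instead, as noted above.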
Currently, watchpoints are only created on the CPU, and not the AMD GPU, until the AMD ROCm runtime is initialized. With deferred code object loading disabled, this does not happen until the inferior reaches the beginning of the main function. With deferred code object loading enabled, this does not happen until the first kernel is executed. This also means that, when the inferior is re-run, watchpoints are only re-activated on the CPU, not on the AMD GPU.
The HSA_ENABLE_SDMA environment variable can be set to ‘0’ to disable the AMD ROCm runtime from using DMA for transfers between the CPU and AMD GPU.
ROCGDB may become unresponsive if scheduler-locking is enabled (see set scheduler-locking) after the whole program has stopped, and an AMD GPU thread is then resumed. For example:
Thread 6 hit Breakpoint 1, with lanes [0-63], kernel () at test.cpp:38
38        size_t l = 0;
(gdb) info threads
  Id   Target Id                              Frame
  1    Thread 0x7ffff6493880 (LWP 2222574)    0x00007ffff6cb989b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
  2    Thread 0x7ffff6492700 (LWP 2222582)    0x00007ffff6ccb50b in ioctl () at ../sysdeps/unix/syscall-template.S:78
  4    Thread 0x7ffff5aff700 (LWP 2222584)    0x00007ffff6ccb50b in ioctl () at ../sysdeps/unix/syscall-template.S:78
  5    Thread 0x7ffff515d700 (LWP 2222585)    0x00007ffff6764d81 in rocr::core::InterruptSignal::WaitRelaxed() from /opt/rocm/lib/libhsa-runtime64.so.1
* 6    AMDGPU Wave 1:1:1:1 (0,0,0)/0          kernel () at test.cpp:38
(gdb) del 1
(gdb) set scheduler-locking on
(gdb) c
Continuing.
^C
Above, ROCGDB does not respond to ‘Ctrl-C’. The only way to unblock the situation is to kill the ROCGDB process.
The HSA_LOADER_ENABLE_MMAP_URI environment variable can be used to request that the AMD ROCm runtime attempt to determine the file containing the code object memory so that ‘file://’ URIs can be reported.
For example,
export HSA_LOADER_ENABLE_MMAP_URI=1
gdbserver is not supported.
Suppressing the delivery of a SIGSEGV signal may prevent the wavefront from being put in the halt state, but the AMD ROCm runtime may still put the associated queue into the queue error state. Suppressing the delivery of some signals, such as SIGSEGV, for a wavefront may also suppress the same signal raised by other AMD GPU hardware, such as from DMA or from the packet processor, preventing the AMD ROCm runtime from being notified. See AMD GPU Signals.
For example,
(gdb) info threads
  Id   Target Id                                        Frame
* 1    Thread 0x7ffff6987840 (LWP 62056) "bit_extract"  0x00007ffff6da489b in sched_yield () at ../sysdeps/unix/syscall-template.S:78
  2    Thread 0x7ffff6986700 (LWP 62064) "bit_extract"  0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78
  3    Thread 0x7ffff5f7f700 (LWP 62066) "bit_extract"  0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78
  4    Thread 0x7ffff597f700 (LWP 62067) "bit_extract"  0x00007ffff6db650b in ioctl () at ../sysdeps/unix/syscall-template.S:78
  5    AMDGPU Wave 1:2:?:1 (?,?,?)/? "bit_extract"      bit_extract_kernel (C_d=<optimized out>, A_d=<optimized out>, N=<optimized out>) at bit_extract.cpp:41
This does not affect wavefronts created while ROCGDB is attached which are always capable of reporting this information.
If the HSA_ENABLE_DEBUG environment variable is set to ‘1’ when the AMD ROCm runtime is initialized, then this information will be available for all architectures, even for wavefronts created when ROCGDB was not attached. Setting this environment variable may very marginally increase wavefront launch latency for some architectures for very short lived wavefronts.
Couldn't get registers: No such process.
ROCGDB does not support accessing the region address space. A memory access error is reported.
When a wavefront executes with the DX10_CLAMP bit set in the MODE register, enabled arithmetic exceptions will not be reported as SIGFPE signals. This happens if the DX10_CLAMP kernel descriptor field is enabled. See AMD GPU Signals.