

20 Debugging Heterogeneous Programs

Note: The commands presented in this chapter are not currently fully implemented. See AMD GPU for the current support available.

In some operating systems, such as Linux with the AMD ROCm platform installed, a single program may have multiple threads in the same process, executing on different devices which may have different target architectures. Such a system is termed a heterogeneous system and a program that uses the multiple devices is termed a heterogeneous program.

The multiple devices of a heterogeneous system are termed heterogeneous agents. They can include the following kinds of devices: CPU (Central Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), FPGA (Field Programmable Gate Array), as well as other specialized hardware.

The device of a heterogeneous system that starts the execution of the program is termed the heterogeneous host agent.

The precise way threads are created on different heterogeneous agents may vary from one heterogeneous system to another, but in general the threads behave similarly no matter what heterogeneous agent is executing them, except that the target architecture may be different.

A heterogeneous program can create heterogeneous queues associated with a heterogeneous agent. The heterogeneous program can then place heterogeneous packets on a heterogeneous queue to control the actions of the associated heterogeneous agent. A heterogeneous agent removes heterogeneous packets from the heterogeneous queues associated with it and performs the requested actions. The packet actions and scheduling of packet processing vary depending on the heterogeneous system and the target architecture of the heterogeneous agent. See Architectures.

A heterogeneous dispatch packet is used to initiate code execution on a heterogeneous agent. A single heterogeneous dispatch packet may specify that the heterogeneous agent create a set of threads that are all associated with a corresponding heterogeneous dispatch. Each thread typically has an associated position within the heterogeneous dispatch, possibly expressed as a multi-dimensional grid position. The heterogeneous agent typically can create multiple threads that execute concurrently. If a heterogeneous dispatch is larger than the number of concurrent threads that can be created, the heterogeneous agent creates threads of the heterogeneous dispatch as other threads complete. When all the threads of a heterogeneous dispatch have been created and have completed, the heterogeneous dispatch is considered complete.

The threads of a heterogeneous dispatch may be grouped into heterogeneous work-groups. The threads that belong to the same heterogeneous work-group may have special shared memory, and efficient execution synchronization abilities. A thread that is part of a heterogeneous work-group typically has an associated position within the heterogeneous work-group, possibly also expressed as a multi-dimensional grid position.

Other heterogeneous packets may control heterogeneous packet scheduling, memory visibility between the threads of a heterogeneous dispatch and other threads, or other services supported by the heterogeneous system.

On some heterogeneous systems there can be heterogeneous agents that support SIMD (Single Instruction Multiple Data) or SIMT (Single Instruction Multiple Threads) machine instructions. On these target architectures, a single machine instruction can operate in parallel on multiple heterogeneous lanes.

When a heterogeneous lane is not associated with any work-group, it is said to be unused. For example, lanes are left over when the work-group size is not a multiple of the number of lanes in a SIMD/SIMT thread, or when the grid size is not a multiple of the work-group size (resulting in partial work-groups on the dimension edges of the grid). ROCGDB hides unused lanes by default.
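As a rough illustration of the first case, the number of unused lanes in the last thread of a work-group can be estimated as follows. This is only a sketch: the function name is hypothetical and the default of 64 lanes per thread is an assumption made for illustration, not a value reported by ROCGDB.

```python
def unused_lanes(work_group_size: int, lanes_per_thread: int = 64) -> int:
    """Lanes left over in the last SIMD/SIMT thread of a work-group.

    lanes_per_thread defaults to 64 purely as an illustrative assumption.
    """
    remainder = work_group_size % lanes_per_thread
    # A work-group that fills its last thread exactly leaves no unused lanes.
    return 0 if remainder == 0 else lanes_per_thread - remainder
```

Under these assumptions, a work-group of 100 work-items occupies two 64-lane threads and leaves 28 lanes of the second thread unused.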

Source languages used by heterogeneous programs can be implemented on target architectures that support multiple heterogeneous lanes by mapping a source language thread of execution onto a heterogeneous lane of a single target architecture thread. Control flow in the source language may be implemented by controlling which heterogeneous lanes are active. If the source language control flow may result in some heterogeneous lanes becoming inactive while some remain active, the control flow is said to be divergent. Typically, the machine code may execute different divergent paths for different sets of heterogeneous lanes, before the control flow reconverges and all heterogeneous lanes become active.
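The following sketch models divergent control flow with an explicit active-lane mask. It is a simplified software model, not a description of any particular hardware or compiler; the function name and the list-of-booleans mask are assumptions chosen for illustration.

```python
def run_if_else(values, cond, then_op, else_op):
    """Model an if/else executed across lanes using active-lane masks."""
    results = list(values)
    then_mask = [bool(cond(v)) for v in values]  # lanes active on the 'then' path
    else_mask = [not m for m in then_mask]       # lanes active on the 'else' path
    # Control flow is divergent when both paths have at least one active lane.
    divergent = any(then_mask) and any(else_mask)

    for mask, op in ((then_mask, then_op), (else_mask, else_op)):
        if not any(mask):
            continue  # no lane takes this path, so it is skipped entirely
        for lane, active in enumerate(mask):
            if active:  # inactive lanes are unaffected by this path
                results[lane] = op(values[lane])
    # Control flow reconverges here: all lanes are active again.
    return results, divergent
```

For example, `run_if_else([1, 2, 3, 4], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v)` executes both paths in turn, each with a different set of active lanes.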

Just because a target architecture supports multiple lanes does not mean that the source language is mapped to use them to implement source language threads of execution. Therefore, a thread is only considered to have multiple heterogeneous lanes if its current frame corresponds to a source language that does such a mapping.

On some heterogeneous systems there can be heterogeneous agents with target architectures that support multiple address spaces. In these target architectures, there may be memory that is physically disjoint from regular global virtual memory. There can also be cases when the same underlying memory can be accessed using linear addresses that map to the underlying physical memory in an interleaved manner. In these target architectures there can be distinct machine instructions to access the distinct address spaces. For example, there may be physical hardware scratch pad memory that is allocated and accessible only to the threads that are associated with the same heterogeneous work-group. There may be hardware address swizzle logic that allows regular global virtual memory to be allocated per heterogeneous lane such that they have a linear address view, which in fact maps to an interleaved global virtual memory access to improve cache performance.

ROCGDB provides these facilities for debugging heterogeneous programs:

A heterogeneous system may use separate code objects for the different target architectures of the heterogeneous agents. The info sharedlibrary command lists all the code objects currently loaded, regardless of their target architecture.

The following rules apply in determining the target architecture used by commands when debugging heterogeneous programs:

  1. Typically the target architecture of the heterogeneous host agent is the target architecture of the program’s code object. The set architecture command (see Specifying a Debugging Target) can be used to change this target architecture. The target architecture of other heterogeneous agents is typically the target architecture of the associated device.
  2. The target architecture of a thread is the target architecture of the selected stack frame. Typically, stack frames have the same target architecture as the heterogeneous agent on which the thread was created; however, a target may associate different target architectures with different stack frames.
  3. The current target architecture is the target architecture of the selected thread, or the target architecture of the heterogeneous host agent if there are no threads.
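Rules 2 and 3 above can be sketched as a small model. The `Thread` class, its fields, and `current_arch` are hypothetical names introduced for illustration; they do not correspond to ROCGDB internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thread:
    agent_arch: str  # architecture of the agent that created the thread
    # Optional per-frame architectures; empty means "same as the agent".
    frame_archs: List[str] = field(default_factory=list)
    selected_frame: int = 0

    def arch(self) -> str:
        # Rule 2: a thread's architecture is that of its selected frame.
        if self.frame_archs:
            return self.frame_archs[self.selected_frame]
        return self.agent_arch


def current_arch(selected_thread: Optional[Thread], host_arch: str) -> str:
    # Rule 3: fall back to the host agent when there are no threads.
    return selected_thread.arch() if selected_thread else host_arch
```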

ROCGDB handles the heterogeneous agent, queue, and dispatch entities in a similar manner to threads (see Debugging Programs with Multiple Threads):

Each lane of a thread is assigned a single number, also known as lane ID. This is a single integer, which is an index number starting at 0, going up to the number of hardware-supported lanes per thread. This ID is unique per thread, not global, and thus multiple threads can have lanes with the same ID.

Some commands accept a space-separated lane ID list as argument. A list element can be:

  1. A lane ID as shown in the first field of the ‘info lanes’ display. E.g., ‘1’.
  2. A range of lane IDs, as in lane1-lane2. E.g., ‘0-20’.
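The list syntax can be modeled by a small parser. This is a sketch, not ROCGDB's implementation: the function name is hypothetical, ranges are assumed to be inclusive at both ends, and no validation of malformed or reversed ranges is attempted because the text does not say how those are handled.

```python
def parse_lane_id_list(text: str) -> list[int]:
    """Expand a space-separated lane ID list such as '1 4-6' into lane IDs."""
    lanes = []
    for element in text.split():
        if "-" in element:
            # A range 'lane1-lane2', assumed inclusive at both ends.
            first, last = element.split("-", 1)
            lanes.extend(range(int(first), int(last) + 1))
        else:
            lanes.append(int(element))
    return lanes
```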

The following debugger convenience variables (see Convenience Variables) are related to heterogeneous debugging. You may find these useful in writing breakpoint conditional expressions, command scripts, and so forth.

$_thread
$_gthread
$_thread_systag
$_thread_name

See Convenience Variables.

$_agent
$_queue
$_dispatch

There are debugger convenience variables that contain the number of each heterogeneous entity associated with the current thread if it was created by a heterogeneous dispatch, or 0 otherwise. $_agent, $_queue, and $_dispatch contain the corresponding per-inferior heterogeneous entity number.

$_lane

The heterogeneous lane number of the current lane of the current thread. If the current thread does not support SIMD/SIMT lanes, it is treated as if it has a single lane.

$_dispatch_pos

The heterogeneous dispatch position string of the current thread within its associated heterogeneous dispatch if it was created by a heterogeneous dispatch, or the empty string otherwise. The format varies depending on the heterogeneous system and target architecture of the heterogeneous agent. See Architectures.

$_lane_name

The heterogeneous lane name string of the current heterogeneous lane, or the empty string if no name has been assigned by the lane name command.

$_thread_workgroup_pos
$_lane_workgroup_pos

The heterogeneous work-group position string of the current thread or heterogeneous lane within its associated heterogeneous dispatch if it was created by a heterogeneous dispatch, or the empty string otherwise. The format varies depending on the heterogeneous system and target architecture of the heterogeneous agent. See Architectures.

$_lane_systag

The target system’s heterogeneous lane identifier (lane_systag) string of the current heterogeneous lane. See target system lane identifier.

$_agent_systag

The target system’s heterogeneous agent identifier (agent_systag) string of the heterogeneous agent of the current thread.

$_queue_systag

The target system’s heterogeneous queue identifier (queue_systag) string of the heterogeneous queue of the current thread.

$_dispatch_systag

The target system’s heterogeneous dispatch identifier (dispatch_systag) string of the heterogeneous dispatch of the current thread.

The following debugger convenience functions (see Convenience Functions) are related to heterogeneous debugging. Given the very large number of threads on heterogeneous systems, these may be very useful. They allow threads or thread lists to be specified based on the target system’s thread identifier (systag) or thread name, and allow heterogeneous lanes or heterogeneous lane lists to be specified based on the target system’s heterogeneous lane identifier (lane_systag) or heterogeneous lane name.

$_thread_find
$_thread_find_first_gid

See Convenience Functions.

$_lane_find(regex)

Searches for heterogeneous lanes whose name or lane_systag matches the supplied regular expression. The syntax of the regular expression is that specified by Python’s regular expression support.

Returns a string that is the space-separated list of per-inferior heterogeneous lane numbers of the found heterogeneous lanes. If debugging multiple inferiors, the heterogeneous lane numbers are qualified with the inferior number. If no heterogeneous lanes are found, the empty string is returned. The string can be used in commands that accept a heterogeneous lane ID list. See heterogeneous entity ID list.

For example, the following command lists all heterogeneous lanes that are part of a heterogeneous work-group with work-group position ‘(1,2,3)’ (see Debugging Heterogeneous Programs):

(gdb) info lanes $_lane_find ("(1,2,3)")

$_lane_find_first_gid(regex)

Similar to the $_lane_find convenience function, except it returns a number that is the global heterogeneous lane number of one of the heterogeneous lanes found, or 0 if no heterogeneous lanes were found. The number can be used in commands that accept a global heterogeneous lane number. See global heterogeneous entity numbers.

For example, the following command sets the current heterogeneous lane to one of the heterogeneous lanes that are part of a heterogeneous work-group with work-item position ‘[1,2,3]’:

(gdb) lane -gid $_lane_find_first_gid ("[1,2,3]")

The following commands are related to heterogeneous debugging:

info agents [agent-id-list]

The info agents command lists the following information for each heterogeneous agent (in this order):

  1. The heterogeneous agent ID. See qualified heterogeneous entity numbers.
  2. The target system’s heterogeneous agent state. Agents that support debugging are marked with ‘A’ for active. ‘U’ marks unsupported agents which cannot be debugged and so do not report any queues, dispatches, or wavefronts.
  3. The target system’s heterogeneous agent identifier.
  4. The target system’s architecture name.
  5. The target system’s device name.
  6. The number of compute unit cores.
  7. The total number of threads. The number of threads that a core can execute concurrently is dependent on the target architecture of the device.
  8. The target-specific string identifying the location of the agent. For example, some target agents may use the PCI slot number in BDF (Bus:Device.Function) notation.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous agent number indicates the heterogeneous agent executing the current thread.

Some heterogeneous agents may not be listed until the inferior has started execution of the program.

With no arguments, the command displays information about all heterogeneous agents. You can specify the list of heterogeneous agents that you want to display using the heterogeneous entity ID list syntax (see heterogeneous entity ID list).

For example,

(gdb) info agents
  Id State Target Id                  Architecture Device Name Cores Threads Location
* 1  A     AMDGPU Agent (GPUID 45151) gfx906       vega20      240   2400    0a:00.0
  2  A     AMDGPU Agent (GPUID 39113) gfx906       vega20      240   2400    44:00.0

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous agent IDs using the qualified inferior-num.agent-num format. Otherwise, only agent-num is shown.

info queues [queue-id-list]

The info queues command lists the following information for each heterogeneous queue (in this order):

  1. The heterogeneous queue ID. See qualified heterogeneous entity numbers.
  2. The target system’s heterogeneous queue identifier.
  3. The type of the heterogeneous queue. The meaning of the queue types is target architecture and operating system dependent.
  4. The target system’s heterogeneous packet identifier for the next packet that will be read from the heterogeneous queue by the device.
  5. The target system’s heterogeneous packet identifier for the next packet that will be written to the heterogeneous queue for submission to the device.
  6. The size in bytes of the heterogeneous queue packet buffer.
  7. The global memory address of the heterogeneous queue packet buffer.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous queue number indicates the heterogeneous queue executing the current thread.

With no arguments, the command displays information about all heterogeneous queues. You can specify the list of heterogeneous queues that you want to display using the heterogeneous entity ID list syntax (see heterogeneous entity ID list).

For example,

(gdb) info queues
  Id   Target Id                Type         Read   Write  Size     Address
* 1    AMDGPU Queue 1:1 (QID 1) HSA (Multi)  0      2      65536    0x00007ffff7f60000
  2    AMDGPU Queue 1:2 (QID 2) DMA                        1048576  0x00007ffde4e00000
  3    AMDGPU Queue 1:3 (QID 0) HSA (Multi)  4      4      262144   0x00007ffff7f00000

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous queue IDs using the qualified inferior-num.queue-num format. Otherwise, only queue-num is shown.

info dispatches [-full] [dispatch-id-list]

The info dispatches command lists the following information for each heterogeneous dispatch (in this order):

  1. The heterogeneous dispatch ID. See qualified heterogeneous entity numbers.
  2. The target system’s identifier of the dispatch packet that initiated the dispatch.
  3. The x, y, and z heterogeneous grid dimensions, in that order, in terms of work-items of the heterogeneous dispatch. The number of dimensions displayed matches the dimensionality of the heterogeneous dispatch.
  4. The x, y, and z heterogeneous work-group dimensions, in that order, in terms of work-items of the heterogeneous dispatch. The number of dimensions displayed matches the dimensionality of the heterogeneous dispatch.
  5. The fences associated with the heterogeneous dispatch packet. The kinds of fences are target architecture dependent.
  6. The size in bytes of address spaces associated with the heterogeneous dispatch. The address spaces are target architecture dependent. Only displayed if the -full option is specified.
  7. The global memory address of the kernel descriptor for the heterogeneous dispatch. The descriptor is target architecture dependent. Only displayed if the -full option is specified.
  8. The global memory address of the kernel arguments for the heterogeneous dispatch. Only displayed if the -full option is specified.
  9. The global memory address of the completion event for the heterogeneous dispatch. Omitted if the heterogeneous dispatch has no completion event. The event is target architecture and operating system dependent. Only displayed if the -full option is specified.
  10. The kernel function prototype.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous dispatch number indicates the heterogeneous dispatch executing the current thread.

With no arguments, the command displays information about all heterogeneous dispatches. You can specify the list of heterogeneous dispatches that you want to display using the heterogeneous entity ID list syntax (see heterogeneous entity ID list).

For example,

(gdb) info dispatches -full
  Id   Target Id                      Grid      Workgroup Fence   Address Spaces          Kernel Descriptor  Kernel Args        Completion Signal  Kernel Function
* 1    AMDGPU Dispatch 1:1:1 (PKID 0) [256,1,1] [128,1,1] B|As    Shared(0), Private(220) 0x00007ffde5409800 0x00007ffff7e00000 (nil)              bit_extract_kernel(unsigned int*, unsigned int const*, unsigned long)

If you’re debugging multiple inferiors, ROCGDB displays heterogeneous dispatch IDs using the qualified inferior-num.dispatch-num format. Otherwise, only dispatch-num is shown.
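The Grid and Workgroup columns can be related to the number of work-groups and threads with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the default of 64 lanes per thread is an assumption taken from the 64-lane waves shown in the info threads example in this chapter.

```python
from math import ceil, prod


def dispatch_shape(grid, work_group, lanes_per_thread=64):
    """Return (work-group count, threads per full work-group) for a dispatch.

    grid and work_group are sizes in work-items, one entry per dimension.
    """
    # Partial work-groups on the grid edges still count as work-groups.
    groups = prod(ceil(g / w) for g, w in zip(grid, work_group))
    # A work-group whose size is not a multiple of the lane count
    # still needs a whole thread for the left over work-items.
    threads_per_group = ceil(prod(work_group) / lanes_per_thread)
    return groups, threads_per_group
```

With the values from the example above, `dispatch_shape([256, 1, 1], [128, 1, 1])` gives two work-groups of two threads each, which is consistent with the four AMDGPU Wave entries listed by info threads for the same program.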

agent find regexp
queue find regexp
dispatch find regexp

These commands operate the same way as the ‘thread find’ command (see thread find) except that they use the target system’s heterogeneous agent, queue, and dispatch identifiers respectively.

info packets [queue-id-list]

Display information about the heterogeneous packets on one or more heterogeneous queues. With no arguments, the command displays the packets of all heterogeneous queues. You can specify the list of heterogeneous queues that you want to display using the heterogeneous queue ID list syntax (see heterogeneous entity ID list).

Since heterogeneous agents may be processing heterogeneous packets asynchronously, the display is at best a snapshot, and may be inconsistent due to the heterogeneous queues being updated while they are being inspected.

The heterogeneous packets are listed contiguously for each heterogeneous agent, and for each heterogeneous queue of that heterogeneous agent, with the oldest packet first.

ROCGDB displays for each heterogeneous packet (in this order):

  1. The associated heterogeneous agent ID. See qualified heterogeneous entity numbers.
  2. The associated heterogeneous queue ID. See qualified heterogeneous entity numbers.
  3. The packet position in the heterogeneous queue, with the oldest one being 1.
  4. Additional information about the heterogeneous packet that varies depending on the heterogeneous system and may vary depending on the target architecture of the heterogeneous entity (see Architectures).

info threads [-gid] [thread-id-list]

The info threads command (see Debugging Programs with Multiple Threads) lists the threads created on all the heterogeneous agents.

If any of the threads listed have multiple heterogeneous lanes, then an additional Lanes column is displayed before the target system’s thread identifier (systag) column. For threads that have multiple heterogeneous lanes, the number of heterogeneous lanes that are active followed by a slash and the total number of heterogeneous lanes of the current frame of the thread is displayed. Otherwise, nothing is displayed.

The target system’s thread identifier (systag) (see target system thread identifier) for threads associated with heterogeneous dispatches varies depending on the heterogeneous system and target architecture of the heterogeneous agent. However, it typically will include information about the heterogeneous agent, heterogeneous queue, heterogeneous dispatch, heterogeneous work-group position within the heterogeneous dispatch, and thread position within the heterogeneous work-group. See Architectures.

The stack frame summary displayed is for the active lanes of the thread. This may differ from the stack frame information for the current lane if the focus is on an inactive lane. Use the info lanes command for information about individual lanes of a thread. See Debugging Programs with Multiple Threads.

For example,

(gdb) info threads
  Id  Lanes  Target Id                                       Frame
  1          Thread 0x7ffff7fc4cc0 (LWP 74764) "bit_extract" 0x00007ffff6b56f37 in sched_yield () from /lib/x86_64-linux-gnu/libc.so.6
  2          Thread 0x7ffff59cb700 (LWP 74773) "bit_extract" 0x00007ffff6b696d7 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
  4          Thread 0x7ffff7fc1700 (LWP 74775) "bit_extract" 0x00007ffff6b696d7 in ioctl () from /lib/x86_64-linux-gnu/libc.so.6
* 5   62/64  AMDGPU Wave 1:1:1:1 (0,0,0)/0 "bit_extract"     bit_extract_kernel (C_d=0x7ffde8800000, A_d=0x7ffde8e00000, N=4000000) at bit_extract.cpp:38
  6   2/64   AMDGPU Wave 1:1:1:2 (0,0,0)/1 "bit_extract"     __hip_get_block_dim_x () at /opt/rocm-3.8.0-3471/hip/include/hip/hcc_detail/hip_runtime.h:462
  7   64/64  AMDGPU Wave 1:1:1:3 (1,0,0)/0 "bit_extract"     __hip_get_block_dim_x () at /opt/rocm-3.8.0-3471/hip/include/hip/hcc_detail/hip_runtime.h:462
  8   8/64   AMDGPU Wave 1:1:1:4 (1,0,0)/1 "bit_extract"     __hip_get_block_dim_x () at /opt/rocm-3.8.0-3471/hip/include/hip/hcc_detail/hip_runtime.h:462

thread [-gid] thread-id [lane-index]

The thread command has an optional lane-index argument to specify the heterogeneous lane index. If the value is not between 1 and the number of heterogeneous lanes of the current frame of the thread, then ROCGDB will print an error. If omitted it defaults to 1.

The current thread is set to thread-id and the current heterogeneous lane is set to the heterogeneous lane corresponding to the specified heterogeneous lane index.

If the thread has multiple heterogeneous lanes, ROCGDB responds by displaying the system identifier of the heterogeneous lane you selected, otherwise it responds with the system identifier of the thread you selected, followed by its current stack frame summary.

thread apply [thread-id-list | all [-ascending]] [flag]… command
taas [option]… command
tfaas [option]… command
thread name [name]

These commands operate the same way for all threads, regardless of whether or not the thread is associated with a heterogeneous dispatch.

If the thread’s frame has multiple heterogeneous lanes then the heterogeneous lane index 1 is used. Use the heterogeneous lane counterpart commands if you want to perform the command on each lane of a thread.

See Debugging Programs with Multiple Threads.

thread find regexp

In addition to searching thread information, if a thread’s frame has multiple heterogeneous lanes then the command also searches the thread’s lanes for those whose name or systag matches the supplied regular expression.

See Debugging Programs with Multiple Threads.

info lanes [-all | -active | -inactive] [lane-id-list]

Display information about one or more heterogeneous lanes of the current thread. With no arguments, the command displays information about all used lanes of the current thread. You can specify the list of lanes that you want to display using the lane ID list syntax (see lane ID list).

By default, ROCGDB lists all the used lanes of the thread; lanes which are not associated with any work-group (see unused heterogeneous lane) are not displayed. The following options can be used to fine-tune this behavior:

-all

All lanes, including active, inactive and unused lanes.

-active

Only active lanes.

-inactive

Only used inactive lanes.

ROCGDB displays for each heterogeneous lane (in this order):

  1. The heterogeneous lane ID assigned by ROCGDB. See lane ID.
  2. The state of the heterogeneous lane. If a thread is stopped, then its lanes can be either in the ‘A’ (active), or ‘I’ (inactive) states. An unused lane (see unused heterogeneous lane) is always shown in the ‘U’ (unused) state. When a thread is running (as opposed to stopped), whether one of its lanes is active or inactive can’t be determined, as the thread is executing instructions and may be continuously changing its lanes between active and inactive states. Therefore, when a thread is running, all its used lanes are in the ‘R’ (running) state.
  3. The target system’s heterogeneous lane identifier (lane_systag). This varies depending on the system and target architecture of the heterogeneous agent. However, for heterogeneous agents it typically will include information about the heterogeneous agent, heterogeneous queue, heterogeneous dispatch, heterogeneous work-group position within the heterogeneous dispatch, and position of the heterogeneous lane in the heterogeneous work-group. See Architectures.
  4. The heterogeneous lane’s name, if one is assigned by the user (see lane name, below).
  5. The current stack frame summary for that heterogeneous lane. If the heterogeneous lane is inactive, and it can be determined from the debug information, this is the source position at which the heterogeneous lane will resume. If, on the other hand, it cannot be determined, it is “(inactive)”. If the lane is running, it is “(running)”.

An asterisk ‘*’ to the left of the ROCGDB heterogeneous lane number indicates the current heterogeneous lane.

For example,

(gdb) info lanes 1-2
  Id  State   Target Id                                Frame
* 1   A       AMDGPU Lane 1:2:3:463/2 (2,3,4)[1,2,4]   0x34e5 in saxpy ()
  2   I       AMDGPU Lane 1:2:4:456/12 (2,4,4)[1,2,3]  (inactive)

lane lane-id

Make lane ID lane-id the current lane of the current thread. The command argument lane-id is the ROCGDB per-thread lane ID (see lane ID), as shown in the first field of the info lanes display.

ROCGDB responds by displaying the system identifier of the heterogeneous lane you selected, and its current stack frame summary:

(gdb) lane 2
[Switching to thread 5, lane 2 (AMDGPU Lane 1:1:1:1/2 (0,0,0)[2,0,0])]
#0  some_function (ignore=0x0) at example.c:8
8	    printf ("hello\n");

As with the ‘[New …]’ message, the form of the text in parentheses after ‘Switching to’ depends on your system’s conventions for identifying heterogeneous lanes.

If you switch to an inactive lane, ROCGDB displays a warning, so that you are made aware that values of local variables, function arguments, etc. will have meaningless values. For example:

(gdb) lane 1
[Switching to thread 6, lane 1 (AMDGPU Lane 1:1:1:2/1 (0,0,0)[65,0,0])]
warning: Current lane is inactive.
#0  some_function (ignore=0x0) at example.c:8
8	    printf ("hello\n");

See description of how execution commands and heterogeneous debugging interact, for what happens if you run a stepping command with an inactive lane selected.

lane name [name]

This command assigns a name to the current heterogeneous lane. If no argument is given, any existing user-specified name is removed. The heterogeneous lane name appears in the info lanes display.

lane find [regexp]

Search for and display heterogeneous lane ids whose name or lane_systag matches the supplied regular expression. The syntax of the regular expression is that specified by Python’s regular expression support.

As well as being the complement to the lane name command, this command also allows you to identify a heterogeneous lane by its target lane_systag. For instance, on the AMD ROCm platform, the target lane_systag is the heterogeneous agent, heterogeneous queue, heterogeneous dispatch, heterogeneous work-group position and heterogeneous work-item position.

(gdb) lane find "work-group(2,3,4)"
Lane 2 has lane id 'ROCm process 35 agent 1 queue 2 dispatch 3 work-group(2,3,4) work-item(1,2,4)'
(gdb) info lane 2
  Id  Thread  Active  Target Id                                Frame
  2   5/2     Y       AMDGPU Lane 1:2:3:324/2 (2,3,4)[1,2,4]   0x34e5 in saxpy ()

lane apply [lane-id-list | all] [-all | -active | -inactive] [flag]… command
laas [option]… command
lfaas [option]… command

The lane apply, laas, and lfaas commands are similar to their thread counterparts thread apply, taas, and tfaas respectively, except that they operate on heterogeneous lanes. See Debugging Programs with Multiple Threads.

By default, ROCGDB operates on all the used lanes of the thread. The following flags can be used to fine-tune this behavior:

-all

All lanes, including active, inactive and unused lanes.

-active

Only active lanes.

-inactive

Only used inactive lanes.

The flag arguments control what output to produce and how to handle errors raised when applying command to a lane. See the thread apply command (see Debugging Programs with Multiple Threads) for the available flags and their meaning.

backtrace [option]… [qualifier]… [count]
frame [ frame-selection-spec ]
frame apply [all | count | -count | level level…] [option]… command
select-frame [ frame-selection-spec ]
up-silently n
down-silently n
info frame
info args [-q] [-t type_regexp] [regexp]
info locals [-q] [-t type_regexp] [regexp]
faas command

The frame commands apply to the current heterogeneous lane.

If the frame is switched from one that has multiple heterogeneous lanes to one with fewer (including only one) then the current lane is switched to the heterogeneous lane corresponding to the highest heterogeneous lane index of the new frame and ROCGDB responds by displaying the system identifier of the heterogeneous lane selected.

See Examining the Stack.

set libthread-db-search-path
show libthread-db-search-path
set debug libthread-db
show debug libthread-db

These commands only apply to threads created on the heterogeneous host agent that are not associated with a heterogeneous dispatch. There are no commands that support reporting of heterogeneous dispatch thread events.

x/i
display/i

The x/i and display/i commands (see Examining Memory) can be used to disassemble machine instructions. They use the current target architecture.

disassemble

The disassemble command (see Source and Machine Code) can also be used to disassemble machine instructions. If the start address of the range is within a loaded code object, then the target architecture of the code object is used. Otherwise, the current target architecture is used.
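The architecture selection described for disassemble can be sketched as a lookup over the loaded code objects. The function name and the `(low, high, arch)` tuple layout are assumptions made for illustration; ROCGDB's internal representation is not described here.

```python
def disassembly_arch(start, code_objects, current_arch):
    """Pick the architecture used to disassemble a range starting at `start`.

    code_objects is a list of (low, high, arch) tuples describing loaded
    code objects, with `high` exclusive (an assumption of this sketch).
    """
    for low, high, arch in code_objects:
        if low <= start < high:
            return arch  # the start address lies within this code object
    return current_arch  # otherwise use the current target architecture
```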

info registers
info all-registers
maint print reggroups

The register commands display information about the current target architecture.

print

The print command evaluates the source language expression in the context of the current heterogeneous lane.

step
next
finish
until
stepi
nexti

Execution commands such as step, next, finish and until (see Continuing and Stepping) automatically skip over code regions where the current heterogeneous lane becomes inactive. Similarly, if the current heterogeneous lane is inactive when the execution command is entered, ROCGDB advances the thread until the current lane becomes active.

At the hardware level, all lanes of a thread execute instructions in lock-step, though some lanes may be set inactive (as a result of special instructions emitted by the compiler) so that instructions do not have an effect in them. Thus, stepping a lane may cause other active heterogeneous lanes of the same thread to also execute code. This may even result in other heterogeneous lanes completing whole functions.

If the current heterogeneous lane is set to an inactive heterogeneous lane, then the stepi and nexti commands (see Continuing and Stepping) may not cause the source position to appear to move until execution reaches a point that makes the current heterogeneous lane active. However, other heterogeneous lanes of the same thread will advance.
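As a rough illustration of the skipping behavior described above, the following Python sketch models a thread as a sequence of source lines, each paired with the set of lanes active on that line, and finds where a step would next stop for a given lane. The function name next_stop_for_lane and the trace data are hypothetical, invented for this sketch. It is a simplified model of the visible behavior, not a description of how ROCGDB is implemented.

```python
def next_stop_for_lane(trace, start, lane):
    """Return the index of the first entry after `start` where `lane` is active.

    `trace` is a list of (source_line, active_lanes) pairs in execution
    order. This is a hypothetical model, not a ROCGDB data structure.
    Returns None if the lane never becomes active again.
    """
    for i in range(start + 1, len(trace)):
        _line, active = trace[i]
        if lane in active:
            return i
    return None


# Hypothetical divergent branch: lanes 0-1 take the "then" side (lines 11-12),
# lanes 2-3 take the "else" side (lines 14-15), and all reconverge at line 17.
trace = [
    (10, {0, 1, 2, 3}),
    (11, {0, 1}),
    (12, {0, 1}),
    (14, {2, 3}),
    (15, {2, 3}),
    (17, {0, 1, 2, 3}),
]

stop = next_stop_for_lane(trace, 0, lane=2)
print(trace[stop][0])  # prints 14: lines 11-12 are skipped for lane 2
```

In this model, lanes 0 and 1 still execute lines 11 and 12 while lane 2 is being stepped, which mirrors the lock-step behavior: the thread advances as a whole, and only the reported stop position depends on the current lane.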

break [-lane lane-index] [location] [if cond]
tbreak [-lane lane-index] [location] [if cond]
hbreak [-lane lane-index] [location] [if cond]
thbreak [-lane lane-index] [location] [if cond]
rbreak [-lane lane-index] regex
info breakpoints [list]
watch [-lane lane-index] [-l|-location] expr [thread thread-id] [mask maskvalue]
rwatch [-lane lane-index] [-l|-location] expr [thread thread-id] [mask maskvalue]
awatch [-lane lane-index] [-l|-location] expr [thread thread-id] [mask maskvalue]
info watchpoints [list]
catch [-lane lane-index] event
tcatch [-lane lane-index] event

When a breakpoint, watchpoint, or catchpoint (see Breakpoints; Watchpoints; and Catchpoints) is hit by a frame of a thread with multiple heterogeneous lanes:

  • The breakpoint condition, if present, is evaluated for each active heterogeneous lane.
  • The breakpoint command, if present, is executed for the first active heterogeneous lane that evaluates the breakpoint condition as true. If you want to execute the command for other lanes, you can use the lane apply command.
  • If the breakpoint causes heterogeneous lanes to halt, then the current heterogeneous lane is set to the first lane in the set of lanes that are active and, if there is a breakpoint condition, that evaluate the condition as true. ROCGDB responds by displaying the system identifier of the heterogeneous lane selected. The breakpoint hit notification lists the heterogeneous lanes that were both active and evaluated the condition as true, like so:
    (gdb) c
    Continuing.
    [Switching to thread 5, lane 1 (AMDGPU Lane 1:1:1:1/1 (0,0,0)[1,0,0])]
    
    Thread 5 hit Breakpoint 2, with lanes [1-10 13], func () at example.cpp:48
    

If a heterogeneous lane causes a thread to halt, then the other heterogeneous lanes of the thread will no longer execute, even in non-stop mode.
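The lane selection rules for a breakpoint hit can be modeled with a short Python sketch. The helper names lanes_reporting_hit and format_lane_ranges are hypothetical, and the range notation is only inferred from the sample notification above (for instance, how ROCGDB prints a run of just two lanes is an assumption here). This is an illustrative model, not ROCGDB's implementation.

```python
def lanes_reporting_hit(active_lanes, condition=None):
    """Return the sorted lanes that are active and satisfy the condition.

    `condition` is a hypothetical per-lane predicate standing in for the
    breakpoint condition; None means the breakpoint is unconditional.
    """
    return sorted(
        lane for lane in active_lanes
        if condition is None or condition(lane)
    )


def format_lane_ranges(lanes):
    """Format sorted lane indices as space-separated ranges, e.g. [1-10 13]."""
    parts = []
    i = 0
    while i < len(lanes):
        j = i
        # Extend the run while the next index is consecutive.
        while j + 1 < len(lanes) and lanes[j + 1] == lanes[j] + 1:
            j += 1
        parts.append(str(lanes[i]) if i == j else f"{lanes[i]}-{lanes[j]}")
        i = j + 1
    return "[" + " ".join(parts) + "]"


# Hypothetical hit: lanes 0-13 are active, and the condition is false
# for lanes 0, 11 and 12.
active = set(range(0, 14))
hit = lanes_reporting_hit(active, lambda lane: lane not in (0, 11, 12))
current_lane = hit[0] if hit else None  # first active lane with a true condition
print(current_lane, format_lane_ranges(hit))  # prints: 1 [1-10 13]
```

With these made-up inputs the model selects lane 1 as the current lane and reports lanes [1-10 13], matching the shape of the sample session.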

For break, watch, catch, and their variants, the -lane lane-index option can be specified. This restricts ROCGDB to processing the breakpoint only for the heterogeneous lane whose heterogeneous lane index matches lane-index.

If any breakpoint has a lane-index specified, the info break and info watch commands add a Lane column, which displays the heterogeneous lane index, before the Address column.

maint set lane-divergence-support
maint show lane-divergence-support

Control whether or not ROCGDB understands lane divergence.

When set to ‘on’, which is the default, execution commands such as step, next, finish and until automatically skip over code regions where the selected lane is inactive, and ROCGDB ignores breakpoint hits that trigger with all lanes inactive.

When set to ‘off’, execution commands ignore lane active/inactive state, and may thus present stops in code regions where the selected lane is inactive. Also, breakpoint hits are reported even if all lanes are inactive.

maint print address-spaces

maint print address-spaces displays the address space names supported by the current architecture.

The address space information looks like this:

(gdb) maint print address-spaces
 Name
 global
 generic
 local
 private_lane
 private_wave

Any expression that evaluates to an integral value can be used as an address. It can optionally specify an address space qualifier by prepending an address space name followed by a ‘#’. ROCGDB will print an error if the address space name is not supported by the current architecture. If no address space is specified, the current architecture’s default address space (generally the global address space) is used.

The same syntax is used when an address that is not in the current architecture’s default address space is displayed.

For example,

(gdb) x/x local#0x10021608
local#0x10021608:     0x0022fd98
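To make the qualifier syntax concrete, here is a minimal Python sketch that splits an address of the form shown above into its address space and numeric value. The helper split_address and the KNOWN_SPACES set are hypothetical. The set simply copies the names from the sample maint print address-spaces output, and the helper accepts only integer literals, whereas ROCGDB accepts any expression that evaluates to an integral value.

```python
# Address space names taken from the sample `maint print address-spaces`
# output; a real architecture may support a different set.
KNOWN_SPACES = {"global", "generic", "local", "private_lane", "private_wave"}


def split_address(text, default_space="global"):
    """Split 'space#value' into (space, integer address).

    Hypothetical helper: when no qualifier is present, the assumed default
    address space is used; an unknown space name raises ValueError,
    loosely mirroring the error ROCGDB prints.
    """
    space, sep, value = text.rpartition("#")
    if not sep:
        space = default_space
    elif space not in KNOWN_SPACES:
        raise ValueError(f"unsupported address space: {space!r}")
    # Base 0 lets int() accept both decimal and 0x-prefixed literals.
    return space, int(value, 0)


print(split_address("local#0x10021608"))
print(split_address("0x1000"))
```

The first call yields the local address space with the address from the example, and the second falls back to the assumed default, global.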

Heterogeneous systems often have very large numbers of threads. Breakpoint conditions can be used to limit the number of threads reporting breakpoint hits. For example,

break kernel_foo if $_streq($_lane_workgroup_pos, "(0,0,0)")

The tbreak command can be used so that only one heterogeneous lane reports the breakpoint. Because a temporary breakpoint is deleted once it is hit, it must be set again before continuing execution if it is still needed.

The set scheduler-locking on command (see Non-Stop Mode) together with the -lane breakpoint option can be used to lock ROCGDB to only resume the current thread, and only report breakpoints for a fixed heterogeneous lane index. This avoids the overhead of resuming a large number of threads each time execution resumes from a breakpoint, and also avoids the focus being switched to other threads that hit the breakpoints. Note, however, that other threads will not be executed.

The scheduler locking commands can also be helpful to prevent ROCGDB from switching to other threads while concentrating on debugging one particular thread. Non-stop mode can be helpful to prevent the continue command from resuming other threads that are intentionally halted, or from cancelling a single-step command that is in progress on another thread and resuming that thread instead. See Non-Stop Mode.

