

10.21 Character Sets

If the program you are debugging uses a different character set to represent characters and strings than the one ROCGDB uses itself, ROCGDB can automatically translate between the character sets for you. The character set ROCGDB uses we call the host character set; the one the inferior program uses we call the target character set.

For example, if you are running ROCGDB on a GNU/Linux system, which uses the ISO Latin 1 character set, but you are using ROCGDB’s remote protocol (see Debugging Remote Programs) to debug a program running on an IBM mainframe, which uses the EBCDIC character set, then the host character set is Latin 1 and the target character set is EBCDIC. If you give ROCGDB the command set target-charset EBCDIC-US, ROCGDB translates between EBCDIC and Latin 1 whenever you print character or string values, or use character and string literals in expressions.

ROCGDB has no way to automatically recognize which character set the inferior program uses; you must tell it, using the set target-charset command, described below.
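The translation described above can be pictured as decoding bytes from the target character set and re-encoding the result in the host character set. The following is a minimal Python sketch of that idea, not ROCGDB’s actual implementation. It assumes Python’s cp037 codec as a stand-in for the mainframe’s EBCDIC encoding, and translate_to_host is a hypothetical helper name.

```python
def translate_to_host(target_bytes, target_charset, host_charset):
    """Decode bytes read from the target, then re-encode them for the host."""
    text = target_bytes.decode(target_charset)
    return text.encode(host_charset)


# "Hello" as an EBCDIC program would store it (assumed codec: cp037).
ebcdic_hello = bytes([200, 133, 147, 147, 150])

# The same text re-encoded for a Latin 1 host.
print(translate_to_host(ebcdic_hello, "cp037", "latin-1"))
```

The same helper works in the other direction, which corresponds to using a character or string literal in an expression: the host-side text is re-encoded into the target character set.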

Here are the commands for controlling ROCGDB’s character set support:

set target-charset charset

Set the current target character set to charset. To display the list of supported target character sets, type set target-charset followed by TAB TAB.

set host-charset charset

Set the current host character set to charset.

By default, ROCGDB uses a host character set appropriate to the system it is running on; you can override that default using the set host-charset command. On some systems, ROCGDB cannot automatically determine the appropriate host character set. In this case, ROCGDB uses ‘UTF-8’.

ROCGDB can only use certain character sets as its host character set. If you type set host-charset followed by TAB TAB, ROCGDB lists the host character sets it supports.

set charset charset

Set the current host and target character sets to charset. As above, if you type set charset followed by TAB TAB, ROCGDB lists the names of the character sets that can be used for both host and target.

show charset

Show the names of the current host and target character sets.

show host-charset

Show the name of the current host character set.

show target-charset

Show the name of the current target character set.

set target-wide-charset charset

Set the current target’s wide character set to charset. This is the character set used by the target’s wchar_t type. To display the list of supported wide character sets, type set target-wide-charset followed by TAB TAB.

show target-wide-charset

Show the name of the current target’s wide character set.
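As an illustration of what a wide character set means in practice, the Python sketch below decodes a wchar_t string as it might sit in target memory. It assumes a target whose wchar_t is four bytes wide, little-endian, and holds UTF-32 code points; other targets may use a different width or encoding. The helper name decode_wide is hypothetical.

```python
import struct


def decode_wide(raw, wide_charset="utf-32-le"):
    """Decode raw wchar_t memory, stopping at the terminating zero element."""
    text = raw.decode(wide_charset)
    return text.split("\0", 1)[0]


# L"Hi!" stored as three 4-byte little-endian elements plus a terminator.
raw = struct.pack("<4I", 0x48, 0x69, 0x21, 0)
print(decode_wide(raw))
```

Passing a different codec name, such as "utf-16-le" for a target with a two-byte wchar_t, changes how the same memory is interpreted.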

Here is an example of ROCGDB’s character set support in action. Assume that the following source code has been placed in the file charset-test.c:

#include <stdio.h>

char ascii_hello[]
  = {72, 101, 108, 108, 111, 44, 32, 119,
     111, 114, 108, 100, 33, 10, 0};
char ibm1047_hello[]
  = {200, 133, 147, 147, 150, 107, 64, 166,
     150, 153, 147, 132, 90, 37, 0};

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

In this program, ascii_hello and ibm1047_hello are arrays containing the string ‘Hello, world!’ followed by a newline, encoded in the ASCII and IBM1047 character sets, respectively.
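One way to check the two arrays is to decode them outside the debugger. The Python sketch below does that with Python’s own codecs. It assumes cp037 as a stand-in for IBM1047, since the two agree on every byte used in this example; decode_c_string is a hypothetical helper.

```python
ASCII_HELLO = [72, 101, 108, 108, 111, 44, 32, 119,
               111, 114, 108, 100, 33, 10, 0]
IBM1047_HELLO = [200, 133, 147, 147, 150, 107, 64, 166,
                 150, 153, 147, 132, 90, 37, 0]


def decode_c_string(values, charset):
    """Decode a NUL-terminated array of byte values using charset."""
    end = values.index(0)  # position of the C string terminator
    return bytes(values[:end]).decode(charset)


# Both arrays should decode to the same text.
print(repr(decode_c_string(ASCII_HELLO, "ascii")))
print(repr(decode_c_string(IBM1047_HELLO, "cp037")))
```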

We compile the program, and invoke the debugger on it:

$ gcc -g charset-test.c -o charset-test
$ gdb -nw charset-test
GNU gdb 2001-12-19-cvs
Copyright 2001 Free Software Foundation, Inc.
…
(gdb)

We can use the show charset command to see what character sets ROCGDB is currently using to interpret and display characters and strings:

(gdb) show charset
The current host and target character set is `ISO-8859-1'.
(gdb)

For the sake of printing this manual, let’s use ASCII as our initial character set:

(gdb) set charset ASCII
(gdb) show charset
The current host and target character set is `ASCII'.
(gdb)

Let’s assume that ASCII is indeed the correct character set for our host system — in other words, let’s assume that if ROCGDB prints characters using the ASCII character set, our terminal will display them properly. Since our current target character set is also ASCII, the contents of ascii_hello print legibly:

(gdb) print ascii_hello
$1 = 0x401698 "Hello, world!\n"
(gdb) print ascii_hello[0]
$2 = 72 'H'
(gdb)

ROCGDB uses the target character set for character and string literals you use in expressions:

(gdb) print '+'
$3 = 43 '+'
(gdb)

The ASCII character set uses the number 43 to encode the ‘+’ character.

ROCGDB relies on the user to tell it which character set the target program uses. If we print ibm1047_hello while our target character set is still ASCII, we get gibberish:

(gdb) print ibm1047_hello
$4 = 0x4016a8 "\310\205\223\223\226k@\246\226\231\223\204Z%"
(gdb) print ibm1047_hello[0]
$5 = 200 '\310'
(gdb)

If we type set target-charset followed by TAB TAB, ROCGDB lists the character sets it supports:

(gdb) set target-charset
ASCII       EBCDIC-US   IBM1047     ISO-8859-1
(gdb) set target-charset

We can select IBM1047 as our target character set, and examine the program’s strings again. Now the ASCII string is displayed incorrectly, but ROCGDB translates the contents of ibm1047_hello from the target character set, IBM1047, to the host character set, ASCII, and they display correctly:

(gdb) set target-charset IBM1047
(gdb) show charset
The current host character set is `ASCII'.
The current target character set is `IBM1047'.
(gdb) print ascii_hello
$6 = 0x401698 "\110\145%%?\054\040\167?\162%\144\041\012"
(gdb) print ascii_hello[0]
$7 = 72 '\110'
(gdb) print ibm1047_hello
$8 = 0x4016a8 "Hello, world!\n"
(gdb) print ibm1047_hello[0]
$9 = 200 'H'
(gdb)
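The escaped output in this session is consistent with a simple rule: a byte is shown as a character when the target character set maps it to something an ASCII host can display, and as an octal escape of the original byte otherwise. The Python sketch below approximates that rule. It is not ROCGDB’s algorithm, it ignores the quoting of backslashes and double quotes, and it assumes cp037 as a stand-in for IBM1047. The name render is hypothetical.

```python
def render(values, target_charset):
    """Approximate how a byte string prints on an ASCII host."""
    pieces = []
    for byte in values:
        try:
            ch = bytes([byte]).decode(target_charset)
        except UnicodeDecodeError:
            ch = None  # byte is not valid in this character set
        if ch == "\n":
            pieces.append("\\n")  # newline gets its usual C escape
        elif ch is not None and " " <= ch <= "~":
            pieces.append(ch)  # printable ASCII is shown as-is
        else:
            pieces.append("\\%03o" % byte)  # anything else: octal escape
    return "".join(pieces)


ascii_hello = [72, 101, 108, 108, 111, 44, 32, 119,
               111, 114, 108, 100, 33, 10]
ibm1047_hello = [200, 133, 147, 147, 150, 107, 64, 166,
                 150, 153, 147, 132, 90, 37]

print(render(ibm1047_hello, "ascii"))  # EBCDIC bytes read as ASCII
print(render(ascii_hello, "cp037"))    # ASCII bytes read as EBCDIC
print(render(ibm1047_hello, "cp037"))  # EBCDIC bytes read as EBCDIC
```

Under these assumptions, the ASCII bytes 108 and 111 happen to be the EBCDIC codes for ‘%’ and ‘?’, which is why those two characters appear among the escapes when ascii_hello is printed with the wrong target character set.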

As above, ROCGDB uses the target character set for character and string literals you use in expressions:

(gdb) print '+'
$10 = 78 '+'
(gdb)

The IBM1047 character set uses the number 78 to encode the ‘+’ character.
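The two values printed for ‘+’ can be reproduced by encoding the same character in each character set. The following is a small Python sketch, assuming cp037 as a stand-in for IBM1047 (the two agree on this character); char_code is a hypothetical helper.

```python
def char_code(ch, charset):
    """Return the numeric value that encodes ch in charset."""
    encoded = ch.encode(charset)
    if len(encoded) != 1:
        raise ValueError("not a single-byte character in " + charset)
    return encoded[0]


print(char_code("+", "ascii"))  # value under an ASCII target
print(char_code("+", "cp037"))  # value under an EBCDIC target
```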

