12 User Process Tracing
WARNING:
Oracle Linux 7 is now in Extended Support. See Oracle Linux Extended Support and Oracle Open Source Support Policies for more information.
Migrate applications and data to Oracle Linux 8 or Oracle Linux 9 as soon as possible.
For more information about DTrace, see Oracle Linux: DTrace Release Notes and Oracle Linux: Using DTrace for System Tracing.
DTrace is a powerful tool for understanding the behavior of user processes. DTrace can be invaluable when debugging and analyzing performance problems, or for simply understanding the behavior of a complex application. This chapter focuses on the DTrace facilities that are relevant to tracing user process activity and provides examples that illustrate their use.
copyin and copyinstr Subroutines
DTrace's interaction with processes is slightly different than
most traditional debuggers and observability tools. Many such
tools appear to execute within the scope of the process, allowing
users dereference pointers to program variables directly. Rather
than appearing to execute within or as part of the process itself,
DTrace probes execute in the Oracle Linux kernel. To access process data,
a probe uses the copyin
or
copyinstr
subroutines to copy user process data
into the address space of the kernel.
For example, consider the following write()
system call:
ssize_t write(int fd, const void *buf, size_t nbytes);
The following D program illustrates an incorrect attempt to print
the contents of a string that is passed to the
write()
system call:
syscall::write:entry { printf("%s", stringof(arg1)); /* incorrect use of arg1 */ }
If you attempt to run this script, DTrace produces error messages similar to the following:
dtrace: error on enabled probe ID 1 (ID 37: syscall::write:entry): \ invalid address (0x10038a000) in action #1
The arg1
variable, which contains the value of
the buf parameter, is an address that
refers to memory in the process executing the system call. To read
the string at that address, use the copyinstr
subroutine and record its result with the
printf
action, for example:
syscall::write:entry { printf("%s", copyinstr(arg1)); /* correct use of arg1 */ }
In the previous script, the output shows all of the strings that
are being passed to the write()
system call.
Occasionally, however, you might see irregular output similar to
the following:
0 37 write:entry mada&^%**&
The copyinstr
subroutine acts on an input
argument, which is the user address of a null-terminated ASCII
string, but buffers that are passed to the
write()
system call might refer to binary data
rather than ASCII strings or to ASCII strings that do not include
a terminating null byte. To print only as much of the string as
the caller intended, use the two parameter version of the
copyinstr
subroutine, which includes the size
of the targeted string buffer:
syscall::write:entry { printf("%s", copyinstr(arg1, arg2)); }
Alternatively, you can use the copyin
subroutine, which takes an address and size, for example:
syscall::write:entry { printf("%s", stringof(copyin(arg1, arg2))); }
Note that the stringof
operator is necessary so
that DTrace properly converts the user data that is retrieved by
copyin
to a string. The use of
stringof
is not necessary with the
copyinstr
subroutine because it always returns
the type string
.
Avoiding Errors
The copyin
and copyinstr
subroutines cannot read from user addresses that have not yet
been touched, so even a valid address could cause an error if
the page containing that address has not yet been faulted in by
being accessed. Consider the following example:
# dtrace -n syscall::open:entry'{ trace(copyinstr(arg0)); }' dtrace: description 'syscall::open:entry' matched 1 probe CPU ID FUNCTION:NAME 1 8 open:entry /dev/sr0 1 8 open:entry /var/run/utmp 1 8 open:entry /dev/sr0 dtrace: error on enabled probe ID 2 (ID 8: syscall::open:entry): \ invalid address (0x9af1b) in action #1 at DIF offset 52
In the example output, the application was functioning properly,
and the address in arg0
was valid, but it
referred to a page that had not yet been accessed by the
corresponding process. To resolve this issue, you would need to
wait for the kernel or an application to use the data before
tracing it.
For example, you might wait until the system call returns to
apply copyinstr
, as shown here:
# dtrace -n syscall::open:entry'{ self->file = arg0; }' \ -n syscall::open:return'{ trace(copyinstr(self->file)); self->file = 0; }' dtrace: description 'syscall::open:entry' matched 1 probe dtrace: description 'syscall::open:return' matched 1 probe CPU ID FUNCTION:NAME 0 9 open:return /dev/sr0 1 9 open:return /usr/lib64/gconv/gconv-modules.cache 0 9 open:return /dev/sr0 0 9 open:return public/pickup 1 9 open:return maildrop 1 9 open:return /dev/sr0 1 9 open:return /dev/sr0 1 9 open:return /var/run/utmp ...
Eliminating dtrace Interference
If you trace every call to the write()
system
call, it causes a cascade of output because each call causes the
dtrace command to call
write()
as it displays the output, and so on.
This feedback loop is a good example of how the
dtrace command can interfere with the desired
data. To prevent this type of unwanted data from being traced, use
a simple predicate like the one that is shown in the following
example and save it in a file named stringof.d
:
syscall::write:entry /pid != $pid/ { printf("%s", stringof(copyin(arg1, arg2))); }
In the previous example, the $pid
macro
variable expands to the process identifier of the process that
enabled the probes. The pid
variable contains
the process identifier of the process whose thread was running on
the CPU where the probe was fired. Therefore, the predicate
/pid != $pid/
ensures that the script does not
trace any events related to itself.
Using the syscall Provider
The syscall
provider enables you to trace every
system call entry and return. System calls can be a good starting
point for understanding the behavior of a process, especially if
the process seems to be spending a large amount of time executing
or blocked in the kernel, as shown in the output of commands such
as ps and top.
For example, consider a process with a process ID of 31337 that is consuming a large amount of system time. One possible explanation for this behavior is that the process is executing a large number of system calls. You can specify a simple D program on the command line to see which system calls are happening most often:
# dtrace -n syscall:::entry'/pid == 31337/{ @syscalls[probefunc] = count(); }' dtrace: description ’syscall:::entry’ matched 215 probes ^C kill 1 clone 4 pipe 4 setpgid 4 rt_sigreturn 6 sendmsg 7 socket 7 access 8 getegid 8 geteuid 8 getgid 8 getuid 8 wait4 12 close 15 read 23 newstat 25 write 42 ioctl 65 rt_sigaction 168 rt_sigprocmask 198 write 1092
The previous report shows the system calls that are being called
most often, which in this case, is the write()
system call.
You can use the syscall
provider to further
examine the source of all of the write()
system
calls, for example:
# dtrace -n syscall::write:entry'/pid == 31337/{ @writes = quantize(arg2); }' dtrace: description ’syscall::write:entry’ matched 1 probe ^C value ------------- Distribution ------------- count 0 | 0 1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1037 2 |@ 3 4 | 0 8 | 0 16 | 0 32 |@ 3 64 | 0 128 | 0 256 | 0 512 | 0 1024 |@ 5
The previous output shows that the process is executing many
write()
system calls with a relatively small
amount of data. The ratio could be the source of the performance
problem for this particular process. This example illustrates a
general methodology for investigating system call behavior.
ustack Action
Note:
If you want to perform symbol lookup in a stripped executable, you must specify the --export-dynamic option when linking the program. This option causes the linker to add all symbols to the dynamic symbol table (the set of symbols that are visible from dynamic objects at run time). If you use gcc to link the objects, specify the option as -Wl,--export-dynamic to pass the correct option to the linker.
If you want to look up symbols in shared libraries or unstripped executables, the --export-dynamic option is not required.
Tracing a process thread's stack when a particular probe is
activated is often useful for examining a problem in more detail.
The ustack
action traces the user thread's
stack. For example, if a process that opens many files
occasionally fails in the open()
system call,
you can use the ustack
action to discover the
code path that executes the failed open. Type the following source
code and save it in a file named badopen.d
:
syscall::open:entry /pid == $1/ { self->path = copyinstr(arg0); } syscall::open:return /self->path != NULL && errno != 0/ { printf("open for '%s' failed", self->path); ustack(); }
This script also illustrates the use of the $1
macro variable, which takes the value of the first operand that is
specified on the dtrace command line:
# dtrace -s ./badopen.d 3430 dtrace: script './badopen.d' matched 2 probes CPU ID FUNCTION:NAME 1 489 openat:return open for '/usr/lib/foo' failed libc.so.6`sleep+0xe0 ld-linux-x86-64.so.2`do_lookup_x+0x847 libc.so.6`0x3cb8003630 libc.so.6`0x3cb8003c48 libc.so.6`0x3cb800e2c8 libc.so.6`0x3cb8003c48 looper`0x400612 libc.so.6`getenv+0x2a looper`0x4003c8 looper`0x4009b0 libc.so.6`0x3cb800e2c8 looper`0x4009b0 looper`doOpenLoop+0x33 looper`0x400e9c looper`main+0x5f looper`0x400ea9 libc.so.6`__libc_start_main+0xfd looper`main looper`0x4009b0 looper`__libc_csu_init
The ustack
action records program counter (PC)
values for the stack and the dtrace command
resolves the PC values to symbol names by looking though the
process's symbol tables. If dtrace cannot
resolve the PC value to a symbol, it prints out the value as a
hexadecimal integer.
If a process exits or is killed before the
ustack
data is formatted for output,
dtrace might be unable to convert the PC values
in the stack trace to symbol names and the command displays them
as hexadecimal integers.
uregs[] Array
The uregs[]
array enables you to access
individual user registers. See Table 12-1,
which lists the index constants into the
uregs[]
array for each supported architecture.
The following table lists the index constants into the
uregs[]
array for each supported architecture.
Table 12-1 x86 uregs[] Constants
Constant | Register | Architecture |
---|---|---|
|
program counter register |
x86, AMD64 |
|
stack pointer register |
x86, AMD64 |
|
first return code |
x86, AMD64 |
|
second return code |
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
x86, AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
|
|
AMD64 |
Using the pid Provider
The pid
provider enables you to trace any
instruction in a process. Unlike most other providers,
pid
probes are created on demand, based on the
probe descriptions that are found in your D programs. As a result,
no pid
probes are listed in the output of the
dtrace -l command until you enable them.
User Function Boundary Tracing
The simplest mode of operation for the pid
provider is to provide function boundary tracing in user space.
The following example program traces all of the function entries
and returns that are made from a single function. The
$1
macro variable, the first operand on the
command line, is the process ID for the process to trace. The
$2
macro variable, the second operand on the
command line, is the name of the function from which to trace
all function calls. Type the following source code and save it
in a file named userfunc.d
:
#!/usr/sbin/dtrace -s #pragma D option flowindent pid$1::$2:entry { self->trace = 1; } pid$1:::entry, pid$1:::return /self->trace/ { } pid$1::$2:return /self->trace/ { self->trace = 0; }
Type the previous example script and save it in a file named
userfunc.d
, then use the
chmod command to make the file executable.
This script produces output with more details on the principal
buffer:
# ./userfunc.d 123 execute dtrace: script ’./userfunc.d’ matched 11594 probes 0 -> execute 0 -> execute 0 -> Dfix 0 <- Dfix 0 -> s_strsave 0 -> malloc 0 <- malloc 0 <- s_strsave 0 -> set 0 -> malloc 0 <- malloc 0 <- set 0 -> set1 0 -> tglob 0 <- tglob 0 <- set1 0 -> setq 0 -> s_strcmp 0 <- s_strcmp ...
The pid
provider can only be used on
processes that are already running. You can use the
$target
macro variable (see
Scripting) and the
dtrace command with the -c
and -p options to create and grab processes
of interest and instrument them by using DTrace.
For example, you can use the following D script to determine the
distribution of function calls that are made to
libc
by a particular subject process. Type
the following source code and save it in a file named
libc.d
:
pid$target:libc.so::entry { @[probefunc] = count(); }
To determine the distribution of such calls that are made by the
date command, save the script in a file named
libc.d
and run the following command:
# dtrace -s libc.d -c date dtrace: script ’libc.d’ matched 2476 probes Fri Jul 30 14:08:54 PDT 2004 dtrace: pid 109196 has exited pthread_rwlock_unlock 1 _fflush_u 1 rwlock_lock 1 rw_write_held 1 strftime 1 _close 1 _read 1 __open 1 _open 1 strstr 1 load_zoneinfo 1 ... _ti_bind_guard 47 _ti_bind_clear 94
Tracing Arbitrary Instructions
You can use the pid
provider to trace any
instruction in any user function. Upon demand, the
pid
provider creates a probe for every
instruction in a function. The name of each probe is the offset
of its corresponding instruction in the function and is
expressed as a hexadecimal integer. For example, to enable a
probe that is associated with the instruction at offset
0x1c
in the function foo
of the module bar.so
in the process with PID
123, you would use the following command:
# dtrace -n pid123:bar.so:foo:1c
To enable all of the probes in the function
foo
, including the probe for each
instruction, you would use the following command:
# dtrace -n pid123:bar.so:foo:
Using the previous command demonstrates an extremely powerful technique for debugging and analyzing user applications. Infrequent errors can be difficult to debug because they can be difficult to reproduce. Often, you can identify a problem after the failure has occurred, which is too late to reconstruct the code path.
The following example demonstrates how to combine the
pid
provider with speculative tracing to
solve this problem by tracing every instruction in a function.
See Speculative Tracing for a description.
Type the following source code and save it in a file named
errorpath.d
:
pid$1::$2:entry { self->spec = speculation(); speculate(self->spec); printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4); } pid$1::$2: /self->spec/ { speculate(self->spec); } pid$1::$2:return /self->spec && arg1 == 0/ { discard(self->spec); self->spec = 0; } pid$1::$2:return /self->spec && arg1 != 0/ { commit(self->spec); self->spec = 0; }
Executing the errorpath.d
script results in
output similar to the following:
# ./errorpath.d 123 _chdir dtrace: script ’./errorpath.d’ matched 19 probes CPU ID FUNCTION:NAME 0 25253 _chdir:entry 81e08 6d140 ffbfcb20 656c73 0 0 25253 _chdir:entry 0 25269 _chdir:0 0 25270 _chdir:4 0 25271 _chdir:8 0 25272 _chdir:c 0 25273 _chdir:10 0 25274 _chdir:14 0 25275 _chdir:18 0 25276 _chdir:1c 0 25277 _chdir:20 0 25278 _chdir:24 0 25279 _chdir:28 0 25280 _chdir:2c 0 25268 _chdir:return