12 User Process Tracing
WARNING:
Oracle Linux 7 is now in Extended Support. See Oracle Linux Extended Support and Oracle Open Source Support Policies for more information.
Migrate applications and data to Oracle Linux 8 or Oracle Linux 9 as soon as possible.
For more information about DTrace, see Oracle Linux: DTrace Release Notes and Oracle Linux: Using DTrace for System Tracing.
DTrace is a powerful tool for understanding the behavior of user processes. DTrace can be invaluable when debugging and analyzing performance problems, or for simply understanding the behavior of a complex application. This chapter focuses on the DTrace facilities that are relevant to tracing user process activity and provides examples that illustrate their use.
copyin and copyinstr Subroutines
DTrace's interaction with processes is slightly different than
most traditional debuggers and observability tools. Many such
tools appear to execute within the scope of the process, allowing
users dereference pointers to program variables directly. Rather
than appearing to execute within or as part of the process itself,
DTrace probes execute in the Oracle Linux kernel. To access process data,
a probe uses the copyin or
copyinstr subroutines to copy user process data
into the address space of the kernel.
For example, consider the following write()
system call:
ssize_t write(int fd, const void *buf, size_t nbytes);
The following D program illustrates an incorrect attempt to print
the contents of a string that is passed to the
write() system call:
syscall::write:entry
{
printf("%s", stringof(arg1)); /* incorrect use of arg1 */
}If you attempt to run this script, DTrace produces error messages similar to the following:
dtrace: error on enabled probe ID 1 (ID 37: syscall::write:entry): \ invalid address (0x10038a000) in action #1
The arg1 variable, which contains the value of
the buf parameter, is an address that
refers to memory in the process executing the system call. To read
the string at that address, use the copyinstr
subroutine and record its result with the
printf action, for example:
syscall::write:entry
{
printf("%s", copyinstr(arg1)); /* correct use of arg1 */
}
In the previous script, the output shows all of the strings that
are being passed to the write() system call.
Occasionally, however, you might see irregular output similar to
the following:
0 37 write:entry mada&^%**&
The copyinstr subroutine acts on an input
argument, which is the user address of a null-terminated ASCII
string, but buffers that are passed to the
write() system call might refer to binary data
rather than ASCII strings or to ASCII strings that do not include
a terminating null byte. To print only as much of the string as
the caller intended, use the two parameter version of the
copyinstr subroutine, which includes the size
of the targeted string buffer:
syscall::write:entry { printf("%s", copyinstr(arg1, arg2)); }
Alternatively, you can use the copyin
subroutine, which takes an address and size, for example:
syscall::write:entry
{
printf("%s", stringof(copyin(arg1, arg2)));
}
Note that the stringof operator is necessary so
that DTrace properly converts the user data that is retrieved by
copyin to a string. The use of
stringof is not necessary with the
copyinstr subroutine because it always returns
the type string.
Avoiding Errors
The copyin and copyinstr
subroutines cannot read from user addresses that have not yet
been touched, so even a valid address could cause an error if
the page containing that address has not yet been faulted in by
being accessed. Consider the following example:
# dtrace -n syscall::open:entry'{ trace(copyinstr(arg0)); }'
dtrace: description 'syscall::open:entry' matched 1 probe
CPU ID FUNCTION:NAME
1 8 open:entry /dev/sr0
1 8 open:entry /var/run/utmp
1 8 open:entry /dev/sr0
dtrace: error on enabled probe ID 2 (ID 8: syscall::open:entry): \
invalid address (0x9af1b) in action #1 at DIF offset 52
In the example output, the application was functioning properly,
and the address in arg0 was valid, but it
referred to a page that had not yet been accessed by the
corresponding process. To resolve this issue, you would need to
wait for the kernel or an application to use the data before
tracing it.
For example, you might wait until the system call returns to
apply copyinstr, as shown here:
# dtrace -n syscall::open:entry'{ self->file = arg0; }' \
-n syscall::open:return'{ trace(copyinstr(self->file)); self->file = 0; }'
dtrace: description 'syscall::open:entry' matched 1 probe
dtrace: description 'syscall::open:return' matched 1 probe
CPU ID FUNCTION:NAME
0 9 open:return /dev/sr0
1 9 open:return /usr/lib64/gconv/gconv-modules.cache
0 9 open:return /dev/sr0
0 9 open:return public/pickup
1 9 open:return maildrop
1 9 open:return /dev/sr0
1 9 open:return /dev/sr0
1 9 open:return /var/run/utmp
...Eliminating dtrace Interference
If you trace every call to the write() system
call, it causes a cascade of output because each call causes the
dtrace command to call
write() as it displays the output, and so on.
This feedback loop is a good example of how the
dtrace command can interfere with the desired
data. To prevent this type of unwanted data from being traced, use
a simple predicate like the one that is shown in the following
example and save it in a file named stringof.d:
syscall::write:entry
/pid != $pid/
{
printf("%s", stringof(copyin(arg1, arg2)));
}
In the previous example, the $pid macro
variable expands to the process identifier of the process that
enabled the probes. The pid variable contains
the process identifier of the process whose thread was running on
the CPU where the probe was fired. Therefore, the predicate
/pid != $pid/ ensures that the script does not
trace any events related to itself.
Using the syscall Provider
The syscall provider enables you to trace every
system call entry and return. System calls can be a good starting
point for understanding the behavior of a process, especially if
the process seems to be spending a large amount of time executing
or blocked in the kernel, as shown in the output of commands such
as ps and top.
For example, consider a process with a process ID of 31337 that is consuming a large amount of system time. One possible explanation for this behavior is that the process is executing a large number of system calls. You can specify a simple D program on the command line to see which system calls are happening most often:
# dtrace -n syscall:::entry'/pid == 31337/{ @syscalls[probefunc] = count(); }'
dtrace: description ’syscall:::entry’ matched 215 probes
^C
kill 1
clone 4
pipe 4
setpgid 4
rt_sigreturn 6
sendmsg 7
socket 7
access 8
getegid 8
geteuid 8
getgid 8
getuid 8
wait4 12
close 15
read 23
newstat 25
write 42
ioctl 65
rt_sigaction 168
rt_sigprocmask 198
write 1092
The previous report shows the system calls that are being called
most often, which in this case, is the write()
system call.
You can use the syscall provider to further
examine the source of all of the write() system
calls, for example:
# dtrace -n syscall::write:entry'/pid == 31337/{ @writes = quantize(arg2); }'
dtrace: description ’syscall::write:entry’ matched 1 probe
^C
value ------------- Distribution ------------- count
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1037
2 |@ 3
4 | 0
8 | 0
16 | 0
32 |@ 3
64 | 0
128 | 0
256 | 0
512 | 0
1024 |@ 5
The previous output shows that the process is executing many
write() system calls with a relatively small
amount of data. The ratio could be the source of the performance
problem for this particular process. This example illustrates a
general methodology for investigating system call behavior.
ustack Action
Note:
If you want to perform symbol lookup in a stripped executable, you must specify the --export-dynamic option when linking the program. This option causes the linker to add all symbols to the dynamic symbol table (the set of symbols that are visible from dynamic objects at run time). If you use gcc to link the objects, specify the option as -Wl,--export-dynamic to pass the correct option to the linker.
If you want to look up symbols in shared libraries or unstripped executables, the --export-dynamic option is not required.
Tracing a process thread's stack when a particular probe is
activated is often useful for examining a problem in more detail.
The ustack action traces the user thread's
stack. For example, if a process that opens many files
occasionally fails in the open() system call,
you can use the ustack action to discover the
code path that executes the failed open. Type the following source
code and save it in a file named badopen.d:
syscall::open:entry
/pid == $1/
{
self->path = copyinstr(arg0);
}
syscall::open:return
/self->path != NULL && errno != 0/
{
printf("open for '%s' failed", self->path);
ustack();
}
This script also illustrates the use of the $1
macro variable, which takes the value of the first operand that is
specified on the dtrace command line:
# dtrace -s ./badopen.d 3430
dtrace: script './badopen.d' matched 2 probes
CPU ID FUNCTION:NAME
1 489 openat:return open for '/usr/lib/foo' failed
libc.so.6`sleep+0xe0
ld-linux-x86-64.so.2`do_lookup_x+0x847
libc.so.6`0x3cb8003630
libc.so.6`0x3cb8003c48
libc.so.6`0x3cb800e2c8
libc.so.6`0x3cb8003c48
looper`0x400612
libc.so.6`getenv+0x2a
looper`0x4003c8
looper`0x4009b0
libc.so.6`0x3cb800e2c8
looper`0x4009b0
looper`doOpenLoop+0x33
looper`0x400e9c
looper`main+0x5f
looper`0x400ea9
libc.so.6`__libc_start_main+0xfd
looper`main
looper`0x4009b0
looper`__libc_csu_init
The ustack action records program counter (PC)
values for the stack and the dtrace command
resolves the PC values to symbol names by looking though the
process's symbol tables. If dtrace cannot
resolve the PC value to a symbol, it prints out the value as a
hexadecimal integer.
If a process exits or is killed before the
ustack data is formatted for output,
dtrace might be unable to convert the PC values
in the stack trace to symbol names and the command displays them
as hexadecimal integers.
uregs[] Array
The uregs[] array enables you to access
individual user registers. See Table 12-1,
which lists the index constants into the
uregs[] array for each supported architecture.
The following table lists the index constants into the
uregs[] array for each supported architecture.
Table 12-1 x86 uregs[] Constants
| Constant | Register | Architecture |
|---|---|---|
|
|
program counter register |
x86, AMD64 |
|
|
stack pointer register |
x86, AMD64 |
|
|
first return code |
x86, AMD64 |
|
|
second return code |
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
x86, AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
|
|
|
AMD64 |
Using the pid Provider
The pid provider enables you to trace any
instruction in a process. Unlike most other providers,
pid probes are created on demand, based on the
probe descriptions that are found in your D programs. As a result,
no pid probes are listed in the output of the
dtrace -l command until you enable them.
User Function Boundary Tracing
The simplest mode of operation for the pid
provider is to provide function boundary tracing in user space.
The following example program traces all of the function entries
and returns that are made from a single function. The
$1 macro variable, the first operand on the
command line, is the process ID for the process to trace. The
$2 macro variable, the second operand on the
command line, is the name of the function from which to trace
all function calls. Type the following source code and save it
in a file named userfunc.d:
#!/usr/sbin/dtrace -s
#pragma D option flowindent
pid$1::$2:entry
{
self->trace = 1;
}
pid$1:::entry,
pid$1:::return
/self->trace/
{
}
pid$1::$2:return
/self->trace/
{
self->trace = 0;
}
Type the previous example script and save it in a file named
userfunc.d, then use the
chmod command to make the file executable.
This script produces output with more details on the principal
buffer:
# ./userfunc.d 123 execute dtrace: script ’./userfunc.d’ matched 11594 probes 0 -> execute 0 -> execute 0 -> Dfix 0 <- Dfix 0 -> s_strsave 0 -> malloc 0 <- malloc 0 <- s_strsave 0 -> set 0 -> malloc 0 <- malloc 0 <- set 0 -> set1 0 -> tglob 0 <- tglob 0 <- set1 0 -> setq 0 -> s_strcmp 0 <- s_strcmp ...
The pid provider can only be used on
processes that are already running. You can use the
$target macro variable (see
Scripting) and the
dtrace command with the -c
and -p options to create and grab processes
of interest and instrument them by using DTrace.
For example, you can use the following D script to determine the
distribution of function calls that are made to
libc by a particular subject process. Type
the following source code and save it in a file named
libc.d:
pid$target:libc.so::entry
{
@[probefunc] = count();
}
To determine the distribution of such calls that are made by the
date command, save the script in a file named
libc.d and run the following command:
# dtrace -s libc.d -c date dtrace: script ’libc.d’ matched 2476 probes Fri Jul 30 14:08:54 PDT 2004 dtrace: pid 109196 has exited pthread_rwlock_unlock 1 _fflush_u 1 rwlock_lock 1 rw_write_held 1 strftime 1 _close 1 _read 1 __open 1 _open 1 strstr 1 load_zoneinfo 1 ... _ti_bind_guard 47 _ti_bind_clear 94
Tracing Arbitrary Instructions
You can use the pid provider to trace any
instruction in any user function. Upon demand, the
pid provider creates a probe for every
instruction in a function. The name of each probe is the offset
of its corresponding instruction in the function and is
expressed as a hexadecimal integer. For example, to enable a
probe that is associated with the instruction at offset
0x1c in the function foo
of the module bar.so in the process with PID
123, you would use the following command:
# dtrace -n pid123:bar.so:foo:1c
To enable all of the probes in the function
foo, including the probe for each
instruction, you would use the following command:
# dtrace -n pid123:bar.so:foo:
Using the previous command demonstrates an extremely powerful technique for debugging and analyzing user applications. Infrequent errors can be difficult to debug because they can be difficult to reproduce. Often, you can identify a problem after the failure has occurred, which is too late to reconstruct the code path.
The following example demonstrates how to combine the
pid provider with speculative tracing to
solve this problem by tracing every instruction in a function.
See Speculative Tracing for a description.
Type the following source code and save it in a file named
errorpath.d:
pid$1::$2:entry
{
self->spec = speculation();
speculate(self->spec);
printf("%x %x %x %x %x", arg0, arg1, arg2, arg3, arg4);
}
pid$1::$2:
/self->spec/
{
speculate(self->spec);
}
pid$1::$2:return
/self->spec && arg1 == 0/
{
discard(self->spec);
self->spec = 0;
}
pid$1::$2:return
/self->spec && arg1 != 0/
{
commit(self->spec);
self->spec = 0;
}
Executing the errorpath.d script results in
output similar to the following:
# ./errorpath.d 123 _chdir dtrace: script ’./errorpath.d’ matched 19 probes CPU ID FUNCTION:NAME 0 25253 _chdir:entry 81e08 6d140 ffbfcb20 656c73 0 0 25253 _chdir:entry 0 25269 _chdir:0 0 25270 _chdir:4 0 25271 _chdir:8 0 25272 _chdir:c 0 25273 _chdir:10 0 25274 _chdir:14 0 25275 _chdir:18 0 25276 _chdir:1c 0 25277 _chdir:20 0 25278 _chdir:24 0 25279 _chdir:28 0 25280 _chdir:2c 0 25268 _chdir:return