15 Performance Considerations
WARNING:
Oracle Linux 7 is now in Extended Support. See Oracle Linux Extended Support and Oracle Open Source Support Policies for more information.
Migrate applications and data to Oracle Linux 8 or Oracle Linux 9 as soon as possible.
For more information about DTrace, see Oracle Linux: DTrace Release Notes and Oracle Linux: Using DTrace for System Tracing.
DTrace creates additional work in the system. Therefore, enabling DTrace always affects system performance in some way. Often, this effect is negligible, but it can become substantial if many probes with significant enablings are enabled. This chapter describes some techniques for minimizing the performance effect of DTrace.
Limit Enabled Probes
Dynamic instrumentation techniques enable DTrace to provide
unparalleled tracing coverage of the kernel and arbitrary user
processes. While this coverage provides revolutionary new insight
into system behavior, it can also cause an enormous probe effect. If
tens or hundreds of thousands of probes are enabled, the effect on
the system can easily be substantial. Therefore, enable only as many
probes as you need to solve a problem.
For example, you should not enable all syscall
probes if a more concise enabling can answer your question. Your
question might require that you concentrate on a specific module
of interest or a specific function.
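As a minimal sketch of a more concise enabling, the following D program instruments only the read and write system call entry probes, rather than every syscall probe, and restricts tracing to a single process name. The choice of read and write, and the process name myprogram, are assumptions for illustration; substitute the functions and process that are relevant to your question.

```d
/*
 * Broad enabling, shown only for comparison:
 *     syscall:::entry { @calls[probefunc] = count(); }
 *
 * Narrower enabling: two probes instead of one per system call.
 * "myprogram" is a placeholder process name.
 */
syscall::read:entry,
syscall::write:entry
/execname == "myprogram"/
{
    /* Count calls per system call name. */
    @calls[probefunc] = count();
}
```

To see how many probes a description matches before you enable it, you can list them with dtrace -l -n, for example dtrace -l -n 'syscall:::entry'.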
Caution:
When using the pid provider, be especially careful. Because the pid
provider can instrument every instruction, you could enable millions
of probes in an application and therefore slow the target process
to a crawl.
You can also use DTrace in situations where large numbers of probes must be enabled to answer a question. Enabling a large number of probes might slow down the system significantly, but it never induces fatal failure on the system. You should therefore not hesitate to enable many probes, if so required.
Using Aggregations
As discussed in Aggregations, DTrace aggregations provide a scalable way to aggregate data. Associative arrays might appear to offer functionality that is similar to aggregations, but because general-purpose variables are global by nature, associative arrays cannot offer the linear scalability of aggregations. Therefore, the preference is to use aggregations over associative arrays whenever possible. For example, the following D program uses an associative array to aggregate data:
    syscall:::entry
    {
      totals[execname]++;
    }

    /* On Linux, a process exits through the exit_group system call. */
    syscall::exit_group:entry
    {
      printf("%40s %d\n", execname, totals[execname]);
      totals[execname] = 0;
    }
The following D program is preferred because it uses an aggregation to achieve the same result:
    syscall:::entry
    {
      @totals[execname] = count();
    }

    END
    {
      printa("%40s %@d\n", @totals);
    }
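An aggregation can also be reported at regular intervals instead of only when tracing ends. The following variation is a sketch, assuming a ten-second reporting interval chosen purely for illustration: it prints the per-process counts with printa() and then resets them with clear().

```d
syscall:::entry
{
    @totals[execname] = count();
}

/* Every 10 seconds, print the counts gathered so far. */
tick-10sec
{
    printa("%40s %@d\n", @totals);

    /* Reset the values to zero but keep the keys. */
    clear(@totals);
}
```

If you would rather discard the keys as well as the values after each report, trunc(@totals) can be used in place of clear(@totals).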
Using Cacheable Predicates
You use DTrace predicates to filter unwanted data from the
experiment by tracing data only if a specified condition is found
to be true. When enabling many probes, you generally use
predicates of a form that identifies a specific thread, or threads
of interest, such as /self->traceme/ or /pid == 12345/. Although
many of these predicates evaluate to a false value for most threads
in most probes, the evaluation itself can become costly when done
for many thousands of probes. To reduce this cost, DTrace caches
the evaluation of a predicate if it includes only thread-local
variables, such as /self->traceme/, or immutable variables, such as
/pid == 12345/.
The cost of evaluating a cached predicate is much less than the
cost of evaluating a non-cached predicate, especially if the
predicate involves thread-local variables, string comparisons, or
other relatively costly operations. While predicate caching is
transparent to the user, it does imply some guidelines for
constructing optimal predicates, as outlined in the following table.
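Cacheable | Uncacheable |
---|---|
self->follow | follow[pid, tid] |
execname == "myprogram" | execname == one_to_watch |
execname == $$1 | traceme[execname] |
pid == 12345, pid == $1 | pid == pid_i_care_about |
self->traceme == 1 | self->traceme == my_global |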
The following example uses an associative array in the predicate and is not cacheable:
    syscall::read:entry
    {
      follow[pid, tid] = 1;
    }

    lockstat:::
    /follow[pid, tid]/
    {}

    syscall::read:return
    /follow[pid, tid]/
    {
      follow[pid, tid] = 0;
    }
Using a cacheable, thread-local variable, per the following example, is preferable:
    syscall::read:entry
    {
      self->follow = 1;
    }

    lockstat:::
    /self->follow/
    {}

    syscall::read:return
    /self->follow/
    {
      self->follow = 0;
    }
For a predicate to be cacheable, it must consist exclusively of cacheable expressions. All of the following predicates are cacheable:
    /execname == "myprogram"/
    /execname == $$1/
    /pid == 12345/
    /pid == $1/
    /self->traceme == 1/
The following examples, which use global variables, are not cacheable:
    /execname == one_to_watch/
    /traceme[execname]/
    /pid == pid_i_care_about/
    /self->traceme == my_global/
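A predicate such as /pid == pid_i_care_about/ can often be made cacheable by passing the value as a macro argument instead of storing it in a global variable. The following is a sketch under that assumption; the script name watch.d that appears in the comment is hypothetical.

```d
/*
 * Hypothetical usage: dtrace -s watch.d 12345
 *
 * $1 is replaced by the first script argument when the program is
 * compiled, so the predicate compares pid against a constant and
 * takes the cacheable form /pid == $1/.
 */
syscall:::entry
/pid == $1/
{
    /* Count system calls made by the selected process, per call name. */
    @calls[probefunc] = count();
}
```

The same approach applies to a process name: a string macro argument, as in /execname == $$1/, replaces a comparison against a global string variable.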