5 Buffers and Buffering
For more information about DTrace, see Oracle Linux: DTrace Release Notes and Oracle Linux: Using DTrace for System Tracing.
Data buffering and management is an essential service that the DTrace framework provides for its clients, such as the dtrace command. This chapter explores data buffering in detail and describes the options that you can use to change DTrace's buffer management policies.
Principal Buffers
The principal buffer is present in every DTrace invocation and is
the buffer to which tracing actions record their data by default.
These actions include printa, printf, stack, trace, and tracemem.
The principal buffers are always allocated on a per-CPU basis.
This policy is not tunable, but you can restrict tracing and
buffer allocation to a single CPU by using the cpu option.
Principal Buffer Policies
DTrace permits tracing in highly constrained contexts in the
kernel. In particular, DTrace permits tracing in contexts in which
kernel software might not reliably allocate memory. One
consequence of this flexibility of context is that there always
exists a possibility that DTrace might attempt to trace data when
there is no space available. DTrace must have a policy to deal
with such situations as they arise. However, you might choose to
tune the policy based on the needs of a given experiment.
Sometimes the appropriate policy might be to discard the new data.
Other times, it might be desirable to reuse the space containing
the oldest recorded data to enable the tracing of new data. Most
often, the desired policy is to minimize the likelihood of running
out of available space in the first place. To accommodate these
varying demands, DTrace supports several different buffer
policies. You select a policy with the bufpolicy option, which can
be set on a per-consumer basis. See Options and Tunables for more
details.
switch Policy
By default, the principal buffer has a switch buffer policy. Under
this policy, per-CPU buffers are allocated in pairs, where one
buffer is active and the other buffer is inactive. When a DTrace
consumer attempts to read a buffer, the kernel first switches the
inactive and active buffers. Buffer switching is done in such a
manner that there is no window in which tracing data can be lost.
When the buffers are switched, the newly inactive buffer is copied
out to the DTrace consumer. This policy ensures that the consumer
always sees a self-consistent buffer, because a buffer is never
simultaneously traced to and copied out. This technique also
avoids introducing a window of time in which tracing is paused or
otherwise prevented. The rate at which the buffer is switched and
read out is controlled by the consumer with the switchrate option.
As with any rate option, switchrate can be specified with any time
suffix; a value without a suffix is interpreted as a rate per
second. For more information about switchrate and other options,
see Options and Tunables.
Under the switch policy, if a given enabled probe would trace more
data than there is space available in the active principal buffer,
the data is dropped and a per-CPU drop count is incremented. In
the event of one or more drops, dtrace displays a message similar
to the following:
dtrace: 11 drops on CPU 0
If a given record is larger than the total buffer size, the record
is dropped, regardless of buffer policy. You can reduce or
eliminate drops, either by increasing the size of the principal
buffer with the bufsize option, or by increasing the switching
rate with the switchrate option.
Under the switch policy, scratch memory for DTrace subroutines is
allocated out of the active buffer.
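The interaction between buffer switching and drops can be pictured with a small model of one CPU's buffer pair. The following Python sketch is illustrative only: the class, its method names, and the record sizes are hypothetical and do not reflect how DTrace is implemented in the kernel.

```python
# Toy model of one CPU's buffer pair under the switch policy.
# All names are hypothetical; this is not DTrace's implementation.

class SwitchBufferPair:
    def __init__(self, bufsize):
        self.bufsize = bufsize   # size of each buffer in the pair, in bytes
        self.active = []         # records in the buffer being traced to
        self.used = 0            # bytes used in the active buffer
        self.drops = 0           # per-CPU drop count

    def trace(self, record, size):
        """Record data in the active buffer, or drop it if it does not fit."""
        if self.used + size > self.bufsize:
            self.drops += 1
            return False
        self.active.append(record)
        self.used += size
        return True

    def switch_and_read(self):
        """Swap the buffers and return the newly inactive buffer's records."""
        inactive, self.active, self.used = self.active, [], 0
        return inactive


cpu0 = SwitchBufferPair(bufsize=64)
for i in range(5):
    cpu0.trace(f"rec{i}", size=16)   # the fifth record no longer fits

first_read = cpu0.switch_and_read()  # the consumer reads at its switch rate
cpu0.trace("rec5", size=16)          # fits again after the switch
second_read = cpu0.switch_and_read()

print(first_read, second_read, cpu0.drops)
```

In this model, a larger bufsize or more frequent calls to switch_and_read both reduce the drop count, which mirrors the effect of the bufsize and switchrate options.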
fill Policy
For some problems, you might want to use a single, in-kernel
buffer. While this approach can be implemented with the switch
policy and appropriate D constructs by incrementing a variable in
D and predicating an exit action appropriately, such an
implementation does not eliminate the possibility of drops. To
request a single, large in-kernel buffer and continue tracing
until one or more of the per-CPU buffers has filled, use the fill
buffer policy. Under this policy,
tracing continues until an enabled probe attempts to trace more
data than can fit in the remaining principal buffer space. When
insufficient space remains, the buffer is marked as filled and
the consumer is notified that at least one of its per-CPU
buffers is filled. When dtrace detects a
single filled buffer, tracing is stopped, all buffers are
processed, and dtrace exits. No further data
is traced to a filled buffer even if the data would fit in the
buffer.
To use the fill policy, set the bufpolicy option to fill. For
example, the following command traces every system call entry into
a per-CPU 2 KB buffer with the buffer policy set to fill:
# dtrace -n syscall:::entry -b 2k -x bufpolicy=fill
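The rule that a filled buffer accepts no further data, even data that would fit, is easy to overlook. The following Python sketch models a single per-CPU buffer under the fill policy. The class name and the record sizes are hypothetical, and the model ignores the space that DTrace reserves for END probes.

```python
# Toy model of one per-CPU buffer under the fill policy.
# Names and sizes are hypothetical; this is not DTrace's implementation.

class FillBuffer:
    def __init__(self, bufsize):
        self.bufsize = bufsize
        self.used = 0
        self.records = []
        self.filled = False      # once set, the buffer accepts nothing more

    def trace(self, record, size):
        if self.filled:
            return False         # rejected even if the record would fit
        if self.used + size > self.bufsize:
            self.filled = True   # the consumer would now be notified
            return False
        self.records.append(record)
        self.used += size
        return True


buf = FillBuffer(bufsize=2048)   # corresponds to -b 2k
results = [
    buf.trace("a", 1024),        # fits
    buf.trace("b", 900),         # fits, 1924 bytes used
    buf.trace("c", 200),         # does not fit, buffer marked as filled
    buf.trace("d", 50),          # would fit, but the buffer is filled
]
print(results, buf.records, buf.filled)
```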
fill Policy and END Probes
END probes usually do not fire until tracing has been explicitly
stopped by the DTrace consumer. END probes are guaranteed to fire
only on one CPU, but the CPU on which the probe fires is
undefined. With fill buffers, tracing is explicitly stopped when
at least one of the per-CPU principal buffers has been marked as
filled. If the fill policy is selected, the END probe might fire
on a CPU that has a filled buffer. To accommodate END tracing in
fill buffers, DTrace calculates the amount of space that is
potentially consumed by END probes and subtracts this space from
the size of the principal buffer. If the net size is negative,
DTrace does not start and dtrace outputs the following error
message:
dtrace: END enablings exceed size of principal buffer
The reservation mechanism ensures that a full buffer always has
sufficient space for any END probes.
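The reservation amounts to a subtraction followed by a sign check. The following Python sketch restates that arithmetic. The function name and the byte counts are hypothetical, and how DTrace actually computes the space that END probes might consume is not modeled here.

```python
# Sketch of the END reservation arithmetic under the fill policy.
# The function name and the sizes used below are hypothetical.

def net_principal_size(bufsize, end_probe_bytes):
    """Return the space left for ordinary tracing after reserving END space."""
    net = bufsize - end_probe_bytes
    if net < 0:
        # Mirrors the condition under which dtrace refuses to start.
        raise ValueError("END enablings exceed size of principal buffer")
    return net


print(net_principal_size(2048, 256))   # space left in a 2 KB buffer

try:
    net_principal_size(128, 256)       # END data alone exceeds the buffer
except ValueError as err:
    print("dtrace:", err)
```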
ring Policy
The DTrace ring buffer policy assists with tracing the events
leading up to a failure. If reproducing the failure takes hours or
days, you might want to keep only the most recent data. When a
principal buffer has filled, tracing wraps around to the first
entry, overwriting older tracing data. You establish the ring
buffer by specifying bufpolicy=ring as follows:
# dtrace -s foo.d -x bufpolicy=ring
When used to create a ring buffer, dtrace does not display any
output until the dtrace process is terminated. At that time, the
ring buffer is consumed and processed. The dtrace command
processes each ring buffer in CPU order. Within a CPU's buffer,
trace records are displayed in order from oldest to newest. Just
as with the switch buffering policy, no ordering exists between
records from different CPUs. If such an ordering is required, you
should trace the timestamp variable as part of your tracing
request.
The following example demonstrates the use of a #pragma option
directive to enable ring buffering:
#pragma D option bufpolicy=ring
#pragma D option bufsize=16k
syscall:::entry
/execname == $1/
{
        trace(timestamp);
}
syscall::exit:entry
{
        exit(0);
}
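The wraparound behavior, and the reason for tracing timestamp, can be illustrated outside of DTrace. The following Python sketch is a simplified model: it sizes each ring by record count rather than by bytes, and the event data and the RING_SLOTS value are hypothetical. It shows that output in CPU order is not in time order, and that a traced timestamp allows the records to be merged afterward.

```python
from collections import deque
from heapq import merge

# Toy model of per-CPU ring buffers. Real buffers are sized in bytes;
# this model keeps a fixed number of records for simplicity.
RING_SLOTS = 3                      # hypothetical capacity per CPU
rings = {0: deque(maxlen=RING_SLOTS), 1: deque(maxlen=RING_SLOTS)}

# Hypothetical (cpu, timestamp, event) records in the order they occurred.
events = [
    (0, 100, "open"), (1, 110, "read"), (0, 120, "write"),
    (0, 130, "close"), (1, 140, "mmap"), (0, 150, "exit"),
]
for cpu, ts, name in events:
    rings[cpu].append((ts, name))   # a full ring overwrites its oldest record

# Processing in CPU order: oldest to newest within each CPU only.
in_cpu_order = list(rings[0]) + list(rings[1])

# Each ring is already sorted by timestamp, so a merge restores global order.
in_time_order = list(merge(rings[0], rings[1]))

print(in_cpu_order)
print(in_time_order)
```

The record with timestamp 100 is missing from both lists because CPU 0 traced a fourth record and overwrote it.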
Other Buffers
Principal buffers exist in every DTrace enabling. Beyond principal buffers, some DTrace consumers might have additional in-kernel data buffers, such as an aggregation buffer, and one or more speculative buffers. See Aggregations and Speculative Tracing for more details.
Buffer Sizes
The size of each buffer can be tuned on a per-consumer basis. Separate options are provided to tune each buffer size, as shown in the following table.
Buffer | Size Option |
---|---|
Aggregation | aggsize |
Principal | bufsize |
Speculative | specsize |
Each of these options is set with a value that denotes the size. As with any size option, the value might have an optional size suffix. See Options and Tunables for more details.
For example, you would set the buffer size to 10 megabytes on the dtrace command line as follows:
# dtrace -P syscall -x bufsize=10m
Alternatively, you can use the -b option with the dtrace command:
# dtrace -P syscall -b 10m
Finally, you can set bufsize by using a pragma, for example:
#pragma D option bufsize=10m
The buffer size that you select denotes the size of the buffer on
each CPU. Moreover, for the switch buffer policy, bufsize denotes
the size of each of the two buffers (active and inactive) on each
CPU. The default buffer size is four megabytes.
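Because bufsize applies to each CPU, and under the switch policy to each buffer of the pair, the total kernel memory requested for principal buffers grows with the CPU count. The following Python sketch estimates that total. The function names are hypothetical, and the sketch assumes that the k, m, g, and t suffixes are multiples of 1024, which you should confirm in Options and Tunables.

```python
# Rough estimate of principal buffer memory under the switch policy.
# Assumption: size suffixes are powers of 1024 (not confirmed here).
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_size(text):
    """Convert a size such as '10m' or '4096' to a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)

def switch_footprint(bufsize, ncpus):
    """Two buffers (active and inactive) of bufsize bytes on every CPU."""
    return 2 * ncpus * parse_size(bufsize)


# A 10 MB bufsize on a hypothetical 8-CPU system.
total = switch_footprint("10m", ncpus=8)
print(total, "bytes, or", total // 1024**2, "MB")
```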
Buffer Resizing Policy
Occasionally, the system might not have adequate free kernel
memory to allocate a buffer of the desired size, either because
not enough memory is available or because the DTrace consumer has
exceeded one of the tunable limits that are described in Options
and Tunables. You can configure the policy for buffer allocation
failure by using the bufresize option, which defaults to auto.
Under the auto buffer resize policy, the size of a buffer is
halved until a successful allocation occurs. dtrace generates a
message if a buffer, as allocated, is smaller than the requested
size, as shown in the following example:
# dtrace -P syscall -b 4g
dtrace: description 'syscall' matched 430 probes
dtrace: buffer size lowered to 128m
...
Or, a message similar to the following is generated:
# dtrace -P syscall'{@a[probefunc] = count()}' -x aggsize=1g
dtrace: description 'syscall' matched 430 probes
dtrace: aggregation size lowered to 128m
...
Alternatively, you can require manual intervention after buffer
allocation failure by setting bufresize to manual. Under this
policy, an allocation failure prevents DTrace from starting:
# dtrace -P syscall -x bufsize=1g -x bufresize=manual
dtrace: description 'syscall' matched 430 probes
dtrace: could not enable tracing: Not enough space
#
The buffer resizing policy for all buffers (principal,
speculative, and aggregation) is dictated by the bufresize option.
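The two resize policies differ only in what happens after the first failed allocation. The following Python sketch models that difference. The function name and the largest_allocatable threshold are hypothetical stand-ins for whatever the kernel can actually provide at the time, so treat the sketch as an illustration of the halving rule and not as DTrace's allocation logic.

```python
# Toy model of the auto and manual buffer resize policies.
# largest_allocatable is a hypothetical stand-in for available memory.
MB = 1024**2
GB = 1024**3

def allocate(requested, largest_allocatable, bufresize="auto"):
    """Return the buffer size obtained, halving on failure under 'auto'."""
    size = requested
    while size > largest_allocatable:
        if bufresize == "manual":
            raise MemoryError("could not enable tracing: Not enough space")
        size //= 2               # auto: halve and try again
        if size == 0:
            raise MemoryError("could not enable tracing: Not enough space")
    return size


# A 4 GB request when at most 200 MB can be allocated:
# 4g -> 2g -> 1g -> 512m -> 256m -> 128m
got = allocate(4 * GB, largest_allocatable=200 * MB)
print("buffer size lowered to", got // MB, "MB")

try:
    allocate(1 * GB, largest_allocatable=200 * MB, bufresize="manual")
except MemoryError as err:
    print("dtrace:", err)
```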