1 Monitoring the System and Optimizing Performance

WARNING:

Oracle Linux 7 is now in Extended Support. See Oracle Linux Extended Support and Oracle Open Source Support Policies for more information.

Migrate applications and data to Oracle Linux 8 or Oracle Linux 9 as soon as possible.

This chapter provides information and tasks for monitoring your systems to ensure optimal performance. The various monitoring tools that are provided in Oracle Linux are also described.

About System Performance Problems

Many performance issues are the result of configuration errors, which can be avoided by using a validated configuration that has been pre-tested for the supported software, hardware, storage, drivers, and networking components. A validated configuration incorporates best practices for an Oracle Linux deployment, which includes undergoing real-world testing of the complete stack. Oracle publishes more than 100 validated configurations, which are freely available for download. Refer to the release notes for the Oracle Linux release that you are running for further recommendations on setting kernel parameters.

For example, a typical problem involves out-of-memory errors and generally poor performance when running Oracle Database. The likely cause is that the system is not configured to use the HugePages feature for the System Global Area (SGA). With HugePages, you can set the page size to between 2MB and 256MB, which reduces the total number of pages that the kernel needs to manage. The memory that is associated with HugePages cannot be swapped out, which forces the SGA to remain resident in memory. You can avoid problems such as this one by using validated configurations and following the recommended settings for kernel parameters.
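To check whether HugePages are configured at all, you can read the HugePages_Total, HugePages_Free, and Hugepagesize fields that the kernel exposes in /proc/meminfo. The following is a minimal sketch rather than an Oracle-provided tool: the function name hugepages_summary and the output format are assumptions made for illustration.

```shell
# Sketch: summarize HugePages settings from a meminfo-format file.
# hugepages_summary is a hypothetical helper name, not a system command.
hugepages_summary() {
    # Default to the live kernel data; accept another file for testing.
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            # pool_mb is the total size of the HugePages pool in megabytes.
            printf "total=%d free=%d page_kb=%d pool_mb=%d\n",
                   total, free, size_kb, total * size_kb / 1024
        }
    ' "$meminfo"
}
```

Running hugepages_summary with no argument reads /proc/meminfo. A result of total=0 suggests that no HugePages pool has been reserved on the system.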

See Oracle Linux 7: Managing Core System Configuration for more information about kernel parameters that affect system performance. Because several of the commands on your Oracle Linux system have overlapping functionality, you can also review the manual page for each command.

Working With System Performance and Monitoring Utilities

Performance issues can be caused by any system component, software or hardware, and their interaction. Many performance diagnostics utilities are available for Oracle Linux, including tools that monitor and analyze resource usage by different hardware components and tracing tools for diagnosing performance issues in multiple processes and their threads.

The following utilities enable you to collect information about system resource usage and errors, which can help you to identify performance problems that are caused by overloaded disks, network, memory, or CPUs:

dmesg

Displays the contents of the kernel ring buffer, which can contain errors about system resource usage. Provided by the util-linux package.

dstat

Displays statistics about system resource usage. Provided by the dstat package.

free

Displays the amount of free and used memory in the system. Provided by the procps package.

iostat

Reports I/O statistics. Provided by the sysstat package.

iotop

Monitors disk and swap I/O on a per-process basis. Provided by the iotop package.

ip

Reports network interface statistics and errors. Provided by the iproute package.

mpstat

Reports processor-related statistics. Provided by the sysstat package.

nfsiostat

Reports I/O statistics for NFS mounts. Provided by the nfs-utils package.

sar

Reports information about system activity. Provided by the sysstat package.

ss

Reports socket statistics. Provided by the iproute package.

top

Provides a dynamic real-time view of the tasks that are running on a system. Provided by the procps package.

uptime

Displays the system load averages for the past 1, 5, and 15 minutes. Provided by the procps package.

vmstat

Reports virtual memory statistics. Provided by the procps package.

Monitoring the Usage of System Resources

You need to collect and monitor system resources regularly so that you are provided with a continuous record of a system's performance. First, establish a baseline of acceptable measurements under typical operating conditions. You can then use that baseline as a reference point to make it easier to identify memory shortages, spikes in resource usage, and other problems when they occur. Monitoring system performance also enables you to plan for future growth and determine how configuration changes might affect future performance.

To rerun a monitoring command at a fixed interval and watch its output change in real time, use the watch command. For example, the following command runs mpstat once per second:

sudo watch -n 1 mpstat

Alternatively, many of the commands enable you to specify the sampling interval in seconds, for example:

sudo mpstat seconds

If it is installed, the sar command records statistics every 10 minutes while the system is running and retains that information for every day of the current month. The following command displays all the statistics that sar recorded for day DD of the current month:

sudo sar -A -f /var/log/sa/saDD

To run the sar command as a background process and collect data in a file that you can display later by using the -f option, use the following command:

sudo sar -o datafile seconds count >/dev/null 2>&1 &

In the previous command, count is the number of samples to record.
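The saDD file name that is passed to sar -f uses a two-digit, zero-padded day of the month. The following sketch builds that path for a given day. The helper name sa_file_for_day is hypothetical and is used here only for illustration.

```shell
# Sketch: build the path of the sar data file for a given day of the month.
# sa_file_for_day is a hypothetical helper name.
sa_file_for_day() {
    # With no argument, use today's day of the month.
    day="${1:-$(date +%d)}"
    # Drop one leading zero so that printf does not parse 08 or 09 as octal.
    day="${day#0}"
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

For example, sudo sar -A -f "$(sa_file_for_day 7)" would display the statistics that were recorded on day 07 of the current month, provided that the file still exists.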

OSWbb and OSWbb analyzer (OSWbba) are useful tools for collecting and analyzing performance statistics. For more information, see Working With OSWatcher Black Box.

Monitoring CPU Usage

The uptime, mpstat, sar, dstat, and top utilities enable you to monitor CPU usage. When your system's CPU cores are all occupied executing the code of processes, other processes must wait until a CPU core becomes free or the scheduler switches a CPU core to run their code. If too many processes are queued too often, then that can represent a bottleneck in the performance of the system.

The commands mpstat -P ALL and sar -u -P ALL display CPU usage statistics for each CPU core and averaged across all CPU cores.

The %idle value shows the percentage of time that a CPU was not running system code or process code. If the value of %idle is near 0% most of the time on all CPU cores, the system is CPU-bound for the workload that it is running. The percentage of time spent running system code (%system or %sys) should not usually exceed 30%, especially if %idle is close to 0%.
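To illustrate how an idle percentage can be derived, the following sketch computes it from two samples of the aggregate cpu line in /proc/stat. It is an approximation under stated assumptions, not the implementation that mpstat or sar uses: the function name idle_percent is hypothetical, and the total is taken as the sum of the user, nice, system, idle, iowait, irq, softirq, and steal counters.

```shell
# Sketch: percentage of idle CPU time between two samples of the
# aggregate "cpu" line from /proc/stat.
# idle_percent is a hypothetical helper name.
idle_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Fields 2 to 9: user nice system idle iowait irq softirq steal.
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            idle[NR] = $5
            sum[NR] = total
        }
        END {
            dt = sum[2] - sum[1]
            # Print nothing if no time elapsed between the samples.
            if (dt > 0) printf "%.1f\n", 100 * (idle[2] - idle[1]) / dt
        }'
}
```

One way to use it is to capture the line twice, for example with grep '^cpu ' /proc/stat before and after sleep 1, and pass both strings to idle_percent.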

The system load average represents the number of processes that are running on CPU cores, waiting to run, or waiting for disk I/O activity to complete, averaged over a period of time. On a busy system, the load average reported by uptime or sar -q should usually be no greater than two times the number of CPU cores over periods as long as 5 or 15 minutes. If the load average exceeds four times the number of CPU cores for long periods, the system is overloaded.
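The two thresholds described above can be expressed as a small check. The following is a sketch only: the function name classify_load and the labels normal, elevated, and overloaded are assumptions chosen for illustration.

```shell
# Sketch: compare a load average with the number of CPU cores.
# classify_load and its three labels are hypothetical.
classify_load() {
    # $1 = load average, $2 = number of CPU cores
    awk -v load="$1" -v cores="$2" 'BEGIN {
        load += 0; cores += 0          # force numeric comparison
        if (load > 4 * cores)      print "overloaded"
        else if (load > 2 * cores) print "elevated"
        else                       print "normal"
    }'
}
```

For example, classify_load "$(cut -d' ' -f3 /proc/loadavg)" "$(nproc)" evaluates the 15-minute load average against the number of available processing units.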

In addition to load averages (ldavg-*), the sar -q command reports the number of processes currently waiting to run (the run-queue size, runq-sz) and the total number of processes (plist-sz). The value of runq-sz also provides an indication of CPU saturation.

Determine the system's average load under normal loads where users and applications do not experience problems with system responsiveness, and then look for deviations from this benchmark over time. A dramatic rise in the load average can indicate a serious performance problem.

A combination of sustained large load average or large run queue size and low %idle can indicate that the system has insufficient CPU capacity for the workload. When CPU usage is high, use a command such as dstat or top to determine which processes are most likely to be responsible. For example, the following dstat command shows which processes are using CPUs, memory, and block I/O most intensively:

sudo dstat --top-cpu --top-mem --top-bio

The top command provides a real-time display of CPU activity. By default, top lists the most CPU-intensive processes on the system. In its upper section, top displays general information including the load averages over the past 1, 5 and 15 minutes, the number of running and sleeping processes (tasks), and total CPU and memory usage. In its lower section, top displays a list of processes, including the process ID number (PID), the process owner, CPU usage, memory usage, running time, and the command name. By default, the list is sorted by CPU usage, with the top consumer of CPU listed first. Type f to select which fields top displays, o to change the order of the fields, or O to change the sort field. For example, entering On sorts the list on the percentage memory usage field (%MEM).

Monitoring Memory Usage

The sar -r command reports memory utilization statistics, including %memused, which is the percentage of physical memory in use.

sar -B reports memory paging statistics, including pgscank/s, which is the number of memory pages scanned by the kswapd daemon per second, and pgscand/s, which is the number of memory pages scanned directly per second.

sar -W reports swapping statistics, including pswpin/s and pswpout/s, which are the numbers of pages swapped in and out per second.

If %memused is near 100% and the scan rate is continuously over 200 pages per second, the system has a memory shortage.
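The rule above can be sketched as a simple check on values that you read from sar -r and sar -B. The function name memory_shortage is hypothetical. Treating 95% as "near 100%" and using the sum of pgscank/s and pgscand/s as the scan rate are assumptions, so adjust them to suit your own baseline.

```shell
# Sketch: apply the memory shortage rule to two measured values.
# memory_shortage is a hypothetical helper; the 95% cutoff is an assumption.
memory_shortage() {
    # $1 = %memused, $2 = pages scanned per second
    awk -v used="$1" -v scan="$2" 'BEGIN {
        if (used + 0 >= 95 && scan + 0 > 200) print "shortage"
        else                                  print "ok"
    }'
}
```

A single sample is not sufficient evidence. The rule applies only if the scan rate stays above the threshold continuously.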

Once a system runs out of physical memory and starts using swap space, its performance deteriorates dramatically. If you run out of swap space, your programs or the entire operating system are likely to crash. If free or top indicates that little swap space remains available, this is also an indication that you are running low on memory.

The output from the dmesg command might include notification of any problems with physical memory that were detected at boot time.

Monitoring Block I/O Usage

The iostat command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters.

iostat -x reports extended statistics about block I/O activity at one second intervals, including %util, which is the percentage of CPU time spent handling I/O requests to a device, and avgqu-sz, which is the average queue length of I/O requests that were issued to that device. If %util approaches 100% or avgqu-sz is greater than 1, device saturation is occurring.
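As an illustration, the following sketch filters iostat -x output and prints the devices that meet either saturation condition. It locates the %util and avgqu-sz columns by their header names instead of by position. The function name flag_saturated_devices is hypothetical, and treating 90% as "approaching 100%" is an assumption.

```shell
# Sketch: list devices from "iostat -x" output that look saturated.
# flag_saturated_devices is a hypothetical helper; util_max=90 is an assumption.
flag_saturated_devices() {
    awk -v util_max=90 -v queue_max=1 '
        $1 ~ /^Device/ {
            # Find the columns of interest by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "%util")    u = i
                if ($i == "avgqu-sz") q = i
            }
            next
        }
        # Skip everything until a header was seen, and any shorter lines.
        u && q && NF >= u && NF >= q &&
        ($u + 0 >= util_max || $q + 0 > queue_max) {
            print $1, "util=" $u, "avgqu-sz=" $q
        }
    '
}
```

For example, iostat -x | flag_saturated_devices prints one line for each device that exceeds a threshold and prints nothing if no device does.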

You can also use the sar -d command to report on block I/O activity, including values for %util and avgqu-sz.

The iotop command can help you identify which processes are responsible for excessive disk I/O. iotop has a similar user interface to top. In its upper section, iotop displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk input and output usage in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name. Use the left and right arrow keys to change the sort field, and press A to toggle the I/O units between bytes per second and total number of bytes, or O to toggle between displaying all processes or only those processes that are performing I/O.

Monitoring File System Usage

The sar -v command reports the number of unused cache entries in the directory cache (dentunusd) and the numbers of in-use file handles (file-nr), inode handlers (inode-nr), and pseudo terminals (pty-nr).
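If sar is not installed, the kernel exposes file handle counters in /proc/sys/fs/file-nr as three numbers: allocated handles, allocated but unused handles, and the system-wide maximum. The following sketch derives the number of handles in use from those values. The function name file_handle_usage and the output format are assumptions made for illustration.

```shell
# Sketch: report file handle usage from a file in file-nr format.
# file_handle_usage is a hypothetical helper name.
file_handle_usage() {
    # $1 = optional input file (defaults to the live kernel counters)
    awk '{
        in_use = $1 - $2     # allocated handles minus unused handles
        printf "in_use=%d max=%s pct=%.1f\n", in_use, $3, 100 * in_use / $3
    }' "${1:-/proc/sys/fs/file-nr}"
}
```

Running file_handle_usage with no argument reads the current values, which you can compare against your baseline.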

nfsiostat reports I/O statistics for each NFS file system that is mounted. If this command is not available, install the nfs-utils package.

Monitoring Network Usage

The ip -s link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX) and received (RX). The dropped and overrun fields provide an indicator of network interface saturation.
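For scripted checks, the kernel also publishes per-interface counters as files under /sys/class/net/interface/statistics. The following sketch prints a few of them. The function name iface_drops is hypothetical, and the mapping of the overrun column in ip -s link output to the rx_over_errors counter is an assumption that can vary between iproute versions.

```shell
# Sketch: print drop and overrun counters for one network interface.
# iface_drops is a hypothetical helper name.
iface_drops() {
    # $1 = interface name, $2 = optional sysfs root (useful for testing)
    stats="${2:-/sys/class/net}/$1/statistics"
    for counter in rx_dropped tx_dropped rx_over_errors; do
        printf '%s=%s\n' "$counter" "$(cat "$stats/$counter")"
    done
}
```

These counters are cumulative since the interface was created, so compare two readings taken some time apart rather than relying on a single value.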

The ss -s command displays summary statistics for each protocol.

Using the Graphical System Monitor

The GNOME desktop environment includes a graphical system monitor that enables you to display information about the system configuration, running processes, resource usage, and file systems.

To display the System Monitor, use the following command:

sudo gnome-system-monitor

The Resources tab displays:

  • CPU usage history in graphical form and the current CPU usage as a percentage.

  • Memory and swap usage history in graphical form and the current memory and swap usage.

  • Network usage history in graphical form, the current network usage for reception and transmission, and the total amount of data received and transmitted.

To display the System Monitor Manual, press F1 or select Contents from the Help menu.