Notable Changes
The following sections describe the major new features of Unbreakable Enterprise Kernel Release 4 (UEK R4) relative to UEK R3.
Containers
The following notable features of containers are implemented in UEK R4:
-
Local device cgroup changes are now propagated down the device cgroup hierarchy.
For more information, see git commit bd2953ebbb533aeda9b86c82a53d5197a9a38f1b.
-
The __DEVEL__sane_behavior option has been introduced for mounting cgroup controllers.
For more information, see git commit 873fe09ea5df6ccf6bb34811d8c9992aacb67598.
-
memory.numa_stat now includes hierarchical statistics for child memory cgroups (memcgs) in addition to the parent memcg.
For more information, see git commit 071aee138410210e3764f3ae8d37ef46dc6d3b42.
-
An optional unified control group hierarchy has been introduced.
For more information, see https://lwn.net/Articles/601840/.
-
Hierarchy restrictions for swappiness and oom_control have been removed from memcgs.
For more information, see git commit 3dae7fec5e884a4e72e5416db0894de66f586201.
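The hierarchical counters in memory.numa_stat can also be read from a program. The following C sketch looks up one node's page count in a single line of that file. It assumes the cgroup v1 line format "<counter>=<pages> N0=<pages> N1=<pages> ...", in which the hierarchical statistics appear as additional lines such as hierarchical_total; the function name numa_stat_node_pages is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the page count that one line of memory.numa_stat reports for
 * NUMA node `node`, or -1 if the line is not for `counter` or the node
 * is not listed. Assumed line format (cgroup v1):
 *   "<counter>=<pages> N0=<pages> N1=<pages> ..."
 * for example "hierarchical_total=4096 N0=3072 N1=1024".
 */
long long numa_stat_node_pages(const char *line, const char *counter, int node)
{
    size_t n = strlen(counter);

    /* The line must begin with exactly "<counter>=". */
    if (strncmp(line, counter, n) != 0 || line[n] != '=')
        return -1;

    /* Build the per-node key, e.g. "N1=". The '=' keeps "N1=" from matching "N10=". */
    char key[32];
    snprintf(key, sizeof key, "N%d=", node);
    size_t klen = strlen(key);

    /* Scan the space-separated per-node fields that follow the total. */
    for (const char *p = strchr(line, ' '); p != NULL; p = strchr(p, ' ')) {
        p++;                                   /* step past the space */
        if (strncmp(p, key, klen) == 0)
            return strtoll(p + klen, NULL, 10);
    }
    return -1;
}
```

A caller would typically read the file line by line with fgets() and pass each line to this function; strtoll() stops at the trailing newline, so it does not need to be stripped.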
Core Kernel Functionality
The following notable core kernel features are implemented in UEK R4:
-
The performance of SPECjbb is improved for a system with more than 10 CPUs by removing contention for the global epmutex lock, which is used in EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. For example, in a typical 16-socket run, performance increases from 35k jOPS to 125k jOPS. Benchmarks also exhibit good scaling from 10 sockets to over 40 sockets.
-
The sysctl_numa_balancing_settle_count parameter used by the NUMA scheduler has been removed.
-
The following tracepoints are now provided to monitor NUMA scheduler activity:
-
trace_sched_move_numa
-
Triggered when a task is moved to a node.
-
trace_sched_stick_numa
-
Triggered when a NUMA migration fails.
-
trace_sched_swap_numa
-
Triggered when a task is swapped for another task.
-
-
The new SCHED_STACK_END_CHECK kernel debugging option can be used to check for a stack overrun on calls to schedule(). If the stack end location is overwritten, the system panics because the content of the corrupted region cannot be trusted.
-
Sysbench performance has been improved by preventing spurious active NUMA migration.
-
CPU clock frequency scaling for performance management. The governor settings that can be written to /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor are:
-
ondemand
-
Sets the CPU clock frequency between the minimum and maximum possible frequencies according to current demand. The following sysfs parameters are adjustable:
-
ignore_nice_load
-
Whether processes with a nice value count (0) or do not count (1) toward CPU usage. The default value is 0.
-
powersave_bias
-
Amount by which to reduce the target CPU frequency, in units of 0.1% (a value from 0 to 1000). A value of 0 disables this feature.
-
sampling_down_factor
-
A multiplier that the kernel applies to sampling_rate when the CPU is running at its maximum clock frequency. The default value is 1.
-
sampling_rate_min
-
Minimum sampling rate.
-
sampling_rate
-
Interval in microseconds between assessments of whether the kernel needs to change the clock frequency.
-
up_threshold
-
Threshold of average CPU usage as a percentage for the kernel to increase the clock frequency.
ondemand is the default governor setting if tuned is not configured. This setting is equivalent to powersave for CPUs with more recent microarchitectures (for example, Haswell, Broadwell, and later) with which the pstate power scaling driver can interact. For CPUs with older microarchitectures (for example, Ivy Bridge, Sandy Bridge, and earlier), ondemand is equivalent to performance, because the cores must be kept in a higher power state to minimize CPU latency.
-
-
performance
-
Sets the CPU clock frequency to the maximum possible frequency.
Note:
performance is the default governor setting for the tuned throughput-performance profile. The performance governor is appropriate for some real-time applications, but it might not be appropriate for all workloads. Running a CPU at maximum frequency can prevent turbo mode from being enabled because doing so would exceed the thermal envelope.
-
powersave
-
Sets the CPU clock frequency to the minimum possible frequency.
-
userspace
-
Permits a user-space program running with an effective user ID of root to control the CPU clock frequency by writing to the scaling_setspeed file in the CPU's cpufreq directory under sysfs.
Oracle recommends that you use tuned-adm to select a tuned performance profile for your system that is based on its hardware and software configuration, for example:
-
If your system has Xeon processors or multiple disks, choose a profile such as latency-performance for a cloud server, throughput-performance for a database server, or virtual-host for a virtual host server.
Note:
These profiles set the CPU governor setting to performance, which might not be appropriate for all workloads.
-
For a virtual machine guest, choose the virtual-guest profile.
-
For a laptop, choose a suitable laptop profile such as laptop-ac-powersave or laptop-battery-powersave.
-
For a desktop machine, choose either the desktop or balanced profile.
You can use the tuned-adm list command to display the available profiles.
If tuned is not configured, the default CPU governor setting is ondemand, which can cause some bursty, CPU-intensive workloads to run more slowly because of demand hysteresis.
If necessary, you can create your own performance profiles based on the profiles that are provided in the /etc/tune-profiles directory hierarchy.
When comparing system performance under different profiles, use benchmarks that simulate your server's typical workload.
For more information, see the tuned(8) and tuned-adm(1) manual pages, which are available in the tuned package.
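To check which governor is active from a program, the scaling_governor attribute can be read like any other one-line sysfs file. The following C sketch assumes the per-CPU path shown above; the helper names governor_path and read_sysfs_line are hypothetical. On systems without a cpufreq driver the file does not exist, so the read returns -1.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of the scaling_governor file for one CPU.
 * Returns 0 on success, -1 if the buffer is too small. */
int governor_path(int cpu, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a one-line sysfs attribute into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, calling governor_path(0, ...) and then read_sysfs_line() on the result yields a string such as ondemand or performance for CPU 0. Writing a governor name to the same file as root selects that governor, which is the same setting that tuned profiles configure.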
Cryptography
The following notable cryptographic features are implemented in UEK R4:
-
Accelerated CRC T10 DIF computation with the PCLMULQDQ instruction.
-
LZ4 Cryptographic API.
-
Support for SHA-224 in sha256_ssse3 and SHA-384 in sha512_ssse3.
-
Support for the AMD cryptographic coprocessor, which can be used to accelerate or offload AES, SHA, and other encryption operations.
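The T10 DIF guard tag is a 16-bit CRC with the generator polynomial 0x8BB7. The bit-at-a-time C function below is a slow reference sketch of that CRC, assuming the common CRC-16/T10-DIF parameter set (initial value 0, no bit reflection, no final XOR); it is not the kernel's implementation. The PCLMULQDQ-accelerated code is expected to produce the same values while processing many bytes per step with carry-less multiplication. The name crc_t10dif_ref is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reference CRC-T10DIF: 16-bit CRC, polynomial 0x8BB7, processed MSB
 * first, with no reflection and no final XOR. Pass 0 as `crc` for a new
 * computation, or a previous result to continue over more data.
 */
uint16_t crc_t10dif_ref(uint16_t crc, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);       /* bring the next byte into the top bits */
        for (int bit = 0; bit < 8; bit++) {
            /* Shift left; if a 1 falls off the top, XOR in the polynomial. */
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x8BB7);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

Because there is no final XOR, the CRC of a buffer can be computed in pieces by passing the previous result as the crc argument.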
File Systems
The following sections detail the most notable features that have been implemented for file systems in UEK R4:
btrfs
-
The skinny-metadata feature is not enabled by default as it is incompatible with UEK R3. (Bug ID 22123918)
-
The btrfs filesystem balance command does not warn that the RAID level can be changed under certain circumstances, and does not provide the choice of cancelling the operation. (Bug ID 16472824)
-
Commands such as du can show inconsistent results for file sizes in a btrfs file system when the number of bytes that are under delayed allocation is changing. (Bug ID 13096268)
-
The copy-on-write nature of btrfs means that every operation on the file system initially requires disk space. It is possible that you cannot execute any operation on a disk that has no space left; even removing a file might not be possible. The workaround is to run sync before retrying the operation. If this does not help, remount the file system with the -o nodatacow option and delete some files to free up space. See https://btrfs.wiki.kernel.org/index.php/ENOSPC.
-
If you run the btrfs quota enable command on a non-empty file system, any existing files do not count toward space usage. Removing these files can cause usage reports to display negative numbers and the file system to be inaccessible. The workaround is to enable quotas immediately after creating the file system. If you have already written data to the file system, it is too late to enable quotas. (Bug ID 16569350)
-
The btrfs quota rescan command is not currently implemented. The command does not perform a rescan and returns without displaying any message. (Bug ID 16569350)
-
When you overwrite data in a file, starting somewhere in the middle of the file, the overwritten space is counted twice in the space usage numbers that btrfs qgroup show displays. (Bug ID 16609467)
-
If you run btrfsck --init-csum-tree on a file system and then run a simple btrfsck on the same file system, the command displays a Backref mismatch error that was not previously present. (Bug ID 16972799)
-
Btrfs tracks the devices on which you create btrfs file systems. If you subsequently reuse these devices in a file system other than btrfs, you might see error messages such as the following when performing a device scan or creating a RAID-1 file system, for example:
ERROR: device scan failed '/dev/cciss/c0d0p1' - Invalid argument
You can safely ignore these errors. (Bug ID 17087097)
-
If you use the -s option to specify a sector size to mkfs.btrfs that is different from the page size, the created file system cannot be mounted. By default, the sector size is set to be the same as the page size. (Bug ID 17087232)
-
The btrfs-progs and btrfs-progs-devel packages for use with UEK R4 are made available in the ol6_x86_64_UEKR4 and ol7_x86_64_UEKR4 ULN channels and the ol6_UEKR4 and ol7_UEKR4 Oracle Linux yum server repositories. In UEK R3, these packages were made available in the ol6_x86_64_latest and ol7_x86_64_latest ULN channels and the ol6_latest and ol7_latest Oracle Linux yum server repositories.
efivarfs
The Unified Extensible Firmware Interface (UEFI) variable file system (efivarfs) is enabled on systems that support UEFI. For Oracle Linux 7, systemd automatically mounts efivarfs. For Oracle Linux 6, efivarfs is not mounted by default. If required, you can mount efivarfs, for example:
# mount -t efivarfs efivarfs /sys/firmware/efi/efivars
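Each file under /sys/firmware/efi/efivars represents one UEFI variable and, as described in the kernel's efivarfs documentation, begins with a 4-byte attribute word followed by the variable data. The following C sketch splits a buffer read from such a file into those two parts; the function efivar_split is hypothetical, and the attribute constants are the standard UEFI values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard UEFI variable attribute bits. */
#define EFI_VARIABLE_NON_VOLATILE       0x00000001u
#define EFI_VARIABLE_BOOTSERVICE_ACCESS 0x00000002u
#define EFI_VARIABLE_RUNTIME_ACCESS     0x00000004u

/*
 * Split the contents of an efivarfs file into its attribute word and the
 * variable data that follows it. The attributes are copied in host byte
 * order (little-endian on x86-64). Returns 0 on success, or -1 if the
 * buffer is too short to contain the 4-byte header.
 */
int efivar_split(const unsigned char *buf, size_t len,
                 uint32_t *attrs, const unsigned char **data, size_t *data_len)
{
    if (len < sizeof(uint32_t))
        return -1;
    memcpy(attrs, buf, sizeof(uint32_t));
    *data = buf + sizeof(uint32_t);
    *data_len = len - sizeof(uint32_t);
    return 0;
}
```

Files are named VariableName-VendorGUID, so the same variable name can appear once for each vendor GUID.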
ext4
The following ext4 features have been implemented:
-
Metadata checksumming can be enabled by specifying the metadata_csum option when making a file system.
-
64-bit file system support, which allows you to format a file system that is larger than 16 TB, can be enabled by specifying the 64bit option when making a file system.
-
Improved synchronization speed for database workloads.
-
Improved write-back performance if delayed allocation is disabled by using the nodelalloc mount option or if ext2 or ext3 compatibility mode is used.
-
Improved extent-tree memory caching.
-
Improved stability of hole punching using fallocate().
-
Improved data and hole seeking using lseek().
The following features are considered experimental and are not supported:
-
Big allocation (bigalloc), which does not currently work with fallocate().
-
Inline data, which stores the data for small files in the available space between on-disk inode data structures.
-
File-system image creation from a directory using mke2fs.
-
Specifying an external journal by using the pathname mount option.
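The hole punching and hole seeking improvements listed for ext4 apply to the standard fallocate() and lseek() interfaces. The following C sketch punches a hole in an open file without changing its size and locates holes and data with SEEK_HOLE and SEEK_DATA. The helper names are hypothetical, and file systems that do not track holes may report the end of the file as the first hole.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Deallocate [off, off + len) without changing the file size; the range
 * reads back as zeros afterwards. Returns 0, or -1 with errno set
 * (EOPNOTSUPP if the file system does not support hole punching).
 */
int punch_hole(int fd, off_t off, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len);
}

/*
 * Offset of the first hole at or after `from` (the end of file counts as
 * a hole), or -1 with errno set (ENXIO if `from` is at or past EOF).
 * Like any lseek() call, this also moves the file offset.
 */
off_t next_hole(int fd, off_t from)
{
    return lseek(fd, from, SEEK_HOLE);
}

/* Offset of the next region containing data at or after `from`. */
off_t next_data(int fd, off_t from)
{
    return lseek(fd, from, SEEK_DATA);
}

/* Current file size, or -1 on error. */
off_t file_size(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : -1;
}
```

Because FALLOC_FL_KEEP_SIZE leaves the size unchanged, a copy tool can call next_data() and next_hole() alternately to copy only the allocated regions of a sparse file.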
FUSE
The following FUSE features have been implemented:
-
Asynchronous I/O support.
-
Optimized short direct reads.
-
A writepages callback improves the writeout of memory-mapped (mmap) pages.
Cached writeback is not currently supported by the user-space applications that are provided with Oracle Linux 6 and Oracle Linux 7.
NFS
The following NFS features have been implemented:
-
Client support for NFSv4.2.
For more information, see http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-20.
-
SELinux Labeled NFS allows many different labels to be used on an NFS share, which is useful for securing virtualization image files and home directories.
Overlayfs
The overlayfs file system is an implementation of a union file system that makes several file systems appear as a single file system when mounted. An overlayfs file system consists of a lower file system and an upper file system that share a single file system namespace. After a file is opened in an overlayfs file system, all operations go directly to the underlying lower or upper file systems, which simplifies the implementation and allows native performance compared to other union file system implementations. A typical use case is to use a read-only OS image as the lower file system and a writable RAM-backed file system as the upper file system. Modified data is written to the upper file system only and not to the OS image.
Both the upper and lower file systems can be directory trees within the same file system, and neither needs to be the root of a file system. The lower file system can be any supported file system, including an overlayfs file system, and does not need to be writable. If the upper file system is writable, as is usually the case, it must support the creation of trusted.* extended attributes and it must provide a valid d_type file type in the dirent structure returned by readdir(). For example, an NFS file system cannot be used for the upper file system.
The overlayfs file system is not available with UEK R3.
For more information, see https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt.
XFS
The following XFS features have been implemented:
-
The new directory entry file type improves the performance of directory recursion by not having to access the inode data from disk.
-
Namespace support.
-
Defragmentation support for the new CRC file system format.
-
The XFS v5 disk format provides metadata CRC, object back references, better crash recovery, and improved xfs_repair performance. The metadata CRC feature is experimental and not currently supported.
Memory Management
The following notable memory management features are implemented in UEK R4:
-
The MAP_HUGETLB flag has been implemented in mmap to support huge-page memory mapping with hugetlbfs.
-
Problems with kswapd and page reclaim behavior during large copy operations or when memory is low have been addressed.
-
Improved page table access scalability in threaded huge-page workloads by reducing lock contention in the page table.
For more information, see https://lwn.net/Articles/568076/.
-
Improved page-fault scalability in hugetlb by handling concurrent page faults. Previously, the kernel could only handle a single hugetlb page fault at a time. Typically, the startup time for a 10-gigabyte Oracle database, which generates approximately 5000 page table faults, decreases to 25.7 seconds from 37.5 seconds. Larger workloads should experience even greater improvements in startup times.
-
Support for gigantic page allocation in hugetlb at runtime in addition to the existing boot-time allocation.
-
The unqueued slab allocator (SLUB) is now the default memory allocator for kernel objects. SLUB eliminates the fragmentation that is caused by memory allocation and deallocation by reusing memory that was previously allocated to a data object of the same type.
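The following C sketch requests an anonymous huge-page mapping with MAP_HUGETLB and falls back to normal pages when none are available. It assumes the default 2 MiB huge page size of x86-64 and that huge pages have been reserved in advance (for example, through the vm.nr_hugepages sysctl); without a reservation, the MAP_HUGETLB attempt fails with ENOMEM. The function name map_maybe_huge is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGETLB
#define MAP_HUGETLB 0x40000   /* value from <linux/mman.h>; assumed for old headers */
#endif

/*
 * Map `len` bytes of anonymous memory, preferring huge pages from the
 * hugetlbfs pool and falling back to normal pages. `len` should be a
 * multiple of the huge page size. *huge is set to 1 if huge pages were
 * used and 0 otherwise. Returns NULL if both attempts fail.
 */
void *map_maybe_huge(size_t len, int *huge)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) {
        *huge = 1;
        return p;
    }

    /* No huge pages available (typically ENOMEM): use normal pages. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *huge = 0;
    return p;
}
```

Release the mapping with munmap() using the same length that was passed to map_maybe_huge().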
Networking
The following notable networking features are implemented in UEK R4:
-
The following VXLAN features have been implemented:
-
Layer 2 redirection with layer 3 switching.
-
Setting destination to a unicast address.
-
UDP tunnel segmentation.
-
IPv6 support.
-
Transmit-side VLAN offload for VXLAN devices.
-
Link configuration for transmitting UDPv4 checksums, and transmitting and receiving UDPv6 checksums.
-
Switching the network namespace when a packet is encapsulated or decapsulated.
-
-
Per-socket network polling is supported with the bnx2x, ixgbe, and mlx4 network card drivers, and reduces the latency inherent in the NAPI periodic polling method.
For more information, see https://lwn.net/Articles/551284/ and 2012-lpc-Low-Latency-Sockets-slides-brandeburg.pdf.
-
The new PIE (Proportional Integral controller Enhanced) network packet scheduler controls the average queueing latency to overcome buffer bloat, ensure low latency and achieve high link utilization under various congestion scenarios with very small overhead.
For more information, see https://tools.ietf.org/html/draft-pan-tsvwg-pie-00.
-
Support for configuring the SR-IOV virtual function (VF) minimum and maximum transmission rates by using the ip command.
For more information, see git commit ed616689a3d95eb6c9bdbb1ef74b0f50cbdf276a.
-
Support for SR-IOV VF link state control by using the ip command. Previously, VF links were always up, regardless of the physical link status, which allowed VMs on the same virtual Ethernet bridge to communicate even if the physical function (PF) link state was down. However, if the VFs were bonded in active/standby mode, this configuration prevented failover when the physical link used by a VF went down. You can now use the ip link set command to configure the behavior of a VF link:
# ip link set device vf number state { auto | enable | disable }
The possible settings are:
- auto
-
The VF link state is determined by the PF link state. This setting is suitable for VFs that are bonded in active/standby mode.
- disable
-
The VF link state is permanently down.
- enable
-
The VF link state is permanently up. This is the default setting.
-
The following Open vSwitch (OvS) features have been implemented:
-
Generic routing encapsulation (GRE) tunnels.
-
User-space tunneling interface.
-
Stream Control Transmission Protocol (SCTP) support.
-
VXLAN tunneling support.
-
Wild-carded flow implementation.
-
TCP bitwise flag matching.
For more information, see git commit 5eb26b156e29eadcc21f73fb5d14497f0db24b86.
-
Allow user space to announce ability to accept unaligned Netlink messages.
-
Enable memory-mapped Netlink I/O.
-
Enable tunnel generic segmentation offloading (GSO) for Open vSwitch bridge devices so that Open vSwitch can take advantage of hardware offloading to the underlying devices.
-
Add recirc and hash actions to support distributing packets between the ports of bond devices.
-
Add support for generic network virtualization encapsulation (Geneve) tunneling.
-
-
The nftables framework provides packet filtering and packet classification features as a replacement for the arptables, ebtables, iptables, and ip6tables frameworks. For more information, see https://lwn.net/Articles/564095/.
The following nftables features have been implemented:
-
Replacement of iptables, while providing backward compatibility.
-
IPv4 and IPv6 masquerading.
-
Pre-routing and post-routing filtering.
-
Extended NFT_MSG_DELTABLE call to support flushing the rule set.
-
Add filter support for skipping accounting objects.
-
Add support for exporting the rule-set generation ID.
-
Add CPU attribute support for matching packets against CPU number.
-
Add support for matching packet types for the inet, ip, and ipv6 table families based on link-layer information. For loopback traffic, the packet type is deduced from the network layer header.
-
Add support for matching the device group of a packet's incoming or outgoing interface.
-
-
TCP Fast Open optimization is enabled by default in UEK R4 for applications that take advantage of this feature.
-
Generic network virtualization encapsulation (Geneve) provides a tunneling framework for establishing layer 2 networks over layer 3 networks.
For more information, see http://tools.ietf.org/html/draft-gross-geneve-01 and http://blogs.vmware.com/cto/geneve-vxlan-network-virtualization-encapsulations/.
-
Transmission queue batching defers flushing transmission socket buffers to the network driver to reduce the overall cost of processing the transmission queue, which can result in a higher effective packet transmission rate. The i40e, igb, ixgbe, mlx4, and virtio_net drivers support this feature.
For more information, see https://lwn.net/Articles/615238/ and http://netoptimizer.blogspot.com/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html.
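As an illustration of how a server application opts in to TCP Fast Open, the following C sketch sets the TCP_FASTOPEN socket option on a listening socket; the option value is the length of the queue of pending Fast Open requests. Server-side Fast Open also depends on the net.ipv4.tcp_fastopen sysctl (1 enables the client side, 2 the server side, 3 both), so the option being accepted does not by itself guarantee that Fast Open is used. The function name tfo_listen is hypothetical.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef TCP_FASTOPEN
#define TCP_FASTOPEN 23       /* value from <linux/tcp.h> */
#endif

/*
 * Create a TCP listener on 127.0.0.1:port (0 picks a free port) and
 * request TCP Fast Open with a pending queue of `qlen` connections.
 * *tfo_ok is set to 1 if the TCP_FASTOPEN option was accepted, else 0.
 * Returns the listening socket, or -1 on error.
 */
int tfo_listen(unsigned short port, int qlen, int *tfo_ok)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }

    /* Enable Fast Open on the socket before it starts listening. */
    *tfo_ok = setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof qlen) == 0;

    if (listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On the client side, Fast Open is requested by calling sendto() with the MSG_FASTOPEN flag on an unconnected socket instead of calling connect() followed by send().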
NUMA
Many modern multiprocessors have non-uniform memory access (NUMA) memory designs, where the performance of a process can depend on whether the memory range being accessed is attached to the local CPU or to another CPU. Because performance differs depending on memory locality, the operating system should ideally schedule a process to run on the CPU whose memory controller is connected to the memory to be accessed.
The following notable NUMA features are implemented in UEK R4:
-
Support NUMA affinity for unbound workqueues.
-
A new NUMA subsystem provides improved performance for NUMA systems. New NUMA policies attempt to place a process near its memory, can share pages between processes and handle transparent huge pages. (3.8, 3.13)
The following sysctl parameters allow you to enable, disable and tune NUMA scheduling:
-
numa_balancing_scan_delay_ms
-
Scan delay in milliseconds used for starting a task when it initially forks.
-
numa_balancing_scan_period_max_ms
-
Maximum delay in milliseconds between scanning for tasks.
-
numa_balancing_scan_period_min_ms
-
Minimum delay in milliseconds between scanning for tasks.
-
numa_balancing_scan_period_reset
-
Resets the scan delay period.
-
numa_balancing_scan_size_mb
-
Amount of memory, in megabytes, that is scanned in each scan.
For more information, see https://lwn.net/Articles/568870/.
-
-
Add the numa_balancing sysctl parameter to enable or disable automatic NUMA memory balancing.
-
Improved algorithm for NUMA migrations that maximizes the performance of workloads that do not fit on one NUMA node.
-
Memory zones are allocated by the page allocator in node order on 64-bit NUMA systems by default.
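The following C sketch checks whether automatic NUMA balancing is enabled and reports the NUMA node of the CPU on which the calling thread is currently running. It assumes that the numa_balancing sysctl and the tuning parameters listed above appear under /proc/sys/kernel, and it calls the getcpu system call through syscall(); the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 1 if automatic NUMA balancing is enabled, 0 if it is disabled,
 * or -1 if the sysctl is missing (for example, on a kernel built without
 * NUMA balancing support) or unreadable.
 */
int numa_balancing_enabled(void)
{
    FILE *f = fopen("/proc/sys/kernel/numa_balancing", "r");
    if (f == NULL)
        return -1;
    int v = -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v < 0 ? -1 : (v != 0);
}

/*
 * NUMA node of the CPU the calling thread is running on right now, as
 * reported by getcpu, or -1 on error. The thread can be migrated at any
 * time, so treat the result as a snapshot.
 */
int current_numa_node(void)
{
    unsigned cpu, node;
    if (syscall(SYS_getcpu, &cpu, &node, NULL) != 0)
        return -1;
    return (int)node;
}
```

Writing 0 or 1 to /proc/sys/kernel/numa_balancing as root disables or enables automatic balancing, which the first function then reflects.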
Real Time
-
Dynamic ticks and full CPU time accounting infrastructure.
-
Timerless multitasking support allows the system to run processes without needing to fire up the timer interrupt that is traditionally used to implement multitasking. (3.10, 3.12)
For more information, see https://lwn.net/Articles/549580/ and https://lwn.net/Articles/558284/.
-
Deadline scheduling provides deadline, period, and runtime parameters for scheduling processes in the SCHED_DEADLINE scheduling class. These processes are guaranteed to receive runtime microseconds of execution time every period microseconds, and these runtime microseconds are available within deadline microseconds from the beginning of the period. The task scheduler runs the process with the earliest absolute deadline (earliest deadline first).
For more information, see git commit 712e5e34aef449ab680b35c0d9016f59b0a4494c and https://lwn.net/Articles/575497/.
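The following C sketch places a thread in the SCHED_DEADLINE class with the sched_setattr system call. It assumes that the C library provides no wrapper, so the call is made through syscall() with a locally defined copy of the kernel's sched_attr layout (renamed dl_sched_attr here to avoid clashing with newer headers). The runtime, deadline, and period fields of this structure are expressed in nanoseconds. The function names are hypothetical, the validity check only covers the runtime <= deadline <= period ordering (the kernel performs its own checks and admission control), and the call normally requires root privileges.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Layout of the kernel's struct sched_attr (first version, 48 bytes). */
struct dl_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;    /* nanoseconds */
    uint64_t sched_deadline;   /* nanoseconds, relative to the period start */
    uint64_t sched_period;     /* nanoseconds */
};

/* Ordering constraint only; this sketch requires an explicit period. */
int dl_params_valid(uint64_t runtime, uint64_t deadline, uint64_t period)
{
    return runtime > 0 && runtime <= deadline && deadline <= period;
}

/* Move thread `pid` (0 = calling thread) into SCHED_DEADLINE.
 * Returns 0, or -1 with errno set (for example, EPERM without privileges). */
int set_deadline(pid_t pid, uint64_t runtime_ns, uint64_t deadline_ns,
                 uint64_t period_ns)
{
    struct dl_sched_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.sched_policy = SCHED_DEADLINE;
    attr.sched_runtime = runtime_ns;
    attr.sched_deadline = deadline_ns;
    attr.sched_period = period_ns;
    return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

For example, set_deadline(0, 10000000, 30000000, 100000000) requests 10 ms of CPU time within the first 30 ms of every 100 ms period for the calling thread.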
Security
The following notable security features are implemented in UEK R4:
-
The physical and virtual address at which the kernel image is decompressed is randomized to deter exploit attempts that rely on knowing the location of the kernel internals.
-
The Kexec feature, which allows faster rebooting or automatically booting a new kernel after a crash, now incorporates support for allowing only signed Kexec kernels for use with UEFI secure booting.
-
The kexec_load_disabled sysctl parameter can be used to disable Kexec, which helps protect a system against privilege escalation.
-
An exe field has been added to the auditing log to record the pathname of executables that produce core dumps.
-
An audit_backlog_wait_time configuration option has been added to the auditing subsystem so that if auditd cannot keep up or is blocked, callers are not blocked.
-
If the value of the audit_backlog_limit parameter is set to zero, the length of the backlog queue is limited only by the amount of system memory.
-
By default, errors on AUDIT_NEVER rules are now logged.
-
The auditing subsystem now logs task information when the state of a feature is changed.
-
A netlink multicast socket has been added to allow read-only user-space clients such as systemd to read audit log messages.
-
Secure generation of random numbers with the getrandom system call. Linux systems traditionally obtained their random numbers from /dev/[u]random. This interface is vulnerable to file descriptor exhaustion attacks, where the attacker consumes all available file descriptors, and is also inconvenient for use in containers. The getrandom system call, which is analogous to the getentropy call in OpenBSD, overcomes these problems.
-
SELinux now reports permissive mode in avc: denied messages.
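The following C sketch fills a buffer by using the getrandom system call. It assumes a C library that declares getrandom() in sys/random.h (glibc 2.25 and later do); with older libraries the same call can be made through syscall(SYS_getrandom, ...). Because a large request can return fewer bytes than requested and a blocked call can be interrupted by a signal, the loop retries until the buffer is full. The function name fill_random is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>
#include <sys/types.h>

/*
 * Fill buf with len random bytes. With flags = 0, getrandom() uses the
 * same source as /dev/urandom but blocks until the kernel's entropy pool
 * has been initialized. Returns 0 on success, -1 with errno set on error.
 */
int fill_random(void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = getrandom(p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* partial read: advance and continue */
        len -= (size_t)n;
    }
    return 0;
}
```

No file descriptor is involved, so the call keeps working when descriptors are exhausted or /dev is not populated inside a container.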
Storage
The following notable storage features are implemented in UEK R4:
-
The device mapper dm-cache target allows you to use a fast device such as an SSD as a cache for a slower device such as a rotating disk. You can use various policy plugins to change the selection algorithms for performing actions such as promoting, demoting, and cleaning blocks. dm-cache supports both writeback and write-through modes. This feature is still flagged as experimental and might not be suitable for production systems.
Updates to dm-cache added support for a passthrough mode, for use when the cache contents might not be consistent with the underlying device, as well as cache block invalidation and cache shrinking.
For more information, see https://www.kernel.org/doc/Documentation/device-mapper/cache.txt.
-
Bcache is a block layer cache that allows you to use SSDs to cache slower block devices. Bcache can perform both writeback and write-through caching, has no file-system dependencies, is simple to use, and works well on any setup without requiring any configuration.
For more information, see https://www.kernel.org/doc/Documentation/bcache.txt, https://bcache.evilpiepirate.org/, and https://lwn.net/Articles/497024/.
-
The new, scalable multiqueue block layer subsystem (blk-mq) for supporting high-performance SSD storage implements per-CPU submission queues for receiving I/O requests, which are directed to hardware submission queues. The separate per-CPU and hardware submission queues balance the I/O workload across multiple CPU cores and reduce latency. The design supports the interface and features of the traditional block layer, but it is also capable of supporting many millions of I/O operations per second by taking advantage of the capabilities of NVM Express or high-end PCIe devices and multicore CPUs.
For more information, see https://lwn.net/Articles/552904/.
-
The device mapper dm-era target behaves similarly to the linear target with the addition of tracking any blocks that were written within an era, which is a time period that you can define. Typical use cases are tracking the changed blocks in backup software and restoring cache coherency after rolling back a snapshot by partially invalidating the cache contents.
For more information, see https://www.kernel.org/doc/Documentation/device-mapper/era.txt.
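To make the era concept concrete, the following C sketch is a toy in-memory model in which each block remembers the era in which it was last written. It is not dm-era's metadata format or its control interface, and the structure and function names are hypothetical. It shows how backup software could ask which blocks changed after the era in which its last backup was taken.

```c
#include <stddef.h>
#include <stdint.h>

#define ERA_BLOCKS 8                      /* tiny device for illustration */

struct era_tracker {
    uint32_t current_era;
    uint32_t last_written[ERA_BLOCKS];    /* 0 means never written */
};

void era_init(struct era_tracker *t)
{
    t->current_era = 1;
    for (size_t i = 0; i < ERA_BLOCKS; i++)
        t->last_written[i] = 0;
}

/* Record a write to `block` in the current era (out-of-range blocks are ignored). */
void era_write(struct era_tracker *t, size_t block)
{
    if (block < ERA_BLOCKS)
        t->last_written[block] = t->current_era;
}

/* Close the current era and start a new one; returns the era just closed. */
uint32_t era_checkpoint(struct era_tracker *t)
{
    return t->current_era++;
}

/* Number of blocks written after era `since` ended. */
size_t era_changed_since(const struct era_tracker *t, uint32_t since)
{
    size_t n = 0;
    for (size_t i = 0; i < ERA_BLOCKS; i++)
        if (t->last_written[i] > since)
            n++;
    return n;
}
```

In this model, era_checkpoint() corresponds to defining a new time period, and era_changed_since() answers the incremental-backup question of which blocks were written afterward.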
OFED Support
The OpenFabrics Enterprise Distribution (OFED) 2.0 stack is integrated with UEK R4 and supports all Oracle-branded InfiniBand (IB) hardware on systems with an x86-64 architecture. This includes:
-
Sun InfiniBand Dual Port 4x QDR Host Channel Adapters M2
-
Oracle Dual Port QDR InfiniBand Adapter M3
-
Oracle Dual Port QDR InfiniBand Adapter M4
-
Oracle Dual Port EDR InfiniBand Adapter
OFED 2.0 supports the following protocols with UEK R4:
-
iSCSI Extensions for remote direct memory access (iSER) provide access to iSCSI storage devices
-
Reliable Datagram Sockets (RDS) is a high-performance, low-latency, reliable connectionless protocol for datagram delivery
-
Sockets Direct Protocol (SDP) supports stream sockets for RDMA network fabrics
-
Ethernet over InfiniBand (EoIB)
-
Internet Protocol over InfiniBand (IPoIB)
Note:
Ethernet tunneling over IPoIB (eIPoIB) is not supported with UEK R4.
OFED 2.0 supports the following RDS features with UEK R4:
-
Async Send (AS)
-
Quality of Service (QoS)
-
Active Bonding (AB)
-
Netfilter (NF)
-
Shared Request Queue (SRQ)
Note:
Automatic Path Migration (APM) is not supported with UEK R4.
Support for IB, OFED, and RDS is integrated into the kernel.
The OFED user-space RPMs continue to be provided, but the kernel-ib and ofa-kernel RPMs are not required.
Virtualization
The following notable virtualization features are implemented in UEK R4:
-
Hyper-V support for netpoll allows a network console to be used to debug kernel issues.
-
The following Xen features have been implemented:
-
xen-netback support for changing the MAC address of an interface.
-
ACPI support for CPU and memory hotplug, including a new memory hotplug driver.
-
xen-netback support for gathering zerocopy statistics and TX grant mapping.
-
Support for MSI message groups in Dom0.
-
Substantially improved performance of Xen virtual network interfaces by implementing multiple queue support between xen-netback and xen-netfront.
-
EFI support in Dom0.
-
Xen PVSCSI backend and frontend driver support for high performance passthrough of SCSI devices or LUNs from Dom0 to a Xen PV or HVM guest.
-
Remapping of existing MFNs that were replaced by the identity map to prevent non-contiguous pages occurring in Dom0.
-
Improved PV ticket locks provide more efficient locking of guests for workloads that rely on this mechanism. If a spin lock is not available for more than a brief period, the lock code stops spinning and calls the hypervisor to wait until the lock becomes available again.
-
NUMA topology and I/O exposure to guests.
-
PVH guests now support Paravirtualized Hardware extensions (v3).
-
Zram
Zram provides block devices in RAM that compress everything written to them. It is typically used for swap devices to improve the responsiveness of systems that have a limited amount of memory. Before a zram device can be used, its size must be set by writing to /sys/block/zramN/disksize. The following example illustrates how to create and enable a zram swap device:
# mkswap /dev/zram0
# swapon /dev/zram0
The next example illustrates how to create a file system on a zram device and then mount this file system:
# mkfs.ext4 /dev/zram1
# mount /dev/zram1 /tmp
The following notable zram features are implemented in UEK R4:
-
Zram has been moved out of staging to drivers/block/zram.
-
Support for LZ4 compression in addition to LZO.
-
Improved performance through concurrent compression using multiple compression streams.
-
Support for switching the compression algorithm in /sys/block/zramN/comp_algorithm.
-
Support for limiting the maximum amount of usable memory for a zram device in /sys/block/zramN/mem_limit. You can use memory unit suffixes when setting a value, for example:
# echo 1G > /sys/block/zram0/mem_limit
To disable the limit, set the value to 0.
-
Support for displaying the maximum memory that a zram device has consumed in /sys/block/zramN/mem_used_max. Writing 0 to this file resets the counter.
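The following C sketch writes a limit string to a zram mem_limit attribute after validating it. Its parser is a simplified illustration of binary memory unit suffixes (K = 2^10, M = 2^20, G = 2^30 bytes); it handles only those three suffixes and is not the kernel's own parser. The function names are hypothetical, and writing to /sys/block/zramN/mem_limit requires root privileges.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse "0", "64K", "512M", "1G" (suffix case-insensitive, optional
 * trailing newline) into bytes. Returns -1 for malformed input.
 */
long long parse_mem_size(const char *s)
{
    char *end;
    long long v = strtoll(s, &end, 10);
    if (end == s || v < 0)
        return -1;

    switch (toupper((unsigned char)*end)) {
    case 'G':
        v <<= 10;
        /* fall through */
    case 'M':
        v <<= 10;
        /* fall through */
    case 'K':
        v <<= 10;
        end++;                 /* consume the suffix */
        break;
    default:
        break;                 /* no suffix: value is in bytes */
    }
    if (*end != '\0' && *end != '\n')
        return -1;
    return v;
}

/* Write `limit` (for example "1G", or "0" for no limit) to a mem_limit
 * attribute such as /sys/block/zram0/mem_limit. Returns 0 or -1. */
int zram_set_mem_limit(const char *attr_path, const char *limit)
{
    if (parse_mem_size(limit) < 0)
        return -1;             /* reject malformed values before writing */
    FILE *f = fopen(attr_path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(limit, f) >= 0;
    if (fclose(f) != 0)        /* sysfs write errors surface when flushing */
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, zram_set_mem_limit("/sys/block/zram0/mem_limit", "1G") has the same effect as the echo command shown above, and passing "0" removes the limit.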
Zswap
Zswap is a lightweight, write-behind compressed cache for swap pages. It attempts to compress a page that is being swapped out and store it in RAM. A successful compression defers and, in many cases, prevents writeback to the swap device, reducing I/O and increasing the performance of a system that is swapping.
For more information, see https://lwn.net/Articles/537422/.
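Zswap is controlled through module parameters. The following C sketch reports whether it is enabled by reading the enabled parameter, which is assumed to be at /sys/module/zswap/parameters/enabled and to contain Y or N; the function name zswap_enabled is hypothetical. Zswap can also be enabled at boot time with the zswap.enabled=1 kernel parameter.

```c
#include <stdio.h>

/*
 * Returns 1 if the parameter file reports zswap as enabled (Y or 1),
 * 0 if disabled (N or 0), and -1 if the file is missing (for example,
 * on a kernel built without zswap) or has unexpected content.
 */
int zswap_enabled(const char *param_path)
{
    FILE *f = fopen(param_path, "r");
    if (f == NULL)
        return -1;
    int c = fgetc(f);
    fclose(f);
    if (c == 'Y' || c == '1')
        return 1;
    if (c == 'N' || c == '0')
        return 0;
    return -1;
}
```

For example, zswap_enabled("/sys/module/zswap/parameters/enabled") returns -1 on a kernel built without zswap support.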