Fixed Issues
The following issues have been fixed in this update.
-
Severe NVMe storage performance degradation issue fixed
A bug that caused significant performance degradation when using NVMe devices in conjunction with earlier UEK R4 releases has been resolved. The issue caused sequential reads with abnormal request sizes that severely limited I/O regardless of block size. NVMe performance-related patches were applied to the kernel to fix it.
-
ocfs2: Slow journal replay fixed
A minor bug in the ocfs2 code caused journal replay for a dead node to take longer than necessary, because all blocks from the dead node's journal inode were loaded from disk to memory to avoid a stale cache. Performance is better if only the cached blocks are reloaded. A patch was applied to improve recovery performance.
-
ocfs2: Fixed issue releasing disk space after file deletion
An issue that caused ocfs2 to not release disk space after a large number of files were deleted has been fixed. The patches applied fixed the code that extended credits while there are free cached blocks and while flushing the truncate log.
-
ocfs2: Fixed issue with unknown option 'ExecRestart' in section 'Service' for the o2cb.service file
An invalid entry in the o2cb.service file that was included in the ocfs2-tools user space package has been resolved. The problem caused the following message to appear in the systemd status or in /var/log/messages:
systemd[1]: [/usr/lib/systemd/system/o2cb.service:11] Unknown lvalue 'ExecRestart' in section 'Service'
The problem is now resolved in ocfs2-tools-1.8.6-9.el7 and later.
-
Kernel panic during storage device reset when using the lpfc driver module
A bug in the lpfc driver module that caused a kernel panic during a storage device reset has been fixed. The issue appeared both when an sg_reset command was issued and when SCSI EH (Error Handling) triggered a reset.
-
Hyper-V clock source changed to use TSC
An upstream fix that changed the Hyper-V clock source to use the Time Stamp Counter (TSC), for greater efficiency in kernel operations that involve reading time stamps, has been backported into this release.
-
Hyper-V storage driver performance improvements
Upstream updates to the storvsc Hyper-V storage driver were included to provide moderate performance improvement of I/O operations for certain workloads.
-
Hyper-V fix for guest reboot on failover issue
A bug that caused virtual machines running on Hyper-V to reboot during a graceful node failover, so that live migration was unsuccessful, has been fixed. The problem was caused by a failure to check that all heartbeat and vmbus messages were correctly processed. A patch was applied to add this check.
-
Hyper-V fix for incorrect receive checksum offloading in the netvsc driver
A bug in the Hyper-V netvsc driver caused TCP packets with a bad receive checksum to be passed up the stack to the application layer, potentially causing data corruption. The fix included in this update causes packets with incorrect checksums to be dropped with an error.
-
Update for network bonding to fix primary_reselect with failure
The primary_reselect option, which defines the reselection policy for the primary slave in a network bond, did not behave correctly when set to failure or 2. The issue resulted in the primary slave becoming active again as soon as it had recovered. The expected, and documented, behavior is that the primary slave should not become active until the current active slave is down. A patch was applied to the bonding code to change the bond_find_best_slave functionality to avoid traversing members if the primary interface is not a candidate for failover or reselection and the current active slave is still up.
-
Fix for SCSI code that caused a kernel crash when a target node in an HA pair was rebooted
Several patches were applied to the scsi driver code to fix an issue that caused the kernel to crash when a target node in an HA pair was rebooted on a SAN-booted LUN configured for multipath. The issue occurred when the SCSI target device was marked for removal and there was a delay in getting it into the DEL state. This could cause the same target to be marked for removal twice. The applied patches resolve the issue.
-
Fix applied for Mellanox® mlx4 driver to resolve the "Node crashed at cache_alloc_refill+0x1ab" error
A bug that allowed multiple work queues to be allocated to the same id_map_ent structure in the mlx4 driver code was patched. This issue could cause a kernel crash if a worker routine cleaned up and freed the structure. The patch checks that previously queued work on the structure has been successfully cancelled before new work is queued on the same structure.
-
timer code patched for race condition that could result in a kernel oops
A patch was applied to fix an issue in the timer code that resulted in a kernel oops when a timer was migrated to an alternate CPU and had been left unlocked on the original CPU. The fix performs a proper migration and does the appropriate checks and locks during the migration to prevent the race condition.
-
Race condition in freeing aging forwarding tables in xsigo driver fixed
A fix was applied to the xsigo driver code to detect and avoid a potential race condition while accessing the forwarding table during the deletion of an aged forwarding entry. This issue could cause nodes to reboot inadvertently.
-
Race condition issue in the optrom functions in the qla2xxx driver fixed
A race condition in the qla2xxx driver, triggered when a thread modified the optrom buffer at the same time that another thread attempted to read from it, was patched in this update. The problem could result in a kernel panic and inadvertent system reboots. The fix acquires a mutex lock before checking the optrom state.
-
netxen driver patched for incorrect error handling
The QLogic/NetXen (1/10) GbE Intelligent Ethernet Driver was patched to fix an error handling issue that prevented the netxen_rom_fast_read() function from ever returning -1. Additional vendor patches were also applied.
-
RDS patched to fix QoS threshold calculation
When the Reliable Datagram Sockets (RDS) protocol was placed under loads that caused it to drop packets, the qos_threshold_exceeded parameter was not incremented because the RDMA payloads were calculated incorrectly. This caused the Quality of Service (QoS) functionality to fail. A patch was applied to fix this calculation so that the QoS threshold could be enforced.
-
RDMA package updated to fix mlx4_ib insertion error when RDMA starts
A bug that caused a benign insertion error when the RDMA service started was fixed in the rdma-3.10-3.0.25 package. With this update release, the RDMA package is updated to rdma-3.10-3.0.31 to provide several further bug fixes and code improvements.
-
NFSv4 issue with client incrementing the lock sequence number on NFS4ERR_MOVED fixed
A change in the NFSv4 specification meant that when the NFS client connected to a server based on the newer specification, sent a lock to the source, and received an NFS4ERR_MOVED response, resending the lock to the destination generated a bad sequence ID error: NFS4ERR_BAD_SEQID. The UEK R4 NFS client adheres to RFC 3530, which does not cover NFS4ERR_MOVED. A patch was applied to better adhere to RFC 7530 and to prevent the client from incrementing the lock sequence ID after receiving an NFS4ERR_MOVED from the server.
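
The sequence ID rule behind the NFS4ERR_MOVED fix can be sketched in a few lines. This is only an illustration of the behavior described above, not the UEK R4 client code: the function name next_lock_seqid and the NO_INCREMENT_ERRORS set are hypothetical, and only NFS4ERR_MOVED is taken from this note (the RFCs list further errors that are left out here).

```python
# Errors after which the client keeps the same lock sequence ID.
# Assumption: only NFS4ERR_MOVED is modeled, because it is the only
# error named in the release note. Real clients handle more cases.
NO_INCREMENT_ERRORS = {"NFS4ERR_MOVED"}


def next_lock_seqid(seqid: int, status: str) -> int:
    """Return the sequence ID to use for the next lock request.

    Hypothetical helper for illustration only.
    """
    if status in NO_INCREMENT_ERRORS:
        # The lock is resent (for example to the destination server)
        # with the unchanged sequence ID.
        return seqid
    # Any other reply advances the sequence ID in this sketch.
    return seqid + 1


seqid = 7
seqid = next_lock_seqid(seqid, "NFS4ERR_MOVED")  # ID is kept
seqid = next_lock_seqid(seqid, "NFS4_OK")        # ID advances
```

Before the fix, the client would have advanced the ID after NFS4ERR_MOVED as well, which is what led to NFS4ERR_BAD_SEQID on the resent lock.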
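
For the network bonding fix listed earlier, the documented behavior of primary_reselect set to failure can be modeled as a small selection function. This is a sketch under stated assumptions: the Slave class and select_active_on_failure are hypothetical names for illustration and do not mirror the kernel's bond_find_best_slave implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    """Hypothetical stand-in for one interface in a bond."""
    name: str
    up: bool


def select_active_on_failure(primary: Slave,
                             current: Optional[Slave],
                             backups: List[Slave]) -> Optional[Slave]:
    """Pick the active slave under a 'failure' style reselection policy."""
    # Keep the current active slave while it is still up, even if the
    # primary has recovered in the meantime.
    if current is not None and current.up:
        return current
    # The current active slave is down: prefer the primary if it is up.
    if primary.up:
        return primary
    # Otherwise fall back to the first backup that is up, if any.
    return next((s for s in backups if s.up), None)


eth0 = Slave("eth0", up=True)   # primary, recovered
eth1 = Slave("eth1", up=True)   # currently active
active = select_active_on_failure(eth0, eth1, [eth1])  # stays on eth1
```

The first check is the part that matters for this fix: a recovered primary does not displace an active slave that is still up.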
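
The qla2xxx fix listed earlier relies on taking a lock before checking shared state. The general pattern can be sketched as follows. The OptromBuffer class, its state names, and its methods are hypothetical and exist only for this example; the driver itself is C code using a kernel mutex.

```python
import threading
from typing import Optional


class OptromBuffer:
    """Toy stand-in for a shared buffer guarded by a state flag."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state = "ready"   # hypothetical states: "ready", "updating"
        self._data = b""

    def update(self, data: bytes) -> None:
        # The state change and the buffer write happen under one lock.
        with self._lock:
            self._state = "updating"
            self._data = data
            self._state = "ready"

    def read(self) -> Optional[bytes]:
        # Take the lock first, then check the state. Checking the state
        # before locking would let a writer change it in between.
        with self._lock:
            if self._state != "ready":
                return None
            return self._data


buf = OptromBuffer()
buf.update(b"firmware-image")
snapshot = buf.read()
```

Because the state check sits inside the locked region, a reader never observes a buffer that another thread is in the middle of modifying.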