[aarch64] Networking fails for Mellanox ConnectX-3 Pro Ethernet controller
Networking may fail on Arm (aarch64) systems that use the Mellanox ConnectX-3 Pro Ethernet controller with certain firmware versions. The issue typically produces the following dmesg output:
...
[ 21.605491] mlx4_core 0001:01:00.0: Failed to initialize event queue table, aborting
[ 22.660967] mlx4_core: probe of 0001:01:00.0 failed with error -12
[ 22.704966] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-0
[ 22.711355] mlx4_en 0000:01:00.0: Activating port:1
[ 22.742948] mlx4_en: 0000:01:00.0: Port 1: Using 32 TX rings
[ 22.748600] mlx4_en: 0000:01:00.0: Port 1: Using 8 RX rings
[ 22.754437] mlx4_en: 0000:01:00.0: Port 1: Initializing port
[ 22.760602] mlx4_en 0000:01:00.0: registered PHC clock
[ 22.766283] mlx4_en 0000:01:00.0: Activating port:2
[ 22.766956] mlx4_core 0000:01:00.0 enp1s0: renamed from eth0
[ 22.778621] mlx4_en: 0000:01:00.0: Port 2: Failed to allocate NIC resources
[ 22.785776] mlx4_en 0000:01:00.0: removed PHC
[ 25.488635] mlx4_en: enp1s0: Steering Mode 1
...
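To check whether a system shows this failure, the kernel log can be searched for the two failure messages that appear in the sample output. The following is a minimal sketch: the function name mlx4_init_failed is hypothetical, and the patterns are taken only from the sample in this entry, so other driver or firmware versions may log different text.

```shell
# Hypothetical helper (not part of any package): reads kernel log text on
# stdin and exits 0 if either mlx4 failure message from the sample output
# in this entry is present, non-zero otherwise.
mlx4_init_failed() {
    grep -qE 'mlx4_(core|en).*(Failed to initialize event queue table|Failed to allocate NIC resources)'
}

# Example usage (run as root on the system being checked):
#   dmesg | mlx4_init_failed && echo "mlx4 initialization failure found"
```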
To work around this issue, boot with the maxcpus=8 kernel parameter, which limits the number of CPUs that are available during the boot process. Once the system has fully booted, systemd enables all available CPUs, so there is no performance impact.
To set this parameter so that it is used for all kernels when the system boots, edit the GRUB configuration. You can do this by editing the GRUB_CMDLINE_LINUX line in /etc/sysconfig/grub in a text editor, for example:
GRUB_CMDLINE_LINUX="crashkernel=auto resume=/dev/mapper/linux1-swap rd.lvm.lv=linux1/root \
rd.lvm.lv=linux1/swap rhgb quiet maxcpus=8"
To update the GRUB configuration so that the change takes effect on the next boot, run the following command if the system boots using legacy BIOS:
# grub2-mkconfig -o /boot/grub2/grub.cfg
Alternatively, if the system boots using UEFI, run the following command:
# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
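After rebooting, one way to confirm that the parameter was applied is to read the kernel command line from /proc/cmdline and compare it with the list of CPUs that are currently online. The following is a minimal sketch: the function name cmdline_maxcpus is hypothetical, and if the parameter appears more than once, only the last occurrence is printed.

```shell
# Hypothetical helper: prints the value of the maxcpus= parameter from a
# kernel command line supplied on stdin, or prints nothing if the
# parameter is absent.
cmdline_maxcpus() {
    # Put each parameter on its own line, then keep only the maxcpus value.
    tr ' ' '\n' | sed -n 's/^maxcpus=//p' | tail -n 1
}

# Example usage after reboot:
#   cmdline_maxcpus < /proc/cmdline        # value that was passed at boot
#   cat /sys/devices/system/cpu/online     # CPUs that are currently online
```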
This issue is present only in later firmware versions for this hardware. The issue does not occur on cards with the HVE102M-0.2 firmware, but appears when the firmware is upgraded to HVE104N-1.12. The issue can also be avoided by downgrading the card firmware. (Bug ID 30877943)