People seem to be happy with second-hand Mellanox ConnectX-2s on Linux, so I grabbed a pair. Both cards show up as a network interface in one computer, but neither shows up in the other.
Not working computer
Part Number: 666172-001
Description: HP ConnectX-2 Lx EN network interface card; single-port SFP+; PCIe2.0 5.0GT/s; mem-free; RoHS R6
PSID: HP_0F60000010
FW 2.9.1000
After the ASUS splash logo, a blank screen with only a blinking cursor appears and the machine never gets to GRUB. At this point the other computer showed "Press some key to enter the Mellanox network boot manager". (I wish I could disable this screen altogether because I'm never going to PXE boot.)
I reset the box and it booted Linux this time but the kernel reports:
pci 0000:01:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing 0000:01:00.0
mlx4_core 0000:01:00.0: enabling device (0000 -> 0002)
mlx4_core 0000:01:00.0: Multiple PFs not yet supported - Skipping PF
mlx4_core: probe of 0000:01:00.0 failed with error -22
My onboard Intel no longer works:
e1000e 0000:00:19.0: can't find IRQ for PCI INT A; probably buggy MP table
e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e: probe of 0000:00:19.0 failed with error -2
Removing the Mellanox card doesn't bring the Intel NIC back. It returns only after I cut power to the motherboard and power it back on.
I disabled all PCIe power-saving options in the UEFI setup, tried a different PCIe slot, and passed acpi=off or pcie_aspm=off to Linux. The driver now reports:
mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
mlx4_core: Initializing 0000:02:00.0
mlx4_core 0000:02:00.0: Missing DCS, aborting (driver_data: 0x2, pci_resource_flags(pdev, 0):0x0)
According to the driver source, this means the "PCIe BAR" was 4 MB but it was expecting 1 MB? Maybe I need to disable SR-IOV on the card but I don't know how; for ConnectX-3 it can be done through mlxconfig. I don't even need SR-IOV, I'm not planning on using VFs.
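The "Missing DCS" message also prints the resource flags for BAR 0, and that value can be decoded by hand as a cross-check. Below is a minimal sketch, assuming the IORESOURCE_IO and IORESOURCE_MEM bit values from the kernel's include/linux/ioport.h; `decode_bar_flags` is a hypothetical helper name, not part of any Mellanox or kernel tool.

```shell
#!/bin/sh
# Decode the pci_resource_flags() value printed in the "Missing DCS" line.
# Bit values are assumed from include/linux/ioport.h in the kernel source.
IORESOURCE_IO=0x100
IORESOURCE_MEM=0x200

# decode_bar_flags is a hypothetical helper used only for illustration.
decode_bar_flags() {
    flags=$(( $1 ))
    if [ $(( $flags & $IORESOURCE_MEM )) -ne 0 ]; then
        echo "memory BAR"
    elif [ $(( $flags & $IORESOURCE_IO )) -ne 0 ]; then
        echo "I/O BAR"
    else
        echo "no resource type set"
    fi
}

# The value from the log line above.
decode_bar_flags 0x0
```

If those bit values are right, a flags value of 0x0 would suggest that BAR 0 had no resource type assigned at all on this machine, which may be a different condition from a BAR size mismatch.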
I downloaded a non-HP-branded firmware image from Mellanox's website, backed up the current image, and flashed one of the cards using:
sudo flint -d /dev/mst/mt26448_pci_cr0 -i fw-ConnectX2-rel-2_9_1200-MNPA19_A1-A3-FlexBoot-3.3.400.bin -allow_psid_change burn
Now it looks like this:
Part Number: MNPA19_A1-A3
Description: ConnectX-2 Lx EN network interface card; single-port SFP+; PCIe2.0 5.0GT/s; mem-free; RoHS R6
PSID: MT_0F60110010
FW 2.9.1200
Now, when I boot it with pcie_aspm=off, I get this:
mlx4_core 0000:02:00.0: command 0xff6 timed out (go bit not cleared)
mlx4_core 0000:02:00.0: device is going to be reset
mlx4_core 0000:02:00.0: PCI can't be accessed to read vendor id
mlx4_core 0000:02:00.0: device was reset successfully
mlx4_core 0000:02:00.0: RUN_FW command failed, aborting
mlx4_core 0000:02:00.0: Failed to start FW, aborting
mlx4_core 0000:02:00.0: Failed to init fw, aborting.
mlx4_core: probe of 0000:02:00.0 failed with error -5
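For reference, the "probe ... failed with error" lines in these logs report negated Linux errno values (-22, -2 and -5). A small sketch that maps them to their symbolic names, assuming the standard values from the kernel's errno-base.h; `errno_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Map the negated errno values from the probe failures to symbolic names.
# Numbers are the standard Linux values (errno-base.h).
# errno_name is a hypothetical helper used only for illustration.
errno_name() {
    case "${1#-}" in            # strip the leading minus sign
        2)  echo "ENOENT (No such file or directory)" ;;
        5)  echo "EIO (Input/output error)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown" ;;
    esac
}

# The three codes that appear in the logs.
for code in -22 -2 -5; do
    printf '%s -> %s\n' "$code" "$(errno_name "$code")"
done
```

Read this way, the first mlx4_core failure is an invalid-argument rejection, the e1000e failure is a missing resource, and the failure after reflashing is an I/O error while starting the firmware.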
According to an OFED FAQ, "The error message above indicates that the device's hardware capabilities do not match the firmware configuration (.ini) file parameter settings," but it still works in the other machine.
Can I get this card to work with this motherboard? (Virtual functions not needed)