Extending an RA21 or Later Model Rack with No Down Time by Adding Another RA21 or Later Model Rack
WARNING:
Take time to read and understand this procedure before implementation. Pay careful attention to the instructions that surround the command examples. A system outage may occur if the procedure is not applied correctly.
Note:
For additional background information, see Understanding Multi-Rack Cabling for RA21 and Later Model Racks.
Use this procedure to extend a typical RA21 or later model rack by cabling it together with a second RA21 or later model rack. The primary rack (designated R1) and all of the systems it supports remain online throughout the procedure. At the beginning of the procedure, the additional rack (designated R2) is shut down.
The following is an outline of the procedure:
-
Preparation (steps 1 and 2)
In this phase, you prepare the racks, switches, and cables. Also, you install and cable the spine switches in both racks.
-
Configuration and Physical Cabling (steps 3 to 6)
In this phase, you reconfigure the leaf switches and finalize the cabling to the spine switches. These tasks are carefully orchestrated to avoid downtime on the primary system, as follows:
-
Partially configure the lower leaf switches (step 3)
In this step, you reconfigure the switch ports on the lower leaf switches. There is no physical cabling performed in this step.
-
Partially configure the upper leaf switches (step 4)
In this step, you reconfigure the switch ports on the upper leaf switches, remove the inter-switch cables that connect the leaf switches in both racks, and connect the cables between the upper leaf switches and the spine switches.
-
Finalize the lower leaf switches (step 5)
In this step, you finalize the switch port configuration on the lower leaf switches. You also complete the physical cabling by connecting the cables between the lower leaf switches and the spine switches.
-
Finalize the upper leaf switches (step 6)
In this step, you finalize the switch port configuration on the upper leaf switches.
-
-
Validation and Testing (steps 7 and 8)
In this phase, you validate and test the RoCE Network Fabric across both of the interconnected racks.
After completing the procedure, both racks share the RoCE Network Fabric, and the combined system is ready for further configuration. For example, you can extend existing disk groups and Oracle RAC databases to consume resources across both racks.
Note:
-
This procedure applies only to typical rack configurations that initially have leaf switches with the following specifications:
-
The inter-switch ports are ports 4 to 7, and ports 30 to 33.
-
The storage server ports are ports 8 to 14, and ports 23 to 29.
-
The database server ports are ports 15 to 22.
For other rack configurations (for example, X9M-8 systems with three database servers and 11 storage servers) a different procedure and different RoCE Network Fabric switch configuration files are required. Contact Oracle for further guidance.
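The port layout listed above can be captured in a small lookup. The following is a minimal Python sketch, not part of the Oracle tooling; the names `PORT_CATEGORIES` and `classify_port` are hypothetical, and the ranges are taken directly from the list above for a typical rack.

```python
# Port categories for a typical leaf switch, as listed in the note above.
# Python ranges exclude the end value, so range(4, 8) covers ports 4 to 7.
PORT_CATEGORIES = {
    "inter-switch": [range(4, 8), range(30, 34)],
    "storage server": [range(8, 15), range(23, 30)],
    "database server": [range(15, 23)],
}


def classify_port(port: int) -> str:
    """Return the category for a leaf switch port number (hypothetical helper)."""
    for category, ranges in PORT_CATEGORIES.items():
        if any(port in r for r in ranges):
            return category
    # Ports outside the listed ranges are not assigned in the typical layout.
    return "unassigned"
```

For example, `classify_port(23)` returns `"storage server"`, and `classify_port(34)` returns `"unassigned"`.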
-
-
The procedure uses the following naming abbreviations and conventions:
-
The abbreviation for the existing rack is R1, and the new rack is R2.
-
LL identifies a lower leaf switch and UL identifies an upper leaf switch.
-
SS identifies a spine switch.
-
A specific switch is identified by combining abbreviations. For example, R1LL identifies the lower leaf switch (LL) on the existing rack (R1).
-
-
Most operations must be performed in multiple locations. For example, step 1.h instructs you to update the firmware on all the RoCE Network Fabric leaf switches (R1LL, R1UL, R2LL, and R2UL). Pay attention to the instructions and keep track of your actions.
Tip:
When a step must be performed on multiple switches, the instruction contains a list of the applicable switches. For example, (R1LL, R1UL, R2LL, and R2UL). You can use this list as a checklist to keep track of your actions.
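If you prefer to track progress in a script rather than on paper, the naming convention lends itself to a simple checklist. This is a minimal Python sketch under that assumption; the helper names (`leaf_switches`, `new_checklist`, `remaining`) are hypothetical and only illustrate the R1/R2 and LL/UL abbreviations described above.

```python
from itertools import product

RACKS = ("R1", "R2")        # existing rack, new rack
LEAF_ROLES = ("LL", "UL")   # lower leaf, upper leaf


def leaf_switches():
    """Combine rack and role abbreviations into switch labels such as R1LL."""
    return [rack + role for rack, role in product(RACKS, LEAF_ROLES)]


def new_checklist(switches):
    """Start a checklist with every switch marked as not yet done."""
    return {name: False for name in switches}


def remaining(checklist):
    """List the switches on which the current step is still outstanding."""
    return [name for name, done in checklist.items() if not done]
```

After marking `checklist["R1LL"] = True`, `remaining(checklist)` lists the switches that still need the step.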
-
Perform operations sequentially, and complete every operation before proceeding. For example, run the entire command sequence at 3.a.i as one operation and complete it before proceeding.
-
All commands that are run on a RoCE Network Fabric switch must be run while connected to the switch management interface as the switch administrator.
- Prepare the systems.
- Position the new rack (R2) so that it is physically near the existing rack (R1).
The RDMA Network Fabric cables must be able to reach the switches in each rack.
For the required cross-rack cabling information, see Two-Rack Cabling for RA21 and Later Model Racks.
- Power on all of the servers and network switches in the new rack (R2).
This includes the database servers, storage servers, RoCE Network Fabric leaf switches, and the Management Network Switch.
- Prepare the RoCE Network Fabric cables that you will use to interconnect the racks.
Label both ends of every cable.
For the required cross-rack cabling information, see Two-Rack Cabling for RA21 and Later Model Racks.
- Connect the new rack (R2) to your existing management network.
Ensure that there are no IP address conflicts across the racks and that you can access the management interfaces on the RoCE Network Fabric switches.
- Ensure that you have a backup of the current switch configuration for each RoCE Network Fabric switch (R1LL, R1UL, R2LL, and R2UL).
See Backing Up Settings on the RoCE Network Fabric Switch in Oracle Exadata Database Machine Maintenance Guide.
- Download the required RoCE Network Fabric switch configuration files.
This procedure requires specific RoCE Network Fabric switch configuration files, which you must download from My Oracle Support document 2704997.1.
WARNING:
You must use different switch configuration files depending on whether your system uses Exadata Secure RDMA Fabric Isolation. Ensure that you download the correct archive that matches your system configuration.
For system configurations without Secure Fabric, download online_multi-rack_14uplinks.zip. For system configurations with Secure Fabric, download online_SF_enabled_multi-rack_14uplinks.zip.
Download and extract the archive containing the required RoCE Network Fabric switch configuration files. Place the files on a server with access to the management interfaces on the RoCE Network Fabric switches.
- Copy the required RoCE Network Fabric switch configuration files to the leaf switches on both racks.
You can use the following commands to copy the required configuration files to all of the RoCE Network Fabric switches on a system without Secure Fabric enabled:
-
# scp roce_multi_14uplinks_online_step3_R1_LL.cfg admin@R1LL_IP:/
-
# scp roce_multi_14uplinks_online_step3_R2_LL.cfg admin@R2LL_IP:/
-
# scp roce_multi_14uplinks_online_step4_R1_UL.cfg admin@R1UL_IP:/
-
# scp roce_multi_14uplinks_online_step4_R2_UL.cfg admin@R2UL_IP:/
-
# scp roce_multi_14uplinks_online_step5.cfg admin@R1LL_IP:/
-
# scp roce_multi_14uplinks_online_step5.cfg admin@R2LL_IP:/
On a system with Secure Fabric enabled, you can use the following commands:
-
# scp roce_SF_multi_14uplinks_online_step3_R1_LL.cfg admin@R1LL_IP:/
-
# scp roce_SF_multi_14uplinks_online_step3_R2_LL.cfg admin@R2LL_IP:/
-
# scp roce_SF_multi_14uplinks_online_step4_R1_UL.cfg admin@R1UL_IP:/
-
# scp roce_SF_multi_14uplinks_online_step4_R2_UL.cfg admin@R2UL_IP:/
-
# scp roce_SF_multi_14uplinks_online_step5.cfg admin@R1LL_IP:/
-
# scp roce_SF_multi_14uplinks_online_step5.cfg admin@R2LL_IP:/
In the above commands, substitute the appropriate IP address or host name where applicable. For example, in place of R1LL_IP, substitute the management IP address or host name for the lower leaf switch (LL) on the existing rack (R1).
Note:
The command examples in the rest of this procedure use the configuration files for a system configuration without Secure Fabric enabled. If required, adjust the commands to use the Secure Fabric-enabled switch configuration files.
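The copy commands above follow a regular pattern, so they can be generated rather than typed by hand. The following Python sketch only builds the command strings and does not run them; the function name `scp_commands` and the `addresses` mapping are hypothetical, while the file names and the `admin@...:/` destination are copied from the commands above.

```python
def scp_commands(addresses, secure_fabric=False):
    """Build the six scp commands for the leaf switches.

    addresses maps a switch label (R1LL, R2LL, R1UL, R2UL) to its management
    IP address or host name. Nothing is executed; the strings are returned.
    """
    prefix = (
        "roce_SF_multi_14uplinks_online"
        if secure_fabric
        else "roce_multi_14uplinks_online"
    )
    # (configuration file, target switch) pairs in the order shown above.
    plan = [
        (f"{prefix}_step3_R1_LL.cfg", "R1LL"),
        (f"{prefix}_step3_R2_LL.cfg", "R2LL"),
        (f"{prefix}_step4_R1_UL.cfg", "R1UL"),
        (f"{prefix}_step4_R2_UL.cfg", "R2UL"),
        (f"{prefix}_step5.cfg", "R1LL"),
        (f"{prefix}_step5.cfg", "R2LL"),
    ]
    return [f"scp {cfg} admin@{addresses[switch]}:/" for cfg, switch in plan]
```

Printing the returned list gives a reviewable script; a missing switch label in `addresses` raises a `KeyError` before anything is copied.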
- Update the firmware to the latest available release on all of the RoCE Network Fabric leaf switches (R1LL, R1UL, R2LL, and R2UL).
See Updating RoCE Network Fabric Switch Firmware in Oracle Exadata Database Machine Maintenance Guide.
- Examine the RoCE Network Fabric leaf switches (R1LL, R1UL, R2LL, and R2UL) and confirm the port categories for the cabled ports.
Run the show interface status command on every RoCE Network Fabric leaf switch:
-
R1LL# show interface status
-
R1UL# show interface status
-
R2LL# show interface status
-
R2UL# show interface status
Examine the output and confirm the port categories as follows:
-
Confirm that the inter-switch ports are ports 4 to 7, and ports 30 to 33.
-
Confirm that the storage server ports are ports 8 to 14, and ports 23 to 29.
-
Confirm that the database server ports are ports 15 to 22.
For example:
R1LL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        --                 xcvrAbsen 1         auto    auto    --
Eth1/2        --                 xcvrAbsen 1         auto    auto    --
Eth1/3        --                 xcvrAbsen 1         auto    auto    --
Eth1/4        ISL1               connected trunk     full    100G    QSFP-100G-CR4
Eth1/5        ISL2               connected trunk     full    100G    QSFP-100G-CR4
Eth1/6        ISL3               connected trunk     full    100G    QSFP-100G-CR4
Eth1/7        ISL4               connected trunk     full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           connected 3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           connected 3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           connected 3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           connected 3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           connected 3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           connected 3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              connected 3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              connected 3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              connected 3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              connected 3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              connected 3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              connected 3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              connected 3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              connected 3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           connected 3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           connected 3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           connected 3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           connected 3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           connected 3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           connected 3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
Eth1/30       ISL5               connected trunk     full    100G    QSFP-100G-CR4
Eth1/31       ISL6               connected trunk     full    100G    QSFP-100G-CR4
Eth1/32       ISL7               connected trunk     full    100G    QSFP-100G-CR4
Eth1/33       ISL8               connected trunk     full    100G    QSFP-100G-CR4
Eth1/34       --                 xcvrAbsen 1         auto    auto    --
Eth1/35       --                 xcvrAbsen 1         auto    auto    --
Eth1/36       --                 xcvrAbsen 1         auto    auto    --
Po100         --                 connected trunk     full    100G    --
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
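If you capture the show interface status output to a file, the check can be partly automated. The following Python sketch is illustrative only: the function names and the `SAMPLE` text are hypothetical, and the parser assumes one port per line with a port name that contains no spaces, as in the Eth1/N rows of the example above.

```python
import re

# Matches rows such as "Eth1/4  ISL1  connected  trunk ..." and captures
# the port number, the port name, and the status column.
ETH_ROW = re.compile(r"^Eth1/(\d+)\s+(\S+)\s+(\S+)")

# Hypothetical excerpt; Eth1/34 is shown connected to illustrate a finding.
SAMPLE = """\
Eth1/3        --                 xcvrAbsen 1         auto    auto    --
Eth1/4        ISL1               connected trunk     full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/34       --                 connected 1         auto    auto    --
"""


def port_status(output: str) -> dict:
    """Map each Eth1/N port number to the text in its Status column."""
    status = {}
    for line in output.splitlines():
        match = ETH_ROW.match(line.strip())
        if match:
            status[int(match.group(1))] = match.group(3)
    return status


def connected_outside(status: dict, expected: set) -> list:
    """Return connected ports that fall outside the expected cabled ports."""
    return sorted(
        port for port, state in status.items()
        if state == "connected" and port not in expected
    )
```

With the typical layout, the expected cabled ports are 4 to 33, so `connected_outside(port_status(SAMPLE), set(range(4, 34)))` flags port 34 in the hypothetical sample. A non-empty result suggests the rack does not match the typical configuration that this procedure covers.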
-
- For each rack (R1 and R2), confirm the RoCE Network Fabric cabling by running the verify_roce_cables.py script.
The verify_roce_cables.py script uses two input files: one for database servers and storage servers (nodes.rackN), and another for switches (switches.rackN). In each file, list every server or switch on a separate line. Use fully qualified domain names or IP addresses for each server and switch.
See My Oracle Support document 2587717.1 for download and detailed usage instructions.
Run the verify_roce_cables.py script against both of the racks:
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack2 -s switches.rack2
Check that the output in the CABLE OK? columns contains the OK status.
The following example shows the expected command results:
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
SWITCH PORT (EXPECTED PEER)  LOWER LEAF (rack1sw-rocea0) : CABLE OK?  UPPER LEAF (rack1sw-roceb0) : CABLE OK?
----------- ---------------  -------------------------------- : --------  -------------------------------- : ---------
Eth1/4 (ISL peer switch)  : rack1sw-rocea0 Ethernet1/4  : OK   rack1sw-roceb0 Ethernet1/4  : OK
Eth1/5 (ISL peer switch)  : rack1sw-rocea0 Ethernet1/5  : OK   rack1sw-roceb0 Ethernet1/5  : OK
Eth1/6 (ISL peer switch)  : rack1sw-rocea0 Ethernet1/6  : OK   rack1sw-roceb0 Ethernet1/6  : OK
Eth1/7 (ISL peer switch)  : rack1sw-rocea0 Ethernet1/7  : OK   rack1sw-roceb0 Ethernet1/7  : OK
Eth1/8 (RU39)             : rack1celadm14 port-1        : OK   rack1celadm14 port-2        : OK
Eth1/9 (RU37)             : rack1celadm13 port-1        : OK   rack1celadm13 port-2        : OK
Eth1/10 (RU35)            : rack1celadm12 port-1        : OK   rack1celadm12 port-2        : OK
Eth1/11 (RU33)            : rack1celadm11 port-1        : OK   rack1celadm11 port-2        : OK
Eth1/12 (RU31)            : rack1celadm10 port-1        : OK   rack1celadm10 port-2        : OK
Eth1/13 (RU29)            : rack1celadm09 port-1        : OK   rack1celadm09 port-2        : OK
Eth1/14 (RU27)            : rack1celadm08 port-1        : OK   rack1celadm08 port-2        : OK
Eth1/15 (RU26)            : rack1adm08 port-1           : OK   rack1adm08 port-2           : OK
Eth1/16 (RU25)            : rack1adm07 port-1           : OK   rack1adm07 port-2           : OK
Eth1/17 (RU24)            : rack1adm06 port-1           : OK   rack1adm06 port-2           : OK
Eth1/18 (RU23)            : rack1adm05 port-1           : OK   rack1adm05 port-2           : OK
Eth1/19 (RU19)            : rack1adm04 port-1           : OK   rack1adm04 port-2           : OK
Eth1/20 (RU18)            : rack1adm03 port-1           : OK   rack1adm03 port-2           : OK
Eth1/21 (RU17)            : rack1adm02 port-1           : OK   rack1adm02 port-2           : OK
Eth1/22 (RU16)            : rack1adm01 port-1           : OK   rack1adm01 port-2           : OK
Eth1/23 (RU14)            : rack1celadm07 port-1        : OK   rack1celadm07 port-2        : OK
Eth1/24 (RU12)            : rack1celadm06 port-1        : OK   rack1celadm06 port-2        : OK
Eth1/25 (RU10)            : rack1celadm05 port-1        : OK   rack1celadm05 port-2        : OK
Eth1/26 (RU08)            : rack1celadm04 port-1        : OK   rack1celadm04 port-2        : OK
Eth1/27 (RU06)            : rack1celadm03 port-1        : OK   rack1celadm03 port-2        : OK
Eth1/28 (RU04)            : rack1celadm02 port-1        : OK   rack1celadm02 port-2        : OK
Eth1/29 (RU02)            : rack1celadm01 port-1        : OK   rack1celadm01 port-2        : OK
Eth1/30 (ISL peer switch) : rack1sw-rocea0 Ethernet1/30 : OK   rack1sw-roceb0 Ethernet1/30 : OK
Eth1/31 (ISL peer switch) : rack1sw-rocea0 Ethernet1/31 : OK   rack1sw-roceb0 Ethernet1/31 : OK
Eth1/32 (ISL peer switch) : rack1sw-rocea0 Ethernet1/32 : OK   rack1sw-roceb0 Ethernet1/32 : OK
Eth1/33 (ISL peer switch) : rack1sw-rocea0 Ethernet1/33 : OK   rack1sw-roceb0 Ethernet1/33 : OK
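With 30 ports and two leaf switches, it is easy to overlook a single status that is not OK. The following Python sketch scans saved script output for such rows. It is an assumption-heavy illustration: `cable_problems` is a hypothetical helper, it assumes each port row is on its own line with the colon-separated layout shown in the example, and it treats any status other than OK as a problem because the script's failure wording is not shown here.

```python
def cable_problems(output: str) -> list:
    """Return (port, detail) pairs for rows whose CABLE OK? columns are not OK."""
    problems = []
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("Eth"):
            continue  # skip the command echo, header, and separator lines
        fields = line.split(":")
        if len(fields) != 4:
            # Unexpected layout: report it rather than guess at the meaning.
            problems.append((line, "unparsed"))
            continue
        port = fields[0].split()[0]
        # The lower leaf status is the first token after the second colon.
        lower_tokens = fields[2].split()
        lower_status = lower_tokens[0] if lower_tokens else ""
        upper_status = fields[3].strip()
        if lower_status != "OK" or upper_status != "OK":
            problems.append((port, f"lower={lower_status} upper={upper_status}"))
    return problems
```

An empty list means every parsed row reported OK on both leaf switches. Review any reported row against the cabling tables before continuing.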
-
- For each rack (R1 and R2), verify the RoCE Network Fabric operation by using the infinicheck command.
-
Use infinicheck with the -z option to clear the files that were created during the last run of the infinicheck command.
-
Use infinicheck with the -s option to set up user equivalence for password-less SSH across the RoCE Network Fabric.
-
Finally, verify the RoCE Network Fabric operation by using infinicheck with the -b option, which is recommended on newly imaged machines where it is acceptable to suppress the cellip.ora and cellinit.ora configuration checks.
In each command, the hosts input file (hosts.rack1 and hosts.rack2) contains a list of database server RoCE Network Fabric IP addresses (2 RoCE Network Fabric IP addresses for each database server), and the cells input file (cells.rack1 and cells.rack2) contains a list of RoCE Network Fabric IP addresses for the storage servers (2 RoCE Network Fabric IP addresses for each storage server).
-
Use the following recommended command sequence on the existing rack (R1):
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rack1 -c cells.rack1 -z
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rack1 -c cells.rack1 -s
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rack1 -c cells.rack1 -b
-
-
Use the following recommended command sequence on the new rack (R2):
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rack2 -c cells.rack2 -z
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rack2 -c cells.rack2 -s
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rack2 -c cells.rack2 -b
-
The following example shows the expected command results for the final command in the sequence:
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.rackN -c cells.rackN -b
INFINICHECK
[Network Connectivity, Configuration and Performance]
#### FABRIC TYPE TESTS ####
System type identified: RoCE
Verifying User Equivalance of user=root from all DBs to all CELLs.
#### RoCE CONFIGURATION TESTS ####
Checking for presence of RoCE devices on all DBs and CELLs
[SUCCESS].... RoCE devices on all DBs and CELLs look good
Checking for RoCE Policy Routing settings on all DBs and CELLs
[SUCCESS].... RoCE Policy Routing settings look good
Checking for RoCE DSCP ToS mapping on all DBs and CELLs
[SUCCESS].... RoCE DSCP ToS settings look good
Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
[SUCCESS].... RoCE PFC and DSCP settings look good
Checking for RoCE interface MTU settings. Expected value : 2300
[SUCCESS].... RoCE interface MTU settings look good
Verifying switch advertised DSCP on all DBs and CELLs ports ( )
[SUCCESS].... Advertised DSCP settings from RoCE switch looks good
#### CONNECTIVITY TESTS ####
[COMPUTE NODES -> STORAGE CELLS]
(60 seconds approx.)
(Will walk through QoS values: 0-6)
[SUCCESS]..........Results OK
[SUCCESS]....... All can talk to all storage cells
[COMPUTE NODES -> COMPUTE NODES]
...
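The recommended infinicheck sequence is the same for both racks apart from the input file names. As a minimal sketch, the following Python helper (a hypothetical name, `infinicheck_sequence`) builds the three command lines in the documented order: clear previous files (-z), set up user equivalence (-s), then run the checks (-b). It only produces strings and assumes the hosts.rackN and cells.rackN naming used above.

```python
def infinicheck_sequence(rack: int) -> list:
    """Return the -z, -s, -b infinicheck commands for the given rack number."""
    base = f"./infinicheck -g hosts.rack{rack} -c cells.rack{rack}"
    # The order matters: clear old files, set up SSH equivalence, then test.
    return [f"{base} {option}" for option in ("-z", "-s", "-b")]
```

Run the resulting commands from /opt/oracle.SupportTools/ibdiagtools, one at a time, and complete each one before starting the next.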
-
- Install the spine switches (R1SS and R2SS).
- Physically install and power up the spine switches in the existing rack (R1SS) and the new rack (R2SS).
-
Physically install each spine switch in RU1.
-
For each spine switch, ensure that the management Ethernet interface is connected to the management network and then supply power.
-
On each spine switch, perform the initial configuration steps outlined in Configuring the Cisco Nexus C9336C-FX2 Switch. Skip the step for applying the golden configuration settings as you will do this later.
-
For each spine switch, perform a ping test to the management Ethernet interface to ensure that the switch is online and accessible.
-
- Apply the golden configuration settings to the new spine switches.
See Applying Golden Configuration Settings on RoCE Network Fabric Switches in Oracle Exadata Database Machine Maintenance Guide.
You can use the instance of patchmgr that you previously used to update the firmware on the leaf switches (in step 1.h).
Use a switch list file (spines.lst) to apply the golden configuration settings to both spine switches using one patchmgr command:
# cat spines.lst
R1SS_IP:mspine.201
R2SS_IP:mspine.202
# ./patchmgr --roceswitches spines.lst --apply-config -log_dir /tmp/spinelogs
Note:
In the switch list file, R1SS_IP is the management IP address or host name for the spine switch on the existing rack (R1SS) and R2SS_IP is the management IP address or host name for the spine switch on the new rack (R2SS).
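The switch list file is a two-line text file, so it can be produced from the two spine switch addresses. The following Python sketch uses a hypothetical helper name, `spine_list`; the `:mspine.201` and `:mspine.202` suffixes are copied verbatim from the example above and are not interpreted or validated here.

```python
def spine_list(r1ss: str, r2ss: str) -> str:
    """Return the contents of spines.lst for the two spine switches.

    r1ss and r2ss are the management IP addresses or host names of the
    spine switches in the existing rack (R1SS) and the new rack (R2SS).
    """
    return f"{r1ss}:mspine.201\n{r2ss}:mspine.202\n"
```

Write the returned text to spines.lst in the patchmgr directory and check it with `cat spines.lst` before running patchmgr.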
- Upgrade the firmware on the spine switches.
See Updating RoCE Network Fabric Switch Firmware in Oracle Exadata Database Machine Maintenance Guide.
You can use the instance of patchmgr that you used in the previous step.
Use a switch list file (spines.lst) to perform the firmware upgrade on both spine switches using one patchmgr command:
# cat spines.lst
R1SS_IP:mspine.201
R2SS_IP:mspine.202
# ./patchmgr --roceswitches spines.lst --upgrade -log_dir /tmp/spinelogs
Note:
In the switch list file, R1SS_IP is the management IP address or host name for the spine switch on the existing rack (R1SS) and R2SS_IP is the management IP address or host name for the spine switch on the new rack (R2SS).
- Connect the RoCE Network Fabric cables to the spine switches (R1SS and R2SS).
WARNING:
At this stage, only connect the cables to the spine switches.
To avoid later complications, ensure that each cable connects to the correct switch and port.
DO NOT CONNECT ANY OF THE CABLES TO THE LEAF SWITCHES.
Use the cables that you prepared earlier (in step 1.c).
For the required cross-rack cabling information, see Two-Rack Cabling for RA21 and Later Model Racks.
- Perform the first round of configuration on the lower leaf switches (R1LL and R2LL).
Perform this step on the lower leaf switches (R1LL and R2LL) only.
Note:
During this step, the lower leaf switch ports are shut down. While the R1LL ports are down, R1UL exclusively supports the RoCE Network Fabric. During this time, there is no redundancy in the RoCE Network Fabric, and availability cannot be maintained if R1UL goes down.
- Shut down the switch ports on the lower leaf switches (R1LL and R2LL).
-
On R1LL:
R1LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1LL(config)# interface ethernet 1/1-36
R1LL(config-if-range)# shut
R1LL(config-if-range)# exit
R1LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1LL(config)# <Ctrl-Z>
R1LL#
-
Repeat the command sequence on R2LL:
R2LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2LL(config)# interface ethernet 1/1-36
R2LL(config-if-range)# shut
R2LL(config-if-range)# exit
R2LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2LL(config)# <Ctrl-Z>
R2LL#
-
- Reconfigure the lower leaf switch ports (R1LL and R2LL).
For each switch, you must use the correct corresponding switch configuration file, which you earlier copied to the switch (in step 1.g).
-
On R1LL, the switch configuration file name must end with step3_R1_LL.cfg:
R1LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1LL(config)# run-script bootflash:///roce_multi_14uplinks_online_step3_R1_LL.cfg | grep 'none'
R1LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1LL(config)# <Ctrl-Z>
R1LL#
-
On R2LL, the switch configuration file name must end with step3_R2_LL.cfg:
R2LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2LL(config)# run-script bootflash:///roce_multi_14uplinks_online_step3_R2_LL.cfg | grep 'none'
R2LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2LL(config)# <Ctrl-Z>
R2LL#
Note:
This step can take approximately 5 to 8 minutes on each switch.
-
- Start the inter-switch ports on the lower leaf switches (R1LL and R2LL).
-
On R1LL:
R1LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1LL(config)# interface ethernet 1/1-7, ethernet 1/30-36
R1LL(config-if-range)# no shut
R1LL(config-if-range)# exit
R1LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1LL(config)# <Ctrl-Z>
R1LL#
-
Repeat the command sequence on R2LL:
R2LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2LL(config)# interface ethernet 1/1-7, ethernet 1/30-36
R2LL(config-if-range)# no shut
R2LL(config-if-range)# exit
R2LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2LL(config)# <Ctrl-Z>
R2LL#
-
- Wait for 5 minutes to ensure that the ports you just started are fully operational before continuing.
- Verify the status of the inter-switch ports on the lower leaf switches (R1LL and R2LL).
Run the show interface status command on each lower leaf switch:
-
R1LL# show interface status
-
R2LL# show interface status
Examine the output to ensure that the inter-switch ports are connected.
For example:
R1LL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        --                 xcvrAbsen 1         auto    auto    --
Eth1/2        --                 xcvrAbsen 1         auto    auto    --
Eth1/3        --                 xcvrAbsen 1         auto    auto    --
Eth1/4        ISL1               connected trunk     full    100G    QSFP-100G-CR4
Eth1/5        ISL2               connected trunk     full    100G    QSFP-100G-CR4
Eth1/6        ISL3               connected trunk     full    100G    QSFP-100G-CR4
Eth1/7        ISL4               connected trunk     full    100G    QSFP-100G-CR4
Eth1/8        celadm14           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/30       ISL5               connected trunk     full    100G    QSFP-100G-CR4
Eth1/31       ISL6               connected trunk     full    100G    QSFP-100G-CR4
Eth1/32       ISL7               connected trunk     full    100G    QSFP-100G-CR4
Eth1/33       ISL8               connected trunk     full    100G    QSFP-100G-CR4
Eth1/34       --                 xcvrAbsen 1         auto    auto    --
Eth1/35       --                 xcvrAbsen 1         auto    auto    --
Eth1/36       --                 xcvrAbsen 1         auto    auto    --
Po100         --                 connected trunk     full    100G    --
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
-
- Start the storage server ports on the lower leaf switches (R1LL and R2LL).
-
On R1LL:
R1LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1LL(config)# interface ethernet 1/8-14, ethernet 1/23-29
R1LL(config-if-range)# no shut
R1LL(config-if-range)# exit
R1LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1LL(config)# <Ctrl-Z>
R1LL#
-
Repeat the command sequence on R2LL:
R2LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2LL(config)# interface ethernet 1/8-14, ethernet 1/23-29
R2LL(config-if-range)# no shut
R2LL(config-if-range)# exit
R2LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2LL(config)# <Ctrl-Z>
R2LL#
-
- Wait for 5 minutes to ensure that the ports you just started are fully operational before continuing.
- Verify the status of the storage server ports on the lower leaf switches (R1LL and R2LL).
Run the show interface status command on each lower leaf switch:
-
R1LL# show interface status
-
R2LL# show interface status
Examine the output to ensure that the storage server ports are connected.
For example:
R1LL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        --                 xcvrAbsen 1         auto    auto    --
Eth1/2        --                 xcvrAbsen 1         auto    auto    --
Eth1/3        --                 xcvrAbsen 1         auto    auto    --
Eth1/4        ISL1               connected trunk     full    100G    QSFP-100G-CR4
Eth1/5        ISL2               connected trunk     full    100G    QSFP-100G-CR4
Eth1/6        ISL3               connected trunk     full    100G    QSFP-100G-CR4
Eth1/7        ISL4               connected trunk     full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           connected 3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           connected 3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           connected 3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           connected 3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           connected 3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           connected 3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           connected 3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           connected 3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           connected 3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           connected 3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           connected 3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           connected 3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
Eth1/30       ISL5               connected trunk     full    100G    QSFP-100G-CR4
Eth1/31       ISL6               connected trunk     full    100G    QSFP-100G-CR4
Eth1/32       ISL7               connected trunk     full    100G    QSFP-100G-CR4
Eth1/33       ISL8               connected trunk     full    100G    QSFP-100G-CR4
Eth1/34       --                 xcvrAbsen 1         auto    auto    --
Eth1/35       --                 xcvrAbsen 1         auto    auto    --
Eth1/36       --                 xcvrAbsen 1         auto    auto    --
Po100         --                 connected trunk     full    100G    --
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
-
- Start the database server ports on the lower leaf switches (R1LL and R2LL).
-
On R1LL:
R1LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1LL(config)# interface ethernet 1/15-22
R1LL(config-if-range)# no shut
R1LL(config-if-range)# exit
R1LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1LL(config)# <Ctrl-Z>
R1LL#
-
Repeat the command sequence on R2LL:
R2LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2LL(config)# interface ethernet 1/15-22
R2LL(config-if-range)# no shut
R2LL(config-if-range)# exit
R2LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2LL(config)# <Ctrl-Z>
R2LL#
-
- Wait for 5 minutes to ensure that the ports you just started are fully operational before continuing.
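The lower leaf ports are shut down as one range (1/1-36) and then restarted in three stages: inter-switch ports, storage server ports, and database server ports. The following Python sketch checks that the three restart ranges used in this step cover ports 1 to 36 exactly once, and renders the matching interface command. The names `BRING_UP_ORDER`, `expand`, `coverage_gaps`, and `interface_command` are hypothetical; the ranges are copied from the commands in this step.

```python
# Restart stages and their port ranges, in the order used in this step.
BRING_UP_ORDER = [
    ("inter-switch", [(1, 7), (30, 36)]),
    ("storage server", [(8, 14), (23, 29)]),
    ("database server", [(15, 22)]),
]


def expand(ranges):
    """Expand inclusive (first, last) pairs into a flat list of port numbers."""
    ports = []
    for first, last in ranges:
        ports.extend(range(first, last + 1))
    return ports


def coverage_gaps(order, first=1, last=36):
    """Return (missing, duplicated) ports across all stages."""
    seen = []
    for _, ranges in order:
        seen.extend(expand(ranges))
    duplicated = sorted({port for port in seen if seen.count(port) > 1})
    missing = sorted(set(range(first, last + 1)) - set(seen))
    return missing, duplicated


def interface_command(ranges):
    """Render the NX-OS interface range command for the given port ranges."""
    return "interface " + ", ".join(f"ethernet 1/{a}-{b}" for a, b in ranges)
```

Here `coverage_gaps(BRING_UP_ORDER)` returns two empty lists, which confirms that no port shut down at the start of this step is left out of the staged restart or started twice.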
- Verify the status of the database server ports on the lower leaf switches (R1LL and R2LL).
Run the show interface status command on each lower leaf switch:
-
R1LL# show interface status
-
R2LL# show interface status
Examine the output to ensure that the database server ports are connected.
For example:
R1LL# show interface status -------------------------------------------------------------------------------- Port Name Status Vlan Duplex Speed Type -------------------------------------------------------------------------------- mgmt0 -- connected routed full 1000 -- -------------------------------------------------------------------------------- Port Name Status Vlan Duplex Speed Type -------------------------------------------------------------------------------- Eth1/1 -- xcvrAbsen 1 auto auto -- Eth1/2 -- xcvrAbsen 1 auto auto -- Eth1/3 -- xcvrAbsen 1 auto auto -- Eth1/4 ISL1 connected trunk full 100G QSFP-100G-CR4 Eth1/5 ISL2 connected trunk full 100G QSFP-100G-CR4 Eth1/6 ISL3 connected trunk full 100G QSFP-100G-CR4 Eth1/7 ISL4 connected trunk full 100G QSFP-100G-CR4 Eth1/8 celadm14 connected 3888 full 100G QSFP-100G-CR4 Eth1/9 celadm13 connected 3888 full 100G QSFP-100G-CR4 Eth1/10 celadm12 connected 3888 full 100G QSFP-100G-CR4 Eth1/11 celadm11 connected 3888 full 100G QSFP-100G-CR4 Eth1/12 celadm10 connected 3888 full 100G QSFP-100G-CR4 Eth1/13 celadm09 connected 3888 full 100G QSFP-100G-CR4 Eth1/14 celadm08 connected 3888 full 100G QSFP-100G-CR4 Eth1/15 adm08 connected 3888 full 100G QSFP-100G-CR4 Eth1/16 adm07 connected 3888 full 100G QSFP-100G-CR4 Eth1/17 adm06 connected 3888 full 100G QSFP-100G-CR4 Eth1/18 adm05 connected 3888 full 100G QSFP-100G-CR4 Eth1/19 adm04 connected 3888 full 100G QSFP-100G-CR4 Eth1/20 adm03 connected 3888 full 100G QSFP-100G-CR4 Eth1/21 adm02 connected 3888 full 100G QSFP-100G-CR4 Eth1/22 adm01 connected 3888 full 100G QSFP-100G-CR4 Eth1/23 celadm07 connected 3888 full 100G QSFP-100G-CR4 Eth1/24 celadm06 connected 3888 full 100G QSFP-100G-CR4 Eth1/25 celadm05 connected 3888 full 100G QSFP-100G-CR4 Eth1/26 celadm04 connected 3888 full 100G QSFP-100G-CR4 Eth1/27 celadm03 connected 3888 full 100G QSFP-100G-CR4 Eth1/28 celadm02 connected 3888 full 100G QSFP-100G-CR4 Eth1/29 celadm01 connected 3888 full 100G QSFP-100G-CR4 Eth1/30 
ISL5 connected trunk full 100G QSFP-100G-CR4 Eth1/31 ISL6 connected trunk full 100G QSFP-100G-CR4 Eth1/32 ISL7 connected trunk full 100G QSFP-100G-CR4 Eth1/33 ISL8 connected trunk full 100G QSFP-100G-CR4 Eth1/34 -- xcvrAbsen 1 auto auto -- Eth1/35 -- xcvrAbsen 1 auto auto -- Eth1/36 -- xcvrAbsen 1 auto auto -- Po100 -- connected trunk full 100G -- Lo0 Routing loopback i connected routed auto auto -- Lo1 VTEP loopback inte connected routed auto auto -- Vlan1 -- down routed auto auto -- nve1 -- connected -- auto auto --
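If you save the show interface status output to a file, a short script can confirm the port states rather than relying on a visual scan of each row. The following Python sketch is illustrative only and is not part of the Oracle tooling. The function names (port_status and not_connected) are hypothetical, and the sketch assumes the row layout shown in the example output for this step.

```python
import re

# Matches Ethernet rows such as:
#   Eth1/15  adm08  connected  3888  full  100G  QSFP-100G-CR4
# Assumption: the Name column of Ethernet ports is a single token.
ROW = re.compile(r"^(Eth\d+/\d+)\s+(\S+)\s+(\S+)\s")


def port_status(output: str) -> dict:
    """Map each Ethernet port name to the value in its Status column."""
    status = {}
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            status[match.group(1)] = match.group(3)
    return status


def not_connected(output: str, first: int, last: int) -> list:
    """Return ports Eth1/first..Eth1/last whose status is not 'connected'."""
    status = port_status(output)
    wanted = [f"Eth1/{n}" for n in range(first, last + 1)]
    return [port for port in wanted if status.get(port) != "connected"]


# Shortened sample; in practice, pass the full captured output.
sample = """\
Eth1/14 celadm08 connected 3888 full 100G QSFP-100G-CR4
Eth1/15 adm08 connected 3888 full 100G QSFP-100G-CR4
Eth1/16 adm07 disabled 3888 full 100G QSFP-100G-CR4
"""
print(not_connected(sample, 15, 16))
```

For the database server ports in this step, you would call not_connected with the range 15 to 22 and expect an empty list. A port that is missing from the output is also reported, because its status cannot be confirmed.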
-
Note:
Before proceeding, ensure that you have completed all of the actions in step 3 on both lower leaf switches (R1LL and R2LL). If not, then ensure that you go back and perform the missing actions.
- Perform the first round of configuration on the upper leaf switches (R1UL and R2UL).
Perform this step on the upper leaf switches (R1UL and R2UL) only.
Note:
This step begins by shutting down the upper leaf switch ports. While the R1UL ports are down, R1LL exclusively supports the RoCE Network Fabric on the existing rack. During this time, there is no redundancy in the RoCE Network Fabric, and availability cannot be maintained if R1LL goes down.
- Shut down the upper leaf switch ports (R1UL and R2UL).
-
On R1UL:
R1UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1UL(config)# interface ethernet 1/1-36
R1UL(config-if-range)# shut
R1UL(config-if-range)# exit
R1UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1UL(config)# <Ctrl-Z>
R1UL#
-
Repeat the command sequence on R2UL:
R2UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2UL(config)# interface ethernet 1/1-36
R2UL(config-if-range)# shut
R2UL(config-if-range)# exit
R2UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2UL(config)# <Ctrl-Z>
R2UL#
-
- On both racks, remove the inter-switch links between the leaf switches (R1LL to R1UL, and R2LL to R2UL).
On every leaf switch, remove the cables for the inter-switch links:
-
On R1LL, disconnect the inter-switch links from ports 04, 05, 06, 07, 30, 31, 32, and 33.
-
On R1UL, disconnect the inter-switch links from ports 04, 05, 06, 07, 30, 31, 32, and 33.
-
On R2LL, disconnect the inter-switch links from ports 04, 05, 06, 07, 30, 31, 32, and 33.
-
On R2UL, disconnect the inter-switch links from ports 04, 05, 06, 07, 30, 31, 32, and 33.
-
- On both racks, cable the upper leaf switch to both of the spine switches (R1UL and R2UL to R1SS and R2SS).
Connect the cables from the spine switches that you prepared earlier (in step 2.d).
Cable the switches as described in Two-Rack Cabling for RA21 and Later Model Racks:
-
On R1UL, cable ports 01, 02, 03, 04, 05, 06, 07, 30, 31, 32, 33, 34, 35, and 36 to R1SS and R2SS.
-
On R2UL, cable ports 01, 02, 03, 04, 05, 06, 07, 30, 31, 32, 33, 34, 35, and 36 to R1SS and R2SS.
Note:
Ensure that each cable connects to the correct switch and port at both ends. In addition to physically checking each connection, you can run the show lldp neighbors command on each network switch and examine the output to confirm correct connections. You can individually check each cable connection to catch and correct errors quickly.
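The comparison between the show lldp neighbors output and the intended cabling can also be scripted. The following Python sketch is a hedged illustration, not an Oracle-supplied tool. It assumes that each neighbor row lists the device ID, local interface, hold time, capability, and remote port ID in that order. The function names and the spine port numbers in the sample are hypothetical; take the real port pairs from Two-Rack Cabling for RA21 and Later Model Racks.

```python
def parse_lldp(output: str) -> dict:
    """Map local interface -> (neighbor device, neighbor port).

    Assumed row layout: Device ID, Local Intf, Hold-time, Capability, Port ID.
    """
    links = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[1].startswith("Eth"):
            links[fields[1]] = (fields[0], fields[-1])
    return links


def miscabled(output: str, expected: dict) -> dict:
    """Return {local port: what was actually seen} for every mismatch."""
    actual = parse_lldp(output)
    return {
        port: actual.get(port)
        for port, want in expected.items()
        if actual.get(port) != want
    }


# Hypothetical sample: the remote port numbers are placeholders only.
sample = """\
Device ID  Local Intf  Hold-time  Capability  Port ID
R1SS       Eth1/1      120        BR          Ethernet1/5
R2SS       Eth1/2      120        BR          Ethernet1/7
"""
expected = {
    "Eth1/1": ("R1SS", "Ethernet1/5"),
    "Eth1/2": ("R1SS", "Ethernet1/7"),
}
print(miscabled(sample, expected))
```

In this sample, the second cable is reported because it lands on R2SS where R1SS was expected. A port with no neighbor entry is reported with the value None. Note that LLDP neighbors appear only on ports that are up, so this check is most useful after the relevant ports have been started.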
- Reconfigure the upper leaf switch ports (R1UL and R2UL).
For each switch, you must use the correct corresponding switch configuration file, which you earlier copied to the switch (in step 1.g):
-
On R1UL, the switch configuration file name must end with step4_R1_UL.cfg:
R1UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1UL(config)# run-script bootflash:///roce_multi_14uplinks_online_step4_R1_UL.cfg | grep 'none'
R1UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1UL(config)# <Ctrl-Z>
R1UL#
-
On R2UL, the switch configuration file name must end with step4_R2_UL.cfg:
R2UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2UL(config)# run-script bootflash:///roce_multi_14uplinks_online_step4_R2_UL.cfg | grep 'none'
R2UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2UL(config)# <Ctrl-Z>
R2UL#
Note:
This step can take approximately 5 to 8 minutes on each switch.
-
- Check the status of the RoCE Network Fabric ports on the upper leaf switches (R1UL and R2UL).
Run the show interface status command on each upper leaf switch:
-
R1UL# show interface status
-
R2UL# show interface status
Examine the output to ensure that all of the cabled ports are disabled.
For example:
R1UL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        RouterPort1        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/2        RouterPort2        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/3        RouterPort3        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/4        RouterPort4        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/5        RouterPort5        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/6        RouterPort6        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/7        RouterPort7        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/8        celadm14           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/30       RouterPort8        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/31       RouterPort9        disabled  routed    full    100G    QSFP-100G-CR4
Eth1/32       RouterPort10       disabled  routed    full    100G    QSFP-100G-CR4
Eth1/33       RouterPort11       disabled  routed    full    100G    QSFP-100G-CR4
Eth1/34       RouterPort12       disabled  routed    full    100G    QSFP-100G-CR4
Eth1/35       RouterPort13       disabled  routed    full    100G    QSFP-100G-CR4
Eth1/36       RouterPort14       disabled  routed    full    100G    QSFP-100G-CR4
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
Note:
Before proceeding, ensure that you have completed all of the actions to this point in step 4 on both upper leaf switches (R1UL and R2UL). If not, then ensure that you go back and perform the missing actions.
-
- Verify the configuration of the upper leaf switches.
You can use the instance of patchmgr that you previously used to update the switch firmware (in step 1.h).
Use a switch list file (ul.lst) to check both upper leaf switches using one patchmgr command:
# cat ul.lst
R1UL_IP:mleaf_u14.102
R2UL_IP:mleaf_u14.104
On a system with Secure Fabric enabled, use the msfleaf_u14 tag in the switch list file:
# cat ul.lst
R1UL_IP:msfleaf_u14.102
R2UL_IP:msfleaf_u14.104
The following shows the recommended command and an example of the expected results:
# ./patchmgr --roceswitches ul.lst --verify-config -log_dir /tmp/log
2020-08-10 13:40:09 -0700 :Working: Initiating config verification... Expect up to 6 minutes for each switch
Mon Aug 10 13:40:13 PDT 2020 1 of 4 :Verifying config on switch ...
...
Mon Aug 10 13:40:32 PDT 2020: [INFO ] Config matches template: ...
Mon Aug 10 13:40:32 PDT 2020: [SUCCESS ] Config validation successful!
2020-08-10 13:40:32 -0700 Config check on RoCE switch(es)
2020-08-10 13:40:32 -0700 Completed run of command: ./patchmgr --roceswitches ul.lst --verify-config -log_dir /tmp/log
2020-08-10 13:40:32 -0700 :INFO : config attempted on nodes in file ul.lst: [R1UL_IP R2UL_IP]
2020-08-10 13:40:32 -0700 :INFO : For details, check the following files in /tmp/log:
2020-08-10 13:40:32 -0700 :INFO : - updateRoceSwitch.log
2020-08-10 13:40:32 -0700 :INFO : - updateRoceSwitch.trc
2020-08-10 13:40:32 -0700 :INFO : - patchmgr.stdout
2020-08-10 13:40:32 -0700 :INFO : - patchmgr.stderr
2020-08-10 13:40:32 -0700 :INFO : - patchmgr.log
2020-08-10 13:40:32 -0700 :INFO : - patchmgr.trc
2020-08-10 13:40:32 -0700 :INFO : Exit status:0
2020-08-10 13:40:32 -0700 :INFO : Exiting.
In the command output, verify that the switch configuration is good for both upper leaf switches. You can ignore messages about the ports that are down.
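If you redirect the patchmgr output to a file, you can check it for the success markers shown in the example. The following Python sketch is a minimal illustration and is not an Oracle-supplied tool. The function name verify_ok is hypothetical, and the check relies only on the two strings that appear in the example output (Config validation successful! and Exit status:0). It does not attempt to recognize failure messages, which are not shown in this procedure, so review the log files as well.

```python
import re


def verify_ok(log_text: str) -> bool:
    """Return True when the captured output shows a successful verification.

    Requires at least one 'Config validation successful!' line and
    requires every reported 'Exit status:' value to be 0.
    """
    exit_codes = re.findall(r"Exit status:\s*(\d+)", log_text)
    has_success = "Config validation successful!" in log_text
    return has_success and bool(exit_codes) and all(c == "0" for c in exit_codes)


# Lines copied from the example output in this step.
sample = """\
Mon Aug 10 13:40:32 PDT 2020: [SUCCESS ] Config validation successful!
2020-08-10 13:40:32 -0700 :INFO : Exit status:0
2020-08-10 13:40:32 -0700 :INFO : Exiting.
"""
print(verify_ok(sample))
```

A result of False means only that the expected markers were not both found. It does not identify which switch failed.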
- Finalize the configuration of the lower leaf switches (R1LL and R2LL).
Perform this step on the lower leaf switches (R1LL and R2LL) only.
- Reconfigure the lower leaf switch ports (R1LL and R2LL).
Run the following command sequence on both of the lower leaf switches (R1LL and R2LL).
You must use the correct switch configuration file, which you earlier copied to the switch (in step 1.g). In this step, the configuration file name must end with step5.cfg.
-
On R1LL:
R1LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1LL(config)# run-script bootflash:///roce_multi_14uplinks_online_step5.cfg | grep 'none'
R1LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1LL(config)# <Ctrl-Z>
R1LL#
-
Repeat the command sequence on R2LL:
R2LL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2LL(config)# run-script bootflash:///roce_multi_14uplinks_online_step5.cfg | grep 'none'
R2LL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2LL(config)# <Ctrl-Z>
R2LL#
Note:
This step can take approximately 5 to 8 minutes on each switch.
-
- On both racks, cable the lower leaf switch to both of the spine switches (R1LL and R2LL to R1SS and R2SS).
Connect the cables from the spine switches that you prepared earlier (in step 2.d).
Cable the switches as described in Two-Rack Cabling for RA21 and Later Model Racks:
-
On R1LL, cable ports 01, 02, 03, 04, 05, 06, 07, 30, 31, 32, 33, 34, 35, and 36 to R1SS and R2SS.
-
On R2LL, cable ports 01, 02, 03, 04, 05, 06, 07, 30, 31, 32, 33, 34, 35, and 36 to R1SS and R2SS.
Note:
Ensure that each cable connects to the correct switch and port at both ends. In addition to physically checking each connection, you can run the show lldp neighbors command on each network switch and examine the output to confirm correct connections. You can individually check each cable connection to catch and correct errors quickly.
- On the lower leaf switches, verify that all of the cabled RoCE Network Fabric ports are connected (R1LL and R2LL).
Run the show interface status command on each lower leaf switch:
-
R1LL# show interface status
-
R2LL# show interface status
Examine the output to ensure that all of the cabled ports are connected.
For example:
R1LL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
Eth1/2        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
Eth1/3        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
Eth1/4        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-CR4
Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           connected 3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           connected 3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           connected 3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           connected 3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           connected 3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           connected 3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              connected 3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              connected 3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              connected 3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              connected 3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              connected 3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              connected 3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              connected 3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              connected 3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           connected 3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           connected 3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           connected 3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           connected 3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           connected 3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           connected 3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
Eth1/30       RouterPort8        connected routed    full    100G    QSFP-100G-CR4
Eth1/31       RouterPort9        connected routed    full    100G    QSFP-100G-CR4
Eth1/32       RouterPort10       connected routed    full    100G    QSFP-100G-CR4
Eth1/33       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
Eth1/34       RouterPort12       connected routed    full    100G    QSFP-100G-CR4
Eth1/35       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
Eth1/36       RouterPort14       connected routed    full    100G    QSFP-100G-CR4
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
Note:
Before proceeding, ensure that you have completed all of the actions to this point in step 5 on both lower leaf switches (R1LL and R2LL). If not, then ensure that you go back and perform the missing actions.
-
- Verify the configuration of the lower leaf switches.
You can use the instance of patchmgr that you previously used to update the switch firmware (in step 1.h).
Use a switch list file (ll.lst) to check both lower leaf switches using one patchmgr command:
# cat ll.lst
R1LL_IP:mleaf_u14.101
R2LL_IP:mleaf_u14.103
On a system with Secure Fabric enabled, use the msfleaf_u14 tag in the switch list file:
# cat ll.lst
R1LL_IP:msfleaf_u14.101
R2LL_IP:msfleaf_u14.103
The following shows the recommended command and an example of the expected results:
# ./patchmgr --roceswitches ll.lst --verify-config -log_dir /tmp/log
2020-08-10 13:45:09 -0700 :Working: Initiating config verification... Expect up to 6 minutes for each switch
Mon Aug 10 13:45:13 PDT 2020 1 of 4 :Verifying config on switch ...
...
Mon Aug 10 13:45:32 PDT 2020: [INFO ] Config matches template: ...
Mon Aug 10 13:45:32 PDT 2020: [SUCCESS ] Config validation successful!
2020-08-10 13:45:32 -0700 Config check on RoCE switch(es)
2020-08-10 13:45:32 -0700 Completed run of command: ./patchmgr --roceswitches ll.lst --verify-config -log_dir /tmp/log
2020-08-10 13:45:32 -0700 :INFO : config attempted on nodes in file ll.lst: [R1LL_IP R2LL_IP]
2020-08-10 13:45:32 -0700 :INFO : For details, check the following files in /tmp/log:
2020-08-10 13:45:32 -0700 :INFO : - updateRoceSwitch.log
2020-08-10 13:45:32 -0700 :INFO : - updateRoceSwitch.trc
2020-08-10 13:45:32 -0700 :INFO : - patchmgr.stdout
2020-08-10 13:45:32 -0700 :INFO : - patchmgr.stderr
2020-08-10 13:45:32 -0700 :INFO : - patchmgr.log
2020-08-10 13:45:32 -0700 :INFO : - patchmgr.trc
2020-08-10 13:45:32 -0700 :INFO : Exit status:0
2020-08-10 13:45:32 -0700 :INFO : Exiting.
In the command output, verify that the switch configuration is good for both lower leaf switches.
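The switch list files used for verification (ul.lst and ll.lst) follow one pattern: the switch address, a colon, the tag (mleaf_u14, or msfleaf_u14 with Secure Fabric), a period, and a numeric suffix (101 through 104 in this procedure). The following Python sketch shows how such a file could be generated consistently. It is an illustration under those assumptions, and the function name switch_list is hypothetical.

```python
def switch_list(switches, secure_fabric=False):
    """Render switch list lines in the form <address>:<tag>.<suffix>.

    switches: iterable of (address, suffix) pairs, for example
              ("R1LL_IP", 101). The suffix values come from this procedure.
    secure_fabric: selects the msfleaf_u14 tag instead of mleaf_u14.
    """
    tag = "msfleaf_u14" if secure_fabric else "mleaf_u14"
    return "".join(f"{address}:{tag}.{suffix}\n" for address, suffix in switches)


# Placeholders R1LL_IP and R2LL_IP stand for the real switch addresses.
lower_leaves = [("R1LL_IP", 101), ("R2LL_IP", 103)]
print(switch_list(lower_leaves), end="")
print(switch_list(lower_leaves, secure_fabric=True), end="")
```

Generating both variants from the same input avoids mismatched suffixes between the standard and Secure Fabric versions of the file.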
- Verify that nve is up on the lower leaf switches (R1LL and R2LL).
Run the following command on each lower leaf switch and examine the output:
-
R1LL# show nve peers
-
R2LL# show nve peers
At this point, you should see one nve peer with State=Up.
For example:
R1LL# show nve peers
Interface Peer-IP         State LearnType Uptime   Router-Mac
--------- --------------- ----- --------- -------- -----------------
nve1      100.64.1.103    Up    CP        00:04:29 n/a
-
- Verify that BGP is up on the lower leaf switches (R1LL and R2LL).
Run the following command on each lower leaf switch and examine the output:
-
R1LL# show logging log | grep BGP
-
R2LL# show logging log | grep BGP
Look for two entries with Up in the rightmost column that are associated with different IP addresses.
For example:
R1LL# show logging log | grep BGP
2020 Aug 10 13:47:13 R1LL %BGP-5-ADJCHANGE: bgp- [29342] (default) neighbor 100.64.0.201 Up
2020 Aug 10 13:47:24 R1LL %BGP-5-ADJCHANGE: bgp- [29342] (default) neighbor 100.64.0.202 Up
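Because the log can contain several adjacency changes for the same neighbor, it helps to consider only the most recent entry for each IP address. The following Python sketch illustrates that check. It is not an Oracle-supplied tool, the function name neighbors_up is hypothetical, and it assumes that entries reporting a lost adjacency use the same "neighbor <address> <state>" shape with the state Down.

```python
import re

# Matches the adjacency-change entries shown in the example, such as:
#   ... %BGP-5-ADJCHANGE: bgp- [29342] (default) neighbor 100.64.0.201 Up
# Assumption: a lost adjacency is logged the same way with "Down".
ADJCHANGE = re.compile(r"%BGP-5-ADJCHANGE:.*neighbor (\S+) (Up|Down)\b")


def neighbors_up(log_text: str) -> list:
    """Return the sorted neighbor addresses whose latest logged state is Up."""
    latest = {}
    for line in log_text.splitlines():
        match = ADJCHANGE.search(line)
        if match:
            latest[match.group(1)] = match.group(2)  # later lines overwrite earlier ones
    return sorted(addr for addr, state in latest.items() if state == "Up")


sample = """\
2020 Aug 10 13:47:13 R1LL %BGP-5-ADJCHANGE: bgp- [29342] (default) neighbor 100.64.0.201 Up
2020 Aug 10 13:47:24 R1LL %BGP-5-ADJCHANGE: bgp- [29342] (default) neighbor 100.64.0.202 Up
"""
print(neighbors_up(sample))
```

For this step, the returned list should contain two different addresses.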
-
- Finalize the configuration of the upper leaf switches (R1UL and R2UL).
Perform this step on the upper leaf switches (R1UL and R2UL) only.
- Start the inter-switch ports on the upper leaf switches (R1UL and R2UL).
-
On R1UL:
R1UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1UL(config)# interface ethernet 1/1-7, ethernet 1/30-36
R1UL(config-if-range)# no shut
R1UL(config-if-range)# exit
R1UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1UL(config)# <Ctrl-Z>
R1UL#
-
Repeat the command sequence on R2UL:
R2UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2UL(config)# interface ethernet 1/1-7, ethernet 1/30-36
R2UL(config-if-range)# no shut
R2UL(config-if-range)# exit
R2UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2UL(config)# <Ctrl-Z>
R2UL#
-
- Wait for 5 minutes to ensure that the ports you just started are fully operational before continuing.
- Verify the status of the inter-switch ports on the upper leaf switches (R1UL and R2UL).
Run the show interface status command on each upper leaf switch:
-
R1UL# show interface status
-
R2UL# show interface status
Examine the output to ensure that the inter-switch ports are connected.
For example:
R1UL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
Eth1/2        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
Eth1/3        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
Eth1/4        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-CR4
Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
Eth1/8        celadm14           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           disabled  3888      full    100G    QSFP-100G-CR4
Eth1/30       RouterPort8        connected routed    full    100G    QSFP-100G-CR4
Eth1/31       RouterPort9        connected routed    full    100G    QSFP-100G-CR4
Eth1/32       RouterPort10       connected routed    full    100G    QSFP-100G-CR4
Eth1/33       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
Eth1/34       RouterPort12       connected routed    full    100G    QSFP-100G-CR4
Eth1/35       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
Eth1/36       RouterPort14       connected routed    full    100G    QSFP-100G-CR4
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
-
- Start the storage server ports on the upper leaf switches (R1UL and R2UL).
-
On R1UL:
R1UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1UL(config)# interface ethernet 1/8-14, ethernet 1/23-29
R1UL(config-if-range)# no shut
R1UL(config-if-range)# exit
R1UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1UL(config)# <Ctrl-Z>
R1UL#
-
Repeat the command sequence on R2UL:
R2UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2UL(config)# interface ethernet 1/8-14, ethernet 1/23-29
R2UL(config-if-range)# no shut
R2UL(config-if-range)# exit
R2UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2UL(config)# <Ctrl-Z>
R2UL#
-
- Wait for 5 minutes to ensure that the ports you just started are fully operational before continuing.
- Verify the status of the storage server ports on the upper leaf switches (R1UL and R2UL).
Run the show interface status command on each upper leaf switch:
-
R1UL# show interface status
-
R2UL# show interface status
Examine the output to ensure that the storage server ports are connected.
For example:
R1UL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
Eth1/2        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
Eth1/3        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
Eth1/4        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-CR4
Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           connected 3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           connected 3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           connected 3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           connected 3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           connected 3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           connected 3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              disabled  3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           connected 3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           connected 3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           connected 3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           connected 3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           connected 3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           connected 3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
Eth1/30       RouterPort8        connected routed    full    100G    QSFP-100G-CR4
Eth1/31       RouterPort9        connected routed    full    100G    QSFP-100G-CR4
Eth1/32       RouterPort10       connected routed    full    100G    QSFP-100G-CR4
Eth1/33       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
Eth1/34       RouterPort12       connected routed    full    100G    QSFP-100G-CR4
Eth1/35       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
Eth1/36       RouterPort14       connected routed    full    100G    QSFP-100G-CR4
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
-
- Start the database server ports on the upper leaf switches (R1UL and R2UL).
-
On R1UL:
R1UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R1UL(config)# interface ethernet 1/15-22
R1UL(config-if-range)# no shut
R1UL(config-if-range)# exit
R1UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R1UL(config)# <Ctrl-Z>
R1UL#
-
Repeat the command sequence on R2UL:
R2UL# configure terminal
Enter configuration commands, one per line. End with CNTL/Z.
R2UL(config)# interface ethernet 1/15-22
R2UL(config-if-range)# no shut
R2UL(config-if-range)# exit
R2UL(config)# copy running-config startup-config
[########################################] 100%
Copy complete, now saving to disk (please wait)...
Copy complete
R2UL(config)# <Ctrl-Z>
R2UL#
-
- Wait for 5 minutes to ensure that the ports you just started are fully operational before continuing.
- Verify the status of the database server ports on the upper leaf switches (R1UL and R2UL).
Run the show interface status command on each upper leaf switch:
-
R1UL# show interface status
-
R2UL# show interface status
Examine the output to ensure that the database server ports are connected.
For example:
R1UL# show interface status
--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
mgmt0         --                 connected routed    full    1000    --

--------------------------------------------------------------------------------
Port          Name               Status    Vlan      Duplex  Speed   Type
--------------------------------------------------------------------------------
Eth1/1        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
Eth1/2        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
Eth1/3        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
Eth1/4        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-CR4
Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
Eth1/9        celadm13           connected 3888      full    100G    QSFP-100G-CR4
Eth1/10       celadm12           connected 3888      full    100G    QSFP-100G-CR4
Eth1/11       celadm11           connected 3888      full    100G    QSFP-100G-CR4
Eth1/12       celadm10           connected 3888      full    100G    QSFP-100G-CR4
Eth1/13       celadm09           connected 3888      full    100G    QSFP-100G-CR4
Eth1/14       celadm08           connected 3888      full    100G    QSFP-100G-CR4
Eth1/15       adm08              connected 3888      full    100G    QSFP-100G-CR4
Eth1/16       adm07              connected 3888      full    100G    QSFP-100G-CR4
Eth1/17       adm06              connected 3888      full    100G    QSFP-100G-CR4
Eth1/18       adm05              connected 3888      full    100G    QSFP-100G-CR4
Eth1/19       adm04              connected 3888      full    100G    QSFP-100G-CR4
Eth1/20       adm03              connected 3888      full    100G    QSFP-100G-CR4
Eth1/21       adm02              connected 3888      full    100G    QSFP-100G-CR4
Eth1/22       adm01              connected 3888      full    100G    QSFP-100G-CR4
Eth1/23       celadm07           connected 3888      full    100G    QSFP-100G-CR4
Eth1/24       celadm06           connected 3888      full    100G    QSFP-100G-CR4
Eth1/25       celadm05           connected 3888      full    100G    QSFP-100G-CR4
Eth1/26       celadm04           connected 3888      full    100G    QSFP-100G-CR4
Eth1/27       celadm03           connected 3888      full    100G    QSFP-100G-CR4
Eth1/28       celadm02           connected 3888      full    100G    QSFP-100G-CR4
Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
Eth1/30       RouterPort8        connected routed    full    100G    QSFP-100G-CR4
Eth1/31       RouterPort9        connected routed    full    100G    QSFP-100G-CR4
Eth1/32       RouterPort10       connected routed    full    100G    QSFP-100G-CR4
Eth1/33       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
Eth1/34       RouterPort12       connected routed    full    100G    QSFP-100G-CR4
Eth1/35       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
Eth1/36       RouterPort14       connected routed    full    100G    QSFP-100G-CR4
Lo0           Routing loopback i connected routed    auto    auto    --
Lo1           VTEP loopback inte connected routed    auto    auto    --
Vlan1         --                 down      routed    auto    auto    --
nve1          --                 connected --        auto    auto    --
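The commands in this step start the upper leaf ports in three groups: inter-switch ports (1/1-7 and 1/30-36), storage server ports (1/8-14 and 1/23-29), and database server ports (1/15-22). The following Python sketch records that grouping in one place and derives the interface range arguments from it. It is an illustration only; the names PORT_GROUPS, range_spec, ports_in, and group_of are hypothetical, and the ranges are taken from the interface ethernet commands in this procedure.

```python
# Port groups as used by the "interface ethernet ..." commands in this step.
PORT_GROUPS = {
    "inter-switch": [(1, 7), (30, 36)],
    "storage": [(8, 14), (23, 29)],
    "database": [(15, 22)],
}


def range_spec(group: str) -> str:
    """Build the argument that follows 'interface' for a port group."""
    return ", ".join(f"ethernet 1/{lo}-{hi}" for lo, hi in PORT_GROUPS[group])


def ports_in(group: str) -> list:
    """List the port names (Eth1/N) that belong to a group."""
    return [f"Eth1/{n}" for lo, hi in PORT_GROUPS[group] for n in range(lo, hi + 1)]


def group_of(port_number: int):
    """Return the group that contains a port number, or None if no group does."""
    for group, spans in PORT_GROUPS.items():
        if any(lo <= port_number <= hi for lo, hi in spans):
            return group
    return None


for name in PORT_GROUPS:
    print(f"{name}: interface {range_spec(name)} ({len(ports_in(name))} ports)")
```

Together the three groups cover all 36 ports on the switch, which offers a quick consistency check that no port was left out of a range.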
-
- Verify that nve is up on the leaf switches (R1LL, R1UL, R2LL, and R2UL).
Run the following command on each leaf switch and examine the output:
-
R1LL# show nve peers
-
R1UL# show nve peers
-
R2LL# show nve peers
-
R2UL# show nve peers
In the output, you should see three nve peers with State=Up. For example:
R1UL# show nve peers
Interface Peer-IP          State LearnType Uptime   Router-Mac
--------- ---------------  ----- --------- -------- -----------------
nve1      100.64.1.101     Up    CP        00:04:29 n/a
nve1      100.64.1.103     Up    CP        00:07:48 n/a
nve1      100.64.1.104     Up    CP        00:04:10 n/a
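When the check is repeated on four leaf switches, it can help to apply it mechanically to captured output. The following is a minimal Python sketch, not part of the Oracle tooling. It assumes that you have saved the text of show nve peers from each switch, and that peer rows look like the example above (interface, peer IP address, then state). The function names are hypothetical.

```python
import re

# Matches one peer row of `show nve peers`, for example:
#   nve1      100.64.1.101     Up    CP        00:04:29 n/a
# Assumption: the columns are interface, peer IP, and state, in that order.
PEER_ROW = re.compile(
    r"^\s*(nve\d+)\s+(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)\s", re.MULTILINE
)


def nve_peers_up(output):
    """Return the peer IP addresses whose State column reads 'Up'."""
    return [ip for _iface, ip, state in PEER_ROW.findall(output) if state == "Up"]


def nve_ok(output, expected=3):
    """True when exactly `expected` peers are listed with State=Up."""
    return len(nve_peers_up(output)) == expected
```

The default of three peers reflects the two-rack case described in this step, where each leaf switch peers with the other three leaf switches.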
-
- Verify that BGP is up on the upper leaf switches (R1UL and R2UL).
Run the following command on each upper leaf switch and examine the output:
-
R1UL# show logging log | grep BGP
-
R2UL# show logging log | grep BGP
In the output, look for two entries with Up in the rightmost column that are associated with different IP addresses. For example:
R1UL# show logging log | grep BGP
2020 Aug 10 13:57:13 R1UL %BGP-5-ADJCHANGE: bgp- [32782] (default) neighbor 100.64.0.201 Up
2020 Aug 10 13:57:24 R1UL %BGP-5-ADJCHANGE: bgp- [32782] (default) neighbor 100.64.0.202 Up
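Because the log can contain several adjacency changes for the same neighbor, the entry that matters is the most recent one for each IP address. The following minimal Python sketch illustrates that reading of the output. It is not part of the Oracle tooling, and it assumes that the captured log is in chronological order and that each ADJCHANGE line ends with the neighbor address followed by its state, as in the example above. The function names are hypothetical.

```python
import re

# Captures the neighbor address and the word that follows it (the state).
# Assumption: lines follow the "%BGP-5-ADJCHANGE: ... neighbor <ip> <state>" shape.
ADJCHANGE = re.compile(r"%BGP-5-ADJCHANGE:.*\bneighbor\s+(\S+)\s+(\S+)")


def bgp_neighbor_states(log_text):
    """Map each neighbor IP to the state in its most recent ADJCHANGE entry."""
    states = {}
    for line in log_text.splitlines():
        match = ADJCHANGE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last state wins.
            states[match.group(1)] = match.group(2)
    return states


def bgp_ok(log_text, expected=2):
    """True when at least `expected` distinct neighbors are currently Up."""
    states = bgp_neighbor_states(log_text)
    return sum(1 for state in states.values() if state == "Up") >= expected
```

Any state other than Up is treated as not established, so the sketch does not depend on the exact wording the switch uses for a lost session.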
-
- Start the inter-switch ports on the upper leaf switches (R1UL and R2UL).
- For each rack (R1 and R2), confirm the multi-rack cabling by running the verify_roce_cables.py script.
The verify_roce_cables.py script uses two input files: one for database servers and storage servers (nodes.rackN), and another for switches (switches.rackN). In each file, list every server or switch on a separate line. Use fully qualified domain names or IP addresses for each server and switch.
See My Oracle Support document 2587717.1 for download and detailed usage instructions.
Run the verify_roce_cables.py script against both of the racks:
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack2 -s switches.rack2
Check the output of the verify_roce_cables.py script against the tables in Two-Rack Cabling for RA21 and Later Model Racks. Also, check that every entry in the CABLE OK? columns shows the OK status.
The following examples show extracts of the expected command results:
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
SWITCH PORT (EXPECTED PEER) LOWER LEAF (rack1sw-rocea0) : CABLE OK? UPPER LEAF (rack1sw-roceb0) : CABLE OK?
----------- --------------- --------------------------- : --------- --------------------------- : ---------
...
# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify_roce_cables.py -n nodes.rack2 -s switches.rack2
SWITCH PORT (EXPECTED PEER) LOWER LEAF (rack2sw-rocea0) : CABLE OK? UPPER LEAF (rack2sw-roceb0) : CABLE OK?
----------- --------------- --------------------------- : --------- --------------------------- : ---------
...
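The input files for this step are plain lists with one server or switch per line. The following minimal Python sketch shows one way to produce that layout from a list of names. It is an illustration only: the render_list helper is hypothetical, and the host names use a placeholder example.com domain that you must replace with the fully qualified domain names or IP addresses of your own racks.

```python
def render_list(entries):
    """Return file content with one entry per line, skipping blanks and repeats."""
    seen = []
    for entry in entries:
        entry = entry.strip()
        if entry and entry not in seen:
            seen.append(entry)
    return "".join(name + "\n" for name in seen)


if __name__ == "__main__":
    # Hypothetical names for illustration; substitute the values for your rack.
    nodes_rack1 = ["rack1adm01.example.com", "rack1celadm01.example.com"]
    switches_rack1 = ["rack1sw-rocea0.example.com", "rack1sw-roceb0.example.com"]
    print(render_list(nodes_rack1), end="")
    print(render_list(switches_rack1), end="")
```

Save the first list as nodes.rack1 and the second as switches.rack1, then repeat for the second rack.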
-
- Verify the RoCE Network Fabric operation across both interconnected racks by using the infinicheck command.
Run the following recommended command sequence in the order shown.
In each command, hosts.all contains a list of database server RoCE Network Fabric IP addresses from both racks (2 RoCE Network Fabric IP addresses for each database server), and cells.all contains a list of RoCE Network Fabric IP addresses for the storage servers from both racks (2 RoCE Network Fabric IP addresses for each storage server).
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.all -c cells.all -z
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.all -c cells.all -s
-
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.all -c cells.all -b
See step 1.k for more information about each infinicheck command.
The following example shows the expected command results for the final command in the sequence:
# cd /opt/oracle.SupportTools/ibdiagtools
# ./infinicheck -g hosts.all -c cells.all -b
INFINICHECK
[Network Connectivity, Configuration and Performance]
#### FABRIC TYPE TESTS ####
System type identified: RoCE
Verifying User Equivalance of user=root from all DBs to all CELLs.
#### RoCE CONFIGURATION TESTS ####
Checking for presence of RoCE devices on all DBs and CELLs
[SUCCESS].... RoCE devices on all DBs and CELLs look good
Checking for RoCE Policy Routing settings on all DBs and CELLs
[SUCCESS].... RoCE Policy Routing settings look good
Checking for RoCE DSCP ToS mapping on all DBs and CELLs
[SUCCESS].... RoCE DSCP ToS settings look good
Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
[SUCCESS].... RoCE PFC and DSCP settings look good
Checking for RoCE interface MTU settings. Expected value : 2300
[SUCCESS].... RoCE interface MTU settings look good
Verifying switch advertised DSCP on all DBs and CELLs ports ( )
[SUCCESS].... Advertised DSCP settings from RoCE switch looks good
#### CONNECTIVITY TESTS ####
[COMPUTE NODES -> STORAGE CELLS]
(60 seconds approx.)
(Will walk through QoS values: 0-6)
[SUCCESS]..........Results OK
[SUCCESS]....... All can talk to all storage cells
[COMPUTE NODES -> COMPUTE NODES]
...
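The infinicheck output is long, so a quick tally of its status tags can help you decide where to look first. The following minimal Python sketch counts the bracketed, single-word, uppercase tags in saved output. It is not part of the Oracle tooling, and the function names are hypothetical. The example above shows only [SUCCESS] tags; the form that a failed check takes is an assumption here, so the sketch simply flags any other tag, or a complete absence of [SUCCESS], for manual review.

```python
import re
from collections import Counter

# Matches tags such as [SUCCESS]. Section headers like
# [COMPUTE NODES -> STORAGE CELLS] contain spaces and are not matched.
STATUS_TAG = re.compile(r"\[([A-Z]+)\]")


def summarize_infinicheck(output):
    """Count each single-word uppercase status tag found in the output."""
    return Counter(STATUS_TAG.findall(output))


def needs_review(output):
    """True if no [SUCCESS] tag is present or any other status tag appears."""
    counts = summarize_infinicheck(output)
    return counts["SUCCESS"] == 0 or any(tag != "SUCCESS" for tag in counts)
```

A result of False from needs_review does not replace reading the output; it only indicates that every recognized tag was [SUCCESS].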
-
At this point, both racks share the RoCE Network Fabric, and the combined system is ready for further configuration.