4.7.8 Verifying RoCE Network Fabric Operation
Verify the RoCE Network Fabric is operating properly after making modifications to the underlying hardware.
After hardware maintenance on any component in the RoCE Network Fabric, such as replacing an RDMA Network Fabric Adapter on a server, a switch, or a cable, or if you suspect that the RoCE Network Fabric is performing poorly, verify that the RoCE Network Fabric is operating properly. The following procedure describes how to verify network operation:
- Complete the steps in Verifying the RoCE Network Fabric Configuration.
- Prepare for infinicheck.
  You may need to run the following commands before you can use the infinicheck command to perform RoCE Network Fabric configuration, connectivity, and performance checks.
  - If required, use the -s option to set up user equivalence for password-less SSH across the RoCE Network Fabric. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips -s
  - You can use the -z option to clear the files that were created during the last run of the infinicheck command. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips -z
  In the previous commands, hostips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the database servers, and cellips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.
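Before passing the hostips and cellips files to infinicheck, it can help to sanity-check their contents. The following is a minimal sketch and is not part of infinicheck: check_ip_list is a hypothetical helper, and it assumes that each input file holds one IPv4 address per line, which this section does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with infinicheck.
# Assumption: the input file lists one IPv4 address per line.
check_ip_list() {
    file=$1
    # The file must exist and must not be empty.
    [ -s "$file" ] || { echo "$file: missing or empty" >&2; return 1; }
    # Collect every line that does not look like a dotted-quad address.
    # This checks the shape only; it does not verify that each octet is <= 255.
    bad=$(grep -Evn '^([0-9]{1,3}\.){3}[0-9]{1,3}$' "$file")
    if [ -n "$bad" ]; then
        echo "$file: unexpected lines:" >&2
        echo "$bad" >&2
        return 1
    fi
    return 0
}

# Example use (run from the directory that holds the input files):
#   check_ip_list hostips && check_ip_list cellips
```

Blank lines and trailing whitespace are reported as unexpected, so the check errs on the side of flagging a file rather than silently accepting it.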
- Run the infinicheck command to perform RoCE Network Fabric configuration, connectivity, and performance checks.
  On a properly configured system, you can run the infinicheck command on any database server with minimal arguments. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck
  By default, the infinicheck command performs a group of configuration and connectivity checks on the RoCE Network Fabric. You can use the -p option to run the optional performance tests, or use the -a option to perform all checks, including the performance tests. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -a
  Note:
  System performance may be impacted when the infinicheck command performs performance stress tests. Consequently, run the infinicheck performance tests only when required, and preferably when there is no workload on the system.
  You can also specify the servers in your system explicitly by using the -g option to specify the database servers and the -c option to specify the storage servers. For example:
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips
  In the previous example, hostips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the database servers, and cellips is the name of an input file that contains a list of RoCE Network Fabric IP addresses for the storage servers.
  Instead of listing the database servers and storage servers in input files, you can supply a comma-separated list of IP addresses on the command line.
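If you already maintain input files but want to pass the addresses inline, one way to build the comma-separated list is sketched below. The helper name ips_to_csv is hypothetical, the file is assumed to hold one address per line, and the commented invocation assumes that the list is given as the argument to -g and -c in place of a file name.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with infinicheck.
# Assumption: the input file lists one IP address per line.
ips_to_csv() {
    # Drop blank lines, then join the remaining lines with commas.
    grep -v '^[[:space:]]*$' "$1" | paste -s -d ',' -
}

# Possible use, assuming -g and -c accept the list directly:
#   /opt/oracle.SupportTools/ibdiagtools/infinicheck \
#       -g "$(ips_to_csv hostips)" -c "$(ips_to_csv cellips)"
```

Keeping the addresses in files and converting them on demand avoids maintaining the same list in two places.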
  The following example displays typical terminal output from the infinicheck command.
      # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hostips -c cellips

      INFINICHECK
      [Network Connectivity, Configuration and Performance]

      #### FABRIC TYPE TESTS ####
      System type identified: RoCE
      Verifying User Equivalence of user=root from all DBs to all CELLs.

      #### RoCE CONFIGURATION TESTS ####
      Checking for presence of RoCE devices on all DBs and CELLs
      [SUCCESS].... RoCE devices on all DBs and CELLs look good
      Checking for RoCE Policy Routing settings on all DBs and CELLs
      [SUCCESS].... RoCE Policy Routing settings look good
      Checking for RoCE DSCP ToS mapping on all DBs and CELLs
      [SUCCESS].... RoCE DSCP ToS settings look good
      Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
      [SUCCESS].... RoCE PFC and DSCP settings look good
      Checking for RoCE interface MTU settings. Expected value : 2300
      [SUCCESS].... RoCE interface MTU settings look good
      Verifying switch advertised DSCP on all DBs and CELLs ports ( ~ 2 min )
      [SUCCESS].... Advertised DSCP settings from RoCE switch looks good

      #### CONNECTIVITY TESTS ####
      [COMPUTE NODES -> STORAGE CELLS]
      (60 seconds approx.)
      (Will walk through QoS values: 0-6)
      [SUCCESS]..............Results OK
      [SUCCESS]....... All can talk to all storage cells
      [COMPUTE NODES -> COMPUTE NODES]
      (60 seconds approx.)
      (Will walk through QoS values: 0-6)
      [SUCCESS]..............Results OK
      [SUCCESS]....... All hosts can talk to all other nodes
      Verifying Subnet Masks on all nodes
      [SUCCESS] ......... Subnet Masks is same across the network
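When the output is long, a quick scan of a saved copy can highlight anything that is not marked [SUCCESS]. The sketch below is an illustration under stated assumptions: summarize_infinicheck is a hypothetical helper, the log file is one you saved yourself (for example with tee), and because this section shows only [SUCCESS] markers, the helper treats any other bracketed upper-case token as suspect rather than relying on a known failure string.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with infinicheck.
# Assumption: status markers are bracketed upper-case words such as [SUCCESS].
summarize_infinicheck() {
    log=$1
    # Count [SUCCESS] markers.
    ok=$(grep -oE '\[[A-Z]+\]' "$log" | grep -c '^\[SUCCESS\]$' || true)
    # Count bracketed upper-case markers that are not [SUCCESS].
    # Headings such as [COMPUTE NODES -> STORAGE CELLS] contain spaces
    # and are therefore not matched.
    other=$(grep -oE '\[[A-Z]+\]' "$log" | grep -vc '^\[SUCCESS\]$' || true)
    echo "success=$ok other=$other"
    # Return non-zero if nothing succeeded or something else was reported.
    [ "$ok" -gt 0 ] && [ "$other" -eq 0 ]
}

# Possible use, with infinicheck.log as an arbitrary file name:
#   /opt/oracle.SupportTools/ibdiagtools/infinicheck | tee infinicheck.log
#   summarize_infinicheck infinicheck.log
```

A non-zero return is only a prompt to read the full output; it does not replace reviewing the individual checks.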