19 Infiniband Switch
For each Infiniband Switch metric, this chapter provides the following information:
- Description
- Metric table
The metric table can include some or all of the following: target version, default collection frequency, default warning threshold, default critical threshold, and alert text.
These metrics describe the performance of each port of the switch and the aggregate performance for the Switch-to-Node and Switch-to-Switch link types. They also indicate whether a switch is a subnet manager for the network. Switch statistics are also covered.
Aggregate Sensors
Note:
This metric is used only for generating alerts. No data is uploaded to the repository. The All Metrics page will not show any data for this metric.
Alarm Status
This metric reports whether the severity is set or cleared (Major/Cleared).
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | major | The aggregate sensor %keyValue% has a fault. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Fan Speed Sensors
Similar to Aggregate Sensors, this metric category contains SNMP trap-based metrics.
Alarm Status
This metric reports the alarm status. These values (Critical/Major/Warning) indicate that the fan speed has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as Warning.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | warning\|FAULT_DIAGNOSED\|FAULT_SUSPECTED\|WARNING | critical\|major\|CRITICAL\|ERROR\|FAILED\|FAULTED\|NOT_PRESENT\|NON_RECOVERABLE\|PREDICTIVE_FAILURE_ASSERTED\|LOWER_CRITICAL\|UPPER_CRITICAL\|LOWER_NON_RECOVERABLE\|UPPER_NON_RECOVERABLE | The speed of fan %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
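The default thresholds in the table above read like alternations of status strings. The following Python sketch shows one way such values could be mapped to the Warning and Critical severities. It is a minimal illustration under stated assumptions, not the plug-in's implementation: the function name `classify_alarm`, the `Clear` fallback, and the exact-match rule are all hypothetical.

```python
import re

# Default threshold values copied from the Fan Speed Sensors table.
WARNING_PATTERN = "warning|FAULT_DIAGNOSED|FAULT_SUSPECTED|WARNING"
CRITICAL_PATTERN = (
    "critical|major|CRITICAL|ERROR|FAILED|FAULTED|NOT_PRESENT|NON_RECOVERABLE|"
    "PREDICTIVE_FAILURE_ASSERTED|LOWER_CRITICAL|UPPER_CRITICAL|"
    "LOWER_NON_RECOVERABLE|UPPER_NON_RECOVERABLE"
)


def classify_alarm(status: str) -> str:
    """Return 'Critical', 'Warning', or 'Clear' for a reported alarm status.

    Assumption: the status must equal one alternative exactly (full match),
    and anything that matches neither pattern is treated as 'Clear'.
    """
    if re.fullmatch(CRITICAL_PATTERN, status):
        return "Critical"
    if re.fullmatch(WARNING_PATTERN, status):
        return "Warning"
    return "Clear"
```

For example, `classify_alarm("major")` falls into the Critical group and `classify_alarm("FAULT_SUSPECTED")` into the Warning group.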
Fan Speed Sensor Alerts
Similar to Fan Speed Sensors, this metric category contains SNMP trap-based metrics.
Alarm Status
This metric reports the alarm status. These values (Critical/Major/Warning) indicate that fan speed has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as Warning.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | warning\|WARNING | critical\|major\|CRITICAL\|ERROR\|FAILED | The speed of fan %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
FRU Removal Alerts
This metric category provides information about field replaceable unit (FRU) removal alerts.
FRU Status
This metric displays an alert that is sent for all FRU removals.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | The FRU %keyValue% has been removed from the system. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Response
The metric in this category is used to detect whether the management server on the cell is running.
Response Status
This metric is checked at 1-minute intervals. A value of 1 in the status column indicates that the cell is up; otherwise, the cell is down.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 1 Minute | Not Defined | 0 | Failed to connect to Infiniband switch %target%. |
Data Source
Not available.
User Action
No user action is required.
Switch Gateway Port State
This metric category provides information about the gateway metrics for gateway ports of an Infiniband switch.
10 Gb/s Ethernet Port
This metric displays the 10 Gb/s Ethernet port number.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
State
This metric displays the state of the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Bytes
This metric displays the number of bytes received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Packets
This metric displays the number of packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Jumbo Packets
This metric displays the number of jumbo packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Unicast Packets
This metric displays the number of unicast packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Broadcast Packets
This metric displays the number of broadcast packets received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Buffers
This metric displays the number of buffers received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received CRC Errors
This metric displays the number of Cyclic Redundancy Check (CRC) errors received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Runtime Errors
This metric displays the number of runtime errors received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Received Total Errors
This metric displays the total number of errors received by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Transmitted Bytes
This metric displays the number of bytes transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Transmitted Packets
This metric displays the number of packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Transmitted Jumbo Packets
This metric displays the number of jumbo packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Transmitted Unicast Packets
This metric displays the number of unicast packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Transmitted Multicast Packets
This metric displays the number of multicast packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Transmitted Broadcast Packets
This metric displays the number of broadcast packets transmitted by the gateway.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Switch Performance Summary
This metric category reports the overall performance of the ibswitch across all ports.
Average link throughput (KBPS)
This metric reports the average number of bytes received and transmitted per second across all ports in the ibswitch (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Highest link throughput (KBPS)
This metric reports the maximum number of bytes received and transmitted per second across all ports in the ibswitch (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Lowest link throughput (KBPS)
This metric reports the minimum number of bytes received and transmitted per second across all ports in the ibswitch (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
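The three summary metrics above can be thought of as aggregates over per-port link throughput. The sketch below illustrates that relationship in Python. The function name `summarize_throughput`, the input shape (port number mapped to KBPS), and the handling of an empty input are assumptions made for illustration; the source does not state which ports are included in the aggregation.

```python
from statistics import mean


def summarize_throughput(port_kbps: dict) -> dict:
    """Aggregate per-port link throughput (KBPS) into summary values.

    Assumption: port_kbps maps a port number to its combined
    received-plus-transmitted throughput in KBPS.
    """
    if not port_kbps:
        # Hypothetical choice: report zeros when no port data is available.
        return {"average": 0.0, "highest": 0.0, "lowest": 0.0}
    values = list(port_kbps.values())
    return {
        "average": mean(values),
        "highest": max(values),
        "lowest": min(values),
    }
```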
Switch Port Configuration Monitor
This metric category is mainly used for monitoring the connectivity of ports and raising alerts when there is a disconnection.
GUID on the other end of the link
This metric reports the IB globally unique identifier (GUID) of the entity to which the port is connected. This is not an Enterprise Manager target GUID. It is the switch GUID if the other end is a switch port, or the port GUID if it is an HCA port.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Name of the entity to which this port is connected
This metric reports the name of the entity (Switch/Cell/Compute Node) to which this switch port is connected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Node GUID if the peer is a Switch port, Port GUID otherwise
This metric displays the node GUID if the peer port is a switch port. Otherwise, it displays the port GUID, indicating an HCA port.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Port number of the peer port
This metric reports the port number of the peer port.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Type of entity to which this disconnected port was connected
If this port is currently disconnected, this metric reports the type of entity from which it was disconnected. It can take one of four values (Switch/Cell/Node/None). When the port is connected, the value of this metric is None.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | node\|cell\|switch | Port %PortNumber% on %target% is disconnected from port %ConnectedToPortNumberPrev% on %ConnectedToNamePrev%. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
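To illustrate how the default critical threshold and the alert text of this metric fit together, the following Python sketch fills in the alert placeholders when the previous entity type is one of the critical values. The function name `disconnect_alert`, the case-insensitive comparison, and the sample field values are assumptions, not documented behavior.

```python
# Default critical threshold values from the metric table.
CRITICAL_TYPES = {"node", "cell", "switch"}

# Alert text from the metric table, with its %...% placeholders.
ALERT_TEMPLATE = (
    "Port %PortNumber% on %target% is disconnected from port "
    "%ConnectedToPortNumberPrev% on %ConnectedToNamePrev%."
)


def disconnect_alert(prev_type: str, fields: dict):
    """Return the alert text, or None when the metric value is None.

    Assumption: the entity type is compared case-insensitively.
    """
    if prev_type.lower() not in CRITICAL_TYPES:
        return None
    text = ALERT_TEMPLATE
    for key, value in fields.items():
        text = text.replace(f"%{key}%", str(value))
    return text
```

With hypothetical values such as port 7 on a switch named `sw-ib2`, previously connected to port 1 on `cel01`, the function returns "Port 7 on sw-ib2 is disconnected from port 1 on cel01."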
Type of the entity to which this port is connected
This metric can take one of three values (Switch/Cell/Compute Node), depending on the entity to which this port is connected.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Switch Port Errors
The metrics in this metric category provide statistics obtained from perfquery output on the switch. The metric values report the change in the error counters since the last collection. Alerts are raised only if there are new errors since the last metric collection.
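The delta behavior described above can be sketched in Python as follows. This is only an illustration of the idea: the function names, the counter names used in the example, and the treatment of a counter that decreased (assumed here to be a reset) are hypothetical and not taken from the plug-in.

```python
def counter_deltas(previous: dict, current: dict) -> dict:
    """Return the per-counter increase between two collections.

    Assumption: a counter that went down was reset, so its current
    value is treated as the number of new errors.
    """
    deltas = {}
    for name, value in current.items():
        before = previous.get(name, 0)
        deltas[name] = value - before if value >= before else value
    return deltas


def has_new_errors(deltas: dict) -> bool:
    """An alert is relevant only when at least one counter increased."""
    return any(delta > 0 for delta in deltas.values())
```

For example, if a hypothetical `symbol_errors` counter moves from 5 to 8 between collections, its delta is 3 and `has_new_errors` returns `True`; if no counter changes, no alert would be expected.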
Excessive buffer overruns
This metric reports the number of buffer overruns exceeding the threshold since the last collection (the collection interval is 5 minutes).
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% excessive buffer overruns, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Incoming VL15 packets dropped due to resource limitation
This metric reports the number of incoming VL15 packets dropped due to lack of buffers since the last metric collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% incoming VL15 packets dropped, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Link integrity errors
This metric displays the number of link integrity errors, that is, errors on the local link.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% link integrity errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Link recovers
This metric reports the number of times the link error recovery process was completed successfully since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% link recovers, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Packets not transmitted due to constraints
This metric reports the number of packets not transmitted due to constraints since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% packets not transmitted due to constraints, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Received packets discarded due to constraints
This metric reports the number of received packets discarded due to constraints since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% received packets discarded due to constraints, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Received packets marked with the EBP delimiter
This metric reports the number of packets marked with the EBP delimiter received on the port.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% received packets marked with the EBP delimiter, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Received packets with error
This metric reports the number of packets received with errors since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% received packets containing an error, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Symbol errors
This metric reports the number of symbol errors detected since the last collection.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Port %PortNumber% has %value% symbol errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Total errors
This metric reports the sum of all the errors listed above.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | 10 | Not Defined | Port %PortNumber% has %value% total errors, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
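As a rough illustration of the Total errors metric, the Python sketch below sums a set of per-port error deltas and compares the result with the default warning threshold of 10. The function names, the example counter names, and the use of a strict "greater than" comparison are assumptions; the source does not state which comparison operator is applied.

```python
DEFAULT_WARNING_THRESHOLD = 10  # default warning threshold from the metric table


def total_errors(error_deltas: dict) -> int:
    """Sum the per-counter error deltas for one port."""
    return sum(error_deltas.values())


def total_errors_severity(error_deltas: dict, warning: int = DEFAULT_WARNING_THRESHOLD):
    """Return 'Warning' when the total crosses the threshold, else None.

    Assumption: the threshold is crossed when the total is strictly
    greater than the warning value. No critical threshold is defined
    by default, so only a warning can result here.
    """
    return "Warning" if total_errors(error_deltas) > warning else None
```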
Switch Port Performance
This metric category contains performance metrics at the switch port level.
Link Throughput: bytes transmitted and received per sec (KBPS)
This metric reports the number of bytes transmitted and received per second (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Number of bytes received per sec (KBPS)
This metric reports the number of bytes received per second (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Number of bytes transmitted per sec (KBPS)
This metric reports the number of bytes transmitted per second (KBPS).
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Number of packets received per sec
This metric reports the number of packets received per second.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Switch Port State
This metric category contains switch port state metrics.
Active link width of port based on cable connectivity
This metric displays the active link width of the port based on the cable connectivity.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Is the link degraded?
This metric reports whether or not the link is degraded. If the active speed of a link is less than the enabled speed, then it is considered to be degraded and this column value is set to 1. It is mainly used for raising alerts.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | 1 | Port %PortNumber% is running in degraded mode. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
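The degraded-link rule described for this metric can be expressed as a small Python sketch. The function names and the idea of passing speeds as plain numbers are assumptions for illustration; the units only need to be the same for both arguments.

```python
def is_link_degraded(active_speed: float, enabled_speed: float) -> int:
    """Return 1 when the active speed is below the enabled speed, else 0."""
    return 1 if active_speed < enabled_speed else 0


def degraded_alert(port_number: int, active_speed: float, enabled_speed: float):
    """Return the alert text when the default critical threshold (1) is met."""
    if is_link_degraded(active_speed, enabled_speed) == 1:
        return f"Port {port_number} is running in degraded mode."
    return None
```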
Link state
This metric reports the link state. The link is down if the physical link state is 0.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Physical link state
This metric reports the physical link state. The physical link state is 0 if the port is in polling or disabled state.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Switch Port State (For Alerts)
This metric category contains switch port state metrics (for alerts).
Indicates that cable is present but port is disabled
This metric reports that the cable is present but that the port is disabled. This metric's collection frequency is event-driven.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Event-driven | Not Defined | 1 | Cable is present on Port %PortNumber% but the port is disabled. |
Indicates that cable is present but port is polling for peer port
This metric reports that the cable is present but the port is checking for the peer port. This metric's collection frequency is event-driven.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Event-driven | Not Defined | 1 | Cable is present on Port %PortNumber% but it is polling for peer port. This could happen when the peer port is unplugged/disabled. |
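The two alert metrics in this category both depend on a cable being present while the port is in a particular state. The Python sketch below illustrates that condition. The function name `port_state_alerts`, the `cable_present` flag, and the state strings `"disabled"` and `"polling"` are hypothetical names chosen for the example.

```python
def port_state_alerts(port_number: int, cable_present: bool, port_state: str) -> list:
    """Return the alert texts that apply to one port.

    Assumption: port_state is a lowercase string such as 'disabled',
    'polling', or 'active'.
    """
    alerts = []
    if cable_present and port_state == "disabled":
        alerts.append(
            f"Cable is present on Port {port_number} but the port is disabled."
        )
    if cable_present and port_state == "polling":
        alerts.append(
            f"Cable is present on Port {port_number} but it is polling for peer port. "
            "This could happen when the peer port is unplugged/disabled."
        )
    return alerts
```

A port without a cable produces no alert in this sketch, regardless of its state.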
Switch State Summary
This metric category contains metrics that report the overall state of switch ports.
Number of active ports
This metric reports the total number of active ports.
Target Version | Collection Frequency |
---|---|
All Versions | Every 5 Minutes |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Number of degraded ports
This metric reports the total number of degraded ports.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Number of degraded ports is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Number of ports with errors
This metric reports the number of ports with errors. Starting with Exadata plug-in 12.1.0.3, degraded ports are counted in both the Degraded ports and Error ports categories.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Number of ports with errors is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
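The counting rule for degraded ports can be illustrated with a short Python sketch. The function name `summarize_port_states` and the per-port dictionary keys (`active`, `degraded`, `new_errors`) are hypothetical; the sketch only shows how a degraded port contributes to both totals, as described for plug-in 12.1.0.3 and later.

```python
def summarize_port_states(ports: list) -> dict:
    """Count active, degraded, and error ports for one switch.

    Assumption: each port is a dict with the hypothetical keys
    'active' (bool), 'degraded' (bool), and 'new_errors' (int).
    """
    active = sum(1 for p in ports if p["active"])
    degraded = sum(1 for p in ports if p["degraded"])
    # A degraded port is counted as an error port as well.
    with_errors = sum(1 for p in ports if p["degraded"] or p["new_errors"] > 0)
    return {"active": active, "degraded": degraded, "with_errors": with_errors}
```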
Switch Temperatures
This metric category contains metrics that report the switch temperature.
Back of switch temperature
This metric reports the rear chassis temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch back temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Front of switch temperature
This metric reports the front chassis temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch front temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Switch I4 chip temperature
This metric reports the I4 chip temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch I4 chip temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Switch Service Processor temperature
This metric reports the management controller temperature.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | Not Defined | Switch service processor temperature is %value%, crossed warning (%warning_threshold%) or critical (%critical_threshold%) threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Temperature Sensors
Similar to other SNMP trap-based metrics, the metrics in this category are used only for generating alerts and are not uploaded to the repository.
Alarm Status
This metric reports the alarm status. These values (Critical/Major/Warning) indicate that the temperature has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as Warning.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | critical\|major\|CRITICAL\|ERROR\|FAILED | The temperature sensor %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.
Voltage Sensors
This metric category contains metrics that report the status of the voltage sensors.
Alarm Status
This metric reports the alarm status. These values (Critical/Major/Warning) indicate that the voltage has exceeded fatal, critical, and non-critical thresholds, respectively. The first two states are shown as a Critical alert in Enterprise Manager and the last state is shown as Warning.
Target Version | Evaluation and Collection Frequency | Default Warning Threshold | Default Critical Threshold | Alert Text |
---|---|---|---|---|
All Versions | Every 5 Minutes | Not Defined | critical\|major\|CRITICAL\|ERROR\|FAILED | The voltage sensor %keyValue% has exceeded its threshold. |
Data Source
The data is collected using SNMP.
User Action
No user action is required.