Georedundant High Availability (HA)
The two nodes that make up an HA pair can be located at different sites. This deployment, known as georedundancy, increases fault tolerance. A georedundant pair must adhere to strict network operating conditions to ensure that all state and call data is shared between the systems and that failovers happen quickly without losing calls.
The following network constraints are required for georedundant operation:
- A pair of dedicated fiber routes between the sites is required. Each route must provide non-blocking bandwidth sufficient to connect the wancom1 and wancom2 ports (i.e., 1 Gbps per port).
- Inter-site round-trip time (RTT) must be less than 10 ms; 5 ms or less is ideal. Georedundant operation must be built on a properly engineered layer-2 WAN (e.g., MPLS or Metro Ethernet) that connects the active and standby HA pair members.
- Simultaneous packet loss across the inter-site link pair must be 0%. Loss of consecutive heartbeats could result in split-brain behavior.
- Security (privacy and data-integrity) must be provided by the network itself.
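The measurable constraints above can be expressed as a simple pre-deployment checklist. The following is a minimal Python sketch, not an Oracle tool: the `InterSiteLink` class, its fields, and `check_georedundancy` are hypothetical names, the measured values are ones you supply, and it assumes each fiber route carries one wancom port (adjust `ports_per_route` if your design differs). Security (privacy and data integrity) is provided by the network and is not checked here.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the georedundancy constraints.
MAX_RTT_MS = 10.0                # RTT must be strictly less than this
IDEAL_RTT_MS = 5.0               # 5 ms or less is ideal
MIN_PORT_BANDWIDTH_MBPS = 1000   # 1 Gbps per wancom port
MIN_ROUTES = 2                   # a pair of dedicated fiber routes


@dataclass
class InterSiteLink:
    """Hypothetical description of a proposed inter-site link design."""
    dedicated_fiber_routes: int
    route_bandwidth_mbps: List[float]  # non-blocking bandwidth of each route
    rtt_ms: float                      # measured inter-site round-trip time
    simultaneous_loss_pct: float       # loss seen on both routes at the same time
    layer2_wan: bool                   # e.g. MPLS or Metro Ethernet layer-2 service
    ports_per_route: int = 1           # ASSUMPTION: one wancom port per route


def check_georedundancy(link: InterSiteLink) -> List[str]:
    """Return the violated requirements; an empty list means all checks pass."""
    problems = []
    if link.dedicated_fiber_routes < MIN_ROUTES:
        problems.append("fewer than two dedicated fiber routes")
    needed = MIN_PORT_BANDWIDTH_MBPS * link.ports_per_route
    for i, bw in enumerate(link.route_bandwidth_mbps, start=1):
        if bw < needed:
            problems.append(f"route {i}: {bw} Mbps < {needed} Mbps")
    if link.rtt_ms >= MAX_RTT_MS:
        problems.append(f"RTT {link.rtt_ms} ms is not below {MAX_RTT_MS} ms")
    if link.simultaneous_loss_pct != 0:
        problems.append("simultaneous packet loss across the link pair is not 0%")
    if not link.layer2_wan:
        problems.append("inter-site WAN is not layer 2")
    return problems


def rtt_is_ideal(link: InterSiteLink) -> bool:
    """True when the measured RTT meets the ideal target of 5 ms or less."""
    return link.rtt_ms <= IDEAL_RTT_MS
```

For example, a design with two 1000 Mbps routes, a 4.2 ms RTT, no simultaneous loss, and a layer-2 WAN (`InterSiteLink(2, [1000, 1000], 4.2, 0.0, True)`) produces an empty list of problems and an ideal RTT.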
As with local HA nodes, management traffic (e.g., SSH, SFTP, and SNMP) must be confined to the wancom0 management interface. HA node peers must have their wancom0 IP addresses on the same subnet. All Oracle Communications Subscriber-Aware Load Balancer configuration, including host routes and the system-config's default-gateway, is shared between the HA pair, so it is not possible to configure two different management interface default gateways. Consequently, the two wancom0 management interfaces require a layer-2 switched connection between them.
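Because both peers share one configuration, a quick sanity check before deployment is to confirm that the two wancom0 addresses fall in the same subnet and that the shared default gateway lies inside that subnet. The following Python sketch uses only the standard `ipaddress` module; the function names are hypothetical, and the gateway check assumes the usual IP rule that a default gateway must be on a directly connected subnet.

```python
import ipaddress


def wancom0_same_subnet(primary: str, secondary: str) -> bool:
    """Return True if both wancom0 addresses, given in CIDR form
    (e.g. "192.0.2.10/24"), belong to the same IP network.

    A differing prefix length yields different networks and therefore False.
    """
    a = ipaddress.ip_interface(primary)
    b = ipaddress.ip_interface(secondary)
    return a.network == b.network


def shared_gateway_ok(gateway: str, primary: str, secondary: str) -> bool:
    """Return True if the peers share a subnet and the single, shared
    default gateway address lies inside it (reachable from both peers)."""
    gw = ipaddress.ip_address(gateway)
    return (wancom0_same_subnet(primary, secondary)
            and gw in ipaddress.ip_interface(primary).network)
```

For example, peers at 192.0.2.10/24 and 192.0.2.11/24 with a gateway of 192.0.2.1 pass both checks, while a peer at 198.51.100.11/24 fails the subnet check.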
Troubleshooting Georedundant Deployments
The Oracle Communications Subscriber-Aware Load Balancer provides rich statistics and status information on HA operation, documented in the ACLI Reference Guide and the Maintenance and Troubleshooting Guide. Some of this information is especially suited to troubleshooting the latency and packet-loss requirements of georedundant deployments, including:
- Details within the show redundancy command output, including:
  - Request-response round-trip time measurements (show redundancy <task-name>)
  - Request-response loss measurements (show redundancy <task-name>)
  - Journal statistics (show redundancy <task-name> journals)
  - Journal latency (show redundancy <task-name> journals)
  - Protocol-specific redundancy actions (show redundancy <task-name> actions)
  - Protocol-specific redundancy objects (show redundancy <task-name> objects)
- Details within the show queues command output, including:
  - sipd command queue statistics (show queue <task-name> commands)
- Protocol-specific log messaging
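When reviewing the request-response round-trip time and loss measurements reported by show redundancy <task-name>, it can help to summarize a series of samples against the 10 ms limit and to look for runs of consecutive losses, which relate to the split-brain risk of losing consecutive heartbeats. The Python sketch below assumes you have already collected the samples yourself as a list of RTT values in milliseconds, with None marking a lost request. It does not parse ACLI output, and `summarize` and `ProbeSummary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_RTT_MS = 10.0  # georedundancy RTT limit (RTT must be below this)


@dataclass
class ProbeSummary:
    samples: int
    lost: int
    max_rtt_ms: Optional[float]  # None if every sample was lost
    over_limit: int              # answered samples with RTT >= MAX_RTT_MS
    longest_loss_run: int        # largest number of consecutive lost samples


def summarize(rtts_ms: Sequence[Optional[float]]) -> ProbeSummary:
    """Summarize request-response samples; None marks a lost request."""
    answered = [r for r in rtts_ms if r is not None]
    longest = run = 0
    for r in rtts_ms:
        # Extend the current loss run, or reset it when a response arrives.
        run = run + 1 if r is None else 0
        longest = max(longest, run)
    return ProbeSummary(
        samples=len(rtts_ms),
        lost=len(rtts_ms) - len(answered),
        max_rtt_ms=max(answered) if answered else None,
        over_limit=sum(1 for r in answered if r >= MAX_RTT_MS),
        longest_loss_run=longest,
    )
```

For example, the samples `[3.1, 4.0, None, None, 11.5, 2.0, None]` summarize to three losses, a longest run of two consecutive losses, and one answered sample at or above the 10 ms limit.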