Oracle Solaris Cluster Geographic Edition Data Replication Guide for EMC Symmetrix Remote Data Facility, Oracle Solaris Cluster 4.1
Detecting Cluster Failure on a System That Uses SRDF Data Replication
This section describes the internal processes that occur when a failure is detected on a primary or a secondary cluster.
When the primary cluster for a protection group fails, the secondary cluster in the partnership detects the failure. The cluster that fails might be a member of more than one partnership, resulting in multiple failure detections.
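To illustrate why one failed cluster can produce several failure detections, the following minimal Python sketch models each partnership as a pair of cluster names. The cluster names and the `detecting_partners` function are hypothetical, and this is not Geographic Edition code. It only shows that every partner of the failed cluster detects the failure independently.

```python
# Hypothetical model: each partnership is a pair of cluster names.
partnerships = [
    ("cluster-paris", "cluster-newyork"),
    ("cluster-paris", "cluster-tokyo"),
    ("cluster-newyork", "cluster-tokyo"),
]


def detecting_partners(failed, partnerships):
    """Return the clusters that share a partnership with the failed cluster."""
    partners = []
    for first, second in partnerships:
        if failed == first:
            partners.append(second)
        elif failed == second:
            partners.append(first)
    return partners


# cluster-paris belongs to two partnerships, so two partners detect its failure.
print(detecting_partners("cluster-paris", partnerships))
```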
The following actions take place when a primary cluster failure occurs. During a failure, the appropriate protection groups are in the Unknown state on the cluster that failed.
Heartbeat failure is detected by a partner cluster.
The heartbeat is activated in emergency mode to verify that the heartbeat loss is not transient and that the primary cluster has failed. The heartbeat remains in the Online state during the time-out interval, while the heartbeat mechanism continues to retry the primary cluster.
This interval is set by using the Query_interval heartbeat property. If the heartbeat still fails after the interval that you configured, a heartbeat-lost event is generated and logged in the system log. With the default interval, the emergency-mode retry behavior might delay heartbeat-loss notification by about nine minutes. Messages are displayed in the graphical user interface (GUI) and in the output of the geoadm status command.
For more information about logging, see Viewing the Geographic Edition Log Messages in Oracle Solaris Cluster Geographic Edition System Administration Guide.
If the partnership is configured for heartbeat-loss notification, then one or both of the following actions occur:
An email is sent to the address that is configured by the Notification_emailaddrs property.
The script defined in Notification_actioncmd is executed.
For more information about configuring heartbeat-loss notification, see Configuring Heartbeat-Loss Notification in Oracle Solaris Cluster Geographic Edition System Administration Guide.
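The sequence above can be sketched in Python as a rough illustration. This is not how the Geographic Edition software is implemented. The sketch replaces the time-based Query_interval check with a simple retry count (an assumption), and the `Partnership` class, the function names, the email address, and the script path are all hypothetical. Only the property names Notification_emailaddrs and Notification_actioncmd come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Partnership:
    # Field names mirror the properties named in the text; values are illustrative.
    notification_emailaddrs: list = field(default_factory=list)
    notification_actioncmd: str = ""


def confirm_failure(probe, retries):
    """Emergency mode: retry the probe and report failure only if every retry fails."""
    for _ in range(retries):
        if probe():
            return False  # the partner answered, so the loss was transient
    return True


def notification_actions(partnership):
    """List the actions that a heartbeat-lost event would trigger."""
    actions = []
    if partnership.notification_emailaddrs:
        actions.append(("email", ", ".join(partnership.notification_emailaddrs)))
    if partnership.notification_actioncmd:
        actions.append(("run", partnership.notification_actioncmd))
    return actions


def on_heartbeat_failure(probe, partnership, retries=3):
    """Return the notification actions, or an empty list if the loss was transient."""
    # retries=3 is an assumed value for this sketch, not a documented default.
    if not confirm_failure(probe, retries):
        return []
    return notification_actions(partnership)


example = Partnership(["admin@example.com"], "/opt/hypothetical/notify.sh")
print(on_heartbeat_failure(lambda: False, example))  # probe never succeeds
print(on_heartbeat_failure(lambda: True, example))   # probe succeeds at once
```

Keeping the confirmation step apart from the notification step mirrors the text: no notification is produced until the emergency-mode retries have ruled out a transient loss.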
When a secondary cluster for a protection group fails, a cluster in the same partnership detects the failure. The cluster that failed might be a member of more than one partnership, resulting in multiple failure detections.
During failure detection, the following actions take place:
Heartbeat failure is detected by a partner cluster.
The heartbeat is activated in emergency mode to verify that the secondary cluster has failed.
When a failure is confirmed by the Geographic Edition product, the cluster notifies the administrator. The system detects all protection groups for which the cluster that failed was acting as secondary. The state of the appropriate protection groups is marked Unknown.
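The last step can be pictured with a short Python sketch. The dictionary layout, the protection group names, the initial `OK` state, and the `mark_unknown` function are assumptions made for illustration and do not reflect the product's internal data model. The sketch selects the protection groups for which the failed cluster was the secondary and sets their state to Unknown.

```python
def mark_unknown(protection_groups, failed_cluster):
    """Mark as Unknown every group whose secondary is the failed cluster."""
    affected = []
    for group in protection_groups:
        if group["secondary"] == failed_cluster:
            group["state"] = "Unknown"
            affected.append(group["name"])
    return affected


# Hypothetical protection groups; names and initial states are illustrative.
groups = [
    {"name": "srdfpg", "primary": "cluster-paris",
     "secondary": "cluster-newyork", "state": "OK"},
    {"name": "otherpg", "primary": "cluster-newyork",
     "secondary": "cluster-paris", "state": "OK"},
]

# Only srdfpg uses cluster-newyork as its secondary, so only srdfpg changes.
print(mark_unknown(groups, "cluster-newyork"))
```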