Oracle Solaris Cluster Data Services Planning and Administration Guide (Oracle Solaris Cluster 4.1)
Tuning Fault Monitors for Oracle Solaris Cluster Data Services
Each data service that is supplied with the Oracle Solaris Cluster product has a built-in fault monitor. The fault monitor performs the following functions:
Detecting the unexpected termination of processes for the data service server
Checking the health of the data service
The fault monitor is contained in the resource that represents the application for which the data service was written. You create this resource when you register and configure the data service. For more information, see the documentation for the data service.
System properties and extension properties of this resource control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. The preset behavior should be suitable for most Oracle Solaris Cluster installations. Therefore, you should tune a fault monitor only if you need to modify this preset behavior.
Tuning a fault monitor involves the following tasks:
Setting the interval between fault monitor probes
Setting the timeout for fault monitor probes
Defining the criteria for persistent faults
Specifying the failover behavior of a resource
Perform these tasks when you register and configure the data service. For more information, see the documentation for the data service.
Note - A resource's fault monitor is started when you bring online the resource group that contains the resource. You do not need to start the fault monitor explicitly.
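Before you tune a fault monitor, you can display the current values of its probe-related system properties and extension properties. The following command is a sketch only; the resource name web-server-rs is a placeholder for a resource in your configuration. The verbose output includes properties such as Thorough_probe_interval, Probe_timeout, Retry_count, Retry_interval, and Failover_mode.

# clresource show -v web-server-rs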
Setting the Interval Between Fault Monitor Probes

To determine whether a resource is operating correctly, the fault monitor probes this resource periodically. The interval between fault monitor probes affects the availability of the resource and the performance of your system as follows:
The interval between fault monitor probes affects the length of time that is required to detect a fault and respond to the fault. Therefore, if you decrease the interval between fault monitor probes, the time that is required to detect a fault and respond to the fault is also decreased. This decrease enhances the availability of the resource.
Each fault monitor probe consumes system resources such as processor cycles and memory. Therefore, if you decrease the interval between fault monitor probes, the performance of the system is degraded.
The optimum interval between fault monitor probes also depends on the time that is required to respond to a fault in the resource. This time depends on the complexity of the resource, which affects the time that is required for operations such as restarting the resource.
To set the interval between fault monitor probes, set the Thorough_probe_interval system property of the resource to the interval in seconds that you require.
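For example, the following command sets the probe interval of a resource to 120 seconds. The resource name web-server-rs and the value of 120 seconds are placeholders; substitute the name of the resource whose fault monitor you are tuning and an interval that suits your configuration.

# clresource set -p Thorough_probe_interval=120 web-server-rs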
Setting the Timeout for Fault Monitor Probes

The timeout for fault monitor probes specifies the length of time that a fault monitor waits for a response from a resource to a probe. If the fault monitor does not receive a response within this timeout, the fault monitor treats the resource as faulty. The time that a resource requires to respond to a fault monitor probe depends on the operations that the fault monitor performs to probe the resource. For information about operations that a data service's fault monitor performs to probe a resource, see the documentation for the data service.
The time that is required for a resource to respond also depends on factors that are unrelated to the fault monitor or the application, for example:
System configuration
Cluster configuration
System load
Amount of network traffic
To set the timeout for fault monitor probes, set the Probe_timeout extension property of the resource to the timeout in seconds that you require.
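For example, the following command sets the probe timeout of a resource to 90 seconds. As before, the resource name web-server-rs and the value of 90 seconds are placeholders for values that suit your configuration.

# clresource set -p Probe_timeout=90 web-server-rs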
Defining the Criteria for Persistent Faults

To minimize the disruption that transient faults in a resource cause, a fault monitor restarts the resource in response to such faults. For persistent faults, more disruptive action than restarting the resource is required:
For a failover resource, the fault monitor fails over the resource to another node.
For a scalable resource, the fault monitor takes the resource offline.
A fault monitor treats a fault as persistent if the number of complete failures of a resource exceeds a specified threshold within a specified retry interval. Defining the criteria for persistent faults enables you to set the threshold and the retry interval to accommodate the performance characteristics of your cluster and your availability requirements.
Complete Failures and Partial Failures of a Resource

A fault monitor treats some faults as a complete failure of a resource. A complete failure typically causes a complete loss of service. The following failures are examples of a complete failure:
Unexpected termination of the process for a data service server
Inability of a fault monitor to connect to a data service server
A complete failure causes the fault monitor to increase by 1 the count of complete failures in the retry interval.
A fault monitor treats other faults as a partial failure of a resource. A partial failure is less serious than a complete failure, and typically causes a degradation of service, but not a complete loss of service. An example of a partial failure is an incomplete response from a data service server before a fault monitor probe is timed out.
A partial failure causes the fault monitor to increase by a fractional amount the count of complete failures in the retry interval. Partial failures are still accumulated over the retry interval.
The following characteristics of partial failures depend on the data service:
The types of faults that the fault monitor treats as partial failure
The fractional amount that each partial failure adds to the count of complete failures
For information about faults that a data service's fault monitor detects, see the documentation for the data service.
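For example, suppose that a data service's fault monitor counts each partial failure as one half of a complete failure and that the threshold is two complete failures. Four partial failures within the retry interval would then reach the threshold and trigger the same action as two complete failures. The fraction of one half is illustrative only; the actual fractional amount depends on the data service.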
Dependencies of the Threshold and the Retry Interval on Other Properties

The maximum length of time that is required for a single restart of a faulty resource is the sum of the values of the following properties:
Thorough_probe_interval system property
Probe_timeout extension property
To ensure that you allow enough time for the threshold to be reached within the retry interval, use the following expression to calculate values for the retry interval and the threshold:
retry_interval >= 2 × threshold × (thorough_probe_interval + probe_timeout)

The factor of 2 accounts for partial probe failures that do not immediately cause the resource to be failed over or taken offline.
System Properties for Setting the Threshold and the Retry Interval

To set the threshold and the retry interval, set the following system properties of the resource:
To set the threshold, set the Retry_count system property to the maximum allowed number of complete failures.
To set the retry interval, set the Retry_interval system property to the interval in seconds that you require.
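The following example applies the expression to placeholder values. Suppose that Thorough_probe_interval is 60 seconds and Probe_timeout is 30 seconds, so a single restart of a faulty resource can take up to 90 seconds. To allow two complete failures within the retry interval, set Retry_count to 2 and set Retry_interval to at least 2 × 2 × (60 + 30) = 360 seconds. The resource name web-server-rs is a placeholder.

# clresource set -p Retry_count=2 -p Retry_interval=360 web-server-rs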
Specifying the Failover Behavior of a Resource

The failover behavior of a resource determines how the RGM responds to the following faults:
Failure of the resource to start
Failure of the resource to stop
Failure of the resource's fault monitor to stop
To specify the failover behavior of a resource, set the Failover_mode system property of the resource. For information about the possible values of this property, see the description of the Failover_mode system property in the r_properties(5) man page.
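For example, the following command sets the Failover_mode system property of a resource to SOFT, one of the values that the r_properties(5) man page describes. The resource name web-server-rs is a placeholder.

# clresource set -p Failover_mode=SOFT web-server-rs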