A variety of events can lead to the failure of a server instance. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, or unexpected application behavior may each contribute to the failure of a server instance.
WebLogic SIP Server uses a highly clustered architecture as the basis for minimizing the impact of failure events. However, even in a clustered environment it is important to prepare a sound recovery process in case an engine tier server, data tier server, or Diameter relay node suddenly fails.
The following sections provide information and procedures for recovering failed server instances:
WebLogic SIP Server provides several features that facilitate recovery from and protection against server failure.
WebLogic SIP Server detects increases in system load that could affect the performance and stability of deployed SIP Servlets, and automatically throttles message processing at predefined load thresholds.
Using overload protection helps you avoid failures that could result from unanticipated levels of application traffic or resource utilization.
WebLogic SIP Server attempts to avoid failure when certain conditions occur:
See overload in the Configuration Reference for more information.
You can increase the reliability and availability of your applications by using multiple engine tier servers in a dedicated cluster, as well as multiple data tier servers (replicas) in a dedicated data tier cluster. Because engine tier clusters maintain no stateful information about applications, the failure of an engine tier server does not result in any data loss or dropped calls. Multiple replicas in a data tier partition store redundant copies of call state information, and automatically fail over to one another should a replica fail.
See Overview of the WebLogic SIP Server Architecture for more information.
Server self-health monitoring, used together with Node Manager, enables you to automatically reboot servers that have failed. This improves the overall reliability of a domain, and requires no direct intervention from an administrator.
For more information, see Configuring, Starting, and Stopping Node Manager in the WebLogic Server 8.1 documentation.
Managed Servers maintain a local copy of the domain configuration. When a Managed Server starts, it contacts its Administration Server to retrieve any changes to the domain configuration that were made since the Managed Server was last shut down. If a Managed Server cannot connect to the Administration Server during startup, it can use its locally-cached configuration information—this is the configuration that was current at the time of the Managed Server's most recent shutdown. A Managed Server that starts up without contacting its Administration Server to check for configuration updates is running in Managed Server Independence (MSI) mode. By default, MSI mode is enabled. See Replicating a Domain's Configuration Files for Managed Server Independence in the WebLogic Server 8.1 documentation.
Recovery from the failure of a server instance requires access to the domain's configuration and security data. This section describes file backups that WebLogic SIP Server performs automatically, as well as manual backup procedures that an administrator should perform periodically.
By default, an Administration Server stores a domain's configuration data in a file called domain_name/config.xml, where domain_name is the root directory of the domain.
Back up config.xml to a secure location in case a failure of the Administration Server renders the original copy unavailable. BEA recommends storing each new version of the config.xml file in a source control repository. If an Administration Server fails, you can copy the most recent backup version to a different machine and restart the Administration Server on that machine.
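As an illustration of the manual backup described above, the following is a minimal Python sketch that copies config.xml to a backup directory under a timestamped name. The function name backup_config, the backup directory, and the naming scheme are assumptions made for this example; they are not part of WebLogic SIP Server.

```python
import shutil
import time
from pathlib import Path


def backup_config(domain_dir, backup_dir, stamp=None):
    """Copy domain_dir/config.xml into backup_dir under a timestamped name.

    Hypothetical helper: the target naming scheme is an assumption.
    """
    source = Path(domain_dir) / "config.xml"
    if not source.is_file():
        raise FileNotFoundError(f"no config.xml found in {domain_dir}")
    # Use the caller-supplied stamp if given, otherwise the current local time.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"config.xml.{stamp}"
    # copy2 preserves the file's modification time along with its contents.
    shutil.copy2(source, target)
    return target
```

A script like this could be run after each configuration change, with the backup directory placed on a different machine or volume than the Administration Server.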
By default, the Administration Server archives up to 5 previous versions of config.xml in the domain-name/configArchive directory.
When you save a change to a domain's configuration, the Administration Server saves the previous configuration in domain-name\configArchive\config.xml#n. Each time the Administration Server saves a file in the configArchive directory, it increments the value of the #n suffix, up to a configurable number of copies (5 by default). Thereafter, each time you change the domain configuration:
To configure the number of config.xml file versions that the server maintains:
In addition to the files in domain-name\configArchive, the Administration Server creates two other files that back up the domain's configuration at key points during the startup process:
domain-name\config-file.xml.original: The configuration file just before the Administration Server parses it and adds subsystem data.
domain-name\config-file.xml.booted: The configuration file just after the Administration Server successfully boots. If the config.xml file becomes corrupted, you can boot the Administration Server with this file.
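The fallback to the .booted copy can be sketched as a small check that runs before the Administration Server is started. The Python example below treats "corrupted" as "not well-formed XML", which is only a rough proxy and an assumption of this sketch. The function names are hypothetical, and both file paths are passed in by the caller rather than assumed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def is_well_formed(path):
    """Return True if the file exists and parses as XML."""
    try:
        ET.parse(str(path))
    except (ET.ParseError, OSError):
        return False
    return True


def choose_boot_config(config_path, booted_path):
    """Prefer the live configuration file; fall back to the .booted copy.

    Returns the first candidate that parses as XML, or None if neither does.
    """
    for candidate in (Path(config_path), Path(booted_path)):
        if is_well_formed(candidate):
            return candidate
    return None
```

A file can be well-formed XML and still contain an invalid configuration, so a result from this check is a hint for the administrator rather than a guarantee that the server will boot.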
As with the config.xml file, the sipserver implementation application contains configuration information used by all engine and data tier servers deployed within a domain. The sipserver application also generally includes the diameter application for engine tier servers that act as Diameter client nodes.
By default, the sipserver application is stored in domain_name/sipserver. Back up the entire application directory, which includes the sipserver.xml, datatier.xml, and diameter.xml configuration files, as well as any additional patches you may have installed.
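One way to back up the entire application directory is to pack it into a single compressed archive. The following Python sketch uses only the standard tarfile module. The function name archive_directory and the choice of a gzip-compressed tar file are assumptions made for illustration.

```python
import tarfile
from pathlib import Path


def archive_directory(app_dir, archive_path):
    """Pack app_dir, including all of its files, into a .tar.gz archive."""
    app_dir = Path(app_dir)
    if not app_dir.is_dir():
        raise NotADirectoryError(f"{app_dir} is not a directory")
    archive_path = Path(archive_path)
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(str(app_dir), arcname=app_dir.name)
    return archive_path
```

The same helper could be pointed at a standalone Diameter application directory or at a directory of logging Servlet sources.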
If you configure one or more WebLogic SIP Server instances to function as Diameter relay agent nodes, the Diameter Web Application is generally deployed as a standalone application (outside of the sipserver implementation application). Back up each Diameter application used to configure a relay agent node. This generally involves a separate Diameter application directory for each relay.
In a WebLogic SIP Server deployment, the start scripts used to boot engine and data tier servers are generally customized to include domain-specific configuration information such as:
WlssEchoServer process.
Back up each distinct start script used to boot engine tier, data tier, or Diameter relay servers in your domain.
If you use WebLogic SIP Server logging Servlets (see Logging SIP Requests and Responses) to perform regular logging or auditing of SIP messages, back up the complete application source files so that you can easily redeploy the applications should the staging server fail or the original deployment directory become corrupted.
The WebLogic Security service stores its configuration data in the config.xml file, and also in an LDAP repository and other files.
The default Authentication, Authorization, Role Mapper, and Credential Mapper providers that are installed with WebLogic SIP Server store their data in an LDAP server. Each WebLogic SIP Server contains an embedded LDAP server. The Administration Server contains the master LDAP server, which is replicated on all Managed Servers. If any of your security realms use these installed providers, you should maintain an up-to-date backup of the following directory tree:
where domain_name is the domain's root directory and adminServer is the directory in which the Administration Server stores runtime and security data.
Each WebLogic SIP Server has an LDAP directory, but you only need to back up the LDAP data on the Administration Server, because the master LDAP server replicates the LDAP data to each Managed Server when updates to security data are made. WebLogic security providers cannot modify security data while the domain's Administration Server is unavailable. The LDAP repositories on Managed Servers are replicas and cannot be modified.
The ldap/ldapfiles subdirectory contains the data files for the LDAP server. The files in this directory contain user, group, group membership, policy, and role information. Other subdirectories under the ldap directory contain LDAP server message logs and data about replicated LDAP servers.
Do not update the configuration of a security provider while a backup of LDAP data is in progress. If a change is made while you are backing up the ldap directory tree (for instance, if an administrator adds a user), the backups in the ldapfiles subdirectory could become inconsistent. If this does occur, consistent, but potentially out-of-date, LDAP backups are available.
Once a day, a server suspends write operations and creates its own backup of the LDAP data. It archives this backup in a ZIP file below the ldap\backup directory and then resumes write operations. This backup is guaranteed to be consistent, but it might not contain the latest security data.
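When restoring from these automatic archives, it can help to locate the most recent ZIP file that is still readable. The sketch below is a hypothetical helper, not a WebLogic utility. It assumes the archives use a lowercase .zip extension and that file modification time reflects backup order, and it takes the path of the backup directory from the caller.

```python
import zipfile
from pathlib import Path


def newest_ldap_backup(backup_dir):
    """Return the most recently modified readable ZIP under backup_dir, or None."""
    root = Path(backup_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.rglob("*.zip") if p.is_file()]
    # Examine the newest archives first.
    candidates.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    for path in candidates:
        if not zipfile.is_zipfile(path):
            continue  # not a ZIP file at all, for example a truncated copy
        with zipfile.ZipFile(path) as archive:
            # testzip() returns None when every member passes its CRC check.
            if archive.testzip() is None:
                return path
    return None
```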
For information about configuring the LDAP backup, see Configuring Backups for the Embedded LDAP Server in the WebLogic Server 8.1 Documentation.
All servers create a file named SerializedSystemIni.dat and place it in the server's root directory. This file contains encrypted security data that must be present to boot the server. You must back up this file.
If you configured a server to use SSL, also back up the security certificates and keys. The location of these files is user-configurable.
Certain files maintained at the operating system level are also critical in helping you recover from system failures. Consider backing up the following information as necessary for your system:
If no Managed Servers in the domain are running when you restart a failed Administration Server, no special steps are required. Start the Administration Server as you normally would.
If the Administration Server shuts down while Managed Servers continue to run, you do not need to restart the Managed Servers that are already running in order to recover management of the domain. The procedure for recovering management of an active domain depends upon whether you can restart the Administration Server on the same machine it was running on when the domain was started.
If you restart the WebLogic Administration Server while Managed Servers continue to run, by default the Administration Server can discover the presence of the running Managed Servers.
Note: Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers.
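The check described in the note can be automated before restarting the Administration Server. The following Python sketch scans the text of a start script and reports the lines that pass the flag. The function name is hypothetical, and the comment handling (skipping lines that begin with #, ::, or REM) is a simplifying assumption about shell and Windows batch scripts.

```python
DISCOVER_OFF = "-Dweblogic.management.discover=false"


def lines_disabling_discovery(script_text):
    """Return the 1-based numbers of non-comment lines that contain the flag."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip lines that are entirely comments in shell or batch syntax.
        if stripped.startswith(("#", "::")) or stripped.upper().startswith("REM "):
            continue
        if DISCOVER_OFF in line:
            hits.append(number)
    return hits
```

An empty result means the flag was not found on any active line. It does not prove that discovery is enabled if the option is supplied elsewhere, for example through an environment variable that the script expands.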
The root directory for the domain contains a file, running-managed-servers.xml, which contains a list of the Managed Servers in the domain and describes whether they are running or not. When the Administration Server restarts, it checks this file to determine which Managed Servers were under its control before it stopped running.
When a Managed Server is gracefully or forcefully shut down, its status in running-managed-servers.xml is updated to "not-running". When an Administration Server restarts, it does not try to discover Managed Servers with the "not-running" status. A Managed Server that stops running because of a system crash, or that was stopped by killing the JVM or the command prompt (shell) in which it was running, will still have the status "running" in running-managed-servers.xml. The Administration Server will attempt to discover such a server, and will throw an exception when it determines that the Managed Server is no longer running.
Restarting the Administration Server does not cause Managed Servers to update the configuration of static attributes. Static attributes are those that a server refers to only during its startup process. Server instances must be restarted to take account of changes to static configuration attributes. Discovery of the Managed Servers only enables the Administration Server to monitor the Managed Servers or make runtime changes in attributes that can be configured while a server is running (dynamic attributes).
If a machine crash prevents you from restarting the Administration Server on the same machine, you can recover management of the running Managed Servers as follows:
Make sure that the startup command or startup script does not include -Dweblogic.management.discover=false, which disables an Administration Server from discovering its running Managed Servers.
When the Administration Server starts, it communicates with the Managed Servers and informs them that the Administration Server is now running on a different IP address.
If the Administration Server is reachable by the Managed Server that failed, you can:
If a Managed Server cannot connect to the Administration Server during startup, it can retrieve its configuration by reading locally-cached configuration data. A Managed Server that starts in this way is running in Managed Server Independence (MSI) mode. For a description of MSI mode, and the files that a Managed Server must access to start up in MSI mode, see Replicating a Domain's Configuration Files for Managed Server Independence in the WebLogic Server 8.1 documentation.
To start up a Managed Server in MSI mode:
Copy the config.xml and SerializedSystemIni.dat files from the Administration Server's root directory (or from a backup) to the Managed Server's root directory.
Rename the copied configuration file to msi-config.xml. When you start the server, it will use the copied configuration files.
Note: Alternatively, use the -Dweblogic.RootDirectory=path startup option to specify a root directory that already contains these files.
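The file preparation for MSI mode can be sketched in Python as follows. This is a hedged illustration, not a supported tool: the function name stage_msi_files is hypothetical, and the sketch assumes that the copied config.xml is the file that must appear as msi-config.xml in the Managed Server's root directory.

```python
import shutil
from pathlib import Path


def stage_msi_files(admin_root, managed_root):
    """Copy the files a Managed Server needs for MSI mode into its root directory."""
    admin_root, managed_root = Path(admin_root), Path(managed_root)
    # Source file name -> name assumed to be expected by the Managed Server.
    wanted = {
        "config.xml": "msi-config.xml",
        "SerializedSystemIni.dat": "SerializedSystemIni.dat",
    }
    missing = [name for name in wanted if not (admin_root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {admin_root}: {', '.join(missing)}")
    managed_root.mkdir(parents=True, exist_ok=True)
    staged = []
    for source_name, target_name in wanted.items():
        target = managed_root / target_name
        shutil.copy2(admin_root / source_name, target)
        staged.append(target)
    return staged
```

Passing a backup directory as admin_root covers the case where the Administration Server's own root directory is unavailable.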
The Managed Server will run in MSI mode until it is contacted by its Administration Server. For information about restarting the Administration Server in this scenario, see Restarting a Failed Administration Server.