Repairing Partitioned Networks
This topic provides instructions for troubleshooting a partition, identifying its cause, and taking action to recover from it. A network partition exists if one or more machines cannot access the MASTER machine. As the application administrator, you are responsible for detecting partitions and recovering from them.
A network partition may be caused by any of the following failures:
The procedure you follow to recover from a partitioned network depends on the cause of the partition.
Detecting a Partitioned Network
You can detect a network partition in one of the following ways:
How to Check the ULOG
When problems occur with the network, BEA Tuxedo system administrative servers start sending messages to the ULOG. If the ULOG is set up over a remote file system, all messages are written to the same log. In this scenario, you can run the tail(1) command on one file and check the failure messages displayed on the screen.
If, however, the remote file system is using the network in which the problem has occurred, the remote file system may no longer be available.
Example of a ULOG Error Message
151804.gumby!DBBL.28446: ... : ERROR: BBL partitioned, machine=SITE2
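When several machines write to one ULOG, it can help to scan the log for partition messages rather than read it line by line. The following is a minimal sketch in Python, assuming the message wording matches the single example above; the function name `find_partitioned_machines` and the second sample line are hypothetical and used only for illustration.

```python
import re

# Pattern assumed from the one example message shown in this topic;
# other releases or locales may word the message differently.
PARTITION_RE = re.compile(r"ERROR: BBL partitioned, machine=(\S+)")


def find_partitioned_machines(ulog_lines):
    """Return machine names reported as partitioned, in first-seen order."""
    seen = []
    for line in ulog_lines:
        match = PARTITION_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    # Line copied from the example above.
    "151804.gumby!DBBL.28446: ... : ERROR: BBL partitioned, machine=SITE2",
    # Hypothetical unrelated line, included only to show that it is skipped.
    "151805.gumby!simpserv.28450: ... : INFO: unrelated message",
]

print(find_partitioned_machines(sample))  # prints ['SITE2']
```

The function accepts any iterable of strings, so an open ULOG file object can be passed in directly.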
How to Gather Information About the Network, Server, and Service
The following is an example of a tmadmin session in which information is being collected about a partitioned network, a server, and a service on that network. Three tmadmin commands are run: pnw (printnetwork), psr (printserver), and psc (printservice).
Example tmadmin Session
$ tmadmin
> pnw SITE2
Could not retrieve status from SITE2
> psr -m SITE1
a.out Name Queue Name Grp Name ID Rq Done Load Done Current Service
BBL 30002.00000 SITE1 0 - - ( - )
DBBL 123456 SITE1 0 121 6050 MASTERBB
simpserv 00001.00001 GROUP1 1 - - ( - )
BRIDGE 16900672 SITE1 0 - - ( DEAD )
> psc -m SITE1
Service Name Routine Name a.out Grp Name ID Machine # Done Status
------------ ------------ -------- -------- -- ------- ------------
ADJUNCTADMIN ADJUNCTADMIN BBL SITE1 0 SITE1 - PART
ADJUNCTBB ADJUNCTBB BBL SITE1 0 SITE1 - PART
TOUPPER TOUPPER simpserv GROUP1 1 SITE1 - PART
BRIDGESVCNM BRIDGESVCNM BRIDGE SITE1 1 SITE1 - PART
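If the psc output has been captured to a file or a string, the services marked PART can be picked out programmatically. The sketch below assumes the column layout shown in the example session (eight whitespace-separated fields per data row, with the status last); the function name `partitioned_services` is hypothetical, and real output may differ in layout.

```python
def partitioned_services(psc_output):
    """Return (service, machine) pairs whose Status column reads PART."""
    results = []
    for line in psc_output.splitlines():
        fields = line.split()
        # Data rows in the example session have 8 fields:
        # service, routine, a.out, group, ID, machine, # done, status.
        # The header and separator rows do not end in PART and are skipped.
        if len(fields) == 8 and fields[-1] == "PART":
            results.append((fields[0], fields[5]))
    return results


# Output copied from the example tmadmin session above.
psc_sample = """\
Service Name Routine Name a.out Grp Name ID Machine # Done Status
------------ ------------ -------- -------- -- ------- ------------
ADJUNCTADMIN ADJUNCTADMIN BBL SITE1 0 SITE1 - PART
ADJUNCTBB ADJUNCTBB BBL SITE1 0 SITE1 - PART
TOUPPER TOUPPER simpserv GROUP1 1 SITE1 - PART
BRIDGESVCNM BRIDGESVCNM BRIDGE SITE1 1 SITE1 - PART
"""

for service, machine in partitioned_services(psc_sample):
    print(service, machine)
```

Splitting on whitespace keeps the sketch short, but it would misread rows in which a column is empty or contains spaces.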
Restoring a Network Connection
This topic provides instructions for recovering from transient and severe network failures.
How to Recover from Transient Network Failures
Because the BRIDGE automatically tries to recover from transient network failures and reconnect, such failures usually go unnoticed. If, however, you need to recover manually from a transient network failure, complete the following procedure.
rco non-partitioned_node1 partitioned_node2
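The rco command can also be issued from a script instead of an interactive session. The following is a hedged sketch only: it assumes that tmadmin is on the PATH, that the environment (for example TUXCONFIG) is already set for the application, that tmadmin reads commands from standard input, and that `q` ends the session. The helper names `build_reconnect_input` and `run_reconnect` are hypothetical.

```python
import subprocess


def build_reconnect_input(non_partitioned, partitioned):
    """Build the text fed to tmadmin: one rco command, then quit."""
    for name in (non_partitioned, partitioned):
        # Reject empty names and names containing whitespace, which would
        # change how the command line is split into arguments.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid machine name: {name!r}")
    # 'q' is assumed here to be the tmadmin quit command.
    return f"rco {non_partitioned} {partitioned}\nq\n"


def run_reconnect(non_partitioned, partitioned):
    """Run tmadmin with the reconnect command (assumes tmadmin is on PATH)."""
    commands = build_reconnect_input(non_partitioned, partitioned)
    return subprocess.run(
        ["tmadmin"],
        input=commands,
        text=True,
        capture_output=True,
        check=False,  # leave interpretation of the exit status to the caller
    )


# Building the input does not require a Tuxedo installation.
print(build_reconnect_input("SITE1", "SITE2"), end="")
```

Keeping the command text in a separate function makes it possible to review or log exactly what will be sent before tmadmin is invoked.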
How to Recover from Severe Network Failures
To recover from a severe network failure, complete the following procedure.
pcl partitioned_machine
Copyright © 2001 BEA Systems, Inc. All rights reserved.