Installation and Configuration Guide


Diagnostics and Troubleshooting

This chapter describes what to do when things go wrong. It covers problems that occur in both the server and the JVM, as well as problems specific to WLS-VE. It also describes how to obtain information about your WLS-VE instance and provide that information to BEA's Support organization. This chapter includes information on the following subjects: Troubleshooting WLS-VE Problems, Handling Suspend Files, Displaying Version Information, and Reporting a Problem to BEA Support.

 


Troubleshooting WLS-VE Problems

This section provides information you will find helpful in solving problems that might occur with WLS-VE. Generally, you handle WebLogic Server and LiquidVM (the BEA JRockit component) problems the same way you would for their non-virtualized versions. You should follow BEA Support's instructions for information collection, augmented with those in Reporting a Problem to BEA Support. For BEA JRockit, you can use the standard tools available with BEA JRockit Mission Control—such as the JRockit Runtime Analyzer and Memory Leak Detector—to help you diagnose problems and collect relevant information about runtime activity.

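This section contains information on the following subjects: Troubleshooting Common WLS-VE Issues, Troubleshooting WebLogic Server Issues, and Troubleshooting LiquidVM Issues.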

Troubleshooting Common WLS-VE Issues

Problems with WLS-VE that are not specifically associated with WebLogic Server or LiquidVM can most likely be traced to configuration errors. This section helps you identify such a problem, determine its cause, and resolve it. If you cannot find the solution here, collect the necessary information about your system, as described in Reporting a Problem to BEA Support, and open a case with BEA Support.

The most common error conditions you might encounter are described in the following sections.

"Could not find the disk" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

Starting WLS-MyServer. connect...configure...create...
Could not find the disk: [storage2] wlsve/isoName.iso

Problem: When the virtual machine was created, VMware could not find the ISO image that you need to boot up WLS-VE.

Solution:

Check that you have uploaded the WLS-VE ISO image to the ESX server (see Copying the ISO Image to ESX Server Datastores for more information).

Check that the bea.lvm.info file in your home directory points to that same location. You can do this either by editing the bea.lvm.info file manually in your favorite editor or by rerunning the LiquidVM configuration wizard (tools\virtualization\control_1.0\bin\lvm_configwizard.cmd or lvm_configwizard.sh), as described in User Access Credentials.

Confirm that the ISO image exists, using the LiquidVM configuration wizard:

  1. Select VM Host.
  2. Select the Configuration tab and note the Datastore Name.
  3. Select Browse Datastore and confirm that wlsve921.iso is available in your datastore.

"Could not lookup NFS server" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

WARNING: Could not lookup NFS server foobar
Could be ok if LiquidVM network topology is different.
To ignore warnings. Set environment variable LVM_ACCEPT_WARNING=true.
LiquidVM start-up aborted.

Problem: Either the NFS server you specified does not exist, or the launching machine has a different network topology from the machine on which your WLS-VE instance will start. In the latter case, you can ignore the warning.

Solution: Verify that the server exists, then check DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT in the start-up script and ensure that the server name is spelled correctly.

This check is performed on the launching OS, so its result can be wrong: LiquidVM starts on another machine, which might have access to other parts of the network. If this is the case, set the environment variable LVM_ACCEPT_WARNING=true to skip the check and ignore the warning.
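If you are unsure whether the warning is genuine, you can repeat the name lookup yourself, both on the launching machine and on a machine in the network where LiquidVM will run. The following Python sketch is not part of WLS-VE; it assumes that each *_MOUNT value has the form nfsserver:/nfs/path,uid=#,gid=# shown in the NFS error messages later in this section, and the helper names are hypothetical. It checks only name resolution, not whether an NFS service is running.

```python
import socket
from typing import Dict, Tuple


def nfs_host(mount_value: str) -> str:
    """Return the server part of a value such as
    'myserver.foo.com:/share/Temp,uid=500,gid=502'."""
    return mount_value.split(":", 1)[0].strip()


def resolves(host: str) -> bool:
    """Return True if this machine can resolve host to an IPv4 address."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return False


def check_mounts(mounts: Dict[str, str]) -> Dict[str, Tuple[str, bool]]:
    """Map each mount variable name to (server name, resolvable from this machine?)."""
    result = {}
    for name, value in mounts.items():
        host = nfs_host(value)
        result[name] = (host, resolves(host))
    return result
```

Pass the DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT values copied from your start-up script as a dictionary. A False result on the launching machine only confirms what the warning already says; it does not prove that LiquidVM itself cannot reach the server.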

"Could not ping NFS server" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

WARNING: Could not ping NFS server named myserver
Maybe this is ok - because network topology may be different for LiquidVM
To ignore warnings. Set environment variable LVM_ACCEPT_WARNING=true.
LiquidVM start-up aborted.

Problem: Either the NFS server you specified does not exist (or its name is misspelled), or the launching machine has a different network topology from the machine on which your WLS-VE instance will start. In the latter case, you can ignore the warning.

Solution: Verify that the server exists, then check DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT in the start-up script and ensure that the server name is spelled correctly.

This check is performed on the launching OS, so its result can be wrong: LiquidVM starts on another machine, which might have access to other parts of the network. If this is the case, set the environment variable LVM_ACCEPT_WARNING=true to skip the check and ignore the warning.

"(Mount Daemon) not registered" Error

Symptom: When you launch your instance, you get the following output in your OS console window and/or in the VM's Console tab in VirtualCenter:

Baremetal hostname: "172.23.82.203"  IP address: 172.23.82.203
000000 [rpcconn WRN] program 100005 (Mount Daemon) not registered
000001 [rpcconn WRN] Is Mount Deamoen (needed for NFS) running on host nfserver.foo.com?
000002 [rpcconn WRN] pmapGetPort returned: 22
000003 [nfsconn WRN] Error setting up connection to mountd: 22
000004 [nfs WRN] Failed to mount: snfsserver.foo.com:/share/Temp

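Problem: Either a server named in your mount settings is not running the NFS mount daemon, or your network configuration (static IP address, gateway, or netmask) is incorrect.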

Solution: Do the following:

  1. Check DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT to verify that each one names a server that you know is running an NFS server service. If these are correct, go to the next step.
  2. Check your static IP address, your gateway (LVM_GATEWAY), and your netmask (LVM_NETMASK) and verify that they are correct.

"Error stating root" Error

Symptom: When you launch your instance, you get the following output in your OS console window and/or in the VM's Console tab in VirtualCenter:

Starting WLS-AdminServer. connect...configure...start...booting...
VM-log:
Baremetal hostname: "172.23.82.203" IP address: 172.23.82.203
000000 [nfs WRN] nfsReadSuperBlock: Error stating root (116)
000001 [vfs WRN] Error reading super block
000002 [vfsx WRN] Failed to read super block
000003 [mount WRN] Could not mount nfs at /domain (-1)
000004 [mount WRN] Failed to process mount points

Problem: The user credentials you have provided for the NFS domain directory do not have the necessary rights to create files in the domain directory; you have specified a user who lacks the rights to access this NFS share.

Solution: Check your DOMAIN_MOUNT and verify that the uid and gid are correct.

"Failed to open conslog" Error

Symptom: When you launch your instance, you get the following output in your OS console window and/or in the VM's Console tab in VirtualCenter:

Starting WLS-AdminServer. connect...configure...start...booting...
VM-log:
Baremetal hostname: "172.23.82.203" IP address: 172.23.82.203
000000 [console WRN] Failed to open conslog [WLS-AdminServer.log] error: Permission denied

Problem: The user credentials you have provided for the NFS domain directory do not have the necessary rights to create files in the domain directory. Either you have specified the wrong user or the access rights on the domain directory are wrong.

Solution: Check your DOMAIN_MOUNT and verify that the uid and gid are correct. Then check that this user or group has the necessary rights to the domain directory on the NFS server (use chmod, chown, and chgrp to change the rights and ownership, as appropriate).
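To check the rights from the NFS server side before relaunching, you can compare the uid and gid from DOMAIN_MOUNT with the domain directory's owner, group, and mode bits. The following Python sketch is a simplified, hypothetical helper (can_create_files is not a WLS-VE tool): it applies only the classic Unix owner/group/other rules and ignores ACLs, root squashing, and supplementary groups, any of which your NFS server may apply.

```python
import os
import stat


def can_create_files(path: str, uid: int, gid: int) -> bool:
    """Approximate whether uid/gid may create files in directory path,
    using only the owner/group/other permission bits."""
    st = os.stat(path)
    mode = st.st_mode
    if not stat.S_ISDIR(mode):
        return False
    # Creating a file needs write and execute (search) permission on the
    # directory. Unix applies only the first matching class: owner, then group, then other.
    if st.st_uid == uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif st.st_gid == gid:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (mode & needed) == needed
```

Run it on the NFS server against the exported domain directory, passing the uid and gid from your DOMAIN_MOUNT value; a False result suggests which chmod, chown, or chgrp change is needed.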

The Server Shuts Down Soon After Startup I

Symptom: The server shuts down soon after startup and a log file named WLS-<servername>.log appears in your domain directory. In that file, you find the following:

<22-Mar-2007 19:46:36 o'clock CET> <Info> <Security> <BEA-090065> <Getting boot identity from user.> 
Enter username to boot WebLogic server:000257 [procfs WRN] Implement console read
<22-Mar-2007 19:46:36 o'clock CET> <Error> <Security> <BEA-090782> <Server is Running in Production Mode and Native Library(terminalio) to read the password securely from commandline is not found.>
<22-Mar-2007 19:46:36 o'clock CET> <Notice> <WebLogicServer> <BEA-000388> <JVM called WLS shutdown hook. The server will force shutdown now>
<22-Mar-2007 19:46:36 o'clock CET> <Alert> <WebLogicServer> <BEA-000396> <Server shutdown has been requested by <WLS Kernel>>
<22-Mar-2007 19:46:36 o'clock CET> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FORCE_SHUTTING_DOWN>

Note: If your server shuts down soon after startup but you did not receive the preceding error messages, see The Server Shuts Down Soon After Startup II.

Problem: You did not provide a username and password, either in the security directory of the Administration Server's root directory or through the WLS_USERNAME and WLS_PASSWORD settings in the start-up script. Keep in mind that WLS-VE does not support normal keyboard input, so you cannot enter a username and password at the console.

Solution: Add a boot identity file containing the username and password to the security directory of the Administration Server's root directory (see Creating a Boot Identity File for an Administration Server in the WebLogic Server 9.2 document, Managing Server Startup and Shutdown), or add the username and password to the start-up script. Then relaunch your WLS-VE server.
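The boot identity file described in Managing Server Startup and Shutdown is a plain-text file named boot.properties with username and password entries; WebLogic Server encrypts the values the first time it boots with the file. The following Python sketch writes such a file into a security directory you specify. The helper name is hypothetical; confirm the exact directory against the WebLogic Server 9.2 documentation, and make sure the file is readable by the uid and gid in your DOMAIN_MOUNT, because those are the credentials WLS-VE uses on the NFS share.

```python
from pathlib import Path


def write_boot_identity(security_dir: str, username: str, password: str) -> Path:
    """Write boot.properties with username/password entries into security_dir.

    The values are written in clear text; WebLogic Server is expected to
    encrypt them on first boot. This sketch does not escape characters that
    are special in Java properties files (for example, a backslash).
    """
    directory = Path(security_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create the security directory if missing
    boot_file = directory / "boot.properties"
    boot_file.write_text(f"username={username}\npassword={password}\n")
    return boot_file
```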

The Server Shuts Down Soon After Startup II

Symptom: The server shuts down soon after startup and a log file named WLS-<servername>.log appears in your domain directory. In that file, you find the following:

<22-Mar-2007 18:23:39 o'clock CET> <Critical> <WebLogicServer> <BEA-000362> <Server failed. Reason: 
Unable to start WebLogic Server!
Exception occurred while reading the license file.
Please make sure you have a valid license.bea
license...

Problem: The BEA_HOME_MOUNT points to a directory that is not a BEA home: the specified directory does not contain a file named license.bea.

Solution: Ensure that your BEA_HOME_MOUNT points to the right directory (that is, a BEA home directory).

"looking up myserver" Error

Symptom: You receive the following error message on the VirtualCenter console:

000000 [rpcconn   WRN] Error looking up myserver
000001 [rpcconn WRN] Error getting address for host myserver
000002 [nfsconn WRN] Error setting up connection to mountd: 22
000003 [nfs WRN] Failed to mount myserver:/share/Temp

Problem: You have specified only the short name of the NFS server, without its domain name (for example, you specified myserver instead of myserver.foo.com). When the WLS-VE instance starts up, it does not belong to your domain, so it does not know to look up myserver.foo.com when you specify myserver. As a result, the name lookup fails.

Solution: Either specify the NFS server with its fully-qualified name (myserver.foo.com) or specify the NFS server using its IP address.
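A quick way to catch this mistake before launching is to check that each NFS server name in your *_MOUNT values is either an IP address or a dotted name. The following Python sketch is a heuristic with a hypothetical helper name: a dotted name is not guaranteed to be fully qualified or resolvable, but a bare short name such as myserver is always rejected.

```python
import ipaddress


def is_usable_nfs_host(host: str) -> bool:
    """Heuristic: accept an IP address literal or a dotted name such as
    myserver.foo.com; reject a bare short name such as myserver."""
    try:
        ipaddress.ip_address(host)
        return True  # an IP address needs no name lookup at all
    except ValueError:
        pass
    labels = host.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)  # at least two non-empty labels
```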

"multiple gid provided only one allowed" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

NFS syntax should be on the form: nfsserver:/nfs/path,uid=#,gid=#
ERROR: Illegal nfs path:
myserver.bea.com:/share/Temp/smith/dom,gid=506,gid=502
multiple gid provided only one allowed
LiquidVM start-up aborted.

Problem: You have typed the gid twice; you probably meant for the first one to be uid and not gid.

Solution: Remove the extra gid from the start-up script. If you intended the first value to be a user ID, specify it as uid=# instead.

"uid=# must be a number" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

NFS syntax should be on the form: nfsserver:/nfs/path,uid=#,gid=#
ERROR: Illegal nfs path:
myserver.bea.com:/share/Temp/jsmith/dom2,uid=jsmith,gid=502
uid=# # must be a number but is : jsmith
LiquidVM start-up aborted.

Problem: You have specified a username instead of a uid for user credentials to NFS. NFS only understands a uid number.

Solution: Change your *_MOUNT in the start-up script to use a uid number instead of the username string.
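Both of the preceding errors come from a value that does not match the form nfsserver:/nfs/path,uid=#,gid=#. The following Python sketch checks a *_MOUNT value against that form before you launch. It is not the LiquidVM launcher's own parser: it assumes that uid and gid are the only options, that each is optional, and that the path contains no commas; the function name is hypothetical.

```python
import re
from typing import Dict, Union


def parse_nfs_mount(value: str) -> Dict[str, Union[str, int]]:
    """Split 'nfsserver:/nfs/path,uid=#,gid=#' into its parts.

    Raises ValueError for the mistakes described above: a repeated uid or
    gid, or a uid/gid that is not a number.
    """
    spec, *options = value.split(",")
    server, sep, path = spec.partition(":")
    if not sep or not server or not path.startswith("/"):
        raise ValueError(f"Illegal nfs path: {value}")
    result: Dict[str, Union[str, int]] = {"server": server, "path": path}
    for option in options:
        key, sep, number = option.partition("=")
        if key not in ("uid", "gid") or not sep:
            raise ValueError(f"unexpected option {option!r} in {value}")
        if key in result:
            raise ValueError(f"multiple {key} provided only one allowed")
        if not re.fullmatch(r"[0-9]+", number):
            raise ValueError(f"{key}=# must be a number but is: {number}")
        result[key] = int(number)
    return result
```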

"netSend failed: -3" Error

Symptom: You receive the following error message on the VirtualCenter console:

000000 [rpcconn    WRN] netSend failed: -3
000000 [rpcconn WRN] Rpc call failed
000000 [rpc WRN] Rpc request failed: 3
000000 [rpc WRN] rpcDoReqeust returned 3
000000 [rpc WRN] rpcCall 3 returned 8549398

Problem: Your network configuration is incorrect.

Solution: In the start-up script, check your static IP address, your gateway, and your netmask and verify that they are correct. If they aren't, obtain the correct information and enter it in the respective property.
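Before relaunching, you can sanity-check the three values together: in a correct configuration, the gateway lies in the same subnet as the static IP address. The following Python sketch uses the standard ipaddress module; the function name is hypothetical, it handles IPv4 only, and it cannot detect a value that is well-formed but simply not the one your network administrator assigned.

```python
import ipaddress
from typing import List


def check_network(ip: str, netmask: str, gateway: str) -> List[str]:
    """Return a list of problems found in a static IP / netmask / gateway triple."""
    try:
        iface = ipaddress.IPv4Interface(f"{ip}/{netmask}")  # rejects malformed netmasks
        gw = ipaddress.IPv4Address(gateway)
    except ValueError as exc:
        return [str(exc)]
    problems = []
    net = iface.network
    if gw not in net:
        problems.append(f"gateway {gw} is not in subnet {net}")
    if iface.ip == gw:
        problems.append("IP address and gateway are the same")
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    return problems
```

An empty list means only that the values are mutually consistent; they must still match what your network actually uses.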

"Configured IP [...] in use by MAC" Error

Symptom: When you attempt to start a server, you receive this message:

000000 [net   WRN] Configured IP [172.18.134.55] in use by MAC: 00:50:56:a0:06:96
000001 [net WRN] Network stack initialization FAILED: 98

Problem: Someone else is already using the IP address you have specified.

Solution: Another running VM might be using the same IP address. Do the following:

If neither of the above is the case, someone else is using your IP address. Since finding out who that might be is difficult, contact your system administrator to obtain another IP address.

Troubleshooting WebLogic Server Issues

Server-related problems in WLS-VE are the same kinds of problems you might encounter running non-virtualized WebLogic Server. This section provides an overview of the kinds of WebLogic Server problems to watch for when running WLS-VE. It includes information on these subjects: Performance Issues, Server Failure, Clustering Issues, and Other WebLogic Server Problems.

Performance Issues

Often, a problem with WebLogic Server is the result of poor tuning. For example, pool sizes (such as pools for JDBC connections, Stateless Session EJBs, and MDBs) that don't maximize concurrency for the expected thread utilization can adversely affect performance. Similarly, applications that handle large amounts of data per request can perform better if you increase the chunk size on both the client and server sides, a process called tuning the chunk size. The chunk size is the unit of memory that the WebLogic Server network layer uses to read data from and write data to sockets.

You can find well-tested tuning and performance guidelines in WebLogic Server Performance and Tuning.

Server Failure

A server instance can fail for many reasons, and one failure condition often leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, and unexpected application behavior can all contribute to the failure of a server instance. Even in a clustered environment, server instances may fail periodically, and you must be prepared for the recovery process. See Avoiding and Recovering From Server Failure in Managing Server Startup and Shutdown for information on dealing with server failure.

Clustering Issues

A number of cluster problems can affect the performance of WebLogic Server. These problems can occur for many reasons, including licensing and versioning errors, multicast addressing problems, errors or misspellings in start-up commands, and even a poorly tuned memory management system. You can find guidelines for troubleshooting cluster problems in Troubleshoot Common Problems in Using WebLogic Server Clusters.

Other WebLogic Server Problems

Other, non-specific problems can also occur with WebLogic Server. When these problems occur, they usually generate an error message with an associated error code. The Index of Messages by Message Range provides descriptions, possible causes, and corrective actions for all WebLogic Server error conditions.

Troubleshooting LiquidVM Issues

Problems that do not originate with WebLogic Server may occur in LiquidVM; these are typical of the problems you might encounter in non-virtualized JVMs. Such problems are documented in the BEA JRockit Diagnostics Guide (BEA JRockit is the JVM component of LiquidVM), which provides information for either resolving the problem yourself or gathering the information required to open a case with BEA Support.

The sort of LiquidVM problems you might encounter when running WLS-VE are:

Note that most UNIX operating systems impose a file descriptor limit on the number of files and sockets you can have open. LiquidVM has no such limit, so there is no need (and no way) to set a file descriptor limit.

For complete information on BEA JRockit problem determination and resolution, please see Diagnosing and Resolving Problems in the BEA JRockit Diagnostics Guide.

 


Handling Suspend Files

When WLS-VE crashes, the VM is suspended: a pause button appears in VirtualCenter, and information about the crash is written to the console. When a suspend file is created, do the following:

  1. Create a gzip-compressed tar archive (.tgz) of the suspend file. You will find the suspend file in the VM's home directory; it has the file extension .vmss.
  2. Copy the tgz file from the ESX server to your normal environment (for example, your My Documents/ folder).
  3. Upload the tgz file to BEA Support.
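If you prefer to script step 1, the following Python sketch uses the standard tarfile module to produce the .tgz archive. It assumes you run it on a machine with Python where the VM's home directory is accessible (on the ESX server itself, the standard tar and gzip utilities typically do the same job); the function name is hypothetical. Because suspend files are large, make sure the output directory has enough free space.

```python
import tarfile
from pathlib import Path


def pack_suspend_file(vmss_path: str, out_dir: str = ".") -> Path:
    """Create <name>.tgz in out_dir containing the given .vmss suspend file."""
    src = Path(vmss_path)
    if src.suffix != ".vmss":
        raise ValueError(f"{src} does not look like a suspend file (.vmss)")
    archive = Path(out_dir) / (src.stem + ".tgz")
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store only the file name, not the full datastore path.
        tar.add(str(src), arcname=src.name)
    return archive
```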

When the VM suspends, you might not realize that it has actually crashed. Do not resume execution, because you might lose critical information that would help diagnose the cause of the crash. Also be aware that suspend files are very large and might be difficult to copy from the ESX server.

 


Displaying Version Information

A critical piece of information that Support will need to help diagnose any problems you report to them is the version number. You can find this number in the file LVM_VERSION, which is located in the tools/virtualization/control_1.0/ directory of your LVM_HOME directory:

<BEA_HOME>/<WLSVE_HOME>/<LVM_HOME>/tools/virtualization/control_1.0/LVM_VERSION

Open this file to find the version number; for example:

LiquidVM R1.0_77103
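When you report a problem, you can read the file programmatically instead of copying it by hand. The following Python sketch reads LVM_VERSION and, if the text looks like the example above, splits it into a release part and a trailing number. Treating 1.0 as the release and 77103 as a build number is an assumption based only on that example, and the function name is hypothetical; always include the raw text in your report.

```python
import re
from pathlib import Path
from typing import Dict


def read_lvm_version(path: str) -> Dict[str, str]:
    """Return the raw LVM_VERSION text plus, when it matches a pattern like
    'R1.0_77103', the release ('1.0') and trailing number ('77103')."""
    raw = Path(path).read_text().strip()
    info = {"raw": raw}
    match = re.search(r"R(\d+(?:\.\d+)*)_(\d+)", raw)
    if match:
        info["release"] = match.group(1)  # assumed: release number
        info["build"] = match.group(2)    # assumed: build number
    return info
```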

 


Reporting a Problem to BEA Support

If you determine that you need to file a trouble report, this section describes what to do before opening the case so that you give the support personnel assigned to your issue as complete a picture of the problem as possible. The more information you provide, the more quickly the support staff can resolve your issue. This section includes information on these subjects: Trouble Reporting Process Overview, Identify Your Problem Type, Verify That You're Running a Supported Configuration, and Collect Enough Information to Define Your Issue.

Trouble Reporting Process Overview

When you encounter a problem with WebLogic Server Virtual Edition that you cannot resolve using the information in the relevant BEA documentation, collect the information that best describes your problem and open a case with BEA Support. If you have a service agreement with BEA, the normal process is to contact your Level 1 service provider, who will make the initial attempts to correct the problem. If the Level 1 staff cannot solve the case, it is escalated to the Level 2 staff, who draw on their particular expertise to get your JVM running again. Serious problems are handled by the Level 3 staff (the WebLogic Server Virtual Edition developers).

Identify Your Problem Type

Is your machine crashing? Is it running slowly or returning unpredictable results? These are the kinds of symptoms that indicate a problem with WebLogic Server Virtual Edition. Identifying the kind of problem you are experiencing helps you determine what information to include when you open the trouble report.

Verify That You're Running a Supported Configuration

Before submitting a bug, verify that the environment where the problem arises is a supported configuration. Please see Verifying That Your Environment Supports WLS-VE.

Collect Enough Information to Define Your Issue

In addition to testing with the latest update release, use the following guidelines to prepare for submitting a trouble report:

  1. Collect as much relevant data as possible. For example, generate a thread-dump in the case of a deadlock, or locate the core file (where applicable) and hs_err file in the case of a crash. In all cases it is important to document the environment and the actions performed just before the problem is encountered.
  2. Where applicable, try to restore the original state and reproduce the problem using the documented steps. This helps to determine if the problem is reproducible or an intermittent issue.
  3. If the issue is reproducible, try to narrow down the problem. In some cases, a bug can be demonstrated with a small standalone test case. Bugs demonstrated by small test cases are typically easier to diagnose than those that require a large, complex application.
  4. Search the bug database to see whether the bug, or similar bugs, have already been reported. If so, the existing bug report may have further information. For example, if the bug has already been fixed, the report indicates the release in which it was fixed. The report may also contain a workaround, or comments in the evaluation that explain in further detail the circumstances that cause the bug to arise.

If you conclude that the bug has not already been reported, then it is important to submit a new bug.

