Diagnostics and Troubleshooting

This section describes what to do when things go wrong. It describes how to deal with problems that occur in both the server and the JVM and problems endemic to WLS-VE specifically. It also describes how to obtain information about your WLS-VE instance and provide that information to BEA's Support organization. This chapter includes information on the following subjects:

Troubleshooting WLS-VE Problems

This section provides information you will find helpful in solving problems that might occur with WLS-VE. Generally, you handle WebLogic Server and LiquidVM (the BEA JRockit component) problems the same way you would for their non-virtualized versions. You should follow BEA Support's instructions for information collection, augmented with those in Reporting a Problem to BEA Support. For BEA JRockit, you can use the standard tools available with BEA JRockit Mission Control—such as the JRockit Runtime Analyzer and Memory Leak Detector—to help you diagnose problems and collect relevant information about runtime activity.

Troubleshooting Common WLS-VE Issues

Problems with WLS-VE that are not specifically associated with WebLogic Server or with LiquidVM can probably be traced to configuration errors. This section will help you identify the problem, figure out what caused it, and resolve it. If you cannot find the solution here, collect the necessary information about your system, as described in Reporting a Problem to BEA Support, and open a case with BEA Support.

"Could not find the disk" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

Starting WLS-MyServer. connect...configure...create...
Could not find the disk: [storage2] wlsve/isoName.iso

Problem: When the virtual machine was created, VMware could not find the ISO image that you need to boot up WLS-VE.

Solution: Check that the bea.lvm.info file in your home directory points to the location of the ISO image. You can correct the file either by editing it manually in your favorite editor or by rerunning the LiquidVM configuration wizard (tools\virtualization\control_1.0\bin\lvm_configwizard.cmd or .sh), as described in User Access Credentials. To confirm that the ISO image is in the datastore:

Select VM Host.
Select the Configuration tab and note the Datastore Name.
Select Browse Datastore and confirm that wlsve921.iso is available in your datastore.

"Could not lookup NFS server" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

WARNING: Could not lookup NFS server foobar
         Could be ok if LiquidVM network topology is different.
To ignore warnings. Set environment variable LVM_ACCEPT_WARNING=true.
LiquidVM start-up aborted.

Problem: Either the NFS server you specified does not exist, or the network topology of the launching machine differs from that of the machine where your WLS-VE instance will start. In the latter case, you can ignore the warning.

Solution: Verify that the server exists, then check DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT in the start-up script and ensure that the server name is spelled correctly.

This check is done on the launching OS; the check could be wrong, because LiquidVM will start from another machine that might have access to other pieces of the network. If this is the case, to ignore the warning, set the environment variable LVM_ACCEPT_WARNING=true. This will preempt the check.
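As a minimal sketch of setting this variable for a single launch, the following Python helper copies the current environment and adds LVM_ACCEPT_WARNING=true before running a start-up script. The script path is supplied by the caller; no script name from this guide is assumed, and the helper names are illustrative only.

```python
import os
import subprocess


def launch_env(base_env=None):
    """Return a copy of the environment with LVM_ACCEPT_WARNING set to true."""
    env = dict(os.environ if base_env is None else base_env)
    env["LVM_ACCEPT_WARNING"] = "true"
    return env


def launch(startup_script):
    """Run the caller-supplied start-up script with the variable set."""
    return subprocess.call([startup_script], env=launch_env())
```

Because the variable is set only in the copied environment, later launches from the same shell still run the check.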

"Could not ping NFS server" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

WARNING: Could not ping NFS server named myserver
    Maybe this is ok - because network topology may be different for LiquidVM
To ignore warnings. Set environment variable LVM_ACCEPT_WARNING=true.

Problem: Either the NFS server you specified does not exist (or its name is misspelled), or the network topology of the launching machine differs from that of the machine where your WLS-VE instance will start. In the latter case, you can ignore the warning.

Solution: Verify that the server exists, then check DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT in the start-up script and ensure that the server name is spelled correctly.

This check is done on the launching OS; the check could be wrong, because LiquidVM will start from another machine that might have access to other pieces of the network. If this is the case, to ignore the warning, set the environment variable LVM_ACCEPT_WARNING=true. This will preempt the check.

"(Mount Daemon) not registered" Error

Symptom: When you launch your instance, you get the following output in your OS console window and/or in the VM's Console tab in VirtualCenter:

Baremetal hostname: "172.23.82.203"  IP address: 172.23.82.203
000000 [rpcconn   WRN] program 100005 (Mount Daemon) not registered
000001 [rpcconn   WRN] Is Mount Deamoen (needed for NFS) running on host nfserver.foo.com?
000002 [rpcconn   WRN] pmapGetPort returned: 22
000003 [nfsconn   WRN] Error setting up connection to mountd: 22
000004 [nfs       WRN] Failed to mount: snfsserver.foo.com:/share/Temp

Solution: Check your DOMAIN_MOUNT, BEA_HOME_MOUNT, and TMP_MOUNT settings to verify that the servers you specified are running an NFS server service. If they are, check your static IP address, your gateway (LVM_GATEWAY), and your netmask (LVM_NETMASK) and verify that they are correct.

"Error stating root" Error

Symptom: When you launch your instance, you get the following output in your OS console window and/or in the VM's Console tab in VirtualCenter:

Starting WLS-AdminServer. connect...configure...start...booting...
VM-log:
Baremetal hostname: "172.23.82.203"  IP address: 172.23.82.203
000000 [nfs       WRN] nfsReadSuperBlock: Error stating root (116)
000001 [vfs       WRN] Error reading super block
000002 [vfsx      WRN] Failed to read super block
000003 [mount     WRN] Could not mount nfs at /domain (-1)
000004 [mount     WRN] Failed to process mount points

Problem: The user credentials you have provided for the NFS domain directory do not have the necessary rights to create files in the domain directory. You have specified a user that lacks the rights to access this NFS share.

Solution: Check your DOMAIN_MOUNT and verify that the uid and gid are correct.

"Failed to open conslog" Error

Symptom: When you launch your instance, you get the following output in your OS console window and/or in the VM's Console tab in VirtualCenter:

Starting WLS-AdminServer. connect...configure...start...booting...
VM-log:
Baremetal hostname: "172.23.82.203"  IP address: 172.23.82.203
000000 [console   WRN] Failed to open conslog [WLS-AdminServer.log] error: Permission denied

Problem: The user credentials you have provided for the NFS domain directory do not have the necessary rights to create files in the domain directory. Either you have specified the wrong user or the access rights on the domain directory are wrong.

Solution: Check your domain mount and verify that the uid and gid are correct. Check that the user or group has the necessary rights to the domain directory on the NFS server (use chmod, chown, and chgrp to change the ownership and rights, as appropriate).
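The permission check described above can be sketched in Python. This is a rough illustration, not a LiquidVM tool: the function name is hypothetical, it looks only at the classic owner/group/other mode bits, and it ignores ACLs, supplementary groups, and NFS export options such as root squashing. Run it on a machine that sees the directory with its real ownership, for example the NFS server.

```python
import os
import stat


def can_create_files(dir_stat, uid, gid):
    """Return True if uid/gid has write and search permission per the mode bits."""
    mode = dir_stat.st_mode
    if not stat.S_ISDIR(mode):
        return False
    if uid == dir_stat.st_uid:
        needed = stat.S_IWUSR | stat.S_IXUSR   # owner bits apply
    elif gid == dir_stat.st_gid:
        needed = stat.S_IWGRP | stat.S_IXGRP   # group bits apply
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH   # "other" bits apply
    return (mode & needed) == needed
```

For example, can_create_files(os.stat("/path/to/domain"), 506, 502) reports whether uid 506 with gid 502 could create the console log in that directory; the path and numbers here are placeholders.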

The Server Shuts Down Soon After Startup I

Symptom: The server shuts down soon after startup and a log file named WLS-<servername>.log appears in your domain directory. In that file, you find the following:

<22-Mar-2007 19:46:36 o'clock CET> <Info> <Security> <BEA-090065> <Getting boot identity from user.> 
Enter username to boot WebLogic server:000257 [procfs    WRN] Implement console read
<22-Mar-2007 19:46:36 o'clock CET> <Error> <Security> <BEA-090782> <Server is Running in Production Mode and Native Library(terminalio) to read the password securely from commandline is not found.> 
<22-Mar-2007 19:46:36 o'clock CET> <Notice> <WebLogicServer> <BEA-000388> <JVM called WLS shutdown hook. The server will force shutdown now> 
<22-Mar-2007 19:46:36 o'clock CET> <Alert> <WebLogicServer> <BEA-000396> <Server shutdown has been requested by <WLS Kernel>> 
<22-Mar-2007 19:46:36 o'clock CET> <Notice> <WebLogicServer> <BEA-000365> <Server state changed to FORCE_SHUTTING_DOWN>

Problem: You didn't provide a user name and password either in the security directory of the Administration Server's root directory or in the start-up script (WLS_USERNAME and WLS_PASSWORD). Keep in mind that WLS-VE does not support normal keyboard input, so you cannot enter a username and password on the keyboard.

Solution: Provide the user name and password in one of these two places.

The Server Shuts Down Soon After Startup II

Symptom: The server shuts down soon after startup and a log file named WLS-<servername>.log appears in your domain directory. In that file, you find the following:

<22-Mar-2007 18:23:39 o'clock CET> <Critical> <WebLogicServer> <BEA-000362> <Server failed. Reason: 
Unable to start WebLogic Server!
Exception occurred while reading the license file.
Please make sure you have a valid license.bea
license...

Problem: The BEA_HOME_MOUNT points to a directory that is not a BEA home; the directory specified does not contain a file named license.bea.

Solution: Ensure that your BEA_HOME_MOUNT points to the right directory (that is, a BEA home directory).
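A quick way to test a candidate directory is to look for license.bea in it. The sketch below assumes you run it on a machine where the exported directory is reachable as a local path; the function name is illustrative, and the presence of license.bea is the only property it checks.

```python
from pathlib import Path


def looks_like_bea_home(directory):
    """Return True if the directory contains a file named license.bea."""
    return (Path(directory) / "license.bea").is_file()
```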

"looking up myserver" Error

000000 [rpcconn   WRN] Error looking up myserver
000001 [rpcconn   WRN] Error getting address for host myserver
000002 [nfsconn   WRN] Error setting up connection to mountd: 22
000003 [nfs       WRN] Failed to mount myserver:/share/Temp

Problem: You have specified only the short name for the NFS server; that is, you didn't specify a domain name, too (for example, you just specified myserver, instead of myserver.foo.com). When the WLS-VE instance starts up, it doesn't belong to your domain and hence it will not know to ask for myserver.foo.com when you say myserver. This causes the name lookup to fail.

Solution: Either specify the NFS server with its fully-qualified name (myserver.foo.com) or specify the NFS server using its IP address.
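To catch short names before launching, a start-up wrapper could apply a check like the following. It is a sketch with hypothetical function names: it treats any name without a dot as a short name and does not perform a DNS lookup.

```python
import ipaddress


def is_ip_address(host):
    """Return True if host parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def needs_domain_suffix(host):
    """Return True for a short host name such as 'myserver'."""
    return not is_ip_address(host) and "." not in host.rstrip(".")
```

With the names used in this section, needs_domain_suffix("myserver") is true, while "myserver.foo.com" and "172.23.82.203" both pass.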

"multiple gid provided only one allowed" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

NFS syntax should be on the form: nfsserver:/nfs/path,uid=#,gid=#
ERROR: Illegal nfs path:
       myserver.bea.com:/share/Temp/smith/dom,gid=506,gid=502
       multiple gid provided only one allowed
LiquidVM start-up aborted.

Problem: You have typed the gid twice; you probably meant for the first one to be uid and not gid.

Solution: Remove the wrong gid from the start-up script. You might also want to add a uid=#, if that was what you intended.

"uid=# must be a number" Error

Symptom: When you launch your instance, you get the following output in your OS console window:

NFS syntax should be on the form: nfsserver:/nfs/path,uid=#,gid=#
ERROR: Illegal nfs path:
       myserver.bea.com:/share/Temp/jsmith/dom2,uid=jsmith,gid=502
       uid=# # must be a number but is : jsmith
LiquidVM start-up aborted.

Problem: You have specified a username instead of a uid for user credentials to NFS. NFS only understands a uid number.

Solution: Change your *_MOUNT in the start-up script to use a uid number instead of the username string.
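Both this error and the duplicate gid error can be caught before launch by checking the mount value against the documented form nfsserver:/nfs/path,uid=#,gid=#. The following sketch is not LiquidVM's own parser: the function name and message wording are illustrative, it inspects only the uid and gid options, and it assumes the path contains no commas.

```python
def check_nfs_mount(spec):
    """Return a list of problems found in 'nfsserver:/nfs/path,uid=#,gid=#'."""
    problems = []
    location, *options = spec.split(",")
    server, sep, path = location.partition(":")
    if not (server and sep and path.startswith("/")):
        problems.append("expected nfsserver:/nfs/path before the options")
    seen = set()
    for option in options:
        key, _, value = option.partition("=")
        if key not in ("uid", "gid"):
            continue  # other options are outside the scope of this sketch
        if key in seen:
            problems.append(f"multiple {key} provided, only one allowed")
        seen.add(key)
        if not (value.isascii() and value.isdigit()):
            problems.append(f"{key} must be a number but is: {value}")
    return problems
```

An empty list means the value has the expected shape; it does not prove that the server exists or that the uid and gid have rights on the share.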

"netSend failed: -3" Error

000000 [rpcconn    WRN] netSend failed: -3
000000 [rpcconn    WRN] Rpc call failed
000000 [rpc        WRN] Rpc request failed: 3
000000 [rpc        WRN] rpcDoReqeust returned 3
000000 [rpc        WRN] rpcCall 3 returned 8549398

Solution: In the start-up script, check your static IP address, your gateway, and your netmask and verify that they are correct. If they aren't, obtain the correct information and enter it in the respective property.
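One consistency check you can make on these three values is that the gateway lies inside the subnet defined by the static IP address and the netmask. The sketch below uses Python's standard ipaddress module; the function name is hypothetical, and the netmask and gateway in the usage note are made-up illustrations, not values from this guide.

```python
import ipaddress


def gateway_in_subnet(ip, netmask, gateway):
    """Return True if the gateway lies in the subnet defined by ip and netmask."""
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    return ipaddress.ip_address(gateway) in network
```

For example, gateway_in_subnet("172.23.82.203", "255.255.252.0", "172.23.80.1") is true, while the same call with netmask "255.255.255.0" is false. A malformed address or netmask raises ValueError, which is itself a sign that a property was mistyped.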

"Configured IP [...] in use by MAC" Error

Solution: Another running VM might be using the same IP address. Do the following:

If neither of the above is the case, someone else is using your IP address. Since finding out who that might be is difficult, contact your system administrator to obtain another IP address.

Troubleshooting WebLogic Server Issues

Server-related problems that can befall WLS-VE are the same sort of problems you might encounter running non-virtualized WebLogic Server. This section provides an overview of the kinds of WebLogic Server problems you should watch for when running WLS-VE. It includes information on these subjects:

Performance Issues

Often, a problem with WebLogic Server is the result of poor tuning. For example, pool sizes (such as pools for JDBC connections, Stateless Session EJBs, and MDBs) that don't maximize concurrency for the expected thread utilization can adversely affect performance. Similarly, applications that handle large amounts of data per request will perform better if you increase the chunk size (the unit of memory that the WebLogic Server network layer uses to read data from and write data to sockets) on both the client and server sides, a process called tuning the chunk size.

Server Failure

A server instance can fail and different events can lead to this failure. Often one failure condition leads to another. Loss of power, hardware malfunction, operating system crashes, network partitions, and unexpected application behavior can all contribute to the failure of a server instance. Even in a clustered environment, server instances may fail periodically and you must be prepared for the recovery process. See Avoiding and Recovering From Server Failure in Managing Server Startup and Shutdown for information on dealing with server failure.

Clustering Issues

A number of cluster problems can affect the performance of WebLogic Server. These problems can occur for many reasons, including licensing and versioning errors, multicast addressing problems, errors or misspellings in start-up commands, and even a poorly tuned memory management system. You can find guidelines for troubleshooting cluster problems in Troubleshoot Common Problems in Using WebLogic Server Clusters.

Other WebLogic Server Problems

Other, non-specific problems can also occur with WebLogic Server. When these problems occur, they usually generate an error message with an associated error code. The Index of Messages by Message Range provides descriptions, possible causes, and corrective actions for all WebLogic Server error conditions.

Troubleshooting LiquidVM Issues

Problems that don't originate with WebLogic Server may occur in LiquidVM and are typical of the kinds of problems you might encounter in non-virtualized JVMs. Problems such as these are documented in the BEA JRockit Diagnostics Guide (BEA JRockit is the JVM component of LiquidVM). This document provides information for either resolving the problem yourself or gathering the information required to open a case with BEA Support.

System crashes occur when the entire system shuts down involuntarily and usually without warning. See Troubleshooting BEA JRockit Crashes.
System freezes occur when the application stops answering requests but the process is still there. See BEA JRockit is Freezing.
Slow startups usually occur when BEA JRockit's optimizing compiler must run extensively to ensure that the most efficient code possible is compiled. See BEA JRockit Starts Slowly.
Poor performance usually occurs when your application experiences poor throughput. This usually indicates that the memory management system has not been tuned for optimal performance. See Too Few Transactions are Executing Per Minute.
Occasional slow response times usually indicate that transactions are taking too long to execute, a bottleneck most often caused by garbage collection pause times lasting too long. See Individual Transactions are Taking Too Long.
Performance degrading over time is characterized by an application that works fine early in its run but, after a while, starts reporting wrong results, throwing exceptions where it shouldn't, or crashing or hanging at roughly the same point each time you run it. See BEA JRockit's Performance is Degrading Over Time.

Note that in most UNIX operating systems there is a file descriptor limit that limits the number of files and sockets you can have open. LiquidVM does not have such limits so there is no need (and no way) to set a file descriptor limit.

Handling Suspend Files

When WLS-VE crashes, the VM goes into a state of suspension. A pause button will appear in VirtualCenter and information about the crash will be written to the console. When a suspend file is created, do the following:

tar-gzip the suspend file. You will find it in the VM's home directory; it will have a filetype of .vmss.
Copy the tgz file from the ESX server to your normal environment (for example, your My Documents/ folder).
Upload the tgz file to BEA Support.

Be aware that you might not realize that your machine has actually crashed when it suspends. You should avoid the temptation to resume execution, as you might lose critical information that would be helpful in diagnosing the problems causing the crash. You should also be aware that suspend files are huge and might not be easy for you to copy from the ESX server.
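The tar-gzip step can be sketched with Python's standard tarfile module. This assumes Python 3 is available on a machine that can read the .vmss file, which may not be true of the ESX server itself; where it is not, the tar utility does the same job. The function name is illustrative.

```python
import tarfile
from pathlib import Path


def compress_suspend_file(vmss_path):
    """Write <name>.tgz next to the given .vmss file and return its path."""
    source = Path(vmss_path)
    target = source.with_suffix(".tgz")
    with tarfile.open(str(target), "w:gz") as archive:
        # Store only the file name so the archive unpacks without directories.
        archive.add(str(source), arcname=source.name)
    return target
```

Because suspend files are large, make sure the target location has enough free space before you run the compression.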

Displaying Version Information

A critical piece of information that Support will need to help diagnose any problems you report to them is the version number. You can find this number in the file LVM_VERSION, which is located in the tools/virtualization/control_1.0/ directory of your LVM_HOME directory:

<BEA_HOME>/<WLSVE_HOME>/<LVM_HOME>/tools/virtualization/control_1.0/LVM_VERSION

Reporting a Problem to BEA Support

If you determine that you need to file a trouble report, this section discusses what you need to do before opening the case to ensure that you give the support personnel assigned to your issue as complete a picture of what is wrong as possible. The more information you can provide, the more quickly the support staff will be able to resolve your issue. This section includes information on these subjects:

Trouble Reporting Process Overview

When you encounter a problem with WebLogic Server Virtual Edition and can't resolve it using the information provided in the relevant BEA documentation, you need to collect the information that best describes your problem and open a case with BEA Support. If you have a service agreement with BEA, the normal process is to contact your Level 1 service provider, who will make the initial attempts to correct the problem. If the case cannot be solved by the Level 1 staff, it is escalated to the Level 2 staff, who will draw on their particular expertise to get your JVM running again. For serious problems, the issue will be handled by the Level 3 staff (the WebLogic Server Virtual Edition developers).

Identify Your Problem Type

Is your machine crashing? Is it running slowly or returning unpredictable results? These are the kinds of symptoms that indicate a problem with WebLogic Server Virtual Edition. Being able to identify what kind of problem you are experiencing will help you know what kind of information you need to include when you open the trouble report.

Verify That You're Running a Supported Configuration

Collect Enough Information to Define Your Issue

In addition to testing with the latest update release, use the following guidelines to prepare for submitting a trouble report:

Collect as much relevant data as possible. For example, generate a thread-dump in the case of a deadlock, or locate the core file (where applicable) and hs_err file in the case of a crash. In all cases it is important to document the environment and the actions performed just before the problem is encountered.
Where applicable, try to restore the original state and reproduce the problem using the documented steps. This helps to determine if the problem is reproducible or an intermittent issue.
If the issue is reproducible, try to narrow the problem. In some cases, a bug can be demonstrated with a small standalone test case. Bugs demonstrated by small test cases are typically easier to diagnose than bugs demonstrated by test cases that consist of a large, complex application.
Search the bug database to see if the bug, or similar bugs, have been reported. If the bug has already been reported, the bug report may have further information. For example, if the bug has already been fixed, the report will indicate the release in which it was fixed. The report may also contain information such as a workaround, or include comments in the evaluation that explain in further detail the circumstances that cause the bug to arise.

If you conclude that the bug has not already been reported, then it is important to submit a new bug.
