Debugging File System Locks
If an OCFS2 volume hangs, you can use the following procedure to find out which locks are busy and which processes are likely to be holding the locks.
In the following procedure, the Lockres value refers to the lock name that DLM uses, which
is a combination of a lock-type identifier, an inode number, and a generation number. The
following table lists the lock types and their associated identifiers.
Table 6-1 DLM Lock Types
Identifier | Lock Type |
---|---|
D | File data |
M | Metadata |
R | Rename |
S | Superblock |
W | Read-write |
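As an illustration of how a Lockres value breaks down into its parts, the following Python sketch splits a lock name into its lock type, inode number, and generation number. It assumes that the name starts with the single-letter identifier and ends with a 16-digit hexadecimal inode number followed by an 8-digit hexadecimal generation number, with any padding in between ignored. That layout is an assumption, not something this section states, and some lock types might use a different one. The names Lockres, LOCK_TYPES, and parse_lockres are hypothetical helpers and aren't part of any OCFS2 tool.

```python
from typing import NamedTuple

# Lock types from Table 6-1, keyed by their single-letter identifier.
LOCK_TYPES = {
    "D": "File data",
    "M": "Metadata",
    "R": "Rename",
    "S": "Superblock",
    "W": "Read-write",
}


class Lockres(NamedTuple):
    lock_type: str   # human-readable lock type
    inode: int       # inode number
    generation: int  # generation number


def parse_lockres(name: str) -> Lockres:
    """Split a DLM lock name into lock type, inode, and generation.

    Assumed layout (not confirmed by this section): one identifier
    letter, optional padding, 16 hex digits of inode number, then
    8 hex digits of generation number.
    """
    if len(name) < 1 + 16 + 8:
        raise ValueError(f"lock name too short: {name!r}")
    ident = name[0]
    generation = int(name[-8:], 16)   # last 8 hex digits
    inode = int(name[-24:-8], 16)     # the 16 hex digits before them
    lock_type = LOCK_TYPES.get(ident, f"Unknown ({ident})")
    return Lockres(lock_type, inode, generation)
```

For example, under the assumed layout, calling parse_lockres on a name built from the letter M, six padding zeros, the inode number 0x66720, and the generation number 0x78b84822 returns a Metadata lock for inode 419616.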
If a process is waiting for I/O to complete, the problem could be anywhere in the I/O subsystem,
from the block device layer through the drivers to the disk array. If the hang concerns a
user lock (flock()), the problem could lie with the application. If possible, end the process
holding the lock. If the hang is caused by a lack of memory or by fragmented memory, you can
free up memory by ending nonessential processes. The most immediate solution is to reset the
node that's holding the lock. The DLM recovery process can then clear all the locks owned by
the dead node, enabling the cluster to continue to operate.
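To help identify processes that might be waiting for I/O to complete, the following Python sketch lists processes in the uninterruptible sleep state (D) by reading /proc/<pid>/stat on a Linux node. It's a rough diagnostic aid only. A process in the D state isn't necessarily waiting on an OCFS2 lock, and the function names parse_stat and blocked_processes are hypothetical.

```python
import os


def parse_stat(stat_line: str):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is enclosed in parentheses and can itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root: str = "/proc"):
    """Return a list of (pid, comm) for processes in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available at this path
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry couldn't be parsed
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling blocked_processes() on the affected node returns the process IDs and command names to investigate further, for example to decide which process to end before resorting to resetting the node.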