Debugging File System Locks

If an OCFS2 volume hangs, you can use the following procedure to find out which locks are busy and which processes are likely to be holding the locks.

In the following procedure, the Lockres value refers to the lock name used by the DLM, which combines a lock-type identifier, an inode number, and a generation number. The following table lists the lock types and their associated identifiers.

Table 6-1 DLM Lock Types

Identifier    Lock Type
D             File data
M             Metadata
R             Rename
S             Superblock
W             Read-write
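
The structure of a lock name can be illustrated with a small shell helper. This is a minimal sketch, not part of the OCFS2 tools: the function name decode_lockres is hypothetical, and it assumes (based on the example values in this section) that the first character is the lock-type identifier, the last eight hexadecimal digits are the generation number, and the hexadecimal digits in between are the inode number.

```shell
#!/bin/sh
# decode_lockres NAME  (hypothetical helper, not an OCFS2 command)
# Prints "<lock type>|<inode>|<generation>" for a DLM lock name.
# Assumption: first character = lock-type identifier, last eight hex
# digits = generation, hex digits in between = inode number.
decode_lockres() {
    name=$1
    # Need the type character, at least one inode digit, and 8 generation digits.
    [ "${#name}" -ge 10 ] || return 1

    rest=${name#?}                  # everything after the type character
    id=${name%"$rest"}              # the type character itself
    inode_hex=${rest%????????}      # drop the last eight hex digits
    gen_hex=${rest#"$inode_hex"}    # the last eight hex digits

    case $id in
        D) type="File data" ;;
        M) type="Metadata" ;;
        R) type="Rename" ;;
        S) type="Superblock" ;;
        W) type="Read-write" ;;
        *) type="Unknown" ;;
    esac

    # Convert both hexadecimal fields to decimal.
    printf '%s|%d|%d\n' "$type" "$((0x$inode_hex))" "$((0x$gen_hex))"
}

decode_lockres M00000000000006672078b84822
```

For the Lockres value used in this section, the helper prints Metadata|419616|2025343010. The debugfs.ocfs2 stat command in the procedure remains the authoritative way to look up these values.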

  1. Mount the debug file system.

    Mount the debug file system using the following command:

    sudo mount -t debugfs debugfs /sys/kernel/debug
  2. View the lock statuses.

    Dump the lock statuses for the file system device, which is /dev/sdx1 in the following example:

    echo "fs_locks" | sudo debugfs.ocfs2 /dev/sdx1 | sudo tee /tmp/fslocks
    Lockres: M00000000000006672078b84822 Mode: Protected Read
    ...
  3. Retrieve the inode and generation number.

    Use the Lockres value from the previous output to obtain the inode number and generation number for the lock.

    echo "stat lockres-value" | sudo debugfs.ocfs2 -n /dev/sdx1

    For example, for the Lockres value M00000000000006672078b84822 from the previous step, the command output might resemble the following:

    Inode: 419616   Mode: 0666   Generation: 2025343010 (0x78b84822)
    ... 
  4. Look up the file system object.

    Relate the file system object to the inode number from the previous output:

    echo "locate inode" | sudo debugfs.ocfs2 -n /dev/sdx1

    For example, for the Inode value 419616 from the previous step, the command output might resemble the following:

    419616 /linux-2.6.15/arch/i386/kernel/semaphore.c
  5. Obtain the lock names for the file system object.

    Obtain the names of the locks that are associated with the file system object, which in the previous step's output is /linux-2.6.15/arch/i386/kernel/semaphore.c. Thus, you would type:

    echo "encode /linux-2.6.15/arch/i386/kernel/semaphore.c" | sudo debugfs.ocfs2 -n /dev/sdx1
    M00000000000006672078b84822 D00000000000006672078b84822 W00000000000006672078b84822  

    In the previous example, a metadata lock, a file data lock, and a read-write lock are associated with the file system object.

  6. Retrieve the DLM domain.

    Establish the DLM domain of the file system by running the following command:

    echo "stats" | sudo debugfs.ocfs2 -n /dev/sdx1 | grep UUID: | while read a b ; do echo "$b" ; done
    82DA8137A49A47E4B187F74E09FBBB4B  
  7. Enable debugging.

    Using the DLM domain and the lock name from the previous steps, run the following command to enable debugging for that lock resource:

    echo R 82DA8137A49A47E4B187F74E09FBBB4B M00000000000006672078b84822 | sudo tee /proc/fs/ocfs2_dlm/debug
  8. View the debug messages.

    Examine the debug messages by using the dmesg | tail command, for example:

    struct dlm_ctxt: 82DA8137A49A47E4B187F74E09FBBB4B, node=3, key=965960985
      lockres: M00000000000006672078b84822, owner=1, state=0 last used: 0, on purge list: no
        granted queue:
          type=3, conv=-1, node=3, cookie=11673330234144325711, ast=(empty=y,pend=n), bast=(empty=y,pend=n)
        converting queue:
        blocked queue:

    The DLM has three lock modes: no lock (type=0), protected read (type=3), and exclusive (type=5). In the previous example, the lock is owned by node 1 (owner=1) and node 3 has been granted a protected-read lock on the file-system resource.

  9. Identify sleeping processes.

    Use the following command to search for processes that are in an uninterruptible sleep state, which is indicated by the D flag in the STAT column:

    ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN

    At least one of the processes in the uninterruptible sleep state is responsible for the hang on the other node.
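
On a busy node, the ps listing from the last step can be long. The following sketch narrows it to the processes of interest. The function name filter_dstate is hypothetical, and the filter assumes the column order used in the ps command above (PID first, STAT second).

```shell
#!/bin/sh
# filter_dstate  (hypothetical helper)
# Reads ps output on stdin and prints the header line plus every row
# whose second column (STAT) begins with D, that is, processes in an
# uninterruptible sleep state.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage on a live node:
#   ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | filter_dstate
```

The wchan column of the remaining rows shows the kernel function in which each process is sleeping, which can help to distinguish a process waiting on a cluster lock from one waiting on disk I/O.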

If a process is waiting for I/O to complete, the problem could be anywhere in the I/O subsystem, from the block device layer through the drivers to the disk array. If the hang concerns a user lock (flock()), the problem could lie with the application. If possible, end the process that's holding the lock. If the hang is caused by a lack of memory or by fragmented memory, you can free up memory by ending nonessential processes. The most immediate solution is to reset the node that's holding the lock. The DLM recovery process can then clear all the locks owned by the dead node, enabling the cluster to continue to operate.
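
Before resetting a node, it helps to confirm which nodes appear in the DLM debug output. The following sketch summarizes debug messages in the format shown in the procedure. The function name summarize_dlm_debug is hypothetical, the parsing assumes the field layout of that example (lockres:, owner=, type=, node=, and the three queue headings), and the mode names are the three listed in the procedure.

```shell
#!/bin/sh
# summarize_dlm_debug  (hypothetical helper)
# Reads DLM debug messages on stdin and prints the owner of each lock
# resource, followed by the node and lock mode of every queue entry.
summarize_dlm_debug() {
    awk '
        # Return the number that follows "key=" on the line, or "".
        function value(line, key,    s) {
            if (match(line, key "=[-0-9]+")) {
                s = substr(line, RSTART, RLENGTH)
                sub(/^[^=]*=/, "", s)
                return s
            }
            return ""
        }
        # Map a type number to the mode names given in the procedure.
        function mode(t) {
            if (t == "0") return "no lock"
            if (t == "3") return "protected read"
            if (t == "5") return "exclusive"
            return "unknown mode " t
        }
        /lockres: / {
            name = $0
            sub(/.*lockres: /, "", name)
            sub(/,.*/, "", name)
            queue = ""
            print "lockres " name " owner node " value($0, "owner")
        }
        /granted queue:/    { queue = "granted" }
        /converting queue:/ { queue = "converting" }
        /blocked queue:/    { queue = "blocked" }
        /type=[0-9]+/ {
            print queue ": node " value($0, "node") ", " mode(value($0, "type"))
        }
    '
}

# Usage on a live node:
#   dmesg | tail | summarize_dlm_debug
```

For the example debug messages in the procedure, the summary reports that the lock resource is owned by node 1 and that node 3 has a protected-read lock in the granted queue. Entries listed under the converting or blocked queues are requests that are still waiting.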