Working With Crash Dumps

WARNING:

Oracle Linux 7 is now in Extended Support. See Oracle Linux Extended Support and Oracle Open Source Support Policies for more information.

Migrate applications and data to Oracle Linux 8 or Oracle Linux 9 as soon as possible.

This chapter describes how to configure a system to create a memory image in the event of a system crash, and how to use the crash debugger to analyze the memory image in a crash dump or the state of a live system.

About Kdump

Kdump is the Linux kernel crash-dump mechanism. Oracle recommends that you enable the Kdump feature. In the event of a system crash, Kdump creates a memory image (vmcore) that can help in determining the cause of the crash. Enabling Kdump requires you to reserve a portion of system memory for exclusive use by Kdump. This memory is unavailable for other uses.

Kdump uses kexec to boot into a second kernel whenever the system crashes. kexec is a fast-boot mechanism that enables a Linux kernel to boot from inside the context of a kernel that is already running without passing through the bootloader stage.

Configuring and Using Kdump

During installation, you are given the option of enabling Kdump and specifying the amount of memory to reserve for it. If you prefer, you can enable Kdump later, as described in this section.

If the kexec-tools and system-config-kdump packages are not already installed on your system, use yum to install them.

To enable Kdump by using the Kernel Dump Configuration GUI:

Enter the following command:
```
sudo system-config-kdump
```
The Kernel Dump Configuration GUI starts. If Kdump is currently disabled, the green Enable button is selectable and the Disable button is greyed out.
Click Enable to enable Kdump.
You can select the following settings tabs to adjust the configuration of Kdump.

Basic Settings

Specify the amount of memory to reserve for Kdump. The default setting is 128 MB.

Target Settings

Specify the target location for the vmcore dump file: a locally accessible file system, a raw disk device, or a remote directory that is accessed by using NFS or SSH over IPv4. The default location is /var/crash.

You cannot save a dump file on an eCryptfs file system, on remote directories that are NFS mounted on the rootfs file system, or on remote directories that require IPv6, SMB, CIFS, FCoE, wireless NICs, multipathed storage, or iSCSI over software initiators to access them.

Filtering Settings

Specify the type of data to include in or exclude from the dump file. Selecting or deselecting the options alters the value of the argument that Kdump specifies to the -d option of the core collector program, makedumpfile.
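To illustrate how these options map to the -d argument: makedumpfile(8) documents the dump level as a bitmask in which each bit excludes one type of page (1 = zero pages, 2 = cache pages without private data, 4 = cache pages with private data, 8 = user data pages, 16 = free pages). The following minimal sketch ORs together the bits for the page types that you name. The function name and the page-type keywords are hypothetical labels chosen for this example.

```shell
# Sketch: compute a makedumpfile -d value from the page types to exclude.
# Bit values follow makedumpfile(8); the keywords are hypothetical labels.
dump_level() (
    level=0
    for type in "$@"; do
        case "$type" in
            zero)          bit=1 ;;   # pages filled with zeros
            cache)         bit=2 ;;   # cache pages without private data
            cache_private) bit=4 ;;   # cache pages with private data
            user)          bit=8 ;;   # user process data pages
            free)          bit=16 ;;  # free pages
            *) echo "unknown page type: $type" >&2; exit 1 ;;
        esac
        level=$(( level | bit ))
    done
    echo "$level"
)

# Excluding zero pages and free pages gives a dump level of 17;
# excluding all five page types gives 31.
dump_level zero free
dump_level zero cache cache_private user free
```

Because each page type is a separate bit, naming the same type twice does not change the result.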

Expert Settings

Choose which kernel to use, edit the command-line options that are passed to the kernel, choose the default action if the dump fails, and modify the options that are passed to the core collector program, makedumpfile.

The Unbreakable Enterprise Kernel supports the use of the crashkernel=auto setting for UEK Release 3 Quarterly Update 1 and later. If you use the crashkernel=auto setting, the output of the dmesg command shows crashkernel=XM@0M, which is normal. The setting actually reserves 128 MB plus 64 MB for each terabyte of physical memory.
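As a rough illustration of the crashkernel=auto rule, the following sketch estimates the reservation from the amount of physical memory, given in GB. The function name is hypothetical, and the sketch assumes that only whole terabytes count toward the additional 64 MB; this section does not say how the kernel rounds a partial terabyte.

```shell
# Sketch: estimate the crashkernel=auto reservation (in MB) on UEK R3 QU1
# and later, using "128 MB plus 64 MB for each terabyte of physical memory".
# Argument: physical memory in GB (1 TB = 1024 GB).
# ASSUMPTION: partial terabytes are ignored (integer division).
crashkernel_auto_mb() (
    mem_gb=$1
    tb=$(( mem_gb / 1024 ))
    echo $(( 128 + 64 * tb ))
)

crashkernel_auto_mb 64     # a 64 GB system
crashkernel_auto_mb 2048   # a 2 TB system
```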

Note:

You cannot configure crashkernel=auto for Xen or for the UEK prior to UEK Release 3 Quarterly Update 1. Only standard settings such as crashkernel=128M@48M are supported. For systems with more than 128 GB of memory, the recommended setting is crashkernel=512M@64M.

You can select one of five default actions should the dump fail:

mount rootfs and run /sbin/init

Mount the root file system and run init. The /etc/init.d/kdump script attempts to save the dump to /var/crash, which requires a large amount of memory to be reserved.

reboot

Reboot the system, losing the vmcore. This is the default action.

shell

Enter a shell session inside the initramfs so that you can attempt to record the core. To reboot the system, exit the shell.

halt

Halt the system.

poweroff

Power down the system.

Click Help for more information on these settings.
Click Apply to save your changes. The GUI displays a popup message to remind you that you must reboot the system for the changes to take effect.
Click OK to dismiss the popup messages.
Select File > Quit.
Reboot the system at a suitable time.

Files Used by Kdump

The Kernel Dump Configuration GUI modifies the following files:

File	Description
`/boot/grub2/grub.cfg`	Appends the `crashkernel` option to the kernel line to specify the amount of reserved memory and any offset value.
`/etc/kdump.conf`	Sets the location where the dump file can be written, the filtering level for the `makedumpfile` command, and the default behavior to take if the dump fails. See the comments in the file for information about the supported parameters.

If you edit these files, you must reboot the system for the changes to take effect.
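After the reboot, you can confirm that the crashkernel option is in effect by checking the running kernel's command line in /proc/cmdline. The following minimal sketch prints the value of the crashkernel= parameter; the function name is hypothetical, and it accepts an explicit command-line string so that it can be tried without rebooting.

```shell
# Sketch: print the value of crashkernel= from a kernel command line.
# With no argument, read the running kernel's /proc/cmdline.
crashkernel_param() (
    set -f                          # do not glob-expand command-line words
    cmdline=${1:-$(cat /proc/cmdline)}
    for word in $cmdline; do
        case "$word" in
            crashkernel=*) echo "${word#crashkernel=}"; exit 0 ;;
        esac
    done
    echo "crashkernel= is not set" >&2
    exit 1
)

crashkernel_param "BOOT_IMAGE=/vmlinuz-4.1.12-112.14.15.el7uek.x86_64 ro crashkernel=auto rhgb quiet"
```

On a configured system, running crashkernel_param with no argument should print the same value that the Kernel Dump Configuration GUI wrote to /boot/grub2/grub.cfg.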

For more information, see the kdump.conf(5) manual page.

Using Kdump with OCFS2

By default, a fenced node in an OCFS2 cluster restarts instead of panicking so that it can quickly rejoin the cluster. If the reason for the restart is not apparent, you can change the node's behavior so that it panics and generates a vmcore for analysis.

To configure a node to panic when it next fences, run the following command on the node after the cluster starts:

echo panic | sudo tee /sys/kernel/config/cluster/cluster_name/fence_method

In the previous command, cluster_name is the name of the cluster. To set the value after each reboot of the system, add this line to /etc/rc.local. On Oracle Linux 7, /etc/rc.d/rc.local must be executable (chmod +x /etc/rc.d/rc.local) for it to run at boot. To restore the default behavior, set the value of fence_method to reset instead of panic and remove the line from /etc/rc.local.
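The following sketch wraps the same configfs write in a function that rejects values other than panic and reset, and that fails if the cluster's fence_method file does not exist (for example, because the cluster has not started yet). The function name, the optional third argument for an alternative configfs root, and the cluster name mycluster are hypothetical and used only for illustration.

```shell
# Sketch: set the OCFS2 fence method for a cluster (must run as root).
# Arguments: cluster name, method (panic or reset), and an optional,
# hypothetical configfs root that defaults to /sys/kernel/config.
set_fence_method() (
    cluster=$1
    method=$2
    root=${3:-/sys/kernel/config}
    case "$method" in
        panic|reset) ;;
        *) echo "method must be panic or reset" >&2; exit 1 ;;
    esac
    file="$root/cluster/$cluster/fence_method"
    [ -e "$file" ] || { echo "$file not found; is the cluster running?" >&2; exit 1; }
    echo "$method" > "$file"
)

# Typical use from /etc/rc.local for a hypothetical cluster named mycluster:
#   set_fence_method mycluster panic
```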

For more information, see the Oracle Cluster File System Version 2 chapter in Oracle Linux 7: Managing File Systems.

Using the crash Debugger

The crash command enables you to analyze the state of an Oracle Linux system while it is running, or the state of a core dump that results from a kernel crash. The crash utility incorporates the GNU Debugger (gdb) to provide source-level debugging capabilities.

Installing the crash Packages

To use the crash command, you must install the crash package and the appropriate debuginfo and debuginfo-common packages.

To install the required packages:

Install the latest version of the crash package:
```
sudo yum install crash
```
Download the appropriate debuginfo and debuginfo-common packages for the vmcore or kernel that you want to examine from https://oss.oracle.com/ol7/debuginfo/:
- If you want to examine the Unbreakable Enterprise Kernel that is running on the system, use commands similar to the following to download the packages:
```
export DLP="https://oss.oracle.com/ol7/debuginfo"
wget "${DLP}/kernel-uek-debuginfo-$(uname -r).rpm"
wget "${DLP}/kernel-uek-debuginfo-common-$(uname -r).rpm"
```
- If you want to examine the Red Hat Compatible Kernel (RHCK) that is running on the system, use commands similar to the following to download the packages:
```
export DLP="https://oss.oracle.com/ol7/debuginfo"
wget "${DLP}/kernel-debuginfo-$(uname -r).rpm"
wget "${DLP}/kernel-debuginfo-common-$(uname -r).rpm"
```
- If you want to examine a vmcore file that relates to a kernel that is different from the currently running kernel, download the appropriate debuginfo and debuginfo-common packages for the kernel that produced the vmcore, for example:
```
export DLP="https://oss.oracle.com/ol7/debuginfo"
wget "${DLP}/kernel-uek-debuginfo-4.1.12-112.14.15.el7uek.x86_64.rpm"
wget "${DLP}/kernel-uek-debuginfo-common-4.1.12-112.14.15.el7uek.x86_64.rpm"
```
  Note:
  
  If the vmcore file was produced by Kdump, you can use the following command to determine the version:
```
sudo crash --osrelease /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
```
```
2.6.39-200.24.1.el6uek.x86_64
```
Install the debuginfo and debuginfo-common packages.
```
sudo rpm -Uhv kernel-uek-debuginfo-4.1.12-112.14.15.el7uek.x86_64.rpm \
  kernel-uek-debuginfo-common-4.1.12-112.14.15.el7uek.x86_64.rpm
```
The vmlinux kernel object file, also known as the namelist file, that the crash command requires is installed in /usr/lib/debug/lib/modules/kernel_version/.
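The download commands in the previous steps all follow one file-name pattern, so the URLs can be derived from a kernel release string. The following sketch prints both URLs for a given release, or for the running kernel if no release is given. It assumes that a release string containing "uek" is a UEK kernel and that any other release is RHCK; the function name is hypothetical, and if a file is not found, check the directory listing at https://oss.oracle.com/ol7/debuginfo/ for the exact package name.

```shell
# Sketch: print the debuginfo and debuginfo-common URLs for a kernel release,
# following the file-name pattern used in the download commands above.
# ASSUMPTION: a release containing "uek" is UEK; anything else is RHCK.
debuginfo_urls() (
    release=${1:-$(uname -r)}
    base=https://oss.oracle.com/ol7/debuginfo
    case "$release" in
        *uek*) pkg=kernel-uek ;;
        *)     pkg=kernel ;;
    esac
    echo "$base/$pkg-debuginfo-$release.rpm"
    echo "$base/$pkg-debuginfo-common-$release.rpm"
)

debuginfo_urls 4.1.12-112.14.15.el7uek.x86_64

# To download both packages:
#   debuginfo_urls 4.1.12-112.14.15.el7uek.x86_64 | xargs -n 1 wget
```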

Running crash

Attention:

Running the crash command on a live system can cause data corruption or total system failure. Do not use the command to examine a production system, unless directed to do so by Oracle Support.

To examine the currently running kernel, run the following command:

sudo crash

To determine the version of the kernel that produced a vmcore file:

sudo crash --osrelease /var/tmp/vmcore/2013-0211-2358.45-host03.28.core

2.6.39-200.24.1.el6uek.x86_64

To examine a vmcore file, specify the path to the file as an argument, for example:

sudo crash /var/tmp/vmcore/2013-0211-2358.45-host03.28.core

Note:

The appropriate vmlinux file must exist in /usr/lib/debug/lib/modules/kernel_version/.

If the vmlinux file is located elsewhere, specify its path first, followed by the path to the vmcore file, for example:

sudo crash /var/tmp/namelist/vmlinux-host03.28 /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
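Before starting a session, it can help to confirm that the matching vmlinux file is installed. The following sketch prints the expected path under /usr/lib/debug/lib/modules/kernel_version/ for a given release and fails with a message if the file is missing. The function name and its optional debug-root argument are hypothetical; the release string would typically come from crash --osrelease, as shown earlier.

```shell
# Sketch: print the vmlinux path for a kernel release, or fail if the
# matching debuginfo packages are not installed.
# The optional second argument (a hypothetical debug root) defaults to /usr/lib/debug.
vmlinux_for() (
    release=$1
    root=${2:-/usr/lib/debug}
    vmlinux="$root/lib/modules/$release/vmlinux"
    if [ -f "$vmlinux" ]; then
        echo "$vmlinux"
    else
        echo "missing $vmlinux; install the matching debuginfo packages" >&2
        exit 1
    fi
)

# Example session (requires root and the matching debuginfo packages):
#   vmcore=/var/tmp/vmcore/2013-0211-2358.45-host03.28.core
#   release=$(sudo crash --osrelease "$vmcore")
#   sudo crash "$(vmlinux_for "$release")" "$vmcore"
```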

For example, the following crash output is from a vmcore file that was dumped after a system panic:

      KERNEL: /usr/lib/debug/lib/modules/2.6.39-200.24.1.el6uek.x86_64/vmlinux
    DUMPFILE: /var/tmp/vmcore/2013-0211-2358.45-host03.28.core
        CPUS: 2
        DATE: Fri Feb 11 16:55:41 2013
      UPTIME: 04:24:54
LOAD AVERAGE: 0.00, 0.01, 0.05
       TASKS: 84
    NODENAME: host03.mydom.com
     RELEASE: 2.6.39-200.24.1.el6uek.x86_64
     VERSION: #1 SMP Sat Jun 23 02:39:07 EDT 2012
     MACHINE: x86_64  (2992 MHz)
      MEMORY: 2 GB
       PANIC: "Oops: 0002" (check log for details)
         PID: 1696
     COMMAND: "insmod"
        TASK: c74de000
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash>

In the previous example, the output includes the following information:

Number of CPUs
Load average over the last 1 minute, 5 minutes, and 15 minutes
Number of tasks
Amount of memory
Panic string
Command that was executing at the time the dump was created

In the example, an attempt by insmod to load a kernel module resulted in an oops.

At the crash> prompt, you can type help or ? to display the available crash commands. Type help command to display more information for a specified command.

The crash commands fall into the following groups, according to purpose:

Kernel data structure analysis commands: Display kernel text and data structures. See Kernel Data Structure Analysis Commands.
System state commands: Examine kernel subsystems on a system-wide or a per-task basis. See System State Commands.
Helper commands: Perform calculation, translation, and search functions. See Helper Commands.
Session control commands: Control the crash session. See Session Control Commands.

For more information, see the crash(8) manual page.

Kernel Data Structure Analysis Commands

The following crash commands take advantage of gdb integration to display kernel data structures symbolically:

*

The pointer-to command can be used instead of the struct or union command. crash determines whether the specified type is a structure or a union and runs the appropriate command, for example:

crash> *buffer_head

struct buffer_head {
    long unsigned int b_state;
    struct buffer_head *b_this_page;
    struct page *b_page;
    sector_t b_blocknr;
    size_t b_size;
    char *b_data;
    struct block_device *b_bdev;
    bh_end_io_t *b_end_io;
    void *b_private;
    struct list_head b_assoc_buffers;
    struct address_space *b_assoc_map;
    atomic_t b_count;
}
SIZE: 104

dis

Disassembles the instructions of a complete kernel function, a specified number of instructions starting at a specified address, or the instructions from the beginning of a function up to a specified address, for example:

crash> dis fixup_irqs

0xffffffff81014486 <fixup_irqs>:        push   %rbp
0xffffffff81014487 <fixup_irqs+1>:      mov    %rsp,%rbp
0xffffffff8101448a <fixup_irqs+4>:      push   %r15
0xffffffff8101448c <fixup_irqs+6>:      push   %r14
0xffffffff8101448e <fixup_irqs+8>:      push   %r13
0xffffffff81014490 <fixup_irqs+10>:     push   %r12
0xffffffff81014492 <fixup_irqs+12>:     push   %rbx
0xffffffff81014493 <fixup_irqs+13>:     sub    $0x18,%rsp
0xffffffff81014497 <fixup_irqs+17>:     nopl   0x0(%rax,%rax,1)
...

p

Displays the contents of a kernel variable, for example:

crash> p init_mm

init_mm = $5 = {
  mmap = 0x0,
  mm_rb = {
    rb_node = 0x0
  },
  mmap_cache = 0x0,
  get_unmapped_area = 0,
  unmap_area = 0,
  mmap_base = 0,
  task_size = 0,
  cached_hole_size = 0,
  free_area_cache = 0,
  pgd = 0xffffffff81001000,
...

struct

Displays either a structure definition, or a formatted display of the contents of a structure at a specified address, for example:

crash> struct cpu

struct cpu {
    int node_id;
    int hotpluggable;
    struct sys_device sysdev;
}
SIZE: 88

sym

Translates a kernel symbol name to a kernel virtual address and section, or a kernel virtual address to a symbol name and section. You can also query (-q) the symbol list for all symbols containing a specified string or list (-l) all kernel symbols, for example:

crash> sym jiffies

ffffffff81b45880 (A) jiffies

crash> sym -q runstate

c590 (d) per_cpu__runstate
c5c0 (d) per_cpu__runstate_snapshot
ffffffff8100e563 (T) xen_setup_runstate_info

crash> sym -l

0 (D) __per_cpu_start
0 (D) per_cpu__irq_stack_union
4000 (D) per_cpu__gdt_page
5000 (d) per_cpu__exception_stacks
b000 (d) per_cpu__idt_desc
b010 (d) per_cpu__xen_cr0_value
b018 (D) per_cpu__xen_vcpu
b020 (D) per_cpu__xen_vcpu_info
b060 (d) per_cpu__mc_buffer
c570 (D) per_cpu__xen_mc_irq_flags
c578 (D) per_cpu__xen_cr3
c580 (D) per_cpu__xen_current_cr3
c590 (d) per_cpu__runstate
c5c0 (d) per_cpu__runstate_snapshot
...

union

Similar to the struct command, but displays kernel data types that are defined as unions instead of structures.

whatis

Displays the definition of structures, unions, typedefs, or text and data symbols, for example:

crash> whatis linux_binfmt

struct linux_binfmt {
    struct list_head lh;
    struct module *module;
    int (*load_binary)(struct linux_binprm *, struct pt_regs *);
    int (*load_shlib)(struct file *);
    int (*core_dump)(long int, struct pt_regs *, struct file *, long unsigned int);
    long unsigned int min_coredump;
    int hasvdso;
}
SIZE: 64

System State Commands

The following commands display the state of kernel subsystems on a system-wide or per-task basis:

bt

Displays a kernel stack trace of the current context or of a specified PID or task. In the case of a dump that followed a kernel panic, the command traces the functions that were called leading up to the panic. For example:

crash> bt

PID: 10651  TASK: d1347000  CPU: 1   COMMAND: "insmod"
 #0 [d1547e44] die at c010785a
 #1 [d1547e54] do_invalid_op at c0107b2c
 #2 [d1547f0c] error_code (via invalid_op) at c01073dc
...

You can use the -l option to display the line number of the source file that corresponds to each function call in a stack trace.

crash> bt -l 1

PID: 1      TASK: ffff88007d032040  CPU: 1   COMMAND: "init"
 #0 [ffff88007d035878] schedule at ffffffff8144fdd4
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/kernel/sched.c: 3091
 #1 [ffff88007d035950] schedule_hrtimeout_range at ffffffff814508e4
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/current.h: 14
 #2 [ffff88007d0359f0] poll_schedule_timeout at ffffffff811297d5
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/include/asm/current.h: 14
 #3 [ffff88007d035a10] do_select at ffffffff81129d72
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/fs/select.c: 500
 #4 [ffff88007d035d80] core_sys_select at ffffffff8112a04c
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/fs/select.c: 575
 #5 [ffff88007d035f10] sys_select at ffffffff8112a326
    /usr/src/debug/kernel-2.6.32/linux-2.6.32.x86_64/fs/select.c: 615
 #6 [ffff88007d035f80] system_call_fastpath at ffffffff81011cf2
    /usr/src/debug////////kernel-2.6.32/linux-2.6.32.x86_64/arch/x86/kernel/entry_64.S: 488
    RIP: 00007fce20a66243  RSP: 00007fff552c1038  RFLAGS: 00000246
    RAX: 0000000000000017  RBX: ffffffff81011cf2  RCX: ffffffffffffffff
    RDX: 00007fff552c10e0  RSI: 00007fff552c1160  RDI: 000000000000000a
    RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000200
    R10: 00007fff552c1060  R11: 0000000000000246  R12: 00007fff552c1160
    R13: 00007fff552c10e0  R14: 00007fff552c1060  R15: 00007fff552c121f
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

bt is probably the most useful crash command. It has a large number of options that you can use to examine a kernel stack trace. For more information, enter help bt.

dev

Displays character and block device data. The -d and -i options display disk I/O statistics and I/O port usage, respectively. For example:

crash> dev

CHRDEV    NAME                 CDEV        OPERATIONS
   1      mem            ffff88007d2a66c0  memory_fops
   4      /dev/vc/0      ffffffff821f6e30  console_fops
   4      tty            ffff88007a395008  tty_fops
   4      ttyS           ffff88007a3d3808  tty_fops
   5      /dev/tty       ffffffff821f48c0  tty_fops
...
BLKDEV    NAME                GENDISK      OPERATIONS
   1      ramdisk        ffff88007a3de800  brd_fops
 259      blkext              (none)
   7      loop           ffff880037809800  lo_fops
   8      sd             ffff8800378e9800  sd_fops
   9      md                  (none)
...

crash> dev -d

MAJOR GENDISK            NAME       REQUEST QUEUE      TOTAL ASYNC  SYNC   DRV
    8 0xffff8800378e9800 sda        0xffff880037b513e0    10     0    10     0
   11 0xffff880037cde400 sr0        0xffff880037b50b10     0     0     0     0
  253 0xffff880037902c00 dm-0       0xffff88003705b420     0     0     0     0
  253 0xffff880037d5f000 dm-1       0xffff88003705ab50     0     0     0     0

crash> dev -i

    RESOURCE        RANGE    NAME
ffffffff81a9e1e0  0000-ffff  PCI IO
ffffffff81a96e30  0000-001f  dma1
ffffffff81a96e68  0020-0021  pic1
ffffffff81a96ea0  0040-0043  timer0
ffffffff81a96ed8  0050-0053  timer1
ffffffff81a96f10  0060-0060  keyboard
...

files

Displays information about files that are open in the current context or in the context of a specific PID or task. For example:

crash> files 12916

PID: 12916  TASK: ffff8800276a2480  CPU: 0   COMMAND: "firefox"
ROOT: /    CWD: /home/guest
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff88001c57ab00 ffff88007ac399c0 ffff8800378b1b68 CHR  /null
  1 ffff88007b315cc0 ffff88006046f800 ffff8800604464f0 REG  /home/guest/.xsession-errors
  2 ffff88007b315cc0 ffff88006046f800 ffff8800604464f0 REG  /home/guest/.xsession-errors
  3 ffff88001c571a40 ffff88001d605980 ffff88001be45cd0 REG  /home/guest/.mozilla/firefox
  4 ffff88003faa7300 ffff880063d83440 ffff88001c315bc8 SOCK
  5 ffff88003f8f6a40 ffff88007b41f080 ffff88007aef0a48 FIFO
...

fuser

Displays the tasks that reference a specified file name or inode address as their root directory, their current working directory, or an open file descriptor, or that memory-map the file. For example:

crash> fuser /home/guest

 PID         TASK        COMM             USAGE
 2990  ffff88007a2a8440  "gnome-session"  cwd
 3116  ffff8800372e6380  "gnome-session"  cwd
 3142  ffff88007c54e540  "metacity"       cwd
 3147  ffff88007aa1e440  "gnome-panel"    cwd
 3162  ffff88007a2d04c0  "nautilus"       cwd
 3185  ffff88007c00a140  "bluetooth-appl  cwd
...

irq

Displays interrupt request queue data. For example:

crash> irq 0

    IRQ: 0
 STATUS: 400000 ()
HANDLER: ffffffff81b3da30            <ioapic_chip>
         typename: ffffffff815cdaef  "IO-APIC"
          startup: ffffffff8102a513  <startup_ioapic_irq>
         shutdown: ffffffff810aef92  <default_shutdown>
           enable: ffffffff810aefe3  <default_enable>
          disable: ffffffff810aeecc  <default_disable>
              ack: ffffffff8102a43d  <ack_apic_edge>
             mask: ffffffff81029be1  <mask_IO_APIC_irq>
...

kmem

Displays the state of the kernel memory subsystems. For example:

crash> kmem -i

              PAGES        TOTAL      PERCENTAGE
 TOTAL MEM   512658         2 GB         ----
      FREE    20867      81.5 MB    4% of TOTAL MEM
      USED   491791       1.9 GB   95% of TOTAL MEM
    SHARED   176201     688.3 MB   34% of TOTAL MEM
   BUFFERS     8375      32.7 MB    1% of TOTAL MEM
    CACHED   229933     898.2 MB   44% of TOTAL MEM
      SLAB    39551     154.5 MB    7% of TOTAL MEM

TOTAL SWAP  1032190       3.9 GB         ----
 SWAP USED     2067       8.1 MB    0% of TOTAL SWAP
 SWAP FREE  1030123       3.9 GB   99% of TOTAL SWAP

kmem has a large number of options. For more information, enter help kmem.

log

Displays the kernel message buffer in chronological order. This is the same data that dmesg displays, but the output can include messages that never reached syslog or the disk.

mach

Displays machine-specific information such as the cpuinfo structure and the physical memory map.

mod

Displays information about the currently installed kernel modules. The -s and -S options load debug data (if available) from the specified module object files to enable symbolic debugging.

mount

Displays information about currently mounted file systems.

net

Displays network-related information.

ps

Displays information about processes. For example:

crash> ps Xorg crash bash

   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
   2679   2677   0  ffff88007cbcc400  IN   4.0  215488  84880  Xorg
> 13362  11853   0  ffff88007b25a500  RU   6.9  277632 145612  crash
   3685   3683   1  ffff880058714580  IN   0.1  108464   1984  bash
  11853  11845   1  ffff88001c6826c0  IN   0.1  108464   1896  bash

pte

Translates a page table entry (PTE) to the physical page address and page bit settings. If the PTE refers to a swap location, the command displays the swap device and offset.

runq

Displays the list of tasks that are on the run queue of each CPU.

sig

Displays signal-handling information for the current context or for a specified PID or task.

swap

Displays information about the configured swap devices.

task

Displays the contents of the task_struct for the current context or for a specified PID or task.

timer

Displays the entries in the timer queue in chronological order.

vm

Displays virtual memory data, including the addresses of mm_struct and the page directory, resident set size, and total virtual memory size for the current context or for a specified PID or task.

vtop

Translates a user or kernel virtual address to a physical address. The command also displays the PTE translation, vm_area_struct data for user virtual addresses, mem_map page data for a physical page, and the swap location or file location if the page is not mapped.

waitq

Displays tasks that are blocked on a specified wait queue.

Helper Commands

The following commands perform calculation, translation, and search functions:

ascii

Translates a hexadecimal value to ASCII. With no argument, the command displays an ASCII chart.

btop

Translates a hexadecimal address to a page number.

eval

Evaluates an expression and displays the result in hexadecimal, decimal, octal, and binary. For example:

crash> eval 4g / 0x100

hexadecimal: 1000000  (16MB)
    decimal: 16777216
      octal: 100000000
     binary: 0000000000000000000000000000000000000001000000000000000000000000
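To see what eval computed in this example, the same arithmetic can be reproduced with shell arithmetic expansion: the 4g suffix means 4 × 1024³ bytes, and 0x100 is 256. This sketch assumes a shell with 64-bit arithmetic, as is usual on 64-bit Linux, because the intermediate value exceeds 32 bits.

```shell
# Reproduce "eval 4g / 0x100": 4g is 4 * 1024^3 bytes, 0x100 is 256.
# ASSUMPTION: the shell uses 64-bit integers for arithmetic expansion.
result=$(( 4 * 1024 * 1024 * 1024 / 0x100 ))
printf 'hexadecimal: %x\n' "$result"
printf '    decimal: %d\n' "$result"
printf '      octal: %o\n' "$result"
```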

list

Displays the contents of a linked list of data objects, typically structures, starting at a specified address.

ptob

Translates a page number to its physical address (byte value).
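The btop and ptob translations are inverse shifts by the page size. The following sketch shows the arithmetic, assuming 4 KiB pages (a page shift of 12, as on x86_64); the function names are hypothetical, and crash itself uses the page size of the kernel being examined.

```shell
# Sketch of the arithmetic behind btop and ptob.
# ASSUMPTION: 4 KiB pages, so the page shift is 12.
addr_to_page() ( printf '%x\n' $(( $1 >> 12 )) )   # like btop: address -> page number
page_to_addr() ( printf '%x\n' $(( $1 << 12 )) )   # like ptob: page number -> address

addr_to_page 0x1000fff   # any address within page 0x1000
page_to_addr 0x1000      # start address of page 0x1000
```

Because the right shift discards the offset within the page, converting an address to a page number and back returns the start of that page rather than the original address.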

ptov

Translates a physical address to a kernel virtual address.

search

Searches for a specified value in a specified range of user virtual memory, kernel virtual memory, or physical memory.

rd

Displays a selected range of user virtual memory, kernel virtual memory, or physical memory using the specified format.

wr

Writes a value to a memory location specified by symbol or address.

Attention:

To avoid data loss or data corruption, take great care when using the wr command.

Session Control Commands

The following commands control the crash session:

alias: Defines an alias for a command. With no argument, the command displays the current list of aliases.
exit, q, or quit: Ends the crash session.
extend: Loads or unloads the specified crash extension shared object libraries.
foreach: Executes the bt, files, net, task, set, sig, vm, or vtop command on multiple tasks.
gdb: Passes any arguments to the GNU Debugger for processing.
repeat: Repeats a command indefinitely until you type Ctrl-C. This command is only useful when you use crash to examine a live system.
set: Sets the context to a specified PID or task. With no argument, the command displays the current context.

Guidelines for Examining a Dump File

The steps for debugging a memory dump from a kernel crash vary widely, according to the problem.

The following are guidelines for some basic investigations that you can try:

Use bt to trace the functions that led to the kernel panic.
Use bt -a to trace the active task on each CPU. There is often a relationship between the panicking task on one CPU and the tasks that are running on the other CPUs. If the listed command is swapper, or the trace shows cpu_idle, the CPU was idle and no task was running on it.
Use bt -l to display the line number of the source files corresponding to each function call in the stack trace.
Use kmem -i to obtain a summary of memory and swap usage. Look for a SLAB value greater than 500 MB or a SWAP USED value greater than 0%, either of which warrants further investigation.
Use ps | grep UN to check for processes in the TASK_UNINTERRUPTIBLE state (D state), which are usually waiting on I/O. Such processes contribute to the load average and cannot be killed.
Use files to display the files that a process had open.

You can use shell redirection operators to save the output of a command to a file for later analysis, or pipe the output through commands such as grep, as shown in the following example:

crash> foreach files > files.txt

crash> foreach bt | grep bash

PID: 3685   TASK: ffff880058714580  CPU: 1   COMMAND: "bash"
PID: 11853  TASK: ffff88001c6826c0  CPU: 0   COMMAND: "bash"
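Output that is saved in this way can also be checked outside crash. For example, after saving the memory summary with kmem -i > kmem.txt, the following sketch flags the two kmem -i warning signs listed in the guidelines: a SLAB value greater than 500 MB, or a SWAP USED percentage greater than 0%. It assumes the column layout of the kmem -i example in this chapter (label, pages, size, unit, percentage); the function name is hypothetical, and it prints nothing if neither threshold is exceeded.

```shell
# Sketch: flag kmem -i warning signs in saved output (file arguments or stdin).
# ASSUMPTION: rows look like "SLAB  39551  154.5 MB  7% of TOTAL MEM" and
# "SWAP USED  2067  8.1 MB  0% of TOTAL SWAP", as in this chapter's example.
check_kmem() {
    awk '
        function to_mb(size, unit) {
            if (unit == "KB") return size / 1024
            if (unit == "GB") return size * 1024
            if (unit == "TB") return size * 1024 * 1024
            return size + 0
        }
        $1 == "SLAB" && to_mb($3, $4) > 500 {
            print "SLAB is " $3 " " $4 " (over 500 MB)"
        }
        $1 == "SWAP" && $2 == "USED" && $6 + 0 > 0 {
            print "SWAP USED is " $4 " " $5 " (" $6 " of total swap)"
        }
    ' "$@"
}

# Usage with output saved from a crash session:
#   check_kmem kmem.txt
```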