Parameters That Control Kernel Panics

The following parameters control the circumstances under which a kernel panic can occur.

Parameter Description
kernel.hung_task_panic

If set to 1, the kernel panics if any kernel or user thread sleeps in the TASK_UNINTERRUPTIBLE state (D state) for more than kernel.hung_task_timeout_secs seconds. A process remains in D state while waiting for I/O to complete. You can't stop or interrupt a process in this state.

The default value is 0, which disables the panic.

Tip:

To diagnose a hung thread, you can examine /proc/PID/stack, which displays the kernel stack for both kernel and user threads.

kernel.hung_task_timeout_secs

Specifies how long a user or kernel thread can remain in D state before a warning message is generated or the kernel panics, if the value of kernel.hung_task_panic is 1. The default value is 120 seconds. A value of 0 disables the timeout.

kernel.nmi_watchdog

If set to 1 (default), enables the nonmaskable interrupt (NMI) watchdog thread in the kernel. To use the NMI switch or the OProfile system profiler to generate an undefined NMI, set the value of kernel.nmi_watchdog to 0.

kernel.panic

Specifies the number of seconds after a panic before a system automatically resets itself.

If the value is 0, which is the default value, the system becomes suspended, and you can collect detailed information about the panic for troubleshooting.

To enable automatic reset, set a nonzero value. If you require a memory image (vmcore), leave enough time for Kdump to create this image. The suggested value is 30 seconds, although large systems require a longer time.

kernel.panic_on_io_nmi

If set to 0 (default), the system tries to continue operations if the kernel detects an I/O channel check (IOCHK) NMI that typically indicates a uncorrectable hardware error. If set to 1, the system panics.

kernel.panic_on_oops

If set to 0, the system tries to continue operations if the kernel detects an oops or BUG condition. If set to 1 (default), the system delays a few seconds to give the kernel log daemon, klogd, time to record the oops output before the panic occurs.

In an OCFS2 cluster. set the value to 1 to specify that a system must panic if a kernel oops occurs. If a kernel thread required for cluster operation fails, the system must reset itself. Otherwise, another node might not detect whether a node is slow to respond or unable to respond, causing cluster operations to halt.

kernel.panic_on_unrecovered_nmi

If set to 0 (default), the system tries to continue operations if the kernel detects an NMI that might indicate an uncorrectable parity or ECC memory error. If set to 1, the system panics.

kernel.softlockup_panic

If set to 0 (default), the system tries to continue operations if the kernel detects a soft-lockup error that causes the NMI watchdog thread to fail to update its timestamp for more than twice the value of kernel.watchdog_thresh seconds. If set to 1, the system panics.

kernel.unknown_nmi_panic

If set to 1, the system panics if the kernel detects an undefined NMI. You can generate an undefined NMI by manually pressing an NMI switch. As the NMI watchdog thread also uses the undefined NMI, set the value of kernel.unknown_nmi_panic to 0 if you set kernel.nmi_watchdog to 1.

kernel.watchdog_thresh

Specifies the interval between generating an NMI performance monitoring interrupt that the kernel uses to check for hard-lockup and soft-lockup errors. A hard-lockup error is assumed if a CPU is unresponsive to the interrupt for more than kernel.watchdog_thresh seconds. The default value is 10 seconds. A value of 0 disables the detection of lockup errors.

vm.panic_on_oom

If set to 0 (default), the kernel’s OOM-killer scans through the entire task list and stops a memory-hogging process to avoid a panic. If set to 1, the kernel panics but can survive under certain conditions. If a process limits allocations to certain nodes by using memory policies or cpusets, and those nodes reach memory exhaustion status, the OOM-killer can stop one process. No panic occurs in this case because other nodes’ memory might be free and the system as a whole might not yet be out of memory. If set to 2, the kernel always panics when an OOM condition occurs. Settings of 1 and 2 are for intended for use with clusters, depending on the defined failover policy.