Red Hat Enterprise Linux 8.8 system hang

The operating system on one of our servers hung and stopped responding. The system log contained the following:

Nov 10 06:51:01 localhost kernel: Tainted: G OE --------- - - 4.18.0-477.10.1.el8_8.x86_64 #1
Nov 10 06:51:01 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 10 06:51:01 localhost kernel: task:filebeat state:D stack: 0 pid:13205 ppid: 1 flags:0x00000080
Nov 10 06:51:01 localhost kernel: Call Trace:
Nov 10 06:51:01 localhost kernel: __schedule+0x2d1/0x870
Nov 10 06:51:01 localhost kernel: ? call_function_interrupt+0xa/0x20
Nov 10 06:51:01 localhost kernel: schedule+0x55/0xf0
Nov 10 06:51:01 localhost kernel: io_schedule+0x12/0x40
Nov 10 06:51:01 localhost kernel: migration_entry_wait_on_locked+0x1ea/0x290
Nov 10 06:51:01 localhost kernel: ? filemap_fdatawait_keep_errors+0x50/0x50
Nov 10 06:51:01 localhost kernel: do_swap_page+0x5b0/0x710
Nov 10 06:51:01 localhost kernel: ? pmd_devmap_trans_unstable+0x2e/0x40
Nov 10 06:51:01 localhost kernel: ? handle_pte_fault+0x5d/0x880
Nov 10 06:51:01 localhost kernel: __handle_mm_fault+0x453/0x6c0
Nov 10 06:51:01 localhost kernel: handle_mm_fault+0xca/0x2a0
Nov 10 06:51:01 localhost kernel: __do_page_fault+0x1f0/0x450
Nov 10 06:51:01 localhost kernel: do_page_fault+0x37/0x130
Nov 10 06:51:01 localhost kernel: ? page_fault+0x8/0x30
Nov 10 06:51:01 localhost kernel: page_fault+0x1e/0x30
Nov 10 06:51:01 localhost kernel: RIP: 0033:0x15c297e
Nov 10 06:51:01 localhost kernel: Code: Unable to access opcode bytes at RIP 0x15c2954.
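
To check other hosts for the same symptom, the kernel log can be scanned for a task in state D whose call trace passes through migration_entry_wait_on_locked. The following is a minimal sketch that assumes the log format shown above. The function name find_hung_migration_tasks and the shortened sample lines are illustrative and not part of any existing tool.

```python
import re

# Matches the per-task header line printed with a hung task report,
# e.g. "kernel: task:filebeat state:D stack: 0 pid:13205 ...".
TASK_RE = re.compile(r"kernel: task:(\S+)\s+state:(\S)\s.*?pid:\s*(\d+)")
SIGNATURE = "migration_entry_wait_on_locked"


def find_hung_migration_tasks(lines):
    """Return (task, pid) pairs whose call trace contains SIGNATURE."""
    hits = []
    current = None  # (task, pid) of the trace currently being read
    for line in lines:
        m = TASK_RE.search(line)
        if m:
            # Only tasks in uninterruptible sleep (state D) are of interest.
            current = (m.group(1), int(m.group(3))) if m.group(2) == "D" else None
            continue
        if current and SIGNATURE in line and current not in hits:
            hits.append(current)
    return hits


# Shortened copy of the log above, used as sample input.
sample = [
    "Nov 10 06:51:01 localhost kernel: task:filebeat state:D stack: 0 pid:13205 ppid: 1 flags:0x00000080",
    "Nov 10 06:51:01 localhost kernel: Call Trace:",
    "Nov 10 06:51:01 localhost kernel: io_schedule+0x12/0x40",
    "Nov 10 06:51:01 localhost kernel: migration_entry_wait_on_locked+0x1ea/0x290",
    "Nov 10 06:51:01 localhost kernel: do_swap_page+0x5b0/0x710",
]
print(find_hung_migration_tasks(sample))
```

On a real host the input would be the lines of /var/log/messages or the output of journalctl -k rather than the sample list.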

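Cause analysis:

While handling a page fault, the filebeat process blocked in the swap-in / page migration path. It stayed in uninterruptible sleep (state D) waiting on I/O long enough to trigger the hung task detector.

This is not a filebeat bug. Judging from the call trace alone, the candidates are:

- excessive memory pressure
- stalled I/O on swap or the backing storage (disk / SAN / virtual disk)
- a NUMA or other page migration that keeps the page locked, so the waiting task is never scheduled again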

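Tainted: G OE --------- - -

Flag	Meaning
G	No proprietary module has been loaded (all loaded modules have a GPL-compatible license); it does not mean the kernel is free of errors
O	An out-of-tree module (not part of the official kernel) has been loaded
E	An unsigned module has been loaded; it does not indicate a driver or hardware error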
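
The same flags are exposed as a bitmask in /proc/sys/kernel/tainted. The sketch below decodes only the three flags discussed here. The bit positions (0 for P/G, 12 for O, 13 for E) follow the kernel's tainted-kernels documentation; the function name decode_taint and the example value 12288 are illustrative assumptions, the latter supposing that O and E are the only bits set, as the log string suggests.

```python
# Subset of the kernel taint bits: (bit number, letter, meaning when set).
TAINT_BITS = [
    (0, "P", "proprietary module loaded"),
    (12, "O", "out-of-tree module loaded"),
    (13, "E", "unsigned module loaded"),
]


def decode_taint(mask):
    """Return (letter, meaning) pairs for the taint bits handled above."""
    result = []
    for bit, letter, meaning in TAINT_BITS:
        if mask & (1 << bit):
            result.append((letter, meaning))
    if not mask & 1:
        # Bit 0 clear is printed as "G": no proprietary module loaded.
        result.insert(0, ("G", "no proprietary module loaded"))
    return result


# 4096 (bit 12) + 8192 (bit 13) = 12288, assumed to match "G OE" in the log.
for letter, meaning in decode_taint(12288):
    print(letter, meaning)
```

On a live system the mask can be read with int(open("/proc/sys/kernel/tainted").read()).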

A Red Hat Knowledgebase article (Solution 7014646) describes this symptom, hung_task_timeout_secs messages with migration_entry_wait_on_locked in the call trace, on RHEL 8.8 and RHEL 8.6 EUS.

Resolution

Red Hat Enterprise Linux 8.8

The issue has been resolved with kernel-4.18.0-477.13.1.el8_8 via errata: RHSA-2023:3349.

# rpm -qp kernel-4.18.0-477.13.1.el8_8.x86_64.rpm --changelog | grep 2188249
- migrate: grab the compound head in migration_entry_wait_on_locked (Nico Pache) [2189629 2188249]
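
Whether a given host still runs an affected RHEL 8.8 kernel can be checked by comparing its kernel release string with the fixed build named above. This is a rough sketch, not a replacement for rpm's own version comparison: it only recognises release strings of the form 4.18.0-<numbers>.el8_8[.<arch>], it does not cover the RHEL 8.6 EUS kernels, and the function names are illustrative.

```python
import re

# Fixed build: kernel-4.18.0-477.13.1.el8_8 (RHSA-2023:3349).
FIXED = (477, 13, 1)


def parse_el8_8_release(uname_r):
    """Return the release numbers of a RHEL 8.8 kernel, or None for other kernels."""
    m = re.fullmatch(r"4\.18\.0-(\d+(?:\.\d+)*)\.el8_8(?:\.\w+)?", uname_r)
    if not m:
        return None
    return tuple(int(part) for part in m.group(1).split("."))


def is_affected(uname_r):
    """True if this is a RHEL 8.8 kernel older than the fixed build."""
    release = parse_el8_8_release(uname_r)
    return release is not None and release < FIXED


# Kernel from the log above, followed by the fixed kernel.
print(is_affected("4.18.0-477.10.1.el8_8.x86_64"))
print(is_affected("4.18.0-477.13.1.el8_8.x86_64"))
```

On a live host the argument would be platform.release(), which returns the same string as uname -r.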

Possible workaround:

Boot the system with an older kernel released before the RHEL8.8 (GA).

Root Cause

There is a regression bug since RHEL8 kernel commit a598e2338f01 ("mm/migrate.c: rework migration_entry_wait() to not take a pageref") was introduced in RHEL-8.8 (GA).
The RHEL-8.8 kernel patch note explains how to resolve the issue:

commit f20af36bf5b7c25b94f73263629c202753d470d7
Author: Nico Pache <npache@redhat.com>
Date:   Mon Apr 24 14:18:16 2023 -0600

    migrate: grab the compound head in migration_entry_wait_on_locked

    Y-Commit 22609d42496e64d42bbb79b6929d2b6c3b47fc2f

    RHEL commit a598e2338f01 ("mm/migrate.c: rework migration_entry_wait() to
    not take a pageref") differs from upstream due to the folio changes. In
    converting the function to work with the page struct I mistakenly forgot
    to make sure we are operating on the compound page head. Without this we
    are occasional splats of hung tasks due to the page never being woken up.

    Upstream-status: RHEL-only
    O-Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2188249
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2189629
    Signed-off-by: Nico Pache <npache@redhat.com>

diff --git a/mm/filemap.c b/mm/filemap.c
index b8fa03e9a685..5b13e47f1fbe 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1401,7 +1401,7 @@ void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep,
 	unsigned long pflags;
 	bool in_thrashing;
 	wait_queue_head_t *q;
-	struct page *page = pfn_swap_entry_to_page(entry);
+	struct page *page = compound_head(pfn_swap_entry_to_page(entry));
 
 	q = page_waitqueue(page);
 	if (!PageUptodate(page) && PageWorkingset(page)) {

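In other words, when a task waits for a page that is under migration, the RHEL 8.8 kernel did not make sure it was operating on the head page of the compound page. The task was queued on the waitqueue of the wrong page, the matching wake-up never arrived, and the hung task detector fired.

The combination of migration_entry_wait_on_locked, do_swap_page and hung_task_timeout_secs in the log is the direct symptom of this bug.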

What does migration_entry_wait_on_locked do?

1️⃣ Typical call path

page_fault
 └─ handle_mm_fault
     └─ do_swap_page
         └─ migration_entry_wait_on_locked
             └─ io_schedule

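2️⃣ What migration_entry_wait_on_locked is responsible for

When:

- the page is being migrated (NUMA balancing / compaction), which do_swap_page sees as a migration entry in the PTE
- or the page is being filled by I/O
- and the page is locked

👉 the current task must sleep and be woken once the migration has completed.

Key point:

Both the wait and the wake-up go through the waitqueue that belongs to the page.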

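The actual bug: the wrong page pointer is used for a compound page

1️⃣ What is a compound page (important)

In RHEL 8.x (before / while folios were being introduced):

- THP (Transparent Huge Page)
- huge pages
- large page cache pages

are all represented as compound pages: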

compound page:
    head page   ← the only valid "control page"
    tail page
    tail page
    ...

📌 Rule:

- the lock (PageLocked)
- the waitqueue
- the wake-up

👉 all happen on the head page only

2️⃣ The old, buggy code

struct page *page = pfn_swap_entry_to_page(entry);

The problem is here 👆

pfn_swap_entry_to_page()
👉 may return a tail page
rather than the compound head

3️⃣ What is the consequence? (fatal)

What the code does next (simplified):

q = page_waitqueue(page);
wait_event(q, ...);

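But if:

- the task sleeps on the tail page's waitqueue
- while the actual wake-up happens on the head page

👉 the result is:

- migration completes ✔
- wake_up on the head page ✔
- but the task is sleeping on the tail page's waitqueue ❌
- so the wake-up never reaches it

This explains what you saw:

task state: D
hung_task_timeout_secs

⚠️ The I/O is not actually slow; the wake-up is lost.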
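
The lost wake-up can be reproduced in a toy model. The sketch below is a deliberate simplification and not kernel code: each page index gets its own list of sleepers, whereas the real page_waitqueue() hashes pages into a shared table and wake-ups are filtered by page. ToyCompoundPage, run and the page numbers are made up for illustration, and compound_head here is only a stand-in for the kernel helper of the same name.

```python
from collections import defaultdict


class ToyCompoundPage:
    """A compound page modelled as consecutive page indices; `head` is the head page."""

    def __init__(self, head, nr_pages):
        self.head = head
        self.pages = range(head, head + nr_pages)


def compound_head(page, compound):
    """Map any page of the compound page to its head (toy stand-in)."""
    return compound.head if page in compound.pages else page


def run(entry_page, compound, use_compound_head):
    """Return True if the sleeping task is woken when migration finishes."""
    waitqueues = defaultdict(list)  # page index -> names of sleeping tasks

    # Fault path: choose the page whose waitqueue the task sleeps on.
    page = compound_head(entry_page, compound) if use_compound_head else entry_page
    waitqueues[page].append("filebeat")

    # Migration completes: only waiters queued on the head page are woken.
    woken = waitqueues.pop(compound.head, [])
    return "filebeat" in woken


thp = ToyCompoundPage(head=512, nr_pages=512)
# The migration entry refers to a tail page (index 515) of the compound page.
print("buggy path woken:", run(515, thp, use_compound_head=False))
print("fixed path woken:", run(515, thp, use_compound_head=True))
```

In this model the task is left asleep only when the entry refers to a tail page and the head is not resolved, which is consistent with the hang being occasional rather than constant.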

How does the commit fix it?

1️⃣ The core fix is a single line (but a critical one)

- struct page *page = pfn_swap_entry_to_page(entry);
+ struct page *page = compound_head(pfn_swap_entry_to_page(entry));

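2️⃣ What this change means

It guarantees that:

whether the swap entry points to
- a head page
- or a tail page
the code always ends up operating on the compound head

which in turn guarantees that:

- the waitqueue is the right one
- wake_up is certain to wake every waiter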
