
An investigation of Linux kernel soft lockups

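1. Basic explanation

Soft lockup: a CPU stays busy in kernel mode for a long time without giving other tasks a chance to run, although it can still respond to interrupts. A soft lockup usually does not crash the system by itself, but it can make the system slow to respond.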

2. Simulating a soft lockup with a driver

The code is as follows:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/spinlock.h>
#include <linux/jiffies.h>
#include <linux/delay.h>

static spinlock_t my_lock;

static int __init softlockup_spinlock_init(void)
{
    printk(KERN_INFO "Simulating soft lockup with spinlock deadlock...\n");

    spin_lock_init(&my_lock);

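    // Take the lock and never release it, to simulate a deadlock
    spin_lock(&my_lock);
    spin_lock(&my_lock);  // Re-acquire a lock this CPU already holds: self-deadlock

    // Never reached: the CPU spins above and cannot run other tasks, which simulates the soft lockup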
    return 0;
}

static void __exit softlockup_spinlock_exit(void)
{
    printk(KERN_INFO "Soft lockup simulation with spinlock exit\n");
}

module_init(softlockup_spinlock_init);
module_exit(softlockup_spinlock_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Qing");
MODULE_DESCRIPTION("Softlockup simulation with spinlock deadlock");

This example triggers a soft lockup through spin_lock. An infinite loop that never yields the CPU produces the same symptom.

3. Error log

[ 1936.124974] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [insmod:9098]
[ 1936.124976] Modules linked in: ......
[ 1936.125036] RIP: 0010:native_queued_spin_lock_slowpath+0x68/0x1f0
[ 1936.125038] Code: 2f 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 07 30 e4 09 d0 a9 00 01 ff ff 75 20 85 c0 75 0c b8 01 00 00 00 66 89 07 5d c3 f3 90 <8b> 07 84 c0 75 f8 b8 01 00 00 00 66 89 07 eb ec f6 c4 01 75 04 c6
[ 1936.125038] RSP: 0018:ffffc09487847c40 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
[ 1936.125039] RAX: 0000000000000101 RBX: 0000000000000000 RCX: 0000000000000006
[ 1936.125039] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffc1569380
[ 1936.125039] RBP: ffffc09487847c40 R08: 0000000000000440 R09: 0000000000000004
[ 1936.125040] R10: 0000000000000000 R11: 0000000000000001 R12: ffffffffc144e000
[ 1936.125040] R13: ffff9ec3a296d540 R14: ffffc09487847e68 R15: ffffffffc1569000
[ 1936.125041] FS:  00007fcccfe7c540(0000) GS:ffff9ec3cc040000(0000) knlGS:0000000000000000
[ 1936.125041] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1936.125042] CR2: 00007fcccfa0c3d0 CR3: 0000000294946003 CR4: 00000000007606e0
[ 1936.125042] PKRU: 55555554
[ 1936.125057] Call Trace:
[ 1936.125061]  _raw_spin_lock+0x1f/0x30
[ 1936.125063]  softlockup_spinlock_init+0x37/0x1000 [softlockup]
[ 1936.125064]  do_one_initcall+0x4a/0x210
[ 1936.125065]  ? _cond_resched+0x19/0x40
[ 1936.125067]  ? kmem_cache_alloc_trace+0x170/0x230
[ 1936.125069]  do_init_module+0x4f/0x20f
[ 1936.125070]  load_module+0x1e77/0x22f0
[ 1936.125071]  __do_sys_finit_module+0xfc/0x120
[ 1936.125072]  ? __do_sys_finit_module+0xfc/0x120
[ 1936.125073]  __x64_sys_finit_module+0x1a/0x20
[ 1936.125074]  do_syscall_64+0x57/0x190
[ 1936.125075]  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
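The first line of the log carries the key facts: which CPU is stuck, for how long, and which task (name and PID) was running. As a minimal userspace sketch, assuming the line format shown above, these fields can be pulled out with sscanf. The struct and function names here are illustrative and not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Fields carried by a "BUG: soft lockup" line (hypothetical struct). */
struct softlockup_report {
    int cpu;            /* number after "CPU#" */
    int stuck_seconds;  /* duration reported before "s!" */
    char comm[32];      /* task name inside the brackets */
    int pid;            /* PID inside the brackets */
};

/* Returns 1 and fills *out if line is a soft lockup report, otherwise 0. */
int parse_softlockup_line(const char *line, struct softlockup_report *out)
{
    /* Skip the timestamp and "watchdog: BUG:" prefix. */
    const char *p = strstr(line, "soft lockup - CPU#");
    if (p == NULL)
        return 0;

    /* %31[^:] reads the task name up to the first ':' (at most 31 chars). */
    int n = sscanf(p, "soft lockup - CPU#%d stuck for %ds! [%31[^:]:%d]",
                   &out->cpu, &out->stuck_seconds, out->comm, &out->pid);
    return n == 4;
}
```

For the log above, this would report CPU 1, 23 seconds, task insmod and PID 9098. Note the simplification: a task name that itself contains a colon would not be parsed correctly by this format string.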

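4. How soft lockups are detected

Soft lockups are detected by the kernel's lockup watchdog, not by the NMI itself. A per-CPU high-resolution timer (hrtimer) fires periodically in interrupt context and checks when the watchdog task on that CPU last got to run. If it has not run for longer than the soft lockup threshold (2 × watchdog_thresh, 20 seconds by default), the kernel prints the "BUG: soft lockup" message. This works because a soft lockup leaves interrupts enabled, so the timer still fires. The NMI watchdog covers the complementary case, the hard lockup: there interrupts are disabled, and only a non-maskable interrupt can still get through.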
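The timestamp comparison behind soft lockup detection can be sketched in userspace C. This is a simplified model, not the kernel's code: it assumes times in whole seconds, a threshold of twice watchdog_thresh, and the default watchdog_thresh of 10. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Assumed default of the kernel.watchdog_thresh sysctl, in seconds. */
#define WATCHDOG_THRESH_DEFAULT 10

/* Soft lockup threshold: twice watchdog_thresh (20 s with the default). */
static uint64_t softlockup_thresh(uint64_t watchdog_thresh)
{
    return 2 * watchdog_thresh;
}

/*
 * Models the check made from the periodic timer callback.
 * last_touch is the time the watchdog task last ran on this CPU.
 * Returns the stall duration in seconds if the threshold is exceeded,
 * otherwise 0.
 */
uint64_t check_soft_lockup(uint64_t now, uint64_t last_touch,
                           uint64_t watchdog_thresh)
{
    if (now > last_touch + softlockup_thresh(watchdog_thresh))
        return now - last_touch;
    return 0;
}
```

Under these assumptions, a CPU whose watchdog task last ran 23 seconds ago is reported as stuck for 23 seconds, which matches the "stuck for 23s" in the log above, while a stall of exactly 20 seconds is not yet reported.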

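5. Generating a kdump dump file (optional)

A soft lockup only leaves the CPU stuck; it does not crash the system. To make the kernel panic when a soft lockup occurs, enable the following system parameter: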

echo 1 > /proc/sys/kernel/softlockup_panic

Combined with kdump, this produces a kernel dump file.
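If the parameter has to be set from a program rather than a shell, the same write can be done in C. The following is a minimal sketch: write_sysctl is a hypothetical helper, not an existing API, and writing to the real path requires root privileges.

```c
#include <stdio.h>

/* The procfs file written by the echo command above. */
#define SOFTLOCKUP_PANIC_PATH "/proc/sys/kernel/softlockup_panic"

/* Writes value to a sysctl-style file; returns 0 on success, -1 on failure. */
int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    /* fclose flushes the buffer, so a failed write can surface here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling write_sysctl(SOFTLOCKUP_PANIC_PATH, "1") as root is intended to have the same effect as the echo command. The path is passed as a parameter so the helper can be tried against an ordinary file first.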
