Debugging the Linux Kernel and Modules with KGDB

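时思美, 3 days ago
Preface

Both setups described here use kernel version 5.10:

  • openEuler: the kernel source was installed with yum install, and debugging runs between two VMware virtual machines
  • Ubuntu: the 5.10.235 source was pulled with git from https://kernel.org/; the physical host is the development machine and a VirtualBox virtual machine is the target machine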
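openEuler 22.03 SP4

Two virtual machines are used:

  • Development machine: runs gdb and connects to the target machine
  • Target machine: builds the kernel with KGDB enabled; this is the machine being debugged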
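Configuring the virtual machines

Development machine

1.webp

Target machine

2.webp

Test: run cat /dev/ttyS0 on the target machine (it blocks), then run echo "hello" > /dev/ttyS0 on the development machine. If hello appears on the target machine, the serial link works.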
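Target machine configuration

Building the kernel

On the target machine, install the kernel source with yum install kernel-source.x86_64; it is placed under /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64. Install the build dependencies:
    # install the build dependencies
    yum install pkg-config ncurses-devel openssl-libs elfutils-libelf-devel dwarves
Enter the source directory and open the configuration menu:
    cd /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64
    make menuconfig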
Configure the kernel as shown in the following screenshots:
3.webp

4.webp

5.webp

6.webp

7.webp

8.webp

9.webp
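
The screenshots are not available as text, so the following is only a sketch of how the resulting .config could be checked from the shell. The option names are an assumption based on what this article says it enables (KGDB, debug info, KASLR off), plus CONFIG_KGDB_SERIAL_CONSOLE (which provides kgdboc) and CONFIG_MAGIC_SYSRQ (needed for echo g later). The function name check_kgdb_config is hypothetical.

```shell
#!/bin/sh
# Sketch: report whether a kernel .config has the options that KGDB over a
# serial console relies on. The option list is an assumption, not taken
# from the article's screenshots.

check_kgdb_config() {
    config_file=$1
    status=0
    # Options expected to be built in (=y).
    for opt in CONFIG_KGDB CONFIG_KGDB_SERIAL_CONSOLE CONFIG_DEBUG_INFO CONFIG_MAGIC_SYSRQ; do
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "ok       ${opt}=y"
        else
            echo "MISSING  ${opt}=y"
            status=1
        fi
    done
    # KASLR may stay compiled in as long as the kernel boots with nokaslr.
    if grep -q '^CONFIG_RANDOMIZE_BASE=y$' "$config_file"; then
        echo "note     CONFIG_RANDOMIZE_BASE=y (boot with nokaslr)"
    else
        echo "ok       CONFIG_RANDOMIZE_BASE is off"
    fi
    return $status
}
```

Run it as check_kgdb_config .config in the source directory after saving the configuration; a non-zero exit status means at least one of the listed options is not set to y.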

Troubleshooting

  • On openEuler, the install step may fail with: dracut-install: Failed to find module 'virtio_gpu' dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.MlDs2I/initramfs --kerneldir /lib/modules/5.10.235-yielde-v1-+/ -m virtio_gpu. The fix is to enable Virtio GPU driver support, as shown below:
    10.webp

    11.webp

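After saving the configuration, build the kernel:
    make -j8

  • -j8 runs up to 8 build jobs in parallel; a common choice is the number of CPU cores
After the build finishes, check that vmlinux contains debug information: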
    [root@yielde-debugging linux-5.10.0-257.0.0.160.oe2203sp4.x86_64]# readelf -e vmlinux|grep debug
      [36] .debug_aranges    PROGBITS         0000000000000000  02e00000
      [37] .debug_info       PROGBITS         0000000000000000  02e2c310
      [38] .debug_abbrev     PROGBITS         0000000000000000  0d7f2b9b
      [39] .debug_line       PROGBITS         0000000000000000  0dcc7ee8
      [40] .debug_frame      PROGBITS         0000000000000000  0f37bcf8
      [41] .debug_str        PROGBITS         0000000000000000  0f64cf48
      [42] .debug_loc        PROGBITS         0000000000000000  0f9adda9
      [43] .debug_ranges     PROGBITS         0000000000000000  12c16be0
Install the kernel modules and the kernel itself on the target machine:
    make modules_install
    make install
Configuring grub

Enable kgdb through grub: edit /etc/default/grub (for example with vim) and append kgdboc=ttyS0,115200 nokaslr to the end of GRUB_CMDLINE_LINUX:
    GRUB_CMDLINE_LINUX="rhgb quiet crashkernel=auto rd.lvm.lv=VolGroup/lv_root cgroup_disable=files apparmor=0 crashkernel=512M selinux=0 kgdboc=ttyS0,115200 nokaslr"
Regenerate the grub configuration:
    grub2-mkconfig -o /boot/grub2/grub.cfg
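
If the manual edit above has to be repeated on several machines, it could be scripted. The following is a minimal sketch, assuming GRUB_CMDLINE_LINUX sits on a single double-quoted line; the function name add_grub_params is hypothetical, and parameters containing |, & or backslashes are not handled.

```shell
#!/bin/sh
# Sketch: append kernel parameters to GRUB_CMDLINE_LINUX in a grub defaults
# file. Parameters that already appear on that line are not added again.

add_grub_params() {
    grub_file=$1
    shift
    for param in "$@"; do
        line=$(grep '^GRUB_CMDLINE_LINUX=' "$grub_file") || {
            echo "no GRUB_CMDLINE_LINUX line in $grub_file" >&2
            return 1
        }
        # Substring check: skip parameters that are already present.
        case "$line" in
            *"$param"*) continue ;;
        esac
        # Insert the parameter just before the closing quote of the line.
        sed "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"\$|\1 ${param}\"|" "$grub_file" \
            > "$grub_file.tmp" && mv "$grub_file.tmp" "$grub_file"
    done
}
```

For example, add_grub_params /etc/default/grub kgdboc=ttyS0,115200 nokaslr, followed by the grub2-mkconfig step. Working on a copy of the file first is a reasonable precaution.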
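Copy the source tree to the development machine, so that gdb there works with the same vmlinux and sources:
    rsync -avh /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64 root@10.20.41.140:/usr/src
Reboot the target machine and select the newly built kernel in the boot menu: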
12.webp

kgdb Debugger

Trigger a kgdb break on the target machine. Its screen freezes until gdb on the development machine connects over the serial port and takes control:
    echo g > /proc/sysrq-trigger
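
Before freezing the target machine it can help to confirm that the running kernel was really booted with kgdboc. The following is a sketch only: the function names are hypothetical, and it inspects /proc/cmdline alone, so a kgdboc value set at runtime through /sys/module/kgdboc/parameters/kgdboc would not be noticed.

```shell
#!/bin/sh
# Sketch: check the kernel command line for kgdboc before breaking into kgdb.

# Print the kgdboc value found in a kernel command line string, if any.
kgdboc_from_cmdline() {
    value=$(printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^kgdboc=//p' | head -n 1)
    [ -n "$value" ] && echo "$value"
}

# Break into kgdb only when kgdboc is on the running kernel's command line.
# Must be run as root on the target machine.
enter_kgdb() {
    if dev=$(kgdboc_from_cmdline "$(cat /proc/cmdline)"); then
        echo "kgdboc=$dev, entering kgdb"
        echo g > /proc/sysrq-trigger
    else
        echo "kgdboc not found in /proc/cmdline, not triggering" >&2
        return 1
    fi
}
```

Calling enter_kgdb on the target machine then replaces the bare echo g, and refuses to trigger when the boot parameter is missing.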
Debugging

Kernel

On the development machine:
    cd /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64
Connect to the target machine, set a breakpoint on vfs_write, and continue:
    gdb vmlinux
    (gdb) target remote /dev/ttyS0
    (gdb) bt
    #0  kgdb_breakpoint () at kernel/debug/debug_core.c:1268
    #1  0xffffffff811f5c2e in sysrq_handle_dbg (key=<optimized out>) at kernel/debug/debug_core.c:1008
    #2  0xffffffff81b31761 in __handle_sysrq (key=key@entry=103, check_mask=check_mask@entry=false) at drivers/tty/sysrq.c:604
    #3  0xffffffff8174776a in write_sysrq_trigger (file=<optimized out>, buf=<optimized out>, count=2, ppos=<optimized out>) at drivers/tty/sysrq.c:1168
    #4  0xffffffff81455ce3 in pde_write (ppos=<optimized out>, count=0, buf=<optimized out>, file=0x67, pde=0xffff888448b14540) at fs/proc/inode.c:345
    #5  proc_reg_write (file=0x67, buf=0xffff88882fba0710 "", count=0, ppos=0x0 <fixed_percpu_data>) at fs/proc/inode.c:357
    #6  0xffffffff813bd5df in vfs_write (file=file@entry=0xffff88844b4b3540, buf=buf@entry=0x564f86210f00 <error: Cannot access memory at address 0x564f86210f00>, count=count@entry=2, pos=pos@entry=0xffffc900039fbef0) at fs/read_write.c:600
    #7  0xffffffff813bda67 in ksys_write (fd=<optimized out>, buf=0x564f86210f00 <error: Cannot access memory at address 0x564f86210f00>, count=2) at fs/read_write.c:655
    #8  0xffffffff813bdb0a in __do_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:667
    #9  __se_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:664
    #10 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:664
    #11 0xffffffff81b510bd in do_syscall_64 (nr=<optimized out>, regs=0xffffc900039fbf58) at arch/x86/entry/common.c:47
    #12 0xffffffff81c000df in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:125
    #13 0x00007fa5a8d777a0 in ?? ()
    #14 0x0000000000000002 in fixed_percpu_data ()
    #15 0x00007fa5a8d775a0 in ?? ()
    (gdb) b vfs_write
    Breakpoint 1 at 0xffffffff813bd500: file fs/read_write.c, line 583.
    (gdb) c
    Continuing.
    [Switching to Thread 7142]
    Thread 409 hit Breakpoint 1, vfs_write (file=file@entry=0xffff888449e45680, buf=buf@entry=0xc0002f7a93 <error: Cannot access memory at address 0xc0002f7a93>, count=count@entry=1, pos=pos@entry=0x0 <fixed_percpu_data>) at fs/read_write.c:583
    583     {
    (gdb) l
    578             return ret;
    579     }
    580     EXPORT_SYMBOL(kernel_write);
    581
    582     ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_t *pos)
    583     {
    584             ssize_t ret;
    585
    586             if (!(file->f_mode & FMODE_WRITE))
    587                     return -EBADF;
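
The connect-and-break sequence from the session above can be kept in a gdb command file so it does not have to be retyped after every reboot. This is a sketch under assumptions: the function name write_kgdb_commands is hypothetical, and set serial baud is the spelling used by recent gdb releases, which the article itself does not use and which may be unnecessary on a virtual serial port.

```shell
#!/bin/sh
# Sketch: write the gdb commands used in the session above to a file.

write_kgdb_commands() {
    out_file=$1
    serial_dev=${2:-/dev/ttyS0}
    cat > "$out_file" <<EOF
# Match the baud rate passed to kgdboc on the target machine.
set serial baud 115200
# Attach to the kernel that is waiting in kgdb on the other end.
target remote $serial_dev
# Stop at the next write, then let the kernel run.
break vfs_write
continue
EOF
}
```

Usage would look like write_kgdb_commands kgdb.gdb followed by gdb -x kgdb.gdb vmlinux in the kernel source directory, after the target machine has been stopped with echo g > /proc/sysrq-trigger.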
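Kernel modules

Take bcache as an example. To debug bcache, gdb has to load bcache.ko; otherwise it has no symbol table for the module:

  • On the target machine
    modprobe bcache
Get the load addresses of the module's sections: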
    [root@yielde-debugging ~]# cat /sys/module/bcache/sections/.text
    0xffffffffa0672000
    [root@yielde-debugging ~]# cat /sys/module/bcache/sections/.bss
    0xffffffffa06a6140
    [root@yielde-debugging ~]# cat /sys/module/bcache/sections/.data
    0xffffffffa06a05a0
Run echo g > /proc/sysrq-trigger to hand control back to gdb on the development machine.

  • On the development machine, load the bcache symbol table
    (gdb) add-symbol-file /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64/drivers/md/bcache/bcache.ko -s .text 0xffffffffa0672000 -s .bss 0xffffffffa06a6140 -s .data 0xffffffffa06a05a0
    add symbol table from file "/usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64/drivers/md/bcache/bcache.ko" at
            .text_addr = 0xffffffffa0672000
            .bss_addr = 0xffffffffa06a6140
            .data_addr = 0xffffffffa06a05a0
    (y or n) y
    Reading symbols from /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64/drivers/md/bcache/bcache.ko...
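
Copying three addresses by hand is error-prone, and they change each time the module is loaded. The following sketch builds the same add-symbol-file command from a sections directory; the function name module_symbol_cmd is hypothetical, and reading /sys/module/<name>/sections normally requires root.

```shell
#!/bin/sh
# Sketch: build a gdb add-symbol-file command from a module's section addresses.

module_symbol_cmd() {
    ko_path=$1        # path to the .ko as seen by gdb on the development machine
    sections_dir=$2   # e.g. /sys/module/bcache/sections on the target machine
    cmd="add-symbol-file $ko_path"
    for sec in .text .data .bss; do
        # Skip sections this module does not have.
        [ -r "$sections_dir/$sec" ] || continue
        cmd="$cmd -s $sec $(cat "$sections_dir/$sec")"
    done
    echo "$cmd"
}
```

Run on the target machine before breaking into kgdb, for example module_symbol_cmd /usr/src/linux-5.10.0-257.0.0.160.oe2203sp4.x86_64/drivers/md/bcache/bcache.ko /sys/module/bcache/sections, and paste the printed line at the (gdb) prompt on the development machine.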
Set a breakpoint on a bcache function, then continue:
    (gdb) b bcache_write_super
    Breakpoint 2 at 0xffffffffa0689390: file drivers/md/bcache/super.c, line 375.
    (gdb) c
    Continuing.
On the target machine, create a bcache device:
    make-bcache -B /dev/sdc -C /dev/sdb --writeback
This hits our breakpoint:
    [New Thread 10207]
    [New Thread 10198]
    [New Thread 10201]
    [New Thread 10204]
    [New Thread 10208]
    [New Thread 10209]
    [New Thread 10210]
    [New Thread 10211]
    [New Thread 10212]
    [Switching to Thread 10207]
    Thread 444 hit Breakpoint 2, bcache_write_super (c=c@entry=0xffff888442d40000) at drivers/md/bcache/super.c:375
    375     {
Inspect the call stack of the bcache creation:
    (gdb) bt
    #0  bcache_write_super (c=c@entry=0xffff888442d40000) at drivers/md/bcache/super.c:375
    #1  0xffffffffa068af4a in run_cache_set (c=0xffff888442d40000) at drivers/md/bcache/super.c:2137
    #2  0xffffffffa068b927 in register_cache_set (ca=ca@entry=0xffff888105552000) at drivers/md/bcache/super.c:2204
    #3  0xffffffffa068ba47 in register_cache (sb=<optimized out>, sb_disk=<optimized out>, bdev=0xffff888441756c80, ca=0xffff888105552000) at drivers/md/bcache/super.c:2401
    #4  0xffffffffa068bc72 in register_bcache (k=<optimized out>, attr=0xffffffffa06a07a0 <ksysfs_register>, buffer=<optimized out>, size=9) at drivers/md/bcache/super.c:2656
    #5  0xffffffff81609d4f in kobj_attr_store (kobj=0xffff888442d40000, attr=0xffff88844566a080, buf=0xffff888105552000 "", count=0) at lib/kobject.c:864
    #6  0xffffffff8146ec3b in sysfs_kf_write (of=<optimized out>, buf=0xffff888105552000 "", count=0, pos=<optimized out>) at fs/sysfs/file.c:139
    #7  0xffffffff8146e27c in kernfs_fop_write_iter (iocb=0xffffc900060f7e60, iter=<optimized out>) at fs/kernfs/file.c:296
    #8  0xffffffff813baa99 in call_write_iter (iter=0xffff88844566a080, kio=0xffff888442d40000, file=0xffff888440f672c0) at ./include/linux/fs.h:2064
    #9  new_sync_write (filp=filp@entry=0xffff888440f672c0, buf=buf@entry=0x5627601532a0 <error: Cannot access memory at address 0x5627601532a0>, len=len@entry=9, ppos=ppos@entry=0xffffc900060f7ef0) at fs/read_write.c:515
    #10 0xffffffff813bd6c0 in vfs_write (file=file@entry=0xffff888440f672c0, buf=buf@entry=0x5627601532a0 <error: Cannot access memory at address 0x5627601532a0>, count=count@entry=9, pos=pos@entry=0xffffc900060f7ef0) at fs/read_write.c:602
    #11 0xffffffff813bda67 in ksys_write (fd=<optimized out>, buf=0x5627601532a0 <error: Cannot access memory at address 0x5627601532a0>, count=9) at fs/read_write.c:655
    #12 0xffffffff813bdb0a in __do_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:667
    #13 __se_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at fs/read_write.c:664
    #14 __x64_sys_write (regs=<optimized out>) at fs/read_write.c:664
    #15 0xffffffff81b510bd in do_syscall_64 (nr=<optimized out>, regs=0xffffc900060f7f58) at arch/x86/entry/common.c:47
    #16 0xffffffff81c000df in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:125
    #17 0x00007f1ea5a5a7a0 in ?? ()
    #18 0x0000000000000009 in fixed_percpu_data ()
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Single-stepping looks like this:
13.webp

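Ubuntu 22.04.5 LTS

  • Development machine: the physical machine running Ubuntu; it runs gdb and connects to the target machine
  • Target machine: a virtual machine created with VirtualBox on that physical machine; it runs the KGDB-enabled kernel and is the machine being debugged
Configuring the virtual machine

Add a serial port and set its host pipe path to /tmp/debuglinux: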
14.webp
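
The same serial port setting can in principle be applied from the command line with VBoxManage instead of the GUI. This is a sketch, not the author's method: the VM name is a placeholder, and the choice of COM1 (0x3F8, IRQ 4) and server mode (VirtualBox creates the pipe) is an assumption about what the screenshot shows. The function only prints the command and runs nothing.

```shell
#!/bin/sh
# Sketch: print the VBoxManage command that maps the guest's first serial
# port to a host pipe. Nothing is executed; the VM name is a placeholder.

vbox_serial_cmd() {
    vm_name=$1
    pipe_path=${2:-/tmp/debuglinux}
    # 0x3F8 with IRQ 4 is the conventional COM1 setting, seen as ttyS0 in the guest.
    # "server" asks VirtualBox to create the pipe rather than join an existing one.
    printf 'VBoxManage modifyvm "%s" --uart1 0x3F8 4 --uartmode1 server %s\n' \
        "$vm_name" "$pipe_path"
}
```

For example, vbox_serial_cmd kgdb-target prints the command for a VM named kgdb-target. VBoxManage modifyvm changes settings of a VM that is powered off.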

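Building the kernel on the physical machine

This time the kernel is built on the development machine and then copied to the target machine.
The kernel source is in /root/workspace/linux-learn. Install the following deb packages:
    # install the build dependencies
    apt install gcc make perl flex bison pkg-config libncurses-dev libelf-dev build-essential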
Start the build:
    cd /root/workspace/linux-learn
    mkdir build
    make mrproper
    make O=build defconfig
    make O=build menuconfig

  • The options are similar to the openEuler setup: enable KGDB, disable kernel address space layout randomization (KASLR), and enable debug info
Then compile:
    make O=build -j8
Copy the whole kernel tree to the same directory on the VirtualBox target machine:
    cd /root/workspace
    rsync -avzW linux-learn root@192.168.5.20:/root/workspace
Configuring the target machine

Install the kernel:
    cd /root/workspace/linux-learn
    make O=build modules_install
    make O=build install
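Updating grub

On this Ubuntu release the GRUB tools are named grub-* and update-grub, whereas on openEuler above they are named grub2-*.
    # add the following line to /etc/default/grub
    vim /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="kgdboc=ttyS0,115200 nokaslr"
    # regenerate the boot configuration
    update-grub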
Reboot the virtual machine and boot into the newly built kernel, as shown below:
15.webp

16.webp

kgdb Debugger

  • On the target machine, run echo g > /proc/sysrq-trigger to hand control to gdb
  • On the development machine (the host), run:
    cd /root/workspace/linux-learn/build
    gdb vmlinux
    (gdb) target remote /tmp/debuglinux
Try a breakpoint:
    (gdb) b vfs_write
    Breakpoint 1 at 0xffffffff811c34cf: file ../fs/read_write.c, line 586.
    (gdb) c
    Continuing.
    [Switching to Thread 377]
    Thread 78 hit Breakpoint 1, vfs_write (file=file@entry=0xffff888101ddf800, buf=0x7ffe9e60766f <error: Cannot access memory at address 0x7ffe9e60766f>, count=count@entry=1, pos=pos@entry=0x0 <fixed_percpu_data>) at ../fs/read_write.c:586
    586        {
    (gdb) bt
    #0  vfs_write (file=file@entry=0xffff888101ddf800, buf=0x7ffe9e60766f <error: Cannot access memory at address 0x7ffe9e60766f>,
        count=count@entry=1, pos=pos@entry=0x0 <fixed_percpu_data>) at ../fs/read_write.c:586
    #1  0xffffffff811c3786 in ksys_write (fd=<optimized out>, buf=0x7ffe9e60766f <error: Cannot access memory at address 0x7ffe9e60766f>,
        count=1) at ../fs/read_write.c:658
    #2  0xffffffff811c37eb in __do_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at ../fs/read_write.c:670
    #3  __se_sys_write (count=<optimized out>, buf=<optimized out>, fd=<optimized out>) at ../fs/read_write.c:667
    #4  __x64_sys_write (regs=<optimized out>) at ../fs/read_write.c:667
    #5  0xffffffff81a1a921 in do_syscall_64 (nr=<optimized out>, regs=0xffffc90000407f58) at ../arch/x86/entry/common.c:46
    #6  0xffffffff81c0011f in entry_SYSCALL_64 () at ../arch/x86/entry/entry_64.S:117
    #7  0x00005610590964e5 in ?? ()
    #8  0x0000561091a5ac50 in ?? ()
    #9  0x00007f6214955a60 in ?? ()
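Summary

This article collects the setups I recently used to debug the kernel and kernel modules:

  • Two virtual machines connected over a serial port, one running gdb and the other running kgdb
  • A physical Linux machine running gdb, with a virtual machine running kgdb
  • After every code change and rebuild, sync the kernel tree to the same directory on both systems; this avoids a lot of trouble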
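
One way to notice that the two systems have drifted apart is to compare checksums of vmlinux (or of a module's .ko) before attaching gdb. The following is a minimal sketch with hypothetical function names; it assumes sha256sum from coreutils is available on both machines.

```shell
#!/bin/sh
# Sketch: detect whether two copies of a build artifact are byte-identical.

# Print only the SHA-256 digest of a file.
build_digest() {
    sha256sum < "$1" | cut -d' ' -f1
}

# Succeed when both files have the same digest; return 2 if one is unreadable.
same_build() {
    sum_a=$(sha256sum < "$1") || return 2
    sum_b=$(sha256sum < "$2") || return 2
    [ "$sum_a" = "$sum_b" ]
}
```

Running build_digest on vmlinux on each machine and comparing the two digests shows whether gdb is looking at the same kernel the target machine booted; same_build covers the case where both copies are reachable from one machine.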
