This article describes how to use ramoops and kdump to record kernel crash information on an x86 virtual machine.
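ramoops is an oops/panic logger that writes log data to RAM before the system crashes. It needs persistent RAM, that is, a memory region whose contents survive a reboot.
ramoops can be built as a module; for convenience I built it directly into the kernel. The config options to enable are:
CONFIG_PSTORE=y
CONFIG_PSTORE_CONSOLE=y
CONFIG_PSTORE_RAM=y
# CONFIG_PSTORE_PMSG is not set (recommended to enable)
# CONFIG_PSTORE_FTRACE is not set (recommended to enable)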
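Before rebuilding, it can help to confirm that the three required options really ended up in the kernel config. The following is a minimal sketch; the function name check_pstore_config is hypothetical, and it only accepts =y (built in), matching the setup described here.

```shell
# check_pstore_config FILE: report whether each required pstore option is set to =y in FILE.
# Returns 0 if all three are present, 1 otherwise.
check_pstore_config() {
    cfg=$1
    missing=0
    for opt in CONFIG_PSTORE CONFIG_PSTORE_CONSOLE CONFIG_PSTORE_RAM; do
        # Anchor on "=y" so CONFIG_PSTORE does not match CONFIG_PSTORE_RAM.
        if grep -q "^${opt}=y" "$cfg"; then
            echo "$opt: ok"
        else
            echo "$opt: missing"
            missing=1
        fi
    done
    return "$missing"
}
```

It would typically be run against the .config in the kernel build tree, for example check_pstore_config .config.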
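With VirtualBox, no record files appeared under /sys/fs/pstore after a kernel crash even though the configuration was correct (a nasty trap), so I set up the environment on VMware instead, with 2G of memory. According to the official ramoops documentation there are three ways to configure ramoops; we use the first one directly, which is passing boot parameters to the kernel. Edit /boot/grub/grub.cfg and add the following parameters: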
mem=1920M ramoops.mem_address=0x78000000 ramoops.mem_size=0x4000000 ramoops.dump_oops=1 ramoops.ecc=1
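Here mem is the amount of memory the kernel may use, mem_address is the start address of the memory used by ramoops, mem_size is the size of that reserved region, dump_oops=1 records both oopses and panics, and ecc=1 enables ECC protection; see the official documentation for details. After setting the boot parameters, reboot for them to take effect.
After the reboot, crash the kernel deliberately with the following command: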
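The addresses above are not arbitrary: the ramoops region has to sit above the memory the kernel is limited to and still inside the VM's RAM. The sketch below redoes that arithmetic with the values from this article; it assumes the 2G of guest RAM is one contiguous range starting at 0.

```shell
#!/bin/sh
# Sanity-check the memory layout (values copied from the boot parameters above).
total_mib=2048                  # VM RAM: 2G
mem_mib=1920                    # mem=1920M
ramoops_addr=$((0x78000000))    # ramoops.mem_address
ramoops_size=$((0x4000000))     # ramoops.mem_size

mem_end=$((mem_mib * 1024 * 1024))
ram_end=$((total_mib * 1024 * 1024))
ramoops_end=$((ramoops_addr + ramoops_size))

printf 'kernel memory ends at 0x%x\n' "$mem_end"
printf 'ramoops region: 0x%x-0x%x (%d MiB)\n' \
    "$ramoops_addr" "$ramoops_end" $((ramoops_size / 1024 / 1024))

# The region must start at or above the kernel's limit and end within RAM.
if [ "$ramoops_addr" -ge "$mem_end" ] && [ "$ramoops_end" -le "$ram_end" ]; then
    layout_ok=yes
else
    layout_ok=no
fi
echo "layout_ok=$layout_ok"
```

With these values mem=1920M ends exactly at 0x78000000, so the 64 MiB ramoops region starts right where the kernel's memory stops and leaves the last 64 MiB of RAM unused.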
echo c > /proc/sysrq-trigger
The system crashes and then reboots. Back in the shell, /sys/fs/pstore now contains the record files, here console-ramoops-0 and dmesg-ramoops-1, with the following contents:
console-ramoops-0
[ 120.719799] CPU: 0 PID: 7475 Comm: sh Tainted: G O 4.9.166+ #2
[ 120.719942] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 120.720192] task: ffff96770c862240 task.stack: ffffba30c4a80000
[ 120.720324] RIP: 0010:[<ffffffffac832702>] [<ffffffffac832702>] sysrq_handle_crash+0x12/0x20
[ 120.720533] RSP: 0018:ffffba30c4a83e78 EFLAGS: 00010282
[ 120.720680] RAX: 000000000000000f RBX: 0000000000000063 RCX: 0000000000000000
[ 120.720819] RDX: 0000000000000000 RSI: ffff96771ba10648 RDI: 0000000000000063
[ 120.720962] RBP: ffffffffad0bffc0 R08: 0000000000000001 R09: 0000000000059284
[ 120.721131] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000004
[ 120.721278] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 120.721421] FS: 0000000001ea0880(0000) GS:ffff96771ba00000(0000) knlGS:0000000000000000
[ 120.721618] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 120.721745] CR2: 0000000000000000 CR3: 000000004cb84000 CR4: 0000000000360670
[ 120.721911] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 120.722108] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 120.722256] Stack:
[ 120.722331] ffffffffac832e31 0000000000000002 fffffffffffffffb ffffba30c4a83f08
[ 120.722604] 0000000001ea4f00 ffffffffac83326b ffff9676f3effdd8 ffffffffac67c0dd
[ 120.722849] 0000000000000002 ffff96770ffc8880 ffffffffac60cba0 ffff96770ffc8880
[ 120.723105] Call Trace:
[ 120.723188] [<ffffffffac832e31>] ? __handle_sysrq+0xf1/0x140
[ 120.723316] [<ffffffffac83326b>] ? write_sysrq_trigger+0x2b/0x30
[ 120.723443] [<ffffffffac67c0dd>] ? proc_reg_write+0x3d/0x60
[ 120.723640] [<ffffffffac60cba0>] ? vfs_write+0xb0/0x190
[ 120.723761] [<ffffffffac60dfe2>] ? SyS_write+0x52/0xc0
[ 120.723882] [<ffffffffac403b67>] ? do_syscall_64+0x87/0xf0
[ 120.724002] [<ffffffffaca36c4e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
[ 120.724167] Code: 41 5c 41 5d 41 5e 41 5f e9 0c 8a ce ff 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 c7 05 c9 45 94 03 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 8d
[ 120.725651] RIP [<ffffffffac832702>] sysrq_handle_crash+0x12/0x20
[ 120.725801] RSP <ffffba30c4a83e78>
[ 120.725894] CR2: 0000000000000000
[ 120.726019] ---[ end trace 9d0e2c84273289ed ]---
[ 120.730468] Kernel panic - not syncing: Fatal exception
[ 120.730706] Kernel Offset: 0x2b400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 120.735485] Rebooting in 5 seconds..
[ 125.739764] ACPI MEMORY or I/O RESET_REG.
No errors detected
===============================
dmesg-ramoops-1
<4>[ 5.718737] hrtimer: interrupt took 9700 ns
<6>[ 6.096115] igb_uio: Use MSIX interrupt by default
<3>[ 6.169775] EXT4-fs (sda2): unable to read superblock
<3>[ 6.170072] EXT4-fs (sda2): unable to read superblock
<3>[ 6.170081] EXT4-fs (sda2): unable to read superblock
<6>[ 6.695732] device eth0 entered promiscuous mode
<6>[ 6.697183] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
<0>[ 12.596101] TIPC: Started in network mode
<0>[ 12.596217] TIPC: Own node identity 1001285, cluster identity 4711
<0>[ 12.596345] TIPC: 32-bit node address hash set to 1001285
<0>[ 12.638127] TIPC: Enabled bearer <eth:eth0>, priority 10
<0>[ 18.791882] TIPC: Resetting bearer <eth:eth0>
<6>[ 23.124506] ip_tables: (C) 2000-2006 Netfilter Core Team
<6>[ 23.623557] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
<6>[ 24.067819] Initializing XFRM netlink socket
<6>[ 24.087773] Netfilter messages via NETLINK v0.30.
<6>[ 24.522481] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
<6>[ 120.714716] sysrq: SysRq : Trigger a crash
<1>[ 120.714869] BUG: unable to handle kernel NULL pointer dereference at (null)
<1>[ 120.715080] IP: [<ffffffffac832702>] sysrq_handle_crash+0x12/0x20
<7>[ 120.715239] PGD 800000004fe74067
<7>[ 120.715296] PUD 4fe07067
<7>[ 120.715393] PMD 0
<7>[ 120.715417]
<7>[ 120.715497] Oops: 0002 [#1] SMP
<7>[ 120.715587] Modules linked in:
<7>[ 120.718151] mii mtd sd_mod ata_piix ahci libahci libata scsi_mod sdhci_pci sdhci mmc_block mmc_core squashfs vfat fat ext4 crc16 fscrypto jbd2 mbcache
<7>[ 120.719799] CPU: 0 PID: 7475 Comm: sh Tainted: G O 4.9.166+ #2
<7>[ 120.719942] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
<7>[ 120.720192] task: ffff96770c862240 task.stack: ffffba30c4a80000
<7>[ 120.720324] RIP: 0010:[<ffffffffac832702>] [<ffffffffac832702>] sysrq_handle_crash+0x12/0x20
<7>[ 120.720533] RSP: 0018:ffffba30c4a83e78 EFLAGS: 00010282
<7>[ 120.720680] RAX: 000000000000000f RBX: 0000000000000063 RCX: 0000000000000000
<7>[ 120.720819] RDX: 0000000000000000 RSI: ffff96771ba10648 RDI: 0000000000000063
<7>[ 120.720962] RBP: ffffffffad0bffc0 R08: 0000000000000001 R09: 0000000000059284
<7>[ 120.721131] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000004
<7>[ 120.721278] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
<7>[ 120.721421] FS: 0000000001ea0880(0000) GS:ffff96771ba00000(0000) knlGS:0000000000000000
<7>[ 120.721618] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<7>[ 120.721745] CR2: 0000000000000000 CR3: 000000004cb84000 CR4: 0000000000360670
<7>[ 120.721911] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<7>[ 120.722108] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
<7>[ 120.722256] Stack:
<7>[ 120.722331] ffffffffac832e31 0000000000000002 fffffffffffffffb ffffba30c4a83f08
<7>[ 120.722604] 0000000001ea4f00 ffffffffac83326b ffff9676f3effdd8 ffffffffac67c0dd
<7>[ 120.722849] 0000000000000002 ffff96770ffc8880 ffffffffac60cba0 ffff96770ffc8880
<7>[ 120.723105] Call Trace:
<7>[ 120.723188] [<ffffffffac832e31>] ? __handle_sysrq+0xf1/0x140
<7>[ 120.723316] [<ffffffffac83326b>] ? write_sysrq_trigger+0x2b/0x30
<7>[ 120.723443] [<ffffffffac67c0dd>] ? proc_reg_write+0x3d/0x60
<7>[ 120.723640] [<ffffffffac60cba0>] ? vfs_write+0xb0/0x190
<7>[ 120.723761] [<ffffffffac60dfe2>] ? SyS_write+0x52/0xc0
<7>[ 120.723882] [<ffffffffac403b67>] ? do_syscall_64+0x87/0xf0
<7>[ 120.724002] [<ffffffffaca36c4e>] ? entry_SYSCALL_64_after_swapgs+0x58/0xc6
<7>[ 120.724167] Code: 41 5c 41 5d 41 5e 41 5f e9 0c 8a ce ff 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 c7 05 c9 45 94 03 01 00 00 00 0f ae f8 <c6> 04 25 00 00 00 00 01 c3 0f 1f 44 00 00 0f 1f 44 00 00 53 8d
<1>[ 120.725651] RIP [<ffffffffac832702>] sysrq_handle_crash+0x12/0x20
<7>[ 120.725801] RSP <ffffba30c4a83e78>
<7>[ 120.725894] CR2: 0000000000000000
<4>[ 120.726019] ---[ end trace 9d0e2c84273289ed ]---
<0>[ 120.730468] Kernel panic - not syncing: Fatal exception
<0>[ 120.730706] Kernel Offset: 0x2b400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
No errors detected
As the log shows, this panic was triggered through SysRq (sysrq_handle_crash).
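The records live in the reserved RAM region, so it is worth copying them to disk soon after the reboot. The following is a minimal sketch of such a copy step; the function name save_pstore and the destination directory in the usage note are assumptions, not part of the original setup.

```shell
# save_pstore SRC DST: copy every record file in SRC into a new
# timestamped directory under DST and print that directory's path.
save_pstore() {
    src=$1
    out="$2/pstore-$(date +%Y%m%d-%H%M%S)"
    # Nothing to do if the source directory holds no records.
    if [ -z "$(ls -A "$src" 2>/dev/null)" ]; then
        echo "no records in $src"
        return 1
    fi
    mkdir -p "$out" || return 1
    cp "$src"/* "$out"/ || return 1
    echo "$out"
}
```

On the VM this could be called as save_pstore /sys/fs/pstore /var/log. According to the pstore documentation, removing a file from /sys/fs/pstore after it has been read lets the backend reclaim the space, so deleting the originals after a successful copy is a reasonable follow-up.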
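Whenever the memory of the system kernel (the working kernel) needs to be dumped, for example on a panic, kdump uses kexec to quickly boot a dump-capture kernel, bypassing the BIOS checks. While this second kernel boots, the working kernel's memory image is preserved and remains accessible to the capture kernel.
Config options to enable on x86-64 (strictly speaking the system kernel and the dump-capture kernel have separate requirements, see the official documentation; we use a single kernel build, so the capture kernel that gets booted is the same as the working kernel):
CONFIG_KEXEC=y
CONFIG_SYSFS=y
# dump analysis tools need a vmlinux with symbols
CONFIG_DEBUG_INFO=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y
CONFIG_RELOCATABLE=y
# start of the memory region where the kernel is loaded
CONFIG_PHYSICAL_START=0x1000000
# capture-kernel option; we keep SMP enabled and adjust the kernel boot parameters instead
CONFIG_SMP=n
After rebuilding the kernel, modify the system kernel's boot parameters to reserve memory for the capture kernel; 512M is reserved here. Edit /boot/grub/grub.cfg and add the following: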
crashkernel=512M@16M
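The crashkernel parameter has the form size@offset, so crashkernel=512M@16M reserves 512M starting at the 16M mark. The short sketch below works out the resulting address range and compares the offset with the CONFIG_PHYSICAL_START value used in this article; whether the two must match on a given kernel is best checked against the official kdump documentation.

```shell
#!/bin/sh
# Work out the physical range reserved by crashkernel=512M@16M.
size_mib=512
offset_mib=16
start=$((offset_mib * 1024 * 1024))
end=$((start + size_mib * 1024 * 1024))
printf 'crash kernel region: 0x%x-0x%x\n' "$start" "$end"

# CONFIG_PHYSICAL_START as given in the kernel config above.
physical_start=$((0x1000000))
if [ "$start" -eq "$physical_start" ]; then
    echo "offset matches CONFIG_PHYSICAL_START"
fi
```

The range 0x1000000-0x21000000 lies well below 0x78000000, so it does not collide with the ramoops region reserved earlier.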
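Reboot for the parameter to take effect. At this point the kexec tool needs to be built; once it is available, run the following command in the shell to load the capture kernel: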
kexec -p /boot/vmlinux-4.9.xxx --initrd=/boot/initrd-4.9.166.xxx --append="1 irqpoll maxcpus=1 reset_devices noapic recovery" --reuse-cmdline
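Two quick checks can confirm the setup before triggering a crash: that the running kernel was booted with the crashkernel parameter, and that a capture kernel is currently loaded. The sketch below wraps both in small helpers; the function names are hypothetical, while /proc/cmdline and /sys/kernel/kexec_crash_loaded are standard kernel interfaces.

```shell
# crashkernel_arg FILE: print the crashkernel= parameter found in a
# kernel command line file (normally /proc/cmdline).
crashkernel_arg() {
    tr ' ' '\n' < "$1" | grep '^crashkernel='
}

# crash_kernel_loaded [FILE]: succeed if the flag file reads 1.
# Defaults to the sysfs flag that the kernel sets after "kexec -p".
crash_kernel_loaded() {
    flag=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}
```

On the VM, crashkernel_arg /proc/cmdline should print crashkernel=512M@16M, and crash_kernel_loaded && echo "capture kernel loaded" should print its message once the kexec -p command has succeeded.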
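From then on, when the kernel crashes, the capture kernel is booted. Once it is up, the file /proc/vmcore is available; copy it to another directory, reboot, and then analyze the file with kexec-x86/sbin/vmcore-dmesg or gdb.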