Bug 35063 - BUG: unable to handle kernel NULL pointer dereference
Summary: BUG: unable to handle kernel NULL pointer dereference
Status: CLOSED WORKSFORME
Alias: None
Product: Sisyphus
Classification: Development
Component: kernel-image-ovz-el
Version: unstable
Hardware: all Linux
Importance: P3 normal
Assignee: Gleb F-Malinovskiy
QA Contact: qa-sisyphus
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-20 21:48 MSK by Vitaly Lipatov
Modified: 2024-04-12 07:17 MSK
CC List: 14 users

See Also:


Attachments
# cat /proc/mdstat (917 bytes, text/plain)
2018-06-20 21:50 MSK, Vitaly Lipatov
no flags

Description Vitaly Lipatov 2018-06-20 21:48:53 MSK
Regular (once a day) kernel panic:
BUG: unable to handle kernel NULL pointer dereference

When the problem first appeared on 10 January 2018, the installed kernel (in place for a long time) was
2.6.32-ovz-el-alt152
and now it is
2.6.32-ovz-el-alt162

On a crash the log looks like this (forwarded to a neighboring machine with netconsole):
Jun 20 07:16:14 192.168.0.223 [222078.384448] BUG: unable to handle kernel 
Jun 20 07:16:19 192.168.0.223 NULL pointer dereference
Jun 20 07:16:24 192.168.0.223  at 0000000000000010 
Jun 20 07:16:29 192.168.0.223 [222078.384704] IP:
Jun 20 07:16:34 192.168.0.223  [<ffffffffa01964f4>] clone_endio+0x34/0xd0 [dm_mod] 
Jun 20 07:16:39 192.168.0.223 [222078.384850] PGD 73a914067 
Jun 20 07:16:44 192.168.0.223 PUD 73a916067 
Jun 20 07:16:49 192.168.0.223 PMD 0 
Jun 20 07:16:54 192.168.0.223  
Jun 20 07:16:59 192.168.0.223 [222078.385022] Oops: 0000 [#1] 
Jun 20 07:17:04 192.168.0.223 SMP 
Jun 20 07:17:09 192.168.0.223  
Jun 20 07:17:14 192.168.0.223 [222078.385172] last sysfs file: /sys/devices/system/cpu/cpu1/cpuidle/state1/time 
Jun 20 07:17:19 192.168.0.223 [222078.385404] CPU 4 
Jun 20 07:17:24 192.168.0.223  
Jun 20 07:17:29 192.168.0.223 [222078.385404] Modules linked in:
....
Jun 20 07:18:45 azbykar [222078.385404]  
Jun 20 07:18:45 azbykar [222078.385404] Pid: 0, comm: swapper veid: 0 Tainted: G           -- ------------  T 2.6.32-ovz-el-alt162 #1 042stab127_2
Jun 20 07:18:45 azbykar  To be filled by O.E.M. To be filled by O.E.M.
Jun 20 07:18:45 azbykar /M5A99FX PRO R2.0
Jun 20 07:18:45 azbykar  
Jun 20 07:18:45 azbykar [222078.385404] RIP: 0010:[<ffffffffa01964f4>] 
Jun 20 07:18:45 azbykar  [<ffffffffa01964f4>] clone_endio+0x34/0xd0 [dm_mod] 
Jun 20 07:18:45 azbykar [222078.385404] RSP: 0018:ffff880028303b80  EFLAGS: 
...

Where should I dig?
I really need help, because this is our most important server and it has suddenly started behaving like this. I won't even bother you about another one, where kernel memory has suddenly started leaking :)
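The netconsole log in the description arrived with each kernel message split across several syslog lines, which makes it hard to read. A minimal sketch of how the fragments could be rejoined, assuming that a payload beginning with a kernel timestamp such as `[222078.384448]` starts a new message and that anything else continues the previous one. The function name `reassemble` and both regular expressions are illustrative, not part of any tool mentioned in this report.

```python
import re

# Syslog prefix "Mon DD HH:MM:SS host", then at most one separator space,
# then the payload forwarded by netconsole.
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+ ?(.*)$")
# Assumption: a kernel timestamp at the start of the payload opens a new message.
KTIME = re.compile(r"^\[\s*\d+\.\d+\]")


def reassemble(lines):
    """Join netconsole fragments into whole kernel messages."""
    messages = []
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # not a syslog line (for example an elided "...." marker)
        payload = m.group(1)
        if KTIME.match(payload) or not messages:
            messages.append(payload)
        else:
            messages[-1] += payload
    # Collapse runs of whitespace left over from the fragment boundaries.
    return [" ".join(msg.split()) for msg in messages]


sample = [
    "Jun 20 07:16:14 192.168.0.223 [222078.384448] BUG: unable to handle kernel ",
    "Jun 20 07:16:19 192.168.0.223 NULL pointer dereference",
    "Jun 20 07:16:24 192.168.0.223  at 0000000000000010 ",
    "Jun 20 07:16:29 192.168.0.223 [222078.384704] IP:",
    "Jun 20 07:16:34 192.168.0.223  [<ffffffffa01964f4>] clone_endio+0x34/0xd0 [dm_mod] ",
]
for message in reassemble(sample):
    print(message)
```

Note that collapsing whitespace also squeezes the column padding in lines such as the `Tainted:` flags, so the original capture should be kept alongside the joined output.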
Comment 1 Vitaly Lipatov 2018-06-20 21:50:34 MSK
Created attachment 7599
# cat /proc/mdstat

Most likely the cause is heavy use of RAID1; attaching /proc/mdstat.
Comment 2 Vitaly Lipatov 2024-04-12 04:37:14 MSK
No longer relevant.
Comment 3 Vitaly Lipatov 2024-04-12 07:15:17 MSK
After upgrading to Sisyphus

kernel-modules-zfs-std-def-2.2.2-alt1.393557.1.x86_64
libzfs-2.2.2-alt1.x86_64
zfs-zed-2.2.2-alt1.x86_64
zfs-utils-2.2.2-alt1.x86_64

# uname -a
Linux aspetos.office.etersoft.ru 6.1.85-std-def-alt1 #1 SMP PREEMPT_DYNAMIC Wed Apr 10 20:50:39 UTC 2024 x86_64 GNU/Linux


nothing has changed:

[Пт апр 12 07:12:14 2024] BUG: kernel NULL pointer dereference, address: 0000000000000980
[Пт апр 12 07:12:14 2024] #PF: supervisor write access in kernel mode
[Пт апр 12 07:12:14 2024] #PF: error_code(0x0002) - not-present page
[Пт апр 12 07:12:14 2024] PGD 0 P4D 0 
[Пт апр 12 07:12:14 2024] Oops: 0002 [#1] PREEMPT SMP NOPTI
[Пт апр 12 07:12:14 2024] CPU: 17 PID: 15407 Comm: zpool Tainted: P           OE      6.1.85-std-def-alt1 #1
[Пт апр 12 07:12:14 2024] Hardware name: Gigabyte Technology Co., Ltd. X670 AORUS ELITE AX/X670 AORUS ELITE AX, BIOS F5 09/28/2022
[Пт апр 12 07:12:14 2024] RIP: 0010:mutex_lock+0x19/0x40
[Пт апр 12 07:12:14 2024] Code: 00 0f 1f 44 00 00 be 02 00 00 00 e9 31 f8 ff ff 90 0f 1f 44 00 00 53 48 89 fb e8 c2 d2 ff ff 31 c0 65 48 8b 14 25 80 fb 01 00 <f0> 48 0f b1 13 75 0c 5b 31 c0 31 d2 31 ff e9 34 3a 24 00 48 89 df
[Пт апр 12 07:12:14 2024] RSP: 0018:ffffaad99721fb98 EFLAGS: 00010246
[Пт апр 12 07:12:14 2024] RAX: 0000000000000000 RBX: 0000000000000980 RCX: 0000000000000000
[Пт апр 12 07:12:14 2024] RDX: ffff9d85c9554080 RSI: 0000000000000000 RDI: 0000000000000980
[Пт апр 12 07:12:14 2024] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9d86df03e5e0
[Пт апр 12 07:12:14 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc10c8768
[Пт апр 12 07:12:14 2024] R13: ffff9d86dde35c80 R14: 0000000000000027 R15: 0000000000000004
[Пт апр 12 07:12:14 2024] FS:  00007fcef59ce900(0000) GS:ffff9da458640000(0000) knlGS:0000000000000000
[Пт апр 12 07:12:14 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Пт апр 12 07:12:14 2024] CR2: 0000000000000980 CR3: 000000020ebcc000 CR4: 0000000000750ee0
[Пт апр 12 07:12:14 2024] PKRU: 55555554
[Пт апр 12 07:12:14 2024] Call Trace:
[Пт апр 12 07:12:14 2024]  <TASK>
[Пт апр 12 07:12:14 2024]  ? __die_body.cold+0x1a/0x1f
[Пт апр 12 07:12:14 2024]  ? page_fault_oops+0xae/0x2a0
[Пт апр 12 07:12:14 2024]  ? exc_page_fault+0x78/0x180
[Пт апр 12 07:12:14 2024]  ? asm_exc_page_fault+0x22/0x30
[Пт апр 12 07:12:14 2024]  ? mutex_lock+0x19/0x40
[Пт апр 12 07:12:14 2024]  ? mutex_lock+0xe/0x40
[Пт апр 12 07:12:14 2024]  range_tree_span+0x136/0x290 [zfs]
[Пт апр 12 07:12:14 2024]  spa_prop_get+0xab/0xfb0 [zfs]
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? schedule+0x5a/0xe0
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? default_send_IPI_single_phys+0x4f/0x80
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? ttwu_queue_wakelist+0xf7/0x120
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? try_to_wake_up+0xe1/0x5b0
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? __kmem_cache_free+0x14f/0x210
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? wake_up_q+0x4a/0x90
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? __mutex_unlock_slowpath.isra.0+0x87/0x140
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? __mutex_lock.constprop.0+0x36/0x720
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? avl_find+0x50/0x2c0 [zfs]
[Пт апр 12 07:12:14 2024]  zfs_impl_get_ops+0x228c/0x65c0 [zfs]
[Пт апр 12 07:12:14 2024]  zfsdev_ioctl_common+0x874/0x990 [zfs]
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  zfs_file_put+0xef/0x270 [zfs]
[Пт апр 12 07:12:14 2024]  __x64_sys_ioctl+0x95/0xe0
[Пт апр 12 07:12:14 2024]  do_syscall_64+0x56/0x90
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? do_user_addr_fault+0x1d3/0x5c0
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? get_vtime_delta+0xf/0xc0
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? ct_kernel_exit.isra.0+0x71/0x90
[Пт апр 12 07:12:14 2024]  ? srso_alias_return_thunk+0x5/0x7f
[Пт апр 12 07:12:14 2024]  ? __ct_user_enter+0x5a/0xd0
[Пт апр 12 07:12:14 2024]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[Пт апр 12 07:12:14 2024] RIP: 0033:0x7fcef626c9eb
[Пт апр 12 07:12:14 2024] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[Пт апр 12 07:12:14 2024] RSP: 002b:00007ffe54e624a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Пт апр 12 07:12:14 2024] RAX: ffffffffffffffda RBX: 0000560096e683b0 RCX: 00007fcef626c9eb
[Пт апр 12 07:12:14 2024] RDX: 00007ffe54e62500 RSI: 0000000000005a27 RDI: 0000000000000003
[Пт апр 12 07:12:14 2024] RBP: 00007ffe54e65ae0 R08: 0000000000000000 R09: 0000000000000001
[Пт апр 12 07:12:14 2024] R10: 0000000000000004 R11: 0000000000000246 R12: 00007ffe54e62500
[Пт апр 12 07:12:14 2024] R13: 0000560096e04f50 R14: 0000560096e1f170 R15: 0000000000001000
[Пт апр 12 07:12:14 2024]  </TASK>
[Пт апр 12 07:12:14 2024] Modules linked in: xt_multiport af_packet veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter rpcsec_gss_krb5 nfsv4 dns_resolver nfs lockd grace fscache netfs nf_tables openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 qrtr cmac algif_hash algif_skcipher af_alg bnep msr nfnetlink_log nfnetlink btrfs blake2b_generic xor raid6_pq wl(POE) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common mt7921e kvm_amd mt7921_common mt76_connac_lib snd_hda_codec_realtek mt76 kvm snd_hda_codec_generic btusb amdgpu ledtrig_audio snd_hda_codec_hdmi joydev irqbypass btrtl nls_utf8 sd_mod crct10dif_pclmul btbcm mac80211 iommu_v2 nls_cp866 snd_hda_intel crc32_pclmul gpu_sched snd_intel_dspcfg btintel drm_buddy btmtk ghash_clmulni_intel vfat snd_intel_sdw_acpi bluetooth ecdh_generic input_leds fat hid_generic drm_display_helper sha512_ssse3 sha256_ssse3 cmdlinepart snd_hda_codec drm_ttm_helper ahci
[Пт апр 12 07:12:14 2024]  sha1_ssse3 wmi_bmof ttm cfg80211 snd_hda_core mpt3sas sfc libahci aesni_intel ccp cec snd_hwdep mdio raid_class r8169 k10temp crypto_simd zfs(POE) sp5100_tco rfkill usbhid evdev libata cryptd snd_pcm rc_core pcspkr scsi_transport_sas hwmon i2c_piix4 rng_core libarc4 realtek mtd thermal video spl(OE) hid wmi tiny_power_button acpi_cpufreq button sch_fq_codel vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) scsi_dh_rdac scsi_dh_emc scsi_dh_alua vhost_net tun vhost vhost_iotlb tap rbd libceph libcrc32c crc32c_intel br_netfilter bridge stp llc snd_seq_midi snd_seq_midi_event snd_seq snd_rawmidi snd_seq_device snd_timer dm_multipath snd auth_rpcgss soundcore scsi_mod fuse efi_pstore scsi_common dm_mod sunrpc efivarfs dmi_sysfs ip_tables x_tables autofs4 xhci_pci xhci_pci_renesas xhci_hcd
[Пт апр 12 07:12:14 2024] CR2: 0000000000000980
[Пт апр 12 07:12:14 2024] ---[ end trace 0000000000000000 ]---
[Пт апр 12 07:12:15 2024] RIP: 0010:mutex_lock+0x19/0x40
[Пт апр 12 07:12:15 2024] Code: 00 0f 1f 44 00 00 be 02 00 00 00 e9 31 f8 ff ff 90 0f 1f 44 00 00 53 48 89 fb e8 c2 d2 ff ff 31 c0 65 48 8b 14 25 80 fb 01 00 <f0> 48 0f b1 13 75 0c 5b 31 c0 31 d2 31 ff e9 34 3a 24 00 48 89 df
[Пт апр 12 07:12:15 2024] RSP: 0018:ffffaad99721fb98 EFLAGS: 00010246
[Пт апр 12 07:12:15 2024] RAX: 0000000000000000 RBX: 0000000000000980 RCX: 0000000000000000
[Пт апр 12 07:12:15 2024] RDX: ffff9d85c9554080 RSI: 0000000000000000 RDI: 0000000000000980
[Пт апр 12 07:12:15 2024] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9d86df03e5e0
[Пт апр 12 07:12:15 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc10c8768
[Пт апр 12 07:12:15 2024] R13: ffff9d86dde35c80 R14: 0000000000000027 R15: 0000000000000004
[Пт апр 12 07:12:15 2024] FS:  00007fcef59ce900(0000) GS:ffff9da458640000(0000) knlGS:0000000000000000
[Пт апр 12 07:12:15 2024] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Пт апр 12 07:12:15 2024] CR2: 0000000000000980 CR3: 000000020ebcc000 CR4: 0000000000750ee0
[Пт апр 12 07:12:15 2024] PKRU: 55555554
[Пт апр 12 07:12:15 2024] note: zpool[15407] exited with irqs disabled
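For reading the two oops headers in this report (`Oops: 0000` in 2018 and `Oops: 0002` / `error_code(0x0002)` in 2024), the value is the x86 page-fault error code. A small sketch that decodes its low five bits; the table `PF_BITS` and the function `decode_pf_error_code` are illustrative names, and higher bits (protection keys and so on) are deliberately ignored.

```python
# Low five bits of the x86 page-fault error code.
# Each entry maps bit number -> (meaning when clear, meaning when set).
PF_BITS = {
    0: ("not-present page", "protection violation"),
    1: ("read access", "write access"),
    2: ("supervisor mode", "user mode"),
    3: (None, "reserved bit set in a page-table entry"),
    4: (None, "instruction fetch"),
}


def decode_pf_error_code(code):
    """Return human-readable descriptions for a page-fault error code."""
    descriptions = []
    for bit, (when_clear, when_set) in PF_BITS.items():
        text = when_set if code & (1 << bit) else when_clear
        if text:
            descriptions.append(text)
    return descriptions


print(decode_pf_error_code(0x0002))
# ['not-present page', 'write access', 'supervisor mode']
```

This agrees with the `#PF: supervisor write access in kernel mode` and `not-present page` lines in the trace above. In the same dump CR2, RDI and RBX all hold 0x980, so `mutex_lock` appears to have been handed the address 0x980. One possible reading, not confirmed anywhere in this report, is a mutex at offset 0x980 inside a structure reached through a NULL pointer.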
Comment 4 Vitaly Lipatov 2024-04-12 07:17:30 MSK
(In reply to Vitaly Lipatov from comment #3)
> After upgrading to Sisyphus
Sorry, posted by mistake.