Bug 39717 - Baikal-M: deadlock within 2-5 minutes after boot
Summary: Baikal-M: deadlock within 2-5 minutes after boot
Status: CLOSED FIXED
Alias: None
Product: Sisyphus
Classification: Development
Component: kernel-image-std-def
Version: unstable
Hardware: aarch64 Linux
Importance: P5 normal
Assignee: Alexey Sheplyakov
QA Contact: qa-sisyphus
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-23 18:12 MSK by Alexey Sheplyakov
Modified: 2021-05-24 17:11 MSK
See Also:


Description Alexey Sheplyakov 2021-02-23 18:12:11 MSK
With the schedutil cpufreq governor (the default), the kernel
hangs hard within 1-3 minutes after boot.
The hang looks roughly like this (yes, this is a 5.10 kernel, but it
almost always manages to complain before it hangs, whereas 5.4 hangs
faster and takes about 10 boots to catch a similar log):

[  454.690508] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  454.696839] 	(detected by 4, t=26002 jiffies, g=22561, q=15)
[  454.703017] rcu: All QSes seen, last rcu_preempt kthread activity 25992 (4295121102-4295095110), jiffies_till_next_fqs=3, root ->qsmask 0x0
[  454.715570] rcu: rcu_preempt kthread starved for 25992 jiffies! g22561 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=1
[  454.726117] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  454.735273] rcu: RCU grace-period kthread stack dump:
[  454.740344] task:rcu_preempt     state:R stack:    0 pid:   13 ppid:     2 flags:0x00000028
[  454.748731] Call trace:
[  454.751204]  __switch_to+0x114/0x170
[  454.754803]  __schedule+0x370/0xa3c
[  454.758310]  schedule+0x50/0x104
[  454.761557]  schedule_timeout+0x9c/0x114
[  454.765503]  rcu_gp_kthread+0x598/0xb50
[  454.769360]  kthread+0x150/0x160
[  454.772607]  ret_from_fork+0x10/0x38
[  454.778130] 
[  454.779631] ================================
[  454.783912] WARNING: inconsistent lock state
[  454.788196] 5.10.17-00041-g454ed3004040-dirty #1 Not tainted
[  454.793867] --------------------------------
[  454.798147] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[  454.804168] swapper/4/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
[  454.809319] ffff800011198498 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x480/0xce0
[  454.817615] {IN-HARDIRQ-W} state was registered at:
[  454.822508]   __lock_acquire+0xad8/0x2094
[  454.826530]   lock_acquire.part.0+0xfc/0x360
[  454.830813]   lock_acquire+0x68/0x84
[  454.834399]   _raw_spin_lock_irqsave+0x84/0x158
[  454.838941]   rcu_exp_handler+0xcc/0x140
[  454.842878]   flush_smp_call_function_queue+0xec/0x304
[  454.848029]   generic_smp_call_function_single_interrupt+0x20/0x2c
[  454.854225]   ipi_handler+0x1d8/0x39c
[  454.857899]   handle_percpu_devid_fasteoi_ipi+0xb0/0xe0
[  454.863137]   __handle_domain_irq+0xbc/0x13c
[  454.867419]   gic_handle_irq+0xcc/0x14c
[  454.871265]   el1_irq+0xc4/0x180
[  454.874504]   lock_acquire.part.0+0x120/0x360
[  454.878873]   lock_acquire+0x68/0x84
[  454.882461]   lock_page_memcg+0x5c/0x150
[  454.886399]   page_add_file_rmap+0x28/0x27c
[  454.890595]   alloc_set_pte+0xb8/0x5c0
[  454.894356]   filemap_map_pages+0x4a4/0x4c0
[  454.898551]   handle_mm_fault+0xbcc/0xf50
[  454.902572]   do_page_fault+0x14c/0x404
[  454.906418]   do_translation_fault+0xbc/0xd8
[  454.910701]   do_mem_abort+0x4c/0xac
[  454.914287]   el0_ia+0x68/0xcc
[  454.917351]   el0_sync_handler+0x180/0x1b0
[  454.921458]   el0_sync+0x174/0x180
[  454.924868] irq event stamp: 449729
[  454.928368] hardirqs last  enabled at (449725): [<ffff8000109fcf74>] default_idle_call+0x24/0xdc
[  454.937171] hardirqs last disabled at (449726): [<ffff8000109f27a0>] enter_el1_irq_or_nmi+0x10/0x20
[  454.946237] softirqs last  enabled at (449728): [<ffff8000100532f0>] _local_bh_enable+0x30/0x54
[  454.954952] softirqs last disabled at (449729): [<ffff8000100534c4>] __irq_exit_rcu+0x1b0/0x1bc
[  454.963665] 
[  454.963665] other info that might help us debug this:
[  454.970206]  Possible unsafe locking scenario:
[  454.970206] 
[  454.976136]        CPU0
[  454.978590]        ----
[  454.981043]   lock(rcu_node_0);
[  454.984198]   <Interrupt>
[  454.986825]     lock(rcu_node_0);
[  454.990155] 
[  454.990155]  *** DEADLOCK ***
[  454.990155] 
[  454.996087] 1 lock held by swapper/4/0:
[  454.999932]  #0: ffff800011198498 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x480/0xce0
[  455.008662] 
[  455.008662] stack backtrace:
[  455.013033] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.10.17-00041-g454ed3004040-dirty #1
[  455.021313] Hardware name: Baikal Electronics Baikal-M mitx board (DT)
[  455.027854] Call trace:
[  455.030311]  dump_backtrace+0x0/0x1e4
[  455.033985]  show_stack+0x24/0x80
[  455.037312]  dump_stack+0xec/0x154
[  455.040723]  print_usage_bug.part.0+0x208/0x22c
[  455.045265]  mark_lock+0x88c/0x934
[  455.048677]  mark_held_locks+0x58/0x90
[  455.052437]  lockdep_hardirqs_on_prepare+0xe4/0x23c
[  455.057330]  trace_hardirqs_on+0x78/0x2e0
[  455.061350]  __do_softirq+0x114/0x6d0
[  455.065022]  __irq_exit_rcu+0x1b0/0x1bc
[  455.068868]  irq_exit+0x1c/0x54
[  455.072020]  __handle_domain_irq+0xc0/0x13c
[  455.076214]  gic_handle_irq+0xcc/0x14c
[  455.079972]  el1_irq+0xc4/0x180
[  455.083124]  arch_cpu_idle+0x18/0x30
[  455.086710]  default_idle_call+0x5c/0xdc
[  455.090645]  do_idle+0x260/0x2e0
[  455.093883]  cpu_startup_entry+0x30/0x8c
[  455.097818]  secondary_start_kernel+0x138/0x184
[  455.102410] BUG: scheduling while atomic: swapper/4/0/0x00000002
[  455.108449] INFO: lockdep is turned off.
[  455.112399] Modules linked in: dm_mod designware_i2s sdhci_of_dwcmshc snd_soc_core sdhci_pltfm dw_hdmi_ahb_audio snd_pcm_dmaengine ac97_bus evdev sdhci snd_pcm at24 panfrost mmc_core snd_timer pcie_baikal_v44 snd pcie_baikal bt1_pvt gpu_sched soundcore cpufreq_dt fuse configfs efivarfs ipv6
[  455.138295] Preemption disabled at:
[  455.138305] [<ffff800010028c64>] secondary_start_kernel+0xb4/0x184
[  455.148020] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.10.17-00041-g454ed3004040-dirty #1
[  455.156299] Hardware name: Baikal Electronics Baikal-M mitx board (DT)
[  455.162840] Call trace:
[  455.165296]  dump_backtrace+0x0/0x1e4
[  455.168969]  show_stack+0x24/0x80
[  455.172295]  dump_stack+0xec/0x154
[  455.175711]  __schedule_bug+0xcc/0xe0
[  455.179383]  __schedule+0x928/0xa3c
[  455.182881]  schedule_idle+0x34/0x5c
[  455.186467]  do_idle+0x1dc/0x2e0
[  455.189706]  cpu_startup_entry+0x30/0x8c
[  455.193640]  secondary_start_kernel+0x138/0x184
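
Since the description ties the hang to the schedutil cpufreq governor, it can help to confirm which governor each CPU is actually running before trying to reproduce. The following is a minimal sketch, not part of the original report: it assumes the standard Linux cpufreq sysfs layout (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`), and the helper name `read_governors` is hypothetical.

```python
from pathlib import Path


def read_governors(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq.

    sysfs_root is a parameter (an assumption of this sketch) so the
    function can be pointed at a test directory instead of real sysfs.
    """
    governors = {}
    # Match cpu0, cpu1, ... but not sibling entries such as cpuidle or cpufreq.
    for path in Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        cpu_name = path.parent.parent.name
        governors[cpu_name] = path.read_text().strip()
    return governors
```

Calling `read_governors()` on the affected board and printing the result would show whether the CPUs report `schedutil`; on systems without cpufreq support the function simply returns an empty dictionary.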
Comment 1 Dmitry V. Levin 2021-02-23 19:35:20 MSK
Why Component: kernel-image-std-def?
Comment 2 Anton V. Boyarshinov 2021-02-24 10:29:38 MSK
(In reply to Dmitry V. Levin from comment #1)
> Why Component: kernel-image-std-def?

Because this is the kernel that hangs.
Comment 3 Alexey Sheplyakov 2021-03-02 17:03:47 MSK
Fixed in kernel-image-std-def 5.4.101-alt1
Comment 4 Alexey Sheplyakov 2021-05-24 17:11:54 MSK
Fixed in 5.4.101-alt1